Zipf's law plot (word frequency as a function of frequency rank) for the words in three Afro-Asiatic ("Semitic") languages: Ge'ez, Hebrew, and Arabic.
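A rank-frequency plot of this kind is typically built in three steps. First, count how often each distinct word occurs. Second, sort the counts in decreasing order. Third, assign rank 1 to the most frequent word, rank 2 to the next, and so on. Zipf's law predicts that the frequency of the word at rank r falls off roughly as 1/r^s with s close to 1. For this reason such plots are conventionally drawn on log-log axes, where a pure power law appears as a straight line. The minimal Python sketch below computes the (rank, frequency) pairs, assuming the text has already been split into a list of words. The function name `rank_frequency` is hypothetical and is not part of the original data files or tools.

```python
from collections import Counter

def rank_frequency(words):
    """Return (rank, frequency) pairs for a list of word tokens.

    Hypothetical helper for illustration: rank 1 is the most frequent
    word. Only frequencies are returned, so ties between equally
    frequent words do not affect the result.
    """
    counts = Counter(words)
    # Sort the per-word counts from most to least frequent.
    freqs = sorted(counts.values(), reverse=True)
    # Pair each frequency with its 1-based rank.
    return list(enumerate(freqs, start=1))

if __name__ == "__main__":
    sample = "a b a c a b".split()
    print(rank_frequency(sample))  # [(1, 3), (2, 2), (3, 1)]
```

To draw the plot, pass the ranks and frequencies to any plotting tool with both axes set to a logarithmic scale.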
The languages, texts and the frequency files are:
Ge'ez (Classical Ethiopian). Text of the Glory of the Kings (Kebra Nagast), a 14th-century chronicle of Ethiopian kings, part of the Coptic Bible. Published by Michal Jerabek. Transliterated in the SERA encoding, with numerals excluded. Sample: be'akWetEtu le'Igzi'AbHEr 'ab 'a`hazE kWulu webeweldu 'iyesus krstos [...] Syon baHr seged Hzbe 'ar`ad qdme seged Zan seged wdm 'ar`ad `amde Syon. File geez/gok/tot.1/gud.wfr (34291 words, N = 12272 distinct).
Hebrew. The first five books (Torah, Pentateuch) of the Hebrew Bible (Tanak). The text is the 10th-century version known as the Masoretic text; the original was probably composed mainly around 500 BCE from earlier texts. Obtained from the Sacred Texts site, maintained by John B. Hare. It is written in an ad-hoc single-byte encoding designed to look vaguely phonetic under an ISO-Latin-1 font: '¡' = alef, 'b' = bet, 'g' = gimel, '°' = sheva, 'ï' = hiriq, '¤' = dagesh/mapiq, etc. Vowel points are included, but cantillation marks are not. Sample: b¤°rë¡s¹ïy± b¤ârâ¡ ¡°êlöhïym ¡ë± häs¤¹âmäyïm w°¡ë± hâ¡ârêþ w°hâ¡ârêþ [...] k¤âlhäy¤âmïym. File hebr/tav/tot.1/gud.wfr (originally 66311 words, truncated/filtered to 35027 words, N = 12487 distinct).
The word frequency files '*/*/*/gud.wfr' are available at the UNICAMP website. The original annotated full texts, before truncation and filtering, are in the companion files */*/org/main.src. The truncated/filtered texts (one word per line, without punctuation) are in */*/*/gud.tlw.
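Because the filtered texts hold one word per line, the token and type counts quoted for each language (for example, 34291 words and N = 12272 distinct for Ge'ez) could in principle be recomputed with a few lines of Python. This is a sketch that rests on several assumptions. It assumes each non-empty line of a `gud.tlw` file holds exactly one word. It reads the file as Latin-1, which decodes every byte value and so covers both the ASCII-based SERA text and the single-byte Hebrew encoding. The helper name `count_words` is hypothetical. The sketch has not been checked against the actual files, so its output may differ from the figures listed above.

```python
import sys
from collections import Counter

def count_words(path, encoding="latin-1"):
    """Return (total word tokens, number of distinct words) for a
    one-word-per-line file such as a gud.tlw text.

    Hypothetical helper: assumes one word per non-empty line. Latin-1 is
    assumed because it maps every byte to a character, so no byte of a
    single-byte transliteration is lost or rejected.
    """
    counts = Counter()
    with open(path, encoding=encoding) as f:
        for line in f:
            # Strip only ASCII whitespace so that Latin-1 symbols used as
            # letters in the transliteration are never removed.
            word = line.strip(" \t\r\n")
            if word:
                counts[word] += 1
    return sum(counts.values()), len(counts)

if __name__ == "__main__" and len(sys.argv) > 1:
    # Example: python count_words.py geez/gok/tot.1/gud.tlw
    total, distinct = count_words(sys.argv[1])
    print(f"{total} words, N = {distinct} distinct")
```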
You are free:
to share – to copy, distribute and transmit the work
to remix – to adapt the work
Under the following conditions:
attribution – You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
share alike – If you remix, transform, or build upon the material, you must distribute your contributions under the same or compatible license as the original.