
Hapax legomenon

A hapax legomenon[1][2] (pl. hapax legomena; sometimes abbreviated to hapax, pl. hapaxes) is a word which occurs only once in the written record of a language, in the works of an author, or in a single text. While technically incorrect, the term is also sometimes used of a word that occurs in only one of an author's works, even though it occurs more than once in that work. Hapax legomenon is a transliteration of Greek ἅπαξ λεγόμενον, meaning "(something) said (only) once".[3] The related terms dis legomenon, tris legomenon, and tetrakis legomenon refer respectively to double, triple, or quadruple occurrences, but are far less commonly used.

Hapax legomena are quite common, as predicted by Zipf's Law,[4] which states that the frequency of any word in a work or corpus is inversely proportional to its rank in the frequency table. For large corpora, about 40% to 60% of the words (counting by type) are hapax legomena, and another 10% to 15% are dis legomena.[5] Thus, in the Brown Corpus of American English, about half of the 50,000 distinct words are hapax legomena within that corpus.[6]

Note that the term hapax legomenon refers to a word's appearance in a body of text, not to its origins, nor to its prevalence in speech. It thus differs from a nonce word, which may never be recorded, or may find currency and be recorded widely, or may appear several times in the work which coins it, and so on.
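The type-based percentages quoted above can be reproduced for any text with a few lines of code. The following Python sketch is illustrative only: the function names `tokenize` and `legomena_shares` are hypothetical, and the tokenizer (lowercasing and keeping runs of the letters a to z) is a crude assumption that a real corpus study would replace.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of a-z as word tokens (a crude assumption)."""
    return re.findall(r"[a-z]+", text.lower())


def legomena_shares(tokens):
    """Return (hapax_share, dis_share) as fractions of the distinct word types."""
    counts = Counter(tokens)
    n_types = len(counts)
    if n_types == 0:
        return 0.0, 0.0
    hapax = sum(1 for c in counts.values() if c == 1)  # types seen exactly once
    dis = sum(1 for c in counts.values() if c == 2)    # types seen exactly twice
    return hapax / n_types, dis / n_types


sample = "the whale and the sea and the ship"
print(legomena_shares(tokenize(sample)))  # (0.6, 0.2)
```

In the toy sentence, three of the five distinct words ("whale", "sea", "ship") occur once and one ("and") occurs twice, so the shares are counted over types rather than over running words.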

Rank-frequency plot for words in the novel Moby-Dick. About 44% of the distinct set of words in this novel, such as "matrimonial", occur only once, and so are hapax legomena (red). About 17%, such as "dexterity", are dis legomena (blue). Zipf's Law predicts that the words in this plot should approximately fit a straight line.
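If frequency is inversely proportional to rank, then the logarithm of frequency falls linearly with the logarithm of rank, with a slope of about -1; this is why a rank-frequency plot drawn on logarithmic axes is expected to look like a straight line. A minimal Python sketch of that check, assuming a pre-tokenized text and using the hypothetical helper names `rank_frequency` and `loglog_slope`, might look like this:

```python
import math
from collections import Counter


def rank_frequency(tokens):
    """Return word frequencies sorted from most to least common (index 0 is rank 1)."""
    return sorted(Counter(tokens).values(), reverse=True)


def loglog_slope(freqs):
    """Ordinary least-squares slope of log(frequency) against log(rank)."""
    if len(freqs) < 2:
        raise ValueError("need at least two distinct words to fit a line")
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Toy data built so that frequency is exactly 12 / rank.
tokens = ["a"] * 12 + ["b"] * 6 + ["c"] * 4 + ["d"] * 3
print(round(loglog_slope(rank_frequency(tokens)), 6))  # -1.0
```

Real texts only approximate this line, so a fitted slope near -1 is suggestive rather than a proof that a text follows Zipf's law.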


Significance
Hapax legomena in ancient texts are difficult to translate and decipher, since it is easier to infer meaning from multiple contexts than from just one. For example, many of the remaining undeciphered Mayan glyphs are hapax legomena, and Biblical (particularly Hebrew) hapax legomena sometimes pose difficult issues in translation. Hapax legomena also pose challenges in natural language processing.[7]

Some scholars consider hapax legomena useful in determining the authorship of written works. For example, each of Shakespeare's plays contains a roughly similar percentage of hapax legomena not found elsewhere in his work. P.N. Harrison, in The Problem of the Pastoral Epistles (1921),[8] made hapax legomena popular among Bible scholars when he argued that there are considerably more of them in the three Pastoral Epistles than in the other Pauline Epistles. He argued that the number of hapax legomena in a putative author's corpus indicates his or her vocabulary and is characteristic of the author as an individual.

Harrison's theory has faded in significance due to a number of problems raised by other scholars. For example, in 1896, W.P. Workman found the following numbers of hapax legomena in each Pauline Epistle: Rom. 113, I Cor. 110, II Cor. 99, Gal. 34, Eph. 43, Phil. 41, Col. 38, I Thess. 23, II Thess. 11, Philem. 5, I Tim. 82, II Tim. 53, Titus 33. At first glance, the last three totals (for the Pastoral Epistles) are not out of line with the others.[9] To take account of the varying length of the epistles, Workman also calculated the average number of hapax legomena per page of the Greek text, which ranged from 3.6 to 13, as summarised in the accompanying diagram.[9] Although the Pastoral Epistles have more hapax legomena per page, Workman found the differences to be moderate in comparison to the variation between other Epistles. This was reinforced when Workman looked at several plays by William Shakespeare, which showed similar variations (from 3.4 to 10.4 per page of Irving's one-volume edition), as summarised in the second diagram.[9]

Apart from author identity, there are several other factors which can explain the number of hapax legomena in a work, such as:

Text length: this directly affects the expected number and percentage of hapax legomena; the brevity of the Pastoral Epistles also makes any statistical analysis problematic.
Text topic: if the author writes on different subjects, many subject-specific words will occur only in limited contexts.
Text audience: if the author is writing to a peer rather than a student, or to their spouse rather than their employer, quite different vocabulary will appear.
Time: over the course of years, both the language itself and a given author's knowledge and use of language will change.[10]

In the particular case of the Pastoral Epistles, all of these variables are quite different from those in the rest of the Pauline corpus, and hapax legomena are no longer widely accepted as a strong indicator of authorship (although the authorship of the Pastorals is subject to debate on other grounds).[11]

There are also subjective questions over whether two forms amount to "the same word": dog vs dogs, clue vs clueless, sign vs signature, and many other grey cases arise. The Jewish Encyclopedia points out that although there are 1500 hapaxes in the Old Testament, only about 400 of those are not obviously related to other attested word forms.[12]

It would not be especially difficult for a forger to construct a work with any desired percentage of hapax legomena. However, it seems unlikely that forgers much before the 20th century would have thought of such a ploy, much less thought it worth the effort.

A final difficulty with the use of hapax legomena for authorship determination is that there is considerable variation among works known to be by a single author, and disparate authors can show very similar values. In other words, it is not a reliable indicator. Authorship studies now usually use a wide range of measures, and look for a pattern across them, rather than relying on a single measurement.
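The dependence on text length mentioned above can be seen by recomputing the hapax share over growing prefixes of the same text. The Python sketch below is an illustration under simple assumptions (a pre-tokenized word list and a fixed sampling interval); the function name `hapax_share_by_length` is hypothetical and not taken from any cited study.

```python
from collections import Counter


def hapax_share_by_length(tokens, step):
    """Return a list of (prefix_length, hapax_share_of_types) for growing prefixes."""
    results = []
    counts = Counter()
    hapax = 0  # number of types currently seen exactly once
    for i, tok in enumerate(tokens, start=1):
        counts[tok] += 1
        if counts[tok] == 1:
            hapax += 1   # a new type starts out as a hapax
        elif counts[tok] == 2:
            hapax -= 1   # a second occurrence ends its hapax status
        if i % step == 0 or i == len(tokens):
            results.append((i, hapax / len(counts)))
    return results


tokens = "a b c a b d".split()
print(hapax_share_by_length(tokens, step=3))  # [(3, 1.0), (6, 0.5)]
```

In this toy input every word is a hapax after three tokens, but only half of the types are after six, which illustrates why raw counts or percentages from texts of very different lengths are hard to compare directly.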

Computer science
In the fields of computational linguistics and natural language processing (NLP), especially corpus linguistics and machine-learned NLP, it is common to disregard hapax legomena (and sometimes other infrequent words), as they are likely to have little value for computational techniques. This has the added benefit of significantly reducing the memory usage of an application, since by Zipf's law, many words are hapaxes.[13]
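One common way to disregard rare words is to build the vocabulary with a minimum-count threshold and map everything else to a single placeholder. The following Python sketch shows the idea; the names `build_vocab`, `replace_rare` and the `<unk>` placeholder are illustrative assumptions rather than the API of any particular NLP library.

```python
from collections import Counter

UNK = "<unk>"  # hypothetical placeholder for words outside the vocabulary


def build_vocab(tokens, min_count=2):
    """Keep only words seen at least min_count times; min_count=2 drops hapaxes."""
    counts = Counter(tokens)
    return {word for word, c in counts.items() if c >= min_count}


def replace_rare(tokens, vocab):
    """Replace every out-of-vocabulary token with the placeholder."""
    return [tok if tok in vocab else UNK for tok in tokens]


tokens = "the cat saw the dog and the cat".split()
vocab = build_vocab(tokens)
print(sorted(vocab))                # ['cat', 'the']
print(replace_rare(tokens, vocab))
```

Here the five distinct words shrink to a vocabulary of two, which is the memory saving described above; raising `min_count` would also drop dis legomena and other infrequent words.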

Examples
Some examples of hapax legomena in a given language or body of work are:


Hebrew examples
Gvina ("cheese") is a hapax legomenon of Biblical Hebrew, found only in Job 10:10. The word has become extremely common in modern Hebrew.
Akut ("fought") appears only once in the Hebrew Bible, in Psalms 95:10.
Lilith occurs once in the Hebrew Bible, in Isaiah 34:14, which describes the desolation of Edom. The word is translated into English in several ways.
Atzei Gopher ("Gopher wood") is mentioned once in the Bible, in Genesis 6:14, in the instruction to make Noah's ark "of gopher wood". Because of the single appearance, the literal meaning is lost. Gopher is simply a transliteration, although scholars today tentatively suggest that the wood intended is cypress.[14]

Greek examples
autoguos, an ancient Greek word for a sort of plough, is found once (and exclusively) in Hesiod; its precise meaning remains obscure.
panaorios, ancient Greek for "very untimely", is one of many hapax legomena of the Iliad.
The Greek New Testament contains 686 local hapax legomena, sometimes called "New Testament hapax",[15] of which 62 occur in 1 Peter and 54 occur in 2 Peter.[16]
aphedron ("latrine") was a hapax legomenon thought to mean "bowel" until an inscription was found in Pergamos.

Latin examples
Mnemosynus, presumably meaning a keepsake or aide-memoire, appears only in Poem 12 of Catullus' Carmina.
Deproeliantis, a participle of the word deproelior, which means "to fight fiercely" or "to struggle violently", appears only in line 11 of Horace's Ode 1.9.

Arabic examples
The proper nouns Iram (Q 89:7, Iram of the Pillars), Bābil (Q 2:102, Babylon), Bakka(t) (Q 3:96, Bakkah), Ǧibt (Q 4:51), Ramaḍān (Q 2:185, Ramadan), ar-Rūm (Q 30:2, Ancient Rome), Tasnīm (Q 83:27), Qurayš (Q 106:1, Quraysh), Maǧūs (Q 22:17, Magi), Mārūt (Q 2:102, Harut and Marut), Makka(t) (Q 48:24, Mecca), Nasr (Q 71:23), (Ḏū) an-Nūn (Q 21:87) and Hārūt (Q 2:102, Harut and Marut) occur only once in the Qurʾān.[17]
zanǧabīl ("ginger") is a Qurʾānic hapax (Q 76:17).
The epitheton ornans aṣ-Ṣamad ("the One besought", one of the names of God in the Qur'an) is a Qurʾānic hapax (Q 112:2).

Italian examples
Ramogna occurs only once in Italian literature, in Dante's Divina Commedia (Purgatorio XI, 25).
Trasumanar is another hapax legomenon found in Dante's Divina Commedia (Paradiso I, 70, translated as "Passing beyond the human" by Mandelbaum).

English examples
Honorificabilitudinitatibus is a hapax legomenon of Shakespeare's works.
Nortelrye, a word for "education", occurs exactly once in Chaucer.
Slæpwerigne occurs exactly once in the Old English corpus, in the Exeter Book. There is debate over whether it means "weary with sleep" or "weary for sleep".
Flother, a synonym for snowflake, is a hapax legomenon of written English pre-1900, found in a manuscript from around 1275.


External links
Open source Java software for text analysis and calculating hapax ratio (JHapax)[18]

References
[1] "hapax legomenon" (http://oed.com/search?searchType=dictionary&q=hapax+legomenon). Oxford English Dictionary. Oxford University Press. 2nd ed. 1989.
[2] "hapax legomenon" (http://dictionary.reference.com/browse/hapax+legomenon). Dictionary.com Unabridged. Random House, Inc.
[3] ἅπαξ (http://www.perseus.tufts.edu/hopper/text?doc=Perseus:text:1999.04.0057:entry=a(/pac). Henry George Liddell, Robert Scott. A Greek-English Lexicon, at the Perseus Project.
[4] Paul Baker, Andrew Hardie, and Tony McEnery, A Glossary of Corpus Linguistics, Edinburgh University Press, 2006, page 81, ISBN 0748620184.
[5] András Kornai, Mathematical Linguistics, Springer, 2008, page 72, ISBN 1846289858.
[6] Kirsten Malmkjær, The Linguistics Encyclopedia (http://books.google.com.au/books?id=IG7tE4-p-uUC&pg=PA87), 2nd ed, Routledge, 2002, ISBN 0415222109, p. 87.
[7] Christopher D. Manning and Hinrich Schütze, Foundations of Statistical Natural Language Processing, MIT Press, 1999, page 22, ISBN 0262133601.
[8] P.N. Harrison. The Problem of the Pastoral Epistles. Oxford University Press, 1921.
[9] Workman, "The Hapax Legomena of St. Paul", Expository Times, 7 (1896:418), noted in The Catholic Encyclopedia, s.v. "Epistles to Timothy and Titus" (http://www.newadvent.org/cathen/14727b.htm).
[10] Steven J. DeRose. "A Statistical Analysis of Certain Linguistic Arguments Concerning the Authorship of the Pastoral Epistles." Honors thesis, Brown University, 1982; Terry L. Wilder. "A Brief Defense of the Pastoral Epistles Authenticity". Midwestern Journal of Theology 2.1 (Fall 2003), 38-4. (on-line: http://www.mbts.edu/pdfs/academics/wilder.pdf)
[11] Mark Harding. What are they saying about the Pastoral epistles?, Paulist Press, 2001, page 12. ISBN 0809139758, 9780809139750.
[12] Article on Hapax Legomena in The Jewish Encyclopedia (http://www.jewishencyclopedia.com/view.jsp?artid=268&letter=H). Includes a list of all the Old Testament hapax legomena, by book.
[13] D. Jurafsky and J.H. Martin (2009). Speech and Language Processing. Prentice Hall.
[14] "Ark, Design and Size", Aid to Bible Understanding, Watchtower Bible and Tract Society, 1971.
[15] e.g. Richard Bauckham, The Jewish world around the New Testament: collected essays I, p. 431, 2008: "..a New Testament hapax, which occurs 19 times in Hermas.."
[16] John F. Walvoord and Roy B. Zuck, The Bible Knowledge Commentary: New Testament Edition, David C. Cook, 1983, page 860, ISBN 0882078127.
[17] Orhan Elmaz. "Die Interpretationsgeschichte der koranischen Hapaxlegomena." Doctoral thesis, University of Vienna, 2008, page 29.
[18] http://www.javaforge.com/project/4269

Article Sources and Contributors



Hapax legomenon Source: http://en.wikipedia.org/w/index.php?oldid=427482776 Contributors: A Macedonian, AnonMoos, Anthony, Anthony Appleyard, Arlen22, Audrey, Barbov, Bennylin, Bigbluefish, Bryan Derksen, Crazytales, Cuddlyable3, D3av, DaveGorman, Davegerbil, Deflective, Dnik, Drbreznjev, EdgarMCMLXXXI, Eratatosk, Erutuon, Eth, Feline Hymnic, Feureau, Fluffernutter, Gabbe, Gpvos, Hkd2029, Hmains, In ictu oculi, Irrbloss, Isis, Jengod, Jmrowland, Kevinpurcell, King Hildebrand, KnightRider, Kwamikagami, Lanceka, Leandrod, LittleSis1006, LizardWizard, Looris, Maratanos, McGeddon, Mdotley, Mississippifred, Mpost89, N5iln, Ohnoitsjamie, OperaJoeGreen, Oreo Priest, Philthecow, PierreAbbat, Provider uk, Quuxplusone, Qwertyus, Radagast3, Rob Hooft, Rui Gabriel Correia, Scarlight, Sderose, Shii, Sja.h, Sleigh, Someone else, Sonic3KMaster, Springhill40, Sputnikcccp, Squandermania, StAnselm, Tbjablin, Timberframe, Tipiac, Tomisti, Tothebarricades.tk, Totnesmartin, Tregoweth, Valley2city, Vicki Rosenzweig, Viriditas, Wetman, Woohookitty, Xanzzibar, Zinnmann, , 93 anonymous edits

Image Sources, Licenses and Contributors


File:Moby Dick Words.gif Source: http://en.wikipedia.org/w/index.php?title=File:Moby_Dick_Words.gif License: Public Domain Contributors: Radagast3
File:Loudspeaker.svg Source: http://en.wikipedia.org/w/index.php?title=File:Loudspeaker.svg License: Public Domain Contributors: Bayo, Gmaxwell, Husky, Iamunknown, Myself488, Nethac DIU, Omegatron, Rocket000, The Evil IP address, Wouterhagens, 9 anonymous edits
File:Workman'sPaulineHapaxes.svg Source: http://en.wikipedia.org/w/index.php?title=File:Workman'sPaulineHapaxes.svg License: Public Domain Contributors: Radagast3
File:Workman'sShakespearePlays.svg Source: http://en.wikipedia.org/w/index.php?title=File:Workman'sShakespearePlays.svg License: Public Domain Contributors: Radagast3

License
Creative Commons Attribution-Share Alike 3.0 Unported http://creativecommons.org/licenses/by-sa/3.0/
