Resources
Grammar Books
Mainly: Ruth Schmidt's Urdu Grammar, Eugene Glassman's Spoken Urdu
Urdu Classes at Konstanz
Transliteration Systems
- Developed in various Master's theses
Other Computational Work on Urdu Morphology/Lexicon:
- CRULP (Lahore)
- Savoie (http://www.lama.univsavoie.fr/~humayoun/UrduMorph/)
Current Activities
- Shift to a more principled, broader-coverage FST morphology
- Integration 90% complete
- Systematize and increase morphological tags
- Expand grammar (currently: correlatives, more complex predicates)
Morphological Analyzer
- Operates on an ASCII transliteration in order to allow processing of both Urdu (Arabic-based script) and Hindi (Devanagari)
- FST transliterators from the Urdu and Hindi scripts exist but remain to be integrated
Verbs
- Previous morphology: problems with massive overgeneration of verbal morphology
- New verbal morphology: flags included to impose restrictions on overgeneration (particularly in the future paradigm)
- So far 28 verbs included; need to differentiate more verb classes
- Irregularities are now treated via phonological rules
- Future forms are morphologically one word, but are written as one word only in Hindi, not in Urdu
Urdu Grammar Report - ParGram Meeting PARC July 2007
[Figure: the form as written in Hindi and in Urdu script]
Nouns I
- So far: includes 216 nouns
- Size: 157.3 Kb, 4768 states
- Includes gender (Masc / Fem), number (Pl / Sg) and case (Obl / Nom)
- Based largely on the use of flags (reducing rules and size (?))
- Version using more continuation classes is being worked on (efficiency difference?)
Nouns II
Types of nouns:
- Fem: kursI 'chair'; Masc: kamrA 'room'
- natural gender depending on subject: laRk(I/A) 'girl/boy'
- both genders (no gender marking): jIrAf 'giraffe'
- Arabic/Persian loanwords: tAliba 'student (fem.)', tAlib+Noun(Ar)+Fem+Pl+Nom
To Do:
- Need to redo natural gender
- Better differentiation of noun classes, pronouns, case
- Systematize and increase morphological tags (names, etc.)
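The gender, number and case distinctions on nouns can be illustrated with a small Python sketch. It is not the FST lexicon itself: the class names (`masc_A`, `fem_I`), the ending tables and the transliterated endings are assumptions made for illustration, following the tag layout seen in analyses such as kursI+Noun+Fem+Sg+Nom.

```python
# Illustrative sketch only: class names and endings are assumptions,
# not taken from the actual FST lexicon.
ENDINGS = {
    # marked masculine nouns ending in -A (e.g. kamrA)
    "masc_A": {("Sg", "Nom"): "A", ("Sg", "Obl"): "E",
               ("Pl", "Nom"): "E", ("Pl", "Obl"): "ON"},
    # feminine nouns ending in -I (e.g. kursI)
    "fem_I": {("Sg", "Nom"): "I", ("Sg", "Obl"): "I",
              ("Pl", "Nom"): "IAN", ("Pl", "Obl"): "ION"},
}
GENDER = {"masc_A": "Masc", "fem_I": "Fem"}


def paradigm(lemma, noun_class):
    """Map each surface form of a noun to its list of analysis strings."""
    stem = lemma[:-1]  # strip the final vowel of the citation form
    forms = {}
    for (number, case), ending in ENDINGS[noun_class].items():
        tag = f"{lemma}+Noun+{GENDER[noun_class]}+{number}+{case}"
        forms.setdefault(stem + ending, []).append(tag)
    return forms


if __name__ == "__main__":
    for surface, analyses in paradigm("kamrA", "masc_A").items():
        print(surface, analyses)
```

Ambiguous surface forms (for instance the form shared by the masculine oblique singular and nominative plural) come out with more than one analysis, which is the behaviour one would expect from the analyzer as well.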
Urdu Adjectives/Adverbs
- 262 adjectives so far
- Numbers up to 100 (each number has a different name)
- 38 adverbs so far
- 48.7 Kb, 1418 states, 1995 arcs, 1397 paths
- Difference between unmarked and marked adjectives
Unmarked Adjectives
Don't overtly agree with the noun they modify:
amIr laRkA      rich boy+Sg+Masc
amIr laRkIAN    rich girl+Pl+Fem
Marked Adjectives
Overt morphology for gender, number and case, agreeing with the noun they modify, e.g.
cHOtA laRkA    little+Sg+Masc+Nom boy+Sg+Masc
cHOtI laRkI    little+Sg+Fem girl+Sg+Fem
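The contrast between unmarked and marked adjectives can be sketched in a few lines of Python. This is only an illustration: the `MARKED` set stands in for a hypothetical lexicon, the function name is invented, and the endings (-A, -E, -I) are an assumption about the transliteration rather than the contents of the actual FST.

```python
# Hypothetical mini-lexicon of marked adjectives (citation form in -A).
MARKED = {"cHOtA"}


def inflect_adjective(adj, gender, number, case):
    """Return the adjective form agreeing with a noun's gender/number/case."""
    if adj not in MARKED:
        return adj               # unmarked adjectives never change
    stem = adj[:-1]
    if gender == "Fem":
        return stem + "I"        # one feminine form for all numbers and cases
    if number == "Sg" and case == "Nom":
        return stem + "A"        # masculine singular nominative
    return stem + "E"            # masculine oblique and plural


if __name__ == "__main__":
    print(inflect_adjective("cHOtA", "Fem", "Sg", "Nom"), "laRkI")
    print(inflect_adjective("amIr", "Fem", "Pl", "Nom"), "laRkIAN")
```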
Reduplication
- Any content word in Urdu can be reduplicated
- Accomplished by the compile-replace operator
- Phonological change in the onset can occur with nouns (echo forms)
Example:
kursI kursI    kursI+Noun+Fem+Sg+Nom+Redup    'chair after chair, many chairs'
kursI vursI    kursI+Noun+Fem+Sg+Nom+Echo     'something like a chair'
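The surface effect of reduplication and echo formation can be mimicked with plain string operations. The sketch below is not the compile-replace construction used in the FST; it is ordinary Python standing in for it, and the function names, the vowel inventory of the transliteration and the fixed echo onset `v` are assumptions.

```python
# Assumed vowel letters of the ASCII transliteration (both cases).
VOWELS = set("aeiouAEIOU")


def reduplicate(word):
    """Full reduplication: the word is simply repeated."""
    return f"{word} {word}"


def echo(word, onset="v"):
    """Echo form: repeat the word with its initial consonants replaced."""
    i = 0
    while i < len(word) and word[i] not in VOWELS:
        i += 1                   # skip the original onset
    return f"{word} {onset}{word[i:]}"


if __name__ == "__main__":
    print(reduplicate("kursI"))
    print(echo("kursI"))
```

Words that already begin with the echo consonant would need a different onset; that case is not handled here.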
Demo
Demo: reduplication, flag elimination
Note: reduplication still to be integrated into the tokenizer for the grammar (on the to-do list)
Correlatives
- (only) 28 verbs included, (only) one verb class
- Irregularities are treated via phonological rules
- Network size: 365.6 Kb, 9336 paths
- Flags included to impose restrictions on overgeneration in the future sublexicon
- Future forms such as mArEgI:
xfst[1]: up mArEgI
mAr+Verb+3P+Sg+Fut+Fem
mAr+Verb+2P+Sg+Fut+Fem
are treated as one word, although this is the case in Hindi only (written Urdu: mArE gI)
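How an agreement restriction cuts down overgeneration in the future paradigm can be sketched as follows. This is a toy model in Python, not the flag setup of the actual morphology: the ending tables, their transliteration and the `restrict` switch are assumptions, and the number check only imitates what a unification-style flag would enforce between the two suffixes.

```python
# Assumed person/number endings that precede the future suffix.
SUBJ = [("UN", "1P", "Sg"), ("E", "3P", "Sg"), ("E", "2P", "Sg"),
        ("EN", "1P", "Pl"), ("O", "2P", "Pl"), ("EN", "3P", "Pl")]
# Assumed future suffixes, marked for number and gender.
FUT = [("gA", "Sg", "Masc"), ("gE", "Pl", "Masc"),
       ("gI", "Sg", "Fem"), ("gI", "Pl", "Fem")]


def future_forms(stem, restrict=True):
    """Map surface future forms to analyses; optionally enforce agreement."""
    forms = {}
    for s_end, person, s_num in SUBJ:
        for f_end, f_num, gender in FUT:
            if restrict and s_num != f_num:
                continue         # number clash: block the combination
            tag = f"{stem}+Verb+{person}+{s_num}+Fut+{gender}"
            forms.setdefault(stem + s_end + f_end, []).append(tag)
    return forms


if __name__ == "__main__":
    print(future_forms("mAr")["mArEgI"])
    print(len(future_forms("mAr")), len(future_forms("mAr", restrict=False)))
```

With the restriction switched off, combinations whose two suffixes disagree in number are produced as well, which is the kind of overgeneration the flags are meant to rule out.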