Computational Model of Grammar for English to Sinhala Machine Translation

By

Budditha Hettige
Department of Statistics and Computer Science,
University of Sri Jayewardenepura, Sri Lanka

&

Asoka S. Karunananda
Faculty of Information Technology,
University of Moratuwa, Sri Lanka

Overview
• Introduction
• Machine Translation
• Sinhala Language
• Computational Model of Grammar for Sinhala
Language
• Design & Implementation
• Evaluation
• Conclusion & further work

Introduction
• Machine Translation
– Computer software that translates text or speech
from one natural language to another
• Machine Translation offers a potential solution to the language barrier
• Many countries use Machine Translation to overcome their language barriers
– India
– Japan etc.

Existing Approaches
• Human-assisted
• Rule-based
• Statistical
• Example-based
• Knowledge-based
• Hybrid
• Agent-based

NLP @ Sri Lanka
• UCSC
– Optical Character Recognizer
– Sinhala Corpus
– MT etc.
• Other NLP Systems
– Several undergraduate research projects
• BEES (Bilingual Expert for English to Sinhala)
– Rule-based machine translation system built on the concept of “Varanageema” (Conjugation)

Sinhala Language
• The Sinhala language has its own writing system, which descends from the Brahmi script
• The Sinhala alphabet consists of 61 letters: 18 vowels, 41 consonants and 2 semi-consonants
• Parts of speech
– Noun
– Verb
– Indeclinable particles (නිපාත, උපසර්ග)

Sinhala Noun Morphology
• The Sinhala noun category covers nouns, pronouns and adjectives
• Nouns are inflected for
– Gender (Lingaya)
– Number (Wachana)
– Person (Purusha)
– Case (Vibhakthi)
– Definiteness

Word conjugation (නාම වරණැගිල්ල)

• More than 27 noun forms can be generated by inflecting a single root word
• More than a hundred rules conjugate a noun from a given base form (Prakurthi)
• 15 conjugation patterns (Gana) have been identified for generating Sinhala nouns
– Eath Ganaya (ඇත් ගණය)
– Wasu Ganaya (වසු ගණය)
– Tara Ganaya (තාර ගණය)
– etc.

Eath Ganaya (ඇත් ගණය)

ප්‍රත්‍යය (a = add, r = remove): the first a / r / Example group is for නියත ඒක, the second for අනියත උක්ත

Base form   a    r    Example   a      r    Example
ඇත්        ◌ා   ◌්   ඇතා      තෙක්   ත්   ඇතෙක්
කොක්       ◌ා   ◌්   කොකා     කෙක්   ක්   කොකෙක්
ගොන්       ◌ා   ◌්   ගොනා     නෙක්   න්   ගොනෙක්
නිකම්      ◌ා   ◌්   නිකමා    මෙක්   ම්   නිකමෙක්
කිඹුල්     ◌ා   ◌්   කිඹුලා   ලෙක්   ල්   කිඹුලෙක්
මිනිස්     ◌ා   ◌්   මිනිසා   සෙක්   ස්   මිනිසෙක්

Noun Conjugation
• A single noun has 28 word forms

Sinhala Verb Morphology
• A Sinhala base verb has more than 18 inflected forms, covering tense, number and person

Person   Number     Present   Past      Future
First    Singular   බලමි     බැලීමි    බලන්නෙමි
First    Plural     බලමු     බැලීමු    බලන්නෙමු
Second   Singular   බලහි     බැලීහි    බලන්නෙහි
Second   Plural     බලහු     බැලීහු    බලන්නෙහු
Third    Singular   බලයි     බැලී      බලන්නේය
Third    Plural     බලති     බැලූ      බලන්නෝය
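
The table above can be read as a paradigm: one stem per tense plus one ending per person and number. The following Python sketch generates the 18 forms that way. The segmentation into `STEMS` and `ENDINGS` is an assumption made for illustration and is not taken from the authors' rule set; the names `STEMS`, `ENDINGS`, `CELLS` and `inflect` are hypothetical.

```python
# Assumed segmentation of the table for the verb shown above (illustrative only).
STEMS = {"present": "බල", "past": "බැල", "future": "බලන්න"}

# Order of the rows in the table: (person, number).
CELLS = [
    ("first", "singular"), ("first", "plural"),
    ("second", "singular"), ("second", "plural"),
    ("third", "singular"), ("third", "plural"),
]

# One ending per table row, listed in the same order as CELLS.
ENDINGS = {
    "present": dict(zip(CELLS, ["මි", "මු", "හි", "හු", "යි", "ති"])),
    "past": dict(zip(CELLS, ["ීමි", "ීමු", "ීහි", "ීහු", "ී", "ූ"])),
    "future": dict(zip(CELLS, ["ෙමි", "ෙමු", "ෙහි", "ෙහු", "ේය", "ෝය"])),
}


def inflect(tense: str, person: str, number: str) -> str:
    """Concatenate the tense stem with the ending for (person, number)."""
    return STEMS[tense] + ENDINGS[tense][(person, number)]


# Build the whole paradigm: 3 tenses x 6 person/number cells.
paradigm = {
    (tense, person, number): inflect(tense, person, number)
    for tense in STEMS
    for person, number in CELLS
}
```

Storing stems and endings separately, rather than all 18 surface forms, mirrors the idea of keeping only base words and rules in the lexicon.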

Verb Conjugation
• A single verb has more than 42 word forms

Concept of Varanageema (Conjugation)
• Words in a language can be generated from a limited set of base words
• A limited set of rules generates word forms from base words
• Conjugation applies to both nouns and verbs
• Using base words and these rules reduces the need to store a large number of words in dictionaries
• Varanageema in Sinhala not only creates derived words but also handles the following English concepts
– Person
– Number
– Determiners
– Prepositions
– Tense

Computational Model for Sinhala Morphology
• Nama Gana and Kriya Gana specify how each noun and verb is derived from its base form
• Implemented 85 grammar rules for Sinhala nouns
• Implemented 18 rules for Kriya Gana

Kaputu Ganaya (කපුටු ගණය)

Base form: කපුටු

Form             Add     Remove   Example
නියත ඒකවචන      ◌ා      ◌ු       කපුටා
අනියත උක්ත       ටෙක්    ටු       කපුටෙක්
අනියත අනුක්ත     ටෙකු    ටු       කපුටෙකු
බහුවචන උක්ත      ටෝ      ටු       කපුටෝ
බහුවචන අනුක්ත    න්      ◌ු       කපුටන්
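
Each row of the Kaputu Ganaya table is a remove-then-add suffix rule applied to the base form. The Python sketch below encodes the five rows and applies them to the base form කපුටු. The names `KAPUTU_RULES`, `apply_rule` and `conjugate` are hypothetical; the system described in these slides is implemented in SWI-Prolog, so this is only an illustration of the rule format.

```python
# Hypothetical encoding of the table: form -> (suffix to remove, suffix to add).
KAPUTU_RULES = {
    "නියත ඒකවචන": ("ු", "ා"),
    "අනියත උක්ත": ("ටු", "ටෙක්"),
    "අනියත අනුක්ත": ("ටු", "ටෙකු"),
    "බහුවචන උක්ත": ("ටු", "ටෝ"),
    "බහුවචන අනුක්ත": ("ු", "න්"),
}


def apply_rule(base: str, remove: str, add: str) -> str:
    """Strip `remove` from the end of `base`, then append `add`."""
    if not base.endswith(remove):
        raise ValueError(f"{base!r} does not end with {remove!r}")
    return base[: len(base) - len(remove)] + add


def conjugate(base: str, rules: dict) -> dict:
    """Return every word form that one conjugation class produces for `base`."""
    return {form: apply_rule(base, rem, add) for form, (rem, add) in rules.items()}


forms = conjugate("කපුටු", KAPUTU_RULES)
for form, word in forms.items():
    print(form, word)
```

Only the base form and the rule table are stored; the five surface forms are derived on demand.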

Finite State Automata for Sinhala “Kaputu Gana”
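
As a minimal sketch of the idea, and not the authors' automaton, the five Kaputu Gana forms of කපුටු can be recognised by a deterministic finite-state automaton that reads one Unicode code point per transition. The automaton here is built as a trie from the word forms; `FORMS`, `build_automaton` and `recognise` are hypothetical names.

```python
# Word forms from the Kaputu Ganaya table, mapped to their grammatical label.
FORMS = {
    "කපුටා": "නියත ඒකවචන",
    "කපුටෙක්": "අනියත උක්ත",
    "කපුටෙකු": "අනියත අනුක්ත",
    "කපුටෝ": "බහුවචන උක්ත",
    "කපුටන්": "බහුවචන අනුක්ත",
}


def build_automaton(words: dict):
    """Build a deterministic automaton (a trie) with state 0 as the start state."""
    transitions = {}  # (state, code point) -> next state
    accepting = {}    # accepting state -> label
    next_state = 1
    for word, label in words.items():
        state = 0
        for ch in word:
            key = (state, ch)
            if key not in transitions:
                transitions[key] = next_state
                next_state += 1
            state = transitions[key]
        accepting[state] = label
    return transitions, accepting


def recognise(word: str, transitions: dict, accepting: dict):
    """Return the label of `word`, or None if the automaton rejects it."""
    state = 0
    for ch in word:
        state = transitions.get((state, ch))
        if state is None:
            return None
    return accepting.get(state)


TRANSITIONS, ACCEPTING = build_automaton(FORMS)
```

Because all five forms share the prefix කපු, the automaton branches only at the suffix, which is where the conjugation rules differ.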

Syntax
• 36 syntax rules are implemented to generate
grammatically correct Sinhala sentences
• A Context-Free Grammar (CFG) is a formal method of describing the syntax of a language

Sinhala Language Syntax
• Eight components
– Attributive adjunct of Subject (උක්ත විශේෂණය)
– Subject (උක්තය)
– Attributive adjunct of Object (කර්ම විශේෂණය)
– Object (කර්මය)
– Attributive adjunct of Predicate (ආඛ්‍යාත විශේෂණය)
– Attributive adjunct of the complement of predicate (ආඛ්‍යාත පූර්ණ විශේෂණය)
– Complement of predicate (ආඛ්‍යාත පූර්ණය)
– Predicate (ආඛ්‍යාතය)

Context-Free Grammar for Sinhala language

Example sentence: “දක්ෂ ගුරුවරයා තම ශිෂ්‍යයා ඉක්මණින් දැණුමැති විශාරදයකු කළේය”

SubP = Subject Phrase
VebP = Verb Phrase
Sub = Subject
Obj = Object
ObjP = Object Phrase
AdjSub = Attributive adjunct of Subject
AdjObj = Attributive adjunct of Object
Pre = Predicate
AdjPre = Attributive adjunct of Predicate
AdjCmp = Attributive adjunct of Complement
CmpPre = Complement of predicate
CmpPreP = Complement of predicate phrase

S -> SubP VebP
SubP -> Sub
SubP -> AdjSub Sub
VebP -> ObjP PreP
VebP -> PreP
ObjP -> Obj
ObjP -> AdjObj Obj
PreP -> AdjPre CmpPreP
PreP -> CmpPreP
CmpPreP -> Pre
CmpPreP -> Pre CmpPre
CmpPre -> Cmp
CmpPre -> AdjCmp Cmp
Sub -> Noun
AdjSub -> Noun
Obj -> Noun
AdjObj -> Noun
AdjPre -> Adv
Cmp -> Noun
AdjCmp -> Noun
Pre -> Verb
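
The grammar above can be exercised with a small recogniser over part-of-speech tags. The Python sketch below is an illustration only (the system itself uses SWI-Prolog), and `GRAMMAR`, `TERMINALS` and `recognises` are hypothetical names. One assumption is made loudly: the slide lists `CmpPreP -> Pre CmpPre`, but the sketch orders it as `CmpPre Pre` so that a verb-final sentence, such as the eight-word example, is accepted.

```python
from functools import lru_cache

TERMINALS = {"Noun", "Adv", "Verb"}

# Each nonterminal maps to its alternatives; every alternative has 1 or 2 symbols.
GRAMMAR = {
    "S": [("SubP", "VebP")],
    "SubP": [("Sub",), ("AdjSub", "Sub")],
    "VebP": [("ObjP", "PreP"), ("PreP",)],
    "ObjP": [("Obj",), ("AdjObj", "Obj")],
    "PreP": [("AdjPre", "CmpPreP"), ("CmpPreP",)],
    # ASSUMPTION: complement before predicate (the slide lists "Pre CmpPre").
    "CmpPreP": [("Pre",), ("CmpPre", "Pre")],
    "CmpPre": [("Cmp",), ("AdjCmp", "Cmp")],
    "Sub": [("Noun",)],
    "AdjSub": [("Noun",)],
    "Obj": [("Noun",)],
    "AdjObj": [("Noun",)],
    "AdjPre": [("Adv",)],
    "Cmp": [("Noun",)],
    "AdjCmp": [("Noun",)],
    "Pre": [("Verb",)],
}


def recognises(tags, start="S"):
    """Return True if the tag sequence can be derived from `start`."""
    tags = tuple(tags)

    @lru_cache(maxsize=None)
    def derives(symbol, i, j):
        # Does `symbol` derive exactly tags[i:j]?
        if symbol in TERMINALS:
            return j - i == 1 and tags[i] == symbol
        for rhs in GRAMMAR[symbol]:
            if len(rhs) == 1:
                if derives(rhs[0], i, j):
                    return True
            else:
                left, right = rhs
                # Try every split point between the two right-hand symbols.
                if any(derives(left, i, k) and derives(right, k, j)
                       for k in range(i + 1, j)):
                    return True
        return False

    return derives(start, 0, len(tags))


# Tags assumed for the eight words of the example sentence, in order.
EXAMPLE_TAGS = ["Noun", "Noun", "Noun", "Noun", "Adv", "Noun", "Noun", "Verb"]
print(recognises(EXAMPLE_TAGS))
```

The recogniser works on tags rather than words because every lexical rule in the grammar rewrites to `Noun`, `Adv` or `Verb`; a tagger or lexicon lookup would supply those tags in a full system.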

Design

Sinhala Sentence Composer
• Composes grammatically correct Sinhala sentences
• Built on a Context-Free Grammar
• Implemented in SWI-Prolog

Sinhala Morphological Generator
• Works through the concept of Varanageema
• Implemented in SWI-Prolog

Implementation
• Sinhala Word Conjugator
• BEES

Evaluation
• The morphological generator achieves 96% accuracy
• The Sinhala morphological generator handles 85 grammar rules for Sinhala nouns and 36 grammar rules for Sinhala verbs
• Experimental results show 89% accuracy for the overall system

Limitations
• The translation system works well on simple sentences
• The system does not yet handle multi-word expressions, idioms or compound sentences
• Lexical resources are limited

Conclusion
• A computational model of grammar for the Sinhala language has been developed by considering the morphology and syntax of the language
• Finite State Transducers (FST) and Context-Free Grammar (CFG) have been used to describe the computational grammar for Sinhala
• The grammar has been tested through the English to Sinhala Machine Translation System
• The concept of Varanageema (conjugation) serves as the theoretical basis of the translation

Further work
• The grammar can be used to develop Sinhala language applications such as spelling and grammar checkers and word generators
• Handle compound sentences and extend the parser to cover more grammatical structures
• Use agent technology to improve various aspects of BEES, including semantic handling and autonomous updating of lexical resources
• Use the standard evaluation metric (BLEU) to evaluate MT

Demonstration
• Sinhala Word Conjugator
• BEES

Thank you!