ARTIFICIAL INTELLIGENCE
KUMBAKONAM
ABSTRACT
The automated conversion between Tamil and Hindi draws on lexical resources and
computational models. Tamil and Hindi are morphologically rich languages: most of the
grammatical functions are embedded in the word in the form of inflections. This language
conversion involves phonetic analysis, text analysis, morphological analysis, syntactic
analysis, semantic analysis, discourse analysis and pragmatic analysis.
A Machine Translation System for Hindi-Tamil basically has three major components:
the morphological analyser of the source language, the mapping unit and the target language
generator. The morphological analyser splits a word into its constituent morphemes. The root
word and its inflections are then mapped to equivalent target language terms, from which the
target word is generated.
INTRODUCTION
Today, people from all walks of life, including professionals, are confronted by
unprecedented volumes of information, the vast bulk of which is stored as unstructured text.
Estimates of this volume are based on printed materials, and increasingly the information is
also available electronically on the World Wide Web. Professionals and students spend a large
and growing fraction of their work and leisure time navigating and accessing this universe of
information.
Natural Language Understanding is one of the mundane tasks that humans do easily
but that is very difficult to automate on computers. It reflects the artificial intelligence of
the computer. Natural Language Processing involves Natural Language Understanding and
Natural Language Generation. Natural Language Understanding refers to the analysis steps
involved in the language conversion. Natural Language Generation refers to the generation of
the target language.
Natural language processing involves many steps, and these depend on the source and
the target languages. According to the sentence pattern of the language, NLP is divided
into two types: the Aspects model (Standard Theory) and the Extended Standard Theory.
Natural Language Processing involves the following steps, which are used in the
automated conversion of the language. They are needed because languages, positional
languages in particular, differ in many respects. These steps are:
• Phonetic analysis
• Morphological analysis
• Syntactic analysis
• Semantic analysis
• Discourse and Pragmatic analysis
PHONETIC ANALYSIS
Phonetic analysis thus acts as an interpreter that converts the data of the user's
application into a form suitable for natural language analysis. The resulting data is sent as
input to the morphological analysis.
MORPHOLOGICAL ANALYSIS
The root word is obtained from the morphemes. The collective group of morphemes
reflects the sentence pattern of the source language. This identification is used in the
syntactic analysis and the semantic analysis.
SYNTACTIC ANALYSIS
The stages of semantic and pragmatic analysis are concerned with getting the meaning
of a sentence. Semantics is a partial representation of the meaning, obtained from the
possible syntactic structure(s) of the sentence and from the meanings of the words.
Pragmatics is the meaning elaborated on the basis of contextual and world knowledge.
The meaning of the whole sentence can be put together from the meanings of the parts
of the sentence. The division of the sentence into meaningful parts is done by syntactic
analysis, and building the meaning up from those parts is called compositional semantics. In
general, the meaning of a sentence may be represented using any of the knowledge
representation schemes.
Many words, even within the same syntactic category, may have more than one meaning;
this is called semantic ambiguity. It is sometimes unclear which object a pronoun refers to;
this is called referential ambiguity or pragmatic ambiguity. These ambiguities are resolved by
the semantic analysis.
Discourse integration is one of the steps. The inter-sentence connections are made
here. For example, consider the following sentences: “The apple was black. John wanted it. He
always had.” Here “it” refers to a previously mentioned thing, namely the apple, whereas “He”
in the third sentence connects to “John”. This type of integration is done during the
discourse analysis.
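The linking described above can be illustrated with a deliberately naive sketch. It assumes the entities have already been labelled by hand as a "thing" or a "person" (the `entity_types` table and the function name are hypothetical), and it simply links each pronoun to the most recently mentioned entity of a compatible type. Real discourse analysis needs far more than this recency rule.

```python
# Toy pronoun resolution by recency. The type labels are supplied by hand;
# this is an illustration only, not the method of any particular system.
PRONOUN_TYPES = {"it": "thing", "he": "person", "she": "person"}


def resolve_pronouns(tokens, entity_types):
    """Return (pronoun, antecedent) pairs for the given token list.

    tokens       -- list of words in reading order
    entity_types -- dict mapping an entity word to "thing" or "person"
    """
    last_seen = {}  # most recent entity seen so far, per type
    links = []
    for token in tokens:
        if token in entity_types:
            last_seen[entity_types[token]] = token
        elif token.lower() in PRONOUN_TYPES:
            wanted = PRONOUN_TYPES[token.lower()]
            links.append((token, last_seen.get(wanted)))
    return links


tokens = "The apple was black . John wanted it . He always had .".split()
links = resolve_pronouns(tokens, {"apple": "thing", "John": "person"})
print(links)
```

For the example sentences this links “it” to the apple and “He” to John, which is the integration the discourse step is expected to perform.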
Pragmatic analysis handles sentences whose intended meaning does not follow from
their grammatical form alone. For example, the grammatical answer to the question “Can
you tell the time?” is “Yes”, but the ordinary answer gives the actual time. Pragmatic
analysis removes this type of ambiguity.
TAMIL-HINDI SYSTEM
Tamil-Hindi MAT was chosen because both are free word-order languages, unlike
English, which is a positional language. Ultimately our aim is to build a Human Aided Machine
Translation System for Hindi-Tamil. An MT system basically has three major components:
the morphological analyser (MA) of the source language, the mapping unit and the target
language generator.
A source language sentence is first processed by the MA. The MA splits the sentence into
words, and the words are in turn split into morphemes. The root word is obtained by this
process and is given as input to the mapping block along with the other morphemes. The
other morphemes include the tense marker, the GNP (gender-number-person) marker, the
vibakthi (case marker), etc. The dictionary is used for splitting a word into morphemes.
Typically this dictionary contains the Tamil root words and their inflections in its
first field. The inflections include the GNP marker, the TAM marker and the vibakthi. A given
word is compared with the words/morphemes in the first field of the dictionary. Matching is
done from right to left. Thus the inflections of the word are split off one by one and finally
we arrive at the root form.
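The right-to-left matching can be sketched as follows. This is a minimal illustration, not the analyser itself: the suffix table, the root list and the romanized word forms are invented for the example and ignore sandhi changes, and the greedy loop does no backtracking.

```python
# Toy right-to-left suffix stripping. SUFFIXES and ROOTS stand in for the
# first field of the dictionary; all entries here are hypothetical.
SUFFIXES = {
    "kal": "PLURAL",  # illustrative plural marker
    "ai": "ACC",      # illustrative accusative case marker (vibakthi)
}
ROOTS = {"maram", "viitu"}


def analyse(word):
    """Strip known suffixes from the right until a known root remains.

    Returns (root, [(suffix, label), ...]) with suffixes in surface order,
    or None if the word cannot be reduced to a known root.
    """
    morphemes = []
    rest = word
    while rest not in ROOTS:
        # Try longer suffixes first so a short one cannot shadow a long one.
        for suffix in sorted(SUFFIXES, key=len, reverse=True):
            if rest.endswith(suffix) and len(rest) > len(suffix):
                morphemes.insert(0, (suffix, SUFFIXES[suffix]))
                rest = rest[: -len(suffix)]
                break
        else:
            return None  # nothing matched at the right edge
    return rest, morphemes


print(analyse("maramkalai"))
```

In this toy run the case marker is removed first, then the plural marker, leaving the root; a production analyser would instead consult the full dictionary and its paradigm tables.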
Each root word along with its inflections is given as input to the mapping unit.
The morphological analyser thus performs the phonetic analysis and the morphological
analysis.
The current coverage of the morphological analyser is greater than 95% when tested
on the three-million-word CIIL corpus. The analyser follows the paradigm-based approach and
is implemented as a finite state machine. This version can analyse nearly 3.5 million word
forms. The objective of this Tamil morphological analyser API is to retrieve the root from its
inflected form.
MAPPING UNIT
The root word and its inflections are mapped to equivalent target language terms in
this block. Explaining the structure of the dictionary is useful at this juncture. The
dictionary has seven fields that aid the process of mapping. As said earlier, the first field
contains the Tamil root words and inflections.
The second field contains the paradigm type followed by the paradigm number, which are
useful in the generation of words. The subsequent fields contain the category of the word,
the equivalent Hindi meaning(s) and the gender information. The last field contains
information about the dictionary itself, which is kept for maintenance work.
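A dictionary entry of this kind could be modelled as below. The sketch covers only the fields the text names explicitly; the on-disk layout, the `|` field separator, the `/` separator between Hindi meanings and the sample values are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DictEntry:
    tamil_form: str            # Tamil root word or inflection
    paradigm: str              # paradigm type followed by paradigm number
    category: str              # category of the word
    hindi_meanings: List[str]  # one or more Hindi equivalents
    gender: str                # gender information for the Hindi word
    maintenance: str           # bookkeeping information about the entry


def parse_entry(line, field_sep="|", meaning_sep="/"):
    """Parse one hypothetical dictionary line into a DictEntry."""
    fields = [f.strip() for f in line.split(field_sep)]
    if len(fields) != 6:
        raise ValueError("expected 6 fields, got %d" % len(fields))
    tamil, paradigm, category, meanings, gender, maintenance = fields
    hindi = [m.strip() for m in meanings.split(meaning_sep) if m.strip()]
    return DictEntry(tamil, paradigm, category, hindi, gender, maintenance)


entry = parse_entry("maram | N1 | n | ped / vriksh | m | checked")
print(entry)
```

Keeping the Hindi meanings as a list preserves every candidate for the later stages instead of committing to one translation at lookup time.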
The gender information is important especially for Hindi because every noun in
Hindi has one of two genders, and this information is very helpful for semantic
analysis. The corresponding Hindi equivalents of the words are taken and given as input to
the generator part of the MT system. All equivalent Hindi words for a Tamil word are given in
the dictionary, separated from one another. The mapping unit thus involves the syntactic
analysis and the semantic analysis.
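The mapping step can be pictured as a lookup that turns an analysed Tamil word into Hindi candidates plus grammatical features. The lexicon contents, the marker labels and the feature names below are hypothetical; the sketch shows only the shape of the data flow, under the assumption that the analyser delivers a root and a list of marker labels.

```python
# Toy mapping unit: Tamil root + marker labels -> Hindi candidates + features.
# LEXICON and FEATURE_MAP are invented stand-ins for the real dictionary.
LEXICON = {
    "maram": {"category": "n", "hindi": ["ped", "vriksh"], "gender": "m"},
}
FEATURE_MAP = {
    "PLURAL": ("number", "pl"),
    "ACC": ("case", "acc"),
}


def map_to_target(root, markers):
    """Return the mapped record for a root and its markers, or None."""
    entry = LEXICON.get(root)
    if entry is None:
        return None  # unknown root: left for a human to resolve
    features = {"number": "sg"}  # default when no plural marker is present
    for marker in markers:
        if marker in FEATURE_MAP:
            name, value = FEATURE_MAP[marker]
            features[name] = value
    return {
        "hindi_candidates": list(entry["hindi"]),
        "category": entry["category"],
        "gender": entry["gender"],
        "features": features,
    }


mapped = map_to_target("maram", ["PLURAL", "ACC"])
print(mapped)
```

Returning None for an unknown root fits a human-aided system, where unresolved words can be passed to the user rather than guessed.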
GENERATOR
This is the reverse process of the analyser. Given a root word and its inflections, the
generator produces the equivalent Hindi word. While generating, it takes into account all the
information such as the gender, tense etc., and the equivalent word is generated accordingly.
For the generator to generate the word, its input must be in a fixed order. The order
followed here is: Hindi root, Category, Gender, Number, Person and finally TAM
(Tense-Aspect-Modality, if any). The Hindi generator used here is from IIIT,
Hyderabad, and is also used for other Anusaaraka products. It is used here as a black
box. The generator thus involves the discourse and pragmatic analysis.
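Because the generator is treated as a black box, the only thing that can be sketched safely is the preparation of its input in the stated order. The record type, the space separator and the sample values below are assumptions for illustration; the actual interface of the IIIT Hyderabad generator is not described here.

```python
from typing import NamedTuple, Optional


class GeneratorInput(NamedTuple):
    """Fields in the order stated in the text."""
    hindi_root: str
    category: str
    gender: str
    number: str
    person: str
    tam: Optional[str] = None  # Tense-Aspect-Modality, if any


def serialize(item, sep=" "):
    """Join the fields in order, leaving out TAM when it is absent."""
    parts = [item.hindi_root, item.category, item.gender,
             item.number, item.person]
    if item.tam:
        parts.append(item.tam)
    return sep.join(parts)


verb = serialize(GeneratorInput("ja", "v", "m", "sg", "3", "ta_hE"))
noun = serialize(GeneratorInput("ped", "n", "m", "pl", "3"))
print(verb)
print(noun)
```

Using a named tuple fixes the field order in one place, so a caller cannot accidentally swap, say, gender and number before the string reaches the generator.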
CONCLUSION
Nowadays millions of PCs are connected to the Internet. Besides, we use networked
computers that are not well utilized because of the language barrier. These networked
computers can be better utilized by using NLP to serve users in their own language. We, as
technical people, must construct software having these capabilities. Developing this kind of
software will improve our computer network communication as well.