Prepared by
Table of Contents
0. INTRODUCTION
1. OBJECTIVES OF SCRIPT GRAMMAR
2. END USERS FOR SCRIPT GRAMMAR
3. SCOPE
4. TERMINOLOGY
5. PHILOSOPHY AND UNDERLYING PRINCIPLES
6. SCRIPT GRAMMAR STRUCTURE
6.1. PERIPHERAL ELEMENTS OF THE SCRIPT GRAMMAR
6.2. CONFORMITY TO THE SYLLABLE STRUCTURE
6.3. SCRIPT GRAMMAR PROPER
6.3.1. The Character Set of Malayalam
6.3.2. Consonant Mātrā Combinations
6.3.3. The Ligature Set of Malayalam
6.3.4. The Collation Order of Malayalam
7. REFERENCES
8. ANNEXURES
Annexure 1: Names of experts who have contributed to the script grammar
Annexure 2: Report of the committee on Malayalam character Encoding and Keyboard layout standardization
Annexure 3: Unicode Table of MALAYALAM
0. INTRODUCTION
The term script grammar refers to the behaviour pattern of the writing system of a given
language. Languages which have written representations do not store information in a
haphazard manner, but use a coherent pattern which is similar to the linguistic grammar of
the language. With the help of specialists (not necessarily linguists) who work on the
written representation of the language, the script grammar sets out the shapes of the
characters of the language and the representation of its conjunct forms. In other words, the
Script Grammar deals with the surface structure of the language and tries to provide the
best possible “fit” for shapes and their representation. Since this is a highly subjective
issue, the shapes provided here are recommendations at best. They conform to the perception
of the mandating body/evaluators, who arrive by consensus at the “best possible fit”
acceptable to a majority of users. An example will make this clear. Although Marathi and
Hindi share the same script, Devanāgarī, they do not share the same character inventory, and
the representation of certain characters also differs. Thus the Marathi /la/ differs from
the Hindi /la/ in the placement of the stem: Hindi ल, Marathi ल. The Script Grammar
therefore conforms to the language in question and provides the character shapes acceptable
to a given user community. This does not mean monotony: the Hindi and Marathi /la/ can each
have a variety of forms once the intrinsic structure of the character is determined.
1. OBJECTIVES OF SCRIPT GRAMMAR
The objectives of the script grammar for each language can be divided into two major
parts:
Societal:
• Provide a visual representation of shapes that are deemed to conform to the
perception of a given community.
Technical:
• Identify the language by its ISO code and classify its writing system as Abjad or
Akshar (Alphasyllabary).
• Provide an inventory of the characters pertinent to the language and classify them
in terms of their taxonomy.
• Since Brahmi is written from left to right, and since certain characters do not
follow the linear left-to-right order, provide an inventory of displaced catenators, i.e.
characters such as Mātrās that concatenate to the Consonant.
• Propose the best shape representation of the individual characters as well as of the
ligatures used within a given script. As a corollary, request the expert(s) to identify
the longest possible strings of such ligatures.
• Finally, provide the collation order pertinent to that script/language, which would
be of great utility to high-end NLP as well as to CLDR data for the language. The
collation order for Marathi is different from that of Hindi although both languages
share the same script. Thus in Marathi क्ष and ज्ञ are placed at the end of the consonant
inventory, i.e. after ह in the sort order. In Hindi क्ष is sorted along with क and ज्ञ
with ज.
2. END USERS FOR SCRIPT GRAMMAR
The script-grammar specific to a given language can be used by a large number of users.
• Certain features of the script grammar, such as the shapes, can be used for
testing OCR and OHWR. Similarly, information regarding ligatures and collation
order can help in high-end NLP work such as detecting invalid combinations,
correctly implementing syllable structure, and prediction routines, to name a few.
Information regarding collation and character sets can also be used for CLDR.
• They allow the font designer to design a font which is in compliance with the
norms and standards of that particular script. A major problem which will be dealt
with in the template is one of ligatures. The final list of ligatures defined by the
script grammar allows the font designer to write specific rules for such glyphs.
• It permits the software developer to design and implement the keyboard and the
input mechanism which will meet the requirement of the particular linguistic
community.
• The collation or sort order as described in a Script Grammar permits the software
developer to write software functions/ routines for sorting data in all applications.
• Script Grammars are equally important for keyboard design, especially when
supplemented by frequency data from a corpus.
As can be seen the script grammar has a wide range of use and can be of utility to font
developers, Indian language developers and linguists in the area of computation.
3. SCOPE
This script grammar document contains the following information about the language and the
script used for writing the language.
4. TERMINOLOGY¹
Abjad: A writing system in which each symbol always or usually stands for a consonant.
The long vowels are indicated. The short vowels, however, are rarely marked and the
reader needs to supply them. Example: Urdu written in the Perso-Arabic script.
Allo-Script: The term relates to languages which share a common script. Thus
Devanāgarī is used to write 9 official languages. However, these languages do not use the
same set of characters. Thus Marathi uses the retroflex lla ळ [U+0933], which Hindi
does not use. The flaps used in Hindi, ड़ [U+095C] and ढ़ [U+095D], are not used in Konkani.
These sub-sets of a single “matricial” script are termed allo-scripts.
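The allo-script idea can be illustrated with a minimal Python sketch that flags characters a language does not use. It covers only the three code points named in the definition above; the table `NOT_USED` and the function `outside_allo_script` are hypothetical names chosen for this illustration, not part of any standard.

```python
from typing import List

# Characters the definition above says a language does NOT use.
# Hypothetical, deliberately incomplete table: only the three code points
# named in the text are listed.
NOT_USED = {
    "Hindi": {"\u0933"},              # DEVANAGARI LETTER LLA
    "Konkani": {"\u095C", "\u095D"},  # DEVANAGARI LETTERS DDDHA and RHA
}


def outside_allo_script(text: str, language: str) -> List[str]:
    """Return the code points in `text` that lie outside the language's allo-script."""
    excluded = NOT_USED.get(language, set())
    return ["U+%04X" % ord(ch) for ch in text if ch in excluded]


# MA + LLA + vowel sign AA: the LLA is flagged for Hindi only.
print(outside_allo_script("\u092E\u0933\u093E", "Hindi"))
print(outside_allo_script("\u092E\u0933\u093E", "Konkani"))
```

A real implementation would list the full permitted inventory of each language rather than a short exclusion list.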
Alphabet: A set of letters used in writing a language. Example: The English Alphabet.
Aspirated consonant: A consonant which is pronounced with an extra puff of air coming
out at the time of release of the oral obstruction. This has a sound of an extra "h".
Basic alphabet: The minimal set of letters which can be used for uniquely encoding
every word of a language. The basic alphabet for English consists of only the upper-case
letters A-Z.
¹ As in the case of the BIS Document, in order to make the terminology accessible for all
readers, examples have been chosen from English/Latin scripts wherever possible. Some
definitions have been excerpted from the BIS ISCII91 document and suitably modified where
necessary.
² Wikipedia definition
Conjunct: The Indic scripts are noted for a large number of consonant conjunct forms
that serve as orthographic abbreviations (ligatures) of two or more adjacent letterforms.
This abbreviation takes place only in the context of a consonant cluster…. Under normal
circumstances, a consonant cluster is depicted with a conjunct glyph if such a glyph is
available in the current font. In the absence of a conjunct glyph, the one or more dead
consonants that form part of the cluster are depicted using half-form glyphs. In the
absence of half-form glyphs, the dead consonants are depicted using the nominal
consonant forms combined with visible virama signs.³
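The three-step fallback in the definition above (conjunct glyph, then half-forms, then nominal forms with visible virama) can be sketched as follows. This is an illustration only: the function `render_cluster` and the two sets standing in for a font's glyph tables are assumptions, and real shaping engines work on glyph IDs through font tables rather than on Python sets.

```python
def render_cluster(consonants, conjunct_glyphs, half_forms):
    """Pick a rendering for a consonant cluster.

    consonants      -- list of consonant letters, in order
    conjunct_glyphs -- hypothetical set of clusters (joined letters) the font can ligate
    half_forms      -- hypothetical set of consonants for which the font has a half-form
    """
    key = "".join(consonants)
    # 1. A conjunct glyph is used if the font provides one.
    if key in conjunct_glyphs:
        return [("conjunct", key)]
    rendering = []
    # 2. Otherwise each dead consonant (all but the last) takes a half-form,
    # 3. or, failing that, its nominal form with a visible virama.
    for dead in consonants[:-1]:
        if dead in half_forms:
            rendering.append(("half", dead))
        else:
            rendering.append(("nominal+virama", dead))
    rendering.append(("nominal", consonants[-1]))
    return rendering


cluster = ["\u0915", "\u0937"]  # Devanagari KA, SSA
print(render_cluster(cluster, {"\u0915\u0937"}, set()))
print(render_cluster(cluster, set(), {"\u0915"}))
print(render_cluster(cluster, set(), set()))
```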
Consonant: A letter representing a speech sound in which the breath is at least partly
obstructed.
Diacritic: A mark added to a letter which distinguishes it from the same letter without a
mark, usually having a different phonetic value or stress.
Displaced Catenator: (see Catenator) Within the Brahmi script, the writing system is
linear and moves from left to right. However, in the case of some catenators this rule is
not observed and the catenator (wholly or partially) is placed to the left of the consonant
to which it relates. The short vowel I in Devanāgarī is an example of a displaced
catenator.
Display composing: The process of organizing the basic shapes available in a font in
order to display (or print) a word.
Display rendition: The process by which a string of characters is displayed (or printed).
In this process several consecutive characters may combine with each other on the screen,
and the order in which the characters are displayed may differ from the order in which they
are stored.
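The reordering that display rendition performs for displaced catenators can be sketched for Malayalam, where the signs E, EE and AI are drawn before the consonant and the signs O and OO are drawn partly before and partly after it. The function name `display_order` and the two tables are hypothetical; the sketch handles a single consonant with a single vowel sign and ignores clusters.

```python
# Vowel signs stored after the consonant but drawn to its left.
PRE_BASE = {"\u0D46", "\u0D47", "\u0D48"}  # signs E, EE, AI

# Two-part vowel signs: a left part and a right part around the consonant.
TWO_PART = {
    "\u0D4A": ("\u0D46", "\u0D3E"),  # sign O  = sign E  + sign AA
    "\u0D4B": ("\u0D47", "\u0D3E"),  # sign OO = sign EE + sign AA
}


def display_order(consonant, matra):
    """Return the pieces of consonant + matra in left-to-right display order."""
    if matra in TWO_PART:
        left, right = TWO_PART[matra]
        return [left, consonant, right]
    if matra in PRE_BASE:
        return [matra, consonant]
    return [consonant, matra]


KA = "\u0D15"
print(display_order(KA, "\u0D46"))  # the sign comes first
print(display_order(KA, "\u0D4A"))  # the consonant sits between the two parts
print(display_order(KA, "\u0D3F"))  # stored order is kept
```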
Font: A set of symbols used for display or printing of a script in a particular style.
Latin alphabet: The alphabet used for writing the language of ancient Rome. Also
known as the Roman alphabet. The alphabet is used today for writing English and
European languages.
Letter: A character representing one or more of the simple or compound sounds used in
speech. It can be any of the alphabetic symbols.
³ Unicode 6.0 Chapter 9.0 pp 6-7
Ligature: (see Conjunct)
Nasal consonant: A consonant pronounced with the breath passing through the nose.
Example: m and n in English.
Nasalized vowel: A vowel pronounced with the breath passing both through the nose and
the mouth. In Indian scripts this is denoted by a Chandrabindu, which gives the vowel/vowel
modifier over which it is placed a nasal value. Example: जाँच
Phonetic alphabet: An alphabet which has direct correspondence between letters and
sounds. Example: The International Phonetic Alphabet.
Pure consonant: A consonant which does not have any vowel implicitly associated with
it.
Roman script: The script based on the ancient Roman alphabet, with the letters A-Z and
additional diacritic marks. Used for writing a language which is not usually written in the
Roman alphabet.
Script: A distinctive and complete set of characters used for the written form of one or
more languages.
Script numerals: The 0 to 9 digits in a script, which have shapes distinct from their
international counterparts.
Vowel: A letter representing a speech sound made with the vibration of the vocal cords,
but without audible obstruction.
5. PHILOSOPHY AND UNDERLYING PRINCIPLES
1. The Grammar aims to depict the surface grammar of the written language: the
manner in which characters as well as conjuncts are depicted.
3. As a corollary to the above, the result is a script and its allo-scripts, i.e. a given
script shared by many languages is not uniformly deployed across all the languages
but is subject to variations and modulations.
5. The Grammar is limited to synchronic use, i.e. the manner in which a given
language as of today admits a character set within the script used to write it. It
is not diachronic or historical in nature and does not study the evolution of the
given script across centuries.
⁴ It is recommended that such variations be culled by placing the Grammars of different
scripts in public review.
6. SCRIPT GRAMMAR STRUCTURE
Part 6.1 deals with peripheral elements such as the ISO code of the language and the
writing system used: Abugida (Alphasyllabary) or Abjad.
Part 6.2 treats the syllabic structure. It verifies whether the character set of the
language complies with the ISCII syllabic structure and, if not, which cases are not
compliant.
Part 6.3 is the script grammar proper and describes the character set as well as the
conjunct shapes of the given script, along with the collation order.
6.1. PERIPHERAL ELEMENTS OF THE SCRIPT GRAMMAR
These constitute the elements that are peripheral to the Script Grammar. The main
parameters considered are the mnemonic and name of the language (needed for CLDR
and also for language tags), the writing system used to inscribe the language and,
wherever possible, a short history of the language.
6.1.1. Name of the language and its two- and three-letter mnemonics as per
ISO 639-1 and ISO 639-3
6.1.2. Identification of the writing system(s) used to inscribe the given language
All scripts derived from Brahmi are Abugidas, i.e. syllable-driven systems. The main
features of Abugidas are as under:
• The consonant has an implicit vowel built in, which is normally the schwa.
• The inherent vowel can be modified by the addition of other vowels or
muted by a diacritic termed a Virama or Halanta.
• Vowels can be handled as full vowels with a vocalic value.
When two or more consonants join together they form ligatures.
Abugidas/Alphasyllabaries, because of their syllabic structure, require a special
description, which is the subject of the discussion in 6.2 below.
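The three Abugida features listed above can be made concrete with a toy romanizer for a handful of Malayalam letters. The tables below are a tiny hypothetical subset chosen for illustration (they are not a transliteration standard), and "a" stands in for the inherent vowel.

```python
# Tiny illustrative tables; a real scheme would cover the whole character set.
CONSONANTS = {"\u0D15": "k", "\u0D24": "t", "\u0D2E": "m"}  # KA, TA, MA
MATRAS = {"\u0D3E": "aa", "\u0D3F": "i"}                    # signs AA, I
VOWELS = {"\u0D05": "a", "\u0D06": "aa"}                    # letters A, AA
VIRAMA = "\u0D4D"                                           # Chandrakkala
INHERENT = "a"                                              # inherent vowel


def romanize(text):
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        nxt = text[i + 1] if i + 1 < len(text) else ""
        if ch in CONSONANTS:
            if nxt == VIRAMA:        # inherent vowel muted
                out.append(CONSONANTS[ch])
                i += 2
            elif nxt in MATRAS:      # inherent vowel replaced by another vowel
                out.append(CONSONANTS[ch] + MATRAS[nxt])
                i += 2
            else:                    # bare consonant carries the inherent vowel
                out.append(CONSONANTS[ch] + INHERENT)
                i += 1
        elif ch in VOWELS:           # full (independent) vowel
            out.append(VOWELS[ch])
            i += 1
        else:                        # anything else is passed through unchanged
            out.append(ch)
            i += 1
    return "".join(out)


print(romanize("\u0D05\u0D2E\u0D4D\u0D2E"))  # A + MA + Chandrakkala + MA
```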
6.2. CONFORMITY TO THE SYLLABLE STRUCTURE
The Malayalam language complies with the syllable (akshar) structure described below.
A consonant cluster can contain up to 4 consonants.
Alphasyllabaries are determined by the notion of the syllable or Akshar. The
compositional grammar of the syllable determines its well-formedness, through a
series of formal constraints expressed in Backus-Naur Form (BNF) and given below.
The definition of the syllable (akshar), first given in the ISCII document (1991), relies
on the following character ‘sub-sets’. In what follows the syllable analysis will be
restricted to Malayalam.
(C) Consonants
[Consonant list only partly recoverable from the source; it includes ശ ഷ സ ഹ ള ഴ റ.]
(V) Vowels
അ ആ ഇ ഈ ഉ ഊ ഋ എ ഏ
ഐ ഒ ഓ ഔ
(D) Diacritics
(H) Chandrakkala (Halanta): ◌്
The Chandrakkala (Halanta) is used in most writing systems of the Indian subcontinent to
signify the lack of an inherent vowel.
Each of these sub-types has its restrictions in terms of what can precede or follow it
within a syllable (akshar), as shown in the table below (C = Consonant, V = Vowel,
M = Mātrā, D = Diacritic, H = Chandrakkala):

Sub-type   Can be preceded by   Can be followed by
C          H, or no sub-type    M, D or H
V          no sub-type          D only
M          C                    D
D          C, V or M            no sub-type; it closes the syllable (akshar)
H          C only               C only
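The precedence rules above translate directly into a small state table. The sketch below is one possible reading, with assumptions stated plainly: characters are classified by coarse Unicode ranges of the Malayalam block, Chillu characters are not handled, and because the rules say H is followed only by C, a string ending in H is rejected. The names `ALLOWED_NEXT`, `classify` and `is_valid_akshar` are hypothetical.

```python
START, END = "START", "END"

# What may follow each sub-type, transcribed from the table above.
ALLOWED_NEXT = {
    START: {"C", "V"},
    "C": {"M", "D", "H", END},
    "V": {"D", END},
    "M": {"D", END},
    "D": {END},
    "H": {"C"},  # assumption: a syllable cannot end in H
}


def classify(ch):
    """Map a Malayalam character to C, V, M, D or H (coarse Unicode ranges)."""
    cp = ord(ch)
    if 0x0D15 <= cp <= 0x0D39:
        return "C"
    if 0x0D05 <= cp <= 0x0D14:
        return "V"
    if 0x0D3E <= cp <= 0x0D4C or cp == 0x0D57:
        return "M"
    if cp in (0x0D02, 0x0D03):  # Anuswaram, Visargam
        return "D"
    if cp == 0x0D4D:            # Chandrakkala
        return "H"
    return None


def is_valid_akshar(text):
    state = START
    for ch in text:
        kind = classify(ch)
        if kind is None or kind not in ALLOWED_NEXT[state]:
            return False
        state = kind
    return END in ALLOWED_NEXT[state]


print(is_valid_akshar("\u0D15\u0D4D\u0D15"))  # C H C
print(is_valid_akshar("\u0D05\u0D3E"))        # V followed by M
```

Keeping the rules in a table rather than in nested conditions makes it easy to compare the code line by line with the prose rules.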
In the case of Malayalam, however, a set of consonants termed Chillu characters needs
special treatment in the BNF. As per the experts there are 5 chillu characters in
Malayalam. These are as under:
ൺ ൻ ർ ൽ ൾ
⁵ The nukta is a small dot placed under a character in certain scripts to show that it is
flapped, or to derive 5 other consonants loaned from Urdu: क़, ख़, ग़, ज़, फ़
The formalism defines the syllable (akshar) in terms of both what can constitute a
syllable (akshar) and what cannot. A valid syllable (akshar) as per this definition can be
of only two types:
The three other subsets, viz. Mātrās, Diacritics and Chandrakkala (Halanta), cannot
constitute a syllable (akshar) by themselves or in combination among themselves.
A total number of 16 theoretical syllables is therefore possible. It will be seen that the
written syllable (akshar) is not very different in structure from the phonetic syllable and
that the movement from the written to the spoken levels is made feasible by application
of certain rules.
This formal structure of the syllable (akshar) explained above is common to all Brahmi
based scripts (with a few variations). It will form the basis of an exhaustive description of
the characters as well as their ligatural representations.
This section lays down in detail the different parameters of the Script Grammar for
Malayalam. These are:
This section provides detailed information about the characters in the language and the
list of the same and also more importantly shows the manner in which the character is to
be written. Each subsection comprises therefore two parts: the basic character set and the
shape each character should have, as mandated by the experts who have designed the
script grammar of Malayalam.
6.3.1.1. The Consonant Set
The Consonant set of Malayalam comprises the basic consonant inventory, arranged as per
the Vargas, and a set of other consonants.
The exact shapes as desired by the experts are provided in the table below:

            -voiced      -voiced      +voiced      +voiced      Nasal
            -aspirated   +aspirated   -aspirated   +aspirated
Velar       ക            ഖ            ഗ            ഘ            ങ
Palatal     ച            ഛ            ജ            ഝ            ഞ
Retroflex   ട            ഠ            ഡ            ഢ            ണ
Dental      ത            ഥ            ദ            ധ            ന
Bilabial    പ            ഫ            ബ            ഭ            മ

Other consonants
[List only partly recoverable from the source; it includes ശ ഷ സ ഹ ള ഴ റ.]
⁶ These have been incorporated in Unicode 6.0 but are included here as per the experts’
recommendation so that the Script Grammar is truly representative of Malayalam.
Chillu Characters.
ൺ ൻ ർ ൽ ൾ
No.   Name                          Mātrā   Vowel   Combination
5.    Malayalam sign UU             ◌ൂ      ഊ       C + ഊ = ◌ൂ
6.    Malayalam sign vocalic R      ◌ൃ      ഋ       C + ഋ = ◌ൃ
7.    Malayalam sign E              ◌െ      എ       C + എ = ◌െ
8.    Malayalam sign EE             ◌േ      ഏ       C + ഏ = ◌േ
9.    Malayalam sign AI             ◌ൈ      ഐ       C + ഐ = ◌ൈ
10.   Malayalam sign O              ◌ൊ      ഒ       C + ഒ = ◌ൊ
11.   Malayalam sign OO             ◌ോ      ഓ       C + ഓ = ◌ോ
12.   Malayalam sign AU             ◌ൗ      ഔ       C + ഔ = ◌ൗ
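The correspondence in the table (consonant plus vowel gives consonant plus Mātrā) can be expressed as a lookup. The sketch below covers only rows 5 to 12 shown above; the names `VOWEL_TO_SIGN` and `compose` are hypothetical, and the AU sign is taken to be the single post-base form U+0D57, which is an assumption about the glyph shown in the source.

```python
# Independent vowel -> dependent vowel sign (Mātrā), rows 5 to 12 of the table.
VOWEL_TO_SIGN = {
    "\u0D0A": "\u0D42",  # UU
    "\u0D0B": "\u0D43",  # vocalic R
    "\u0D0E": "\u0D46",  # E
    "\u0D0F": "\u0D47",  # EE
    "\u0D10": "\u0D48",  # AI
    "\u0D12": "\u0D4A",  # O
    "\u0D13": "\u0D4B",  # OO
    "\u0D14": "\u0D57",  # AU (assumed to be the AU length mark form)
}


def compose(consonant, vowel):
    """Return consonant + Mātrā for the given independent vowel."""
    return consonant + VOWEL_TO_SIGN[vowel]


# KA combined with each vowel in the table.
for vowel in VOWEL_TO_SIGN:
    print(vowel, compose("\u0D15", vowel))
```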
6.3.1.5. Shape of the combination of ra “ര” or “റ”
In Malayalam the combination of Consonant + Chandrakkala + ര or റ is shown by adjoining
a ra sign to the consonant.
[Examples: two conjunct glyphs, each analysed as consonant + ◌് + ര and as consonant + ◌് + റ;
the glyphs are not recoverable from the source.]
The experts have recommended as under:
Although the ra sign can be an allograph of both ര and റ, depending upon
the context, a unique representation should be adopted in keyboard
driver and font implementations. For compatibility reasons, ര is
preferred at present. However, there should be a detailed discussion
of this matter and a final decision should be taken to avoid future
lexicography problems.
In addition to the above, the experts have also added comments on three further sets of
allographs, the third of which concerns ള.
[The allograph examples, each analysed as sign = ◌് + consonant, are not recoverable
from the source.]
6.3.1.6. Diacritics
◌ം Anuswaram
◌ഃ Visargam
Avagraha/Praslesham, which is rarely used, has been omitted.
6.3.1.7. Numerals
The experts have deemed that English numerals (Latino-Arabic set: 0,1,2,3,4,5,6,7,8,9)
alone are used.
The Malayalam numeral set given below is rarely used.
⁷ The character can exist only as a conjunct shape and hence has been inserted as an image.
൦ ൧ ൨ ൩ ൪ ൫ ൬ ൭ ൮ ൯
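Since the Latino-Arabic digits are the ones in general use, software may need to map the rarely used Malayalam digits to them. A minimal sketch follows; the names `MALAYALAM_DIGITS` and `to_ascii_digits` are hypothetical.

```python
# Malayalam digits zero to nine, U+0D66 to U+0D6F.
MALAYALAM_DIGITS = "\u0D66\u0D67\u0D68\u0D69\u0D6A\u0D6B\u0D6C\u0D6D\u0D6E\u0D6F"
TO_ASCII = str.maketrans(MALAYALAM_DIGITS, "0123456789")


def to_ascii_digits(text):
    """Replace Malayalam digits in `text` with their Latino-Arabic counterparts."""
    return text.translate(TO_ASCII)


print(to_ascii_digits("\u0D68\u0D66\u0D67\u0D67"))  # digits two, zero, one, one
```

Python's built-in `int()` also accepts Unicode decimal digits directly, so the mapping is needed only when the digits must be displayed or stored in Latino-Arabic form.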
A sample specimen of a calendar using Malayalam numerals is provided below by the
experts:
6.3.1.9 Other Symbols
These are religious symbols, currency markers etc. included in Unicode:
൹ - Malayalam Date Mark
൰ - Malayalam Number Ten
൱ - Malayalam Number One Hundred
൲ - Malayalam Number One Thousand
൳ - Malayalam Fraction One Quarter
൴ - Malayalam Fraction One Half
൵ - Malayalam Fraction Three Quarters
₹ - Rupee Sign as mandated by the Government of India
Due to constraints of space and for clarity, a series of 3 tables is provided for each
class.
[The source lists here the consonants covered by Tables 1 to 3; of these only the last
seven of Table 3, ശ ഷ സ ഹ ള ഴ റ, are recoverable.]
Wherever there is an X, the combination does not exist in the language. Since such a
combination still needs to be accommodated in the font table, the font developer should
provide a simple linear combination for it.
e.g. Although a particular Consonant + Chandrakkala + Consonant combination may be
theoretically impossible, it needs to be handled at the font level in anticipation that a
user could type it. The font would then show the first consonant with a visible
Chandrakkala (◌്), followed by the second consonant.
6.3.2.1 Consonant and Mātrā combinations.
This set refers to a simple concatenation of Consonant and Mātrā.
[Table: Consonant and Mātrā combinations, Set 1. Rows: the Mātrās ◌ാ ◌ി ◌ീ ◌ു ◌ൂ ◌ൃ ◌െ ◌േ ◌ൈ ◌ൊ ◌ോ ◌ൗ;
columns: 10 consonants. The glyph cells are not recoverable from the source.]
This set is in continuation of set 1 which shows consonant and Mātrā combinations.
[Table: Consonant and Mātrā combinations, Set 2. Rows: the same 12 Mātrās; columns:
10 consonants. The glyph cells are not recoverable from the source.]
Consonant and Mātrā combinations Set 3
This set is in continuation of set 2 which shows consonant and Mātrā combinations.
[Table: Consonant and Mātrā combinations, Set 3. Rows: the same 12 Mātrās; columns:
7 consonants. The glyph cells are not recoverable from the source.]
[Table: Consonant and Mātrā combinations, Set 4. Rows: the same 12 Mātrās; columns:
7 consonants, the last four being ശ ഷ സ ഹ. The remaining glyph cells are not recoverable
from the source.]
Consonant and Mātrā combinations Set 5
This set is in continuation of set 4 which shows consonant and Mātrā combinations with
the retroflex consonants ള and ഴ.
[Table: Consonant and Mātrā combinations, Set 5. Rows: the same 12 Mātrās; columns:
ള and ഴ. Each cell is the simple concatenation of the consonant and the Mātrā.]
[Table: Consonant and Mātrā + Anuswaram combinations, Set 1. Rows: the 12 Mātrās, each
followed by ◌ം; columns: 10 consonants. The glyph cells are not recoverable from the
source.]
Consonant and Mātrā + Anuswaram combinations - Set 2
This set is in continuation of set 1 above which shows combinations of Consonant and
Mātrā + Anuswaram.
[Table: Consonant and Mātrā + Anuswaram combinations, Set 2. Rows: the 12 Mātrās, each
followed by ◌ം; columns: 10 consonants. The glyph cells are not recoverable from the
source.]
This set is in continuation of set 2 above which shows combinations of Consonant and
Mātrā + Anuswaram.
[Table: Consonant and Mātrā + Anuswaram combinations, Set 3. Columns: 7 consonants; only
the first five rows survive in the source and their glyph cells are not recoverable.]
Consonant and Mātrā + Anuswaram combinations - Set 4
This set is in continuation of set 3 above which shows combinations of Consonant and
Mātrā + Anuswaram.
[Table: Consonant and Mātrā + Anuswaram combinations, Set 4. Columns: 7 consonants, the
last four being ശ ഷ സ ഹ; only the first five rows survive in the source and the remaining
glyph cells are not recoverable.]
[Table: Consonant and Mātrā + Anuswaram combinations for ള and ഴ. Rows: the 12 Mātrās,
each followed by ◌ം. Each cell is the simple concatenation of the consonant, the Mātrā
and the Anuswaram.]
6.3.3. The Ligature Set of Malayalam.
Malayalam has a large set of ligatural forms. These are combinations of
Consonant + Chandrakkala (Halanta) + Consonant (CHC), or CHCHC, or the even rarer
CHCHCHC. The CHC combinations, which are the most frequent, are arranged in the
shape of a matrix: the abscissa or horizontal axis shows the second consonant of the
ligature, and the ordinate or vertical axis shows the first consonant, which is
followed by a Chandrakkala (Halanta).
As in 6.3.2, the ligature sets are divided into the following:
6.3.3.1 CHC (in a matrix)
6.3.3.2 CHCHC
6.3.3.3 CHCHCHC
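The three ligature classes can be recognised mechanically by reducing a character sequence to its C/H pattern. The sketch below assumes consonants lie in the range U+0D15 to U+0D39 and that H is the Chandrakkala U+0D4D; the names `cluster_pattern` and `is_ligature_candidate` are hypothetical. Whether a given pattern actually has a ligature shape is decided by the matrices of the script grammar, not by this check.

```python
VIRAMA = "\u0D4D"  # Chandrakkala (Halanta)
LIGATURE_PATTERNS = {"CHC", "CHCHC", "CHCHCHC"}


def cluster_pattern(text):
    """Reduce a string of consonants and Chandrakkalas to a C/H pattern."""
    pattern = []
    for ch in text:
        if ch == VIRAMA:
            pattern.append("H")
        elif 0x0D15 <= ord(ch) <= 0x0D39:
            pattern.append("C")
        else:
            raise ValueError("not a consonant or Chandrakkala: U+%04X" % ord(ch))
    return "".join(pattern)


def is_ligature_candidate(text):
    """True if the sequence has one of the three cluster shapes named above."""
    return cluster_pattern(text) in LIGATURE_PATTERNS


print(cluster_pattern("\u0D15\u0D4D\u0D15"))              # KA + H + KA
print(cluster_pattern("\u0D38\u0D4D\u0D24\u0D4D\u0D30"))  # SA + H + TA + H + RA
```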
[Table: CHC (combination of two consonants), Set 1. Rows: first consonant, followed by
Chandrakkala; columns: 10 second consonants. The glyph cells are not recoverable from the
source.]
The following set shows combinations of two consonants. To see how a particular
combination is formed, select the first consonant from the first column and the second
from the first row. [The worked example given at this point is not recoverable from the
source.]
[Table: CHC (combination of two consonants), Set 2. Rows: first consonant, followed by
Chandrakkala; columns: 10 second consonants. The glyph cells are not recoverable from the
source.]
The following set shows combinations of two consonants. To see how a particular
combination is formed, select the first consonant from the first column and the second
from the first row. [The worked example given at this point is not recoverable from the
source.]
[Table: CHC (combination of two consonants), Set 3. Rows: first consonant, followed by
Chandrakkala; columns: 10 second consonants. The glyph cells are not recoverable from the
source.]
The following set shows combinations of two consonants. To see how a particular
combination is formed, select the first consonant from the first column and the second
from the first row. [The worked example given at this point is not recoverable from the
source.]
CHC (combination of two consonants) - Set 4
[Table: CHC (combination of two consonants), Set 4. Rows: first consonant, followed by
Chandrakkala; columns: ശ ഷ സ ഹ ള ഴ. The glyph cells are not recoverable from the source.]
In what follows the Chillu combinations with Consonants are presented:
Chillu + Consonant Clusters Set 1
[Table: rows ൻ, ൽ, ർ, ൺ, ൾ; columns: 10 consonants. The glyph cells are not recoverable
from the source.]
[Further Chillu + Consonant tables follow in the source; their headings and glyph cells
are not recoverable.]
6.3.3.2 CHCHC (combination of three consonants)
These are not as frequent as the CHC combinations. Only the major ones are listed below.
[Examples, each of the form Consonant + ◌് + Consonant + ◌് + Consonant = ligature; the
glyphs are not recoverable from the source.]
6.3.4 The Collation Order of Malayalam.
The collation order refers to the order in which the characters in a given language are
sorted. The sorting order was approved by the Government of Kerala in
G.O.(MS) No: 24/2001/ITD dated 22.11.01, published in the Extraordinary Gazette dated
18.12.2001. See chapter 5 of the Report of the committee on Malayalam character Encoding
and Keyboard layout standardization, reproduced below:
അ ആ ഇ ഈ ◌് (സംവൃത ഉകാരം)
ഉ ഊ ഋ എ ഏ
ഐ ഒ ഓ ഔ
… (ൺ …)
… (ൻ …)
… (◌ം …)
… (ർ റ) (ൽ …) …
ശ ഷ സ (◌ഃ ഹ) (ൾ ള) ഴ
[… marks glyphs that are not recoverable from the source.]
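A custom sort order of this kind is usually implemented by assigning each character a weight and sorting on the weights rather than on code points. The sketch below uses only the final row of the order above and reads each parenthesised pair as characters sorted together, the first before the second; that reading, the per-character comparison, and the names `make_sort_key` and `GROUPS` are assumptions made for illustration, not the committee's specification.

```python
# Final row of the order above; each inner list is one parenthesised group.
GROUPS = [
    ["\u0D36"],            # SHA
    ["\u0D37"],            # SSA
    ["\u0D38"],            # SA
    ["\u0D03", "\u0D39"],  # Visargam, HA
    ["\u0D7E", "\u0D33"],  # chillu LL, LLA
    ["\u0D34"],            # LLLA
]


def make_sort_key(groups):
    """Build a key function from an ordered list of character groups."""
    weights = {}
    for primary, group in enumerate(groups):
        for secondary, ch in enumerate(group):
            weights[ch] = (primary, secondary)
    unknown = len(groups)  # unlisted characters sort after all listed ones

    def key(word):
        return [weights.get(ch, (unknown, ord(ch))) for ch in word]

    return key


key = make_sort_key(GROUPS)
letters = ["\u0D33", "\u0D34", "\u0D7E", "\u0D36"]  # LLA, LLLA, chillu LL, SHA
print(sorted(letters))           # plain code point order
print(sorted(letters, key=key))  # order given by the weight table
```

Plain code point order would place the chillu after all the consonants, which is why a weight table is needed at all.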
7. REFERENCES
1. http://www.unicode.org
2. ISCII’91
8. ANNEXURES
Annexure 1: Names of experts who have contributed to the script grammar
Annexure 2: Report of the committee on Malayalam character Encoding
and Keyboard layout standardization
Annexure 3: Unicode Table of MALAYALAM
¹ The Unicode chart provided is for version 5.1 since the Script Grammar was prepared at
that time. No considerable change affecting the script grammar can be seen in the updated
versions of Unicode, with the possible exception of the addition of the Rupee Sign U+20B9.