ANSCSE 14 Mae Fah Luang University, Chiang Rai, Thailand

March 23-26, 2010



ACKNOWLEDGMENTS

National Electronics and Computer Technology Center
(NECTEC)

Computational Science and Engineering Association
(CSEA)

Mae Fah Luang University
(MFU)

Program Summary
ANSCSE14 Mae Fah Luang University, Chiang Rai, Thailand
March 23-26, 2010


Tuesday, March 23, 2010

Building S1

09.00 - 10.00

Registration for Workshops

10.00 - 12.00

Room 101: Workshop on Computational Material Science: A Hands-on Tutorial
Room 102: Workshop on Bioinformatics

12.00 - 13.00 LUNCH

13.00 - 14.45

Room 101: Workshop on Computational Material Science: A Hands-on Tutorial
Room 102: Workshop on Bioinformatics

14.45 - 15.00 BREAK

15.00 - 16.30

Room 101: Workshop on Computational Material Science: A Hands-on Tutorial
Room 102: Workshop on Bioinformatics





Wednesday, March 24, 2010

Building C3, Room 107

08.00 - 09.00


Registration


09.00 - 09.20

Opening Ceremony

9.05 - Welcome address and Report by Dean of School of Science, Mae Fah Luang University
9.15 - Opening Speech by the President of Mae Fah Luang University

09.20 - 10.00


Keynote Lecture

Chairman: Dr. Sornthep Vannarat

Dr. Mark Harris (NVIDIA)

Transforming Computational Science with CUDA

10.00 - 10.20 BREAK

10.20 - 12.00

Oral Presentation (6 parallel sessions)
Building C2

Room 208 - Computational Chemistry (I): INV-3, INV-4, A00003, A00024
Room 209 - Computational Chemistry (II): INV-1, INV-5, B00003, B00004
Room 210 - Computational Fluid Dynamics and Solid Mechanics (I): E00004, E00005, E00010, E00006, E00013
Room 213 - Computational Physics (I): D00018, D00041, D00042, D00043, D00003
Room 214 - Computational Biology and Bioinformatics (I): A00033, A00025, A00019, A00017, A00027
Room 215 - Computer Science and Engineering (I): G00134, G00110, G00137, G00138, G00156

12.00 - 13.00 LUNCH

13.00 - 15.00

Oral Presentation (6 parallel sessions)

Room 208 - HPC (I): F00006, F00001, F00014, F00019
Room 209 - Computational Chemistry (III): B00007, B00008, B00011, B00012, B00014, B00016
Room 210 - Computational Fluid Dynamics and Solid Mechanics (II): E00022, E00023, C00005, E00027, E00007
Room 213 - Computational Physics (II): D00015, C00012, D00011, D00005, D00022, D00062
Room 214 - Computational Biology and Bioinformatics (II): A00036, A00028, A00015, A00039, A00007, A00014
Room 215 - Computer Science and Engineering (II): G00008, G00034, G00082, G00119, G00131, G00152

15.00 - 15.20 BREAK

15.20 - 16.40

Oral Presentation (6 parallel sessions)

Room 208 - HPC (II): F00002, F00007, F00012, F00009
Room 209 - Computational Chemistry (IV): B00017, B00019, B00020, B00021
Room 210 - Computational Mathematics (I): C00015, C00002, C00004, C00014
Room 213 - Computational Physics (III): D00010, D00012, D00016, D00061
Room 214 - Computer Science and Engineering (III): G00026, G00160, G00006, G00135
Room 215 - Computer Science and Engineering (IV): G00089, G00093, G00115, G00039

16.40 - 17.00 Group Photo (LARN DAO, Mae Fah Luang University)

18.00 - 20.00 Welcome Party (Wiang Inn Hotel)


Thursday, March 25, 2010

Building C3, Room 107

09.00 - 09.40


Keynote Lecture

Chairman: Assoc. Prof. Dr. Supa Hannongbua

Prof. Kohji Tashiro (Department of Future Industry-oriented Basic Science and Materials, Graduate
School of Engineering, Toyota Technological Institute)

Harmonic Combination of Computer Simulation and Experimental Technique for the Study of
Structure-Property Relationship of Crystalline Polymers

09.40 - 10.00 BREAK

10.00 - 12.00

Oral Presentation (6 parallel sessions)
Building C2

Room 208 - Computational Chemistry (V): INV-6, INV-8, B00006, B00042, B00050
Room 209 - Computational Chemistry (VI): INV-7, INV-2, B00022, B00027, B00030
Room 210 - Computational Mathematics (II): C00011, C00013, C00020, C00009, G00072
Room 213 - Computational Physics (IV): D00009, D00017, D00047, D00056, D00048
Room 214 - Computer Science and Engineering (V): G00002, G00080, G00151, G00081, G00090, G00038
Room 215 - Computer Science and Engineering (VI): G00141, G00123, G00041, G00120, G00070, G00013

12.00 - 13.00 LUNCH

13.00 - 15.00

Oral Presentation (6 parallel sessions)

Room 208 - HPC (III): F00016, F00013, F00015, F00010, F00011
Room 209 - Computational Chemistry (VII): B00033, B00034, B00035, B00037, B00038, B00040
Room 210 - Computational Fluid Dynamics and Solid Mechanics (III): E00021, E00008, E00003, E00001, E00018
Room 213 - Computational Physics (V): D00007, D00031, D00037, D00008, D00040
Room 214 - Computer Science and Engineering (VII): G00164, G00059, G00075, G00091, G00068, G00128
Room 215 - Computer Science and Engineering (VIII): G00003, G00025, G00027, G00094, G00140, G00133

15.00 - 15.20 BREAK


15.20 - 16.40

Oral Presentation (6 parallel sessions)

Room 208 - Computational Chemistry (VIII): A00005, A00010, A00012, A00020, A00022
Room 209 - Computational Chemistry (IX): B00041, B00045, B00046, B00051, B00053, B00054, B00055
Room 210 - Computational Biology and Bioinformatics (III): A00013, A00021, A00016, A00009, A00008
Room 213 - Computational Physics (VI): D00020, D00028, D00044, D00046
Room 214 - Computer Science and Engineering (IX): G00143, G00114, G00139, G00132, G00129, G00066
Room 215 - Computer Science and Engineering (X): G00063, G00122, G00095, G00116, G00062


Friday, March 26, 2010
Site Excursion
8.00 - 16.30 Visit Doi Tung and Opium Hall

Oral Presentation Schedule
ANSCSE14 Mae Fah Luang University, Chiang Rai, Thailand
March 23-26, 2010

Building C2 Room 208


Wednesday, 24 March 2010

Computational Chemistry (I)
Chairman: Anan Tongraar
10.20-10.50 INV-3
Chalermpol Kanchanawarin, Molecular dynamics study of a mosquito
larvicidal protein Cry4Aa toxin from Bacillus thuringiensis in its trimeric
form
10.50-11.20 INV-4
Piyarat Nimmanpipug, Molecular Modeling of Low Energy Ion
Bombardment/Plasma Treatment in Polymer Design
11.20-11.40 A00003
Kan Sornbundit, Monte Carlo Simulation of Two-component Bilayers with
Interlayer Coupling
11.40-12.00 A00024
Prontipa Nokthai, Molecular Modeling of Peroxidase and Polyphenol
Oxidase: Substrate Specificity and Active Site Comparison
12:00-13:00 Lunch

High Performance Computing and Grid Computing (I)
Chairman: Sangsuree Vasupongayya
13.00-13.20 F00006
Sittikorn Thawornrattanawanit, Parallel Program Development for
Tsunami Simulation with the Message Passing Interface
13.20-13.40 F00001 Wongnaret Khantuwan, Multi-GPUs Voxelization of 3D Data
13.40-14.00 F00014
Jedsada Phengsuwan, Performance Evaluation of Cache Replacement
Policies for High-Energy Physic Data Grid
14.00-14.20 F00019
Klaokanlaya Silachan, Automatic Predictive URL-Categories
Classification to UM Model using Decision Tree Model
15:00-15:20 Break

High Performance Computing and Grid Computing (II)
Chairman: Chantana Phongpensri(Chantrapornchai)
15.20-15.40 F00002
Anucha Ruangphanit, Simulation Study of Channel Engineering Design
for Sub Micrometer Buried Channel PMOS Devices
15.40-16.00 F00007
Weera Pengchan, Optimization of Geometry of LOCOS Isolation in Sub
micrometer CMOS by TCAD Tools
16.00-16.20 F00012
Banpot Dolwithayakul, Solving Magnetic Sounding Integral Equations
from Multilayer Earth Using Message Passing Interface
16.20-16.40 F00009
Kanon Sujaree, Solving Nanocomputing Problem via Music Inspired
Harmony Search Algorithm


Building C2 Room 208


Thursday, 25 March 2010

Computational Chemistry (V)
Chairman: Vannajan Sanghiran Lee
10.00-10.30 INV-6
Nadtanet Nunthaboot, Effects of Residues Changes on Human Receptor
Binding Affinity of H1N1 Hemagglutinins: Insights from Molecular
Dynamics Simulation
10.30-11.00 INV-8
Thanyada Rungrotmongkol, Concerns, recent outbreak and molecular
insight into H5N1 and pandemic H1N1-2009 influenza A viruses
11.00-11.20 B00006
Nur Kusaira Khairul Ikram, Structure Based Drug Design for Swine Flu
Chemotherapeutics: New Neuraminidase Inhibitors from Plants Natural
Compounds
11.20-11.40 B00042
Kanin Wichapong, Virtual Screening and Binding Free Energy Calculation
for Inhibitors of Dengue Virus NS2B/NS3 Protease
11.40-12.00 B00050
Panita Decha, Computational studies of HIV-1 Reverse Transcriptase
Inhibitors: as a Molecular Basis for Drug Development
12:00-13:00 Lunch

High Performance Computing and Grid Computing (III)
Chairman: Putchong Uthayopas
13.00-13.20 F00016 Sangsuree Vasupongayya, Impact of Workloads on Fair Share Policies
13.20-13.40 F00013
Sugree Phatanapherom, Parameters Self-Tuning Technique for Large
Scale Scheduler
13.40-14.00 F00015
Anupong Banjongkan, Modeling and Simulation of Large-scale
Virtualization based on the CloudSim Toolkit
14.00-14.20 F00010
Nopparat Nopkuat, Effective Workload Management Strategies for a
Cloud of Virtual Machine
14.20-14.40 F00011
Ekasit Kijsipongse, Two-Level Scheduling Technique for Mixed Best-
Effort and QoS Job Arrays on Cluster Systems
15:00-15:20 Break

Computational Chemistry (VIII)
Chairman: Chalermpol Kanchanawarin
15.20-15.40 A00005
Yie Vern Lee, Virtual screening for inhibitors on isocitrate lyase of
Mycobacterium Tuberculosis with NADI database
15.40-16.00 A00010
Yee Siew Choong, Isoniazid Resistance in Mycobacterium
Tuberculosis InhA Mutants
16.00-16.20 A00012
Sy Bing Choi, Membrane Protein Simulation: A Case Study on Selected
Hypothetical protein from Klebsiella pneumoniae MGH78578
16.20-16.40 A00020
Kunan Bangphoomi, Finding new lead compound for anti-cancer drug by
using in silico screening technique
16.40-17.00 A00022
Jiraporn Yongpisanphop, De novo Design of HIV-1 Reverse
Transcriptase Inhibitor against K103N/Y181C Mutant: Bioinformatics
Approach


Building C2 Room 209


Wednesday, 24 March 2010

Computational Chemistry (II)
Chairman: Vudhichai Parasuk
10.20-10.50 INV-1
Hajime Hirao, QM/MM Computational Studies of Metalloenzymes:
Characterization of Elusive Intermediates and Elucidation of Reaction
Mechanisms
10.50-11.20 INV-5
Vannajan Sanghiran Lee, Molecular Dynamics Simulations of Antibody
ScFv Fragments without Disulfide Bond

11.20-11.40 B00003
Tuanjai Somboon, Hybrid Quantum Mechanical/Molecular Mechanical
studies on Two Families of cis,cis-Muconate Lactonizing Enzymes

11.40-12.00 B00004
Tammarat Piansawan, Kinetics of the Hydrogen Abstraction Cl + Alkane
→ HCl + Alkyl Reaction Class: An Application of the Reaction Class
Transition State Theory
12:00-13:00 Lunch

Computational Chemistry (III)
Chairman: Waraporn Parasuk
13.00-13.20 B00007
Apirak Payaka, QM/MM dynamics of HCOO--water hydrogen bonds in
aqueous solution
13.20-13.40 B00008
Rathawat Daengngern, Quantum Mechanics Simulation on Structure of
7-Azaindole(Methanol)2 Cluster and Excited-State Triple-Proton Transfer
Reactions in the Gas Phase
13.40-14.00 B00011
Ang Lee Sin, Computational Studies On The Structural Conformations of
N-Benzoyl-N-p-Substituted Phenylthiourea Derivatives
14.00-14.20 B00012
Supaporn Dokmaisrijan, Crystal structures and DFT studies on
[TpPh2Ni(S2CNR2)] (R = Et, Bz) and [TpPh2Ni(S2Cpyr)]
14.20-14.40 B00014
Atichat Wongkoblap, Computer Study for Characterization of Porous Solid
using Accessible Pore Volume Concept
14.40-15.00 B00016
Chutintorn Punwong, Direct QM/MM simulations of excited state
dynamics of Rhodopsin chromophore in different environments
15:00-15:20 Break

Computational Chemistry (IV)
Chairman: Piyarat Nimmanpipug
15.20-15.40 B00017
Muchtaridi, Virtual Screening on Neuraminidase Inhibitors activity of
plant- derived natural products by using Pharmacophore Modelling and
Docking
15.40-16.00 B00019
Arthit Vongachariya, Electronic and Mechanical Properties on B-N Doped
Single-wall Carbon Nanotubes
16.00-16.20 B00020
Nopporn Kaiyawet, Mutation of Hemagglutinin H5 can change recognition
to human sialic acid-2,6-galactose using in silico technique
16.20-16.40 B00021
Arthitaya Meeprasert, A Comparative Study of Structural and Binding
Affinity of Pyrrolidinyl PNA and DNA Using MD Simulations


Building C2 Room 209


Thursday, 25 March 2010

Computational Chemistry (VI)
Chairman: Supa Hannongbua
10.00-10.30 INV-7
Wolfgang Sippl, Computer-based methods in drug design - how useful are
they?
10.30-11.00 INV-2
Yuthana Tantirungrotechai, A Molecular Dynamics Study of Carbazole
Derivatives as Universal Base
11.00-11.20 B00022
Thanisorn Yakhantip, Theoretical Study of Organic Molecules Use in
Dye-Sensitized Solar Cell (DSSC) Based on Time Dependent-Density
Functional Theory (TD-DFT)
11.20-11.40 B00027
Krit Prasittichok, Classification of Thai Fragrant Rice (Oryza sativa) Using
Gas Chromatographic Profiles in Conjunction with Statistical Methods
11.40-12.00 B00030
Thantip Krasienapibal, The effect of electron-donating groups on the
conducting property of polythiophene derivatives using PBC calculation
12:00-13:00 Lunch

Computational Chemistry (VII)
Chairman: Yuthana Tantirungrotechai
13.00-13.20 B00033
Jitrayut Jitonnom, QM/MM Study On The Catalytic Mechanism
of Family 18 Chitinase
13.20-13.40 B00034
Anurak Udomvech, Theoretical Study of Li and Li+ intercalated in
Double-Walled Carbon Nanotubes
13.40-14.00 B00035
Waleepan Sangprasert, Molecular Calculation of Plasma Treatment
Efficiency on PMMA and FRC as Denture Materials
14.00-14.20 B00037
Kanjarat Sukrat, To the best estimation of reaction barriers for proton
exchange reactions of C1-C4 alkanes in ZSM-5 zeolite
14.20-14.40 B00038
Mohd Razip Asaruddin, Neuraminidase Inhibitor Identification by
Pharmacophore Modelling and Docking from NADI-VA compound
14.40-15.00 B00040
Janchai Yana, MD simulation of Nafion surface modification
by Ar+ bombardment
15:00-15:20 Break

Computational Chemistry (IX)
Chairman: Nawee Kungwan
15.20-15.40 B00041
Somphob Thompho, Influence of the silanol groups on the external surface
of silicalite-1 on the adsorption dynamics of methane
15.40-16.00 B00045
Purinchaya Sornmee, Loading of Doxorubicin on Single-Walled Carbon
Nanotube by MD Simulations
16.00-16.20 B00046
Uthumpon Arsawang, Molecular Dynamics Simulations of GEMZAR
encapsulated in carbon nanotube
16.20-16.40 B00051
Sufian M. Nawi, Docking of Dengue Virus Methyltransferase Inhibitor
from Nadi Database (In House Malaysian Medicinal Plant Database)
16.40-17.00 B00053
Auradee Punkvang, Investigating the Binding of Arylamide Derivatives as
Tuberculosis Agent in InhA using
Molecular Dynamics Simulations
17.00-17.20 B00054
Mayuree Phonyiem, Proton transfer reactions and dynamics at
sulfonic acid groups of Nafion
17.20-17.40 B00055 Charoensak Lao-ngam, Proton Conduction at Sulfonate Group of Nafion


Building C2 Room 210


Wednesday, 24 March 2010

Computational Fluid Dynamics and Solid Mechanics (I)
Chairman: Ekachai Juntasaro
10.20-10.40 E00004
Anat Srimungkala, Multiphysics Analysis of Gas Turbine Blade Cooling
using Computational Fluid Dynamics (CFD)
10.40-11.00 E00005
Jenwit Soparat, Computational Study of Totally Enclosed Fan Cooled
System in an Electric Induction Motor
11.00-11.20 E00010
Chaiwut Gamonpilas, Characterisation of Non-linear Viscoelastic
Properties via Indentation Techniques
11.20-11.40 E00006
Theeradech Mookum, Numerical Simulation of Two-Phase Flows and
Heat Transfer in Continuous Steel Casting Process
11.40-12.00 E00013
Perakit Viriyarattanasak, Semi-Solid Die Casting Mold Development
Utilizing CAE Technique
12:00-13:00 Lunch

Computational Fluid Dynamics and Solid Mechanics (II)
Chairman: Sirod Sirisup
13.00-13.20 E00022
Vejapong Juttijudata, Kinematics and Dynamics of Coherent Structures
within a Turbulent Spot in Plane Channel Flow
13.20-13.40 E00023
Kiattisak Ngiamsoongnirn, Towards an Extension of the SST-k-ω Model
for Transitional Flow
13.40-14.00 C00005 Wariam Chuayjan, Pressure Distribution along the Silo Wall
14.00-14.20 E00027
Bupavech Phansri, Inelastic Transient Dynamic Analysis by BEM Using
Domain Decomposition
14.20-14.40 E00007
Wattana Kanbua, Forecasting Tropical Cyclone Movement by Neural
Network
15:00-15:20 Break

Computational Mathematics (I)
Chairman: John Chiverton
15.20-15.40 C00015
Sergey Meleshko, On linearization of stochastic ordinary differential
equations
15.40-16.00 C00002
Nopparat Pochai, A Numerical Computation for Water Quality Model in a
Non-Uniform Flow Stream Using Maccormack Scheme
16.00-16.20 C00004
Sirod Sirisup, Tidal Analysis with Error Estimates: Local and Repositories
Variations
16.20-16.40 C00014
Songkran Siridejachai, Filling Incomplete Wind Speed Data by Using
Kriging Interpolation


Building C2 Room 210


Thursday, 25 March 2010

Computational Mathematics (II)
Chairman: Sornthep Vannarat
10.00-10.20 C00011
Mohd Rivaie, A Comparative Study of Conjugate Gradient Method for
Unconstrained Optimization
10.20-10.40 C00013
Pongwit Promsuwan, A Matrix Partitioning Technique for Distributed
Solving Large Linear Dense Equations
10.40-11.00 C00020
Raywat Tanadkithirun, A Study on Numerical Methods for Mean-
Reverting Square Root Processes with Jumps
11.00-11.20 C00009
John Chiverton, Comparison of Reversible Feature Extraction Techniques
Applied to Anatomical Shape Modelling
11.20-11.40 G00072 Sutthinun Naknoi, Filter Rules and Thai big capital stocks trading
12:00-13:00 Lunch

Computational Fluid Dynamics and Solid Mechanics (III)
Chairman: Vejapong Juttijudata
13.00-13.20 E00021
Somporn Chuai-aree, VirtualFlood3D: Software for Simulation and
Visualization of Water Flooding
13.20-13.40 E00008 Wattana Kanbua, Analysis of Coastal Erosion by Using Wave Spectrum
13.40-14.00 E00003
Sirod Sirisup, Coastal Simulation of the Gulf of Thailand: Effects of tidal
forcing
14.00-14.20 E00001
Panat Guayjarernpanishk, Linear and weakly nonlinear solutions of
subcritical free-surface flow over submerged obstacles
14.20-14.40 E00018
Pairin Suwannasri, Numerical Simulation of the Fluid Flow Past a
Rotating Torus
15:00-15:20 Break

Computational Biology (III)
Chairman: Piyarat Nimmanpipug
15.20-15.40 A00013
Suwat Jutapruet, Mark-Recapture Model Testing for Indo-Pacific
Humpback Dolphin Population at Khanom Sea, Nakhon Si Thammarat
15.40-16.00 A00021
Uthai Kuhapong, Cross Association of Sea Surface Temperature of 13
Sites in Thailand
16.00-16.20 A00016
Sirilak Chumkiew, Sea Surface Temperature Declines at Coral Sites Using
Field Sensors and NOAA Data
16.20-16.40 A00009
Premrudee Noonsang, Developing Business Intelligent Tools for NBIDS
Coral Database System
16.40-17.00 A00008
Wittaya Pheera, Cluster Analysis of Temperature-Relative Humidity Data
at Mt. NOM Cloud Forest


Building C2 Room 213


Wednesday, 24 March 2010

Computational Physics (I)
Chairman: Anucha Yangthaisong
10.20-10.40 D00018
Thanapol Chanapote, Electronic Structures and Thermoelectric Properties
of SrTiO3
10.40-11.00 D00041
Winya Dungkaew, Phase Characterization and Saturation Modeling of the
Calcium Phosphate-Arsenate Apatite System
11.00-11.20 D00042
Oratai Saisa-ard, Phase Characterization and Saturation Modeling of the
Calcium-Lead Phosphate Apatite System
11.20-11.40 D00043
Samroeng Krachodnok, Order-Disorder Structure in a New Zinc
Oxovanadate, [Zn(Im)4][V2O6]
11.40-12.00 D00003
Busara Pattanasiri, Vacancy-mediated dynamics with quenched disorder in
binary alloy: Monte Carlo simulations and dynamic scaling
12:00-13:00 Lunch

Computational Physics (II)
Chairman: Anant Eungwanichayapant
13.00-13.20 D00015
Achara Seripienlert, Collimation of Particle Beams by Two-Dimensional
Turbulent Structure
13.20-13.40 C00012 Thaipanya Chanpoom, Two-Dimensional Bisoliton Model in Cuprates
13.40-14.00 D00011
Charong Buachan, Effect of Magnetic Turbulence Structure on the Parallel
Transport of High Energy Particles
14.00-14.20 D00005
Noparit Jinuntuya, Numerical Investigations of the Distributions of
Elementary Excitations of the Bimodal Ising Spin Glass
14.20-14.40 D00022
Kathawut Kulsirirat, The Critical Temperature of Transition Energy of
Single Quantum Well
14.40-15.00 D00062
Suwat Pabchanda, First principle study on the optical band-edge
absorption of Fe-doped SnO2
15:00-15:20 Break

Computational Physics (III)
Chairman: Piyanate Chuychai
15.20-15.40 D00010
Watcharawuth Krittinatham, Diffusion of Galactic Cosmic Rays in an
Interplanetary Magnetic Flux Rope
15.40-16.00 D00012
Nattapong Kamyan, Secondary Neutrons from Cosmic Rays in Earth's
Atmosphere above the Princess Sirindhorn Neutron Monitor
16.00-16.20 D00016
Peerasak Sangarun, Computational Classification of Cloud Forest Using
Atmospheric Data from Field Sensors
16.20-16.40 D00061
Alejandro Saiz, On the Estimation of Solar Particle Fluence at Jupiter's
Orbit


Building C2 Room 213


Thursday, 25 March 2010

Computational Physics (IV)
Chairman: Kenneth Haller
10.00-10.20 D00009
Ang Lee Sin, First Principle Investigations of Electronic Structures and
Hyperfine Interactions of Muonium in Tetraphenylmethane
10.20-10.40 D00017
Piyawong Poopanya, Band structures and thermoelectric properties of
CuAlO2 from first-principles calculations
10.40-11.00 D00047
Ahchareeya Srisaikum, Electronic structures of CoSb3 calculated by first
principle method
11.00-11.20 D00056
Watchareeya Chaiyarat, First-principles study of cubic perovskites
Ba1-xSrxTiO3
11.20-11.40 D00048
Chewa Thassana, The Effect of the Coulomb Interaction and Exchange
Interaction on Spin Magnetic Moment of MnO
12:00-13:00 Lunch

Computational Physics (V)
Chairman: Rungrote Nilthong
13.00-13.20 D00007 Ang Lee Sin, The Effects of Dangling Bond Terminators in MOF-5
13.20-13.40 D00031
Monta Meepripruek, Solvation in 3-[(2-hydroxyethoxy)-methyl]-6-methyl-
3H-imidazolo[1,2-a]purin-9(5H)-one dihydrate; C11H13N5O3·2H2O
13.40-14.00 D00037
Ratchadaporn Puntharod, Molecular and Supramolecular Structure of
Fe(OEP)picrate
14.00-14.20 D00008 Ang Lee Sin, Hyperfine Interactions of Muonium in Graphene
14.20-14.40 D00040
Kenneth Haller, Redetermination of the Structure of the Radical Cation of
9,9-Bis-9-azabicyclo[3.3.1]nonane
15:00-15:20 Break

Computational Physics (VI)
Chairman: Anucha Yangthaisong
15.20-15.40 D00020
Piyapong Premvaranon, The Study of Illuminance and Thermal Effect in
High Power LED Arrays
15.40-16.00 D00028
Weenawan Somphon, Refinement of a One-Dimensional Modulated
Structure
16.00-16.20 D00044
Weera Pengchan, The Defect Generated in PN Junction Analysis by the
Arrhenius Activation Energy Technique
16.20-16.40 D00046
Weera Pengchan, Diagnostics of Ion Implantation with 0.8 micron CMOS
Technology based on TCAD Simulation


Building C2 Room 214


Wednesday, 24 March 2010
Computational Biology (I)
Chairman: Jeerayut Chaijaruwanich
10.20-10.40 A00033
Pongmanee Thongbai, Estimating Carbon Sequestration of J. curcas L.
from Plant CO2 Assimilation and Dry Matter Accumulation
10.40-11.00 A00025
Somporn Chuai-aree, Automatic Measurement of Plant Growth Using
Region Growing Method
11.00-11.20 A00019
Visanu Wanchai, A study of niche adaptation in Cyanobacteria via
evolutionary scenario of photosynthetic machinery
11.20-11.40 A00017
Piyachat Udomwong, A Conditional Random Fields-Based for CpG islands
prediction in Rice
11.40-12.00 A00027
Worrawat Engchuan, The estimation of SNP-SNP interaction in pooled
DNA
12:00-13:00 Lunch

Computational Biology (II)
Chairman: Vannajan Sanghiran Lee
13.00-13.20 A00036
Wai keat Yam, Using MM-PBSA Method to Further Understand Molecular
Interaction in Large Ribosomal Subunit-Macrolide System
13.20-13.40 A00028
Panisa Treepong, Effects of RNA Quality on Gene Expression Functional
Profiles
13.40-14.00 A00015
Sitthichoke Subpaibbonkit, RNA Secondary Structure Prediction Using
Conditional Random Fields Model
14.00-14.20 A00039
Sasiprapa Krongdang, 3D Pharmacophore and Molecular Docking of AFB
Metalloprotease and Peptide Analogs for Novel Antimicrobial Inhibitors
14.20-14.40 A00007
Siriwan Wongkoon, Developing Predictive Models for Dengue
Haemorrhagic Fever Incidence Rate in Chiang Rai, Thailand
14.40-15.00 A00014
Thanaphongphan Narathanathanan, Predicting Functional Pathway of
Nevirapine inducing Skin Adverse Drug Reaction in HIV-infected Thai
Patients with Integrated Biological Networks
15:00-15:20 Break

Computational Science and Engineering (III)
Chairman: Vara Varavithya
15.20-15.40 G00026
Puangrat Jinpon, Developing Dashboard Decision Support System For
Subdistrict Administration Organization Network
15.40-16.00 G00160
Kinzang Wangdi, Credit Application Classification: A Case Study of
National Pension and Provident Fund of Bhutan
16.00-16.20 G00006
Jutarat Khiripet, Decision Tree Era Classification of Ancient Thai
Inscriptions
16.20-16.40 G00135
Kinzang Wangdi, Classification of Loan Borrowers of National Pension
and Provident Fund of Bhutan: A Case Study


Building C2 Room 214


Thursday, 25 March 2010

Computational Science and Engineering (V)
Chairman: Nawapak Eua-anant
10.00-10.20 G00002
Alagan Anpalagan, Use of Genetic Algorithm in Computing the Capacity
of a Discrete Memoryless Channel and Corresponding Symbol Probability
Distribution
10.20-10.40 G00080
Sirimas Pongjanla, Variation Analysis of Neural Network Based
Approximation Function
10.40-11.00 G00151
Wanapun Waiyawut, Thai Numeric Hand Written Character Recognition
by Counter propagation and Hopfield Neural Network
11.00-11.20 G00081
Jaratsri Rungrattanaubol, Artificial Neural Network and Kriging Model
Approximations for the Deterministic Output Response
11.20-11.40 G00090
Pisit Nakjai, Multilayer Neural Networks for contacting load model of
Distributive Tactile Sensing
11.40-12.00 G00038
Wattana Kanbua, Decision Support System for Prediction Air Temperature
in Northern Part of Thailand by using Neural Network
12:00-13:00 Lunch

Computational Science and Engineering (VII)
Chairman: Chantana Chantrapornchai
13.00-13.20 G00164
Perawat Boonpuek, Development of Free Bulge Test Tooling for Flow
Stress Curve Determination of Tubular Materials
13.20-13.40 G00059
Somrerk Poodchakarn, Misalignment Compensation of Sheet Metal
Forming Tool by Loop-shaping Controller
13.40-14.00 G00075
Ramm Khamkaew, Survey of Metaheuristic Methodology for Solving
Container Loading Problem

14.00-14.20 G00091
Nitisak Charoenroop, Public Transport Route Design for Minimal Energy
Consumption
14.20-14.40 G00068
Shongpun Lokavee, The Geometry and Electronic Structures of
Functionalized Single-Walled Carbon Nanotubes by Carboxyl Groups on
Perfects and Defect Tubes
14.40-15.00 G00128
Kittipong Hi-ri-o-tappa, Development of Real-Time Short-Term Traffic
Congestion Prediction Method
15:00-15:20 Break

Computational Science and Engineering (IX)
Chairman: Putchong Uthayopas
15.20-15.40 G00143
Pariwat Wongsamran, Power Management for WLAN DAM
Environmental Monitoring System
15.40-16.00 G00114
Phayong Sornsiriaphilux, On Applying Simple Data Compression to
Wireless Sensor Networks
16.00-16.20 G00139 Wuttichai Wongsarasin, Web Spam Recognition by Edge Label
16.20-16.40 G00132
Thanawit Kumkurn, Application of Optical Data Glove to Hand Gesture
Interpretation
16.40-17.00 G00129
Pongdej Saovapakhiran, Clustering of Search Results:
A Case Study of Thai-Language Web Pages
17.00-17.20 G00066 Pratarn Chotipanbandit, Data Hiding and Security for Printed Documents


Building C2 Room 215


Wednesday, 24 March 2010

Computational Science and Engineering (I)
Chairman: Sornthep Vannarat
10.20-10.40 G00134 Napat Rujeerapaiboon, Flexible Grammar Recognition Algorithm
10.40-11.00 G00110
Noppadon Khiripet, Subgraph Isomorphism Search for Network Motif
Mining
11.00-11.20 G00137
Chaiwat Suwansaroj, Implementation of QRS detection with Python
on Linux system
11.20-11.40 G00138
Kawin Worrasangasilpa, Parallel Additive Operation in Flexible Interval
Representation System
11.40-12.00 G00156
Surachai Panich, Development of Mobile Robot Based on Differential
Drive Integrated with Accelerometer
12:00-13:00 Lunch

Computational Science and Engineering (II)
Chairman: Vara Varavithya
13.00-13.20 G00008
Rapeepun Boonsin, Natural Scene Matching using Inexact Maximum
Common Subgraph
13.20-13.40 G00034
Niyada Rukwong, Determining Appropriate Parameter Setting of Firefly
Algorithm Using Experimental Design and Analysis
13.40-14.00 G00082
Rosemarin Sukhasem, Analysis of Centers Initialization on K-means
Performance
in Clustering Problem
14.00-14.20 G00119
Chantana Chantrapornchai, Exploration of Parallelism in Developing
Fuzzy Applications
14.20-14.40 G00131
Anakapon Wiengpon, A Modified Version of Adaptive Arithmetic
Encoding Algorithm
14.40-15.00 G00152
Sithar Dorji, A Novel Hybrid Clustering Method for Customer
Segmentation
15:00-15:20 Break

Computational Science and Engineering (IV)
Chairman: Jeerayut Chaijaruwanich
15.20-15.40 G00089
Siriprapa Ritraksa, Structural Model of Blood Vessels in Heart Using
Lindenmayer Systems
15.40-16.00 G00093
Piyamas Suapang, Automatic Vessels Edge Detection for Low-Contrast
Baby's Retinal Images
16.00-16.20 G00115 Ariya Namvong, Facial Reconstruction from Skull
16.20-16.40 G00039
Nittaya Kerdprasop, Probabilistic Knowledge Discovery from Medical
Databases


Building C2 Room 215


Thursday, 25 March 2010

Computational Science and Engineering (VI)
Chairman: Sirod Sirisup
10.00-10.20 G00141
Somporn Chuai-aree, KML Generator for Visualizing of Numerical
Results from Weather and Ocean Wave Simulation in Google Earth API
10.20-10.40 G00123
Supat Sairattanain, One-Dimensional Hydrodynamic Calibration
Study of Mae Lao River Flow
10.40-11.00 G00041
Parkpoom Khamchuay, Classify Freshwater Fish Using Morphometric
Analysis and Image Processing Technique
11.00-11.20 G00120
Panuwat Mekha, Determination of Sequence-Similarity Kernel Function
for Support Vector Machines in Classification of Influential Endophytic
Fungi in Rice on Bakanae Disease
11.20-11.40 G00070
Itthi Sa-nguandee, An Improvement of Rainfall Estimation in Thailand
Using FY-2C Numerical Data
11.40-12.00 G00013 Adil Siripatana, Wind Circle 3D Visualization of Direction Weather Data
12:00-13:00 Lunch

Computational Science and Engineering (VIII)
Chairman: Nawapak Eua-anant
13.00-13.20 G00003 Nunnapad Toadithep, Image Processing for Rice Diseases Analysis
13.20-13.40 G00025 Surapong Uttama, Adaptive Window Size for Spatial Image Segmentation
13.40-14.00 G00027
Sarin Watcharabutsarakham, Noise Reduction of Ancient Document
Images
14.00-14.20 G00094
Piyamas Suapang, Image Acquisition and Image Processing Program for
Dermatology Camera
14.20-14.40 G00140
Prat Nudklin, Enhanced Image Watermarking Using Adaptive Pixel
Prediction and Local Variance
14.40-15.00 G00133 Thitiporn Pramoun, Improved Image Watermarking using Pixel Averaging
15:00-15:20 Break

Computational Science and Engineering (X)
Chairman: Nawapak Eua-anant
15.20-15.40 G00063
Hataikan Chiverton, Histogram Specification for Variable Illumination
Correction of Face Images
15.40-16.00 G00122
Bhovornsak Somkror, Digital Watermarking with 2D Barcode and General
Watermark using DCT for JPEG Image
16.00-16.20 G00095
Piyamas Suapang, Image Acquisition and Image Processing Program for
Firearms and Toolmarks Comparison in Forensic Science
16.20-16.40 G00116
Ninasree Charawae, Online Object Detection Program Using Fast Image
Processing
16.40-17.00 G00062
John Chiverton, Model Based Motor Vehicle Segmentation and Type
Classification Using Shape Based Background Subtraction
Contents
ANSCSE14 Mae Fah Luang University, Chiang Rai, Thailand
March 23-26, 2010

Keynote Lectures

Page
KY-1 Transforming Computational Science with CUDA
Harris, M.

KY-1
KY-2 Harmonic Combination of Computer Simulation and Experimental
Technique for the Study of Structure-Property Relationship of
Crystalline Polymers
Tashiro, K.

KY-2

Invited Lectures

INV-1 QM/MM Computational Studies of Metalloenzymes:
Characterization of Elusive Intermediates and Elucidation of
Reaction Mechanisms
Hirao, H.

INV-1
INV-2 A Molecular Dynamics Study of Carbazole Derivatives as Universal
Base
Tantirungrotechai, Y.

INV-2
INV-3 Molecular dynamics study of a mosquito larvicidal protein Cry4Aa
toxin from Bacillus thuringiensis in its trimeric form
Kanchanawarin, C.

INV-3
INV-4 Molecular Modeling of Low Energy Ion Bombardment/Plasma
Treatment in Polymer Design
Nimmanpipug, P.

INV-4
INV-5 Molecular Dynamics Simulations of Antibody ScFv Fragments
without Disulfide Bond
Lee, V. S.

INV-5
INV-6 Effects of Residues Changes on Human Receptor Binding Affinity of
H1N1 Hemagglutinins: Insights from Molecular Dynamics Simulation
Nunthaboot, N.

INV-6
INV-7 Computer-based methods in drug design - how useful are they?
Sippl, W.

INV-7
INV-8 Concerns, recent outbreak and molecular insight into H5N1 and
pandemic H1N1-2009 influenza A viruses
Rungrotmongkol, T.

INV-8

Parallel Sessions

Computational Biology

A00007 Developing Predictive Models for Dengue Haemorrhagic Fever
Incidence Rate in Chiang Rai, Thailand
Wongkoon, S., Jaroensutasinee, M., and Jaroensutasinee, K.

1
A00008 Cluster Analysis of Temperature-Relative Humidity Data
at Mt. NOM Cloud Forest
Pheera, W., Jaroensutasinee, K., and Jaroensutasinee, M.

7

A00009 Developing Business Intelligent Tools for NBIDS Coral Database
System
Noonsang, P., Jaroensutasinee, M., and Jaroensutasinee, K.

13
A00013

Mark-Recapture Model Testing for Indo-Pacific Humpback Dolphin
Population at Khanom Sea, Nakhon Si Thammarat
Jutapruet, S., Jaroensutasinee, K., and Jaroensutasinee, M.

19
A00014 Predicting Functional Pathway of Nevirapine inducing Skin Adverse
Drug Reaction in HIV-infected Thai Patients with Integrated
Biological Networks
Narathanathanan, T., Prom-on, S., Chantratita, W.,
and Mahasirimongkol, S.

25
A00015 RNA Secondary Structure Prediction Using Conditional Random
Fields Model
Subpaiboonkit, S., Thammarongtham, C., Cutler, R.,
and Chaijaruwanich, J.

31
A00016 Sea Surface Temperature Declines at Coral Sites Using Field Sensors
and NOAA Data
Chumkiew, S., Jaroensutasinee, M., and Jaroensutasinee, K.

32
A00017 A Conditional Random Fields-Based for CpG islands prediction in
Rice
Udomwong, P., Lee, V. S., Anuntalabhochai, S., and Chaijaruwanich, J.

39
A00019 A study of niche adaptation in Cyanobacteria via evolutionary
scenario of photosynthetic machinery
Wanchai, V., Prommeenate, P., Paithoonrangsarid, K., Hongsthong, A.,
Senachak, J., Panyakampol, J., Plengvidhya, V., and Cheevadhanarak, S.

44
A00021 Cross Association of Sea Surface Temperature of 13 Sites in Thailand
Kuhapong, U., Jaroensutasinee, K., and Jaroensutasinee, M.

50
A00025 Automatic Measurement of Plant Growth Using Region Growing
Method
Chuai-Aree, S., Siripant, S., and Jaeger, W.

51
A00027 The estimation of SNP-SNP interaction in pooled DNA
Engchuan, W., Prom-on, S., Chan, J. H., and Meechai, A.
59
A00028 Effects of RNA Quality on Gene Expression Functional Profiles
Treepong, P., Prom-on, S., Chan, J. H., Meechai, A.,
and Hirankarn, N.

65
A00033 Estimating Carbon Sequestration of J. curcas L. from Plant CO2
Assimilation and Dry Matter Accumulation
Thongbai, P., Hadiwijaya, B., and Sengkeaw, P.

71
A00036 Using MM-PBSA Method to Further Understand Molecular
Interaction in Large Ribosomal Subunit-Macrolide System
Yam, W. K. and Wahab, H. A.

77
A00039 3D Pharmacophore and Molecular Docking of AFB Metalloprotease
and Peptide Analogs for Novel Antimicrobial Inhibitors
Krongdang, S., Chantawannakul, P., Nimmanpipug, P., and Lee, V. S.


84
Computational Chemistry

A00003 Monte Carlo Simulation of Two-component Bilayers with Interlayer
Coupling
Sornbundit, K., Ngamsaad, W., Triampo, D., and Triampo, W.

85
A00005 Virtual screening for inhibitors on isocitrate lyase of Mycobacterium
Tuberculosis with NADI database
Lee, Y. V., Choong, Y. S., and Wahab, H. A.

92
A00010 Isoniazid Resistance in Mycobacterium Tuberculosis InhA Mutants
Choong, Y. S. and Wahab, H. A.

98
A00012 Membrane Protein Simulation: A Case Study on Selected
Hypothetical protein from Klebsiella pneumoniae MGH78578
Choi, S. B., Normi, Y. M., and Wahab, H. A.

99
A00020 Finding new lead compound for anti-cancer drug by using in silico
screening technique
Bangphoomi, K. and Choowongkomon, K.

106
A00022 De novo Design of HIV-1 Reverse Transcriptase Inhibitor against
K103N/Y181C Mutant: Bioinformatics Approach
Yongpisanphop, J., Saparpakorn, P., Hannongbua, S., and
Ruengjitchatchawalya, M.

112
A00024 Molecular Modeling of Peroxidase and Polyphenol Oxidase:
Substrate Specificity and Active Site Comparison
Nokthai, P., Shank, L., and Lee, V. S.

118
B00003 Hybrid Quantum Mechanical/Molecular Mechanical studies on Two
Families of cis,cis-Muconate Lactonizing Enzymes
Somboon, T., Gleeson, M. P., and Hannongbua, S.

119
B00004 Kinetics of the Hydrogen Abstraction
Cl + Alkane → HCl + Alkyl Reaction Class:
An Application of the Reaction Class Transition State Theory
Piansawan, T., Sattayanon, C., Daengngern, R., Yakhantip, T.,
Kungwan, N., and Truong, T. N.

120
B00006 Structure Based Drug Design for Swine Flu Chemotherapeutics: New
Neuraminidase Inhibitors from Plants Natural Compounds
Ikram, N. K. K. and Wahab, H. A.

121
B00007 QM/MM dynamics of HCOO--water hydrogen bonds in aqueous
solution
Payaka, A. and Tongraar, A.

122
B00008 Quantum Mechanics Simulation on Structure of
7-Azaindole(Methanol)2 Cluster and Excited-State Triple-Proton
Transfer Reactions in the Gas Phase
Daengngern, R., Barbatti, M., and Kungwan, N.

130
B00011 Computational Studies On The Structural Conformations of
N-Benzoyl-N-p-Substituted Phenylthiourea Derivatives
Deraman, R., Mohamed-Ibrahim, M. I., Sulaiman, S., Ang, L. S., and
Hussim, M. H.

135
B00012 Crystal structures and DFT studies on
[TpPh2Ni(S2CNR2)] (R = Et, Bz) and [TpPh2Ni(S2Cpyr)]
Dokmaisrijan, S., Harding, P., and Harding, D.

140
B00014 Computer Study for Characterization of Porous Solid using
Accessible Pore Volume Concept
Klomkliang, N., Wongkoblap, A., Tangsathitkulchai, C., and Do, D.D.

141
B00016 Direct QM/MM simulations of excited state dynamics of Rhodopsin
chromophore in different environments
Punwong, C. and Martínez, T. J.

150
B00017 Virtual Screening on Neuraminidase Inhibitors activity of plant-
derived natural products by using Pharmacophore Modelling and
Docking
Muchtaridi and Wahab, H. A.

151
B00019 Electronic and Mechanical Properties on B-N Doped
Single-wall Carbon Nanotubes
Vongachariya, A., Parasuk, V., and Bovornratanaraks, T.

159
B00020 Mutation of Hemagglutinin H5 can change recognition to human
sialic acid-2,6-galactose using in silico technique
Kaiyawet, N., Rungrotmongkol, T., Malaisree, M., Decha, P.,
Sompornpisut, P., and Hannongbua, S.

160
B00021 A Comparative Study of Structural and Binding Affinity of
Pyrrolidinyl PNA and DNA Using MD Simulations
Meeprasert, A., Kaiyawet, N., Rungrotmongkol, T., Sompornpisut, P.,
and Hannongbua, S.

161
B00022 Theoretical Study of Organic Molecules Use in Dye-Sensitized Solar
Cell (DSSC) Based on Time Dependent-Density Functional Theory
(TD-DFT)
Yakhantip, T., Jungsuttiwong, S., and Kungwan, N.

162
B00027 Classification of Thai Fragrant Rice (Oryza sativa) Using Gas
Chromatographic Profiles in Conjunction with Statistical Methods
Prasittichok, K., Prasitwattanaseree, S., and Wongpornchai, S.

163
B00030 The effect of electron-donating groups on the conducting property of
polythiophene derivatives using PBC calculation
Krasienapibal, T. S., Itngom, P., Ekgasit, S., Ruangpornvisuti, V., and
Vchirawongkwin, V.

170
B00033 QM/MM Study On The Catalytic Mechanism of Family 18 Chitinase
Jitonnom, J., Nimmanpipug, P., Mulholland, A.J., and Lee, V.S.

174
B00034 Theoretical Study of Li and Li+ intercalated in Double-Walled
Carbon Nanotubes
Udomvech, A., Page, A. J., Kerdcharoen, T., and Morokuma, K.

175
B00035 Molecular Calculation of Plasma Treatment Efficiency on PMMA
and FRC as Denture Materials
Sangprasert, W., Lee, V. S., Boonyawan, D., and Nimmapipug, P.

176
B00037 To the best estimation of reaction barriers for proton exchange
reactions of C1-C4 alkanes in ZSM-5 zeolite
Sukrat, K., Parasuk, V., Tunega, D., Aquino, A. J. A., and Lischka, H.

177
B00038 Neuraminidase Inhibitor Identification by Pharmacophore Modelling
and Docking from NADI-VA compound
Asaruddin, M. R. and Wahab, H. A.

178
B00040 MD simulation of Nafion surface modification by Ar+ bombardment
Yana, J., Lee, V. S., Vannarat, S., Dokmaisrijan, S., Medhisuwakul, M.,
Vilaithong, T., and Nimmanpipug, P.

179
B00041 Influence of the silanol groups on the external surface of silicalite-1 on
the adsorption dynamics of methane
Thompho, S., Chanajaree, R., Remsungnen, T., Fritzsche, S., and
Hannongbua, S.

180
B00042 Virtual Screening and Binding Free Energy Calculation for Inhibitors
of Dengue Virus NS2B/NS3 Protease
Wichapong, K., Pianwanit, S., Sippl, W., and Kokpol, S.

181
B00045 Loading of Doxorubicin on Single-Walled Carbon Nanotube by MD
Simulations
Sornmee, P., Arsawang, U., Rungrotmongkol, T., Saengsawang, O.,
Intharathep, P., Sukrat, K., Remsungnen, T. and Hannongbua, S.

182
B00046 Molecular Dynamics Simulations of GEMZAR encapsulated in
carbon nanotube
Arsawang, U., Sornmee, P., Saengsawang, O., Rungrotmongkol, T.,
Intharathep, P., Pianwanit, A., Remsungnen, T., Hannongbua, S.
188
B00050 Computational studies of HIV-1 Reverse Transcriptase Inhibitors: as
a Molecular Basis for Drug Development
Decha, P., Intharathep, P., Udommaneethanakit, T., Sompornpisut, P.,
Hannongbua, S., Wolschann, P., and Parasuk, V.

189
B00051 Docking of Dengue Virus Methyltransferase Inhibitor from Nadi
Database (In House Malaysian Medicinal Plant Database)
Nawi, M. S. M., Wahab, H. A., Rahman, N. A., and Hamid, S. A.

190
B00053 Investigating the Binding of Arylamide Derivatives as Tuberculosis
Agent in InhA using Molecular Dynamics Simulations
Punkvang, A., Wolschann, P., Beyer, A., and Pungpo, P.

191
B00054 Proton transfer reactions and dynamics at
sulfonic acid groups of Nafion
Phonyiem, M. and Sagarik, K.

196
B00055 Proton Conduction at Sulfonate Group of Nafion
Lao-ngam, Ch. and Sagarik, K.


201
Computational Mathematics

C00002 A Numerical Computation for Water Quality Model in a Non-
Uniform Flow Stream Using Maccormack Scheme
Pochai, N., Konglok, S. A., and Tangmanee, S.

202
C00004 Tidal Analysis with Error Estimates: Local and Repositories
Variations
Sirisup, S., Tomkratoke, S., and Harnsamut, N.

203
C00009 Comparison of Reversible Feature Extraction Techniques Applied to
Anatomical Shape Modelling
Chiverton, J.

209
C00011 A Comparative Study of Conjugate Gradient Method for
Unconstrained Optimization
Rivaie, M., Mamat, M., Mohd, I., and Fauzi, M.

214
C00013 A Matrix Partitioning Technique for Distributed Solving Large
Linear Dense Equations
Promsuwan, P. and Charnsethikul, P.

220
C00014 Filling incomplete wind speed data by using kriging interpolation
Siridejachai, S., Ruttanapun, C., and Vannarat, S.

229
C00015 On linearization of stochastic ordinary differential equations
Meleshko, S.V. and Shulz, E.

234
C00020 A Study on Numerical Methods for Mean-Reverting Square Root
Processes with Jumps
Sirisup, S., Tanadkithirun, R., and Wong, K.


235
G00072 Filter Rules and Thai big capital stocks trading
Naknoi, S. and Kittiwutthisakdi, K.


241
Computational Physics

C00012 Two-Dimensional Bisoliton Model in Cuprates
Chanpoom, T.

247
D00003 Vacancy-mediated dynamics with quenched disorder in binary alloy:
Monte Carlo simulations and dynamic scaling
Pattanasiri, B., Nuttavut, N., Triampo, D., and Triampo, W.

253
D00005 Numerical Investigations of the Distributions of Elementary
Excitations of the Bimodal Ising Spin Glass
Jinuntuya, N. and Poulter, J.

254
D00007 The Effects of Dangling Bond Terminators in MOF-5
Hussim, M. H., Sulaiman, S., Mohamed-Ibrahim, M. I., Deraman, R., and
Ang, L. S.

258
D00008 Hyperfine Interactions of Muonium in Graphene
Ang, L. S., Sulaiman, S., and Mohamed-Ibrahim, M. I.

263
D00009 First Principle Investigations of Electronic Structures and Hyperfine
Interactions of Muonium in Tetraphenylmethane
Sulaiman, S., Mohamed-Ibrahim, M. I., Toh, P., Ang, L. S., and
Jayasooriya, U. A.

269
D00010 Diffusion of Galactic Cosmic Rays in an Interplanetary Magnetic
Flux Rope
Krittinatham, W., Ruffolo, D., and Bieber, J. W.

274
D00011 Effect of Magnetic Turbulence Structure on the Parallel Transport of
High Energy Particles
Buachan, C., Ruffolo, D., Saiz, A., Seripienlert, A., and Matthaeus, W.

275
D00012 Secondary Neutrons from Cosmic Rays in Earth's Atmosphere above
the Princess Sirindhorn Neutron Monitor
Kamyan, N., Ruffolo, D., Saiz, A., and Tooprakai, P.

276
D00015 Collimation of Particle Beams by Two-Dimensional Turbulent
Structure
Seripienlert, A., Tooprakai , P., Ruffolo, D., Chuychai, P.,
and Matthaeus, W. H.

281
D00016 Computational Classification of Cloud Forest Using Atmospheric
Data from Field Sensors
Sangarun, P., Pheera, W., Jaroensutasinee, K., and Jaroensutasinee, M.

282
D00017 Band structures and thermoelectric properties of CuAlO2 from first-
principles calculations
Poopanya, P. and Yangthaisong, A.

287
D00018 Electronic Structures and Thermoelectric Properties of SrTiO3
Chanapote, T., Yangthaisong, A., and Vannarat, S.

293
D00020 The Study of Illuminance and Thermal Effect in High Power LED
Arrays
Premvaranon, P., Pratumwal, Y., Teralapsuwan, A., and Soparat, J.

299
D00022 The Critical Temperature of Transition Energy of Single Quantum
Well
Techitdheera, W., Kulsirirat, K., and Pecharapa, W.

308
D00028 Refinement of a One-Dimensional Modulated Structure
Somphon, W., Haller, K.J. and Oeckler, O.M.

314
D00031 Solvation in 3-[(2-hydroxyethoxy)-methyl]-6-methyl-3H-imidazolo
[1,2-a]purin-9(5H)-one dihydrate; C11H13N5O3·2H2O
Meepripruek, M. and Haller, K. J.

315
D00037 Molecular and Supramolecular Structure of Fe(OEP)picrate
Puntharod, R., Haller, K. J., and Wood, B. R.

316
D00040 Redetermination of the Structure of the Radical Cation of 9,9-Bis-9-
azabicyclo[3.3.1]nonane
Haller, K. J. and Boonkon, P.

317
D00041 Phase Characterization and Saturation Modeling of the Calcium
Phosphate-Arsenate Apatite System
Dungkaew, W., Saisa-ard, O., and Haller, K. J.

318
D00042 Phase Characterization and Saturation Modeling of the Calcium-
Lead Phosphate Apatite System
Saisa-ard, O., Dungkaew, W., and Haller, K.J.

324
D00043 Order-Disorder Structure in a New Zinc Oxovanadate,
[Zn(Im)4][V2O6]
Krachodnok, S., Haller, K.J., and Williams, I. D.

325
D00044 The Defect Generated in PN Junction Analysis by the Arrhenius
Activation Energy Technique
Pengchan, W., Cheirsirikul, S., Phetchakul, T., Ruangphanit, A.,
and Poyai, A.

326
D00046 Diagnostics of Ion Implantation with 0.8 micron CMOS Technology
based on TCAD Simulation
Pengchan, W., Cheirsirikul, S., Phetchakul, T., Ruangphanit, A., and
Poyai, A.

331
D00047 Electronic structures of CoSb3 calculated by first principle method
Srisaikum, A., Yangthaisong, A., and Tanpipat, N.

336
D00048 The Effect of the Coulomb Interaction and Exchange Interaction on
Spin Magnetic Moment of MnO
Thassana, C. and Techitdheera, W.


341
D00056 First-principles study of cubic perovskites Ba1-xSrxTiO3
Chaiyarat, W. and Yangthaisong, A.

345
D00061 On the Estimation of Solar Particle Fluence at Jupiter's Orbit
Saiz, A., Ruffolo, D., Bieber, J. W., and Evenson, P.

346
D00062 First principle study on the optical band-edge absorption of Fe-doped
SnO2

Pabchanda, S., Putpan, J., Laopaiboon, R., and Yangthaisong, A.


347
Computational Fluid Dynamics and Solid Mechanics

C00005 Pressure Distribution along the Silo Wall
Chuayjan, W., Wiwatanapataphee, B., Wu, Y. H., and Tang, I.M.

348
E00001 Linear and weakly nonlinear solutions of subcritical free-surface flow
over submerged obstacles
Guayjarernpanishk, P. and Asavanant, J.

349
E00003 Coastal Simulation of the Gulf of Thailand: Effects of tidal forcing
Tomkratoke, S., Vannarat, S., and Sirisup, S.

350
E00004 Multiphysics Analysis of Gas Turbine Blade Cooling using
Computational Fluid Dynamics (CFD)
Srimungkala, A., Dechaumphai, P., and Juntasaro, V.

356
E00005 Computational Study of Totally Enclosed Fan Cooled System in an
Electric Induction Motor
Soparat, J. , Benyajati, C., Pitaksapsin, N., Wattanawongsakun, P., and
Phuchamnong, A.

362
E00006 Numerical Simulation of Two-Phase Flows and Heat Transfer in
Continuous Steel Casting Process
Mookum, T., Wiwatanapataphee, B., Wu, Y. H.,
and Orankitjaroen, S.

369
E00007 Forecasting Tropical Cyclone Movement by Neural Network
Kanbua, W., Khetchaturat, C., and Visuthsiri, K.

370
E00008 Analysis of Coastal Erosion by Using Wave Spectrum
Kanbua, W., Khetchaturat, C., and Chuai-aree, S.

377
E00010 Characterisation of Non-linear Viscoelastic Properties via Indentation
Techniques
Gamonpilas, C., Charalambides, M. N., and Williams, J. G.

385
E00013 Semi-Solid Die Casting Mold Development Utilizing CAE Technique
Viriyarattanasak, P., Koichi, A., Masayuki, I., and Osamu, N.

391
E00018 Numerical Simulation of the Fluid Flow Past a Rotating Torus
Suwannasri, P. and Moshkin, N. P.


398
E00021 VirtualFlood3D: Software for Simulation and Visualization of Water
Flooding
Busaman, A., Chuai-Aree, S., Kanbua, W., and Siripant, S.

399
E00022 Kinematics and Dynamics of Coherent Structures within a Turbulent
Spot in Plane Channel Flow
Juttijudata, V.

400
E00023 Towards an Extension of the SST-k-ω Model for Transitional Flow
Ngiamsoongnirn, K., Malan, P., and Juntasaro, E.

411
E00027 Inelastic Transient Dynamic Analysis by BEM Using Domain
Decomposition
Phansri, B., Park, K., and Warnitchai, P.


412
High Performance Computing and Grid Computing

F00001 Multi-GPUs Voxelization of 3D Data
Khantuwan, W. and Khiripet, N.

417
F00002 Simulation Study of Channel Engineering Design for Sub Micrometer
Buried Channel PMOS Devices
Ruangphanit, A., Phongphanchanthra, N., Klungien, N., Muanglhua, R.,
Niemcharoen, S., and Khunhao, S.

423
F00006 Parallel Program Development for Tsunami Simulation with the
Message Passing Interface
Thawornrattanawanit, S., Virochsiri, K., Muangsin, V., and
Ruangrassamee, A.

429
F00007 Optimization of Geometry of LOCOS Isolation in Sub micrometer
CMOS by TCAD Tools
Phongphanchantra, N., Pengchan, W., Atiwongsangthong, N. and
Cheirsirikul, S.

436
F00009 Solving Nanocomputing Problem via Music Inspired Harmony Search
Algorithm
Sujaree, K., and Wacharanad, S.

441
F00010 Effective Workload Management Strategies for a Cloud of Virtual
Machine
Noppakuat, N., Seangrat, J., and Uthayopas, P.

442
F00011 Two-Level Scheduling Technique for Mixed Best-Effort and QoS Job
Arrays on Cluster Systems
Kijsipongse, E., U-ruekolan, S., and Vannarat, S.

448
F00012 Solving Magnetic Sounding Integral Equations from Multilayer Earth
Using Message Passing Interface
Dolwithayakul, B., Chantrapornchai, C., and Yooyeunyong, S.

454
F00013 Parameters Self-Tuning Technique for Large Scale Scheduler
Phatanapherom, S. and Uthayopas, P.
460

F00014 Performance Evaluation of Cache Replacement Policies for High-
Energy Physic Data Grid
Phengsuwan, J. and Nupairon, N.

464
F00015 Modeling and Simulation of Large-scale Virtualization based on the
CloudSim Toolkit
Banjongkan, A., Prueksaaroon, S., Varavithya, V., and Vannarat, S.

471
F00016 Impact of Workloads on Fair Share Policies
Vasupongayya, S.

478
F00019 Automatic Predictive URL-Categories Classification to UM Model
using Decision Tree Model
Silachan, K.


484
Computational Science and Engineering

G00002 Use of Genetic Algorithm in Computing the Capacity of a Discrete
Memoryless Channel and Corresponding Symbol Probability
Distribution
Anpalagan, A. and Sabri, M.

490
G00003 Image Processing for Rice Diseases Analysis
Distsatien, A., Wilaisil, W., and Toadithep, N.

494
G00006 Decision Tree Era Classification of Ancient Thai Inscriptions
Khiripet, J. and Khiripet, N.

501
G00008 Natural Scene Matching using Inexact Maximum Common Subgraph
Boonsin, R. and Khiripet, N.

505
G00013 Wind Circle 3D Visualization of Direction Weather Data
Siripatana, A., Jaroensutasinee, K., and Jaroensutasinee, M.

511
G00025 Adaptive Window Size for Spatial Image Segmentation
Uttama, S.

517
G00026 Developing Dashboard Decision Support System For Subdistrict
Administration Organization Network
Jinpon, P., Jaroensutasinee, M., and Jaroensutasinee, K.

518
G00027 Noise Reduction of Ancient Document Images
Watcharabutsarakham, S., Marukatat, S., and Sinthupinyo, S.

524
G00034 Determining Appropriate Parameter Setting of Firefly Algorithm
Using Experimental Design and Analysis
Rukwong, N., Pansuwan, P. and Pongcharoen, P.

529
G00038 Decision Support System for Prediction Air Temperature in Northern
Part of Thailand by using Neural Network
Kanbua, W. and Khetchaturat, C.

535
G00039 Probabilistic Knowledge Discovery from Medical Databases
Kerdprasop, N. and Kerdprasop, K.

543
G00041 Classify Freshwater Fish Using Morphometric Analysis and Image
Processing Technique
Khamchuay, P., Jaroensutasinee, K., and Jaroensutasinee, M.

549
G00059 Misalignment Compensation of Sheet Metal Forming Tool by Loop-
shaping Controller
Poodchakarn, S., Sriprapai, D., Budcharoentong, D.,
Saimek, S., and Thanadngarn, C.

554
G00062 Model Based Motor Vehicle Segmentation and Type Classification
Using Shape Based Background Subtraction
Chiverton, J. and Uttama, S.

565
G00063 Histogram Specification for Variable Illumination Correction of Face
Images
Chiverton, H. and Chiverton, J.

571
G00066

Data Hiding and Security for Printed Documents
Chotipanbandit, P. and Vongpradhip, S.

575
G00068 The Geometry and Electronic Structures of Functionalized Single-
Walled Carbon Nanotubes by Carboxyl Groups on Perfects and
Defect Tubes
Lokavee, S., Udomvech, A., and Kerdcharoen, T.

581
G00070 An Improvement of Rainfall Estimation in Thailand
Using FY-2C Numerical Data
Sa-nguandee, I., Raksapatcharawong, M., and Veerakachen, W.

587
G00075 Survey of Metaheuristic Methodology for Solving Container Loading
Problem
Khamkaew, R. and Somhom, S.

593
G00080 Variation Analysis of Neural Network Based Approximation Function
Pongjanla, S. and Anussornnitisarn, P.

598
G00081 Artificial Neural Network and Kriging Model Approximations for the
Deterministic Output Response
Rungrattanaubol, J., Nakjai, P., and Na-udom, A.

603
G00082 Analysis of Centers Initialization on K-means Performance
in Clustering Problem
Sukhasem, R. and Anussornnitisarn, P.

610
G00089 Structural Model of Blood Vessels in Heart Using Lindenmayer
Systems
Ritraksa, S., Chuai-Aree, S., and Saelim, R.

619
G00090 Multilayer Neural Networks for contacting load model of Distributive
Tactile Sensing
Nakjai, P. and Rungrattanaubol, J.

626
G00091 Public Transport Route Design for Minimal Energy Consumption
Charoenroop, N., Nilthong, R., and Eungwanichayapant, A.

632
G00093 Automatic Vessels Edge Detection for Low-Contrast Baby's Retinal
Images
Suapang, P., Chuwhite, M. and Nghauylha, W.

641
G00094 Image Acquisition and Image Processing Program for Dermatology
Camera
Suapang, P., Mueanpong, D., Sanglub, S., and Haomao, B.

646
G00095 Image Acquisition and Image Processing Program for Firearms and
Toolmarks Comparison in Forensic Science
Suapang, P., Prasitsathapron, C., and Janpuk, S.

652
G00110 Subgraph Isomorphism Search for Network Motif Mining
Khiripet, J., Khantuwan, W., and Khiripet, N.

658
G00114 On Applying Simple Data Compression to Wireless Sensor Networks
Sornsiriaphilux, P., Thanapatay, D., Kaemarungsi, K., and Araki, K.

662
G00115 Facial Reconstruction from Skull
Namvong, A. and Nilthong, R.

668
G00116 Online Object Detection Program Using Fast Image Processing
Charawae, N., Chuai-Aree, S., Wikaisuksakul, S.

675
G00119 Exploration of Parallelism in Developing Fuzzy Applications
Chantrapornchai (Phongpensri), C. and Pipatpaisan, J.

676
G00120 Determination of Sequence-Similarity Kernel Function for Support
Vector Machines in Classification of Influential Endophytic Fungi in
Rice on Bakanae Disease
Mekha, P. and Chaijaruwanich, J.

677
G00122 Digital Watermarking with 2D Barcode and General Watermark
using DCT for JPEG Image
Somkror, B. and Boonchieng, E.

678
G00123 One-Dimensional Hydrodynamic Calibration
Study of Mae Lao River Flow
Sairattanain, S., Nilthong, R., Eungwanichayapant, A., and Saenton, S.

683
G00128 Development of Real-Time Short-Term Traffic Congestion Prediction
Method
Hi-ri-o-tappa, K., Pan-ngum, S., Narupiti, S., and Pattara-Atikom, W.

689
G00129 Clustering of Search Results: A Case Study of Thai-Language Web
Pages
Sukriket, P., Sangchai, C., Saovapakhiran, P., Surarerks, A., and
Rungsawang, A.

703
G00131 A Modified Version of Adaptive Arithmetic Encoding Algorithm
Wiengpon, A. and Surarerks, A.

709
G00132 Application of Optical Data Glove to Hand Gesture Interpretation
Kumkurn, T. and Eua-anant, N.

715
G00133 Improved Image Watermarking using Pixel Averaging
Pramoun, T. and Amornraksa, T.

716
G00134 Flexible Grammar Recognition Algorithm
Rujeerapaiboon, N., and Surarerks, A.

722
G00135 Classification of Loan Borrowers of National Pension and Provident
Fund of Bhutan: A Case Study
Wangdi, K., Prayote, A., Phalavonk, U.

728
G00137 Implementation of QRS detection with Python on Linux system
Suwansaroj, C., Thanapatay, D., Thanawattano, C., and Sugino, N.

729
G00138 Parallel Additive Operation in Flexible Interval Representation
System
Worrasangasilpa, K., Jarangkul, W., and Surarerks, A.

730
G00139 Web Spam Recognition by Edge Label
Wongsarasin, W., Rungsawang, A., and Surarerks, A.

737
G00140 Enhanced Image Watermarking Using Adaptive Pixel Prediction and
Local Variance
Nudklin, P. and Amornraksa, T.

743
G00141 KML Generator for Visualizing Numerical Results from Weather
and Ocean Wave Simulation in Google Earth API
Chuai-Aree, S. and Kanbua, W.

750
G00143 Power Management for WLAN DAM Environmental Monitoring
System
Wongsamran, P., Araki, K., Keinprasit, R., Lewlomphaisarl, U., and
Kasetkasem, T.

751
G00151 Thai Numeric Handwritten Character Recognition by
Counterpropagation and Hopfield Neural Network
Waiyawut, W.

758
G00152 A Novel Hybrid Clustering Method for Customer Segmentation
Dorji, S. and Meesad, P.

759
G00156 Development of Mobile Robot Based on Differential Drive Integrated
with Accelerometer
Panich, S.

765
G00160 Credit Application Classification: A Case Study of National Pension
and Provident Fund of Bhutan
Wangdi, K., Prayote, A., and Phalavonk, U.

772
G00164 Development of Free Bulge Test Tooling for Flow Stress Curve
Determination of Tubular Materials
Boonpuek, P., Jirathearanat, S., Depaiwa, N., and Ohtake, N.

779

Author Index

789

ANSCSE 14 COMMITTEE
795



KY-1
Transforming Computational
Science with CUDA

Mark Harris

NVIDIA Corporation


Modern GPUs provide a level of massively parallel computation that was once the preserve of
specialized supercomputers. NVIDIA's latest GPUs are fully programmable, massively
multithreaded processors with hundreds of scalar processor cores capable of delivering
hundreds of billions of operations per second. The NVIDIA CUDA architecture provides a
parallel programming model that enables developers to program GPUs in C, C++, and Fortran,
as well as specialized GPU Computing languages such as OpenCL and Microsoft
DirectCompute. Researchers across many scientific and engineering disciplines are using this
platform to accelerate important computations by up to 2 orders of magnitude.

In this talk, we will provide an overview of NVIDIA GPU architectures and explore the
transition GPU Computing represents in massively parallel computing: from the domain of
supercomputers to that of commodity manycore hardware available to all. We will also
introduce CUDA, a scalable parallel programming model and software environment for
parallel programming. By providing a small set of readily understood extensions to the C/C++
languages, CUDA C allows programmers to focus on writing efficient parallel algorithms
without the burden of learning a multitude of new programming constructs.

Finally, we will examine the use of GPU computing in a variety of Computational Science
applications, and discuss how the rapid evolution of GPU hardware has enabled a transition
from brute-force parallel number crunching to the efficient use of sophisticated parallel data
structures that make possible much more complex simulations.


KY-2
Harmonic Combination of Computer
Simulation and Experimental Technique
for the Study of Structure-Property
Relationship of Crystalline Polymers

Kohji Tashiro

Department of Future Industry-Oriented Basic Science and Materials,
Toyota Technological Institute, Tempaku, Nagoya 468-8511, Japan
e-mail: ktashiro@toyota-ti.ac.jp; Fax: +81-(0)52-809-1793; Tel. +81-(0)52-809-1790




ABSTRACT
To develop polymer materials with excellent physicochemical properties, it is
necessary to clarify the relationship between the structure and properties of these polymers.
In particular, information on the crystal structure is the most important basic
knowledge for understanding the characteristic features of the polymer aggregation state. Since
the crystalline region itself is more or less disordered and small in size, the direct
determination of the crystal structure is quite difficult because of poor X-ray diffraction
data. It is rather useful to combine various techniques together, including X-ray
diffraction, vibrational spectroscopy, etc. Especially the computer simulation technique
is highly useful for aiding the structure determination process based on the X-ray
diffraction data. Over the past few decades we have taken up the challenge of developing
a new approach that combines the experimental and simulation methods in a harmonic way. Some case
studies will be shown here to emphasize the usefulness of computer-aided method to
clarify the structure-property relationship of crystalline polymers.

(1) Disordered Chain Packing Mode of PBO Fiber. Poly(p-phenylene
benzobisoxazole) (PBO) is one of the strongest synthetic fibers. The crystal structure
was analyzed successfully on the basis of an organized combination of poor X-ray
diffraction data and the computer simulation method. The chains were found to be packed
with a so-called registered disorder in the relative height [1].

(2) Computer-aided Prediction of Crystal Phase Transitions in Nylon 10/10. This
polymer shows phase transitions at two stages. One is the so-called Brill
transition, in which the methylene segments are disordered while keeping the
intermolecular hydrogen bonds. Another phase transition was newly discovered in the
temperature region immediately below the melting point, where the intermolecular
hydrogen bonds were broken and thermally-activated molecular chain motions
occurred violently. The molecular dynamics calculation predicted this
experimentally-found structural transition reasonably well [2].

(3) Accurate Theoretical Evaluation of the Mechanical Properties of Polymer Crystals.
The anisotropy in the mechanical properties of polymer crystals is in general governed by
intermolecular nonbonded H...H interactions. The precise determination of hydrogen
atomic positions is important for the accurate prediction of mechanical properties.
Wide-angle neutron diffraction combined with X-ray diffraction has made it possible
to obtain accurate hydrogen atomic positions, from which a quantitative
theoretical evaluation of the anisotropic elastic constants has been made [3].


(4) Computer Prediction of Packing Modes of Low-molecular-weight Model
Compounds. For analyzing the crystal structure of polymer substances, the
structural information of low-molecular-weight model compounds is quite
useful. The Polymorph Predictor (Accelrys, USA) has made it possible to
predict the accurate crystal structures of these compounds. Successful results
were obtained for a series of model compounds of poly(m-phenylene
isophthalamide) and their new derivatives [4].


Keywords: Computer-aided Structure Analysis of Polymer Crystals,
Crystal Structure Prediction by Computer Simulation Technique



REFERENCES
1. K. Tashiro et al., J. Polym. Sci. Part B: Polym. Phys., 39, 1296 (2001).
2. K. Tashiro et al., Chinese Journal of Polymer Science, 25, 73 (2007).
3. K. Tashiro et al., Polym. J., 39, 1253 (2007).
4. P. Nimmanpipug et al., J. Phys. Chem. B, 107, 8343 (2003); 110, 20858 (2006).
INV-1
QM/MM Computational Studies of
Metalloenzymes: Characterization of
Elusive Intermediates and Elucidation of
Reaction Mechanisms

Hajime Hirao

Fukui Institute for Fundamental Chemistry, Kyoto University, JAPAN
E-mail: hirao@fukui.kyoto-u.ac.jp, Tel: +81-75-711-7647



ABSTRACT
Metalloenzymes permit chemically very difficult reactions to proceed under mild
conditions, and thereby play indispensable roles in a variety of biological activities. We
can learn a lot from their ingenious machineries that are finely tuned to individual
catalytic reactions, and the knowledge, in turn, would open up new avenues for
practical applications such as rational design of potent enzyme inhibitors and
development of powerful biomimetic catalysts. To understand in detail the molecular
mechanisms of metalloenzymes, we have been doing computational studies using
QM/MM approaches, which are able to characterize even unstable or short-lived
species in reactions, with the effect of protein environment adequately taken into
account. In particular, we are focusing on characterization of elusive intermediates and
elucidation of reaction mechanisms, since these are key elements for a global
understanding of the catalytic functions of metalloenzymes. In this talk, some of our
recent work will be presented.



REFERENCES
1. Insights into the (Superoxo)Fe(III)-Fe(III) Intermediate and Reaction Mechanism
of myo-Inositol Oxygenase: DFT and ONIOM(DFT:MM) Study, H. Hirao and K.
Morokuma, J. Am. Chem. Soc. 2009, 131, 17206-17214.
2. Reactivity Patterns of High-Valent Iron-Oxo Species in Enzymes and
Synthetic Reagents: A Tale of Many States, S. Shaik, H. Hirao, and D. Kumar,
Acc. Chem. Res. 2007, 40, 532-542.
3. Reactivity Patterns of Cytochrome P450 Enzymes: Multifunctionality of the
Active Species and the Two States-Two Oxidants Conundrum, S. Shaik, H. Hirao,
and D. Kumar, Nat. Prod. Rep. 2007, 24, 533-552.

INV-2
A Molecular Dynamics Study of Carbazole
Derivatives as Universal Base

Y. Tantirungrotechai 1,C, T. Benchawan 2, and U. Wichai 2

1 National Nanotechnology Center (NANOTEC), National Science and Technology Development Agency, Pathumthani, 12120, Thailand
2 Department of Chemistry, Faculty of Science, Naresuan University, Phitsanulok, 65000, Thailand
C E-mail: yuthana@nanotec.or.th; Fax: +66-2-564-6985; Tel. +66-2-564-7100 ext 6592



ABSTRACT
Molecular dynamics simulations at constant temperature and pressure (NPT) were
carried out for a DNA duplex with 15 base pairs in explicit water using the AMBER
package. A base in the middle position of the strand was substituted by carbazole
derivatives designed to function as a universal base. Equilibrium
trajectories of 1 nanosecond show that there is a structural change in the duplex. A
large fluctuation occurs in the case of carbazole; this might be due to its nonpolar
nature. In addition to a common hydrogen bonding configuration, a slip configuration
was also observed for this universal base. The effect of nonpolar carbazole and its polar
carbazole derivatives on the duplex structures will be discussed in terms of electrostatic
and stacking interactions with neighboring base molecules.

Keywords: Carbazole, MD, Universal base.




INV-3

Molecular Dynamics Study of a Mosquito
Larvicidal Protein Cry4Aa Toxin from
Bacillus thuringiensis in Its Trimeric Form


T. Taveecharoenkool 1, C. Angsuthanasombat 2, and C. Kanchanawarin 3,C

1 Department of Immunology, Faculty of Medicine, Siriraj Hospital, Mahidol University, Bangkok, 10700, Thailand
2 Laboratory of Molecular Biophysics and Structural Biochemistry, Institute of Molecular Biosciences, Mahidol University, Salaya Campus, Nakornpathom, 73170, Thailand
3 Biophysics Laboratory, Department of Physics, Faculty of Science, Kasetsart University, Bangkok, 10900, Thailand
C E-mail: fscicpk@ku.ac.th; Fax: 02-942-8029; Tel. 085-819-4455



ABSTRACT
Cry4Aa toxin is one of the mosquito-larvicidal proteins produced by Bacillus
thuringiensis. It is thought to form trimeric pores in the larval gut membrane, causing
membrane leakage and subsequent insect death. In this study, a full-atomic pre-pore
structure of the Cry4Aa trimer was constructed by using the trimeric Cry4Ba coordinate
of the unit cell crystal structure as a template. Molecular dynamics simulations and
MM-PBSA (Molecular Mechanics and Poisson-Boltzmann Surface Area) calculations
were employed to show that the trimeric structure of Cry4Aa is stable in solution. The
results also revealed that Cry4Aa toxin uses electrostatic and steric interactions between
polar and charged residues on α-helices 3, 4 and 6 to form the trimer. We propose that pore
formation of Cry toxins may involve a 90° hairpin rotation during the insertion of three
α4-α5 hairpins into the membrane.

Keywords: molecular dynamics simulations, Cry4Aa toxin, trimeric structure, pore
forming proteins, mosquito-larvicidal proteins



REFERENCES
1. Boonserm P, Mo M, Angsuthanasombat C, Lescar J, J. Bacteriol. 2006, 188, 3391-3401.
2. Ounjai P, Unger VM, Sigworth FJ, Angsuthanasombat C, Biochem. Biophys. Res.
Commun. 2007, 361, 890-895.
3. Ounjai P, Molecular biophysical study of the Bacillus thuringiensis Cry4Ba toxin pore
structure. (PhD Thesis), Mahidol University, Institute of Molecular Biology and Genetics;
2007.


INV-4
Molecular Modeling of Low Energy
Ion Bombardment/Plasma Treatment
in Polymer Design

P. Nimmanpipug 1,2,C, V. S. Lee 1,2, J. Yana 1, W. Sangprasert 1, and C. Ngaojampa 1

1 Computational Simulation and Modeling Laboratory (CSML), Department of Chemistry and Center for Innovation in Chemistry, Faculty of Science, Chiang Mai University, Chiang Mai, 50200, Thailand
2 ThEP Center, CHE, 328 Si Ayutthaya Road, Bangkok 10400, Thailand
C E-mail: piyaratn@gmail.com; Fax: 6653-892277; Tel. 6686-0296430




ABSTRACT
Molecular simulations and model compounds were applied to ion bombardment
and plasma modifications from a material design perspective. A combination of molecular
dynamics, Monte Carlo simulation, and density functional theory was utilized to
follow the phenomena and propose the consequences of the treatment. Various types of
materials were investigated in this study: synthetic polymers, biopolymers, and biological
hereditary material. Molecular dynamics simulations of ion bombardment on
perfluorinated ionomer membranes, a commercial fuel cell membrane, were used
to investigate structural changes related to improvement in fuel cell
efficiency. The flame retardant and water resistance properties of silk after low-
temperature plasma treatment were clarified from a theoretical perspective using
density functional calculations. Ion beam induced mutation experiments using low-
energy ions to bombard naked DNA were investigated via Monte Carlo and molecular
dynamics simulations in vacuum.


Keywords: Material design, Ion bombardment, Plasma treatment, Molecular dynamics
simulation, Monte Carlo simulation, Density functional theory



REFERENCES
1. Graves, D.B., Humbird, D. Applied Surface Science 2002, 192(1-4), 72-87.
2. Garrison, B.J., Delcorte, A., Krantzman, K.D. Accounts of Chemical Research 2000,
33(2), 69-77.




INV-5
Molecular Dynamics Simulations of
Antibody ScFv Fragments without
Disulfide Bond


V. S. Lee 1,C, P. Nimmanpipug, K. Kodchakorn 1, and C. Tayapiwatana 2,3,C

1 Computational Simulation and Modeling Laboratory (CSML), Department of Chemistry and Center for Innovation in Chemistry, Faculty of Science,
2 Division of Clinical Immunology, Department of Medical Technology, Faculty of Associated Medical Sciences,
3 Biomedical Technology Research Unit, National Center for Genetic Engineering and Biotechnology, National Science and Technology Development Agency at the Faculty of Associated Medical Sciences, Chiang Mai University, Chiang Mai, 50200, Thailand
C E-mail: vannajan@gmail.com; Fax: 6653-892277; Tel. 6689-1100216;
asimi002@hotmail.com; Fax: 6653-946043; Tel. 6681-8845141



ABSTRACT
Intracellularly expressed antibody fragments (intrabodies) have been used as powerful
tools for clinical applications and for the functional analysis of proteins inside the cell.
Among several types of intrabodies, single chain fragment variables (scFv) composed
of only the variable regions (VH or VL) of antibodies are the smallest and thus the
easiest to design. However, normal antibody fragments do not form disulfide bonds in
the cytoplasm and usually are unable to achieve a stable native fold in the absence of
the disulfide bonds. Recently, the crystal structures of anti-RAS VH and VL fragments
without disulfide bonds (after substitution of the cysteine residues) and of the wild-type
VH and VL with intact disulfide bonds showed no structural differences between the two
types of VH and VL. There is great interest in engineering antibody fragments that will
fold and remain stable under reducing conditions, and that could serve as a framework to
which other specificities could be grafted. We have undertaken molecular dynamics
simulations to investigate the two types of such systems, with and without disulfide
bonds. Structural analyses in terms of distance geometry, hydrogen bonding,
residue interactions, and binding affinity between the domains were carried out
in order to explain the stability of the antibody scFv fragments without disulfide bonds.

Keywords: Disulfide bond, Antibody engineering, ScFv fragment, Molecular dynamics
simulation, Molecular modeling.



REFERENCES
1. Tanaka, T. and Rabbitts, T. H., J. Mol. Biol., 2008, 376, 749-757.
2. Biocca, S. and Cattaneo, A., Trends Cell Biol, 1995, 5, 248-252.
3. Biocca, S., Ruberti, F., Tafani, M., Pierandrei-Amaldi, P., and Cattaneo, A.,
Biotechnology, 1995, 13, 1110-1115.
INV-6
Effects of Residue Changes on Human
Receptor Binding Affinity of H1N1
Hemagglutinins: Insights from Molecular
Dynamics Simulation

N. Nunthaboot 1,C, T. Rungrotmongkol 2,3, M. Malaisree 2, N. Kaiyawet 2, P. Decha 4, P. Sompornpisut 2 and S. Hannongbua 2,C

1 Department of Chemistry, Faculty of Science, Mahasarakham University, Mahasarakham, Thailand
2 Department of Chemistry, Faculty of Science, Chulalongkorn University, Bangkok, Thailand
3 Center of Innovative Nanotechnology, Chulalongkorn University, Bangkok, Thailand
4 Department of Chemistry, Faculty of Science, Thaksin University, Phattalung, Thailand
C E-mail: nadtanet@gmail.com, supot.h@chula.ac.th; Fax: 02-2187603; Tel. 02-2187603



ABSTRACT
The recent outbreak of the novel 2009 H1N1 influenza in humans has focused global
attention on this virus, which could potentially introduce a more dangerous
influenza pandemic. In the initial step of viral attachment, hemagglutinin
(HA), a viral surface glycoprotein, is responsible for binding to the human α2,6-
linked sialopentasaccharide host cell receptor (hHAR). Dynamic and structural
properties, based on molecular dynamics simulations of the three different HAs of
Spanish 1918 (H1-1918), swine 1930 (H1-1930) and the novel 2009 (H1-2009) H1N1
bound to the hHAR, were compared. In all three HA-hHAR complexes, major
interactions with the receptor were gained from HA residue Y95 and the
conserved HA residues of the 130-loop, 190-helix and 220-loop. However, substitution
of the charged HA residues K145 and E227 into the 2009 HA binding pocket was
found to increase the HA-hHAR binding efficiency in comparison to the two
previously recognized H1N1 strains. Changing the non-charged HA G225 residue to
a negatively charged D225 provides a larger number of hydrogen bonding interactions.
The increase in hydrophilicity of the receptor binding region is apparently an
evolutionary trend of the current pandemic flu from the 1918 Spanish and 1930 swine
flus. Detailed analysis could help the understanding of how different HAs effectively
attach and bind to the hHAR.


Keywords: H1N1, Hemagglutinin, sialopentasaccharide, Molecular Dynamics


REFERENCES
1. Soundararajan, V., Tharakaraman, K., Raman, R., Raguram, S., Shriver, Z., Sasisekharan,
V., and Sasisekharan, R. Nat. Biotechnol., 2009, 27, 510-513.
2. Stevens, J., Blixt, O., Tumpey, T. M., Taubenberger, J. K., Paulson, J. C., and Wilson, I.
A. Science, 2006, 312, 404-410.
3. Gamblin, S. J., Haire, L. F., Russell, R. J., Stevens, D. J., Xiao, B., Ha, Y., Vasisht, N.,
Steinhauer, D. A., Daniels, R. S., Elliot, A., Wiley, D. C., and Skehel, J. J. Science, 2004,
303, 1838-1842



INV-7
Computer-based Methods in
Drug Design: How Useful Are They?

Wolfgang Sippl


Department of Pharmaceutical Chemistry, Martin-Luther-University of
Halle-Wittenberg, 06120 Halle/Saale, Germany
Email: sippl@pharmazie.uni-halle.de




ABSTRACT
Drug discovery is a more complex problem than it was in the past, in part due to the
fact that the etiologies of the diseases we seek to control have grown in complexity. In
the past most drugs have been discovered either by identifying the active ingredient
from natural sources or by serendipitous discovery. A new approach has been to
understand how a disease is controlled at the molecular and physiological level and to
target specific entities based on this knowledge.

The amount of data generated in a typical drug discovery study can easily overwhelm
the scientists responsible for guiding the study, which emphasizes the importance of
effective techniques for the visualization and analysis of these large data sets.
Nowadays, computer-based approaches assist the design and synthesis of novel
compounds as well as the early prediction of physico-chemical properties of drug
candidates. Among them, quantitative structure-activity relationships, pharmacophore
modelling, docking and virtual screening are routinely applied to identify and optimize
lead structures. An overview of currently applied methods as well as application
examples from our own research projects will be given [1-4].


REFERENCES
1. W. Sippl et al. Virtual Screening and Biological Characterization of Histone
Methyltransferase Inhibitors PRMT1. ChemMedChem, 4, 69-77, 2009.
2. W. Sippl et al. Thiobarbiturates as Sirtuin Inhibitors: Structure-based Virtual Screening,
Free Energy Calculations and Biological Testing. ChemMedChem, 3, 1965-78, 2008.
3. W. Sippl et al. Generation of a homology model of the human histamine H3 receptor for
ligand docking and pharmacophore-based screening. J. Comput. Aided Mol. Design, 21,
437-451, 2007.
4. M. Jung et al. Target-based approach to inhibitors of histone arginine methyltransferases.
J. Med. Chem., 50, 2319-2325, 2007.


INV-8
Concerns, Recent Outbreak and Molecular
Insight into H5N1 and Pandemic H1N1-2009
Influenza A Viruses

T. Rungrotmongkol 1,2, M. Malaisree 1, P. Intharathep 1, P. Decha 1, N. Nunthaboot 3, C. Laohpongspaisan 1, O. Aruksakunwong 1, T. Udommaneethanakit 1, S. Sompornpisut 1 and S. Hannongbua 1,C

1 Computational Chemistry Unit Cell, Chulalongkorn University, Bangkok, 10330, Thailand
2 Center of Innovative Nanotechnology, Chulalongkorn University, Bangkok, 10330, Thailand
3 Department of Chemistry, Faculty of Science, Mahasarakham University, Mahasarakham, Thailand
C E-mail: supot.h@chula.ac.th; Tel. 02-2187603



ABSTRACT
This study aims at gaining insight into molecular details at Neuraminidase (NA),
Hemagglutinin (HA) and M2 protein channel of viral influenza A H5N1 and H1N1-
2009. In NA, interest is focused on the drugs' inhibitory activity against the wild-type and
mutated N1 strains. The H1N1-2009 virus was predicted to be susceptible to
oseltamivir, with all important interactions being well conserved. Loss of drug-target
interaction energies, especially in terms of electrostatic contributions and hydrogen
bonds, was established for the probable E119V and R292K mutations. For both viruses, the known
H274Y mutation conferred high oseltamivir resistance with decreased
hydrophobicity, pocket size and vdW interactions at the bulky group. Instead, N294S
was found to demonstrate a medium drug-resistance level. In addition, combinatorial
chemistry was used to find potent NA inhibitors based on the oseltamivir and
chemistry was used to find potent NA inhibitors based on the oseltamivir and
pyrrolidine scaffolds. For the HA target, MD simulations of the high and low pathogenic
forms (HPH5 and LPH5) were carried out, aimed at understanding why HPH5 was
experimentally observed to be 5-fold better cleaved by furin. HPH5's cleavage
loop was found to fit well and bind strongly into the catalytic site of human furin,
serving as a conformation suitable for the acylation process. Then, the HPH5-furin complex
was used as the starting structure for mechanistic investigation by QM/MM method.
The energy profile shows a concerted reaction of the first step of acylation, known as
the proton transfer and nucleophilic attack with a formation of tetrahedral intermediate.
Investigation was also extended to the M2 proton channel with/without adamantane
bound in several protonation states of the selectivity filter residue His37, corresponding to
different channel conformations. Two mechanisms by which the drug inhibits M2 function are: (i)
the drug facilitating His37's imidazole to lie in the closed conformation and (ii) the drug
acting as a blocker at the extracellular site. Loss of drug-M2 interactions was found to be a primary
source of resistance in the single mutants of H5N1, and in H1N1-2009 containing the
S31N mutation.











REFERENCES
1. Rungrotmongkol, T.; Malaisree, M.; Nunthaboot, N.; Sompornpisut, P.; and Hannongbua,
S. Amino Acids, 2010, DOI 10.1007/s00726-009-0452-3.
2. Rungrotmongkol, T.; Intharathep, P.; Malaisree, M.; Nunthaboot, N.; Kaiyawet, N.;
Sompornpisut, P.; Payungporn, S.; Poovorawan, Y.; and Hannongbua, S. Biochem.
Biophys Res. Commun., 2009, 385(3), 390-394.
3. Rungrotmongkol, T.; Frecer, V.; De-Eknamkul, W.; Hannongbua, S., and Miertus, S.
Antivir. Res., 2009, 82(1), 51-8.
4. Rungrotmongkol, T., Decha, P., Sompornpisut, P., Malaisree, M., Intharathep, P.,
Nunthaboot, N., Udommaneethanakit, T., Aruksakunwong, O., and Hannongbua, S.,
Proteins, 2009, 76, 62-71.




Computational
Biology

A00007
Developing Predictive Models for Dengue Haemorrhagic
Fever Incidence Rate in Chiang Rai, Thailand

S. Wongkoon, M. Jaroensutasinee, and K. Jaroensutasinee
Center of Excellence for Ecoinformatics and Computational Science Graduate Program, School of
Science, Walailak University, 222 Thaiburi, Thasala, Nakhon Si Thammarat 80161, Thailand
E-mail: swongkoon@gmail.com, jmullica@gmail.com, krisanadej@gmail.com
Fax: 66 0 7567 2004; Tel. 66 0 7567 2005-6

ABSTRACT
This study attempted to develop an epidemic forecasting model using climatic data to
predict Dengue Haemorrhagic Fever (DHF) incidence rate in Chiang Rai, Thailand. We
obtained monthly DHF incidence and climatic data from 1991 to 2008 from the Bureau
of Epidemiology, Department of Disease Control, Ministry of Public Health and the
Northern Meteorological Center. We used a cross-correlation to assess the degrees of
correlation between climatic data and DHF incidence rate over a range of time lags
from 0 to 7 months. From 12 climatic factors, we selected both 3 and 4 factors to
generate regression models. Time-series regression and the seasonal auto-regressive
integrated moving average (SARIMA) models were used to examine associations of
DHF incidence rate with climatic factors after adjustment for seasonality. The results
showed that when we randomly selected 3 out of 12 climatic factors, there were 24
significant regression models out of 220 models. The most suitable model was
comprised of rainfall at lag of 0 month, evaporation at lag of 4 months and fog at lag of
6 months. When we randomly selected 4 out of 12 climatic factors, there were 8
significant regression models out of 495 models. The most suitable model was
comprised of rainfall at lag of 0 month, evaporation at lag of 4 months and mean
relative humidity at lag of 6 months and negatively associated with minimum relative
humidity at lag of 6 months. Both models showed that rainfall at lag of 0 month and
evaporation at lag of 4 months were main predictive factors for DHF incidence rate in
Chiang Rai.

Keywords: Dengue, Climatic, Time series, Forecasting.

1. INTRODUCTION
Dengue infection and its potentially fatal forms, dengue haemorrhagic fever (DHF) and
dengue shock syndrome (DSS), have increased dramatically in recent decades, particularly in
rapidly expanding urban and semi-urban areas in middle and low income countries where
water storage and waste disposal services are limited [1]. Dengue fever is regarded as one of
the world's most widespread vector-borne diseases [2]. In Thailand, DHF occurred first only
in Bangkok, but soon spread to other areas [3-4]. Since 1991, the number of reported DHF
cases has been increasing although the mortality rate has decreased [5]. In June 2007, DHF
incidence in Chiang Rai was the highest in Thailand, with 464 reported cases. In 2008, DHF
incidence was 1,069 cases, or 87.19 cases per 100,000 population, with a case fatality rate of
17.4 and a mortality rate of 0.49 [5]. This study examines the potential impact of climate
variability on DHF incidence and explores the possibility of developing an epidemic
forecasting system for DHF incidence using the multivariate SARIMA technique in Chiang
Rai, Thailand.

2. THEORY AND RELATED WORKS
Dengue is an important mosquito-borne disease, transmitted mainly by Aedes aegypti. This
mosquito is well adapted to the urban environment and successfully breeds in containers such
as discarded cans, bottles, plastic containers and tires [6-7]. Aedes mosquitoes thrive in
warmer environments but not in dry environments. Thus, the effect of global warming on
dengue depends on precipitation, temperature, humidity, wind, duration of daylight, storm
severity, frequency of flooding or droughts, and rising sea levels [8-11]. The highest abundance
of breeding Aedes mosquitoes in Buenos Aires, Argentina occurred during mean
temperatures above 20 °C and accumulated rainfall above 150 mm [12].
Time-series methodology has been used in econometrics. Recently, it has been
increasingly used in epidemiologic research [13-15]. In environmental health research, there
is often an obvious time lag between response and explanatory variables [16]. Some studies
approach this by examining models with simultaneous multiple lags of the explanatory
variables [17].

3. COMPUTATIONAL DETAILS
We obtained the data of DHF cases in Chiang Rai for the period of 1991-2008 from the
Bureau of Epidemiology, Department of Disease Control, Ministry of Public Health. Climatic
data obtained from the Northern Meteorological Center comprised mean/max/min
pressure, mean/max/min temperature, mean/max/min relative humidity, mean dew point,
evaporation, cloudiness, visibility, mean wind speed, monthly rainfall, number of rainy days,
daily max rainfall and number of days with fog/haze/thunderstorm.
The relationship between monthly climatic data and monthly DHF incidence rate was
examined by using Spearman's correlation. We used a cross-correlation to assess the degrees
of correlation between climatic data and DHF incidence rate over a range of time lags from 0
to 7 months. We fitted seasonal autoregressive integrated moving average (SARIMA) models
with the time series of DHF incidence rate. From 12 climatic factors, we selected 3 and 4
factors to generate regression models. Seven parameters were selected when fitting the
SARIMA model: the order of auto-regressive (p) and seasonal auto-regressive (P), the order
of integration (d) and seasonal integration (D), and the order of moving average (q) and
seasonal moving average (Q). The process is called SARIMA (p, d, q) (P,D,Q)
s
(s is the
length of seasonal period).
The original time series of monthly DHF incidence rate in Chiang Rai at time t (X
t
) was X
1
,
X
2
, , X
t-1
. A SARIMA process of period (s), with regular and seasonal AR orders p and P,
regular and seasonal MA orders q and Q, and regular and seasonal differences d and D is
defined by
(1B)
d
(1B
s
)
D
(B) (B
s
) X
t
= (B) (B
s
)Z
t

Where:
B: Backward shift operator, (1B
s
)X
t
= X
t
X
t-s

Z
t
: Random error at time t
(X
s
) = 1
1
X
s

2
X
2s

P
X
Ps

(X
s
) = 1 +
1
X
s
+
2
X
2s
+ +
Q
X
Qs

(B
s
)X
t
= (B
s
)Z
t


The smallest value of Akaike's information criterion (AIC) was set as the standard to
identify the best-fit model [18]. The SARIMA model constructed with the DHF incidence
data from 1991-2007 was used to predict the DHF incidence in Chiang Rai during January-
December 2008. All statistical analyses were conducted using Mathematica software with
the Time Series package.
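
For concreteness, the following is a minimal sketch of the lagged-correlation screen and SARIMA fit described in this section, written in Python with pandas/scipy/statsmodels rather than the Mathematica Time Series package the study actually used. The file name and column names ("chiang_rai_monthly.csv", "incidence", "rainfall", "evaporation", "fog") are hypothetical, and the (p, d, q)(P, D, Q)_12 orders shown would in practice be chosen by the AIC search described above.

```python
# A minimal sketch (not the authors' code; the paper used Mathematica's Time
# Series package) of the lagged-correlation screen and SARIMA fit described
# above. File and column names are hypothetical.
import pandas as pd
from scipy.stats import spearmanr
from statsmodels.tsa.statespace.sarimax import SARIMAX

df = pd.read_csv("chiang_rai_monthly.csv", parse_dates=["month"], index_col="month")

# Spearman correlation between DHF incidence and each climatic factor at lags 0-7 months
for col in ["rainfall", "evaporation", "fog"]:
    for lag in range(8):
        pairs = pd.concat([df["incidence"], df[col].shift(lag)], axis=1).dropna()
        rho, p = spearmanr(pairs.iloc[:, 0], pairs.iloc[:, 1])
        print(f"{col} lag {lag}: rho = {rho:.3f} (p = {p:.3f})")

# SARIMA(p,d,q)(P,D,Q)_12 fitted on 1991-2007 with Model 1's lagged climatic
# factors as exogenous regressors, then used to forecast the 12 months of 2008
exog = pd.concat([df["rainfall"],                 # lag 0
                  df["evaporation"].shift(4),     # lag 4
                  df["fog"].shift(6)], axis=1)    # lag 6
train = slice("1991-07", "2007-12")               # drop the first rows lost to lagging
model = SARIMAX(df.loc[train, "incidence"], exog=exog.loc[train],
                order=(1, 0, 1), seasonal_order=(1, 1, 1, 12))
res = model.fit(disp=False)
print("AIC:", res.aic)                            # compared across candidate orders
forecast = res.get_forecast(steps=12, exog=exog.loc["2008-01":"2008-12"])
print(forecast.predicted_mean)
```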

4. RESULTS
DHF incidence in Chiang Rai, 1991-2008
The annual DHF incidence in Chiang Rai ranged from 3.72 cases per 100,000 population
in 2000 to 265.63 cases per 100,000 population in 1997 (Figure 1). There was a remarkable
reduction in the DHF incidence after 1998 (Figure 1). There was monthly variation in the
DHF incidence in Chiang Rai, with August as the peak time, especially during outbreaks
(Figure 1).

Figure 1. The actual DHF incidence (solid line) and predicted DHF incidence
(dashed line) from 1991-2008 by SARIMA model in Chiang Rai province,
Thailand.

Correlation between Climatic variables and DHF incidence rate
Monthly mean/max pressure, mean/max/min temperature, min relative humidity, dew
point, evaporation, cloudiness, visibility, mean/max wind speed, monthly rainfall, the number
of rainy days and daily max rainfall were positively associated with the monthly DHF incidence
rate notified in Chiang Rai over the study period, with a zero- to seven-month lagged effect (Table
1). Monthly min pressure, mean/max relative humidity, fog, haze and thunderstorm were
negatively associated with the monthly DHF incidence rate, with a zero- to seven-month lagged
effect (Table 1).

Table 1. Correlation and cross-correlation function (CCF) between climatic
variables and monthly DHF incidence rate in Chiang Rai, 1991-2008 at the
lags with maximum coefficients. *P<0.05

Climatic variable              Spearman correlation   Lag (months)   CCF lag (months)
Mean temperature (°C)          0.6732*                1              3-4
Max temperature (°C)           0.6769*                3              4-6
Min temperature (°C)           0.7792*                1              1
Mean relative humidity (%)     -0.7357*               5              4-6
Max relative humidity (%)      -0.6774*               4              -
Min relative humidity (%)      0.7471*                0              4-7
Evaporation (mm)               0.6928*                2              2-6
Mean wind speed (knots)        0.4320*                2              7
Maximum wind speed (knots)     0.5696*                2              7
Monthly rainfall (mm)          0.6224*                0              0-1
Daily max rainfall (mm)        0.4900*                0              0-1, 7
Cloudiness (0-10)              0.7505*                0              6
Fog (days)                     -0.6547*               2              6
Dew point (°C)                 0.7947*                0              -
Visibility (km)                0.6102*                0              -
Rainy days (days)              0.6863*                0              -
Mean pressure (hPa)            0.7965*                7              -
Max pressure (hPa)             0.8067*                7              -
Min pressure (hPa)             -0.7045*               2              -
Haze (days)                    -0.6147*               0              -
Thunderstorm (days)            -0.7504*               7              -
Cross-correlations after adjusting for seasonality showed that the DHF incidence rate was
associated with mean temperature at lag of 3-4 months, max temperature and mean relative
humidity at lag of 4-6 months, min temperature at lag of one month, min relative humidity at
lag of 4-7 months, evaporation at lag of 2-6 months, mean/max wind speed at lag of 7 months,
monthly rainfall at lag of 0-1 months, daily max rainfall at lag of 0-1, 7 months, cloudiness
and fog at lag of six months (Table 1). DHF incidence rate was not associated with
mean/max/min pressure, max relative humidity, dew point, visibility, rainy days, haze and
thunderstorm (Table 1).
Regression model and climatic factors
Three climatic factors
When we randomly selected three out of 12 climatic factors, there were 24 significant
regression models out of 220 models. The most suitable model (Model 1) was comprised of
evaporation at lag of four months, monthly rainfall at lag of zero months and fog at lag of six
months (Table 2).
Four climatic factors
When we randomly selected four out of 12 climatic factors, there were eight significant
regression models out of 495 models. The most suitable model (Model 2) was comprised of
mean relative humidity at lag of six months, evaporation at lag of four months and monthly
rainfall at lag of zero months and negatively associated with minimum relative humidity at lag
of six months (Table 2).
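
The model counts quoted above are simply binomial coefficients: C(12,3) = 220 and C(12,4) = 495 candidate factor subsets. A hedged sketch of such an exhaustive subset search, again in Python with hypothetical variable names (the paper's own search was done in Mathematica), might look like:

```python
# Sketch of the exhaustive subset search implied above. Columns of `lagged`
# are assumed to be the 12 climatic factors, each already shifted to its
# chosen lag; `y` is the monthly DHF incidence rate series.
from itertools import combinations
from math import comb
import pandas as pd
import statsmodels.api as sm

print(comb(12, 3), comb(12, 4))   # -> 220 495

def best_subset_model(lagged: pd.DataFrame, y: pd.Series, k: int):
    """Fit an OLS regression for every k-factor subset; return the minimum-AIC fit."""
    fits = []
    for subset in combinations(lagged.columns, k):
        data = pd.concat([y, lagged[list(subset)]], axis=1).dropna()
        X = sm.add_constant(data.iloc[:, 1:])
        fits.append((sm.OLS(data.iloc[:, 0], X).fit(), subset))
    return min(fits, key=lambda item: item[0].aic)
```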
Forecasting DHF incidence in Chiang Rai
The residuals in the SARIMA models fluctuated randomly around zero with no apparent
trend in variation according to the graphic analysis of residuals. The residuals were mutually
independent and had no correlation with each other implying a good SARIMA fitted model.
The actual DHF incidence rate and the DHF incidence predicted by the SARIMA models matched
very well (Figure 1). When cross-compared with the actual historical records, the
identified peaks fell exactly in the years and months when outbreaks were reported.
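
A minimal sketch of such residual checks, assuming `res` is the fitted SARIMAX results object from the earlier sketch: the Ljung-Box test is one standard way to verify the "no residual correlation" property reported here, whereas the paper itself relied on graphical analysis of the residuals.

```python
# Residual diagnostics for the fitted SARIMA model (hedged sketch; `res` is
# assumed to come from the earlier SARIMAX example).
from statsmodels.stats.diagnostic import acorr_ljungbox

resid = res.resid.dropna()
print("residual mean:", resid.mean())      # should fluctuate around zero
print(acorr_ljungbox(resid, lags=[12]))    # large p-value => no autocorrelation left
```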

Table 2. Regression coefficients of climatic data on DHF incidence rate in
Chiang Rai, Thailand. *P<0.05, **P<0.01, ***P<0.001

Variables                                    β         S.E.     t-Statistic   AIC
Model 1                                                                       1603.93
Constant                                     -15.649   4.086    -3.830***
Evaporation at lag of 4 months               0.131     0.0382   3.426***
Monthly rainfall at lag of 0 months          0.0290    0.007    4.102***
Fog at lag of 6 months                       0.759     0.194    3.920**
Model 2                                                                       1601.63
Constant                                     -62.806   22.316   -2.814***
Mean relative humidity at lag of 6 months    1.346     0.377    3.566***
Min relative humidity at lag of 6 months     -0.942    0.217    -4.346***
Evaporation at lag of 4 months               0.107     0.048    2.247*
Monthly rainfall at lag of 0 months          0.018     0.008    2.276*
AIC: Akaike's information criterion; β: coefficients; S.E.: standard error.

5. DISCUSSION
Our study in Chiang Rai found positive correlations between monthly DHF incidences
and mean/max/min temperatures with one- and three-month lag effects. Among the correlations
between monthly mean/max/min temperature and monthly DHF incidence, the correlation
coefficient for the minimum temperature was the highest. This indicates that minimum temperature plays a more important role
in the transmission of the disease than maximum temperature does. Previous studies [19-21]
show that a rise in temperature, especially minimum temperature, would enhance the survival
chances of dengue virus and Aedes larvae and adults during winter and therefore accelerate
the transmission dynamics of DHF and spread it into populations that are currently dengue
free and immunologically naïve. In addition, changes in minimum temperature in high
altitude regions influence monthly malaria incidence [22]. However, in low altitude zones,
rainfall and mean temperature are the most significant climatic factors influencing malaria
incidence [22].
This study found positive correlations between monthly DHF incidence and rainy days,
daily maximum rainfall and monthly rainfall with zero-month lag effect and rainy days gave
the highest correlation coefficient. This indicates that rainy days play a more essential role in the
transmission of the disease than daily maximum rainfall and monthly rainfall. Rain plays an
important role in dengue epidemiology because water not only provides the medium for the
aquatic stages of the mosquito life cycle but also increases the relative humidity and the
longevity of the adult mosquito [23-25]. Rain may prove beneficial for mosquito breeding if
moderate but it may destroy breeding sites and flush out the mosquito larvae when excessive
[23-25].
Our study is the first to report the positive association of DHF incidence with dew point,
evaporation, cloudiness, visibility, and wind speed and the negative association of DHF
incidence with fog, haze and thunderstorm. This is not surprising because dew point,
evaporation, cloudiness, and visibility tend to be positively associated with rainfall and relative
humidity, which enhance the virus survival rate and mosquito breeding sites. Fog, haze and
thunderstorm tend to occur with heavy rainfall, which in turn may flush out mosquito larvae.
From the regression model, we found that the monthly rainfall at a lag of 0 month and
evaporation at a lag of 4 months showed strong influences on DHF incidence in
both Model 1 and Model 2. This finding corresponds with previous studies [23-25] showing that rainfall is
one of the most important elements for the breeding and development of mosquitoes.
Our study shows that the SARIMA model gives a good prediction of DHF incidence in 2008.
This indicates that we could apply this model to other provinces in Thailand as a predictive
DHF outbreak warning. SARIMA modeling is useful for interpreting and applying
surveillance data in disease control and prevention [26, 27]. The SARIMA model has been a
useful tool for studying the relationship between rainfall, mosquito density and the transmission
of Ross River virus in Australia [14]. Moreover, the SARIMA approach is suitable for
this study because it requires a large amount of data (i.e., a minimum of 50 observations)
to build a reasonable model [28], a requirement our monthly series satisfies. Meanwhile,
the previous studies [29, 30] used the ARIMA model to analyze the association between
weather variation and incidence of Ross River virus in Australia and Dengue Fever in Taiwan
[15]. However, vector density record, a conventional approach often applied as an outbreak
predictor, did not appear to be a good predictor for diseases occurrence [15].

6. CONCLUSION
This study found positive correlations between the monthly DHF incidence rate and
mean/max/min temperatures with one- and three-month lag effects, and rainy days, daily
maximum rainfall and monthly rainfall with zero- and two-month lag effects. The correlation
coefficient for the association between monthly DHF incidence rate and min temperature with
one-month lag effect was the greatest. Monthly rainfall at lag of 0 month and evaporation at
lag of 4 months were main predictive factors for DHF incidence rate in Chiang Rai. The
SARIMA model is a suitable model for forecasting the DHF incidence rate in Chiang Rai.

REFERENCES
1. Khun, S. and Manderson L. H., Acta. Trop., 2007, 101, 139-46.
2. Gubler, D. J., Comp. Immunol. Microbiol. Infect. Dis., 2004, 27(5), 319-30.
3. Halstead, S. B., Udomsakdi, S., Scanlon, J. E., and Rohitayodhin, S., Am. J. Trop. Med.
Hyg., 1969, 18, 1022-33.
4. Gould, D. J., Mount, G. A., Scanlon, J. E., Sullivan, M. F., and Winter, P. E., Am. J. Trop.
Med. Hyg., 1971, 20, 705-14.
5. Department of Disease Control, Ministry of Public Health, 1991-2008.
6. Wongkoon, S., Jaroensutasinee, M., and Jaroensutasinee, K., Dengue Bull., 2005, 29, 169-
75.
7. Wongkoon, S., Jaroensutasinee, M., and Jaroensutasinee, K., Dengue Bull., 2007, 31, 141-
52.
8. Alto, B. W., and Juliano, S. A., J. Med. Entomol., 2001, 38, 646-56.
9. Gubler, D. J., Reiter, P., Ebi, K. L., Yap, W., Nasci, R., and Patz, J. A., Environ. Health.
Perspect., 2001, 109(Suppl 2), 223-33.
10. Gage, K. L., Burkot, T. R., Eisen, R. J., and Hayes, E. B., Am. J. Prev. Med., 2008, 35(5),
436-50.
11. Reiter, P., Environ. Health. Perpect., 2001, 109(Suppl 1), 141-61.
12. Vezzani, D., Velazquez, S. M., and Schweigmann, N., Mem. Inst. Oswaldo. Cruz., 2004,
99, 351-56.
13. Helfenstein, U., Stat. Med., 1986, 5, 37-47.
14. Hu, W., Tong, S., Mengersen, K., and Oldenburg, B., Ecol. Model., 2006, 196, 505-14.
15. Wu, P., Guo, H., Lung, S., Lin, C., and Su, H., Acta. Trop., 2007, 103, 50-7.
16. Schwartz, J., Spix, C., Touloumi, G., Bacharova, L., Barumamdzadeh, T., Tertre, A.,
Pickarksi, T., Leon, A., Ponka, A., Rossi, G., Saez, M., and Schouten, J., J. Epidemiol.
Com. Health., 1996, 50(S), S3-S11.
17. Schwartz, J., Epidemiol., 2000, 11, 320-26.
18. Brockwell, P. J., and Davis, R. A., Springer-Verlag, 1998.
19. Sellers, R. F., J. Hyg., 1980, 85, 65-102.
20. Reiter, P., The Arboviruses: Epidemiol. & Ecol., 1988, 1, 245-55.
21. Leake, C. J., Oxford: Oxford Uni. Press, 1998, 401-13.
22. Loevinsohn, M. E., Lancet, 1994, 343(8899), 714-18.
23. McMichael, A. J., and Martens, P., Ecosyst. Health., 1995, 1(1), 23-33.
24. Hennessy, K. J., and Whetton, P., Aust. Med. Assoc. & Greenpeace Inter., 1997.
25. Kelly-Hope, L. A., Purdie, D. M., and Kay, B. H., J. Med. Entomol., 2004, 41(2), 133-50.
26. Linthicum, K., Anyamba, A., Tucker, C., Kelley, P., Myers, M. F., and Peters, C.,
Science, 1999, 285, 347-48.
27. Hu, W., Nicholls, N., Lindsay, M., Dale, P., McMichael, A., Mackenzie, J., and Tong, S.,
Aust. Am. J. Trop. Med. Hyg., 2004, 71, 129-37.
28. Wei, W., Addison-Wesley Publishing Company Inc., New York, 1990.
29. Tong, S., and Hu, W., Environ. Health. Perspect., 2001, 109(12), 1271-73.
30. Tong, S., Wenbiao, H., and McMichael, A. J., Trop. Med. Int. Health., 2004, 9(2), 298-304.

ACKNOWLEDGMENTS
This work was supported in part by the Thailand Research Fund through the Royal Golden
Jubilee Ph.D. Program (Grant No. PHD/0201/2548), Walailak University Fund 06/2552 and
Center of Excellence for Ecoinformatics, NECTEC/Walailak University. We also thank the
Bureau of Epidemiology, Department of Disease Control, the Ministry of Public Health for
DHF incidence data collection and Northern Meteorological Center for climatic data.
A0008
Cluster Analysis of Temperature-Relative Humidity Data
at Mt. NOM Cloud Forest

W. Pheera, K. Jaroensutasinee and M. Jaroensutasinee

Center of Excellence for Ecoinformatics and Computational Science Graduate Program, School of
Science, Walailak University 222, Thaiburi, Thasala, Nakhon Si Thammarat, 80161, Thailand
E-mail: wittayapheera@gmail.com, krisanadej@gmail.com, jmullica@gmail.com;
Fax: 086-4795011; Tel. 080-5371717



ABSTRACT
This study aimed at examining the association between temperature and relative
humidity data along the elevational transect from KN-3 Klong Gun station to Mt. Nom
Peak, Mt. Nan National Park, Nakhon Si Thammarat. HOBO Pro V2 data loggers were
installed along the elevational transect at five different elevations: 500, 700, 900, 1100
and 1300 m. a.s.l. These HOBO data loggers collected temperature and relative
humidity every five minutes during 16-29 January 2009. Temperatures and relative
humidity data among five elevations were calculated for their association using the
linear regression. R^2, y-intercepts and slopes of these linear regressions were further
analyzed with a cluster analysis. Both R^2 and y-intercepts from these linear regressions
can be separated by the cluster analyses into two groups: (1) the cloud forest boundary
(CFB) and (2) a combination of the tropical rain forest (TRF) and cloud forest (CF). This
result indicates that the cluster analysis of R^2 and y-intercepts for the regression
equation can be used to distinguish the association between the same forest type and
different forest types. On the other hand, slopes from these linear regressions can be
separated by the cluster analyses into three groups: (1) TRF-TRF, (2) TRF-CFB and
CF-CF, and (3) TRF-CF, CFB-CF and CFB-CFB. This indicates that the cluster
analysis of the slopes can clearly group TRF data.

Keywords: Cluster Analysis, Cloud Forest Boundary, Mt. Nan National Park



1. INTRODUCTION
Tropical montane cloud forests (TMCF) typically have high levels of endemism, low rates
of net primary production and play an essential role in the hydrologic cycles of tropical
mountains [1-5]. A deforested TMCF takes centuries to recover due to its slow growth
rate. These ecosystems are complex, relatively rare, and extremely vulnerable to climate
change and human impacts [5-7]. The main climatic characteristics of cloud forests include
frequent cloud presence, usually high relative humidity and low irradiance [5]. TMCFs
typically occur at elevations between 1,500 and 3,300 m.


2. THEORY AND RELATED WORKS
Discrete ecotones in climatic factors have fascinated ecologists. Changes in vegetation
composition along environmental gradients tend to be associated with abrupt changes in temperature
and moisture. Tropical montane forests have a discrete thermal zone with little temperature
overlap [8] and the trade wind inversion [9]. The trade wind inversion traps moist air and
clouds on windward slopes below a roughly constant elevation [10-11]. Little has been done
on tropical cloud forest ecosystem and how its ecotone changes along an elevational gradient.
This study aims at developing a new method to identify the cloud forest boundary using
temperature and relative humidity data.
3. COMPUTATIONAL DETAILS

Figure 1. (a) Mt. Nan National Park boundary and the location of KN-3 Klong Gun station
and (b) study sites along KN-3 Klong Gun station to Mt. Nom Peak.

Mt. Nom Peak is located at 8°22' to 8°45' N latitude and 99°37' to 99°51' E longitude,
Mt. Nan National Park, Nakhon Si Thammarat, Thailand. Mt. Nom Peak is 1300 m a.s.l. on
the southern side near KN-3 Klong Gun station (Figure 1a, b). Mt. Nom Peak is a pristine
tropical mountain forest, a main watershed source and a home of various endangered species.
HOBO Pro V2 data loggers were installed along the elevational transect from KN-3 Klong
Gun station up to Mt. Nom Peak at five different elevations: 500, 700, 900, 1100 and 1300 m
a.s.l. (Figure 1b). These HOBO data loggers collected temperature and relative humidity
every five min during 16-29 January 2009. Data were analyzed using Mathematica software
[12]. Temperature and relative humidity were plotted and analyzed for the association using
linear regression at various elevations. R^2, y-intercepts and slopes of these linear
regressions were further analyzed with a cluster analysis.
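
A minimal Python sketch of this pairwise-regression-plus-clustering workflow follows (the study used Mathematica [12]; the CSV file and elevation column names below are hypothetical, and Ward linkage is one reasonable choice where the paper does not specify the clustering criterion):

```python
# Hedged sketch of the pairwise linear regression and cluster analysis
# described above; file and column names are hypothetical.
from itertools import combinations
import pandas as pd
from scipy.stats import linregress
from scipy.cluster.hierarchy import linkage, fcluster

temp = pd.read_csv("mt_nom_temperature.csv")   # one column per logger elevation
elevations = ["500", "700", "900", "1100", "1300"]

# Linear regression for every elevation pair: keep R^2, y-intercept and slope
rows = []
for a, b in combinations(elevations, 2):
    fit = linregress(temp[a], temp[b])
    rows.append({"pair": f"{a}-{b}", "R2": fit.rvalue**2,
                 "intercept": fit.intercept, "slope": fit.slope})
feats = pd.DataFrame(rows).set_index("pair")

# Cluster each statistic separately, as in the paper (e.g. two groups for R^2)
Z = linkage(feats[["R2"]].values, method="ward")
feats["group"] = fcluster(Z, t=2, criterion="maxclust")
print(feats.sort_values("group"))
```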


4. RESULTS AND DISCUSSION
Temperature and relative humidity at five elevations were paired and plotted for linear
regression analysis. For both temperature and relative humidity data, the pairings between
elevations were categorized into three groups: (1) tropical rain forest (TRF) located at 500,
700 & 900 m a.s.l., (2) cloud forest boundary (CFB) located at 900 to 1100 m a.s.l. and (3)
cloud forest (CF) located at 1100 to 1300 m a.s.l. (Table 1). The cloud forest boundary group
had the lowest R^2 and y-intercept but the highest slope values among all three groups in both
temperature and relative humidity data. This suggests that the cloud forest boundary might
start at 1100 m a.s.l. Mt. Nom seems to exhibit two forest types: TRF at lower elevations
and CF at upper elevations starting at 1100 m a.s.l.

Table 1. R^2, y-intercept and slope values of temperature and relative humidity from linear
regression analysis.

                          Temperature                                    Relative humidity
Elevation pairing group   R^2          y-intercept        slope         R^2          y-intercept       slope
Tropical rain forest      0.86 - 0.94  0.90 - 3.52        0.71 - 0.88   0.64 - 0.82  16.37 - 33.04     0.66 - 0.82
Cloud forest boundary     0.67 - 0.76  -34.55 to -13.71   1.49 - 2.18   0.36 - 0.60  -22.08 to -2.41   1.04 - 1.23
Cloud forest              0.88         -5.85              1.30          0.82         4.37              0.97

TMCFs typically occur at elevations between 1,500 and 3,300 m, occupying an altitudinal
belt of approximately 800 to 1,000 m. The lowermost occurrence of low-statured cloud forest
(300-600 m a.s.l.) is reported from specific locations like small islands, where the cloud base
may be very low and the coastal slopes are exposed to both high rainfall and persistent wind-
driven clouds [3]. Our results showed that the Mt. Nom cloud forest occurred at a relatively low
elevation. This might be because Mt. Nom is located on the coastal slope exposed to high
rainfall and persistent wind-driven clouds from the Gulf of Thailand.

R^2, y-intercept and slope values of temperature and relative humidity data were further
analyzed with a cluster analysis. The R^2 and y-intercept cluster analysis results were separated into
two groups: (1) CFB data and (2) TRF data and CF data (Figures 2 and 3). The results from the
cluster analysis clearly show that CFB had unique temperature and relative humidity
characteristics that differed from TRF and CF.

Figure 2. Cluster analysis results of temperature and relative humidity R^2 values


Figure 3. Cluster analysis results of temperature and relative humidity y-intercept values

On the other hand, cluster analysis of the slope values grouped the data into three groups:
(1) TRF-TRF, (2) TRF-CFB and CF-CF, and (3) TRF-CF, CFB-CF and CFB-CFB (Figure 4).
The cluster analysis results of the slope values revealed less difference in temperature and
relative humidity within TRF (blue dots) when compared to other forest types (Figure 4).
Surprisingly, the pairing between 900-1100 m, which can be considered as CFB-CFB, was in
the third group (green dots) (Figure 4). This third group (green dots) had highly varied slope
values between sites a long distance apart. However, 900-1100 m was only 200 m apart and was still grouped in
this third group. This indicates that the slope value over 900-1100 m was large for such a short
distance.


Figure 4. Cluster analysis results of temperature and relative humidity slope values. Three
cluster groups: (1) TRF-TRF (blue dots), (2) TRF-CFB and CF-CF (red dots), and (3) TRF-
CF, CFB-CF and CFB-CFB (green dots)

Our study nicely shows that the association between temperature and relative humidity can
be used to identify where the ecotone occurs. In this case, the ecotone occurred when the
forest changed from TRF to CF within a short distance. We could use this method to
detect the trend of the CFB shifting upward as global warming occurs. If the CFB moves up
in elevation, the association between temperature and relative humidity would shift
upward as well.

Climatic variation is expected to influence the position of ecotones through its effects on
fitness and competitive interactions [e.g. 13-14]. In our study, the key climatic factors are air
temperature and relative humidity. Temperature and relative humidity create discontinuities in
mesoclimates throughout the tropics on mountains [1, 15-18].

Low temperatures and high relative humidity are also important influences on vegetation
patterns on tropical mountains [1, 5]. The elevation of this temperature threshold is suggested
by the fact that TMCFs are generally complex with abundant mosses, lichens and epiphytes
[19-25]. The growth of epiphytic bryophytes is strongly determined by moisture availability
[26]. Trees in TMCFs are stunted, have small leaves, and exhibit xeromorphic features [1, 5, 27-
28].


5. CONCLUSION
Tropical forests are among the most structurally complex ecosystems owing to their high species diversity [29-30]. This high species diversity displays striking patterns of change along gradients of ecological factors [30]. In contrast to temperate forests, epiphytes and lianas form additional strata of varying importance in tropical forests. Although the indicator value of lianas, herbs, epiphytes and shrubs is at least as high as that of canopy trees [31-32], most studies describing altitudinal changes in forest vegetation deal only with trees [33-37]. The present work extends such studies for the first time to climatic factors (i.e. temperature and relative humidity) to obtain a comprehensive view of the altitudinal changes and the different data characteristics of such highly diverse habitats. The cluster
analysis of the R² and y-intercept values of the regression equations can be used to distinguish associations within the same forest type from those between different forest types. The slopes from these linear regressions, on the other hand, clearly separate the TRF data from the rest of the data.



REFERENCES
1. Stadtmüller, T., Los Bosques Nublados en el Trópico Húmedo: Una Revisión Bibliográfica, Universidad de las Naciones Unidas, Tokyo, 1987.
2. Tanner, E. V. J., Kapos, V., Freskos, S., Healey, J. R., and Theobald, A. M., J. Trop.
Ecol., 1990, 6, 231-8.
3. Bruijnzeel, L. A., and Proctor, J., Tropical Montane Cloud Forest, Springer-Verlag, New
York, 1995, 38-78.
4. Grubb, P. J., Tropical Forests: Management and Ecology. Ecological Studies, 1995, 112,
308-30.
5. Foster, P., Earth-Science Rev., 2001, 55, 73-106.
6. Byer, M. D., and Weaver, P. L., Biotropica, 1977, 9 (1), 35-47.
7. Scatena, F. N., Tropical Montane Cloud Forests, Springer-Verlag, New York, 1995, 296-
308.
8. Janzen, D. H., Am. Nat., 1967, 101, 233-49.
9. Martin, P. H., Sherman, R. E., and Fahey, T. J., J. Biogeogr., 2007, 34, 1792-1806.
10. Riehl, H., Climate and weather in the tropics, Academic Press, New York, 1979.
11. Schubert, W. H., Ciesielski, P. E., Lu, C., and Johnson, R. H., Am. Meteor. Soc., 1995, 52, 2941-52.
12. Wolfram, S., The Mathematica Book, 5th Edition, Wolfram Media, Inc., 2003.
13. Stevens, G. C., and Fox, J. F., Annu. Rev. Ecol. App., 1991, 22, 177-91.
14. Grau, H. G., and Veblen, T. T., J. Biogeogr., 2000, 27, 1107-21.
15. Kitayama, K., and Mueller-Dombois, D., Phytocoenologia, 1994, 24, 111-33.
16. Fernandez-Palacios, J. M., and de Nicolas, J. P., J. Veg. Sci., 1995, 6, 183-90.
17. Hamilton, L. S., Juvik, J. O., and Scatena, F. N., Tropical montane cloud forests.
Ecological Studies, vol. 110. Springer-Verlag, New York. 1995.
18. Davis, S. D., Heywood, V. H., Herrera-MacBryde, O., Villa-Lobos, J., and Hamilton, A.
C., Centres of plant diversity: a guide and strategy for their conservation: the Americas,
vol. 3. The World Wide Fund for Nature and the World Conservation Union, Cambridge,
1997.
19. Grubb, P. J., Lloyd, J. R., Pennington, T. D., and Whitmore, T. C., J. Ecol., 1963, 51, 567-
601.
20. Grubb, P. J., Annu. Rev. Ecol. Syst., 1977, 8, 83-107.
21. Frahm, J. P., Nova Hedwigia, 1990, 51, 121-32.
22. Veneklaas, E. J., and Van Ek, R., Hydrol. Process., 1990, 4, 311-26.
23. Ingram, S. W., and Nadkarmi, N. M., Biotropica, 1993, 25, 370-83.
24. Young, K. R., and León, B., Mountain Research and Development, 2000, 20, 208-11.
25. Holder, C. D., Forest Ecol. Manag., 2004, 190, 373-84.
26. Proctor, M. C. F., Bryophyte ecology (ed. by A. J. E. Smith), Chapman & Hall, London, 1982, 333-81.
27. Leigh, E. G., Annu. Rev. Ecol. Syst., 1975, 6, 67-86.
28. Grubb, P. J., Mineral Nutrients in Tropical Forest and Savanna Ecosystems. Blackwell
Scientific Publications, Oxford, 1989, 417-39.
29. Vazquez, G. J. A., and Givnish, T. J., J. Ecol., 1988, 86, 999-1020.
30. Givnish, T. J., J. Ecol., 1999, 87, 193-210.
31. Hall, J. B., and Swaine, M. D., J. Ecol., 1976, 64, 913-51.
32. Gentry, A. H., and Dodson, C. H., Biotropica, 1987, 19, 149-56.
33. Hamilton, A. C., Vegetatio, 1975, 30, 99-106.
34. Clutton-Brock, T. H., and Gillett, J. B., Afr. J. Ecol., 1979, 17, 131-58.
35. Friis, I., and Lawesson, J. E., Opera Botanica, 1993, 121, 125-27.
36. Lovett, J. C., J. Trop. Ecol., 1996, 12, 629-50.
37. Lovett, J. C., J. Trop. Ecol., 1998, 14, 719-22.

ACKNOWLEDGMENTS
This work was supported in part by PTT Public Company Limited, TRF/Biotec special
program for Biodiversity Research Training grant BRT T_351004, The Institute of Research
and Development Fund WU50602, Walailak University Fund 07/2552, and Center of
Excellence for Ecoinformatics, NECTEC and Walailak University. We thank Mt. Nan
National Park staff for their helpful assistance in the field.
A0009
Developing Business Intelligent Tools for NBIDS Coral
Database System

P. Noonsang^C, M. Jaroensutasinee, and K. Jaroensutasinee
Center of Excellence for Ecoinformatics, and Computational Science Graduate Program, School of Science, Walailak University, 222 Thaiburi, Thasala, Nakhon Si Thammarat, 80161, Thailand
^C E-mail: npremrudee@gmail.com, jmullica@gmail.com, krisanadej@gmail.com; Fax: 075-673420; Tel. 089-1966993


ABSTRACT
Business Intelligence tools for the NBIDS Coral Database System were developed to integrate data on coral biodiversity and environmental factors. The NBIDS Coral Database System was established because there is a paucity of collated information on coral studies and school research projects in Thailand. The system was built on MS SQL Server and webMathematica, with Google Earth for data visualization. Field surveys were undertaken at three sites: (1) Ngai/Hai and Ma Islands, Trang, (2) Racha Island, Phuket, and (3) Tan Island, Suratthani. The percentage of seven coral growth forms per photograph was averaged online, and coral biodiversity indices were calculated in real time through web data entry. The seven coral growth forms comprised massive, branching, columnar, laminar, foliaceous, free-living and encrusting. The NBIDS Coral Database outputs were overlaid on Google Earth to display site-based data, including coral photographs, coral growth forms and sea surface temperature. The NBIDS Coral Database System provides a useful tool for the conservation and sustainable use of coral resources. The highly visual and interactive system should prove valuable for education and public awareness purposes.

Keywords: Business Intelligence, Intelligent Database System, Coral, GIS, Google Earth



1. INTRODUCTION
The advent of information technology (IT) and information systems, together with Internet technology, has inspired many changes in today's business world [1, 2]. Business Intelligence (BI) allows organizations to centralize data for easy and fast analysis. Geographical Information Systems (GIS) have been developed to integrate data on the distribution of biodiversity [3, 4], the environmental factors governing distributions, and human activities such as fishing and Marine Protected Area (MPA) planning. By means of BI software, key personnel can analyze all crucial information at their own desktops or via the Internet without affecting the original data sources, because BI is capable of fulfilling all needs in terms of analyzing, monitoring, sharing and reporting data. When Google Earth first appeared, many people marveled at the ability to zoom in on almost any part of the planet and see objects at little more than 1 m resolution, although the imagery is static and relatively out of date. What made Google Earth come alive was the ability to create so-called 'network links': displays of data, geo-referencing, maps, and shared geographic information [5], often captured in real time, which could be overlaid on the basic Google Earth mapping [6].
A computer-based BI system is designed to generate information in a user-friendly way. This offers decision-makers with limited knowledge of computers the ability to specify their own analyses. A good BI system offers tools to apply any imaginable delimitation to the data before displaying them in comprehensible forms, thereby helping to identify possible problem areas. With a BI system, it is possible to analyze many factors, both internal and external, which may affect the organization. The only requirement is that there is relevant data to
analyze. The foundation of a BI system is therefore the ability to periodically extract data from several sources (databases) into one analysis database, usually referred to as the Data Warehouse. The purpose of this study was to develop the Coral Database System for school projects, serving three user groups: administrators, researchers and general users.


2. THEORY AND RELATED WORKS
2.1 Database Design
There were three steps in the database design [7]: conceptual, logical and physical database design. First, conceptual database design comprised identifying entity types and their relationships, associating attributes with entity or relationship types, determining attribute domains and candidate, primary and alternate key attributes, checking the model for redundancy, and validating the conceptual model against user transactions. Second, we developed the logical database design for the relational model and validated the logical data model. Third, the physical database design was implemented in a relational database, MS SQL Server 2000.

2.2 Biodiversity Tool
The Shannon-Wiener and Simpson's diversity indices (also known as species diversity indices) are used to measure diversity. In ecology, they are often used to quantify the biodiversity of a habitat, taking into account the number of species present as well as the relative abundance of each species. The advantage of the Shannon-Wiener index is that it accounts for both the number of species and their evenness: the index increases either when additional unique species are present or when species evenness is greater. The Simpson index represents the probability that two randomly selected individuals in the habitat belong to the same species.
Shannon-Wiener Biodiversity Index:

BI = -Σ_{i=1}^{k} p_i ln p_i

k = number of taxa
p_i = the relative abundance of each species, calculated as the proportion of individuals of a given species to the total number of individuals in the community

Simpson's Biodiversity Index:

D = Σ_{i=1}^{k} p_i²

k = number of taxa
p_i = the relative abundance of each species, calculated as the proportion of individuals of a given species to the total number of individuals in the community
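For concreteness, a minimal Python sketch of both indices (the growth-form percentages below are invented for the example; in the system they are computed from the photographs):

    import math

    cover = {"massive": 35.0, "branching": 25.0, "columnar": 10.0, "laminar": 10.0,
             "foliaceous": 8.0, "free-living": 7.0, "encrusting": 5.0}

    total = sum(cover.values())
    p = [c / total for c in cover.values() if c > 0]   # relative abundances p_i

    shannon = -sum(pi * math.log(pi) for pi in p)      # BI = -sum p_i ln p_i
    simpson = sum(pi ** 2 for pi in p)                 # D = sum p_i^2
    print(f"Shannon-Wiener BI = {shannon:.3f}, Simpson D = {simpson:.3f}")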


3. EXPERIMENTAL AND COMPUTATIONAL DETAILS
3.1 Obtaining Coral Growth Form
We used a random sampling technique to estimate coral growth forms. We took coral photographs with Canon PowerShot G9, G10 and G11 digital cameras in underwater housings, swimming in straight lines and taking 100 coral photographs per site. Coral photographs were classified into seven coral growth forms: massive, encrusting, branching, foliaceous, laminar, free-living, and columnar [11] (Figure 1 a-g). We randomly selected 40 of the 100 coral photographs per study site to estimate the percentage of each coral growth form, and used these percentages to calculate the Shannon-Wiener and Simpson indices.


3.2 Obtaining Sea Surface Temperature
We deployed one to two HOBO Pendant temperature and light data loggers (model UA-002-64) to measure water temperature and light intensity at the study sites. We tied the data loggers to buoys, which allowed each logger to receive accurate, maximal light intensity. We used a shuttle to upload the water temperature and light intensity data from the loggers. Data were recorded every ten minutes, giving 52,560 records per year at each site. Data were saved in Mathematica (.mx) format because .mx files can be queried faster than other formats (e.g. .xls, .csv and MS SQL Server 2000).

3.3 Developing Tools
Database: We used MS SQL Server 2000 as the database server to record our data, except for sea surface temperature. We used Mathematica v.6 to convert sea surface temperature data into .mx files; this was the fastest way to visualize sea surface temperature graphs, compared with using .xls, .csv or MS SQL Server 2000.
Coding: We coded our website with Java Server Pages (JSP). JSP is rapidly developed, easily maintained, information-rich and platform independent. JSP technology separates the user interface from content generation, enabling designers to change the overall page layout without altering the underlying dynamic content. We used Apache Tomcat v.5.5.27, a servlet container developed by the Apache Software Foundation (ASF). Tomcat implements the Java servlet and JSP specifications from Sun Microsystems and provides a "pure Java" HTTP web server environment for Java code to run in.
Google Earth: We overlaid our results and study sites on Google Earth version 5.0, which offers maps and satellite images for complex or pinpointed regional searches.
Architecture: The architecture of this system comprised an Apache Tomcat web server, SQL Server 2000 for the database, and user connections via a web browser (Figure 2). There were three groups of users: administrators, researchers and general users. Administrators managed the web and database servers. Researchers (i.e. students, teachers and scientists) were given user names and passwords to log in and perform web data entry. General users had access to some parts of the coral database and could visualize some data; however, they could not edit or input data. General users who would like to input data into the Coral Database could send a request to the web administrators.





Figure 1. Seven coral growth forms. (a) Massive, (b) Columnar, (c) Encrusting, (d)
Branching, (e) Foliaceous, (f) Laminar and (g) Free-Living
Figure 2. Architecture of the NBIDS coral database system


4. RESULTS AND DISCUSSION
The NBIDS Coral Database System was available online at http://www.twibl.org/Coral. The homepage comprised data entry, coral gallery, visualization, collaboration and contact pages (Figure 3a). For data entry, researchers had to log in to the website before they could input their study site definitions and data (Figure 3b).



Figure 3. (a) Coral Database System Homepage and (b) Data entry page

The NBIDS Coral Database System was tested with actual data collected from two sites: Wichienmatu School, Trang and Racha Island, Phuket, Thailand (Figure 4a). The system currently comprises two research projects with 234 records and three study sites. Users can input their data in Excel format, upload coral photographs, and calculate the percentage of each coral growth form per photograph together with the biodiversity indices (Figure 4b). Users can apply GIS tools to visualize their study locations in three dimensions with Google Earth (Figure 4c), and can calculate species richness and the Simpson's and Shannon-Wiener indices at each study site in real time (Figure 4c). Users can also view from our website the sea surface temperature, recorded every ten minutes at each site (Figure 4d).


Figure 4. Data visualization. (a) study site, (b) coral form estimation, (c) biodiversity indices
and (d) sea surface temperature

5. CONCLUSION
The NBIDS Coral Database System is available online at http://www.twibl.org/Coral. It was designed to assist students, teachers and scientists in their research. Students can enter their data via web technology, visualize their results in real time online, locate their study sites on Google Earth and plan their next field work effectively. Scientists can use the system to monitor school research projects and visualize student research results in real time, enabling them to give students good advice on their research questions anytime and anywhere the Internet is available. General users can visualize coral pictures, coral growth forms, coral diversity and sea surface temperature at each study site. The NBIDS Coral Database System provides a useful tool for the conservation and sustainable use of coral resources, and this highly visual and interactive system should prove valuable for education and public awareness purposes.

REFERENCES
1. Xu, L., Systems Res. Behav. Sci., 2000, 17(2), 105-16.
2. Xu, L., Wang, C., Luo, X., and Shi, Z., Systems Res. Behav. Sci., 2006, 23, 147-56.
3. Boulos, M. N. K., and Burden, D., Int. J. Health Geogr., 2005, 4(22), 1-8.
4. Boulos, M. N. K., and Burden, D., Int. J. Health Geogr., 2007, 6(51), 1-16.
5. Chapman, B., and Turner, J. R., J. Nat. Hist., 2004, 38, 2937-57.
6. Elwood, S., Prog. Human Geogr., 2009, 33(2), 256-63.
7. Boulos, M. N. K., Scotch, M., Cheung, K., and Burden, D., Int. J. Health Geogr., 2008, 7(38), 1-16.
8. Kamadjeu, R., Int. J. Health Geogr., 2009, 8(4), 1-12.
9. Chow, T. E., Transaction GIS, 2008, 12(2), 179-91.
10. Curtis, A. J., Mills, J. W., and Leitner, M., Int. J. Health Geogr., 2006, 5(44), 1-12.
11. Knowlton, N., The future of coral reefs. Proc. Nat. Acad. Sci. USA, 2001, 10, 5419-25.

ACKNOWLEDGMENTS
This work was supported in part by the Office of the Higher Education Commission, Thailand, under the Strategic Scholarships for Frontier Research Network program for the Ph.D. (Thai Doctoral degree) 2/2552, by Walailak University Fund 10/2552, and by the Center of Excellence for Ecoinformatics, the Institute of Research and Development, Walailak University and NECTEC. We thank Sirilak Chumkiew and Wittaya Panduang for their invaluable assistance in web database development and programming.
A00013
Mark-Recapture Model Testing for Indo-Pacific Humpback
Dolphin Population at Khanom Sea, Nakhon Si Thammarat

S. Jutapruet, K. Jaroensutasinee, and M. Jaroensutasinee
Center of Excellence for Ecoinformatics and Computational Science Graduate Program, School of
Science, Walailak University 222, Thaiburi, Thasala, Nakhon Si Thammarat, 80161, Thailand
E-mail: ball153@gmail.com, krisanadej@gmail.com, jmullica@gmail.com;
Fax: 086-4795011; Tel. 081-9664224

ABSTRACT
There are many methods to estimate wildlife populations, depending on whether the population is closed or open. This study aims to estimate the Indo-Pacific humpback dolphin population size at Khanom Sea, Nakhon Si Thammarat. Photo-identification combined with mark-recapture is widely used to estimate cetacean populations. We photographed Indo-Pacific humpback dolphin dorsal fins with a DSLR digital camera with 18-135 mm and 70-300 mm lenses. A long-tail boat-based survey was conducted from the Racha ferry pier, Donsak, Surat Thani to Koh Kao beach, Khanom, Nakhon Si Thammarat, twice a month during July 2008 - June 2009. From 39 sightings, more than 3,000 dorsal fin pictures, natural markings, dorsal outlines and pigmentation were used to identify individual Indo-Pacific dolphins. Encountered dolphins were scored as 0 (absent) or 1 (present). We tested three closed population models, (1) Petersen, (2) Schumacher-Eschmeyer and (3) Schnabel, and one open population model, the Jolly-Seber model. The estimated Indo-Pacific humpback dolphin population size at Khanom Sea was 49 dolphins from the Petersen model, 49 from the Schumacher-Eschmeyer model, 53 from the Schnabel model and 49 from the Jolly-Seber model. All models estimated a similar population size (i.e. 49-53 dolphins), regardless of whether a closed or open population model was used.

Keywords: Mark-Recapture, Population estimation, Indo-Pacific Humpback Dolphin,
Khanom Sea.



1. INTRODUCTION
Indo-Pacific humpback dolphins are threatened throughout their range by incidental catches, primarily in gillnets, and in several areas by habitat degradation and capture for captive display [1-3]. In Thailand, a major cause of Indo-Pacific humpback dolphin mortality is incidental entanglement in gill nets. This mortality rate may surpass the possible replacement rate of the Indo-Pacific humpback dolphin population in Thailand. Few studies of dolphins have been done in Thailand [4-10]. A small population of Indo-Pacific humpback dolphins was reported along the Khanom coastline, Thailand, with 2, 6 and 3 carcasses reported in 2006, 2007 and 2009 by the Department of Marine and Coastal Resources, Research Area 3, Songkhla, Thailand. Serious conservation concerns about this population led to a survey to better understand its status. Distribution and abundance information is essential for improving our understanding of dolphin biology, assessing its conservation status, and guiding conservation actions and decisions.
Photo-identification has been used in field studies of cetaceans and has proven to be a useful tool in population estimation [11-14]. The shape of the trailing edge of the dorsal fin is the most diagnostic feature in most dolphins [13, 15]. The notch pattern of the dorsal fin tends to vary between individual dolphins owing to incidental events. Defran et al. [16] developed a technique for analyzing and cataloging dorsal fin photographs of bottlenose dolphins, which has been used by many researchers since [13]. Most studies of dolphins in Thailand were based on stranded and by-catch specimens, available skeletons and interviews [10]. This
study is the first to estimate the Indo-Pacific humpback dolphin population along the Khanom coastline, Thailand, using three closed population models, (1) Petersen, (2) Schumacher-Eschmeyer and (3) Schnabel, and one open population model, the Jolly-Seber model.


2. THEORY AND RELATED WORKS
Capture-mark-recapture analyses for population estimation
Petersen Method
The Petersen method is the simplest mark-recapture method because it is based on a single episode of marking animals and a second single episode of recapturing individuals. The basic procedure is to mark a number of individuals over a short time period, release them, and then recapture individuals to check for marks. The second sample must be a random sample for this method to be valid, i.e., marked and unmarked individuals must have the same chance of being captured in the second sample. Rearranging the proportions gives

N = CM / R     (1)

N = the estimated size of the population at the time of marking
M = the number of individuals marked in the first sample
C = the total number of individuals captured in the second sample
R = the number of individuals in the second sample that are marked

Equation (1) is intuitively clear, but it is a biased estimator of population size, tending to overestimate the actual population. The unbiased (Chapman) form is

N = (M + 1)(C + 1) / (R + 1) - 1     (2)

Variance of N = (M + 1)(C + 1)(M - R)(C - R) / [(R + 1)²(R + 2)]     (3)

Standard error = sqrt(Variance of N)     (4)

95% confidence limits of N = N ± t (Standard error)     (5)
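A minimal Python sketch of equations (1)-(5) as reconstructed above; the counts are invented for illustration, not the Khanom survey data:

    import math

    M, C, R = 30, 25, 15                                     # illustrative counts

    n_petersen = C * M / R                                   # eq (1), biased
    n_chapman = (M + 1) * (C + 1) / (R + 1) - 1              # eq (2), unbiased
    var = ((M + 1) * (C + 1) * (M - R) * (C - R)
           / ((R + 1) ** 2 * (R + 2)))                       # eq (3)
    se = math.sqrt(var)                                      # eq (4)
    print(f"N = {n_chapman:.1f} +/- {1.96 * se:.1f} (95% CI)")  # eq (5)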

Schnabel Method
Schnabel [18] extended the Petersen method to a series of samples 2, 3, 4, ..., n. Individuals caught in each sample are first examined for marks, then marked and released. Only a single type of mark is needed, because we need to distinguish only marked and unmarked animals. (If different marks or tags were used for different samples, the full capture-recapture history of any animal caught during the experiment would be known.) N_t is estimated by the ratio of the number of marked animals released into the population to the estimated proportion of marks in the population. In fact, the Schnabel estimate of N_t is simply a weighted average of individual Petersen estimates, namely

N = Σ_t (C_t M_t) / Σ_t R_t     (6)

where C_t is the number of individuals caught in sample t, R_t the number of recaptures in sample t, and M_t the number of marked animals at large just before sample t.

Variance (computed on the reciprocal of N) = Σ_t R_t / (Σ_t C_t M_t)²     (7)

m = number of days (or sightings) in which dolphins were actually caught

Standard error of 1/N = sqrt(Variance of 1/N)     (8)-(9)

95% confidence limits of N = obtained by inverting 1/N ± t (Standard error of 1/N), with t based on m - 1 degrees of freedom     (10)
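A minimal sketch of equation (6), with the variance on 1/N from equation (7); the per-sample counts are again illustrative:

    import numpy as np

    C = np.array([10, 12, 15, 9])                     # caught in each sample
    R = np.array([0, 3, 6, 5])                        # of those, already marked
    M = np.concatenate(([0], np.cumsum(C - R)[:-1]))  # marked at large before t

    n_schnabel = (C * M).sum() / R.sum()              # eq (6)
    var_inv = R.sum() / (C * M).sum() ** 2            # eq (7), variance of 1/N
    print(f"N = {n_schnabel:.1f}")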

Schumacher-Eschmeyer Method
The Schumacher-Eschmeyer method uses the same data and assumptions as the Schnabel method. It differs from the Schnabel method in that it estimates N_t as the reciprocal of the slope of the line describing the relationship between R_t/C_t (the dependent variable) and M_t (the independent variable), with the line constrained to pass through the origin. Thus, an estimate of 1/N_t can be obtained from the estimated slope of the regression of R_t/C_t on M_t, and the confidence interval of N_t follows from the confidence interval of that slope.

Jolly-Seber Method
We estimated the humpback dolphin population size, the probability of survival and the dilution rate from mark-recapture analysis using the Jolly-Seber open population model. The Jolly-Seber method estimates the population size, N_t, from multiple mark-recapture samplings on an open population. The proportion of marks in a recapture sample is an estimate of the proportion of marks in the population. However, because there are multiple marking and recapturing samples and we allow for an open population, the way we account for the animals is more complicated:

m_t = number of marked animals caught in sample t
u_t = number of unmarked animals caught in sample t
n_t = number of animals caught in sample t, m_t + u_t
s_t = number of animals released after sample t
m_rt = number of marked animals caught in sample t that were last caught in sample r
R_t = number of the s_t individuals released at sample t that are caught again in a later sample
Z_t = number of individuals marked before sample t, not caught in sample t, but caught in some sample after t

The calculations lead to an estimate of N_t, the number of animals just before time t, for each time period (except the first and last), together with several other parameters. First we compute the proportion of animals marked,

α_t = (m_t + 1) / (n_t + 1)     (11)

To estimate N_t, we divide the number of marks in the population by α_t (eqn 11). However, the number of marked animals in the population, M_t, must itself be estimated; Seber [19] showed that M_t can be estimated by

M_t = (s_t + 1) Z_t / (R_t + 1) + m_t     (12)

Then,

N_t = M_t / α_t     (13)

Each encountered individual dolphin was scored 0 (absent) or 1 (present); these 0/1 values were processed with the equations above and assembled into Table 1.
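A minimal sketch, assuming every captured animal is released (s_t = n_t), of how these statistics and equations (11)-(13) can be computed from such a 0/1 sighting matrix; the matrix below is illustrative, not the survey data:

    import numpy as np

    X = np.array([[1, 0, 1, 1],
                  [0, 1, 1, 0],
                  [1, 1, 0, 1],
                  [0, 0, 1, 1]])    # rows = dolphins, columns = survey occasions

    T = X.shape[1]
    for t in range(1, T - 1):       # N_t is undefined for the first and last occasions
        before = X[:, :t].any(axis=1)
        after = X[:, t + 1:].any(axis=1)
        caught = X[:, t].astype(bool)

        n_t = caught.sum()
        m_t = (caught & before).sum()              # marked animals caught at t
        s_t = n_t                                  # assumption: all animals released
        R_t = (caught & after).sum()               # released at t and caught later
        Z_t = (before & ~caught & after).sum()     # marked, missed at t, seen later

        alpha_t = (m_t + 1) / (n_t + 1)            # eq (11)
        M_t = (s_t + 1) * Z_t / (R_t + 1) + m_t    # eq (12)
        print(f"t = {t}: N = {M_t / alpha_t:.1f}") # eq (13)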



3. EXPERIMENTAL DETAILS
Study Area
The Khanom coastline is located at latitude 9°19'N and longitude 99°51'E in Nakhon Si Thammarat province, Thailand, covering 222 km² (Figure 1). Most of the Khanom coastline has a depth of less than 7.5 m. The mean spring and neap tidal ranges are 0.40 and 0.90 m, with a mean sea level of 1.43 m (Hydrographic chart no. 1210, 2002, The Gulf of Thailand). A main river, the Bang Paeng River, discharges into the Khanom coastal area, flowing from west to east. Daily tides are semi-diurnal, and the mean tidal range was 1.56 m.


Figure 1. Khanom coastline, Nakhon Si Thammarat, Thailand.

Data Collection
Observations and photographs from boats are the most practical approach to studying dolphins in most areas. We conducted a boat-based survey over approximately 50 km of the Khanom coastline (Figure 1) twice a month throughout a one-year period from July 2008 to June 2009. When the weather and sea conditions were suitable, we carried out the surveys from 0700 to 1400 hr using an 8-m long-tailed boat (a unique local vessel) powered by a 115 HP outboard engine travelling at an average target speed of 15 km/h, except in severe weather conditions. A minimum of two observers searched the waters. The survey track lines were parallel to the shoreline [17].
When dolphins were sighted, we recorded the date, time, geographic position (using a Garmin GPSMAP 76CSx), species observed, number of individuals, and number of mother-calf pairs. Dolphins were photographed using a Nikon D80 DSLR digital camera with 18-135 mm and 70-300 mm lenses. The camera was angled as perpendicular to the dolphin's body axis and dorsal fin as possible. Individual dorsal fins were subsequently identified using the Digital Analysis and Recognition of Whale Images on a Network (DARWIN) software. The mean number of photographs taken for each identifiable dolphin per survey was 50. We also collected physical data: water visibility with a Secchi disc, water depth with a depth sounder, and distance offshore by measuring the perpendicular distance between the encounter position and the shoreline on Google Earth.



4. RESULTS AND DISCUSSION
Cumulative Curve
Humpback dolphins were sighted on 20 of 22 boat surveys. The population range was 73 km². No sightings were made more than 2 km from shore. The cumulative number of identified dolphins increased and reached a plateau (Figure 2). The exponential power fits (equations shown in Figure 2) gave R² = 0.998 for all individuals, R² = 0.995 for adults, R² = 0.995 for calves and R² = 0.993 for juveniles (n = 39, P < 0.001 in each case).


Figure 2. Cumulative number of identified dolphins over time, with the exponential power fit equations (symbols distinguish all individuals, adults, juveniles and calves).

Population estimation
Forty-nine identified individuals were photographed in this population. The abundance of the population was estimated using the capture-recapture methods above. We tested three closed population models, (1) Petersen, (2) Schumacher-Eschmeyer and (3) Schnabel, and one open population model, the Jolly-Seber model. The estimated Indo-Pacific humpback dolphin population size at Khanom Sea was 49 dolphins from the Petersen model, 49 from the Schumacher-Eschmeyer model, 53 from the Schnabel model and 49 from the Jolly-Seber model (Figure 3a-d). All models estimated a similar population size (i.e. 49-53 dolphins).



Figure 3. Closed population estimation methods: (a) Petersen model, (b) Schumacher-Eschmeyer model and (c) Schnabel model; open population method: (d) Jolly-Seber model.


5. CONCLUSION
The Indo-Pacific humpback dolphin population at Khanom Sea, estimated with four population estimation models, was 49-53 dolphins. The close agreement may reflect the fact that no new individual dolphin was found after 28 sightings, suggesting that we had found and photographically marked all dolphin individuals in the Khanom population (i.e. 49 dolphins). The Indo-Pacific humpback dolphin population at Khanom Sea can therefore be considered a closed population with no migration.

REFERENCES
1. Cockcroft, V. G. and Krohn, R., Rep. Int. Whal. Commis. (Special Issue), 1994, 15: 317-
28.
2. Reeves, R. R. and Leatherwood, S., Dolphins, porpoises and whales: 1994-1998 action
plan for the conservation of cetaceans, IUCN, Gland, Switzerland, 1994, 92 pp.
3. Karczmarski, L., Oryx, 2000, 34, 207-16.
4. Pilleri, G. and Gihr, M., Investigations on Cetacea, 1974, 5, 95-149
5. Perrin, W. F., Miyazaki, N. and Kasuya, T., Mar. Mammal Sci., 1989, 5, 213-27.
6. Andersen, M. and Kinze, C. C., Annotated checklist and identification key to the whales,
dolphins and porpoises (Order Cetacean) of Thailand and adjacent waters. In:
Proceedings Workshop on Small Cetaceans of Thailand, Phuket Marine Biological Center,
Phuket, Thailand, 1995, 18 pp.
7. Chantrapornsyl, S., Adulyanukosol, K. and Kittiwattanawong, K., Phuket Mar. Biol.
Research Bull., 1996, 62, 39-63.
8. Chantrapornsyl, S., Adulyanukosol, K. and Kittiwattanawong, K., International Marine
Biological Research Institute Reports, Kamogawa, Japan, 1999, 9, 55-72.
9. Mahakumlayanakul, S., Species, distribution and status of dolphins in the inner Gulf of
Thailand. M.Sc. Thesis, Chulalongkorn University, Bangkok, Thailand. 1996, 130 pp.
10. Adulyanukosol, K., Proceedings of the first Korea-Thailand Joint Workshop on
Comparison of Coastal Environment, Korea-Thailand, Seoul, Korea, 1999, 5-15.
11. Hammond, P. S., Mizroch, S. A. and Donovan, G. P. (eds)., Rep. Int. Whal. Commis.
(Special Issue), 1990, 12, 1-440.
12. Aragones, L. V., Jefferson, T. A. and Marsh, H., Asian Mar. Biol., 1997, 14, 15-39.
13. Karczmarski, L. and Cockcroft, V. G., Aquatic Mammals, 1998, 24(3), 143-47.
14. Karczmarski, L., J. Zool., 1999, 249, 283-93.
15. Würsig, B. and Jefferson, T. A., Rep. Int. Whal. Commis. (Special Issue), 1990, 12, 43-52.
16. Defran, R. H., Schultz, G. M. and Weller, D. W., Rep. Int. Whal. Commis. (Special Issue),
1990, 12, 53-5.
17. Wang, J. T., Yang, S. C., Hung, S. K. and Jefferson, T. A., Mammalia, 2007, 157-65.
18. Schnabel, Z. E., Math. Monog., 1938, 45, 348-52.
19. Seber, G. A. F., The estimation of animal abundance and related parameters, Second
edition, Macmillan Publishing Co. Inc., New York, 1982, 1-654.

ACKNOWLEDGMENTS
This work was supported in part by PTT Public Company Limited, TRF/Biotec special
program for Biodiversity Research Training grant BRT T351131, Walailak University Fund
04/2552 and Center of Excellence for Ecoinformatics, the Institute of Research and
Development, Walailak University and NECTEC.


A00014
Predicting Functional Pathway of Nevirapine inducing Skin
Adverse Drug Reaction in HIV-infected Thai Patients with
Integrated Biological Networks

T. Narathanathanan^1,C, S. Prom-on^2, W. Chantratita^3,5, and S. Mahasirimongkol^4,5
^1 Bioinformatics and Systems Biology Program, King Mongkut's University of Technology Thonburi, Thailand
^2 Department of Computer Engineering, Faculty of Engineering, King Mongkut's University of Technology Thonburi, Thailand
^3 Department of Pathology, Faculty of Medicine, Ramathibodi Hospital, Mahidol University, Thailand
^4 The National Institute of Health, Department of Medical Sciences, Ministry of Public Health, Thailand
^5 Thai Pharmacogenomics Project, Thailand
^C E-mail: mr.jaobao@gmail.com; Tel. (88)7885-7696



ABSTRACT
The integration of multiple biological interaction networks helps to identify plausible mechanisms that may be involved in a cellular phenomenon under scrutiny. In recent years, biomarkers for predicting adverse reactions to Nevirapine in HIV-infected patients have been reported and successfully applied to screen patients. However, the underlying mechanisms that may orchestrate this phenomenon have yet to be identified. This paper presents the development and results of network integration and reconstruction for identifying plausible underlying mechanisms. A human disease network, an HIV-1/human protein interaction network, and a human protein interaction network were reconstructed from online databases and integrated to form the baseline interaction network. The biomarkers of Nevirapine-induced skin adverse reactions in Thai HIV-infected patients, and the genes differentially expressed in Stevens-Johnson syndrome and toxic epidermal necrolysis, were overlaid onto the baseline interaction network. The subnetwork related to the adverse drug reaction was then extracted from the integrated network. This subnetwork can be used to identify gene and protein clusters that may be associated with the skin adverse drug reaction. The results suggest that the integration of a human disease network and protein interaction networks can be effectively used to identify plausible underlying mechanisms for biomarkers from a genomic study.

Keywords: Human Disease Network, Biological Networks, Network Integration, Functional Pathway, Adverse Drug Reaction, Nevirapine, Stevens-Johnson Syndrome, Toxic Epidermal Necrolysis



1. INTRODUCTION
One of the greatest benefits of the Human Genome Project is an improved understanding of disease causation and treatment, and of our ability to screen for disease predisposition and treatment responsiveness. Pharmacogenomics has the potential to improve drug discovery and development, as well as drug safety and effectiveness (Davis and Khoury, 2006). However, the same drug treatment can produce different results in different patients, even at the same dose, because each patient responds differently owing to many factors, such as physiology, environment (food, smoking, etc.) and genetics. Examples of genetic variation involved in drug response include enzyme deficiencies in drug metabolism and drug hypersensitivity. In such cases, drug therapy may fail to be curative and may cause substantial side-effects and adverse reactions. An adverse drug reaction (ADR) is one whose nature or severity is not consistent with domestic labeling or market authorization, or expected from the characteristics of the drug, occurring at a dose
normally used for prophylaxis, disease diagnosis or therapy, or for the modification of physiological function (Edwards and Aronson, 2000).
In pharmacogenomics, research has sought ways to address the problem of ADRs by exploring indicators that signal an event or condition in a biological system and give a measure of exposure, effect, or susceptibility. Such indicators, pharmacogenomic markers (DNA, RNA, or protein), have been used to predict the efficacy of a drug and the likelihood of an adverse event in individual patients (Ingelman-Sundberg, 2008). Finding biomarkers in human population studies is an effective strategy for gaining knowledge about the occurrence of disease, improving the understanding of etiology and pathogenesis, and measuring the effort to control disease outcomes. The potential applications of biomarkers in pharmacogenomic research are numerous.
In this study, we apply an integrated network-based approach to the reconstruction and integration of a human disease network, the HIV-1/human protein interaction network (HIV-1, HPPI), and the human protein interaction network (HPPI). The integrated network was then overlaid with the genetic data of Nevirapine-induced rash adverse drug reaction (ADR) in HIV-infected Thai patients and of significant skin ADR (the 200 most differentially expressed genes of blister cells and peripheral blood mononuclear cells (PBMCs) of Stevens-Johnson syndrome (SJS) and toxic epidermal necrolysis (TEN) patients) (Chung et al., 2008). We then extracted from the integrated network the subnetwork containing the individual gene markers of the Nevirapine-induced skin rash ADR. These subnetworks relate the markers to associations among genes, proteins, and human genetic disorders. The resulting subnetworks provide novel hypotheses for functional pathways involved in Nevirapine-induced rash adverse drug reaction in HIV-infected patients.


2. MATERIALS AND METHODOLOGY
As a starting point for our analysis, we used several datasets to reconstruct the integrated network. Two types of datasets were used to identify the plausible mechanisms of ADRs. The first type is the relationship datasets, consisting of (i) the Online Mendelian Inheritance in Man (OMIM), (ii) Human Protein-Protein Interaction (HPPI), and (iii) the HIV-1, Human Protein Interaction Database (HIV-1, HPPI). The OMIM database (Hamosh et al., 2002) is a list of disease-gene association pairs used to identify the diseases associated with each gene and to visualize the relationship between phenotype and genotype in human genetic disorders; 3,366 unique genes and 1,593 disease names among its 5,644 records (updated June 2009) were used to reconstruct the Human Disease Network (HDN) (Goh et al., 2007). Next, the list of human protein-protein interactions (HPPI), available in the Human Protein Reference Database (HPRD) (Keshava Prasad et al., 2009), was used to find interactions between proteins involved in common cellular functions; the HPPI covers 9,628 proteins of the estimated 25,000 human protein-coding genes. Finally, we used the HIV-1, Human Protein Interaction (HIV-1, HPPI) database, an interaction dataset between HIV-1 and human proteins (Fu et al., 2009) containing the relationships between human proteins and HIV-1 proteins; 1,439 human proteins are associated with 9 HIV-1 proteins across 5,147 records.
The second type of data is the genomic biomarkers, which include (i) biomarkers from the study of Nevirapine (NVP)-induced rash in HIV-infected Thai patients from the Thai Pharmacogenomics Project and (ii) significant skin ADR genes. The NVP-induced rash biomarkers consist of 315 associations between SNPs and rash ADR genes. These two datasets were used to extract the network associated with skin ADR; the 391 genes/proteins of significant skin ADR were used to extract the other genes and/or proteins in the integrated network associated with skin ADR.
Each of these distinct data types provides a different, partly independent and complementary view of the whole genome. However, to understand the functional pathway of skin ADR, we propose an alternative approach to identifying pharmacogenomic markers: using network theory to reconstruct and integrate heterogeneous biological interaction networks into a genome-wide network,
overlaid with the skin ADR network and the HIV-related network in HIV-infected patients, which may yield new perspectives on gene networks and permit the identification of functionally related pharmacogenomic markers of skin ADRs. Such an integrated analysis of multiple heterogeneous biological interactions, in other words multiple data type integration, should improve the identification of biomarkers of clinical endpoints and improve predictive power (Reif et al., 2004; Khoury et al., 2004; Chung et al., 2008; Hamid et al., 2009).
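A minimal sketch of this integration-and-overlay step using networkx (not the authors' pipeline; the edge lists and marker set below are illustrative):

    import networkx as nx

    G = nx.Graph()
    G.add_edges_from([("psoriasis", "CCHCR1"), ("CCHCR1", "STAR")])   # OMIM-style
    G.add_edges_from([("CCHCR1", "POLR2C"), ("TRAF4", "PRKCG")])      # HPPI-style
    G.add_edges_from([("tat", "POLR2C")])                             # HIV-1, HPPI

    markers = {"CCHCR1", "POLR2C"}              # skin ADR markers (example)
    keep = set(markers)
    for m in markers:
        keep |= set(G.neighbors(m))             # one level of association

    adr_subnet = G.subgraph(keep)               # the extracted skin ADR subnetwork
    print(sorted(adr_subnet.nodes))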


3. RESULTS AND DISCUSSION
3.1 Characteristics of the skin ADR network
The skin ADR network (Figure 1) shows many connections among five kinds of genomic data relevant to skin ADR: diseases, genes, proteins, SNPs, and skin ADR genes or proteins. The skin ADR network contains 74 diseases, of which 22 belong to the peripheral network (diseases with at least one link to another disease) and 52 to the giant network. The majority disease class in the skin ADR network is dermatological. There are 1,713 genes/proteins: 253 in the peripheral network and 1,460 in the giant network. Finally, there are 315 SNPs in total, 190 in the peripheral and 125 in the giant network. These characteristics suggest that the genetic origins of skin ADR are shared with other genes/proteins and diseases in the HDN and HPPI. The degree of connection between nodes is high, implying that the majority of skin ADR genes/proteins are shared with other gene, protein, disease and SNP nodes in the giant network.
The characteristics of the skin ADR network and the HDN are quite similar, because both were built from the disease classes based on the OMIM database and the International Classification of Diseases (ICD-10), grouping the diseases into 21 disease classes. The resulting network is clustered according to major disease classes, so there are visible differences between different classes of diseases. In the HDN, the large cancer cluster is thickly interconnected owing to the many genes associated with multiple types of cancer, such as TP53, APC, FGFR3 and MSH2; this cluster also includes several diseases with a tendency toward cancer, such as Fanconi anemia and Costello syndrome.

3.2 Overview of subnetwork marker identification
We applied the integrated network-based approach to analyze the expression profiles from the two skin ADR genome-wide studies previously reported by Chung et al. (2008), which investigated the genes significantly differentially expressed in Stevens-Johnson syndrome and toxic epidermal necrolysis, together with the Nevirapine-induced rash ADR data for HIV-infected Thai patients from the Thai Pharmacogenomics Project. We restricted our analysis to the 240 genes/proteins present in both datasets (the two datasets total 392 genes/proteins). To integrate the significant skin ADR genes/proteins with the network datasets, we overlaid the significance values (p-values) of each gene/protein onto the integrated network and searched one level of nodes associated with each skin ADR gene/protein to reconstruct the skin ADR network. We then used ClusterViz (Cai et al., 2008), an analysis plug-in for Cytoscape, to extract subnetworks and compute significant subnetwork biomarkers of skin ADR. This process involved several scoring and search steps, described further in the methodology. Briefly, each candidate subnetwork was first scored to assess its activity in each cluster, with significance defined by Fisher's combined p-value method (Hess and Iyer, 2007) and the Benjamini-Hochberg adjusted p-value method (Benjamini and Hochberg, 1995).
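A minimal sketch of this scoring step with illustrative p-values (not the study's data): Fisher's combined p-value per candidate subnetwork, followed by Benjamini-Hochberg adjustment across subnetworks.

    import numpy as np
    from scipy import stats

    subnets = {"sub1": [1e-4, 0.03, 0.2], "sub2": [0.4, 0.6], "sub3": [0.01, 0.02]}

    fisher = {}
    for name, pvals in subnets.items():
        _, pv = stats.combine_pvalues(pvals, method="fisher")   # Fisher's method
        fisher[name] = pv

    # Benjamini-Hochberg adjustment across the candidate subnetworks
    names = sorted(fisher, key=fisher.get)
    p = np.array([fisher[n] for n in names])
    m = len(p)
    adj = p * m / np.arange(1, m + 1)
    adj = np.minimum.accumulate(adj[::-1])[::-1]                # enforce monotonicity
    for name, a in zip(names, adj):
        print(f"{name}: BH-adjusted p = {a:.3g}")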

3.3 Functional pathway analysis of the skin ADR
Next, we examined the predictive information of the subnetwork markers through the functional pathway of skin ADR in HIV-infected patients, as follows.

Table 1. Skin ADR subnetwork marker information

Gene ID     Gene Symbol   Fisher's p-value   Network Type
5432        POLR2C        1.40826E-19        Giant Network
6770        STAR*
54535       CCHCR1
9618        TRAF4
5582        PRKCG
rs1265112   rs1265112
rs130072    rs130072
rs746647    rs746647

* cholesterol transporter activity

Previous studies (Tiala et al., 2008; Suomela et al., 2007) showed that CCHCR1 promotes steroidogenesis by interacting with the steroidogenic acute regulatory protein (STAR). They examined the role of CCHCR1 in psoriasis and cutaneous steroid metabolism and found that CCHCR1 and STAR are expressed in basal keratinocytes in overlapping areas of human skin, and that CCHCR1 stimulated pregnenolone production in a steroidogenesis assay. Their results suggest a role for CCHCR1 in the pathogenesis of psoriasis via the regulation of skin steroid metabolism.
In addition, CCHCR1 is associated with POLR2C; this association demonstrates that RNA polymerase II subunit 3 (RPB3) is retained in the cytoplasm through its interaction with CCHCR1/HCR, the psoriasis vulgaris candidate gene product (Corbi et al., 2005).




Figure 1. The skin ADR network. A circle node represents a disease, a diamond node a gene, a rounded rectangle node a protein, and an octagon node a SNP. Nodes are colored by the disease class to which they belong; the names of the 21 disease classes appear on the right-hand side of the network. The size of each node is proportional to its degree, which for a disease node is the number of genes involved in that disease.
Table 2. GoGene (Plake et al., 2009) enrichment terms of the subnetwork marker functional module analysis (numbers in parentheses are gene counts)

Chemicals and Drugs
  RNA, Messenger (3): POLR2C, CCHCR1, STAR
  Proteome (3): POLR2C, CCHCR1, STAR
  Pregnenolone (1): STAR
Biological Process (BP)
  Mineralocorticoid biosynthesis (1): STAR
  Granulosa cell differentiation (1): STAR
  Progesterone biosynthetic process (1): STAR
  Mineralocorticoid metabolic process (1): STAR
  Homocysteine metabolic process (1): CCHCR1
  C21-steroid hormone biosynthetic process (1): STAR
  Progesterone metabolic process (1): STAR
  Luteinization (1): STAR
  Regulation of steroid biosynthetic process (1): STAR
  RNA elongation from RNA polymerase II promoter (1): POLR2C
Molecular Function (MF)
  Cholesterol transporter activity (1): STAR
Cellular Component (CC)
  Mitochondrial intermembrane space (1): STAR
  Cytoplasm (2): STAR, CCHCR1
  Nucleoplasm (1): POLR2C
  Mitochondrion (1): STAR
  Nucleus (2): POLR2C, CCHCR1
  Membrane (1): STAR
  Intracellular (3): STAR, POLR2C, CCHCR1
  Mitochondrial envelope (1): STAR
Diseases
  Luteoma (1): STAR
  Brenner Tumor (1): STAR
  Menopause, Premature (1): STAR
  Follicular Cyst (1): STAR


4. CONCLUSION
The volume of biological data on the human genome, proteome and interaction networks is growing, and each of these distinct data types provides different information. Understanding side effects and adverse drug reactions comprehensively therefore requires more information than any single data type provides. Integrating data from multiple sources is an important part of determining cause-and-effect relationships within and between network modules. The success of the integrated network-based approach also helps to uncover plausible mechanisms. The resulting subnetworks provide novel hypotheses for functional pathways involved in Nevirapine-induced rash adverse drug reaction in HIV-infected patients. The results suggest that subnetwork markers convey more about the function or mechanism of skin adverse drug reactions than individual marker genes, and that they will improve the identification of biomarkers of clinical endpoints and the predictive power to minimize adverse drug reactions in HIV-infected patients.

REFERENCES
1. Benjamini, Y.; Hochberg, Y. Journal of the Royal Statistical Society. Series B
(Methodological) 1995, 57, 289-300.
2. Cai, J.; Li, M.; Wang, J.; Chen, G., 2008.
3. Chung, W.-H.; Hung, S.-I.; Yang, J.-Y.; Su, S.-C.; Huang, S.-P.; Wei, C.-Y.; Chin,
S.-W.; Chiou, C.-C.; Chu, S.-C.; Ho, H.-C.; Yang, C.-H.; Lu, C.-F.; Wu, J.-Y.; Liao,
Y.-D.; Chen, Y.-T. Nat Med 2008, 14, 1343-1350.
4. Corbi, N.; Bruno, T.; De Angelis, R.; Di Padova, M.; Libri, V.; Di Certo, M. G.;
Spinardi, L.; Floridi, A.; Fanciulli, M.; Passananti, C. Journal of Cell Science 2005,
118, 4253-4260.
5. Davis, R. L.; Khoury, M. J. Pharmacogenomics 2006, 7, 331-337.
6. Edwards, I. R.; Aronson, J. K. Lancet 2000, 356, 1255-1259.
7. Fu, W.; Sanders-Beer, B. E.; Katz, K. S.; Maglott, D. R.; Pruitt, K. D.; Ptak, R. G.
Nucleic Acids Research 2009, 37, D417-422.
8. Goh, K.-I.; Cusick, M. E.; Valle, D.; Childs, B.; Vidal, M.; Barabási, A.-L. Proceedings of the National Academy of Sciences 2007, 104, 8685-8690.
9. Hamid, J.; Hu, P.; Roslin, N.; Ling, V.; Greenwood, C.; Beyene, J. Human Genomics
and Proteomics 2009, 2009, 1-13.
10. Hamosh, A.; Scott, A. F.; Amberger, J.; Bocchini, C.; Valle, D.; McKusick, V. A.
Nucl. Acids Res. 2002, 30, 52-55.
11. Hess, A.; Iyer, H. BMC Genomics 2007, 8, 96.
12. Ingelman-Sundberg, M. N Engl J Med 2008, 358, 637-639.
13. Keshava Prasad, T. S.; Goel, R.; Kandasamy, K.; Keerthikumar, S.; Kumar, S.;
Mathivanan, S.; Telikicherla, D.; Raju, R.; Shafreen, B.; Venugopal, A.;
Balakrishnan, L.; Marimuthu, A.; Banerjee, S.; Somanathan, D. S.; Sebastian, A.;
Rani, S.; Ray, S.; Harrys Kishore, C. J.; Kanth, S.; Ahmed, M.; Kashyap, M. K.;
Mohmood, R.; Ramachandra, Y. L.; Krishna, V.; Rahiman, B. A.; Mohan, S.;
Ranganathan, P.; Ramabadran, S.; Chaerkady, R.; Pandey, A. Nucleic Acids Research
2009, 37, D767-772.
14. Khoury, M. J.; Millikan, R.; Little, J.; Gwinn, M. Int. J. Epidemiol. 2004, 33, 936-
944.
15. Plake, C.; Royer, L.; Winnenburg, R.; Hakenberg, J.; Schroeder, M. Nucleic Acids
Research 2009, 37, W300-304.
16. Reif, J. C.; Xia, X. C.; Melchinger, A. E.; Warburton, M. L.; Hoisington, D. A.; Beck,
D.; Bohn, M.; Frisch, M. Crop Sci 2004, 44, 326-334.
17. Suomela, S.; Kainu, K.; Onkamo, P.; Tiala, I.; Himberg, J.; Koskinen, L.; Snellman, E.; Karvonen, S.-L.; Karvonen, J.; Uurasmaa, T.; Reunala, T.; Kivikäs, K.; Jansén, C. T.; Holopainen, P.; Elomaa, O.; Kere, J.; Saarialho-Kere, U. Acta Dermato-Venereologica 2007, 87, 127-134.
18. Tiala, I.; Wakkinen, J.; Suomela, S.; Puolakkainen, P.; Tammi, R.; Forsberg, S.;
Rollman, O.; Kainu, K.; Rozell, B.; Kere, J.; Saarialho-Kere, U.; Elomaa, O. Human
Molecular Genetics 2008, 17, 1043-1051.


ACKNOWLEDGMENTS
I would like to express my sincere gratitude and deep appreciation to Dr. Santitham Prom-on, Prof. Dr. Wasun Chantratita, Surakameth Mahasirimongkol, the Systems Biology and Bioinformatics (SBI) research team, and the Thai Pharmacogenomics Project, whose kindness, invaluable guidance and useful comments enabled me to complete this research successfully. A special debt of gratitude is expressed to both the National Center for Genetic Engineering and Biotechnology (BIOTEC), Thailand and King Mongkut's University of Technology Thonburi (KMUTT) for the full scholarship.
A00015
RNA Secondary Structure Prediction Using Conditional
Random Fields Model

S. Subpaiboonkit^1, C. Thammarongtham^2, R. Cutler^1, J. Chaijaruwanich^1,C
^1 Bioinformatics Research Laboratory, Faculty of Science, Chiang Mai University, Chiang Mai, Thailand 50200
^2 Biochemical Engineering and Pilot Plant Research and Development Unit, National Center for Genetic Engineering and Biotechnology, Bangkok, Thailand
^C E-mail: jeerayut@cs.science.cmu.ac.th; Tel. 05394-3455 ext 3455



ABSTRACT
Recently, non-coding RNAs have been found to have important biological functions in living cells. They have conserved secondary structures arising from complementary base pair interactions, and finding these secondary structures is an interesting task. Here, we focus on computational RNA secondary structure prediction from the primary sequence. In this work, RNA secondary structure prediction is cast as a sequence labeling problem, with structures represented in dot-parenthesis format. We propose the use of conditional random fields (CRFs), a machine learning technique for sequence labeling, as probabilistic models with suitable feature selection from known RNA secondary structures. Our CRF models predict the secondary structures of the test RNAs with prediction accuracies between 72.30% and 98.02% across different topologies.

Keywords: Conditional Random Fields, RNA Secondary Structure, Non-Coding RNA,
Sequence Labeling



REFERENCES
1. Durbin, R., Eddy, S., Krogh, A., and Mitchison, G., Biological Sequence Analysis, 8th ed., Cambridge University Press, Cambridge, 2003, 223-323.
2. Eddy, S., Nat. Rev. Genet., 2001, 2, 919-929.
3. Sakakibara, Y., Bioinformatics, 2003, 19(1), 232-240.
4. Sato, K., and Sakakibara, Y., Bioinformatics, 2005, 21(2), 237-242.
5. Lafferty, J., McCallum, A., and Pereira, F., Proceedings of the 18th International Conference on Machine Learning (ICML), 2001, 282-289.

A00016
Sea Surface Temperature Declines at Coral Sites Using
Field Sensors and NOAA Data

S. Chumkiew, M. Jaroensutasinee, and K. Jaroensutasinee
Center of Excellence for Ecoinformatics and Computational Science Graduate Program,
School of Science, Walailak University, 222, Thaiburi, Thasala, Nakhon Si Thammarat, 80161,
Thailand
E-mail: sirilak.chumkiew@gmail.com, jmullica@gmail.com, krisanadej@gmail.com
Fax: 66 0 7567 2004; Tel. 66 0 7567 2005-6



ABSTRACT
Sea surface temperature (SST) is one of the critical parameters for understanding how the ocean is connected with climate. We computationally examined satellite-derived sea surface temperature (SST) and sea surface temperature anomaly (SSTA) data from the IRI/LDEO Climate Data Library for 1981-2009 at 20 coral reef sites throughout the world. The results showed that mean SST was higher near the equator and lower further away from it, as expected. On the other hand, SST fluctuated less near the equator than at locations further from it. Considering the SD of the SSTA data over the northern hemisphere (10° to 30°) and the southern hemisphere (-10° to -30°), the northern hemisphere showed higher SSTA fluctuations than the southern hemisphere. We installed HOBO temp/light sensors at two sites, Koh Tan, Gulf of Thailand, Thailand and Koh Racha, Andaman Sea, Thailand, during July 2007-July 2009. We compared SST from the field sensors with NOAA data and found them to be similar. When the NOAA SST data at Koh Tan and Koh Racha were plotted over 1981-2009, SST at both sites declined consistently from 2004 to 2009. Further details of the SST variability are discussed in the paper.


Keywords: Sea Surface Temperature (SST), Sea Surface Temperature Anomaly
(SSTA), Koh Tan, Koh Racha, Climate Changes, Coral Reef



1. INTRODUCTION
There is increasing evidence that warming global temperatures will have profound
effects on the Earth's ecosystems [1-3]. The global mean surface air temperature rose by
around 0.6 °C during the 20th century, with 11 of the last 12 years (1995-2006) ranking
among the 12 warmest years in the instrumental record of global surface temperature since
1850 [4]. Oceans play a crucial role in regulating the climate [5], and since the 1950s the heat
content of the world's oceans has increased by ~2 × 10^23 J, equivalent to a mean volume
warming of 0.06 °C [6]. While this increase is an order of magnitude less than that observed
for terrestrial systems, it may be even more important, as water heats at a much slower rate
than air because of its heat capacity.

In order to understand the processes involved in global climate change, various scientific
measurements are essential. One of the critical parameters for understanding how the ocean is
connected with climate is the sea surface temperature (SST). Characterizing SST variability
on all pertinent temporal and spatial scales still challenges science [7]. Various studies have
focused on SST and its variability; however, few have approached the problem from the
viewpoint of an evolutionary complex [7].
In this study, we aim to find evidence of climate change using SST and SSTA data, a
satellite-based climatology from low to high latitudes (30° to -30°), at 20 coral reef sites
during 1981-2009. Although satellite-derived SSTA climatology data cover a relatively short
period, the continuous 28 years of data from 1981-2009, with almost no gaps, allow a
reasonable trend analysis. We validated the NOAA SST data against SST from field sensors.


2. THEORY AND RELATED WORKS
Satellite-derived SST and SSTA are the only SST climatologies that can represent
spatial and temporal variations at fine resolution. Analysis of archived SST data depicts
seasonal and long-term trends of climate change [8]. The main sources of SST data for earlier
studies were bulk surface temperatures as measured by buoys, ship intakes and other
techniques. The high-resolution satellite-derived SSTA climatology is used to study
long-term trends and their relationship with global phenomena [9].


3. COMPUTATIONAL DETAILS
We installed HOBO temp/light sensors at two sites: Koh Tan, Gulf of Thailand,
Thailand (9.37022° N, 99.94500° E) and Koh Racha, Andaman Sea, Thailand (7.58645° N,
98.36319° E) during July 2007 - July 2009. We compared SST from the field sensors with
NOAA data. We selected 20 coral reef sites at various latitudes (Figure 1). The SST data
were downloaded from the Reynolds-National Centers for Environmental Prediction (NCEP):
weekly SST data on a 1° × 1° grid [10]. A monthly SST climatology was generated using the
weekly SST data from 1984 through 1996 (those years were chosen to exclude the very
anomalous periods of 1982/83 and 1997/98), and SSTA was computed from this monthly
climatology [10]. Long-term SST and SSTA data were obtained from NOAA NCEP EMC
CMB GLOBAL OI version 2 (OI.v2). These data represent the most recently available and
complete historical dataset of global SST values. Optimum interpolation (OI) SST analysis is
widely used for weather forecasting, climate monitoring, climate prediction and both
oceanographic and atmospheric research, as well as for specifying the surface boundary
condition for atmospheric analysis and reanalysis [10]. The SST and SSTA data cover
November 1981 - August 2009.




Figure 1. The 20 coral study sites (dots) and the field sensor stations (dots in rectangles)


SST is affected by local weather, currents, and seasonal changes. The temperature
anomaly used to track changes in SST is defined as the difference between the expected
temperature and the actual one. The expected temperature is the average temperature for
that day of the year, based on data from the last several decades. The SSTA is the
difference between Coral Reef Watch's night-time SST and the SST climatology for the
corresponding period. The base period for the climatology is 1950 to 1979. Whenever a
positive SSTA occurs during the warmest months of the year, a 1 °C elevation above
the monthly mean maximum often accompanies coral bleaching.
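As a hedged illustration of this anomaly definition, the following minimal Python sketch subtracts a day-of-year climatology from an SST series; the array shapes and values are hypothetical, not the paper's data.

    import numpy as np

    # hypothetical daily SST, shape (n_years, 365)
    rng = np.random.default_rng(0)
    sst = 29.0 + 1.5 * np.sin(np.linspace(0, 2 * np.pi, 365)) + rng.normal(0, 0.3, (28, 365))

    climatology = sst.mean(axis=0)   # expected temperature for each day of the year
    ssta = sst - climatology         # anomaly: actual minus expected
    print(ssta.mean())               # close to 0 by construction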

Computer programs were written in Mathematica to analyze the field SST and the NOAA
SST and SSTA data. Mean plots were used to analyze the SST and SSTA data. Piecewise
functions were used to find associations between the SD of SST and SSTA and latitude. To
find the trend of SSTA changes over the 28-year period, we computed the moving average
over every five years, plotted it, and fitted the moving-average graph with piecewise
functions. We plotted the slope of the SD of SSTA. An ANCOVA test was used to test the
homogeneity of the slopes of the SD of SSTA between the Northern and Southern
hemispheres.
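The analysis itself was done in Mathematica; as a language-neutral sketch, the five-year moving average and the two-piece linear fit could look like the following Python fragment. The breakpoint value and variable names are assumptions for illustration only.

    import numpy as np

    def moving_average(x, window):
        # simple moving average over `window` samples
        return np.convolve(x, np.ones(window) / window, mode="valid")

    def piecewise_fit(x, y, breakpoint):
        # fit y = a + b*x separately on each side of the breakpoint
        left, right = x <= breakpoint, x > breakpoint
        b1, a1 = np.polyfit(x[left], y[left], 1)
        b2, a2 = np.polyfit(x[right], y[right], 1)
        return (a1, b1), (a2, b2)

    lat = np.linspace(0.5, 30.0, 20)                 # hypothetical site latitudes
    sd_ssta = 0.34 + 0.015 * np.abs(lat - 6.59)      # toy SD-of-SSTA profile
    print(piecewise_fit(lat, sd_ssta, breakpoint=6.59))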


4. RESULTS AND DISCUSSION

SST and SSTA variations along 20 coral reef sites

The mean ± SD of SST at Koh Tan was 29.50 ± 0.66 °C from the field sensor data and
29.08 ± 0.81 °C from NOAA (Figure 2a,b). The mean ± SD of SST at Koh Racha was
29.35 ± 0.51 °C from the field sensor data and 29.24 ± 0.66 °C from NOAA (Figure 2a,b).
This indicates that the NOAA SST data closely matched our field sensor data at both Koh Tan
and Koh Racha, and suggests that we could use the NOAA SST data for further analysis.


Figure 2. Comparison of SST data from the field sensors and from NOAA during July 2007 -
July 2009: (a) Koh Tan, Surat Thani, Thailand and (b) Koh Racha, Phuket, Thailand
(red dots represent the NOAA data; the solid line represents the field sensor data).





Mean SST was higher near the equator and lower further away from the equator
(Figure 4a). SST fluctuated less near the equator and more further away from it (Figure 3a, c).
The level of SST fluctuation in the northern hemisphere (10° to 30°) was higher than in the
southern hemisphere (-10° to -30°) (Figure 3a). SSTA likewise fluctuated less near the
equator and more further away from it (Figure 3b, d).

The SD of the SST data decreased (linear regression with piecewise function: y =
1.014 − 0.033x, x ≤ 8.63, P < 0.05), reached its minimum at latitude 8.63°, and then increased
(y = −0.573 + 0.151x, x > 8.63, P < 0.05; Figure 3c). The SD of the SSTA data decreased
(y = 0.485 − 0.007x, x ≤ 6.59, P < 0.05), reached its minimum at latitude 6.59°, and then
increased (y = 0.340 + 0.015x, x > 6.59, P < 0.05; Figure 3d). The SD of the SSTA
moving-average data decreased (y = 0.483 − 0.005x, x ≤ 0.30, P < 0.05), reached its minimum
at latitude 0.30°, and then increased (y = 0.348 + 0.011x, x > 0.30, P < 0.05; Figure 3e).

SST changes gradient along the equator because the waters of the equatorial thermocline
originate at the surface at higher latitudes. If the temperature of the source water increased,
the equatorial thermocline temperatures would eventually increase, and the cooling effect
would be reduced on a time scale set by the renewal time of the equatorial thermocline.
The slopes of the SD of SSTA increased in both the Northern hemisphere (linear equation:
y = 0.011 + 6.575 × 10^-6 x, R² = 0.397, P < 0.001; Figure 4) and the Southern hemisphere
(y = 0.005 + 4.154 × 10^-6 x, R² = 0.295, P < 0.001; Figure 4). The northern hemisphere had
a higher slope of the SD of SSTA than the southern hemisphere (ANCOVA test for
homogeneity of slopes: F(1, 2292) = 132.020, P < 0.001).
Figure 3. Mean and SD of (a, c) SST and (b, d) SSTA at 20 coral reef sites during 1981-2009,
and (e) an example of the five-year moving average of SSTA during 1981-1985.




Figure 4. Latitudinal trend in the slopes of the SD of the SSTA moving average: Northern
Hemisphere (solid line) and Southern Hemisphere (dashed line).


SST and SSTA variation at Koh Tan, the Gulf of Thailand and Koh Racha, the
Andaman Sea, Thailand
Figure 5. SST and SSTA during 1981-2009 at (a, b) Koh Racha and (c, d) Koh Tan, and
(e) their histograms. Light and dark grey bars represent data from Koh Racha and Koh Tan,
respectively.

SST data at Koh Tan were positively associated with those at Koh Racha (linear
regression: y = 5.005 + 0.832x, R² = 0.905, P < 0.001). Koh Tan had a higher number of
SSTA values above 1 °C than Koh Racha: Koh Tan had SSTA below -1 °C 9 times and above
1 °C 48 times, while Koh Racha had SSTA below -1 °C once and above 1 °C 32 times
(Figure 5b,d). This result indicates that SST at Koh Tan was subject to stronger temperature
fluctuations than at Koh Racha. This could be due to two possible causes: (1) the effect of
global phenomena such as La Niña and El Niño, and (2) poorer ocean circulation in the Gulf
of Thailand compared with the Andaman Sea. Coral reef sites face a cool phase when SSTA
is below -1 °C and a warm phase when SSTA is above 1 °C. During both warm and cool
phases, coral bleaching is likely if the phase lasts long. Interestingly, both coral study sites
had more warm phases than cool phases, which suggests that El Niño may occur, or have a
stronger effect, in this region than La Niña.

SST Variability and ENSO
The mean annual SSTs observed at Koh Tan and Koh Racha, Thailand, are correlated. The
variability of SST in the Indian Ocean is quite prominent. The influence of the Southern
Oscillation on the SST of the Thai coastal area is evident when examining the SST and
SSTA behavior during past El Niño and La Niña years. El Niño episodes during 1986-1987,
1991-1992, and 1997-1998, and La Niña episodes during 1988-1989, 1998-1999, and
2000-2001, were taken here as case studies. Looking at the annual mean SST distribution,
SST was lower at the start of El Niño events; however, as the events progressed, a rise in
SST was observed. The rise in SST continued, and SST peaks were observed at the start of
La Niña events, though a one-to-one correspondence may not be possible in a correlation
analysis [9].

5. CONCLUSION
The increasing trend in the slopes of the SD of SSTA found in this study might be among
the first evidence of climate change drawn from SST observation data. From the results, we
found that the warming signal increased in both the Southern and Northern hemispheres. The
trend of SSTA changes could therefore be one way to predict climate change and how it
would affect coral reef ecosystems. Careful monitoring of SST trends is required before
assessing their impact on the weather and climate of the region and the associated
socio-economic implications.

REFERENCES
1. Peñuelas, J., and Filella, I., Science, 2001, 308, 793-95.
2. Thomas, C. D., Cameron, A., Green, R. E., Bakkenes, M., Beaumont, L. J., Collingham,
Y. C., Erasmus, B. F. N., de Siqueira, M. F., Grainger, A., Hannah, L., Hughes, L.,
Huntley, B., van Jaarsveld, A. S., Midgley, G. F., Miles, L., and Ortega-Huerta, M. A.,
Nature, 2004, 427, 145-48.
3. Hobson, V. J., McMahon, C. R., Richardson, A., and Hays, G. C., Deep-Sea Research,
2008, 55, 115-162.
4. IPCC, Climate Change 2007: The Physical Science Basis. Contribution of Working Group
I to the Fourth Assessment Report of the Intergovernmental Panel on Climate Change,
Cambridge University Press, Cambridge, 2007.
5. Folland, C. K., Karl, T. R., Christy, J. R., Clarke, R. A., Gruza, G. V., Jouzel, J., Mann,
M. E., Oerlemans, J., Salinger, M. J., and Wang, S. W., Observed climate variability and
change, in Houghton, J. T., Ding, Y., Griggs, D. J., Noguer, M., van der Linden, P. J.,
Dai, X., Maskell, K., and Johnson, C. A. (Eds.), Climate Change 2001, Cambridge
University Press, Cambridge, 2001, 881.
6. Levitus, S., Antonov, J. I., Boyer, T. P., and Stephens, C., Science, 2000, 287, 2225-29.
7. Gan, Z., Yan, Y., and Qi, Y., J. Atmos. Ocean. Tech., 2007, 24(4), 681-87.
8. Hobson, V. J., McMahon, C. R., Richardson, A., and Hays, G. C., Deep-Sea Research,
2008, 55, 115-62.
9. Khan, T. M. A., Khan, F. A., and Jilani, R., J. Basic Applied. Sci., 2008, 4(2), 67-72.
10. Reynolds, R. W., and Smith, T. M., J. Climate, 1994, 7, 929-48.
ACKNOWLEDGMENTS
This work was supported in part by the Thailand Research Fund through the Royal Golden
Jubilee Ph.D. Program (Grant No. PHD/0307/2550), and Center of Excellence for
Ecoinformatics, the Institute of Research and Development, NECTEC/Walailak University.
A00017
ANSCSE14 Mae Fah Luang University, Chiang Rai, Thailand
March 23-26, 2010
39
A Conditional Random Fields-Based Method for
CpG Island Prediction in Rice


P. Udomwong¹, V. S. Lee²,C*, S. Anuntalabhochai³, and J. Chaijaruwanich¹,C**

¹Bioinformatics Research Lab, Faculty of Science, Chiang Mai University, Chiang Mai, Thailand
50200
²Department of Chemistry, Computational Simulation and Modeling Laboratory (CSML), Faculty of
Science, Chiang Mai University, Chiang Mai, Thailand 50200
³Department of Biology, Faculty of Science, Chiang Mai University, Chiang Mai, Thailand 50200
E-mail: C* vannajan@gmail.com; Fax: 053-892275; Tel. 053-941971
        C** jeerayut@science.cmu.ac.th; Fax: 053-943439; Tel. 053-943455


ABSTRACT
CpG islands (CGIs) are clusters of CpG dinucleotides in GC-rich regions. In the rice
genome, genes are grouped into one of five classes by the position of the CpG cluster
associated with each gene. The difference in methylation level between these gene
classes in rice suggests that CpG islands may be useful for deducing the expression
patterns of uncharacterized genes. In this work, Conditional Random Fields (CRFs), a
machine learning approach, are applied to predict CGIs associated with the promoter
regions of genes. The experiments rely on the available genomic sequence data from
the RiceGAAS database for rice chromosome 1. The results from our CRFs-based
model are consistent with previous Northern hybridization analyses.

Keywords: Conditional Random Fields (CRFs), CpG islands, CpG clusters, CGIs,
machine learning, methylation, rice.


1. INTRODUCTION
CpG islands (CGIs) are CpG clusters or CpG-rich regions. A number of reports point out
the role of CGIs and their methylation in the regulation of mammalian gene expression. In
plant genomes, CpG methylation plays an integral role in regulating development by
repressing the transcription of genes. In the plant genome, methylcytosines occur in both
symmetrical and asymmetrical sequence contexts, CpG and CpNpG. Treatment of plant DNA
with methylation-sensitive restriction enzymes and investigations of plant genomic DNA have
suggested the existence of undermethylated CpG-rich regions analogous to vertebrate CpG
islands. However, information about plant CpG islands, especially in rice (a monocot), has
not been fully exposed.

Most of these CpG clusters in the rice genome, which cover about 80% of rice genes, are
correlated with gene expression. Rice genes are grouped into one of five classes according to
the position of the associated CpG cluster. In class 1 genes, a CpG cluster is located around
the 5′-end of the associated gene. In class 2 genes, a CpG cluster covers the whole gene
region. In class 3 genes, a CpG cluster occurs noticeably downstream of the 5′-end. Class 4
genes contain a CpG cluster around the 5′-end of the associated gene and another cluster
downstream of the 5′-end. Class 5 genes lack CpG clusters. Whether a rice gene is expressed
in a single tissue or in several can be described by the position of its CGIs. These results
suggest that plant CpG islands may be useful for deducing the expression patterns of
uncharacterized genes.

To better understand the genomic features of CGIs in rice, a conditional random field
(CRF) is applied as a CGI investigator for each gene cluster. CRF, a supervised
machine-learning technique, is a framework for building discriminative probabilistic models
to segment and label sequential data (Lafferty et al., 2001; Wallach, 2004; Sutton and
McCallum, 2006). CRFs were first introduced as a tool for natural language problems
(NLPs): part-of-speech (POS) tagging, chunking, and text segmentation (Sha and Pereira,
2003). CRF models contribute to bioinformatics by formulating problems as labeling
problems.

The present approach is proposed to classify the CGI clusters among rice genes. The case
study is done with 3 main CGI classes: class 1, class 2, and class 5. In particular, class 1
genes, which form the largest class and show single-tissue-specific expression, are taken as
class members with a positive label; the others are taken as non-members. The studies here
incorporate biochemical features of the rice genome, such as DNA sequence properties and
DNA structure, into computational methods to predict the strength of each CGI
quantitatively. The 5-fold cross-validation models were trained and tested on 107 sequences
from class 1, 88 sequences from class 2, and 40 sequences from class 5, obtained from the
RiceGAAS.
Table 1. Grouping of nucleic acids according to their biochemical properties (Dougherty et al., 2006)

Biochemical property    Property            Group
Type of base            Purine              R = {A, G}
                        Pyrimidine          Y = {C, T}
Strength of links       Double H bond       W = {A, T}
                        Triple H bond       S = {C, G}
Radical content         Amino group (NH2)   M = {A, C}
                        Keto group (C=O)    K = {G, T}


2. THEORY AND RELATED WORKS
Conditional Random Fields Preliminaries: Conditional random fields (CRFs) (Lafferty et
al., 2001) are an undirected graphical model, initially introduced to solve the problem of
segmenting and labeling sequential data in fields such as bioinformatics and natural language
processing. The main idea is to represent the conditional probability distribution over the
hidden variables given the observations. In sequence labeling problems, the purpose of the
technique is to predict the sequence of labels y_i = {y_i1, y_i2, ..., y_iT} corresponding to a
sequence of observations, where each data item x_i is a sequence of observations
{x_i1, x_i2, ..., x_iT}. The dependencies among the labeled components of a random variable Y
are represented by an undirected graph G = (V, E).

Definition: Let G = (V, E) be a graph such that Y = (Y_v), v ∈ V, so that Y is indexed by the
vertices of G. Then (X, Y) is a conditional random field when, conditioned on X, the random
variables Y_v obey the Markov property with respect to the graph:
p(Y_v | X, Y_w, w ≠ v) = p(Y_v | X, Y_w, w ~ v), where w ~ v means that w and v are
neighbors in G.

Therefore, by the fundamental theorem of linear-chain random fields, the joint distribution
over the label sequence Y given X has the form (Sha and Pereira, 2003)

    p(Y | X) = (1 / Z(X)) exp( Σ_{i=1}^{T} Σ_k λ_k F_k(y_{i−1}, y_i, X, i) )        (1)

where Z(X) = Σ_{s ∈ S} exp( Σ_{i=1}^{T} Σ_k λ_k F_k(y_{i−1}, y_i, X, i) ) is a normalization
term over the whole space of state sequences, and the F_k(y_{i−1}, y_i, X, i) are feature
functions, each of which is either a state feature function or a transition feature function. We
assume that the feature functions are fixed, and denote by λ = {λ_k} a weight vector to be
learned from a training data set.

For the training step, the likelihood function of the CRF is used to find a global maximum
value for the training label sequences to make the state sequences explicit.

Training CRFs: Let D = {(x^k, y^k)}_{k=1}^{T} be the training data set. CRFs are trained by
finding the weight vector Λ = {λ_1, λ_2, ..., λ_k} that maximizes the log likelihood

    L_Λ = Σ_k log p(y^k | x^k) = Σ_k [ λ · F(y^k, x^k) − log Z(x^k) ]        (2)

Inference in CRFs: Inference in CRFs is to find the state sequence y that is most likely given
the observation sequence x. As mentioned in earlier studies, the most likely sequence under
p(y | x) can be found with the Viterbi algorithm, which can be formally written as

    y* = argmax_y p(y | x) = argmax_y λ · F(y, x)        (3)
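As a hedged illustration of equation (3), the short Python sketch below performs Viterbi decoding over a linear chain; the score arrays stand in for the weighted feature sums λ·F and are hypothetical rather than taken from this work.

    import numpy as np

    def viterbi(emit, trans):
        # emit: (T, S) state scores per position; trans: (S, S) transition scores
        T, S = emit.shape
        score = np.empty((T, S)); back = np.zeros((T, S), dtype=int)
        score[0] = emit[0]
        for t in range(1, T):
            cand = score[t - 1][:, None] + trans + emit[t][None, :]
            back[t] = cand.argmax(axis=0)      # best previous state per current state
            score[t] = cand.max(axis=0)
        path = [int(score[-1].argmax())]
        for t in range(T - 1, 0, -1):          # trace the best path backwards
            path.append(int(back[t][path[-1]]))
        return path[::-1]

    # toy example with states 0 = NON_CGIs_Class1 and 1 = CGIs_Class1
    emit = np.array([[1.0, 0.2], [0.1, 1.5], [0.3, 1.2], [1.1, 0.0]])
    trans = np.array([[0.5, -0.2], [-0.2, 0.5]])
    print(viterbi(emit, trans))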

Conditional Random Fields for CGIs prediction: As an example, the part-of-speech tagging
task is similar to promoter region prediction, where words are categorized as different types.
In the case of CGI prediction, CRFs (conditional random fields) are used to discriminate
particular sequences. The observation space is annotated by the four types of nucleotide in
DNA: adenine (A), cytosine (C), guanine (G), and thymine (T), so x belongs to X where
X = {A, C, G, T}.

State set: Consequently, the types of CGI class are assigned to the hidden states
Y = (y_i)_{i=1}^{n}. This is similar to a token in a sentence. The representations of the hidden
states are described below, and the CRF focuses on an upstream sequence.
NON_CGIs_Class1 annotates everything that does not belong to the CGIs Class 1 members,
and CGIs_Class1 symbolizes the members of CGIs Class 1. Therefore the label space of
Y = (y_i)_{i=1}^{n} is A, where A = {NON_CGIs_Class1, CGIs_Class1}. In the present work,
the biochemical properties of nucleic acids can also be grouped into sub-classes. All of the
properties in Table 1 are expected to provide additional information about the signal of
promoter regions. The three main additional properties of nucleic acids are shown in Table 1.

3. MATERIALS AND DESIGNED EXPERIMENTS
The training and test data sets are imported from the RiceGAAS. The gene sequences
comprise 107 class-1 sequences, 88 class-2 sequences, and 40 class-5 sequences. To analyze
the performance of the CRF methodology, several features were considered in addition to
the effective additional biochemical properties. All designed experiments were generated and
tested against the same dataset. The resulting 5-fold cross-validation accuracies are
summarized for each model in Table 2. CRF++ (Kudo, 2005), an implementation of CRFs,
was used to build the models from the input data.
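As a hedged sketch of this step, the fragment below writes one token per line in the column format that CRF++ consumes, with one feature column per biochemical grouping from Table 1 and the label in the last column; the file name and example labels are hypothetical.

    BASE_TYPE = {'A': 'R', 'G': 'R', 'C': 'Y', 'T': 'Y'}   # purine / pyrimidine
    BOND      = {'A': 'W', 'T': 'W', 'C': 'S', 'G': 'S'}   # double / triple H bond
    RADICAL   = {'A': 'M', 'C': 'M', 'G': 'K', 'T': 'K'}   # amino / keto group

    def write_crfpp_rows(seq, labels, out):
        # one row per base: base, feature columns, label; a blank line ends a sequence
        for base, label in zip(seq, labels):
            out.write(f"{base}\t{BASE_TYPE[base]}\t{BOND[base]}\t{RADICAL[base]}\t{label}\n")
        out.write("\n")

    with open("train.data", "w") as out:   # hypothetical file name
        write_crfpp_rows("ACGTGCGC",
                         ["NON_CGIs_Class1"] * 4 + ["CGIs_Class1"] * 4, out)

Training would then proceed with the usual CRF++ command-line tools (crf_learn with a feature template, crf_test for prediction).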

A00017
ANSCSE14 Mae Fah Luang University, Chiang Rai, Thailand
March 23-26, 2010
42
4. RESULTS AND DISCUSSION
Precision and recall are the two measures used to evaluate the CRF-based method for
classifying CGIs. They are defined as follows, where TP is a true positive (the manual
CGIs_Class1 label matches the model-predicted label), FN is a false negative (a manual
CGIs_Class1 label predicted as NON_CGIs_Class1), FP is a false positive (a manual
NON_CGIs_Class1 label predicted as CGIs_Class1), and TN is a true negative (the manual
NON_CGIs_Class1 label matches the model-predicted label).
    precision_member = TP / (TP + FP)        precision_not member = TN / (TN + FN)

    recall_member = TP / (TP + FN)           recall_not member = TN / (TN + FP)

    precision = (precision_member + precision_not member) / 2

    recall = (recall_member + recall_not member) / 2

    F_1 = 2 (precision · recall) / (precision + recall)


Table 2. 5-fold cross-validation performance of the predictions for different types of input
sequences

Type of input sequence   pre_member   pre_not member   pre    rec_member   rec_not member   rec    F_1
DNA                      0.49         0.57             0.53   0.44         0.62             0.53   0.53
Base types               0.45         0.51             0.48   0.42         0.54             0.48   0.48
Bond strength            0.59         0.63             0.61   0.52         0.69             0.60   0.61
Radical content          0.50         0.55             0.59   0.42         0.55             0.53   0.53
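A minimal sketch of the evaluation step, computing the macro-averaged measures defined above from hypothetical confusion counts:

    def crf_metrics(tp, fp, fn, tn):
        # macro-averaged precision, recall and F1 as defined in the text
        precision = (tp / (tp + fp) + tn / (tn + fn)) / 2
        recall    = (tp / (tp + fn) + tn / (tn + fp)) / 2
        f1 = 2 * precision * recall / (precision + recall)
        return precision, recall, f1

    print(crf_metrics(tp=520, fp=360, fn=480, tn=640))   # illustrative counts only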

Considering the results, there is a relative correlation between CGI class and the
biochemical properties. The factor that most influences the results in our experiment is bond
strength. The predictive ability of this property is higher than that of all our other selected
features, since C and G are linked to each other by a triple hydrogen bond (S) while A and T
are annotated by W (double hydrogen bond). This indicates that an S-rich region can be
inferred to be a CGI cluster. However, this choice is not necessarily the best. In order to
improve the discriminative performance and expose the correlation between CGI clusters and
other physical properties, we plan to consider DNA with one or more items of additional
information in the near future.


5. CONCLUSION
A conditional-model-based method has been introduced to predict promoters as CGI
clusters in DNA sequences by formulating the problem as POS (part-of-speech) tagging. CGI
cluster classification is formulated by the membership of each CGI cluster. The case
considered in this work is the largest cluster among all five clusters, with single-tissue-specific
expression; however, this framework can be applied to other cases that require a specific
condition. Our method also includes biochemical information in the model, which gives better
results. This enlightens us about how each property affects CGIs, for further analysis in the
laboratory. In the future, we plan to incorporate one or more CGI-related features into our
method in order to capture more realistically the natural constraints on the model.


REFERENCES
1. Ashikawa, I. The Plant Journal 2001, 26, 617-625.
2. Ashikawa, I. DNA Res 2002, 9, 131-134.
3. Ashikawa, I.; Numa, H.; Sakata, K. Molecular Genetics and Genomics 2006, 275, 18-25.
4. Lafferty, J., McCallum, A., and Pereira, F., in Proceedings of the Eighteenth International
Conference on Machine Learning (ICML 2001), 2001, 282-289.
5. Sakata, K.; Nagamura, Y.; Numa, H.; Antonio, B. A.; Nagasaki, H.; Idonuma, A.;
Watanabe, W.; Shimizu, Y.; Horiuchi, I.; Matsumoto, T.; Sasaki, T.; Higo, K. Nucl. Acids
Res. 2002, 30, 98-102.
6. Shahmuradov, I., Gammerman, A., Hancock, J. M., Bramley, P. M., and Solovyev, V. V.
Nucleic Acids Res 2003, 31, 114 - 117.
7. Shahmuradov, I. A.; Solovyev, V. V.; Gammerman, A. J. Nucleic Acids Research 2005,
33, 1069 - 1076.
8. Sutton, C., and McCallum, A., in Getoor, L., and Taskar, B. (Eds.), Introduction to
Statistical Relational Learning, MIT Press, 2007.


ACKNOWLEDGMENTS
The research described in this paper was partially supported by the Graduate School of
Chiang Mai University, Chiang Mai University, the Center for Innovation in Chemistry:
Postgraduate and Research Program in Chemistry (PERCH-CIC), and the National Center for
Genetic Engineering and Biotechnology (BIOTEC), Thailand. We would also like to thank
R. Cutler for sharing his experience in biology with us and for improving our rationale.

A00019
ANSCSE14 Mae Fah Luang University, Chiang Rai, Thailand
March 23-26, 2010
44
A study of niche adaptation in Cyanobacteria via
evolutionary scenario of photosynthetic machinery

V. Wanchai¹,C, P. Prommeenate³, K. Paithoonrangsarid³, A. Hongsthong², J. Senachak³,
J. Panyakampol⁴, V. Plengvidhya², and S. Cheevadhanarak¹,⁴

¹Bioinformatics Program, School of Bioresources and Technology and School of Information
Technology, King Mongkut's University of Technology Thonburi, Bangkok, 10140, Thailand
²National Center for Genetic Engineering and Biotechnology,
National Science and Technology Development Agency, Pathumthani 12120, Thailand
³Biochemical Engineering and Pilot Plant Research and Development Unit,
National Center for Genetic Engineering and Biotechnology, King Mongkut's University of Technology
Thonburi, Bangkok 10150, Thailand
⁴School of Bioresources and Technology, King Mongkut's University of Technology Thonburi,
Bangkok 10150, Thailand
C E-mail: visanuw86@gmail.com



ABSTRACT

Cyanobacteria are among the oldest members of the bacterial domain and are also the
most abundant photosynthetic organisms on earth. They have colonized almost every
available niche over the last 3.5 billion years, demonstrating an excellent ability to
adapt to extremely different habitats. Thus, it is not surprising to find them in extreme
environments with a very poor nutrient supply: thriving across wide temperature
ranges, at the limits of pH values, in salt solutions, under UV radiation, dryness, heavy
metals, in anaerobic niches, under various levels of illumination, and under hydrostatic
pressure. As photosynthetic organisms, one of the targets of environmental stresses in
cyanobacteria is the set of stress-sensitive sites in the photosynthetic machinery [1].
This makes the photosynthesis genes an interesting subject for studying how each
cyanobacterial group has adapted to cope with the variety of environmental stresses
through evolution. However, the photosynthesis genes underlying the niche-specific
adaptation of cyanobacteria are poorly understood. To address this issue, the unique
niche-specific photosynthesis genes of cyanobacteria are revealed by analyzing gene
gain and loss scenarios during evolution [2]. Some photosynthetic machinery genes,
such as those for CO2 fixation, show the adaptation of cyanobacteria to different
niches. The results obtained will contribute to a big picture of understanding
cyanobacteria. Moreover, the knowledge gained from the photosynthesis gene group of
cyanobacteria will contribute not only to the understanding of their photosynthesis
systems and niche adaptation, but also to the strain improvement of industrially
important strains, such as Spirulina platensis, and to breeding programs of higher
plants.

Keywords: Cyanobacteria, Evolution, Niche adaptation, Photosynthesis.



1. INTRODUCTION
Being ancient colonizers of the earth, cyanobacteria are thought to be ancient phototrophs
that drove the formation of atmospheric oxygen [3]. Moreover, it has been suggested that
photosynthesis originated in the cyanobacterial lineage, giving rise to the photosynthetic
ability of present-day cyanobacteria. As photosynthetic organisms, stress-sensitive sites in the
photosynthetic machinery are among their most sensitive targets when they face
environmental stresses [1]. For these reasons, the photosynthesis genes are a most interesting
subject for studying the adaptation of cyanobacteria to a variety of environmental stresses;
even so, the photosynthesis genes underlying the niche adaptation of cyanobacteria are poorly
understood.
To address this issue, 36 complete genomes and an additional 13 draft genomes of
cyanobacteria were studied from an evolutionary perspective, namely gene gain and loss
during evolution. The results obtained will contribute to a big picture of understanding
cyanobacteria. Moreover, the knowledge gained from the photosynthesis gene group of
cyanobacteria will contribute not only to the understanding of their photosynthesis systems
and niche adaptation, but also to the strain improvement of industrially important strains,
such as Spirulina platensis, and to breeding programs of higher plants.


2. THEORY AND RELATED WORKS
2.1 Photosynthesis in cyanobacteria
Photosynthesis is the way photoautotrophs gain their energy by harvesting light.
However, there are some differences in the photosynthetic apparatus between plants and
cyanobacteria, such as the reaction center types and antenna systems. In the photosynthetic
machinery of cyanobacteria, the phycobilisome serves as the primary light-harvesting antenna
for Photosystem II, whereas some marine picocyanobacteria, such as Prochlorococcus, use a
chlorophyll a2/b2 light-harvesting complex. The photosynthesis antenna proteins and
light-harvesting complex are shown in Figure 1.



Figure 1: The reference photosynthesis antenna proteins and light-harvesting complex from
the KEGG databases.

2.2 Evolutionary analysis
Several approaches have been employed to study evolution, for example: using
presence-absence of genomes in clusters of orthologous genes, conservation of local gene
order (gene pairs) among prokaryote genomes, using the distribution of identities of probable
orthologs, comparing trees constructed from multiple protein families, and analyzing
concatenated alignments of ribosomal proteins [4]. The tree based on presence-absence of
genomes in orthologous clusters and the trees based on conserved gene pairs appear to be
strongly affected by gene loss and horizontal gene transfer. The trees based on identity
distributions for orthologs, and particularly the tree made from concatenated ribosomal
protein sequences, seem to carry a stronger phylogenetic signal.
3. COMPUTATIONAL DETAILS
FASTA-formatted protein sequences of 36 complete genomes and an additional 13 draft
genomes of cyanobacteria were retrieved from the NCBI database
(http://www.ncbi.nlm.nih.gov/). The niche of each cyanobacterium used in this study was
collected at the same time. All protein sequences of the 49 cyanobacterial genomes were
clustered using the cluster of orthologous groups (COG) method [5] with OrthoMCL [6] at an
e-value cut-off of e-05. The COG function categories of the resulting protein clusters
(cyanoCOGs) were assigned using BLAST against the COGs database [7]; mismatched
cyanoCOGs were marked for further curation. The photosynthesis-related COGs in the
cyanoCOGs were then collected. After that, a pattern of gene presence and absence across the
cyanobacteria, a phyletic pattern, was constructed to analyze differences in the photosynthesis
genes of those organisms. The evolutionary scenario of each selected photosynthesis gene
was revealed using the constructed phylogenetic tree and the parsimonious evolutionary
scenarios algorithm [8]. The evolutionary scenarios describing the niche adaptation of each
cyanobacteria group were analyzed with in-house Python programs.
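Because the scenario analysis was done with in-house Python programs, a minimal sketch of the phyletic-pattern step might look like the following; the cluster contents and genome names are hypothetical.

    # build a phyletic (presence/absence) pattern from orthologous clusters
    clusters = {
        "cyanoCOG0001": {"Synechocystis", "Nostoc", "Prochlorococcus"},
        "cyanoCOG0002": {"Synechocystis", "Nostoc"},   # absent in picocyanobacteria
    }
    genomes = ["Synechocystis", "Nostoc", "Prochlorococcus"]

    phyletic = {cog: "".join("1" if g in members else "0" for g in genomes)
                for cog, members in clusters.items()}
    for cog, pattern in sorted(phyletic.items()):
        print(cog, pattern)   # e.g. cyanoCOG0002 -> 110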


4. RESULTS AND DISCUSSION
4.1 Clusters of orthologous proteins in cyanobacteria
Clustering the 182,663 proteins encoded in the 49 cyanobacterial genomes (36 complete
genomes and an additional 13 draft genomes) yielded 15,741 clusters [cyanobacterial clusters
of orthologous groups of proteins (cyanoCOGs)]. There are 11 groups and 195 photosynthetic
machinery genes occurring in the cyanoCOGs: ATP synthases, water-soluble electron
carriers, cytochrome b6f complex subunits, cytochrome c oxidases, photosystem II proteins,
photosystem I proteins, carotenoid biosynthesis, chlorophyll biosynthesis, phycobilisome,
NAD(P)H dehydrogenases, and CO2 fixation, respectively. The distribution patterns of the
presence or absence of genes related to the photosynthetic machinery in the cyanoCOGs show
a rough relation between genome size and the niche adaptation of each cyanobacterium, such
as the poor representation of photosynthetic machinery genes in the marine picocyanobacteria,
which survive in nutrient-poor and constant oceanic environments.

4.2 Evolutionary scenarios of the photosynthetic machinery and niche adaptation of
cyanobacteria
All cyanobacteria members were divided into 7 groups according to their biological
properties and environmental niches: I, extreme environmental niches; II, uncharacterized
niches; III, filamentous nitrogen-fixing cyanobacteria; IV, filamentous non-nitrogen-fixing
cyanobacteria; V, unicellular nitrogen-fixing cyanobacteria; VI, obligate photoautotrophs; and
VII, marine picocyanobacteria. The groups of cyanobacteria analyzed are summarized in
Table 1.

Table 1: Cyanobacteria groups divided by their biological properties and environmental niches

Group  Description                          Inferred biological property and environmental niche
I      GVI7421, SJA33AB, SJA23BA Ancestor   Extreme environmental niches
II     TELOBP1, AMA1017, CPC7425 Ancestor   Uncharacterized
III    Nostocales Ancestor                  Filamentous nitrogen fixing cyanobacteria
IV     Oscillatoriales Ancestor             Filamentous non-nitrogen fixing cyanobacteria
V      Chroococcales Ancestor               Unicellular nitrogen fixing cyanobacteria
VI     Synechococcus elongatus Ancestor     Obligate photoautotrophs
VII    Picocyanobacteria Ancestor           Marine cyanobacteria, small genome sizes

The study of the evolutionary scenarios of the photosynthetic machinery and niche
adaptation was performed with the parsimonious evolutionary scenarios algorithm. Table 2
shows the distribution of gained and lost photosynthetic machinery genes through evolution.
166 of the 195 photosynthetic machinery genes were assigned as essential genes that the last
common ancestor of cyanobacteria used to perform photosynthesis. All genes involved in
ATP synthase and chlorophyll biosynthesis were inherited by all cyanobacteria without gain
or loss during evolution. The vast majority of the lost photosynthetic genes were lost in
marine cyanobacteria. Moreover, the CO2 fixation genes were inherited from the last
common ancestor by all cyanobacteria, whereas the Prochlorococcus ancestor extensively lost
genes in this group. In contrast, genes were extensively gained in the freshwater
Synechococcus ancestor, the cyanobacteria inhabiting extreme environmental niches, and the
nitrogen-fixing filamentous cyanobacteria (Nostocales). In the case of acquisition, we found
different sources for some gained genes: for a group of phycobilisome linker proteins,
freshwater cyanobacteria inherited these genes from the last common ancestor, while
picocyanobacteria gained phycobilisome proteins from other sources.


Table 2: Distribution of photosynthesis-related genes in cyanobacterial genomes via
evolutionary scenarios
[Table shown as an image in the original proceedings.]

The distribution of photosynthesis-related genes via evolutionary scenarios in the 7 groups of 49
cyanobacterial genomes described in Table 1. Ancestral gene set: genes occurring in the last common
ancestor of cyanobacteria; Y: genes gained by HGT, gene duplication, or genomic recombination;
N: genes lost via evolution.


5. CONCLUSION
In this study, the evolutionary scenarios of the photosynthetic machinery genes among the
cyanobacterial clades were analyzed by comparative genomics and phylogenetic analysis. We
found that the main photosynthetic machinery genes of cyanobacteria were inherited from
their ancestral photoautotrophs, but several photosynthesis genes, such as the phycobilisome
genes, were gained by extensive HGT, genomic recombination, or gene duplication.
Moreover, some genes show the niche adaptation of particular cyanobacteria, such as the
enormous loss of genes involved in CO2 fixation in the marine picocyanobacteria, which
survive in nutrient-poor and constant oceanic environments, whereas the filamentous
nitrogen-fixing cyanobacteria living in fluctuating environments have an abundance of
photosynthetic machinery genes. Further studies combining global approaches such as
molecular analyses, transcriptomics, proteomics, and metabolomics will lead to novel insight
into the photosynthesis systems and niche adaptation mechanisms of cyanobacteria.



REFERENCES
1. Murata, N., Takahashi, S., Nishiyama, Y., and Allakhverdiev, S. I., Photoinhibition of
photosystem II under environmental stress, BBA, 2006, Vol. 1767, 414-421.
2. Makarova, K., Sorokin, A., Novichkov, P., Wolf, Y., and Koonin, E., Clusters of
orthologous genes for 41 archaeal genomes and implications for evolutionary genomics
of archaea, Biology Direct, 2007, Vol. 2, 33.
3. Bekker, A., Holland, D., Wang, L., Rumble, D., III, Stein, J., Hannah, L., Coetzee, L.,
and Beukes, J., Dating the rise of atmospheric oxygen, Nature, 2004, Vol. 427, 117-120.
4. Wolf, Y., Rogozin, B., Grishin, V., Tatusov, L., and Koonin, E., Genome trees constructed
using five different approaches suggest new major bacterial clades, BMC Evolutionary
Biology, 2001, Vol. 1, No. 8.
5. Makarova, K., Slesarev, A., Wolf, Y., Sorokin, A., Mirkin, B., Koonin, E., et al.,
Comparative genomics of the lactic acid bacteria, Proceedings of the National Academy
of Sciences of the United States of America, 2006, Vol. 103, No. 42, 15611-15616.
6. Li, L., Stoeckert, C. J., Jr., and Roos, D. S., OrthoMCL: Identification of ortholog groups
for eukaryotic genomes, Genome Research, 2003, Vol. 13, No. 9, 2178-2189.
7. Tatusov, R. L., Galperin, M. Y., Natale, D. A., and Koonin, E. V., The COG database: a
tool for genome-scale analysis of protein functions and evolution, Nucleic Acids
Research, 2000, Vol. 28, No. 1, 33-36.
8. Mirkin, B. G., Fenner, T. I., Galperin, M. Y., and Koonin, E. V., Algorithms for computing
parsimonious evolutionary scenarios for genome evolution, the last universal common
ancestor and dominance of horizontal gene transfer in the evolution of prokaryotes, BMC
Evolutionary Biology, 2003, Vol. 3, No. 2.
9. Vermaas, W., Photosynthesis and respiration in cyanobacteria, Encyclopedia of Life
Sciences, 2001.
10. Mulkidjanian, A., Koonin, E., Makarova, K., Mekhedov, S., Sorokin, A., Wolf, Y., et al.,
The cyanobacterial genome core and the origin of photosynthesis, PNAS, 2006, Vol. 103,
No. 35, 13126-13131.
11. Rothschild, L., The evolution of photosynthesis...again?, Philosophical Transactions of
the Royal Society B, 2008, Vol. 363, 2787-2801.



ACKNOWLEDGMENTS
This study was supported by King Mongkut's University of Technology Thonburi (KMUTT)
and the National Center for Genetic Engineering and Biotechnology (BIOTEC), Thailand.
A00021
ANSCSE14 Mae Fah Luang University, Chiang Rai, Thailand
March 23-26, 2010
50
Cross Association of Sea Surface
Temperature of 13 Sites in Thailand

U. Kuhapong, K. Jaroensutasinee, and M. Jaroensutasinee
Center of Excellence for Ecoinformatics and Computational Science Graduate Program, School of
Science, Walailak University 222, Thaiburi, Thasala, Nakhon Si Thammarat, 80161, Thailand
E-mail: rkuthai@gmail.com, krisanadej@gmail.com, jmullica@gmail.com;
Fax: 086-4795011; Tel. 086-4707199



ABSTRACT
This study aimed at examining the cross association of sea surface temperature data
among 13 sites around Thailand, in both the Andaman Sea and the Gulf of Thailand.
Sea surface temperature data were provided by the NOAA NCEP EMC CMB Global
Reyn_SmithOIv2 weekly dataset in the IRI/LDEO Climate Data Library
(http://ingrid.ldeo.columbia.edu). The temporal range was from November 1981 to
October 2009. Prior to computing the cross association, the data were cross-calculated
for their associations among themselves using linear regression. The R², y-intercepts
and slopes of these linear regressions were further analyzed with a cluster analysis.
Surprisingly, we found that the groups of locations in the Andaman Sea suffered from
higher temperatures than those in the Gulf of Thailand.

Keywords: Sea Surface Temperature, Correlation Analysis, Thailand



REFERENCES
1. Khan, T. M. A., Razzaq, D. A., Chaudary, Q. Z., Quadir, D. A., Kabir, A., and
Sarker, M. A., Natural Hazards, 2004, 31(2), 549-560.
2. Reynolds, R. W., Gentemann, C. L., and Wentz, F., J. Climate, 2004, 17, 2938-2952.
3. Walther, G. R., Post, E., Convey, P., Menzel, A., Parmesan, C., Beebee, T. J. C.,
Fromentin, J. M., Hoegh-Guldberg, O., and Bairlein, F., Nature, 2002, 416, 389-395.

A00025
ANSCSE14 Mae Fah Luang University, Chiang Rai, Thailand
March 23-26, 2010
51
Automatic Measurement of Plant Growth Using Region
Growing Method

S. Chuai-Aree¹,C, S. Siripant² and W. Jaeger³

¹Department of Mathematics and Computer Science, Faculty of Science and Technology, Prince of
Songkla University, Pattani Campus, 181, Rusamilae, Muang, Pattani, 94000, Thailand
²Advanced Virtual and Intelligent Computing (AVIC), Department of Mathematics, Faculty of Science,
Chulalongkorn University, Bangkok, 10330, Thailand
³Interdisciplinary Center for Scientific Computing (IWR), University of Heidelberg, Im Neuenheimer
Feld 368, Heidelberg 69120, Germany
C E-mail: csomporn@bunga.pn.psu.ac.th; Fax: 073-312179; Tel. 080-4261112

ABSTRACT
Measurement of plant growth is very important for investigating plant growth
functions. Each plant has its own growth properties over time. Image processing
methods are applied to measure growth properties from the image sequences of
experiments. This paper proposes an automatic method using image processing for
plant growth measurement. The algorithms, the so-called region growing method
(RGM) and volume growing method (VGM), are used for noise removal. Overlapping
leaf areas are considered in the leaf area measurement. The software in this paper is
implemented in Delphi as user-friendly interactive software, with the OpenGL library
for 3D graphics visualization. Final comparisons of the leaf areas from actual leaf
measurements and from our algorithms are illustrated. The algorithm can be applied to
the growth of other plants in further studies.

Keywords: Plant Growth, Image Processing, Region Growing Method, Automatic
Growth Measurement.



1. INTRODUCTION
Plant photosynthesis is the important process for plant growth, and it mostly occurs in
the leaf area. For measuring plant growth, the leaf area plays an important role in biomass
growth. Leaf area can be measured with leaf area equipment or a scanner, but these are
destructive methods. An interesting problem in leaf growth segmentation is the leaf
overlapping phenomenon, since younger leaves lie over the older leaves in the top-view
image. Liu and Sclaroff presented region segmentation via deformable model-guided split
and merge for leaf segmentation from compound leaves [4]. Lu et al. proposed a method of
using segmentation of color images and shape factors with frontier rates to identify onions
and weeds in the field [6]. The overlapping leaves can be subtracted from each other, but the
regions of the leaf below are then separated. Manh et al. used deformable templates for weed
leaf image segmentation [7]; the segmentation applied a priori knowledge of the searched
object.
A method for spectral band selection and testing of edge-subtraction leaf segmentation
was introduced by Noble and Brown [8]. It used a combination of a vegetation mask and an
edge strength image that was calculated for a band providing contrast for leaf-leaf
boundaries. In this paper, we propose an algorithm for leaf growth measurement using a
webcam set up in a fixed position. Leaf segmentation proceeds using the algorithm from
[1,2]. In this experiment we used Arabidopsis thaliana [5] as a case study.
The paper is organized as follows: theory and related works, experiments on leaf surface
segmentation, results and discussion, and finally the conclusion and further work.



2. THEORY AND RELATED WORKS
The flow diagram of the algorithm for leaf growth measurement (ALGM) is given in
Figure 1. The growth measurement process starts by reading the list of images ordered by
time; then the region growing method is applied. Segmented regions from all images are
combined over time, and the 3D information is reconstructed. Some noise remains in the
combined 3D volume, so noise removal in the 3D volume is used to eliminate noise regions,
and the 2D segmentation is improved after the 3D noise removal. Leaf growth measurement
provides growth data from the leaf area for each time step. The growth curve is plotted and
the parameters of a sigmoid function are estimated. Finally, the segmented images and the
growth function with its parameter values are provided as the plant's growth properties.

Figure 1. Flow diagram of algorithm for leaf growth measurement in time.

3. EXPERIMENTAL ON LEAF SURFACE SEGMENTATION
This section describes the methods for leaf and plant segmentation. We define two
parameters for segmenting the leaf region from the input image, namely Ldv (Leaf Color
Difference Value) and Lcv (Leaf Color Value).



Figure 2. Example of input plant growth images for every 20 frames (16 hours 15 minutes),
the video was done by Dr. Nick Kaplinsky (Swarthmore College, PA).




Leaf Area Segmentation by Color Difference and Color Value
Let I be a set of input images with width W and height H, P the set of pixels in I (P ⊆ I),
B the set of background pixels, L the set of leaf pixels, and p_{i,j} the pixel in column i and
row j. The pixel p_{i,j} consists of four elements, namely red (R), green (G) and blue (B) for
the color image, and gray (Y). Each set is described by the following equations (1):

    P = { p_{i,j} | (0 ≤ p_{i,j} ≤ 255) ∧ (1 ≤ i ≤ W) ∧ (1 ≤ j ≤ H) }
    p_{i,j} = { R_{i,j}, G_{i,j}, B_{i,j}, Y_{i,j} },  R_{i,j}, G_{i,j}, B_{i,j}, Y_{i,j} ∈ {0, 1, 2, ..., 255}
    L = { p_{i,j} ∈ P | (G_{i,j} − R_{i,j} > Ldv) ∧ (G_{i,j} − B_{i,j} > Ldv) ∧ (Y_{i,j} ≥ Lcv),
          0 ≤ Lcv ≤ 255, 0 ≤ Ldv ≤ 255 }
    P = B ∪ L                                                                (1)

The pixel p_{i,j} in the color image can be transformed to gray-scale (Y_{i,j}) by the
following equation (2):

    Y_{i,j} = Round(0.299 R_{i,j} + 0.587 G_{i,j} + 0.114 B_{i,j}),  Y_{i,j} ∈ {0, 1, 2, ..., 255}        (2)

The gray-scale is used to condition all pixels in P. Each pixel has a red, green, blue and gray
channel. The gray value and the differences between red-green and green-blue are tested
against the specified parameters to identify the group of leaf pixels.
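A minimal NumPy sketch of this pixel test, assuming the image is an H x W x 3 RGB array; the variable names are ours, not the paper's.

    import numpy as np

    def leaf_mask(rgb, ldv=20, lcv=128):
        # boolean mask of leaf pixels per the Ldv/Lcv conditions of equation (1)
        r = rgb[..., 0].astype(int); g = rgb[..., 1].astype(int); b = rgb[..., 2].astype(int)
        y = np.round(0.299 * r + 0.587 * g + 0.114 * b)    # gray value, equation (2)
        return (g - r > ldv) & (g - b > ldv) & (y >= lcv)

    img = np.array([[[40, 200, 60], [200, 200, 200]],
                    [[10, 20, 30], [90, 140, 100]]], dtype=np.uint8)
    print(leaf_mask(img))   # True only for the green leaf-like pixel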



Figure 3. The illustration of two parameters, Ldv and Lcv.

Algorithm for checking leaf pixels
For all pixels p_{i,j} in P, the differences between red and green and between green and
blue are compared with the value of Ldv, and the gray-scale value must be greater than or
equal to the parameter Lcv. The parameters Ldv and Lcv are illustrated in Figure 3. If these
conditions hold, the current pixel p_{i,j} is accepted as a leaf pixel in L. The algorithm is
given below.


For all pixels do {
    Calculate the gray value Y_{i,j} from p_{i,j} with equation (2)
    Given the two parameters Ldv and Lcv,
    the pixel is a leaf pixel if all of the following conditions are true:
        (G_{i,j} − R_{i,j} > Ldv) and
        (G_{i,j} − B_{i,j} > Ldv) and
        (Y_{i,j} ≥ Lcv)
}

Parameter Estimation for Sigmoid Growth Function
To approximate the growth function of plant growth, a sigmoid function is an appropriate
growth model. It is given in the following equation.

    G(t) = U / (1 + e^{(ln 81)(T − t)/Δt})        (3)

where G(t) is the growth value at time t, U is the maximum growth value, Δt is the
time-difference parameter, T is the time at which G(t) = U/2, and t is the independent time
variable.

At each time step, the leaf area is calculated as its growth value (t_i, G(t_i)). All data are
used to estimate the three parameters U, Δt and T. In this paper, we use a multi-logistic
growth function for more complicated growth properties. The multi-logistic growth function
is

    G(t) = Σ_{i=1}^{N} U_i / (1 + e^{(ln 81)(T_i − t)/Δt_i})        (4)

where N is the number of logistic pulses. The parameters of this multi-logistic function are
estimated using the Levenberg-Marquardt algorithm (see [3]).
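A hedged Python sketch of this estimation step, fitting equation (4) with N = 2 pulses via SciPy's Levenberg-Marquardt optimizer; the synthetic data and initial guess are for illustration only.

    import numpy as np
    from scipy.optimize import curve_fit

    def multi_logistic(t, u1, dt1, T1, u2, dt2, T2):
        # equation (4) with N = 2 pulses
        k = np.log(81.0)
        return (u1 / (1 + np.exp(k * (T1 - t) / dt1)) +
                u2 / (1 + np.exp(k * (T2 - t) / dt2)))

    t = np.linspace(1, 570, 570)
    data = multi_logistic(t, 450, 120, 200, 240, 80, 420) \
           + np.random.default_rng(2).normal(0, 5, t.size)   # noisy leaf area

    popt, _ = curve_fit(multi_logistic, t, data, method="lm",  # Levenberg-Marquardt
                        p0=[400, 100, 150, 200, 100, 400])
    print(popt)   # fitted U[i], delta-t[i], T[i] for the two pulses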


4. RESULTS AND DISCUSSION
This section shows some experiments with the leaf segmentation and growth measurement
algorithm and their results. Figure 4 shows single-leaf segmentation using a mouse click from
the user. The leaf area is calculated automatically when a leaf is selected, according to its
scale in the image.



Figure 4. Interactive 2D automated segmentation using the region growing method with the
condition in the algorithm, via a graphical user interface (mouse click); the algorithm
calculates the leaf area automatically.
Figure 5 shows the software for plant growth measurement and the measured growth
behavior without noise-removal processing. The leaf region is filled in green, and the time
step is based on the image sequence.



Figure 5. Segmented leaf area with Ldv=20, Lcv=193 and its growth behavior before noise
removal.

Leaf Area Segmentation by Region Growing Method in 2D and 3D
For all input sequence images, we combine all frames together to obtain 3D information
from the 2D slice segmentation. The growth steps of Arabidopsis thaliana are given in
Figure 6 (1st row), the segmented plant area is shown in Figure 6 (2nd row), and the 3D
combination of the 2D segmented regions is illustrated in Figure 6 (3rd row).

Figure 6. Segmented leaf area: input images (1st row), leaf area segmented with Ldv=20,
Lcv=128 (2nd row), and 3D segmented leaf area with noise (3rd row); the images show
frames 100, 200, 300, 400 and 500 from left to right, respectively.
To combine all segmented plant regions over time, Figure 7 shows the 3D plot of all
segmented regions with the time axis in the vertical direction. The noise voxels are connected:
some of them are connected to the plant volume, but some are not. We need to remove those
noise voxels that are not connected to the plant volume, so we applied the region growing
method in 3D space, the so-called volume growing method (see [3]). After noise-removal
processing, noise voxels are eliminated from the whole plant, as shown in Figure 8. Figure 9
shows the comparison between plant growth without and with noise-removal processing at
different time steps.

Figure 7. 3D segmented leaf area composition along the time axis from t=t_0 to t=t_n, with
noise.


Figure 8. 3D Segmented leaf area composition with noise (left), noise removal (middle), and
segmented 3D composition without noise (right).


Figure 9. 3D segmented leaf area before noise removal (1st row) and after noise removal
(2nd row); the images show frames 100, 200, 300, 400, and 500 from left to right,
respectively.

After the noise-removal processing, we use the information from the segmented 3D plant
volume at each time step to segment the plant region in the 2D input images. Noise regions in
each image are reduced. The segmented leaf area after noise removal is shown in Figure 10.


Figure 10. Segmented leaf area after noise removal: input images (1st row), leaf area
segmented with Ldv=20, Lcv=128 with noise (2nd row), and segmented leaf area without
noise after noise removal by the volume growing method (3rd row); the images show frames
100, 200, 300, 400, and 500 from left to right, respectively.
Growth function
Measurement data are automatically collected from the segmented leaf area at each time step.
The curve is fitted by parameter estimation using the Levenberg-Marquardt algorithm to
optimize the parameter values of equation (4). Figure 11 shows the measured leaf area plotted
against the time step. This growth behavior can be described by the multi-logistic function
with 2 pulses. Figure 12 shows the comparison between the collected data and the model
fitted with LM-2pulses.



Figure 11. Growth curve from leaf growth measurement: leaf area (cm²) against time step
(t=1 to t=570).

In this experiment, the measurement data can be approximated by estimating the 6 parameters of equation (4) for the 2-pulse multi-logistic growth function, as given in Table 1.

Table 1. Parameter values from parameter estimation using the LM method.

i-th pulse   U[i]        At[i]        T[i]
1            453.45399   6657.33153   5846.72917
2            240.77485   2411.09676   8029.35474


Figure 12. Growth function approximation using multi-logistic parameter estimation.

5. CONCLUSION AND FURTHER WORK
This paper has proposed a methodology for measuring leaf and plant growth from an observed plant via captured input images. The ALGM algorithm makes automatic plant growth measurement possible. It allows the user not only to segment and measure the leaf area in an input image, but also provides leaf and plant growth measurement from time-based sequential captured images. This prototype can be applied to other plant growth, cell growth, or other organism growth measurements. All collected data can be fitted by parameter estimation based on the multi-logistic growth function with N pulses, and the resulting parameter values can be used to characterize each plant species for further classification.

REFERENCES
1. Bock, H. G., Chuai-Aree, S., Jaeger, W., Kanbua, W., Kroemker, S., and Siripant, S., 3D cloud and storm reconstruction from satellite image, Proc. of Intern. Conf. on High Performance Scientific Computing (HPSC Hanoi 2006), March 6-10, Hanoi, Vietnam, 2006.
2. Bock, H. G., Chuai-Aree, S., Jaeger, W., and Siripant, S., Inverse problem of Lindenmayer systems on branching structures, Proc. of Intern. Conf. on High Performance Scientific Computing (HPSC Hanoi 2006), March 6-10, Hanoi, Vietnam, 2006.
3. Chuai-Aree, S., Modeling, Simulation and Visualization of Plant Growth, PhD Dissertation, University of Heidelberg, Germany, 2009.
4. Liu, L. and Sclaroff, S., Region segmentation via deformable model-guided split and merge, Boston University Computer Science Technical Report No. 2000-24, December 2000.
5. Boyes, D. C., et al., Growth stage-based phenotypic analysis of Arabidopsis, The Plant Cell, 13, 1499-1510, 2001.
6. Lu, J., Gouton, P., Guillemin, J. P., Ma, C., and Coquille, J. C., A method of using segmentation of color images and shape factors with frontier rates to identify onion and weeds in field, GRETSI, Groupe d'Etudes du Traitement du Signal et des Images, 2001.
7. Manh, A. G., Rabatel, G., Assemat, L., and Aldon, M. J., Weed leaf image segmentation by deformable templates, J. Agric. Engng Res., 80 (2), 139-146, 2001.
8. Noble, S. D. and Brown, R. B., Spectral band selection and testing of edge-subtraction leaf segmentation, Canadian Biosystems Engineering / Le génie des biosystèmes au Canada, 50, 2008.
The estimation of SNP-SNP interaction in pooled DNA

W. Engchuan¹, S. Prom-on²,ᶜ, J. H. Chan³ and A. Meechai⁴

¹ Bioinformatics Program, King Mongkut's University of Technology Thonburi, Bangkok, Thailand
² Computer Engineering Department, Faculty of Engineering, King Mongkut's University of Technology Thonburi, Bangkok, Thailand
³ School of Information Technology, King Mongkut's University of Technology Thonburi, Bangkok, Thailand
⁴ Chemical Engineering Department, King Mongkut's University of Technology Thonburi, Bangkok, Thailand
ᶜ Corresponding Author, Phone: +66-2470-9081; Fax: +66-2872-5050; E-mail: santitham@cpe.kmutt.ac.th



ABSTRACT
Genome-wide association study (GWAS) is a powerful method for studying genetic association across the whole genome. Currently, most GWAS projects focus on identifying SNP-SNP interactions, or epistasis, involved in disease development. However, the cost of large-scale GWAS is still prohibitive, so studies with limited budgets and limited samples cannot realistically use the original GWAS design. By pooling the DNA materials of each subject group together, the cost of the study is reduced substantially, by a factor of the number of subjects in each pool. The limitation of the pooled DNA technique is that it provides only allele data instead of genotype data. This research therefore aims to extract the genotype information and diplotype configurations, which are required for the study of SNP-SNP interaction or epistasis, from pooled DNA samples. The methodologies used to extract both the genotype information and the diplotype configurations are based on Monte Carlo simulation. The results show that these methodologies can find SNP pairs that classify diseased and normal individuals with acceptable accuracy (75% prediction accuracy). Applying this methodology to study epistasis in pooled DNA samples thus offers an alternative way to identify disease-related SNP-SNP interactions at low cost.

Keywords: GWAS, Pooled DNA, SNP-SNP interaction, Epistasis, MDR.



1. INTRODUCTION
The DNA pooling technique offers a means to conduct a genome-wide scan economically, and it has been used for initial scanning of plausible causative genetic loci [1-5]. Since the advent of high-throughput microarray technology, single nucleotide polymorphisms (SNPs) have become the main targets of genome-wide scans. However, the cost of the technology still makes large-scale SNP microarray studies prohibitive.
By pooling the DNA materials of each subject group together, the cost of the study is reduced substantially, by a factor of the number of subjects in each pool. Instead of aiming to find exact genotype frequencies, the investigators estimate the allele frequencies of the whole subject group and compare the difference in allele frequencies between case and control groups [6-8]. However, the DNA pooling technique has a few limitations. First, pooling DNA materials together introduces additional error [9-10]. The effect of such errors can be alleviated with a careful experimental design that selects the optimal number of technical and/or biological pooling replicates. Second, the loss of genotype information limits the use of conventional multi-locus analysis techniques. This imposes a serious limitation on the DNA pooling technique in genome-wide association studies, since the investigator cannot directly identify SNP interactions, or epistasis, from the allele data.

2. THEORY AND RELATED WORKS
Previous work on computational methods for multi-locus analysis with the DNA pooling technique has focused on the estimation of haplotype frequencies [11-14]. Ito et al. [11] and Yang et al. [12] proposed methods based on the expectation-maximization (EM) algorithm to estimate haplotype frequencies and the linkage disequilibrium coefficient from pooled DNA data. Wang et al. [13] examined the cost-effectiveness of DNA pooling in haplotype frequency estimation. Kirkpatrick et al. [14] developed HAPLOPOOL, which utilizes a perfect phylogeny model and an EM algorithm to estimate haplotype frequencies from pooled data. Although these efforts are beneficial to the development of computational techniques for DNA pooling, they are not directly designed for the study of epistasis because they were mostly developed under the assumption of Hardy-Weinberg equilibrium.
There are a number of methods for detecting epistasis in conventional association studies [15-21]. The common goal of these methods is to detect SNP interactions while accounting for the sparseness of the data. These methods can generally be divided into parametric and non-parametric approaches. The parametric methods rely on the basic assumption that there is a model underlying the SNP interactions; examples are the B statistic in BEAM [15] and the χ² test used in HFCC [16], MegaSNPHunter [17], and epiForest [18]. The non-parametric methods do not rely on a model; examples are MDR [19-20] and the Classification And Regression Tree (CART) used in PIA v2.0 [21] and MegaSNPHunter [17]. The non-parametric methods are suitable when the sample size is small. This is particularly true for DNA pooling because, in this case, the minimum number of biological replicates is often used to minimize the cost of the study.
This paper presents the development of MDRpool, a method for detecting epistasis from the data of a pooled-DNA genome-wide association study. MDRpool is inspired by MDR [19-20] but extended with stochastic simulation so that it can predict epistasis from a set of allele information. We demonstrate our method by simulating the DNA pooling process using genotype data with known epistatic interactions from the original MDR, and then applying MDRpool to detect these epistatic interactions from the pooled-DNA allele data.

3. METHODS
3.1. Pooling Simulation
To test our method, we simulated DNA pooling so that the epistasis results can be compared with those estimated from the original genotype data. The individual samples in each subject group are randomly divided into subgroups, and the samples in each subgroup are used to simulate a DNA pool containing approximately 2-10 individual samples. The pooling simulation process is applied to the genotype data of the samples in each subgroup. Let $x_A$ and $x_B$ be the numbers of alleles A and B, respectively, and let $x_{AA}$, $x_{AB}$, and $x_{BB}$ be the numbers of genotypes AA, AB, and BB, respectively. The conversion from genotype to allele data can be expressed by

$x_A = 2x_{AA} + x_{AB}, \quad x_B = 2x_{BB} + x_{AB}$  (1)

The allele data are then perturbed with the pooling error (approximately 5-10%), following the hypothesis that DNA pooling introduces errors at many steps, such as during hybridization and microarray scanning. The last step of the pooling simulation is to eliminate the absolute allele information, since in DNA pooling we can only detect the relative frequency of each allele.

Fig. 1. The loss of genotype configuration in DNA pooling. After pooling the individual genotype data, the pooled data show similar allele numbers for alleles A and B, which implies two possible genotype configurations.
This relative frequency can be found through the relative allele signal (RAS) scores that are usually obtained from a pooling-based study. Let $r_A$ and $r_B$ denote the relative allele frequencies of alleles A and B, respectively. They can be calculated as

$r_A = \frac{x_A}{x_A + x_B}, \quad r_B = \frac{x_B}{x_A + x_B}$  (2)
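A minimal sketch of this pooling simulation (Eqs. 1-2) follows; the multiplicative uniform error model is an assumption, since the paper only states that an error of approximately 5-10% is added.

import numpy as np

rng = np.random.default_rng(0)

def simulate_pool(x_aa, x_ab, x_bb, error=0.05):
    """Genotype counts of one pool -> noisy relative allele frequencies.
    Eq. (1) converts genotypes to allele counts; Eq. (2) normalizes."""
    x_a = 2 * x_aa + x_ab
    x_b = 2 * x_bb + x_ab
    # Pooling error of roughly +/- `error` per allele count (model assumed).
    x_a = x_a * (1.0 + rng.uniform(-error, error))
    x_b = x_b * (1.0 + rng.uniform(-error, error))
    total = x_a + x_b
    return x_a / total, x_b / total              # r_A, r_B

# Example: a pool of 5 individuals with genotypes AA, AA, AB, AB, BB.
r_a, r_b = simulate_pool(x_aa=2, x_ab=2, x_bb=1)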
3.2. Genotype Estimation
Pooling-based allele data usually come in the form of RAS scores [22-23]. The RAS score provides only allele information, which is unsuitable for studying genetic interactions of multiple genetic loci, because the genotype information is lost in the DNA pooling process (see Fig. 1 for a simplified demonstration). To estimate epistatic interactions from pooled data, it is necessary to recover the possible genotype configurations from the allele information. Stochastic simulation, specifically Monte Carlo simulation, is employed to estimate the possible genotype configurations from the pooled data. The genotype configuration of each individual in each sub-pool is estimated from the probability and the availability of alleles A and B. The input of the genotype estimation is the estimated number of alleles of each SNP,

$\tilde{x}_A = r_A N, \quad \tilde{x}_B = r_B N$  (3)

where $\tilde{x}_A$ and $\tilde{x}_B$ are the estimated numbers of alleles. The simulation process then iteratively and randomly assigns genotypes based on the marginal probabilities of genotypes AA, BB, and AB,

$p_{AA} = \tilde{x}_A / 2, \quad p_{BB} = \tilde{x}_B / 2, \quad p_{AB} = \min(\tilde{x}_A, \tilde{x}_B)$  (4)

where $p_{AA}$, $p_{BB}$, and $p_{AB}$ are the marginal probabilities of genotypes AA, BB, and AB, respectively. In each iteration, the marginal probabilities of each genotype are recalculated and used for assigning a genotype, and the numbers of alleles A and B are reduced accordingly when each genotype is assigned. When the combined number of alleles A and B is less than two, the termination condition is reached. We then compute the relative genotype frequencies from the simulated genotype information.
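The iterative assignment can be sketched as follows, assuming N is the total number of alleles in the sub-pool and using the depletion-and-renormalization scheme of Eqs. (3)-(4); the exact tie-breaking and stopping details of MDRpool may differ.

import numpy as np

rng = np.random.default_rng(0)

def estimate_genotypes(r_a, r_b, n_alleles):
    """Monte Carlo genotype assignment from relative allele frequencies.
    Marginal weights follow Eq. (4); allele counts start from Eq. (3)
    and are depleted as genotypes are drawn."""
    x_a, x_b = r_a * n_alleles, r_b * n_alleles
    counts = {"AA": 0, "AB": 0, "BB": 0}
    while x_a + x_b >= 2:                        # termination condition
        w = np.clip([x_a / 2, min(x_a, x_b), x_b / 2], 0, None)
        if w.sum() == 0:
            break
        g = rng.choice(["AA", "AB", "BB"], p=w / w.sum())
        counts[g] += 1
        x_a -= {"AA": 2, "AB": 1, "BB": 0}[g]
        x_b -= {"AA": 0, "AB": 1, "BB": 2}[g]
    total = sum(counts.values())
    return {g: c / total for g, c in counts.items()}  # relative frequencies

freqs = estimate_genotypes(r_a=0.6, r_b=0.4, n_alleles=20)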
3.3. Diplotype Estimation and Two-Locus Analysis
In this study, the basic idea for identifying epistatic interactions is based on MDR. Traditionally, an epistasis study requires data in the form of genotype combinations of two or more SNPs; such a combination is also known as a diplotype. From the genotype estimation step, the relative genotype frequencies of each SNP are obtained. These relative genotype frequencies are then used as the probabilities for reconstructing the diplotype of each individual.
To identify SNP interactions that relate to the phenotype outcome or disease status, a method that can test interactions between two or more factors simultaneously is required [25]. MDR is a non-parametric method that compares the risk of having the disease for each genotype combination [19-20]. First, the reconstructed individual data are randomly partitioned for cross-validation. In each cross-validation round, two SNPs are selected at a time for two-locus analysis. The two SNPs and their three genotypes are represented in a 3-by-3 contingency table, and the ratio of cases to controls is computed for each cell. A cell is labeled "high-risk" if the number of cases exceeds the number of controls; otherwise it is labeled "low-risk". This contingency table is used as the model for predicting the disease status. The process is repeated for all possible SNP combinations, and the candidate model with the fewest misclassifications in the training set is chosen as the best model. The classification from the best model is used to predict the disease status of the remaining subset. This candidate model selection and classification is repeated over the cross-validation folds. Finally, the model with the fewest misclassifications in both training and validation sets is selected as the best epistatic interaction.
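A sketch of the core two-locus MDR step, assuming genotypes coded 0/1/2 and status coded case(1)/control(0); the surrounding cross-validation and model selection are omitted, and the toy data are invented.

import numpy as np

def mdr_risk_table(geno1, geno2, status):
    """One MDR two-locus step: count cases/controls in a 3x3 genotype
    table; a cell is high-risk (True) when cases exceed controls."""
    cases = np.zeros((3, 3))
    controls = np.zeros((3, 3))
    for g1, g2, s in zip(geno1, geno2, status):
        (cases if s == 1 else controls)[g1, g2] += 1
    return cases > controls

def classify(geno1, geno2, risk):
    """Predict disease status (1/0) from the high-risk mask."""
    return np.array([int(risk[g1, g2]) for g1, g2 in zip(geno1, geno2)])

# Toy data: genotype codes for two SNPs and case/control labels.
g1 = [0, 1, 2, 1, 0, 2]
g2 = [1, 1, 0, 2, 0, 2]
status = [1, 0, 1, 1, 0, 0]
pred = classify(g1, g2, mdr_risk_table(g1, g2, status))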

4. RESULTS
Simulated case-control genotype data were generated for the pooling simulation and the evaluation of the method. There are 200 individuals in each of the case and control groups, and each sample consists of 50 candidate SNPs (with identification numbers S1-S50). The simulated genotype data are divided equally into subgroups as sub-pools (2-10 individuals per subgroup). We varied the number of individuals per subgroup to evaluate the sensitivity of the method to the degree of genotype information loss. At the 5% pooling error rate, we obtained 80 simulated pools when using 5 samples per pool. The simulated relative allele frequency was calculated for each pool. These relative frequencies were then used to estimate the relative genotype frequencies. The estimated relative genotype frequencies are shown as scatter plots against the actual relative genotype frequencies (Fig. 2); the variation occurs mostly for the heterozygous pairs. The leftmost and rightmost scatter plots of Fig. 2 represent the genotypes AA and BB, respectively, both of which can be estimated correctly (~96% classification accuracy). The middle plot of Fig. 2 represents the genotype AB, which can be estimated correctly with ~92% classification accuracy. The estimated relative genotype data obtained from this process were then used as the marginal probabilities for diplotype reconstruction, and the reconstructed diplotypes were used in the two-locus analysis. In this work we tested the sensitivity of our method by running the two-locus analysis with different numbers of individuals per sub-pool (2, 5 and 10 samples per pool). The result of each analysis was compared with the result of the two-locus analysis of genotype-based MDR. For reference, MDR finds the best interacting SNP pair (S5 and S11) and builds a prediction model based on S5-S11 with 80% classification accuracy.
For the 5% pooling error, the two-locus analysis was executed with different numbers of samples per pool, and each setting was repeated one hundred times to test consistency preliminarily. The results indicate that our method can correctly identify the SNP pair for constructing the disease prediction model. The prediction accuracy of each model is also close to that of the model constructed from actual genotype-based MDR (Table 1). The observed decrease in accuracy is due to the loss of information in the DNA pooling process. Notably, the best interacting SNP pair is often picked as the prediction model for the simulated pooled data with 2 and 5 samples per pool, and the accuracy of the chosen model closely approximates the MDR prediction model of the actual genotype-based data. As the number of samples per pool increases, the loss of genotype information causes the classification accuracy to deteriorate.


Fig. 2. Scatter plot between actual relative genotype frequencies and
estimated relative genotype frequencies.


5. DISCUSSIONS
MDRpool is an MDR-based method that implements a stochastic simulation approach to estimate genotype data from pooled DNA samples. It reconstructs approximate diplotype configurations of DNA pools based on their estimated relative genotype frequencies, which allows investigators to perform traditional MDR on pooled data. The results show that MDRpool can correctly identify the best candidate SNP pair from a simulated pool for constructing the disease prediction model. The resulting prediction model can classify case and control individuals with approximately 75% classification accuracy, close to traditional MDR, which identified the best SNP pair with approximately 80% prediction accuracy. Thus, analyzing epistasis in pooled samples with MDRpool could reduce the cost of a study by reducing both the quantity of DNA samples and the number of microarrays used.
However, the ability to identify the best SNP pair decreases, although not statistically significantly, as the number of samples per pool increases. This is a limitation of MDRpool, which may be caused by the loss of genotype information in the DNA pooling process. The number of samples per pool is therefore an issue that should be taken into consideration when designing an experiment to study epistasis in pooled DNA. Here, the sensitivity tests show that an acceptable accuracy can be obtained with 2-5 samples per pool.

6. CONCLUSION
In conclusion, the DNA pooling technique can substantially reduce the cost of a genome-wide association study, but it only allows researchers to investigate allele data, which makes it difficult to extract SNP interactions, or epistasis. This paper presents a novel method for detecting epistasis in pooled DNA, called MDRpool. MDRpool generates the genotype combinations using Monte Carlo simulation and summarizes the converged SNP interactions with MDR. MDRpool can correctly estimate genotype frequencies, reconstruct the diplotype configurations, and identify the best epistasis model from pooled DNA data with a prediction accuracy of around 75 percent. MDRpool thus provides a means to conduct epistasis studies in pooled-DNA genome-wide association studies.
Table 1. Results of MDRpool at 5% pooling error (100 runs).

#Samples/pool   Best SNP pair      #Picks (times)   %Accuracy of prediction model
2               S5-S11             92               76.35
                S11-S12             3               68
                S11-S30             2               67.63
                S11-S20             1               71.58
                Average accuracy                    75.65
5               S5-S11             45               74.37
                S11-S33             6               65.25
                S11-S21             5               66.2
                Other              44               66.14
                Average accuracy                    69.81
10              S5-S11             49               72
                S11-S23             5               67.45
                S11-S17             4               66.13
                S11-S21             3               65.57
                Average accuracy                    68.83
REFERENCES
1. Barcellos, L. F., et al., Am. J. Hum. Genet., September 1997, 61, no. 3, pp. 734-747.
2. Pearson, J. V., et al., Am. J. Hum. Genet., January 2007, 80, no. 1, pp. 126-139.
3. Steer, S., et al., Genes Immun., December 2006, 8, pp. 57-68.
4. McKnight, A. J., et al., J. Am. Soc. Nephrol., February 2006, 17, pp. 831-836.
5. Macgregor, S., et al., Nucleic Acids Res., February 2008, 36, no. 6, e35.
6. Shifman, S., Pisant-Shalom, A., Yakir, B., and Darvasi, A., Mol. Cell. Probes, December 2002, 16, no. 6, pp. 429-434.
7. Kirov, G., et al., BMC Genomics, February 2006, 7, Article 27.
8. Wilkening, S., et al., BMC Genomics, March 2007, 8, Article 77.
9. Macgregor, S., Eur. J. Hum. Genet., January 2007, 15, no. 4, pp. 501-504.
10. Jawaid, A. and Sham, P., Ann. Hum. Genet., January 2009, 73, no. 1, pp. 118-124.
11. Ito, T., et al., Am. J. Hum. Genet., February 2003, 72, no. 2, pp. 384-398.
12. Wang, S., Kidd, K. K., and Zhao, H., Genet. Epidemiol., January 2003, 24, no. 1, pp. 74-82.
13. Yang, Y., et al., PNAS, June 2003, 100, no. 12, pp. 7225-7230.
14. Kirkpatrick, B., Armendariz, C. S., Karp, R. M., and Halperin, E., Bioinformatics, September 2007, 23, no. 22, pp. 3048-3055.
15. Zhang, Y. and Liu, J. S., Nat. Genet., August 2007, 39, no. 9, pp. 1167-1173.
16. Gayán, J., et al., BMC Bioinformatics, July 2008, 9, Article 360.
17. Wan, X., et al., BMC Bioinformatics, January 2009, 10, Article 13.
18. Jiang, R., Tang, W., Wu, X., and Fu, W., BMC Bioinformatics, 2009, 10(Suppl 1), S65.
19. Hahn, L. W., Ritchie, M. D., and Moore, J. H., Bioinformatics, September 2002, 19, no. 3, pp. 376-382.
20. Ritchie, M. D., et al., Am. J. Hum. Genet., June 2001, 69, pp. 138-147.
21. Mechanic, L. E., et al., BMC Bioinformatics, March 2008, 9, Article 146.
22. Greene, C. S., Penrod, N. M., Kiralis, J., and Moore, J. H., BioData Mining, 2009, 2, Article 5.
23. Pearson, J. V., et al., Am. J. Hum. Genet., 2007, 80, pp. 126-139.
24. Fisher, P. J., et al., Human Molecular Genetics, 1999, 8, pp. 915-922.
25. Coffey, C., et al., BMC Bioinformatics, 2004, 5, p. 49.

ACKNOWLEDGMENTS
W. Engchuan would like to thank the National Center for Genetic Engineering and Biotechnology, Thailand (BIOTEC), and King Mongkut's University of Technology Thonburi for a scholarship.
Effects of RNA Quality on Gene Expression Functional
Profiles

P. Treepong¹, S. Prom-on²,ᶜ, J. H. Chan³, A. Meechai⁴, and N. Hirankarn⁵

¹ Bioinformatics and Systems Biology Program, King Mongkut's University of Technology Thonburi, Thailand
² Computer Engineering Department, Faculty of Engineering, King Mongkut's University of Technology Thonburi, Thailand
³ School of Information Technology, King Mongkut's University of Technology Thonburi, Thailand
⁴ Chemical Engineering Department, King Mongkut's University of Technology Thonburi, Thailand
⁵ Department of Microbiology, Faculty of Medicine, Chulalongkorn University, Thailand
ᶜ E-mail: santitham@cpe.kmutt.ac.th; Fax: +66-2872-5050; Phone: +66-2470-9081



ABSTRACT
RNA quality is a critical issue in gene expression studies. RNA contaminated with DNA or protein will distort the measured expression, and analysis of distorted expression may lead to erroneous results. Furthermore, degraded RNA may result in improper hybridization, so RNA quality affects the outcome of microarray analysis. Functional analysis, in turn, helps clarify how the gene expression mechanism is affected. This work studies the effects of RNA quality on gene expression profiles and functions using a correlation analysis approach. The purpose is to reveal plausible relationships between RNA quality and the gene expression profile of each gene set, for both pre-chip and post-chip quality measures. The results show that the functions most strongly associated with RNA quality are cell development and apoptosis. In addition, we found functional groups that associate with RNA quality only post-chip, implying that these post-chip-specific related functions may come from errors in the microarray experiment. Accordingly, the analysis step of a microarray experiment should compare functions with RNA quality dependency at the same level of RNA quality, to avoid erroneous analysis.

Keywords: RNA quality, Microarray analysis, Gene set.



1. INTRODUCTION
Microarray is a powerful technology for research in molecular biology and biomedical science. It allows researchers to monitor the expression of thousands of genes simultaneously by means of hybridization between probes and a pool of cDNA or DNA targets. Microarray technology has facilitated research and development in gene discovery, pharmacogenomics, disease diagnostics, drug target identification and toxicology research [1].
The quality of RNA and DNA samples is critical for obtaining reliable results from microarray experiments [2]. In gene expression studies, RNA quality strongly influences the outcome of the experiment. For example, RNA of poor quality, which may result from contamination with DNA or protein, will show distorted expression, and analysis of distorted expression may lead to erroneous results. In other cases, degraded RNA may result in incomplete synthesis of the cDNA [3]. Thus the purity and integrity of RNA are the important factors of RNA quality. The level of RNA integrity, reflected in the length of the mRNA and its concentration, depends on the level of RNA degradation, which is related to the physiological state of the tissue at the point of removal and the delay between the time of death and tissue collection [4]. The level of RNA purity depends on the amount of contaminants
from other DNA or protein; these contaminants may come from RNA extraction, probe labeling and hybridization.
There have been only a few studies investigating the effects of RNA quality on the outcome of microarray experiments. A study of post-mortem brain tissue in mouse found that the post-mortem interval (PMI) is associated with RNA degradation, measured by the 28S/18S rRNA ratio, and with its effects on mRNA transcript levels [5]. This implies that RNA degradation is associated with mRNA transcript levels. Another study of post-mortem brain tissue in human showed that RNA quality is related to the quality of the data from microarray analysis [6]. That work suggested that testing for RNA quality dependency should take the data preprocessing step into account; it measured RNA quality by the 5'/3' beta-actin ratio and used gene expression data from Affymetrix GeneChips. Moreover, other studies have focused on the association between gene expression and RNA quality measured with different metrics, such as the 28S/18S rRNA ratio, RIN and the degradation factor. These results corroborate the association between gene expression and RNA quality [7,8].
However, as RNA quality affects gene expression patterns, it may influence gene expression results in certain functional patterns. This paper presents the results of a correlation study between pre-chip and post-chip RNA quality indexes and the functionally grouped gene expression patterns they influence. This may lead to an understanding of the RNA degradation mechanism in microarray environments.



2. THEORY
2.1. RNA quality measurement metrics
There are several methods for measuring the quality of RNA, which may be classified into pre-chip and post-chip categories. Pre-chip methods are designed to assess RNA quality before hybridization. They include the 28S/18S rRNA ratio, which is used to determine RNA integrity; the optical density (OD) 260/280 and OD 260/230 ratios, which represent RNA purity [9]; and the RNA Integrity Number (RIN), which provides a reliable prediction of RNA integrity on a scale from 1 to 10, from the most degraded to the most intact [10]. Post-chip methods assess RNA quality after the hybridization step. Post-chip quality can be determined from the degradation of mRNA, which proceeds from the 3' to the 5' end [11]. Accordingly, the 3'/5' ratio of housekeeping genes such as GAPDH and ACTB [12] is used as the post-chip RNA quality measure, as a reference for RNA integrity.
2.2. Correlation methods
Correlation is a statistical method for measuring the relationship between two variables. Correlation analysis approaches are generally divided into parametric and nonparametric. Parametric approaches, such as the Pearson product-moment correlation coefficient (r), represent the strength of the linear relationship between two variables; they are appropriate for variables on an interval or ratio scale, can handle data with a large sample size, and are suitable for data without outliers. Nonparametric approaches, such as the Spearman rank-order coefficient (r_s), determine the relationship between two variables in terms of the ranking of each case within each variable; they are used with data in the form of ranks, can handle data with outliers, and are suitable for small sample sizes [13].
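As an illustration of the two approaches, SciPy computes both coefficients directly; the RIN and ACTB-ratio values below are invented for demonstration.

from scipy.stats import pearsonr, spearmanr

# Invented sample values: RIN (pre-chip) and 3'/5' ACTB ratio (post-chip).
rin = [7.1, 8.3, 6.5, 9.0, 7.8, 8.8]
actb = [0.62, 0.55, 0.71, 0.49, 0.66, 0.58]

r, p_r = pearsonr(rin, actb)        # parametric: linear association
rho, p_s = spearmanr(rin, actb)     # nonparametric: rank-based, outlier-robust
print(f"Pearson r = {r:.3f} (p = {p_r:.3f}), Spearman r_s = {rho:.3f} (p = {p_s:.3f})")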



3. EXPERIMENTAL
3.1. Samples
In this work, the gene expression data contain 22,184 probes from the Illumina HumanRef-8 microarray platform. The data were obtained from the Lupus Research Unit, Faculty of Medicine, Chulalongkorn University. There are 30 samples from Illumina microarray chips, but in this work we use the 18 samples for which a pre-chip value (RIN) is available.
For the post-chip measure, we use the expression level of the housekeeping gene ACTB, dividing the 3' probe by the 5' probe of ACTB in accordance with RNA degradation. If the mRNA is degraded, the post-chip value is close to 0, whereas intact mRNA has a post-chip value equal to 1.
3.2. Workflow
This study consists of four steps (see Figure 1). First, the correlation between pre-chip and post-chip RNA quality is analyzed to reveal the relationship between the measurement metrics. This relationship is also used to decide on metric selection: if the pre-chip measurement correlates with the post-chip measurement, we can simply use one of them; otherwise we should use both when analyzing the correlation between RNA quality and gene expression.
The correlation analysis between RNA quality and gene expression is performed for both pre-chip and post-chip measures. After this stage is completed, we obtain the list of pre-chip correlated probes and the list of post-chip correlated probes.
After that, we use the expression data of the pre-chip correlated probes to obtain the list of pre-chip co-expressed genes by correlation analysis, and likewise obtain the list of post-chip co-expressed genes from the expression data of the post-chip correlated probes.
Then we cluster the pre-chip and post-chip co-expressed genes into appropriate functional groups with the DAVID functional annotation clustering tool. The correlated functional groups are then analyzed for mechanisms that involve RNA quality. Finally, public databases and published research are consulted to interpret the mechanisms that induce these relationships.
[Workflow diagram: pre-chip RNA quality (RIN) and post-chip RNA quality (3'/5' ACTB ratio) → correlation between pre- and post-chip RNA quality → correlation between RNA quality and gene expression → pre-chip and post-chip correlated probes → correlation among correlated probes using the expression data → lists of pre-chip and post-chip co-expressed genes → functional clustering by DAVID → correlated functional groups → interpretation of the correlated functional groups using public databases and published research.]

Figure 1. Research workflow.

4. RESULTS AND DISCUSSION
4.1. Measurement dependency
RNA quality measurements can be classified into pre-chip and post-chip, but we cannot say with certainty what each reflects. We therefore analyzed the correlation between the pre- and post-chip measures to expose their relationship. Our hypothesis was that a correlation between a pair of quality measurement indices would indicate that both reflect the same aspect of RNA quality, so that only one index need be used; no correlation would indicate that the two quality measurements are independent of each other, in which case it is essential to include both indices when considering RNA quality.
The scatter plot and correlation analysis of the pre-chip and post-chip quality measures indicate no correlation between RIN and the 3'/5' ratio of ACTB (Figure 2 and Table 1). Both the Pearson and Spearman correlation values are less than 0.5, indicating uncorrelated measurement metrics. Thus, in this study, we use both pre-chip and post-chip quality measurements in the further investigations.
4.2. Effects of RNA quality on gene expression
In this section, we investigated the effects of RNA quality, both pre-chip and post-chip, on gene expression and its functional profile.
We analyzed the data using both Pearson and Spearman correlation methods in the pre- and post-chip environments. To ensure that all plausible correlations between RNA quality and gene expression were covered, we used a correlation cut-off threshold of 0.5 for both methods, which allows both moderately and highly correlated pairs to pass the criterion. We found 789 probes correlated with the pre-chip quality index and 5,959 probes correlated with the post-chip quality index. The larger post-chip number indicates the effect of errors from hybridization. To consider the overall effect, we merged the correlated probes of pre- and post-chip together: there are 6,706 correlated probes in total, of which 42 are found in both pre- and post-chip (Figure 3).
4.3. Co-expression analysis
A large number of probes correlate with the quality indices in the analysis of the effects of RNA quality on gene expression. Since genes that share similar functions usually show similar expression profiles, we identified these relationships by means of co-expression analysis.
Collections of genes that express in a similar pattern form the list of co-expressed genes, and a co-expression network helps investigators identify groups of genes that are associated with the quality indices. In this paper, we constructed the co-expression network using correlation analysis of the gene expressions: the nodes in the network are the genes, and the edges are weighted by the correlation strength. There are 789 correlated probes in pre-chip and 5,959 correlated probes in post-chip; the higher number of probes indicates a certain deterioration in data quality.
We obtained the lists of strongly co-expressed genes for both pre- and post-chip. There are 467 genes and 1,075 relationships in the pre-chip gene co-expression network, while for post-chip there are 1,143 probes and 4,546 relationships.
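A minimal sketch of this network construction, assuming a genes-by-samples expression matrix and Pearson correlation with an illustrative cut-off (the paper does not state the exact co-expression threshold); the input values below are random placeholders.

import numpy as np

def coexpression_edges(expr, gene_ids, threshold=0.5):
    """Connect gene pairs whose expression profiles (rows of `expr`,
    genes x samples) have |Pearson correlation| above the cut-off."""
    corr = np.corrcoef(expr)
    edges = []
    for i in range(len(gene_ids)):
        for j in range(i + 1, len(gene_ids)):
            if abs(corr[i, j]) > threshold:
                edges.append((gene_ids[i], gene_ids[j], corr[i, j]))
    return edges

# Illustrative input: 5 probes x 18 samples of random expression values.
rng = np.random.default_rng(0)
expr = rng.normal(size=(5, 18))
network = coexpression_edges(expr, [f"probe{i}" for i in range(5)])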
Figure 2. Scatter plot of the 3'/5' ACTB ratio against RIN. Each dot represents one sample's pair of 3'/5' ACTB ratio and RIN values.

Table 1. Correlation between the 3'/5' ACTB ratio and RIN.

Method     Correlation
Pearson    0.006
Spearman   0.031

Figure 3. Venn diagram of the correlated probes by RNA quality metric: 747 pre-chip only, 42 shared, 5,917 post-chip only.

4.4. Functional clustering
To investigate the plausible biological mechanisms that may influence changes in RNA quality, we conducted functional clustering analysis on the gene sets that are highly co-expressed. These gene sets were obtained by ranking the connectivity of nodes in the co-expression network. By separating the functional analysis into pre- and post-chip, the resulting functional groups can reflect the factors of RNA quality that depend on each metric (pre- and post-chip). The functional groups were obtained using the DAVID functional annotation clustering tool with classification stringency at the medium level [14,15].
There are 43 pre-chip correlated functional groups, consisting of 6 pre-chip-specific functional groups and 37 functional groups shared with post-chip. The shared functional groups indicate the functional groups that actually depend on RNA quality, while the pre-chip-specific functional groups point to functional groups that disappear in post-chip because of hybridization error. For the post-chip environment, there are 255 correlated functional groups, consisting of 186 post-chip-specific functional groups, which arise from hybridization error, and 69 functional groups shared with pre-chip.
Studying the correlated functional groups leads to identification of the causes of RNA quality dependency through their mechanisms. Examples of functional groups shared between pre-chip and post-chip are the SPRY-associated group and apoptosis. The SPRY-associated group contains 15 terms that interact with the SPRY domain; some genes in this group function in development and cell growth [16]. The relationship between cell growth and RNA quality was described by Z. Darzynkiewicz et al. [17], who investigated the level of RNA over the progression of the cell cycle in Chinese hamster ovary (CHO) cells and in mitogen-stimulated human lymphocytes. Their results show decreasing RNA levels as the time in progression through the G1 and S phases increases. The assumption is that the level of RNA, or RNA quality, correlates with the progression of the cell cycle, which is a part of cell growth. The apoptosis group consists of 14 functional terms, which are negatively correlated with RNA quality. Apoptosis is the process of cell degradation, which leads to the degradation of RNA; this is supported by a previous work which found that apoptosis induces degradation of cytoplasmic mRNA and degradation of 28S rRNA [18]. In addition, the pre-chip-specific functional groups may be correlated with RNA quality but disappear in post-chip because of hybridization error; these functional groups mostly relate to development and apoptosis. By contrast, the functional groups arising from correlated hybridization error cannot plausibly correlate with RNA quality: for example, they contain signaling of cancer development and immune response, which should be expressed differently only between normal and cancer samples; likewise, in the case of virus infection, expression should differ only between normal and virus-infected cells.



5. CONCLUSION
Our results display relationships between RNA quality and gene expression at the functional level, which reasonably describe the mechanism by which RNA quality impacts the correlated functional groups. We performed correlation analysis with both parametric and nonparametric approaches, using a moderate correlation cut-off value, and applied both pre- and post-chip RNA quality measurement metrics to acquire all possible correlated genes. The co-expression analysis supports functional clustering of genes with the same expression pattern. The results also imply that accounting for RNA quality in microarray analysis will increase the reliability of the results. There are several ways to reach a reliable result, for example: considering gene expression at the same level of RNA quality, adjusting p-values to reduce false positives, and constructing a confidence index from the p-values of the correlated genes.



REFERENCES
1. Weeraratna, A.T., Nagel, J.E., de Mello-Coelho, V., and Taub, D.D., Journal of Clinical
Immunology, 2004, 24, 213-224.
2. Ye, S.Q., Bioinformatics: A Practical Approach, Chapman & Hall/CRC, 2007.
3. Gingrich, J., Rubio, T., and Karlak, C., Bio-Rad Tech note 5452, 2006.
4. Palmer, M., and Prediger, E., TechNotes. [Online]. Available:
http://www.ambion.com/techlib/tn/111/8.html
5. Catts, V.S., Fernandez, H.R., Taylor, J.M., Coulson, E.J., and Lutze-Mann, L.H.,
Molecular Brain Research, 2005, 138, 164-177.
6. Popova, T., Mennerich, D., Weith, A., and Quast, K., BMC Genomics, 2008, 9, 91.
7. Auer, H., Lyianarachchi, S., Newsom, D., Klisovic, M.I., Marcucci, U., and Kornacker,
K., Nat Genet, 2003, 35, 292-293.
8. Copois, V., Bibeau, F., Bascoul-Mollevi, C., Salvetat, N., Chalbos, P., Bareil, C., Candeil, L., Fraslon, C., Conseiller, E., Granci, V., Mazière, P., Kramar, A., Ychou, M., Pau, B., Martineau, P., Molina, F., and Del Rio, M., Journal of Biotechnology, 2007, 127, 549-559.
9. Genome Center Maastricht, RNA quality control [Online]. Available: http://www.biomedicalgenomics.org/RNA_quality_control
10. Schroeder, A., Mueller, O., Stocker, S., Salowsky, R., Leiber, M., Gassmann, M.,
Lightfoot, S., Menzel, W., Granzow, M., and Ragg, T., BMC Molecular Biology, 2006, 7,
3.
11. van Hoof, A., and Parker, R., Current Biology, 2002, 12, R285-R287.
12. Croner, R., Guenther, K., Foertsch, T., Siebenhaar, R., Brueckl, W., Stremmel, C.,
Hlubek, F., Hohenberger, W., and Reingruber, B., Journal of Laboratory and Clinical
Medicine, 2004, 143, 344-351.
13. Boslaugh, S., Watters, P.A., Sarah, B., and Paul, W., Statistics in a Nutshell: A Desktop
Quick Reference, O'Reilly Media, 2008.
14. Dennis, G., Sherman, B.T., Hosack, D.A., Yang, J., Gao, W., Lane, H.C., and Lempicki,
R.A., Genome Biology, 2003, 4(5), P3.
15. Huang, D.W., Sherman, B.T., and Lempicki, R.A., Nature Protocols, 2009, 4, 44-57.
16. Hunter, S., Apweiler, R., Attwood, T.K., Bairoch, A., Bateman, A., Binns, D., Bork, P.,
Das, U., Daugherty, L., Duquenne, L., Finn, R.D., Gough, J., Haft, D., Hulo, N., Kahn,
D., Kelly, E., Laugraud, A., Letunic, I., Lonsdale, D., Lopez, R., Madera, M., Maslen, J.,
McAnulla, C., McDowall, J., Mistry, J., Mitchell, A., Mulder, N., Natale, D., Orengo, C.,
Quinn, A.F., Selengut, J.D., Sigrist, C.J.A., Thimma, M., Thomas, P.D., Valentin, F.,
Wilson, D., Wu, C.H., and Yeats, C., Nucl. Acids Res., 2009, 37, D211-215.
17. Darzynkiewicz, Z., Evenson, D.P., Staiano-Coico, L., Sharpless, T.K., and Melamed,
M.L., Journal of Cellular Physiology, 1979, 100, 425-438.
18. del Prete, M.J., Robles, M.S., Guao, A., Martinez-A, C., Izquierdo, M., and Garcia-Sanz,
J.A., The FASEB Journal, 2002.



ACKNOWLEDGMENTS
P. Treepong would like to thank the National Center for Genetic Engineering and Biotechnology, Thailand (BIOTEC), and King Mongkut's University of Technology Thonburi for a full scholarship.
Estimating Carbon Sequestration of J. curcas L. from Plant CO2 Assimilation and Dry Matter Accumulation

P. Thongbaiᶜ, B. Hadiwijaya, and P. Sengkeaw

School of Science, Mae Fah Luang University, 333, Moo 1, Ta-Sud, Muang, Chiang Rai, 57100, Thailand
ᶜ E-mail: pongmanee@mfu.ac.th; Fax: +66(0) 53916776; Tel. +66(0) 53916775



ABSTRACT
Jatropha (Jatropha curcas L.) has recently attracted interest as an additional biodiesel crop. Although still not feasible for commercial production, its wide adaptation and low input requirements make Jatropha suitable for reforestation in the sub-tropics, and its photosynthesis might help reduce atmospheric CO2, increase carbon sequestration, and thus reduce global warming. However, the exact net photosynthesis of Jatropha and the relationship between its dry matter accumulation and carbon sequestration have not been reported so far. The objectives of this study were to measure the net CO2 assimilation of Jatropha in the form of net photosynthesis, and to investigate the relationships between net CO2 assimilation, dry matter accumulation and carbon sequestration of Jatropha under different growth patterns. One-month-old J. curcas L. seedlings were treated with two growth retardants, paclobutrazol and mepiquat chloride, each at rates of 0, 250, 500 and 750 µM for 3 consecutive weeks to manipulate their growth pattern. Plant CO2 assimilation was measured using an Infrared Gas Analyzer (IRGA: LCi Portable Photosynthesis System, ADC BioScientific Ltd., England) at 0, 1, 2, 3, 4, 6, 9, 12, 15, 20, 25, and 30 weeks after application, together with destructive plant sampling for growth in terms of height (H), number of leaves, nodes, internodes and secondary branches, leaf area and total dry matter (DM). Net carbon assimilation (NCAR), cumulative CO2 uptake (CCU), crop growth rate (CGR), net assimilation rate (NAR), and plant carbon sequestration (PCS) were then calculated; their relationships were tested and compared with the model used for olive (Villalobos et al., 2006). The highly significant correlations of DM, NCAR and CCU with PCS suggest that the PCS of Jatropha could be estimated from in situ photosynthesis measurements without destructive plant sampling for dry matter, which might benefit rural farmers who grow Jatropha for a future carbon credit scheme.

Keywords: photosynthesis, dry matter, carbon sequestration, J. curcas L., CO2 assimilation



1. INTRODUCTION
Recently, Jatropha curcas has been identified as an additional, sustainable source of biodiesel that can be used to replace petroleum-based diesel, with the advantage of emitting 78% less CO2 than diesel [14][18]. An additional benefit of growing a biofuel crop is the reduction of high atmospheric CO2 through photosynthesis, in which CO2 is sequestered from the atmosphere into plant dry matter or biomass and/or organic matter in the soil. This process, known as carbon sequestration, would therefore help reduce global warming and the greenhouse effect [7]. However, carbon sequestration in a cropland is normally estimated from plant dry matter data, which is destructive and not suitable for perennial crops. Photosynthetic
rate, or the amount of CO2 directly assimilated by the plant, should be more useful for estimating plant carbon sequestration, but the exact net photosynthesis of Jatropha and the relationship between its dry matter accumulation and carbon sequestration have not been reported so far.



2. THEORY AND RELATED WORKS
Plant biomass, including above-ground and below-ground parts, is the main product of atmospheric CO2 removal through gross primary production (GPP). About half of the GPP is respired by plants and returned to the atmosphere, with the remainder constituting net primary production (NPP), the total production of biomass and dead organic matter in a year. In non-forest ecosystems (i.e. cropland), biomass is predominantly nonwoody perennial and annual vegetation that turns over annually or within a few years, and hence net biomass carbon stocks may remain roughly constant (IPCC, 2006). Furthermore, half of the biomass production from carbon sequestration is directed to vegetative growth, which in olive trees is in turn divided into 30% for leaves and 70% for supporting organs (stems, branches, and trunk) [15]. The estimation of above-ground dry biomass accumulation is important for monitoring crop growth, predicting potential yield, and estimating crop residues in the context of the carbon cycle [9]. Simulation models describing the carbon balance of leaves, whole plants, and ecosystems use a biochemical model of the net photosynthetic rate (P_N) [6]. The relationship between photosynthetic rate and Photosynthetically Active Radiation (PAR) has also been established in olive orchards [16], but the exact net photosynthesis of Jatropha and the relationship between its dry matter accumulation and carbon sequestration have not been reported so far.
The purposes of this study were to measure net CO2 assimilation and to investigate the relationships between net CO2 assimilation, dry matter and plant carbon sequestration of Jatropha plants whose growth patterns were manipulated using GA inhibitors.



3. EXPERIMENTAL DETAILS
3.1 Plantation
Experiments were conducted at the Jatropha field of Mae Fah Luang University, Chiang Rai, Thailand, starting in April 2008. Seeds of Jatropha curcas cv. Thailand were sown on trays covered with wet tissue paper and transplanted into 1 kg black plastic bags filled with black ash once four small peripheral roots had expanded. Jatropha plants with 4-5 true leaves were then transplanted into 5 kg plastic pots containing a mixture of soil, husk and manure in a ratio of 3:1:1. Fertilizer was applied to each plant after 2 months, at 5 g of N:P:K 15:15:15.
Trials were set up in a factorial randomized complete block design with two GA inhibitors, paclobutrazol [17] and mepiquat chloride (PIX) [12], three replications, and four rates of inhibitor: 0, 250, 500 and 750 µM/L per pot, applied as a uniform foliar spray (50 mL per pot). Control plants were sprayed with deionized water (50 mL per pot). Freshly prepared aqueous solutions of the retardants were applied each week for 3 consecutive weeks once the plants had 5 true leaves.

3.2 Measurement
3.2.1 Plant growth
At 0, 1, 2, 3, 4, 6, 9, 12, 15, and 20 weeks after the first treatment, Jatropha plants were sampled to measure dry matter production. Plants were cut and partitioned into green leaves, stem (including nodes and branches), and roots. Leaf area (LA) was measured using the punch method. Dry weights were recorded after 48 h of drying at 62 °C.



3.2.2 CO2 assimilation
The photosynthetic rate was measured using an Infrared Gas Analyzer (IRGA: LCi Portable Photosynthesis System, ADC BioScientific Ltd., England) on the 4th leaf from the top, at 0, 1, 2, 3, 4, 6, 9, 12, 15, and 20 weeks after the first application. The measurements used broad-type chambers with a full leaf area of 625 mm², the default area value, as the leaf normally covers the whole chamber. The flow rate through the leaf chamber was set at 0.17 m² s mol⁻¹, and the transmission factor of PAR into the leaf chamber at the exposed leaf surface was set at 0.88.
The photosynthetic rate was converted to plant net photosynthesis using Eq. 2, where LA is the leaf area.
The total net photosynthesis over the whole period was calculated to obtain the cumulative CO2 uptake, which was then compared with total dry matter to relate the CO2 assimilation rate to biomass production.
Plant carbon sequestration was calculated by multiplying biomass production by 0.5 for the above-ground part, with the below-ground biomass taken as 25% of the above-ground [2].
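These two calculations can be sketched as follows; the trapezoidal integration for cumulative uptake is an assumption (the paper only says the total over the period was calculated), while the 0.5 carbon fraction and 25% below-ground fraction follow [2]. All input values are hypothetical.

import numpy as np

def cumulative_co2_uptake(weeks, net_photosynthesis):
    """CCU: integrate plant net photosynthesis (e.g. umol CO2/s) over
    the sampling times, converted from weeks to seconds."""
    t_seconds = np.asarray(weeks, dtype=float) * 7 * 24 * 3600
    return np.trapz(net_photosynthesis, t_seconds)

def plant_carbon_sequestration(shoot_dm):
    """PCS from biomass: carbon is 0.5 x dry matter, with below-ground
    biomass taken as 25% of the above-ground biomass [2]."""
    total_dm = shoot_dm * (1.0 + 0.25)
    return 0.5 * total_dm                        # g C per plant

weeks = [0, 1, 2, 3, 4, 6, 9, 12, 15, 20]
pn = [0.2, 0.3, 0.4, 0.6, 0.8, 1.1, 1.6, 2.0, 2.4, 2.9]  # hypothetical umol/s
ccu = cumulative_co2_uptake(weeks, pn)
pcs = plant_carbon_sequestration(shoot_dm=180.0)          # hypothetical 180 g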

3.3 Statistical Analysis
Statistical analysis of the data was performed with repeated-measures ANOVA using SPSS (ver. 16.0 for Windows). Linear regressions between pairs of variables and the plots were made using SigmaPlot (ver. 11).

4. RESULTS AND DISCUSSION
4.1 Plant growth
The analysis of variance showed a highly significant difference in dry matter production between weeks (Table 1), indicating that the Jatropha plants increased their dry matter production each week. Highly significant differences also occurred between time, type of inhibitor and rate, as well as their main effects, for stem dry matter production. Stem dry matter production influences shoot dry matter production, which also showed highly significant differences between time, type of inhibitor and rate, with the main effects likewise significant.
Root dry matter production showed no significant difference between types of inhibitor, but highly significant differences between rates. For leaf dry matter production, there was also no significant difference between types of inhibitor, nor for the interaction between type and rate of inhibitor.
There was a significant difference in total dry weight between plants treated with the different rates of GA inhibitors. Control plants, untreated with GA inhibitors, had the highest
Table 1. Jatropha dry matter production and CO2 assimilation.

Source of variation    SDM (kg)  RDM (g)  StDM (kg)  LDM (g)  Photosynthetic rate (µmol m⁻² s⁻¹)  Net CO2 assimilation (µmol s⁻¹)
Time                   **        **       **         **       **                                   **
Inhibitor              **        ns       **         ns       ns                                   ns
Rate                   **        **       **         **       ns                                   ns
Inhibitor×rate         **        **       **         ns       ns                                   ns
Time×inhibitor         **        **       **         ns       *                                    *
Time×rate              **        **       **         **       *                                    **
Time×inhibitor×rate    **        **       **         **       ns                                   *

ns: non-significant; *: significant at P<0.05; **: significant at P<0.01.

total dry weight than the treated plants (Fig. 1). Jatropha plants treated with the highest rate produced the lowest total dry weight. The treated plants produced dry matter more slowly than the untreated plants from the 3rd week, and recovered by the 9th week, reflecting the treatment application.
Figure 1. Jatropha total dry matter: total dry weight (g, 0-300) against time (weeks 0-25) for plants treated at 0, 250, 500 and 750 µM/L.

The difference in dry matter production between untreated and treated plants suggests that the GA inhibitor treatments could manipulate Jatropha plant growth, especially dry matter production, but not leaf area. There was only a minor difference in leaf area between untreated and treated plants (Fig. 2); treated plants had less leaf area than untreated plants, starting from the 3rd week.
Figure 2. Jatropha leaf area (cm², 0-8000) against time (weeks 0-25) for plants treated at 0, 250, 500 and 750 µM/L, calculated using Eq. 1.



4.2 Photosynthetic rate and net photosynthesis
The photosynthetic rate and net photosynthesis increased each week, as shown by the highly significant differences (Table 1), but there was no significant difference between treated and untreated plants. Control plants, which had the largest leaf area (Fig. 2), should assimilate much more CO2 than treated plants because of the wider surface available for carbon fixation. A higher leaf area increases the overall carbon supply and the fraction of total plant biomass [13], but canopy width can decrease photosynthetic activity through increased shading of older leaves [5]. CO2 transport for photosynthesis follows the transport pathway across the canopy boundary and the leaf stomata. Canopy photosynthesis depends on absorbed solar radiation as the energy source, which in turn depends on leaf area and the degree of canopy cover [3].



Figure 3. Cumulative carbon uptake per plant for 20 weeks

4.3 Relationship between CO2 assimilation, dry matter and carbon sequestration
The cumulative CO2 uptake of the Jatropha plants was highly correlated with biomass production (r² = 0.94), as shown in Fig. 4. The key determinant of biomass production is the rate of carbon uptake resulting from shoot photosynthesis, which is known to be responsive to several environmental variables, particularly atmospheric CO2 [1][10].

Figure 4. Relationship between NCAR and TDM of the Jatropha plants: log total dry weight (g) against log cumulative CO2 uptake (mol/m²); log(y) = -0.164 + 0.526 log(x), r² = 0.94.

Carbon sequestration was also highly correlated with cumulative CO2 uptake (r² = 0.94), as shown in Fig. 5, since photosynthetic carbon sequestration is the net primary product of photosynthesis, i.e. the result of photosynthesis during the growing period minus carbon losses via decarboxylation.

Figure 5. Relationship between NCAR and carbon sequestration of the Jatropha plants: log carbon sequestration (g C) against log cumulative CO2 uptake (mol/m²); log(y) = -0.368 + 0.526 log(x), r² = 0.94.
This close relationship indicates that the increase in cumulative CO2 uptake by the plant is matched by the amount of carbon converted into plant biomass.
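The log-log fits reported in Figs. 4 and 5 can be reproduced from the measurements with a least-squares line on log-transformed data; a sketch, with hypothetical input arrays:

import numpy as np

def loglog_fit(x, y):
    """Least-squares fit of log10(y) = a + b*log10(x), the form of the
    regressions reported for Figs. 4 and 5."""
    b, a = np.polyfit(np.log10(x), np.log10(y), 1)   # slope, intercept
    resid = np.log10(y) - (a + b * np.log10(x))
    ss_tot = np.sum((np.log10(y) - np.log10(y).mean()) ** 2)
    return a, b, 1.0 - np.sum(resid ** 2) / ss_tot   # intercept, slope, r^2

# Hypothetical measurements: cumulative CO2 uptake and total dry weight.
ccu = np.array([150.0, 400.0, 1100.0, 3000.0, 9000.0])
tdm = np.array([9.0, 15.0, 28.0, 45.0, 80.0])
a, b, r2 = loglog_fit(ccu, tdm)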

5. CONCLUSION
The highly significant correlations of DM, NCAR and CCU with PCS suggest that the PCS of Jatropha could be estimated from in situ photosynthesis measurements without destructive plant sampling for dry matter, which might benefit rural farmers who grow Jatropha for a future carbon credit scheme.


REFERENCES
1. Ainsworth, E.A. and Long, S.P. (2005), New Phytologist, 165, 351-371.
2. Albrecht, A. and Kandji, S.T. (2003), Agriculture, Ecosystems and Environment, 99, 15-27.
3. Asseng, S. and Hsiao, T.C. (2000), Field Crops Research, 67, 191-206.
4. Baumgart, S. (2007), Jatropha cultivation Belize, Expert seminar on Jatropha curcas L. agronomy and genetics, 26-28 March 2007, Wageningen, the Netherlands, published by FACT Foundation.
5. Campbell, D.E., Lyman, M., Corse, J., and Hautala, E. (1986), Plant Physiol., 80, 711-715.
6. IPCC (2006), 2006 IPCC Guidelines for National Greenhouse Gas Inventories, Eggleston, H.S., Buendia, L., Miwa, K., Ngara, T. and Tanabe, K. (eds), The National Greenhouse Gas Inventories Programme, IGES, Japan.
7. Harnos, N., Nagy, Z., Balogh, J., and Tuba, Z. (2006), Applied Ecology and Environmental Research, 4, 47-53.
8. Heller, J. (1996), Physic nut (Jatropha curcas L.), Promoting the conservation and use of underutilized and neglected crops 1, Institute of Plant Genetics and Crop Plant Research, Germany, and International Plant Genetic Resources Institute, Rome.
9. Hunt, R., Causton, R., Shipley, R., and Askew, A.P. (2002), Annals of Botany, 90, 485-488.
10. Liu, J., Miller, J.R., Pattey, E., Haboudane, D., Strachan, I.B., and Hinther, M. (2004), Proceedings IEEE International Geoscience and Remote Sensing Symposium (IGARSS '04), 1637-1640.
11. Lloyd, J. (1999), Functional Ecology, 13, 439-459.
12. Peterson, A.G., Ball, J.T., Luo, Y., Field, C.B., Curtis, P.S., Griffin, K.L., Gunderson, C.A., Norby, R.J., Tissue, D.T., Forstreuter, M., Rey, A., and Vogel, C.S. (1999), Plant, Cell, and Environment, 22, 1109-1119.
13. Reddy, A.R., Reddy, K.R., and Hodges, H.F. (2004), Plant Growth Regulation, 20, 179-183.
14. Thomas, D.S., Montagu, K.D., and Conroy, J.P. (2006), Trees, 20, 725-733.
15. Thongbai, P., O'Donnell, A.G., Wood, D., and Syers, J.K. (2006), Biofuels research and development at Mae Fah Luang University, The 2nd Joint International Conference on Sustainable Energy and Environment (SEE 2006), 21-23 November 2006, Bangkok, Thailand, pp. 412-417.
16. Villalobos, F.J., Testi, L., Hidalgo, J., Pastor, M., and Orgaz, F. (2006), Europ. J. Agronomy, 24, 296-303.
17. Voronin, P.Y., Konovalov, P.V., and Zijun, M. (2003), Russian Journal of Plant Physiology, 50, 108-111.
18. Watson, G.W. (2006), Arboriculture & Urban Forestry, 32 (3), 114-117.
19. Wood, D. (2006), The future of biofuels in Thailand, The 2nd Joint International Conference on Sustainable Energy and Environment (SEE 2006), 21-23 November 2006, Bangkok, Thailand, pp. 418-422.


ACKNOWLEDGMENTS
This work was partly supported by the National Research Council of Thailand. The second
author would like to thank the BPKLN Depdiknas scholarship for the Double Degree program
between Brawijaya University, Indonesia, and Mae Fah Luang University, Thailand.

A00036
Using MM-PBSA Method to Further Understand Molecular
Interaction in Large Ribosomal Subunit-Macrolide System.

Wai Keat Yam 1 and Habibah A Wahab 1,2,C

1 Pharmaceutical Design and Simulation (PhDS) Laboratory, School of Pharmaceutical Sciences,
Universiti Sains Malaysia, 11800 Minden, Pulau Pinang, Malaysia.
2 Centre for Advanced Drug Delivery, Malaysian Institute of Pharmaceuticals and Nutraceuticals,
Malaysian Ministry of Science, Technology and Innovation, Level 1, J05 Science Complex, Universiti
Sains Malaysia, 11800 Minden, Pulau Pinang, Malaysia.

C E-mail: habibahw@usm.my, habibah@ipharm.gov.my; Tel: +604 653 4533, +604 653 2206



ABSTRACT
Following MD simulations of large ribosomal subunit and macrolide (erythromycin A
and roxithromycin) systems, there is a need to further quantify the physico-chemical
interactions observed. We have employed MM-PBSA (molecular mechanics Poisson-
Boltzmann surface area) calculations to further understand their binding in terms of
estimated free energy. The estimated binding free energies of both systems showed fair
correlation with the experimentally determined values and served as a good basis to rank
them. Decomposition results on a residue basis were able to further explain the
molecular interactions between the residues in the binding pocket and each macrolide.
These results show the importance of quantifying residue interactions to strengthen the
understanding of ribosome-macrolide interactions as previously explained by MD
simulations and related experimental observations.

Keywords: MM-PBSA, MD simulation, ribosome, erythromycin A, roxithromycin.



1. INTRODUCTION
Erythromycin A (ERYA) is one of the first generation of macrolide antibiotics and is
still commonly prescribed today to treat infections caused by common bacterial pathogens
and some atypical pathogens [1-3]. ERYA has a broad antibacterial spectrum and is often
used as a substitute for penicillin. Roxithromycin (ROX), a second generation macrolide
antibiotic derived from ERYA, is more potent over a broader antibacterial spectrum and has
high inhibitory activity against Gram-negative bacteria. It has a longer half-life and better
absorption, and is more effective in treating gastrointestinal infections compared to ERYA.

Both ERYA and ROX have similar chemical structures: both have a 14-membered
lactone ring, a desosamine sugar branched from C-5 of the lactone ring and a cladinose ring
branched from C-3 of the lactone ring. However, ROX has an additional etheroxime chain at
the C-9 position of the lactone ring in place of ERYA's oxygen atom (Figure 1). Both have a
similar mechanism of action: they selectively bind to the 50S large ribosomal subunit and
consequently inhibit the bacterial protein synthesis process. The 3D structure showed that
both macrolides interact solely with residues in Domain V of the 23S rRNA, at the entrance
of the exit tunnel in the peptidyl transferase center [4]. In addition, other experimental studies
clearly demonstrated that these macrolides interact with H35 of Domain II [5-10], while our
previous MD studies showed possible interaction with Domain IV [11].

High drug affinity and specificity of these macrolides towards the ribosome, in the
nanomolar range, have been reported in equilibrium dialysis [12], footprinting protection
experiments [6, 10, 13] and binding kinetics [14, 15] studies. The actual inhibition mode of
ERYA and ROX on the ribosome is understood only at the most general level; a
comprehensive understanding and explanation of drug inhibition at the molecular level is
therefore much needed. The MM-PBSA method [16, 17] calculates the binding free energy by
summing contributions from the molecular mechanical energies, the polar and non-polar
solvation free energies and the relative solute entropy [18, 19]. The individual non-bonded
energetic contributions to the binding of the ligand/drug can be decomposed on a per-residue
basis. This method enables the identification of residues important in binding and the
correlation of these with data obtained from experiments and MD simulations. Successful
applications of the MM-PBSA method can be found in the literature [16, 17, 19-25].

Figure 1. Chemical structures of ERYA (A) and ROX (B).

Here we address these issues by employing molecular dynamics (MD) simulation
combined with MM-PBSA for the complexes of the 50S ribosomal subunit with the
macrolides ERYA and ROX. The estimated binding free energies are in general agreement
with experimental results. The residue-based decomposition of non-bonded contributions
towards the macrolides gave insight into the contribution exerted by each interacting residue
and thus quantified the molecular interactions observed in the MD simulations. This study
should be helpful for understanding ribosomal subunit-macrolide interactions, and hence
facilitate research and development in rational antibiotic design.

2. COMPUTATIONAL DETAILS
The 3D structures of the 50S large ribosomal subunit complexed with ERYA and ROX
(Protein Data Bank codes 1JZY and 1JZZ [4]) were used as initial structures. Residues
missing from the X-ray crystal structures were reconstructed as described in [11, 26]. Partial
charges for ERYA and ROX were obtained by restrained electrostatic potential (RESP)
fitting. The Amber 99 (RNA) all-atom force field [27] and the general amber force field
(GAFF) [28] were used to describe the molecular mechanics of the ribosomal subunit and the
macrolides, respectively. Potassium and sodium counterions were added at the most negative
positions of the complexes prior to solvation in a TIP3P [29] water box. The complexes then
underwent minimization using the Steepest Descent and Conjugate Gradient algorithms
before the thermalization and equilibration stages of the MD simulations. Equilibration was
done in the NVT ensemble before changing to NPT for the MD production stage.
Temperature and pressure were regulated at 300 K and 1 bar, respectively. Particle Mesh
Ewald (PME) was used to calculate the long-range interactions, while the short-range cutoff
was set to 8.0 Å. SHAKE was turned on to constrain bonds involving hydrogen, allowing the
equations of motion to be integrated with a 2 fs time step. A total of 1.5 ns of MD simulation
was performed; thermodynamic properties were monitored throughout the simulation, and
MM-PBSA energetic analysis was done in the 500-1500 ps window. The detailed procedure
can be found in [26].

MM-PBSA energetic analysis was done using the single-trajectory approach, where the
snapshots for the large ribosomal subunit-macrolide complex, the large ribosomal subunit and
the macrolide were each taken from the same MD trajectory. The binding free energy
(ΔG_bind) of each system in MM-PBSA can be conceptually summarized as follows:

ΔG_bind = G_com - G_rec - G_lig    (1)

G = E_MM + G_solv - TS    (2)

in which E_MM = E_bond + E_angle + E_torsion + E_vdw + E_EEL    (3)

G_solv = G_PB + G_SA    (4)

G_SA = γ·SA + b    (5)
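The bookkeeping in Eqs. (1)-(5) can be illustrated with a minimal Python sketch (an illustration, not the authors' code); the sanity-check values are the binding pocket-ERYA averages from Table 1 below:

    def g_sa(area, gamma=0.00542, b=0.92):
        """Eq. (5): non-polar solvation term, G_SA = gamma*SA + b.
        gamma in kcal/(mol*Angstrom^2), area in Angstrom^2, b in kcal/mol."""
        return gamma * area + b

    def delta_g_mmpbsa(d_e_mm, d_g_pb, d_g_sa):
        """Eqs. (2)-(4) applied to complex-minus-parts averages:
        dG_MMPBSA = dE_MM + dG_PB + dG_SA (entropy handled separately)."""
        return d_e_mm + d_g_pb + d_g_sa

    def delta_g_bind(d_g_mmpbsa, minus_t_ds):
        """Eqs. (1)-(2): dG_bind = dG_MMPBSA - T*dS (pass the -TS term)."""
        return d_g_mmpbsa + minus_t_ds

    # Sanity check against the binding pocket-ERYA column of Table 1:
    # -90.07 + 74.86 - 8.44 = -23.65; -23.65 + 16.01 = -7.64 kcal/mol.
    print(delta_g_bind(delta_g_mmpbsa(-90.07, 74.86, -8.44), 16.01))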

where G_com, G_rec and G_lig are the free energies of the complex, receptor and ligand
(macrolide), respectively. Each term is calculated by averaging the molecular mechanics
energy (E_MM), the solvation free energy (G_solv) and the vibrational entropy term (TS), as
in (2). E_MM (3) is contributed by bonded (E_bond, E_angle and E_torsion) and non-bonded
(E_vdw and E_EEL) terms, and the individual non-bonded contributions of the binding
pocket to the macrolide were further decomposed on a residue basis. G_PB was calculated
using the DELPHI software [30, 31] with dielectric constants of 1 and 80 for solute and
solvent, respectively. Atomic radii were taken from PARSE [32], with an additional value of
1.90 Å for phosphorus [33]; to be consistent with the molecular mechanics energy
calculation, the partial charges on the solute were taken from the Amber 99 (RNA) force field
[27] and from our ab initio calculations for ERYA and ROX. An 80% boxfill lattice with a
grid spacing of 0.5 grid/Å was applied, and 2,000 linear iteration steps were required to
obtain energy convergence.

G_SA was calculated using molsurf, with γ and b equal to 0.00542 kcal/mol·Å² and 0.92
kcal/mol, respectively. Residues within 25 Å of the mass center of the macrolide were used
for the PBSA calculation, and a total of 1,000 snapshots were extracted from the last 1,000 ps
of the MD trajectory. The solute entropy contribution was estimated by normal mode analysis
[16] using NMODE. Normal mode calculation is extremely time-consuming and
computationally expensive for large systems; therefore only residues within 10 Å of the mass
center of the inhibitor were used here. The calculation was based on the average entropy
value obtained from 25 snapshots taken from the final 500 ps of the MD trajectories with a
time interval of 20 ps.

3. RESULTS AND DISCUSSION
Over the 1.5 ns MD trajectories of the 50S ribosomal subunit with the ERYA and ROX
systems, the overall structures of the complexes appeared to be equilibrated after ~500 ps, as
shown by the total energy and RMS deviation plots. From the energetic perspective, a
summary of the MM-PBSA calculation, including the calculated binding free energies and the
contributions to ΔG_bind from molecular mechanics, polar and non-polar solvation energy
and solute entropy between the inhibitors and the binding pocket, is presented in Table 1.

The calculated binding free energy for binding pocket-ERYA was -7.64 kcal/mol and was
found to correlate fairly well with, though slightly underestimate, the experimental values,
which range from 1.1 x 10^-8 M (ΔG_exp = -10.92 kcal/mol) [15] and 1.4 x 10^-8 M
(ΔG_exp = -10.78 kcal/mol) [6, 10] to 3.6 x 10^-8 M (ΔG_exp = -10.22 kcal/mol) [14], as
determined from footprinting protection experiments and slow binding kinetic studies,
respectively. The scenario differs for the 50S ribosomal subunit-ROX system: the calculated
binding free energy for binding pocket-ROX was -20.66 kcal/mol, overestimated compared to
the experimental value of -10.57 kcal/mol (2.0 x 10^-8 M) obtained from slow binding kinetic
studies [14]. The discrepancies between the calculated binding free energies and the
experimentally determined values might be due to differences in temperature and bacterial
species between the experimental and MD studies [26].

Table 1. Binding free energy and other energy terms contributing to ΔG_bind for
binding pocket-ERYA and binding pocket-ROX.

Contribution     binding pocket-ERYA (kcal/mol)   binding pocket-ROX (kcal/mol)
E_EEL            -4.42 ± 7.29                     9.85 ± 18.13
E_vdw            -85.65 ± 12.17                   -78.99 ± 5.50
E_MM             -90.07 ± 12.79                   -69.13 ± 17.57
G_PB             74.86 ± 6.61                     24.78 ± 9.07
G_SA             -8.44 ± 0.30                     -9.63 ± 0.20
G_MMPBSA         -23.65 ± 15.70                   -53.98 ± 19.41
-TΔS             16.01                            33.32
ΔG_bind          -7.64                            -20.66
ΔG_bind(exp)     -10.22                           -10.57

The corresponding experimental ΔG_bind(exp) values were obtained using the K_diss value from [14] with the
following relationship [23, 34]: ΔG_bind(exp) = RT ln K_diss = RT ln (IC50 + 0.5 C_enz) ≈ RT ln IC50, where
R is the ideal gas constant, T is the temperature (300 K is used here) and C_enz is the concentration of
enzyme.
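As a quick numeric check of this relationship (assuming R = 1.987 x 10^-3 kcal/(mol·K) and T = 300 K):

    import math

    R = 1.987e-3   # ideal gas constant, kcal/(mol*K)
    T = 300.0      # temperature used in the text, K

    def dg_exp(k_diss):
        """dG_bind(exp) = RT ln K_diss, with K_diss in mol/L."""
        return R * T * math.log(k_diss)

    # K_diss = 2.0e-8 M for ROX [14] gives about -10.57 kcal/mol, as quoted above.
    print(round(dg_exp(2.0e-8), 2))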

As shown in Table 1, the major differences between the two systems were the contributions
from the electrostatic and polar solvation free energy terms. ROX contributed an unfavorable
electrostatic term compared to ERYA. These significant differences might be due to the
structural modifications of the two macrolides, which yielded significantly different E_EEL
values between them. In the case of the polar solvation energy term, the larger value in the
ERYA system compared to the ROX system could mean that ERYA received more polar
contribution from solvation effects, in agreement with the observations of the hydration
effects analysis [11].

The calculated binding free energies suggested that ROX binds more favorably than
ERYA. This observation is consistent with previous experimental studies, where ROX has
been demonstrated to be more favorable and superior as an antimicrobial agent against a
range of Gram-positive and Gram-negative bacteria [14, 35-37]. In terms of structural
characteristics, ROX is more hydrophobic than the parent compound [38] and has an
additional 9-oxime chain consisting of 3 oxygen and 1 nitrogen atoms that can act as
hydrogen bond acceptors. These characteristics contribute to ROX's more favorable binding.

The overall binding free energies of ERYA and ROX in the binding pocket were also
decomposed into electrostatic and van der Waals contributions using the same method (Tables
2-3). Based on the decomposition results, several residues, such as C759, A764, C765 (from
Domain II), C1773 (Domain IV), and A2041, A2042, G2484, C2565, C2589, U2590
(Domain V), contributed to the overall binding of the macrolides to the binding pocket. These
residues were shown to be important in previous studies [2, 5, 6, 8, 10, 11, 13, 26, 39-44].
Table 2. Residue-based decomposition of interaction energies (kcal/mol) for van
der Waals (E_vdw) and electrostatic (E_EEL) terms between ERYA and the
binding pocket.

Residue     E_vdw    E_EEL
C759        -0.93    -0.77
C765        -0.73    -0.33
C1773       -1.72    -2.14
A2040       -0.20    -0.19
A2041       -0.55    -0.12
A2042       -5.31    -0.70
G2044       -2.95    -0.91
A2045       -2.06    -0.61
C2046       -2.04    0.26
A2418       -1.66    0.01
A2430       -0.41    0.22
A2482       -1.61    -0.41
G2484       -1.36    -0.02
G2562       -0.73    -1.24
U2564       -1.66    -1.64
C2565       -3.60    0.26
U2588       -2.42    -0.61
C2589       -3.75    1.12
U2590       -3.90    -0.32
E_subtot    -37.59   -8.14

Table 3. Residue-based decomposition of interaction energies (kcal/mol) for van
der Waals (E_vdw) and electrostatic (E_EEL) terms between ROX and the
binding pocket.

Residue     E_vdw    E_EEL
G758        -0.19    -0.55
A764        -2.07    -1.33
C765        -0.26    0.09
C803        -2.52    -0.57
G805        -0.48    -0.11
C1772       -0.36    0.36
C1773       -4.89    -0.34
A2040       -0.19    -0.17
A2041       -1.81    -0.16
A2042       -1.65    0.47
A2045       -1.56    0.81
C2421       -0.98    2.59
G2484       -1.87    0.02
U2563       -0.44    0.20
U2564       -1.18    0.30
C2565       -2.72    1.92
A2566       -1.03    1.18
G2587       -0.81    -0.55
U2588       -1.61    0.21
C2589       -2.91    -2.18
U2590       -1.06    -0.71
E_subtot    -32.29   0.70
4. CONCLUSION
MM-PBSA calculation enabled the evaluation of the binding free energies of the large
ribosomal subunit with the macrolides ERYA and ROX. The estimated binding free energies
deviated somewhat from the experimentally determined values (underestimated for ERYA
and overestimated for ROX). Nevertheless, both values correlated fairly well with experiment
and were able to rank the binding affinities correctly. The decomposition results clearly
indicated that the van der Waals interaction dominates over the electrostatic interaction.
These results give not only a quantitative view but also a qualitative view when correlated
with the physico-chemical interactions observed in the MD trajectories.

REFERENCES
1. Pal, S., Tetrahedron., 2006, 62 (14), 3171-200.
2. Nilius, A. M. and Ma, Z., Curr. Opin. Pharmacol., 2002, 2 (5), 493-500.
3. Gasc, J. C., d'Ambrieres, S. G., Lutz, A. and Chantot, J. F., J. Antibiot. (Tokyo), 1991, 44
(3), 313-30.
4. Schlunzen, F., Zarivach, R., Harms, J., Bashan, A., Tocilj, A., Albrecht, R., Yonath, A. and
Franceschi, F., Nature, 2001, 413 (6858), 814-21.
5. Xiong, L., Shah, S., Mauvais, P. and Mankin, A. S., Mol. Microbiol., 1999, 31 (2), 633-9.
6. Hansen, L. H., Mauvais, P. and Douthwaite, S., Mol. Microbiol., 1999, 31 (2), 623-31.
7. Weisblum, B., Antimicrob. Agents Chemother., 1995, 39 (4), 797-805.
8. Vester, B. and Douthwaite, S., Antimicrob. Agents Chemother., 2001, 45 (1), 1-12.
9. Xiong, L., Korkhin, Y. and Mankin, A. S., Antimicrob. Agents Chemother., 2005, 49 (1),
281-8.
10. Douthwaite, S., Hansen, L. H. and Mauvais, P., Mol. Microbiol., 2000, 36 (1), 183-93.
11. Wahab, H. A., Yam, W. K., Samian, M. R. and Najimudin, N., J. Biomol. Struct. Dyn.,
2008, 26 (1), 131-46.
12. Pestka, S. and Lemahieu, R. A., Antimicrob. Agents Chemother., 1974, 6 (4), 479-88.
13. Douthwaite, S. and Aagaard, C., J. Mol. Biol., 1993, 232 (3), 725-31.
14. Dinos, G. P., Connell, S. R., Nierhaus, K. H. and Kalpaxis, D. L., Mol. Pharmacol., 2003,
63 (3), 617-23.
15. Lovmar, M., Tenson, T. and Ehrenberg, M., J. Biol. Chem., 2004, 279 (51), 53506-15.
16. Kollman, P. A., Massova, I., Reyes, C., Kuhn, B., Huo, S., Chong, L., Lee, M., Lee, T.,
Duan, Y., Wang, W., Donini, O., Cieplak, P., Srinivasan, J., Case, D. A. and Cheatham, T.
E., 3rd, Acc. Chem. Res., 2000, 33 (12), 889-97.
17. Kuhn, B., Donini, O., Huo, S., Wang, J. M. and Kollman, P. A., Free energy calculations
in rational drug design, 2001, 243-51.
18. Massova, I. and Kollman, P. A., J. Am. Chem. Soc., 1999, 121 (36), 8133-43.
19. Wang, W. and Kollman, P. A., J. Mol. Biol., 2000, 303 (4), 567-82.
20. Xu, Y. and Wang, R., Proteins, 2006, 64 (4), 1058-68.
21. Chong, L. T., Duan, Y., Wang, L., Massova, I. and Kollman, P. A., Proc. Natl. Acad. Sci.
USA, 1999, 96 (25), 14330-5.
22. Srinivasan, J., Cheatham, T. E., Cieplak, P., Kollman, P. A. and Case, D. A., J. Am. Chem.
Soc., 1998, 120 (37), 9401-09.
23. Wang, J., Morin, P., Wang, W. and Kollman, P. A., J. Am. Chem. Soc., 2001, 123 (22),
5221-30.
24. Page, C. S. and Bates, P. A., J. Comput. Chem., 2006, 27 (16), 1990-2007.
25. Gouda, H., Kuntz, I. D., Case, D. A. and Kollman, P. A., Biopolymers, 2003, 68 (1), 16-
34.
26. Yam, W. K. and Wahab, H. A., J. Chem. Inf. Model., 2009, 49 (6), 1558-67.
27. Cornell, W. D., Cieplak, P., Bayly, C. I., Gould, I. R., Merz, K. M. J., Ferguson, D. M.,
Spellmeyer, D. C., Fox, T., Caldwell, J. W. and Kollman, P. A., J. Am. Chem. Soc., 1995, 117,
5179-97.
28. Wang, J., Wolf, R. M., Caldwell, J. W., Kollman, P. A. and Case, D. A., J. Comput.
Chem., 2004, 25 1157-74.
29. Jorgensen, W. L., Chandrasekhar, J., Madura, J. D., Impey, R. W. and Klein, M. L., J.
Chem. Phys., 1983, 79 926-35.
30. Rocchia, W., Sridharan, S., Nicholls, A., Alexov, E., Chiabrera, A. and Honig, B., J.
Comput. Chem., 2002, 23 (1), 128-37.
31. Honig, B., Sharp, K. and Yang, A. S., J. Phys. Chem., 1993, 97 (6), 1101-09.
32. Sitkoff, D., Sharp, K. A. and Honig, B., J. Phys. Chem., 1994, 98 (7), 1978-88.
33. Rashin, A. A., Biopolymers, 1984, 23 (8), 1605-20.
34. Liu, B., Bernard, B. and Wu, J. H., Proteins, 2006, 65 (2), 331-46.
35. Auerbach, T., Bashan, A., Harms, J., Schluenzen, F., Zarivach, R., Bartels, H., Agmon, I.,
Kessler, M., Pioletti, M., Franceschi, F. and Yonath, A., Curr. Drug Targets Infect.
Disord., 2002, 2 (2), 169-86.
36. Champney, W. S., Curr. Top. Med. Chem., 2003, 3 (9), 929-47.
37. Omura, S., Macrolide Antibiotics: Chemistry, Biology, and Practice, 2nd ed., Academic
Press, Amsterdam, Boston, 2002.
38. Bertho, G., Gharbi-Benarous, J., Delaforge, M. and Girault, J. P., Bioorg. Med. Chem.,
1998, 6 (2), 209-21.
39. Moazed, D. and Noller, H. F., Biochimie., 1987, 69 (8), 879-84.
40. Tenson, T. and Mankin, A. S., Peptides, 2001, 22 (10), 1661-8.
41. Leclercq, R. and Courvalin, P., Antimicrob. Agents Chemother., 2002, 46 (9), 2727-34.
42. Ma, Z., Clark, R. F., Brazzale, A., Wang, S., Rupp, M. J., Li, L., Griesgraber, G., Zhang,
S., Yong, H., Phan, L. T., Nemoto, P. A., Chu, D. T., Plattner, J. J., Zhang, X., Zhong, P.,
Cao, Z., Nilius, A. M., Shortridge, V. D., Flamm, R., Mitten, M., Meulbroek, J., Ewing, P.,
Alder, J. and Or, Y. S., J Med Chem, 2001, 44 (24), 4137-56.
43. Pfister, P., Jenni, S., Poehlsgaard, J., Thomas, A., Douthwaite, S., Ban, N. and Bottger, E.
C., J. Mol. Biol., 2004, 342 (5), 1569-81.
44. Tu, D., Blaha, G., Moore, P. B. and Steitz, T. A., Cell, 2005, 121 (2), 257-70.


ACKNOWLEDGMENTS
Authors would like to acknowledge MIMOS (M) Sdn. Bhd. for the generous supply of
computational resources.

A00039

3D Pharmacophore and Molecular Docking of AFB
Metalloprotease and Peptide Analogs for Novel
Antimicrobial Inhibitors

S. Krongdang 1,2, P. Chantawannakul 2,3, P. Nimmanpipug 1,4, and V. S. Lee 1,4,C

1 Computational Simulation and Modeling Laboratory (CSML), Department of Chemistry and Center
for Innovation in Chemistry, Chiang Mai University, Chiang Mai, 50200, Thailand
2 Division of Biotechnology, Graduate School, Chiang Mai University, Chiang Mai, 50200, Thailand
3 Department of Biology, Faculty of Science, Chiang Mai University, Chiang Mai, 50200, Thailand
4 Department of Chemistry, Faculty of Science, Chiang Mai University, Chiang Mai, 50200, Thailand

C E-mail: vannajan@gmail.com; Fax: +66-53-892277; Tel: +66-53-943-341 ext 117



ABSTRACT
American Foulbrood (AFB) disease is caused by Paenibacillus larvae. During the
sporulation process, the AFB protease is produced and probably plays a key role in
the proteolytic activity involved in the pathogenic mechanism. It is classified as a zinc
metalloprotease of the thermolysin family, which contains HEXXH, a short zinc-
binding consensus sequence. Homology modeling was performed based on the
significant sequence similarity between the target enzyme and the template,
thermolysin, a major bacterial zinc metalloprotease. Subsequently, three-dimensional
pharmacophore models calculated with the Catalyst/HipHop algorithm were
investigated from its inhibitors: phosphoramidon analogs, peptide hydrazides and
hydroxamic acids. Two hydrogen bond acceptors (HBA), one hydrogen bond donor
(HBD) and a hydrophobic group (HB) were present in the best hypothesis model, with
GLU137 in the binding region. A molecular docking study using CDOCKER was
applied to compare the interactions of the thermolysin inhibitors and our proposed
antimicrobial inhibitors, short hexapeptides from the propeptide region, with our AFB
metalloprotease model.

Keywords: American Foulbrood, Metalloprotease, Pharmacophore, Molecular docking.








Computational Chemistry

A00003
Monte Carlo Simulation of Two-component Bilayers with
Interlayer Coupling

K. Sornbundit 1, W. Ngamsaad 1, D. Triampo 2 and W. Triampo 1,3,4,C

1 R&D Group of Biological and Environmental Physics, Department of Physics, Faculty of Science,
Mahidol University, Bangkok 10400, Thailand
2 Department of Chemistry, Faculty of Science, Mahidol University, Bangkok 10400, Thailand
3 ThEP Center, CHE, 328 Si Ayutthaya Road, Bangkok 10400, Thailand
4 Center of Excellence for Vectors and Vector-Borne Diseases, Faculty of Science, Mahidol University,
Bangkok 10400, Thailand

C E-mail: wtriampo@gmail.com


ABSTRACT
We present an Ising-like model to study phase separation dynamics and domain
coarsening in lipid bilayers. In this model, a lipid bilayer is viewed as a stack of two
two-dimensional monolayers. Interlayer coupling is included to represent the interaction
of pairs of molecules located at the same site in opposed layers. To study domain
coarsening, the system is simulated using the Monte Carlo method with conserved-
order-parameter dynamics. In order to enhance the effect of bulk diffusion and suppress
the effect of surface diffusion, a bulk-diffusion algorithm is applied. We found that the
characteristic domain size R(t) grows according to the 1/3 power law of time at the
late stage.

Keywords: lipid bilayers, Monte Carlo method, interlayer coupling



1. INTRODUCTION
Plasma membranes in animal cells are complex systems. They are composed of many
types of proteins and about 500 different lipid species. The phospholipids can be divided into
two basic types: saturated and unsaturated phospholipids. A major component of the plasma
membrane is cholesterol. When plasma membranes are brought below the critical
temperature, Tc, lateral heterogeneities are observed. Saturated phospholipids and
cholesterol are laterally organized into nano-scale clusters known as rafts, bounded by a
region or "sea" of unsaturated phospholipids [1]. From this point of view, the systems have
two liquid phases: the liquid-ordered phase (lo), which is the cholesterol-rich region or raft,
and the liquid-disordered phase (ld), which represents the cholesterol-poor region or the
region that contains only unsaturated phospholipids. Rafts are believed to be involved in many
cell functions, such as signaling, recruitment of specific proteins, and endocytosis [2]. In
order to study phase separation in plasma membranes, researchers usually use giant
unilamellar vesicles (GUVs) instead of real membranes, because this way they can control
lipid compositions and exclude components, such as proteins, that are not involved in phase
separation [3]. Typically, the vesicle has only three components: saturated phospholipids,
unsaturated phospholipids, and cholesterol; even though it contains only these three
components, phase separation does occur. Recent experiments [4] show that if the lipid
compositions in both layers are the same, rafts in one layer can be induced or suppressed by
overlapping rafts and tuning the lipid composition in the other layer. This implies that there is
an interaction or coupling between the two layers.

Nowadays, an accepted model to describe phase separation in membranes is the two-
dimensional (2D) Ising model. Previous experiments [5-7] reveal that lipid bilayer systems
containing saturated phospholipids, unsaturated phospholipids, and cholesterol exhibit
phase-separating behavior of the 2D Ising universality class. Researchers have measured the
critical exponents of several quantities (the order parameter, line tension, correlation length,
and susceptibility), finding values close to the predictions of the 2D Ising model. Thus they
have concluded that lipid membranes exhibit critical behavior of the 2D Ising class.

Recently, Gomez et al. [8] studied phase separation in three-component lipid bilayers
using the Ising model and the Monte Carlo method. In their model, the lipid bilayer is viewed
as two interconnected lattices. One is a triangular lattice fully occupied by two components A
and B (representing saturated and unsaturated phospholipids, respectively), while the other is
a hexagonal lattice partially occupied by cholesterol, the C component, with the remaining
sites empty. They considered nearest-neighbor interactions within the A-B lattice and
between the two lattices; the interaction between the two lattices corresponds to the
preferential affinity between the A and C components and represents the coupling of two
opposed lattices. The point of their work was to find the scaling law of domains in terms of
the typical domain size R(t) as a function of time t. They applied enhanced bulk diffusion
(EBD) dynamics with a periodic boundary condition, which yields the scaling law
R(t) ~ t^(1/3) at late stages. This law is also obtained for the binary-mixture monolayer
system, where it is known as Lifshitz-Slyozov (LS) dynamics [9].

In our work, we consider a model of a lipid bilayer composed of two opposed
monolayers, shown in Fig. 1. We model the system as two opposed fully-filled square lattices
of lipid monolayers. Each layer contains two components: particle A, which represents a
saturated phospholipid molecule packed with a cholesterol molecule (because of their
preferential affinity), and particle B, which represents an unsaturated phospholipid molecule.
A particle (A or B) interacts only with its four nearest-neighbor particles. The coupling is
represented by the interaction of the pair of particles located at the same site on both lattices.
The systems were simulated using the Monte Carlo method with a periodic boundary
condition and EBD dynamics. The scaling law R(t) ~ t^(1/3) at late stages was obtained.

This work is organized as follows. In Sec. 2 the system is modeled using the Ising
model; the lattice system is defined and the Ising-like Hamiltonian is given. In Sec. 3, the
dynamics we use (a bulk-diffusion dynamics) is explained, and the advantages of this
procedure are discussed. The domain size measurement method is explained in Sec. 4,
followed by the measurement of the growth law at varying coupling strengths in Sec. 5. We
conclude in the last section.


2. THE LATTICE SYSTEM AND THE ISING-LIKE HAMILTONIAN
As mentioned above, the study of domain formation in lipid bilayers should be consistent
with the 2D Ising model. We therefore cast the lipid bilayer system in Ising-model terms and
define the Hamiltonian of the system. Domain formation is then studied by simulating the
system using the Monte Carlo method with EBD dynamics and a periodic boundary
condition.

In our model, a lipid monolayer is a fully-filled square lattice. A lipid bilayer is therefore
composed of a stack of two identical square lattices, an upper and a lower. Each site is
occupied by a +1 or -1 spin, representing spin A or B, respectively. Our Hamiltonian is based
on the Ising Hamiltonian; that is, spins (or particles) on the upper and lower layers have
short-range interactions with their four nearest neighbors. This Ising-like Hamiltonian has the
form

H_tot = H_t + H_b + H_Λ = -J Σ_<ij> S_i^t S_j^t - J Σ_<ij> S_i^b S_j^b - Λ Σ_i S_i^t S_i^b ,    (1)
where the total Hamiltonian of the system (H_tot) is the combination of the Hamiltonians of
the upper layer (H_t), the lower layer (H_b) and the coupling term (H_Λ). The spin S_i takes
the values +1 or -1, representing particles A and B at site i, respectively. The in-plane
interaction is denoted by J > 0 and the coupling between the two opposed layers is denoted
by Λ > 0. This implies that preferential affinity of like spins is required for minimum energy
of the system. The summations in the first two terms run over nearest-neighbor pairs of one
layer. The coupling between layers is represented by the last term, which does not appear in a
standard Ising Hamiltonian, because it concerns the interaction of spins at the same position
in opposed layers. In fact, the interaction between the lipid monolayers could come from
various mechanisms [10], but we absorb them into the single coupling strength parameter Λ
for simplicity. Through the coupling term, the Ising-like Hamiltonian favors matching spins in
both layers, because equal values of opposed spins at the same site lower the Hamiltonian of
the system. This is consistent with the experimental observations on domain overlap when
both layers have an equal fraction of lipid composition.
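A minimal NumPy sketch (an illustration, not the authors' code) that evaluates the Hamiltonian of Eq. (1) on two coupled square lattices with periodic boundaries:

    import numpy as np

    def hamiltonian(top, bottom, J=1.0, lam=1.0):
        """Eq. (1): H = -J sum_<ij> S_i^t S_j^t - J sum_<ij> S_i^b S_j^b
        - lam * sum_i S_i^t S_i^b, with periodic boundary conditions."""
        def layer_energy(s):
            # Count each nearest-neighbor bond once via down/right shifts.
            return -J * np.sum(s * np.roll(s, 1, axis=0) + s * np.roll(s, 1, axis=1))
        return layer_energy(top) + layer_energy(bottom) - lam * np.sum(top * bottom)

    # Two random 512 x 512 layers of +/-1 spins (equal fractions on average):
    rng = np.random.default_rng(0)
    top = rng.choice([-1, 1], size=(512, 512))
    bottom = rng.choice([-1, 1], size=(512, 512))
    print(hamiltonian(top, bottom, J=1.0, lam=2.0))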


3. ENHANCED BULK DIFFUSION DYNAMICS
The usual algorithm used to simulate a conserved-order-parameter system, such as lipid
bilayers or binary alloys, is the Kawasaki algorithm [11]. This dynamics allows the
movement (or exchange) only of local spins, i.e. of nearest-neighbor pairs. However, Marko
et al. [12] note that the Kawasaki algorithm is inefficient at reaching the late stages of a
simulation because a lot of time is required. Therefore, an algorithm is needed that maintains
local movement while reducing the time needed to reach the equilibrium state.

The movement of particles (or spins) in a conserved-order-parameter system can be
divided into two types: bulk diffusion and surface diffusion. Bulk diffusion is the movement
of particles from one domain to another or between parts of the same domain, while surface
diffusion is the sliding of particles along the surface of a domain without ever detaching from
it, which requires no energy [13]. In the bulk diffusion process, the energy needed to detach
particles from a small domain is less than from a large domain, so particles on the surface of
small domains are induced to diffuse to larger domains. The process of surface diffusion
causes domain growth by joining small domains into larger ones. In the early stage of a
simulation, usually called spinodal decomposition, the system has many small domains, so
surface diffusion dominates at this stage. When enough time has passed and the system
consists mainly of a small number of big domains, the joining of domains becomes difficult.
Thus, bulk diffusion replaces surface diffusion as the dominant process.

In the Kawasaki algorithm the acceptance ratio of surface diffusion is always 1, whereas
the acceptance ratio of bulk diffusion is often less than 1. This means that many surface-
diffusion moves are accepted, which causes domains to grow slowly in the late stages. Marko
et al. therefore proposed a new algorithm that reduces the effect of surface diffusion and
enhances the effect of bulk diffusion. This algorithm uses the spin coordination number (the
number of nearest-neighbor spins pointing in the same direction as the central spin) to modify
the form of the acceptance ratio, so that moves corresponding to surface diffusion have an
acceptance ratio of less than one. The energy change for the exchange of nearest-neighbor
spins i and j according to the enhanced bulk diffusion dynamics is given by

ΔH = 4J[n_i^t + n_j^t - z + 1] + 4J[n_i^b + n_j^b - z + 1] + ΔH_Λ ,    (2)
A00003
ANSCSE14 Mae Fah Luang University, Chiang Rai, Thailand
March 23-26, 2010

88
where n_i and n_j are the spin coordination numbers of the spins at sites i and j. The
parameter z is the lattice coordination number (the total number of nearest-neighbor spins);
for square lattices z = 4. The superscripts t and b denote the upper and lower layers, while
ΔH_Λ is the change in the interlayer energy H_Λ = -Λ Σ_i S_i^t S_i^b due to the interaction
between layers. To


decide whether to accept or reject the exchange, the move of the system from the initial state
μ to the final state ν follows the Metropolis-type acceptance probability [13]

A(μ → ν) = (1/2) e^(-4βJ n_i) e^(-4βJ n_j) e^(-βΔH_Λ) ,    (3)

where β = 1/(k_B T), k_B is the Boltzmann constant and T is the temperature of the system.
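A sketch of the corresponding acceptance test, assuming the reconstructed form of Eq. (3); the function name and the default β = 1/(2.7 J/k_B) are illustrative choices:

    import math
    import random

    def accept_exchange(n_i, n_j, d_h_lam, J=1.0, beta=1.0/2.7):
        """EBD acceptance of Eq. (3):
        A(mu -> nu) = (1/2) exp(-4*beta*J*n_i) exp(-4*beta*J*n_j) exp(-beta*dH_lam).
        n_i, n_j: spin coordination numbers of the pair to be exchanged;
        d_h_lam: change in the interlayer coupling energy;
        beta = 1/(k_B T); T = 2.7 J/k_B is the quench temperature used below."""
        a = 0.5 * math.exp(-4.0 * beta * J * (n_i + n_j) - beta * d_h_lam)
        return random.random() < min(1.0, a)

Note that the acceptance decreases with the coordination numbers n_i and n_j, which is what suppresses surface diffusion relative to bulk diffusion.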


4. TYPICAL DOMAIN SIZE ESTIMATION
In the previous sections, we presented an Ising-like model of lipid bilayers. In this
section, we investigate the growth law of the domains. The domain size at time t, R(t), is
obtained by estimating the peak of the circularly-averaged structure factor I(q).

4.1 THE TWO-DIMENSIONAL STRUCTURE FACTOR
The easiest way to estimate the domain size is to calculate the position of the peak of the
structure factor I(q) [14]. It is obtained by taking the Fourier transform of the spins and
dividing by the total number of spins [13]:

I(q) = (1/N) |S(q)|² ,    (4)

where S(q) is the Fourier transform of the spins and N is the number of spins. In order to
smooth I(q), we average the two-dimensional structure factor over all directions; this is
called the circular average. We then estimate the position of the maximum peak by
calculating the first moment q* of the circularly-averaged structure factor I(q). The
characteristic domain size at time t is given by

R(t) = 2π / q* .    (5)

We use this technique to calculate the typical domain size in the next section.
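A compact NumPy sketch of this estimator (an illustration; it assumes a 2D lattice of ±1 spins), computing I(q) by FFT and R(t) from the first moment q*:

    import numpy as np

    def domain_size(spins):
        """Eqs. (4)-(5): I(q) = |S(q)|^2 / N; R(t) = 2*pi / q*, where q* is
        the first moment of the circularly averaged structure factor."""
        N = spins.size
        I = np.abs(np.fft.fft2(spins)) ** 2 / N
        nx, ny = spins.shape
        qx = 2.0 * np.pi * np.fft.fftfreq(nx)
        qy = 2.0 * np.pi * np.fft.fftfreq(ny)
        QX, QY = np.meshgrid(qx, qy, indexing="ij")
        q = np.hypot(QX, QY)
        mask = q > 0                      # drop the q = 0 (mean) mode
        q_star = np.sum(q[mask] * I[mask]) / np.sum(I[mask])
        return 2.0 * np.pi / q_star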


5. RESULTS AND DISCUSSION
We calculated the growth law of the domains associated with phase separation on opposed
square lattices (Fig. 1). The system is 512 x 512 with an equal fraction of the two spins. We
performed Monte Carlo simulations with a periodic boundary condition and ran them until the
typical domain size reached 1/10 of the lattice width, beyond which finite-size effects appear.
Initially, the systems were set to a homogeneous state, i.e. placed at a very high temperature
T, and then quenched to T = 2.7 J/k_B. One may note that this temperature is larger than the
critical temperature of a single lipid layer, Tc ~ 2.27 J/k_B, because of the presence of the
interlayer coupling [15]. At this temperature the systems separated into two phases: an A-rich
phase and a B-rich phase. To find the growth law, we varied the coupling strength Λ = 1, 2, 3
and 4, while the in-plane interaction J was always equal to 1. We did 6 runs for each Λ.

To compare the effect of the coupling strength on the domain-coarsening process, we
show a sequence of domain evolutions at varying coupling strengths in Fig. 2. At early MC
steps, many small domains of both phases occur, but the system still looks homogeneous. At
later times, small domains are seen joining into bigger domains. Although the domain-
coarsening process occurred for every coupling, the domain growth rates differ, and the rate
is slower when the coupling is larger.

After calculating the relation between the typical domain size and time, we found that the
typical domain size for any Λ grows like a power-law function. Thus, we assume that domain
growth obeys a power law of the form

R(t) ∝ t^n ,    (6)

where n is the growth exponent. The relation of typical domain size versus time (in MCS
units) is shown in Fig. 3, and the values of the growth exponent for each Λ are given in
Table 1. The growth exponent values tend to 1/3; this is analogous to the LS law in the
monolayer and to the results obtained by Gomez et al. [8].
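The growth exponent n in Eq. (6) can be estimated by a linear fit of log R against log t at late times; a short sketch with hypothetical (t, R) data:

    import numpy as np

    def growth_exponent(t, R):
        """Slope of log R versus log t, i.e. n in R(t) ~ t^n (Eq. (6))."""
        n, _ = np.polyfit(np.log(t), np.log(R), 1)
        return n

    # Hypothetical late-stage data obeying the LS law:
    t = np.array([1e4, 3e4, 1e5, 3e5, 1e6])
    R = 0.5 * t ** (1.0 / 3.0)
    print(growth_exponent(t, R))   # ~ 0.333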





Figure 1. Representation of the two opposed square lattices used in our work. Solid and open
circles denote species A and B, respectively. Spins on each layer interact with their nearest-
neighbor spins with in-plane interaction J, while the interlayer interaction, Λ, results from the
interaction of a pair of spins located at the same site.



Table 1. Growth exponent values, n, measured at the late stages for different coupling
strengths.

Λ    n
1    0.299
2    0.321
3    0.322
4    0.328






Figure 2. A sequence of domain evolution with varying coupling strength at MCS = 1000,
100000, and 1000000 (left to right). At the beginning, the systems separated into many
small A-rich and B-rich domains. Later, bigger domains appeared due to the joining of small
domains.





Figure 3. Scaling behavior of EBD dynamics for lipid bilayer systems at late stages. When
the coupling strength is large, domains grow more slowly. The linear fitting line with slope
1/3 indicates R(t) ~ t^(1/3) at the late stage.






6. CONCLUDING REMARK
We have performed MC simulations of an Ising-like model to explain phase separation in
a three-component coupled bilayer whose two components have preferential affinity. An
example of such a system is a lipid bilayer with three components: saturated phospholipids,
unsaturated phospholipids, and cholesterol. This three-component system can be viewed as a
binary-mixture system. We have investigated the simple situation of an equal fraction of the
two components. We have assumed that the interlayer interaction, or coupling, acts on pairs
of particles located at the same position in both layers. The effect of the coupling strength
was studied; it was found that a greater coupling strength results in a smaller domain size,
and the scaling law of the domains was then examined. We have found that the typical
domain size follows the 1/3 power law of time, R(t) ~ t^(1/3), at the late stage.


REFERENCES
1. K. Simons, and E. Ikonen, Nature, 1997, 387, 569-572.
2. L. Bagatolli, and P. B. Sunil Kumar, Soft Matter, 2009, 5, 3234-3248.
3. M. D. Collins, and S. L. Keller, PNAS, 2008, 105, 124-128.
4. S. L. Veatch, and S. L. Keller, Phys. Rev. Lett, 2002, 89, 268101-4.
5. H. M. McConnell, ACS Chem. Biol., 2008, 3, 265-267.
6. A. R. Honerkamp-Smith, S. L. Veatch, and S. L. Keller, Biochim. Biophys. Acta., 2009,
1788, 53-63.
7. A. R. Honerkamp-Smith et al., Biophys. J., 2008, 95, 236-246.
8. J. Gomez, F. Sagues, and R. Reigada, J. Chem. Phys, 2008, 129, 184115-9.
9. I. M. Lifshitz, and V. V. Slyozov, J. Phys. Chem. Solids, 1961, 19, 35.
10. S. May, Soft Matter, 2009, 5, 3148-3156.
11. K. Kawasaki, Phase transition and Critical Phenomena, Academic, London, 1983, 443.
12. J. F. Marko, and G. T. Barkema, Phys. Rev. E, 1995, 52, 2522-2534.
13. M. E. J. Newman, and G. T. Barkema, Monte Carlo Methods in Statistical Physics,
Oxford University Press, New York, 2001, 284-287.
14. J. G. Amar, F. E. Sullivan, and R. D. Mountain, Phys. Rev. B, 1988, 37, 196-209.
15. E. Sloutskin, and M. Gitterman, Physica A, 2007, 376, 337-350.


ACKNOWLEDGMENTS
The authors thank David Blyler for editing the manuscript and providing helpful comments.
This work is partially supported by the Center of Excellence for Innovation in Chemistry
(PERCH-CIC), the Thailand Center of Excellence in Physics (ThEP), the Thailand Research
Fund (TRF), The Commission on Higher Education (CHE), and the Development Promotion
of Science and Technology (DPST), Thailand.

A00005
Virtual screening for inhibitors of isocitrate lyase of
Mycobacterium tuberculosis with the NADI database

Yie Vern Lee 1, Yee Siew Choong 2,C, and Habibah A Wahab 3

1,2 Institute for Research in Molecular Medicine (INFORMM), Universiti Sains Malaysia, Minden
11800, Penang, Malaysia
3 School of Pharmaceutical Sciences, Universiti Sains Malaysia, Minden 11800, Penang, Malaysia

C E-mail: yeesiew@usm.my



ABSTRACT
Tuberculosis (TB) is one of the most serious infectious diseases in the world; almost
one-third of the world population is infected with this latent disease. The causative
agent, Mycobacterium tuberculosis, survives in the human host in both a replicating
mode (active TB) and a dormant mode (inactive TB). Drugs are available for active
TB, but none for dormant TB. Drug-resistant strains that emerged due to the long
treatment regime have further complicated treatment. Thus the discovery of new TB
drugs is urgent, as no new class of TB drug has been found in the past 40 years. In
this research, possible inhibitors of the isocitrate lyase (ICL) of M. tuberculosis are
screened. ICL is selected as the potential drug target because it is one of the most
important enzymes for metabolizing lipid as the carbon source during the dormant
phase. The literature also shows that inhibition of this enzyme totally eliminates the
bacterium from the human host. Molecular docking is carried out to dock the
compounds of the NADI database (more than 3,000 active natural compounds from
Malaysian plants) onto the solved crystal structure of ICL. By using the molecular
docking approach, these natural compounds are virtually screened prior to wet-lab
experiments, offering a time- and cost-effective alternative in potential drug
discovery. It is hoped that potential inhibitors can be identified and contribute to
future TB drug development.

Keywords: Isocitrate lyase, Mycobacterium tuberculosis, tuberculosis



1. INTRODUCTION
Tuberculosis (TB) is an air-borne latent infectious disease which mainly affects the lung
cavity in humans [9]. It is caused by Mycobacterium tuberculosis (MTB), which was first
discovered by Robert Koch in 1882. To date, more than one-third of the world population
has been infected with TB and more than 1.7 million people die of TB annually [1]. TB is
curable, but its treatment faces many difficulties, such as the threat of multidrug-resistant
strains, poverty and synergy with HIV. TB research does not receive as much attention from
profit-making pharmaceutical companies as diseases such as diabetes, cancer and
neurological disease; thus, TB has had no new drug for decades [1].

MTB survives in two phases within the human host: the replication (active) phase and the
dormant (inactive) phase [11]. TB therapy involves a long and complex treatment regime (six
to nine months with a combination of drugs). The emergence of drug-resistant strains has
further complicated the situation [1]. Instead of finding a new class of drug to fight
replication-phase MTB, it is far more interesting to find a new drug target for dormant-phase
MTB, for which drugs are not yet available. In this research, virtual screening with natural
compounds from the Nature-based Drug Discovery Intelligent (NADI) database was carried
out on the isocitrate lyase of MTB as the potential drug target, in order to search for potential
inhibitors that can contribute to new TB drug development.

2. BACKGROUND OF STUDY
2.1. TB Conventional Treatment Regime
During the replication phase, MTB is infectious but can still be eliminated by the human
cell-mediated immune response and by drugs such as isoniazid, rifampin, and pyrazinamide
[12]. However, the elimination of MTB is not as effective as expected due to the slow cell-
mediated immune response and MTB's strong drug defence provided by its thick, lipid- and
polysaccharide-enriched cell wall [2]. MTB enters the dormant phase to evade the immune
response and the drugs by masking its presence inside the host macrophage, where MTB
becomes inactive and non-infectious. However, the low oxygen and nutrient levels in the
macrophage are not favorable conditions for the strictly aerobic MTB [12]. Thus, MTB needs
to shift its metabolic pathways in order to survive dormancy. One of the most typical
examples of this metabolic shift is the change from glucose to lipid as the primary carbon
source [10].

The conventional treatment regime for TB comprises the first-line drugs (isoniazid,
rifampin, pyrazinamide) and second-line drugs (fluoroquinolones, ethambutol, and para-
aminosalicylic acid) [9]. Usually, a TB patient is treated with first-line drugs for six to nine
months. If the MTB strain is multidrug-resistant (MDR) (resistant to isoniazid and rifampin)
[8], second-line drugs are considered and the treatment is prolonged up to two years. To date,
as much as 30% of TB cases are MDR and the number is rising [1]. Ten percent of MDR
cases have developed into extensively drug-resistant (XDR) TB (MDR that also resists
fluoroquinolones and at least one more class of second-line drug) [4]. The emergence of
MDR or XDR is mostly caused by patients who do not complete, or cannot afford to
complete, the treatment regime within the treatment period [1].

2.2. The Isocitrate Lyase (ICL) As Potential Drug Target
MTB has to utilize different metabolic pathways during the dormant phase in the
macrophage in order to survive [10]. During the replication phase, MTB normally utilizes
glucose as its primary carbon source and the TCA (tricarboxylic acid) cycle to generate
energy. However, during the dormant phase, MTB utilizes lipid as the primary carbon source
and converts the lipid via the glyoxylate pathway to keep itself inactive yet viable.



Figure 1. Schematic diagram of the TCA cycle and the glyoxylate cycle. The bold blue
arrows are enzymatic steps in the TCA cycle, the blue arrows show the glyoxylate cycle and
the dashed arrow shows the initial step of gluconeogenesis [6]. The protein structure shown is
isocitrate lyase; each colour represents one subunit of isocitrate lyase [10].
The glyoxylate cycle (Figure 1) bypasses part of the TCA pathway and helps MTB convert
lipid into glucose during dormancy. Isocitrate lyase (ICL) and malate synthase are the crucial
enzymes involved in this glyoxylate pathway. ICL plays an important role in splitting
isocitrate into glyoxylate and succinate, preparing the glyoxylate to enter the glyoxylate
pathway. Due to oxygen depletion in the macrophage, dormant MTB skips several
β-oxidation steps in the TCA cycle to avoid further use of oxygen. The literature shows that,
in the absence of functional ICL, MTB is totally eliminated from the lung [7]. Therefore, in
this paper, we aim to search for possible ICL inhibitors from the Nature-based Drug
Discovery Intelligent (NADI) database.

2.3. Virtual Screening With NADI Database
The Nature-based Drug Discovery Intelligent (NADI) database is a collection of around
3,000 natural active compounds from a variety of plants found in Malaysia. Previously,
3-bromopyruvate and 3-nitropropionate were found to be ICL inhibitors [10]. However, these
inhibitors were not developed into drugs as they are toxic to humans. To avoid inhibitor
toxicity towards humans, the search for nature-based inhibitors is a rather new approach to
drug discovery. This approach is usually carried out by virtual or high-throughput screening.
In this research, virtual screening using molecular docking was carried out.

3. COMPUTATIONAL DETAILS
The coordinates of ICL and pyruvic acid were obtained from the RCSB Protein Data Bank.
The pyruvic acid was refined using InsightII's Builder module. Polar hydrogens and partial
charges were added to the ICL using the protonate program and the kollua.amber option of
AUTODOCK 3.0.5, respectively, prior to assigning solvation parameters. After the grid map
was constructed (60 x 60 x 60 points with a spacing of 0.375 Å) with AutoGrid, the molecular
docking of ICL with NADI compounds was performed using the Lamarckian genetic
algorithm (LGA) with pseudo-Solis and Wets local search. The docking parameters were:
population size of 50; energy evaluations of 1,000,000; maximum generations of 27,000;
elitism of 1; mutation rate of 0.02; local search rate of 0.06; translation step of 2.0 Å;
orientational and torsional steps of 50°; crossover rate of 0.8; 300 iterations per local search
with a termination value of 0.01; 4 consecutive successes or failures before doubling or
reducing the local search step size; and a total of 100 docking runs.


4. RESULTS AND DISCUSSION
Docking of pyruvic acid to ICL showed that it binds (with RMSD 2.24 Å) in almost the
same binding mode as in the crystal structure (1F8M). This shows that the docking simulation
is able to reproduce the experimental/crystallographic data. Table 4.1 shows the top 10 NADI
compounds from the virtual screening, ranked by their free energy of binding with ICL.
All 10 compounds have a more negative free energy of binding than pyruvic acid, ranging
from -25.52 kcal/mol to -17.84 kcal/mol. Half of the compounds can be found in the juice of
Punica granatum, and each of them shares the structural feature of at least two carboxylic
groups within a linear carbon chain. Among these 10 compounds, the top two, which have
three carboxylic groups (citric acid and hydroxycitric acid), are structurally similar to
isocitric acid (the natural substrate), while the rest, which have two carboxylic groups, share
some similarity with succinic acid. This shows that the carboxylic group, especially the
terminal carboxylic group, is important for interacting with the residues of the binding site. It
also explains why the NADI compounds with two terminal carboxylic groups have better free
energies of binding than pyruvic acid, which has only one carboxylic group. To further
examine the position of the carboxylic groups, these 10 compounds were superimposed with
pyruvic acid. Most of the carboxylic groups were found at the same position regardless of
chain length (Figure 4.1).
Table 4.1. Top ten compounds docked on ICL.

Rank  Compound Name                          Free Energy of        Ki (M)          Plant Source
                                             Binding (kcal/mol)
-     Pyruvic acid                           -7.09                 6.33 x 10^-6    Crystal structure (PDB id: 1F8M)
1     Citric acid                            -25.52                1.97 x 10^-19   Punica granatum (juice)
2     Hydroxycitric acid                     -25.44                2.24 x 10^-19   Garcinia atroviridis (fruit)
3     3-Methyl-3-hydroxy-pentanedioic acid   -19.25                7.71 x 10^-15   Rosa damascena (petal)
4     Azelaic acid                           -19.20                8.48 x 10^-14   Brucea javanica
5     Hexanedioic acid / Adipic acid         -18.76                1.77 x 10^-14   Morinda citrifolia (fruit)
6     Tartaric acid                          -18.50                2.74 x 10^-14   Punica granatum (juice)
7     Malic acid                             -18.13                5.18 x 10^-14   Punica granatum (juice), Aloe vera (leaf)
8     Fumaric acid                           -18.00                6.38 x 10^-14   Punica granatum (juice), Ageratum conyzoides
9     Succinic acid                          -17.86                8.12 x 10^-14   Rosa damascena (petal)
10    Succinic acid                          -17.84                8.45 x 10^-14   Punica granatum (juice)
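The Ki column follows from the free energy of binding through Ki = exp(ΔG/RT); a quick check (assuming R = 1.987 x 10^-3 kcal/(mol·K) and T = 298.15 K, AutoDock's usual reference temperature):

    import math

    R = 1.987e-3   # kcal/(mol*K)
    T = 298.15     # K; assumed here

    def ki_from_dg(dg):
        """Inhibition constant from free energy of binding:
        Ki = exp(dG/RT), dG in kcal/mol, Ki in mol/L."""
        return math.exp(dg / (R * T))

    # Pyruvic acid, dG = -7.09 kcal/mol -> ~6.3e-6 M, matching Table 4.1.
    print(ki_from_dg(-7.09))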



Figure 4.1. Superimposition of the 10 docked NADI compounds with pyruvic acid (stick
representation). Carboxylic groups are highlighted with yellow circles.

All ligands obtained from the docking were very polar due to their carboxylic groups.
The orientations of the ligands were highly influenced by the polarity of the ligand and by the
residues in the binding site. The binding site residues were TYR89, SER91, GLY92, TRP93,
GLN94, ASP108, ASP153, LYS189, LYS190, CYS191, GLY192, HIS193, ARG228,
GLU253, TRP283, ASN313, SER315, SER317, THR347 and LEU348. Some of these
residues do not interact directly with the ligands but either influence the orientation of
neighboring residues that form hydrogen bonds with the ligands' carboxylic groups, or have
van der Waals and electrostatic interactions with the ligands or with other binding site
residues. Figure 4.2 shows the binding site of ICL with a ligand.






Figure 4.2. Electrostatic potential surface representation of the ICL binding site with the
top-ranked NADI compound from the virtual screening (citric acid) in stick representation.
Blue and red show positively and negatively charged residues, respectively.


5. CONCLUSION
Virtual screening managed to find potential inhibitors for ICL. From the results, we can
conclude that the carboxylic group is the key element of a good ICL inhibitor. Future research
will need to focus more on examining the potential inhibitors against Lipinski's Rule of Five
[5] and on studying their absorption, distribution, metabolism and excretion (ADME) properties [3]
for drug-likeness. Besides that, the accuracy of the results can be improved by more
stringent docking simulations. Molecular dynamics simulations can be carried out to further
analyse the dynamic behaviour of the ligand-ICL complex, as well as the effects of the environment
such as solvent, ions, temperature, pressure and density. This will aid the understanding of the
binding interactions between the ligands and ICL and thus contribute to TB drug
development.



REFERENCES
1. Global Alliance for TB Drug Development Annual Report 2008, 2008.
2. Baker, E., Journal of Structural and Functional Genomics, 2007, 8(2): p. 57-65.
3. Balani, S.K., G.T. Miwa, L.-S. Gan, J.-T. Wu, and F.W. Lee, Current Topics in Medicinal
Chemistry, 2005, 5: p. 1033-1038.
4. Caminero, J.A., Eur Respir J, 2008, 32(5): p. 1413-1415.
5. Lipinski, C.A., F. Lombardo, B.W. Dominy, and P.J. Feeney, Advanced Drug Delivery
Reviews, 2001, 46(1-3): p. 3-26.
6. Lorenz, M.C. and G.R. Fink, Eukaryotic Cell, 2002, 1(5): p. 657-662.
7. Muñoz-Elías, E.J. and J.D. McKinney, Nature Medicine, 2005, 11: p. 638-644.
8. Rabia Johnson, Elizabeth M. Streicher, Gail E. Louw, Robin M. Warren, Paul D. van
Helden, and T.C. Victor, Curr. Issues Mol. Biol., 2006, 8: p. 97-112.
9. Rinaggio, J., Dental Clinics of North America, 2003, 47(3): p. 449-465.
10. Vivek Sharma, Sujata Sharma, Kerstin Hoener zu Bentrup, John D. McKinney, David G.
Russell, W.R.J. Jr., and J.C. Sacchettini, Nature Structural Biology, 2000, 7: p. 663-668.
11. Ying, Z., Frontiers in Bioscience, 2004, 9: p. 1136-1156.
12. Zahrt, Thomas C., Microbes and Infection, 2003, 5(2): p. 159-167.


A00010
Isoniazid Resistance in Mycobacterium tuberculosis InhA Mutants

Yee Siew Choong 1,C and Habibah A Wahab 2

1 Institute for Research in Molecular Medicine, Universiti Sains Malaysia, 11800 Minden, Penang, Malaysia.
2 Malaysian Institute of Pharmaceutical and Nutraceutical, sains@USM, Block A, 10 Persiaran Bukit Jambul, 11900 Bayan Lepas, Penang, Malaysia.
C E-mail: yeesiew@usm.my; Fax: +604-6534803; Tel: +604-6534837



ABSTRACT
Isoniazid (INH) is the oldest synthetic anti-tuberculosis drug. It is bactericidal and has
been the most commonly prescribed drug for the treatment and prophylaxis of tuberculosis
since 1952 (1). However, statistics show a continuing rise in multi-drug-resistant
tuberculosis, extensively drug-resistant tuberculosis and extremely drug-resistant
tuberculosis (2-4). Thus, there is an urgent need for basic science to address the key
contributions to the phenomenon of drug resistance prior to new drug development (5). In
this research, we report results from molecular dynamics simulations of wild type and
seven mutant types of Mycobacterium tuberculosis enoyl-acyl carrier protein reductase
(InhA) in complex with its inhibitor (INADH, activated INH). The results showed that the
mutated residues (I16T, I21T, I21V, I47T, V78A, S94A and I95P) led to a change in
hydrophobicity and a reduction in side-chain volume of mutant type InhA. Thus,
both INADH and mutant type InhA have to rearrange their conformations to
accommodate and alleviate the steric and electrostatic effects. However, the greater
atomic fluctuations and structural instabilities in mutant type InhA-INADH compared
with wild type InhA-INADH were not significant, and therefore caused only a slightly
lower binding affinity of INADH in mutant type InhA than in wild type InhA.
G54, I15 and A22 of InhA formed hydrogen bonds with INADH or water molecules
during all molecular dynamics simulations, suggesting that these residues might be an
essential part of INADH binding. Our studies have thus contributed to a better
understanding of isoniazid resistance in mutant type InhA.

Keywords: Mycobacterium tuberculosis, InhA mutations, molecular dynamics
simulation.



REFERENCES
1. Bernstein, J., Lott, W. A., Steinberg, B. A. and Yale, H. L., Am. Rev. Tuberc., 1952, 65,
357-364.
2. Ahmad, S. and Mustafa, A. S., Kuwait Med. J., 2001, 33, 120-126.
3. Gandhi, N. R., Moll, A., Sturm, A. W., Pawinski, R., Govender, T., Lalloo, U., Zeller, K.,
Andrews, J. and Friedland, G., Lancet, 2006, 368, 1575-1580.
4. Shah, N. S., Wright, A., Bai, G. H., Barrera, L., Boulahbal, F., Martin-Casabona, N.,
Drobniewski, F., Gilpin, C., Havelková, M., Lepe, R., Lumb, R., Metchock, B., Portaels,
F., Rüsch-Gerdes, S., Van Deun, A., Vincent, V., Laserson, K., Wells, C. and Cegielski, J. P.,
Emerg. Infect. Dis., 2007, 13, 380-387.
5. Migliori, G. B., Loddenkemper, R., Blasi, F. and Raviglione, M. C., Eur. Respir. J.,
2007, 29, 423-427.


A00012
Membrane Protein Simulation: A Case Study on a Selected Hypothetical Protein from
Klebsiella pneumoniae MGH78578

Sy Bing Choi 1, Yahaya M Normi 2, and Habibah A Wahab 3,C

1 Pharmaceutical Design and Simulation (PhDs) Laboratory, School of Pharmaceutical Sciences, Universiti Sains Malaysia, 11800 Minden, Pulau Pinang, Malaysia.
2 School of Biological Sciences, Universiti Sains Malaysia, 11800 Minden, Pulau Pinang, Malaysia
3 Centre of Advanced Drug Delivery, Malaysian Institute of Pharmaceutical and Nutraceuticals, Ministry of Science, Technology and Innovation, SAINS@USM, No.10, Persiaran Bukit Jambul, 11900 Pulau Pinang, Malaysia
C E-mail: habibah@ipharm.gov.my; habibah@usm.my; Fax: +604-653 4533; Tel: +604-653 2206



ABSTRACT
A hypothetical protein in Klebsiella pneumoniae was postulated to be Succinate
dehydrogenase (SDH) Chain C in a previous study. SDH plays an important role in the
aerobic respiratory chain and Krebs cycle that occur in the transmembrane region of
mitochondria in both eukaryotic and prokaryotic organisms. To give more accurate
insight into its molecular role as SDH, a molecular dynamics (MD) simulation of SDH in a
membrane was performed. The simulation system consisted of a fully hydrated lipid bilayer
with 420 molecules of 1-palmitoyl-2-oleoyl-phosphatidylcholine (POPC) and 29185
water molecules, together with chain C and the postulated chain D of SDH. A total of 6
ns of production run was carried out. Structural properties such as the area per lipid, tail
order parameter and lipid bilayer thickness were calculated, and the values correlate well
with experimental data. Interactions of ubiquinone (UQ) with the conserved residues of
the built model during the MD simulation were in good agreement with the results of the
molecular docking simulation done previously.

Keywords: Structural modeling, Succinate dehydrogenase, Molecular dynamics
simulation, membrane system.



1. INTRODUCTION
Klebsiella pneumoniae is a Gram-negative, non-motile, rod-shaped bacterium known as
an opportunistic pathogen found in the environment and on mammalian mucosal
surfaces. K. pneumoniae infections tend to occur in patients with weakened immune systems
and people with underlying diseases [1]. The complete genome sequence of K. pneumoniae
MGH 78578 was completed in 2007 by the Genome Research Center of Washington
University in St. Louis. Analysis showed that ~20% of its 4776 protein-coding genes are
classified as hypothetical proteins whose functions are not known [2]. Thus, it is worthwhile to
predict their structures, as this gives insight into their possible functions and mechanisms.

Recently, we postulated that the hypothetical protein KPN00728 (gi: 152969292) in the
genome is Chain C of Succinate dehydrogenase (SDH) [3]. SDH plays an important role in
the aerobic respiratory chain in the Krebs cycle that occurs in the transmembrane region of
mitochondria in both eukaryotic and prokaryotic organisms. Our study showed that KPN00728
has a missing region with conserved amino acid residues important for ubiquinone (UQ) and
heme group binding. Structure and function prediction of KPN00728, coupled with
analysis of its secondary structure and transmembrane topology, showed that KPN00728 adopts
an SDH (subunit C)-like structure. Molecular docking was performed, and it was found that UQ
docked on the built model (consisting of KPN00728 and the annotated chain D, KPN00729).
Formation of hydrogen bonds between UQ and Ser27, Arg31 (from KPN00728) and Tyr83
(from KPN00729) further reinforced that KPN00728 together with KPN00729 preserves the
functionality of UQ binding. This observation strongly supported the possibility that
KPN00728 is indeed SDH (chain C). That was the first report on the structure and function
prediction of KPN00728 of K. pneumoniae MGH78578 as SdhC.

Although docking simulations enabled us to understand the preferred orientation of UQ
when bound to the built model to form a stable complex, there are limitations. In docking
simulations, the rigidity of the protein and the target docking location are defined by the user. Hence this
decreases the degrees of freedom of both interacting components during the simulation.
Furthermore, results from docking can only provide a single snapshot of the ligand orientation,
which lacks interaction dynamics. Therefore, we aimed to utilize a more powerful
computer simulation technique, namely molecular dynamics, to obtain an in-depth
understanding of the structure and function of SDH.



2. COMPUTATIONAL DETAILS
2.1 Setup of the membrane simulation system
A patch of 128 pre-equilibrated POPC lipids obtained from Peter Tieleman's website [4] was used
(Figure 1A). A total of 512 POPC molecules was needed in order to encapsulate our built model (Figure
1B) in the membrane bilayer; duplicating the 128 lipids along two axes using genbox gave
the 512 POPC. The built model was then embedded into the 512 POPC (Figure 1C).
More than 90 POPC molecules were removed from the constructed bilayer to generate a
suitable membrane system in which the built model could be embedded. Solvation was done
with a total of 29153 SPC water molecules using the genbox command, bringing the total number of
atoms in the entire system to 111826. Counterions (Na+ and Cl-) were added to
compensate for the net charge of the system. Minimization of the entire system was done in
stages of Steepest Descent (SD), ending with Conjugate Gradient (CG), to remove
any unfavourable contacts. The restraints on the system were released gradually, starting from water,
followed by the lipid bilayer and lastly all atoms in the system.

Equilibration was achieved in two phases: generally, a short NVT equilibration is
followed by a longer NPT equilibration, and the heterogeneity of the system requires a
longer equilibration. 200 ps of NVT equilibration was done to equilibrate the temperature of
the entire system; this was done with the protein complex under positional restraints.
Subsequently, NPT equilibration of the system with respect to pressure was performed for 2 ns,
still with restraints on the protein complex. The Nose-Hoover thermostat was used because it
produces a correct kinetic ensemble and allows fluctuations that mimic more natural dynamics [5].
Semi-isotropic pressure coupling was used to allow the lipid bilayer to equilibrate in the x-y plane
independently of the z-axis. The system was then subjected to a 6 ns MD production run.
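As an illustration of this protocol, the sketch below writes a GROMACS-style NPT parameter file with Nose-Hoover temperature coupling and semi-isotropic pressure coupling. The option names follow common GROMACS .mdp conventions; all values other than those quoted in the text (the 2 ns NPT length, the restraints, and the coupling scheme) are placeholders, and the group names are system-specific assumptions.

```python
# Hedged sketch of an NPT equilibration input in the spirit of the protocol
# above. Option names follow common GROMACS .mdp usage; values flagged as
# placeholders are assumptions, not taken from the paper.
NPT_MDP = {
    "integrator": "md",
    "dt": 0.002,                    # 2 fs time step (placeholder)
    "nsteps": 1000000,              # 2 ns NPT equilibration
    "define": "-DPOSRES",           # positional restraints on the protein complex
    "tcoupl": "nose-hoover",        # Nose-Hoover thermostat, as in the text
    "tc-grps": "Protein POPC SOL_ION",   # placeholder group names
    "tau_t": "1.0 1.0 1.0",         # placeholder coupling times (ps)
    "ref_t": "310 310 310",         # placeholder temperature (K)
    "pcoupl": "parrinello-rahman",  # placeholder barostat choice
    "pcoupltype": "semiisotropic",  # x-y plane coupled independently of z
    "tau_p": "5.0",                 # placeholder (ps)
    "ref_p": "1.0 1.0",             # bar
    "compressibility": "4.5e-5 4.5e-5",
}

with open("npt.mdp", "w") as fh:
    for key, value in NPT_MDP.items():
        fh.write(f"{key:16s} = {value}\n")
```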

Figure 1. (A) 128 POPC with 2460 water molecules [4]. (B) Built model of SDH chains C and D
with UQ and the heme group in their initial positions for MD. (C) The protein solvated in the
POPC bilayer.


3. RESULTS AND DISCUSSION
3.1 Stability of the system
All the thermodynamic parameters, such as the potential energy, kinetic energy and total
energy profile of the system, were constant throughout the 6 ns MD production run. The
temperature and pressure profiles also indicated that the system had reached its equilibration
stage. In addition, the area per lipid and the bilayer thickness were investigated as
well. Our simulation gave an average area per lipid over the 6 ns production of
64.2 ± 0.4 Å², which is close to the accepted experimental value of 62-64 Å² [6-11]. Figure 2A
shows that the area per lipid decreased considerably from ~68 Å² to ~64 Å² during the simulation.
The decrease might be due to the reorientation of lipids associated with the disordered
segment of the built model.

The thickness of a typical membrane bilayer is about 35-50 Å [12-14], as measured by
the P-P distance across the bilayer [15]. In this simulation, the thickness of the POPC membrane was
observed to be ~38.74 Å during the 6 ns simulation time (Figure 2B). No significant fluctuation
was observed during the simulation. Figures 2B and 2C clearly show that the
simulated bilayer adopted a thickness of ~35-40 Å, in good agreement with the experimentally
determined thickness of 37 Å [16].
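For concreteness, the following is a minimal sketch of how these two quantities can be computed from trajectory data. The input arrays (per-frame box dimensions and phosphorus z-coordinates) are assumed to have been extracted from the trajectory beforehand, and the function names are illustrative only, not the authors' actual analysis scripts.

```python
import numpy as np

def area_per_lipid(box_xy, n_lipids_per_leaflet):
    # Lateral box area divided by the number of lipids in one leaflet.
    # box_xy: (n_frames, 2) array of box x and y lengths in Angstrom.
    return box_xy[:, 0] * box_xy[:, 1] / n_lipids_per_leaflet

def pp_thickness(p_z):
    # P-P bilayer thickness per frame from phosphorus z-coordinates.
    # p_z: (n_frames, n_phosphorus) array; atoms above the per-frame
    # midplane are assigned to the upper leaflet.
    mid = p_z.mean(axis=1, keepdims=True)
    upper = np.where(p_z > mid, p_z, np.nan)
    lower = np.where(p_z <= mid, p_z, np.nan)
    return np.nanmean(upper, axis=1) - np.nanmean(lower, axis=1)
```

Averaging the output of area_per_lipid over frames would give the kind of 64.2 ± 0.4 Å² figure quoted above, given equivalent inputs.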


Figure 2. (A) Area per lipid throughout the 6 ns simulation time; the observation is in good
agreement with experimental data from previous studies. (B) The thickness of the membrane
bilayer was stable at ~38 Å. (C) POPC thickness of ~40.0 Å, calculated using the
thickness distribution map of GridMAT-MD from the Bevan Lab [17].

3.2 Analyses of UQ Interaction
A molecular docking simulation was done previously [3] to validate the functionality of the
built model of KPN00728 as SDH. UQ, being the natural ligand of SDH, was
docked into the postulated binding pocket of the model, identified based on previous studies
[18-20]. From the docking result, the formation of hydrogen bonds between UQ and
TYR83@OH (chain D), SER27@OG (postulated chain C from KPN00728) and
ARG31@NH1 (postulated chain C from KPN00728) was found to agree well with previous
studies [19, 21], which further implied that KPN00728 has preserved the functionality of UQ
binding. In order to provide further insight into the dynamics of the system, a molecular dynamics
simulation was performed.

The average distances between UQ and the postulated interacting residues of the KPN00728
and annotated KPN00729 (chain D of SDH) proteins are presented in Table 1 and Figure 3.
There is a high possibility of hydrogen bond formation between TYR83@OH (from chain D of
SDH) and UQ@O1, based on the fact that the distance remained at a plateau of ~2.5 Å with no
significant fluctuations (Figure 4). Although the average distance between ARG31@NH1
and UQ@O2 during the MD simulation agreed well with the earlier docking results,
significant fluctuations between ~4 Å and ~5 Å occurred after 2.5 ns (Figure 4).






Figure 3. Comparison of the distances of the SER27, ARG31 and TYR83 residues to UQ in
(A) the docking simulation and (B) the molecular dynamics simulation. Distances are quoted in Å.



Table 1. Comparison of molecular docking and MD simulation results

Interaction               MD result (Å)    Docking result (Å)
TYR83@OH and UQ@O1        2.57 ± 0.10      2.58
ARG31@NH1 and UQ@O2       3.76 ± 0.60      3.83
SER27@OG and UQ@O3        7.34 ± 2.00      2.68
Figure 4. (A) Distance of TYR83@OH (from KPN00729, the annotated chain D of SDH) to
UQ@O4 measured against the simulation time. (B) Distance of ARG31@NH1 (from
KPN00728) to UQ@O2, which drifted apart after 2.5 ns. (C) Distance
between SER27@OG (from KPN00728) and UQ@O3, which increased drastically and
fluctuated strongly after 3 ns of simulation time.


Based on our observations, UQ drifted away from both SER27 and ARG31
(KPN00728, the postulated chain C of SDH) during the simulation (Figure 4B and C). It is
impossible for a hydrogen bond to form between them at the cut-off distance that we used
previously. The trajectory was therefore examined further to elucidate this concern.
The UQ binding site is located in the void area between the
entrance of the two chains (the postulated chain C hypothetical protein KPN00728 and chain D of
SDH) in the transmembrane region. Water molecules were found in the entrance and
void area between KPN00728 and chain D of SDH at the UQ binding site. With water
molecules present in the void between the protein and UQ, we postulate that solvation
effects might contribute to the binding of the protein with UQ. Hence, further
investigation is needed to test this postulate.
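As a sketch of the kind of analysis behind Figure 4, the snippet below computes a donor-acceptor distance time series and its hydrogen-bond occupancy. The 3.5 Å cut-off is a common choice used here purely for illustration, since the paper does not restate its own value, and the coordinate arrays are assumed to be pre-extracted from the trajectory.

```python
import numpy as np

HBOND_CUTOFF = 3.5  # Angstrom; illustrative value, not the paper's own cut-off

def donor_acceptor_distance(donor_xyz, acceptor_xyz):
    # Per-frame distance between two atoms, e.g. TYR83@OH and UQ@O1.
    # donor_xyz, acceptor_xyz: (n_frames, 3) coordinate arrays.
    return np.linalg.norm(donor_xyz - acceptor_xyz, axis=1)

def hbond_occupancy(distances, cutoff=HBOND_CUTOFF):
    # Fraction of frames in which the distance criterion is satisfied.
    return float(np.mean(distances < cutoff))
```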


4. CONCLUSION
The structural and energetic properties generated by the MD simulation, such as the area per
lipid, bilayer thickness and energy profile of the system, showed good agreement with
experimental data. This indicated that the membrane system was well equilibrated.
Comparison of the docking and MD simulation results for the UQ interaction revealed that
TYR83 (KPN00729, chain D of SDH) of the built model was the residue most likely to form a
hydrogen bond with UQ, whereas SER27 and ARG31 (KPN00728) drifted away from UQ.
Further investigation of the water molecules inside the UQ binding pocket is needed
to elucidate the solvation effect on the binding of UQ with the SER27 and ARG31 residues of
the postulated chain C from KPN00728 of K. pneumoniae MGH78578.

ACKNOWLEDGMENTS
This research is part of the USM-RU grant (1001/PBIOLOGY/815015). Sy Bing Choi
would like to acknowledge USM for the support of a USM fellowship.


REFERENCES
1. Kawai, T. Clin. Infect. Dis., 2006, 42(10): p. 1359-61.
2. Gert Lubec, L.A.-S., Jae-Won Yang, Julius Paul Pradeep John. 2005, 77: p. 90-127.
3. Choi, S.B., Y.M. Normi, and H.A. Wahab. Protein J, 2009, 28: p. 415-427.
4. Tieleman, D.P., M.S. Sansom, and H.J. Berendsen. Biophys J, 1999, 76(1 Pt 1): p. 40-9.
5. Hünenberger, P.H., Thermostat algorithms for molecular dynamics simulations, 2005,
105-147.
6. Kucerka, N., S. Tristram-Nagle, and J.F. Nagle. J Membr Biol, 2005, 208(3): p. 193-202.
7. Kukol, A. J. Chem. Theory Comput., 2009, 5(3): p. 615-626.
8. Leekumjorn, S. and A.K. Sum. J Phys Chem B, 2007, 111(21): p. 6026-33.
9. Jojart, B. and T.A. Martinek. J Comput Chem, 2007, 28(12): p. 2051-8.
10. Smaby, J.M., et al. Biophys J, 1997, 73(3): p. 1492-505.
11. Pabst, G., et al. Physical Review E, 2000, 62(3): p. 4000-4009.
12. Stryer, L., Biochemistry. 4th ed. 1995, New York: W.H. Freeman. xxxiv, 1064 p.
13. Kandasamy, S.K. and R.G. Larson. Biophys J, 2006, 90(7): p. 2326-43.
14. Lantzch, G., et al. Biophys Chem, 1996, 58(3): p. 289-302.
15. Lemkul, J.A. and D.R. Bevan. Febs J, 2009, 276(11): p. 3060-75.
16. Tieleman, D.P., S.J. Marrink, and H.J. Berendsen. Biochim Biophys Acta, 1997, 1331(3):
p. 235-70.
17. Allen, W.J., J.A. Lemkul, and D.R. Bevan. J Comput Chem, 2009, 30(12): p. 1952-8.
18. Lemire, B.D. and K.S. Oyedotun. Biochimica Et Biophysica Acta-Bioenergetics, 2002,
1553(1-2): p. 102-116.
19. Yankovskaya, V., et al. Science, 2003, 299(5607): p. 700-704.
20. Yang, X.D., et al. J. Bio. Chem., 1998, 273(48): p. 31916-31923.
21. Oyedotun, K.S. and B.D. Lemire. J. Bio. Chem., 2004, 279(10): p. 9424-9431.


A00020
Finding new lead compound for anti-cancer drug by using in
silico screening technique

K. Bangphoomi and K. Choowongkomon

Department of Biochemistry, Faculty of Science, Kasetsart University, Bangkok, 10900, Thailand
E-mail: fsciktc@ku.ac.th, Tel: +66-2-562-5555 ext 2051



ABSTRACT
Cancer is a disease of high mortality. Although there are several anti-cancer drugs,
the discovery of new lead compounds is still important for cancer therapy. ErbB2, a
receptor tyrosine kinase of the ErbB family, is a target for anti-cancer drug design,
especially in breast cancer (Olayioye, 2001). A new ErbB2 inhibitor was designed for
inhibition of the tyrosine kinase (TK) region. The advantage of inhibiting the TK region is that
it can prevent the crosstalk phenomenon of the ErbB family. The structure of the TK region of ErbB2
was modeled using the TK region of ErbB1/EGFR as a template. This structure was
used for receptor-based virtual screening of 1990 compounds in the National Cancer
Institute (NCI) database. The results show that some compounds have
high binding affinity to the TK region of ErbB2. These compounds can be used
as lead compounds for anti-cancer drugs.

Keywords: ErbB2, tyrosine kinase, anti-cancer drug, virtual screening, molecular
docking



1. INTRODUCTION
Cancer is a class of non-infectious disease and an important cause of death in
humans. Although there are many treatments for cancer, including surgery,
radiotherapy, chemotherapy and anti-cancer drugs, more effective therapy is still required.
The trend in anti-cancer drugs is towards target-specific drugs such as Lapatinib, which is
specific to the ErbB1 and ErbB2 receptors. However, novel target-specific lead compounds
with improved efficiency and minimal side effects are still important for cancer therapy. The
ErbB family of receptor tyrosine kinases plays a major role in the formation of tumors
and has been used as a target for anti-cancer drugs. ErbB2 is a member of the epidermal growth
factor receptor family that is associated with breast, ovarian, stomach, bladder, salivary and
lung cancer. This receptor is composed of five domains: an extracellular domain, a
transmembrane domain, a juxtamembrane domain, a tyrosine kinase domain and a
C-terminal domain. The binding of growth factor to ErbB2 receptor monomers induces a
conformational change in the extracellular domain of the receptor. This
conformational change exposes a dimerization domain, which enables two
monomers to interact with each other and form an activated receptor dimer.
Moreover, the EGFR family can form heterodimers without the binding of growth
factor (Earp et al., 2003). Activated receptor dimers cause the subsequent
activation of intracellular signaling pathways, achieved
through the intrinsic tyrosine kinase activities of the ErbB receptors. In a receptor dimer,
the tyrosine kinase domain of one dimerization partner is able to phosphorylate
specific tyrosine residues in the COOH-terminal part of the other monomer, a
mechanism called cross-talk. The way to inhibit the signaling pathway without the cross-talk
phenomenon is tyrosine kinase inhibition.
Virtual screening (VS) is a computational drug discovery technique (Rollinger
et al., 2008). This method lowers the cost and time of searching for novel
potential drug-like compounds (Schneider et al., 2002). It
requires the detailed 3D structure of the binding site of the target protein
in order to rank the ligand compounds by their likelihood of binding (Lengauer et al., 2004). There
are two categories of screening technique: ligand-based and receptor-based.
This research was performed using receptor-based virtual screening against the National Cancer
Institute (NCI) database, which provides 1990 compounds; the best
compounds were used for further analysis.



2. COMPUTATIONAL DETAILS

Preparation of ligand databases and target proteins
The research was performed using the Linux operating system (Fedora Core 4)
on a Pentium IV computer. The potential drug-like compounds were
screened from the 1990 compounds of the NCI diversity set, a ligand data set available from
www.cancer.gov. The tyrosine kinase domain of the ErbB2 receptor was modeled using the
SWISS-MODEL server. The structure of the epidermal growth factor receptor kinase
domain alone and in complex with a 4-anilinoquinazoline inhibitor (PDB codes: 2Z0A,
1M17) was used as a template. The tyrosine kinase model of ErbB2 was used as the
target protein to screen the ligands in this study.
Virtual screening
The computational screening involved docking each molecule into the binding site of
the tyrosine kinase domain defined by the bound inhibitor Erlotinib (Tarceva). The
docking process consisted of sampling the coordinate space around the binding site
and scoring each possible ligand pose, which was then taken as the predicted binding
mode for each compound. In this experiment, we used the AutoDock program v4.0 for
molecular docking. The program uses a genetic algorithm (GA) to sample
ligand flexibility and a force field as the scoring function. A grid box was built
with twelve different atom types (A = aromatic carbon, C = aliphatic carbon, H =
hydrogen, O = oxygen, S = sulfur, N = nitrogen, P = phosphorus, I = iodine, F = iron,
f = fluorine, b = bromine, and c = chlorine) via AutoGrid calculations. The grid spacing
was set at 0.375 Å. The grid box was centered at (23.49, -0.369, 53.848) in the x, y, z
axes of the macromolecule. The AutoDock calculation was performed 50 times per
ligand (Morris et al., 1998). All of the processes were controlled by in-house scripts.
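As an illustration of such in-house scripting, the sketch below ranks docked compounds by their best reported energies. The file layout, glob pattern and the exact wording of the energy line in the AutoDock log are assumptions to adapt, not the authors' actual script.

```python
# Hypothetical ranking helper in the spirit of the "in-house scripts" above:
# scan AutoDock log files and sort compounds by their lowest docked energy.
# The glob pattern and the energy-line regex are illustrative assumptions.
import glob
import re

ENERGY_RE = re.compile(r"Final Docked Energy\s*=\s*([-+]?\d+\.\d+)")

def best_energy(dlg_path):
    # Lowest (most favourable) docked energy found in one log file.
    with open(dlg_path) as fh:
        energies = [float(m.group(1)) for m in ENERGY_RE.finditer(fh.read())]
    return min(energies) if energies else None

results = {path: best_energy(path) for path in glob.glob("docking/*.dlg")}
ranked = sorted((e, p) for p, e in results.items() if e is not None)
for energy, path in ranked[:9]:  # the nine best, as in Table 1 below
    print(f"{path}: {energy:.2f} kcal/mol")
```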



3. RESULTS AND DISCUSSION

The scoring function of the AutoDock program revealed the best-scoring
compounds from the NCI database. The docking results showed the 9 best structures
with final docked energies equal to or lower than -9.00 kcal/mol (Table 1). The docking energies
and binding geometries suggest that diversity_293778 from the NCI database is the
best lead compound for tyrosine kinase domain binding (Figure 1).




Table 1. The lead compounds possessing the best fit in the ErbB2 tyrosine kinase domain.
(Compound structures are omitted here; compound numbers refer to the NCI diversity set.)

Compound*   Run   Final Docked Energy (kcal/mol)
293778      24    -10.67
269148_b    21    -10
327702      19    -9.75
156516      1     -9.61
320218      23    -9.51
310326      19    -9.38
19990_b     8     -9.27
158413      30    -9.14
340852      10    -9.12
Figure 1. The molecular structure of the tyrosine kinase domain of the ErbB2 model in complex
with the lowest-docking-energy compound (diversity_293778). The binding geometry shows
the compound buried in the pocket of the tyrosine kinase domain.





4. CONCLUSION
Virtual screening is a technique for filtering numerous small chemical compounds down to
structures of interest. In this study, screening of the NCI diversity database identified
9 candidate lead compounds with final docked energies equal to or
lower than -9.00 kcal/mol. These results are preliminary data for further computation of
detailed binding affinities using molecular dynamics simulation techniques. Moreover, these
candidates will be tested in both in vitro and in vivo experiments to develop new drugs for
the treatment of ErbB2-positive cancer.


LITERATURE CITED
1. Olayioye, M.A., Breast Cancer Res., 2001, 3(6): 385-389.
2. Earp, H. S., Calvo, B. F. and Sarto, C. L., Trans Am Clin Climatol Assoc., 2003,
6(114): 315-333.
3. Rollinger, J. M., Stuppner, H. and Langer, T., Prog Drug Res., 2008, 65(211),
213-249.
4. Schneider, G. and H. Böhm, Drug Discov. Today, 2002, 7, 64-70.
5. Lengauer, T., C. Lemmen, M. Rarey and M. Zimmermann, Drug Discov. Today,
2004, 9, 27-34.
6. Morris, G. M., D. S. Goodsell, R. S. Halliday, R. Huey, W. E. Hart, R. K. Belew
and A. J. Olson, J. Comp. Chem., 1998, 19, 1639-1662.



ACKNOWLEDGMENTS
Thanks are due to NANOTECH, NSTDA, Thailand for providing the computer program
(Discovery Studio).



A00022
De novo Design of HIV-1 Reverse Transcriptase Inhibitor against K103N/Y181C Mutant:
Bioinformatics Approach

J. Yongpisanphop 1,C, P. Saparpakorn 2, S. Hannongbua 2, and M. Ruengjitchatchawalya 1

1 School of Bioresources and Technology and School of Information Technology, King Mongkut's University of Technology Thonburi, 83 Moo 3, Thakham, Bangkhuntien, Bangkok, 10150, Thailand
2 Department of Chemistry, Faculty of Science, Kasetsart University, Bangkok, 10900, Thailand
C E-mail: s1460001@st.kmutt.ac.th; Fax: 082-4707500; Tel: 084-0225762



ABSTRACT
Owing to the rapid mutation of HIV-1 Reverse Transcriptase (HIV-1 RT), the
efficiency of drugs has been reduced; therefore, the aim of this study is to design a novel
potent HIV-1 RT inhibitor against the double mutant (K103N/Y181C) based on de novo
drug design. The first task was to analyze the X-ray crystallographic structures of three mutant
enzymes, K103N/Y181C (3BGR), K103N (3DRS) and Y181C (3DRR), and to
prepare the starting enzyme binding pockets. The second task was to generate a set of ligands
using the LigBuilder program. Then, combinatorial AutoDock calculations were
used to calculate the binding energies of the enzyme/ligand complexes. Herein, about 200
ligands were generated for each mutant structure. The obtained results suggest
the best candidate for double mutant HIV-1 RT inhibition, based on the lowest
binding energy. The detailed structural information has also been analyzed. The
advantage of this work is that it provides a powerful bioinformatics approach to
increase the probability of discovering new potent inhibitors against various mutations.

Keywords: HIV-1 RT, NNRTIs, de novo Drug Design, K103N, Y181C,
K103N/Y181C



1. INTRODUCTION
AIDS (Acquired Immunodeficiency Syndrome) is a crucial health problem of the
world, caused by HIV (Human Immunodeficiency Virus). There are mainly three enzymes
encoded by the HIV genome: reverse transcriptase, integrase and protease.
HIV-1 reverse transcriptase (RT) is an essential enzyme in the life cycle of HIV-1, playing an
important role in the conversion of single-stranded RNA into double-stranded DNA [1].
As mentioned above, HIV-1 RT has been considered the major target for
chemotherapy, and many drugs are in use for the treatment of HIV patients
[2]. Unfortunately, these drugs, especially the NNRTIs (non-nucleoside reverse transcriptase
inhibitors), lead to rapid mutation of HIV-1 RT [3]. Currently, HIV mutations are
catalogued by the Stanford HIV Drug Resistance Database [4]. The more rapidly HIV can mutate,
the more rapidly novel inhibitors against it must be designed. This problem can be addressed by the de novo
approach. In order to obtain a novel inhibitor against the double mutant (K103N/Y181C) of
HIV-1 RT, in this study de novo drug design was used to generate various candidates that
bind to K103N and Y181C. Then, combinatorial docking was used to calculate the
binding energies of their complexes with the targets. Finally, the best candidate was selected and
docked into the double mutant enzyme target so as to compare its binding energy with that of
K103N/Y181C binding, with rilpivirine used as a reference.

2. THEORY AND RELATED WORKS
The core principle of de novo drug design concerns assembling possible compounds,
evaluating their quality, and searching the sample space for novel structures with drug-like
properties. Many de novo methods have been reported [5]. One of them is LigBuilder v1.2, a
multi-purpose program for structure-based drug design. It builds up ligands step by step
using a library of organic fragments within the structural constraints of the target protein. In
addition, LigBuilder provides both growing and linking strategies to construct ligands from a
seed as the starting point, and the whole construction process is
controlled by a genetic algorithm. Moreover, LigBuilder gives two kinds of score for filtering
ligands. One is a binding affinity score, estimated by an empirical scoring function;
the other is bioavailability, calculated by applying certain chemical rules [6]. At
present, LigBuilder is broadly used in many studies; for example, it was used
in the discovery of a novel HCV helicase inhibitor [7].

AutoDock v4.2 is a program for predicting the interactions of ligands with targets. It uses a
semi-empirical free energy force field to evaluate conformations in two steps. First, the
intramolecular energetics of the transition of the protein and its ligand
from the unbound state to the bound state are estimated. Then the intermolecular energetics of
combining the protein and ligand are estimated. AutoDock has been used in various applications,
for example structure-based drug design, lead optimization, virtual screening, combinatorial library design and
protein-protein docking [8]. Nowadays, AutoDock is used extensively in
many studies; for instance, it was used in the discovery of multitarget inhibitors [9].


3. COMPUTATIONAL DETAILS
This investigation was divided into two main parts: (1) ligand design and (2) calculation of
the binding energies of the enzyme/ligand complexes. It began with the preparation of the enzyme binding
pocket, ligand and seed molecule as inputs for LigBuilder. Following LigBuilder's process,
ligands were generated as output. The generated ligands were then used to calculate binding
energies with AutoDock, and the ligand with the lowest binding energy was
selected. Finally, its binding energy was compared with that of the reference ligand.

The X-ray crystallographic structures of K103N (PDB ID: 3DRS), Y181C (PDB ID: 3DRR)
and K103N/Y181C (PDB ID: 3BGR) were downloaded from the Protein Data Bank to prepare
the enzyme binding pockets and ligands. Discovery Studio Visualizer v2.5 was used to select the group of
amino acids surrounding the ligand within 8 Å. The ligand and water molecules were removed,
and hydrogen atoms were added. The ligand itself also had hydrogen atoms added and was saved in
Mol2 format.

For the seed molecule, the selection was done by finding the common fragment structure of five
NNRTIs (Nevirapine, Delavirdine, Efavirenz, Etravirine and Rilpivirine). A
benzene fragment was found in all of the NNRTIs, so it was regarded as the seed molecule.
The seed was extracted from its protein-ligand complex in both 3DRR and 3DRS. The seed
structure was saved in a Mol2 file, hydrogen atoms were added, and growing or
linking sites were labeled on the seed structure by changing the atom type of the corresponding
hydrogen atoms from H to H.spc.

For ligand design, LigBuilder was used to generate a set of ligand molecules. It starts with
the POCKET module, whose inputs are the binding pockets and ligands prepared previously.
The module defines a box to cover the ligand and all the surrounding residues, creates
regular-spaced grids of 0.5 Å within the box, and then derives the key interaction sites within
the binding pocket. The next step is the GROW module, which uses a genetic algorithm
to develop and evolve molecules from the seed molecule. First,
an initial population was generated based on the benzene structure as a seed. Then,
parent molecules were selected from the current population into the mating pool, and the elite
molecule of the current population was copied into the new population. Lastly, a new
population was filled out by performing structure manipulation on the molecules in the
mating pool. Herein, the GROW run was done in duplicate. The last module is PROCESS, where
the generated molecules were analyzed and converted to viewable Mol2 files. A set of
200 molecules meeting the chemical criteria was extracted and converted to viewable
Mol2 files. In this work, the parameters were set to a molecular weight of 500 maximum and
300 minimum; a logP of 6 maximum and 3 minimum; a binding affinity to the target enzyme of
10 maximum and 5 minimum; 5 maximum and 2 minimum hydrogen bond donor atoms;
10 maximum and 2 minimum hydrogen bond acceptor atoms; a generation number of 20; a
population size of 3000; and 200 parent molecules, as sketched below.
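To make this filtering step concrete, here is a minimal sketch of applying the chemical criteria above to the generated molecules. The property names and the dictionary-based representation of a ligand are illustrative assumptions, not LigBuilder's own interface; the example property values echo those reported for ligand_024 later in the paper, except the affinity score, which is a placeholder.

```python
# Hypothetical post-filter mirroring the LigBuilder criteria quoted above.
# Each ligand is assumed to be a dict of precomputed properties.
CRITERIA = {
    "mol_weight": (300, 500),
    "logp": (3, 6),
    "binding_affinity": (5, 10),   # LigBuilder affinity score window
    "hbond_donors": (2, 5),
    "hbond_acceptors": (2, 10),
}

def meets_criteria(ligand):
    # True only if every property falls within its [min, max] window.
    return all(lo <= ligand[name] <= hi for name, (lo, hi) in CRITERIA.items())

generated = [
    {"mol_weight": 436, "logp": 5.53, "binding_affinity": 8.1,  # affinity is a placeholder
     "hbond_donors": 3, "hbond_acceptors": 2},
]
kept = [lig for lig in generated if meets_criteria(lig)]
```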

For the calculation of the binding energies of the enzyme/ligand complexes, AutoDock was used to
compute the binding energies of the generated ligands in the mutant enzymes (K103N and Y181C).
So as to cover the binding pocket, the grid was set to 40 x 40 x 40 points centered on
XYZ coordinates of (9.527, 12.561, 16.214) and (9.494, 12.444, 16.085), respectively.
These coordinates were obtained by indicating the center on an amino acid
in the binding pocket using Discovery Studio Visualizer v2.5. The parameters used
for all molecular docking simulations were: a grid spacing of 0.375 Å; receptor atom types
of A, C, HD, N, NA, OA and SA; ligand atom types of A, C, HD, N, NA, OA, SA and Cl; a
Lamarckian genetic algorithm; a population size of 150; a mutation rate of 0.02; a crossover rate of
0.8; a maximum of 2.5 million energy evaluations; a maximum of 27000 generations; and 50
genetic algorithm runs. The 50 docked conformations were obtained, and the lowest-energy
conformation was regarded as the binding conformation of the enzyme/ligand complex. In
this study, the binding energy of the K103N/Y181C mutant structure interacting with rilpivirine
was calculated as a reference value, using (50.199, -28.851, 36.019) as the XYZ grid center and
A, C, HD, N, NA, OA and SA as the ligand atom types.

The combinatorial docking strategy was used to identify a novel ligand active against both
K103N and Y181C. First, each of the 200 ligands generated from the K103N
binding pocket by LigBuilder was docked into both the K103N and the Y181C mutant
enzyme targets with AutoDock. In the same way, each of the 200 ligands generated from
Y181C was docked into both the K103N and the Y181C mutant enzyme targets. Then the
lowest binding energies of the same ligand against the two targets were summed and sorted. Because
the ligands generated by LigBuilder have numerous structures, Python scripts were written
to automate the procedure, as sketched below. Finally, the ligand with the minimum summed binding
energy was chosen as a novel inhibitor against the K103N/Y181C mutant. Moreover, this novel
inhibitor was docked into K103N/Y181C to calculate its binding energy, which was compared with the
reference ligand.
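The following is a minimal sketch of the cross-docking score summation described above. The dictionaries of per-target lowest binding energies are assumed to have been parsed from the AutoDock outputs beforehand, and the names are illustrative rather than the authors' actual scripts.

```python
# Hypothetical cross-docking ranking: sum each ligand's lowest binding
# energy against the two single-mutant targets and sort ascending
# (most negative combined energy first).
def rank_cross_docking(energies_k103n, energies_y181c):
    # energies_*: dict mapping ligand id -> lowest binding energy (kcal/mol).
    common = set(energies_k103n) & set(energies_y181c)
    scores = {lig: energies_k103n[lig] + energies_y181c[lig] for lig in common}
    return sorted(scores.items(), key=lambda item: item[1])

# Example with the per-target energies reported for ligand_116
# (duplicate 1 of the Y181C-derived ligands).
k103n = {"ligand_116": -11.57}
y181c = {"ligand_116": -15.26}
print(rank_cross_docking(k103n, y181c))  # [('ligand_116', -26.83)]
```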



4. RESULTS AND DISCUSSION
From the analysis of the X-ray crystallographic structures of the three mutant enzymes,
K103N/Y181C, K103N and Y181C, the amino acids within 8 Å of the
inhibitor are shown in Figure 1. Because this selection covers all of the necessary amino acids in the binding
pocket and is large enough for the interactions, it was used as the starting enzyme
binding pocket for LigBuilder and AutoDock. The pockets contain amino acids of the
p66 subunit, which divide into three types: aromatic residues (Tyr181, Tyr188 and
Tyr318; Phe227 and Trp229), hydrophobic residues (Pro95 and Pro236; Leu100 and Leu234;
Val106 and Val179) and hydrophilic residues (Lys101 and Lys103; Ser105, Asp192 and Glu224),
plus the p51 subunit, which contributes Glu138. The hydrogen bonds of each ligand are indicated
by green dashed lines. The structures of the Y181C and K103N complexes with their ligand have a
hydrogen bond between the H atom of Lys103 and an N atom of the ligand, whereas
there are three hydrogen bonds in the double mutant complex with its ligand, in which N atoms of the
ligand bind with the NH atom of Lys101, the OH atom of Lys101 and Asn103, respectively.




Figure 1. Binding pockets of the three mutant enzymes, Y181C (A), K103N (B) and
K103N/Y181C (C), with the inhibitors shown in stick representation: MK-4965 in (A) and (B),
and rilpivirine in (C).

The binding pockets of K103N and Y181C were analyzed in order to prepare the crucial
information for generating molecules. Not only were the key interaction sites within the binding
pocket, in which nitrogen atoms (blue) represent hydrogen-bond donor sites, oxygen atoms
(red) represent hydrogen-bond acceptor sites and carbon atoms (gray) hydrophobic sites, derived as shown
in Figure 2, but all the atoms forming the binding pocket and all the grids within the
binding pocket were created as well. These were used as the template for the seed molecule in order
to generate the set of ligands. From these runs, the 200 ligands generated from each binding pocket of
the mutant enzymes were retained after filtering by the parameter set-up.




Figure 2. Key interaction sites of K103N (A) and Y181C (B).


For the score summation of the binding energies from cross docking, the lowest scores
(-26.83 and -24.27 kcal/mol) belong to ligand_116 and ligand_024, generated from the
Y181C mutant enzyme in duplicates 1 and 2, respectively, whereas the lowest scores (-25.84
and -25.73 kcal/mol) belong to ligand_024 and ligand_113, generated from the K103N
mutant enzyme in duplicates 1 and 2, respectively. In addition, the binding energies of these
ligands are compared with the reference ligand in Table 1.






Table 1. The binding energy values of the novel ligands compared with MK-4965

Template                       Novel Ligand          Binding Energy (kcal/mol)   MK-4965 (kcal/mol)
Y181C as template (dup. I)     Ligand_116 in Y181C   -15.26                      -11.69
                               Ligand_116 in K103N   -11.57                      -11.40
Y181C as template (dup. II)    Ligand_024 in Y181C   -11.65                      -11.69
                               Ligand_024 in K103N   -12.62                      -11.40
K103N as template (dup. I)     Ligand_024 in Y181C   -12.68                      -11.69
                               Ligand_024 in K103N   -13.16                      -11.40
K103N as template (dup. II)    Ligand_113 in Y181C   -11.36                      -11.69
                               Ligand_113 in K103N   -14.37                      -11.40

According to Table 1, the novel ligands are able to bind to their enzyme
targets, their binding energies being similar to or better than that of the
reference ligand. This indicates that the novel ligands generated in this study are
able to interact with the single mutant enzymes.

The binding energies of the novel ligands docked into the double mutant enzyme target
(K103N/Y181C), compared with those of rilpivirine, are shown in Table 2.

Table 2. The binding energies of the novel ligands as compared with rilpivirine

Template             Novel Ligand                         Binding Energy (kcal/mol)   Rilpivirine (kcal/mol)
Y181C as template    Ligand_116 in Y181C (duplicate I)    +0.11                       -12.73
                     Ligand_024 in K103N (duplicate II)   -3.96
K103N as template    Ligand_024 in Y181C (duplicate I)    -2.92
                     Ligand_113 in K103N (duplicate II)   -1.25

According to Table 2, ligand_024 generated from the Y181C mutant enzyme target shows
the best binding, having the lowest binding energy (-3.96 kcal/mol). However, its
binding energy is higher than that of rilpivirine (-12.73 kcal/mol). Therefore, further work is
needed to increase the binding efficiency. Ligand_024 has the
chemical formula C27H34NO4, a molecular weight of 436, 2 hydrogen bond
acceptors, 3 hydrogen bond donors, and a logP of 5.530. These values are within the range of
Lipinski's Rule of Five, except for the logP value, which is higher than the threshold (logP less
than 5). Therefore, ligand_024 can be considered a drug-likeness compound.
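As a small illustration of this check, the sketch below evaluates Lipinski's Rule of Five for ligand_024 using the property values quoted above; the rule counts violations of the four thresholds, and a single violation (here, logP) is conventionally still tolerated.

```python
# Rule-of-five check for ligand_024 using the values quoted in the text.
def lipinski_violations(mol_weight, logp, hbond_donors, hbond_acceptors):
    # Count how many of Lipinski's four thresholds are violated.
    rules = [
        mol_weight <= 500,       # molecular weight not over 500
        logp <= 5,               # logP not over 5
        hbond_donors <= 5,       # no more than 5 H-bond donors
        hbond_acceptors <= 10,   # no more than 10 H-bond acceptors
    ]
    return sum(not ok for ok in rules)

# ligand_024: MW 436, logP 5.530, 3 donors, 2 acceptors.
violations = lipinski_violations(436, 5.530, 3, 2)
print(violations)  # 1 -> only the logP threshold is exceeded
```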



5. CONCLUSION
In this study, ligand_024 (C27H34NO4) was designed using a de novo approach
combined with a cross-docking method. With this strategy, it can bind to the single mutant
enzymes with estimated binding energies better than that of the reference ligand. Moreover,
it can bind to the double mutant enzyme target with an estimated binding energy of -3.96 kcal/mol,
and it has the properties of a drug-likeness compound. The methods used in this work can
be applied to increase the probability of discovering new inhibitors against various mutant enzymes
and to speed up drug screening.



REFERENCES
1. Domaoal, R.A. and Demeter, L.M., The International Journal of Biochemistry & Cell
Biology, 2004, 36, 1735-51.
2. Prajapati, D.G., Ramajayam, R., Yadav, M.R., and Giridhar, R., Bioorg. Med. Chem.,
2009, 17(16), 5744-62.
3. Clavel, F., and Hance, A. J., NEJM, 2005, 350, 1023-35.
4. http://hivdb.stanford.edu
5. Schneider, G., and Fechner, U., Nat. Rev. Drug Discovery, 2005, 4, 649-63.
6. Wang, R., Gao, Y., and Lai, L., J. Mol. Model., 2000, 6(3), 197-210.
7. Kandil, S., Biondaro, S., Vlachakis, D., Cummins, A.C., Coluccia, A., Berry, C.,
Leyssen, P., Neyts, J., and Brancale, A., 2009, 19, 2935-37.
8. http://autodock.scripps.edu
9. Wei, D., Jiang, X., Zhou, L., Chen, J., Chen, Z., He, C., Yang, K., Liu, Y., Pei, J., and Lai,
L., J. Med. Chem., 2008, 51, 7882-8.



ACKNOWLEDGMENTS
J.Y. would like to thank the National Center for Genetic Engineering and Biotechnology, Thailand
(BIOTEC), and King Mongkut's University of Technology Thonburi for a full scholarship.
S.H. and P.S. are grateful to the Thailand Research Fund (RTA5080005 and MRG5080267).
Laboratory for Computational and Applied Chemistry (LCAC) and National Center of
Excellence for Petroleum, Petrochemicals, and Advanced Materials (NCE-PPAM) are also
gratefully acknowledged for research facilities and computing resources.



A00024
Molecular Modeling of Peroxidase and Polyphenol Oxidase: Substrate Specificity and
Active Site Comparison

P. Nokthai 1,4,C, L. Shank 2,4, and V. S. Lee 3,4

1 Bioinformatics Research Laboratory (BiRL), Faculty of Science, Chiang Mai University, Chiang Mai 50200, Thailand
2 Phytochemica Research Unit, Faculty of Science, Chiang Mai University, Chiang Mai 50200, Thailand
3 Computational Simulation and Modeling Laboratory (CSML), Faculty of Science, Chiang Mai University, Chiang Mai 50200, Thailand
4 Department of Chemistry and Center for Innovation in Chemistry, Faculty of Science, Chiang Mai University, Chiang Mai 50200, Thailand
C E-mail: pnokthai@gmail.com



ABSTRACT
Peroxidases and polyphenol oxidases are well known enzymes involved in the
enzymatic browning reaction of fruits and vegetables, with different catalytic
mechanisms. However, the details of the substrate interactions with these two enzymes
are still unclear. In our computational study, the amino acid sequence of grape (Vitis
vinifera) peroxidase (ABX) was used for the construction of models by
homology modeling based on the X-ray structure of cytosolic ascorbate
peroxidase from pea (PDB ID: 1APX), whereas the grape polyphenol oxidase structure was
obtained directly from the available X-ray structure (PDB ID: 2P3X). The quality of the
model was checked with PROCHECK and Verify3D. Molecular docking of common
substrates of these two enzymes was subsequently studied. A consensus of the
Piecewise Linear Potential (PLP) and Potential of Mean Force (PMF) scoring functions,
CDOCKER energy, and Ludi scores was used to evaluate protein-ligand affinity. The final
conformations were selected according to their scores, and the predicted binding modes of the
substrates with the two enzymes were compared.

Keywords: peroxidase, polyphenol oxidase, browning reaction, molecular docking,
consensus scoring







B00003
Hybrid Quantum Mechanical/Molecular Mechanical Studies on Two Families of
cis,cis-Muconate Lactonizing Enzymes

T. Somboon 1, M. P. Gleeson 1, and S. Hannongbua 1,C

1 Faculty of Science, Kasetsart University, 50 Phahon Yothin Rd, Chatuchak, Bangkok, 10900, Thailand
C E-mail: fscisph@ku.ac.th; Fax: 02-562-5555; Tel: 02-5625555



ABSTRACT
Muconate lactonizing enzymes (MLEs) are members of the enolase family which
catalyse the conversion of cis,cis-muconates to muconolactones. It was recently
reported by Sakai et al. [1] that two different MLEs, derived from divergent families and
displaying only 26% sequence identity, both catalyze the same chemical reaction but
involve stereochemically distinct mechanisms (anti- and syn-cycloisomerization). This
is particularly interesting from a fundamental evolutionary perspective, in that nature
has evolved two distinct proteins which bear striking similarity at the active site level
but achieve the same product in distinctly different ways.
This example represents an ideal case study for a computational analysis, in an effort to
understand the reasons for the distinct reactivity differences observed. Computational
chemistry can play a significant role in the elucidation of physical processes by
allowing us to simulate many diverse processes at the atomic level, including
biochemical reactions. The preferred QM methods cannot, however, be employed on
protein-sized systems, so more approximate methods such as hybrid Quantum
Mechanical-Molecular Mechanical (QM/MM) methods must be used. In this approach
the key portion of the system is treated with QM and the remaining environment is modeled
using the less computationally demanding MM method. Such simulations have been
used to model the mechanisms of action of numerous proteins and the reaction
mechanisms of enzymes.
The aim of this research is to gain insight into the mechanism of MLEs using QM/MM
methods. Particular emphasis will be placed on: (a) understanding the origin of the
differences in stereochemical course between anti- and syn-MLE, (b) identifying the basic
residue involved in the reaction and (c) estimating the energy profiles along the
reaction coordinate for the two possibilities of anti- and syn-MLE. From this research
we add insight into this important area of biochemistry and help to shed light on
the mechanism of these two enzymes and the reason for the stereochemical
differences, particularly in this case where evolution has resulted in diverse,
mechanistically distinct ways of producing the same product.

Keywords: Muconate lactonizing enzymes, enolase family, Hybrid QM/MM methods.



REFERENCES
1. Sakai, A., Fedorov, A. A., Fedorov, E. V., Schnoes, A. M., Glasner, M. E., Brown, S.,
Rutter, M. E., Bain, K., Chang, S., Gheyi, T., Sauder, J. M., Burley, S. K., Babbitt, P. C.,
Almo, S. C., and Gerlt, J. A., Biochemistry, 2009, 48, 1445-1453.
B00004
Kinetics of the Hydrogen Abstraction Cl + Alkane → HCl + Alkyl Reaction Class:
An Application of the Reaction Class Transition State Theory

T. Piansawan 1, C. Sattayanon 1, R. Daengngern 1, T. Yakhantip 1, N. Kungwan 1, T. N. Truong 2

1 Department of Chemistry, Faculty of Science, Chiang Mai University, Chiang Mai, 50200, Thailand
2 Henry Eyring Center for Theoretical Chemistry, Department of Chemistry, University of Utah, 315 South 1400 East, Room 2020, Salt Lake City, Utah, 84112, United States
C E-mail: naweekung@hotmail.com; Fax: 053-892277; Tel: 053-943341-5 ext. 121



ABSTRACT
The kinetics of the hydrogen abstraction reaction Cl + C2H6 → HCl + C2H5 is studied by a
dynamics method. Thermal rate constants in the temperature range of 300-2500 K are
evaluated by canonical variational transition state theory (CVT) incorporating
corrections from tunneling, using the multidimensional semiclassical small-curvature
tunneling (SCT) method, and from hindered rotations. These results are used in
conjunction with the Reaction Class Transition State Theory/Linear Energy
Relationship (RC-TST/LER) to predict thermal rate constants of any reaction in the
hydrogen abstraction class of Cl + alkanes. Our analyses indicate that on average less than 50%
systematic error exists in the rate constants predicted using the RC-TST/LER method, while
compared to explicit rate calculations the differences are less than 100%, or a factor of 2, on
average.

Keywords: Reaction Class Transition State Theory, Hydrogen Abstraction, Chlorine
Atom, Alkane.






B00006
Structure Based Drug Design for Swine Flu Chemotherapeutics: New Neuraminidase
Inhibitors from Plants' Natural Compounds

Nur Kusaira Khairul Ikram 1, Habibah A. Wahab 1,2,C

1 Pharmaceutical Design and Simulation Laboratory (PhDs), School of Pharmaceutical Sciences, Universiti Sains Malaysia, 11800, Penang, Malaysia.
2 Malaysian Institutes of Pharmaceuticals and Nutraceuticals, Ministry of Science, Technology and Innovation, Block A, SAINS@USM, 11900, Penang, Malaysia.
C E-mail: habibahw@usm.my


ABSTRACT
Since March 2009, the outbreak of swine influenza virus (SIV or H1N1) has raised
global concern about the future risk of a pandemic. H1N1 is an influenza type A virus,
and genetic analyses have confirmed that it is a triple combination of human,
avian and swine influenza viruses [1, 2, 3]. The surface glycoproteins hemagglutinin (HA)
and neuraminidase (NA) of influenza virus A might be the cause of major outbreaks and
severe disease [4, 5, 6]. HA facilitates viral cell attachment while NA assists in virus
maturation and release [6, 7]. NA plays an important role at the final stage, facilitating
virus release; thus NA has been recognized as the main target for developing agents against
SIV infection. This project is mainly focused on the determination of potential SIV inhibitors
from plant natural compounds via virtual screening (docking) of the Nature Based Drug
Discovery (NADI) and National Cancer Institute (NCI) databases. Neuraminidase
subtype N1 (Neuraminidase type 1) (PDB code: 3B7E) was used as the target protein.
Docking simulations were performed with the AutoDock 3.0.5 software. The ligand orientation,
binding energy value and Ki value of each docking result were analyzed and compared,
and the best docking results will be analyzed further via enzyme assays (neuraminidase
inhibition assays).

Keywords: Neuraminidase, Virtual screening (Docking)


REFERENCES
1. Karasin, A. I., Identification of Human H1N2 and Human-Swine Reassortant H1N2 and
H1N1 Influenza A Viruses among Pigs in Ontario, Canada (2003 to 2005). Journal of Clinical
Microbiology, 2006, 44(3): 1123-1126.
2. Lindsey, R. B., et al., H1N1 Influenza A Disease: Information for Health Professionals. New
England Journal of Medicine, 2009, 10: 1056.
3. Rungrotmongkol, T., et al., Susceptibility of Antiviral Drugs Against 2009 Influenza A
(H1N1) Virus. Biochemical and Biophysical Research Communications, 2009, 1-15.
4. Taubenberger, J. K. and Morens, D. M., The Pathology of Influenza Virus Infections.
Annual Review of Pathology: Mechanisms of Disease, 2008, 3: 499-522.
5. Luke, C. J. and Subbarao, K., Vaccines for Pandemic Influenza. Emerging Infectious
Diseases, 2006, 12(1): 66-72.
6. Hampson, A. W. and Mackenzie, J. S., The influenza viruses. MJA, 2006, 185(10): S39-S43.
7. Gubareva, L. V., Influenza virus neuraminidase inhibitors. The Lancet, 2000, 355: 827-835.
B00007
QM/MM dynamics of HCOO⁻-water hydrogen bonds in aqueous solution

Apirak Payaka^1 and Anan Tongraar^1,C

^1 School of Chemistry, Institute of Science, Suranaree University of Technology, Nakhon Ratchasima 30000, Thailand
^C E-mail: anan@sut.ac.th, Tel: 044-224199


ABSTRACT
Two combined quantum mechanics/molecular mechanics (QM/MM) molecular dynamics
simulations, namely HF/MM and B3LYP/MM, have been performed to obtain detailed
knowledge of the structure and dynamics of the formate ion (HCOO⁻) in aqueous
solution. In the QM/MM technique, the HCOO⁻ and its surrounding water molecules
were treated at the HF and B3LYP levels of accuracy, respectively, using the D95V+
basis set, while the rest of the system was described by classical pair potentials.
In both the HF/MM and B3LYP/MM simulations, the ion-water interactions are
relatively strong compared to water-water hydrogen bonds. Nevertheless, numerous
water exchange processes were found, leading to large fluctuations in the hydration
number, ranging from 2 to 6 (HF/MM) and 1 to 5 (B3LYP/MM), with a prevalent value
of 3. Comparing the HF and B3LYP methods for the treatment of the QM region, the
former provides slightly weaker and longer hydrogen bonds, while the latter predicts
stronger ion-water interactions but with the wrong dynamics properties.

Keywords: formate ion, QM/MM, molecular dynamics.



1. INTRODUCTION
The behavior of hydrogen bonds between a polar solvent, like water, and the carboxylate
group (RCOO⁻) has been studied by both experiments and computer simulations. In NMR
experiments with carboxylic acids, it has been reported that each carboxylic group is
surrounded by 5.0-6.5 water molecules.[1] For the formate ion (HCOO⁻), the simplest
structure of RCOO⁻, X-ray and neutron scattering experiments on NaHCOO[2] and KHCOO[3]
solutions have demonstrated that the hydration number per HCOO⁻ oxygen atom is below 2.5.
In terms of theoretical investigations, Monte Carlo (MC) simulations with optimized
potentials for liquid simulations (OPLS) empirical force fields have been performed for
HCOO⁻ in aqueous solution, revealing relatively strong HCOO⁻-water hydrogen bonds together
with a hydration number of 3.6.[4] Recently, Car-Parrinello molecular dynamics (CPMD)
simulations using the BLYP and PW91 functionals have predicted hydration numbers of 2.45
and between 2.12 and 2.66 per HCOO⁻ oxygen,[5] respectively. With regard to the CPMD
technique, however, the system size under investigation was rather small, and only simple
GGA density functionals were employed. In this work, the characteristics of HCOO⁻ in
aqueous solution were re-examined by means of combined quantum mechanics/molecular
mechanics (QM/MM) MD simulations.

2. COMPUTATIONAL DETAILS
By the QM/MM technique,[6-10] the system is partitioned into two parts, namely the QM and
MM regions. The total interaction energy of the system is defined as

E_{total} = \langle \Psi_{QM} | \hat{H} | \Psi_{QM} \rangle + E_{MM} + E_{QM-MM}    (1)

where the first term on the right-hand side refers to the interactions within the QM region,
while the second and third terms represent the interactions within the MM region and between
the QM and MM regions, respectively. The QM region refers to a sphere which includes the
HCOO⁻ and its surrounding water molecules. This region was treated at the HF and B3LYP
levels of theory using the DZV+ basis set. Apart from the QM region, all interactions within
the MM region and between the QM and MM regions were described by classical pair potentials.
With regard to the frequent interchanges of water molecules between the QM and MM regions,
the forces acting on each particle in the system are calculated by

F_i = S_m(r)\, F_{QM} + (1 - S_m(r))\, F_{MM}    (2)

where F_{QM} and F_{MM} are the quantum mechanical and molecular mechanical forces,
respectively, and S_m(r) is a smoothing function[11]


S_m(r) = 1,                                                               for r \le r_1
S_m(r) = \frac{(r_0^2 - r^2)^2 (r_0^2 + 2r^2 - 3r_1^2)}{(r_0^2 - r_1^2)^3},  for r_1 < r \le r_0    (3)
S_m(r) = 0,                                                               for r > r_0

where r_1 and r_0 are distances characterizing the start and the end of the QM region,
applied within an interval of 0.2 Å to ensure a continuous change of forces at the boundary
between the QM and MM regions.
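To illustrate how Eqs. (2) and (3) operate together, the following minimal Python sketch (our own illustration rather than the authors' code; the radii r1 and r0 and the force vectors are placeholder values) evaluates the smoothing function and blends the QM and MM forces for a particle at distance r from the centre of the QM sphere.

import numpy as np

def smoothing(r, r1, r0):
    # Smoothing function S_m(r) of Eq. (3): 1 inside the QM sphere,
    # 0 outside, with a smooth polynomial switch over r1 < r <= r0.
    if r <= r1:
        return 1.0
    if r > r0:
        return 0.0
    return ((r0**2 - r**2)**2 * (r0**2 + 2.0*r**2 - 3.0*r1**2)
            / (r0**2 - r1**2)**3)

def blended_force(r, r1, r0, f_qm, f_mm):
    # Force mixing of Eq. (2): F_i = S_m(r) F_QM + (1 - S_m(r)) F_MM.
    s = smoothing(r, r1, r0)
    return s * np.asarray(f_qm) + (1.0 - s) * np.asarray(f_mm)

# Example: a particle crossing a 0.2 Angstrom switching shell
# (placeholder radii and forces, not values from the simulations above).
r1, r0 = 4.0, 4.2
f_qm, f_mm = [1.0, 0.0, 0.0], [0.5, 0.0, 0.0]
for r in (3.9, 4.05, 4.1, 4.3):
    print(r, blended_force(r, r1, r0, f_qm, f_mm))

At r = r1 the function evaluates to 1 and at r = r0 to 0, so the blended force changes continuously across the switching shell.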
For the description of all interactions within the MM region and between the QM and MM
regions, a flexible model, which describes intermolecular[12] and intramolecular[13]
interactions, was employed for water. The pair potential functions for describing
HCOO⁻-water interactions were newly constructed: 6,015 HF and 5,966 B3LYP interaction
energy points for various HCOO⁻-water configurations, obtained from Gaussian98[14]
calculations using the aug-cc-pVDZ basis set, were fitted to the analytical forms of

\Delta E^{HF}_{HCOO^- \cdot H_2O} = \sum_{i=1}^{4} \sum_{j=1}^{3} \left( \frac{A_{ij}}{r_{ij}^{4}} + \frac{B_{ij}}{r_{ij}^{8}} + C_{ij} \exp(-D_{ij} r_{ij}) + \frac{q_i q_j}{r_{ij}} \right)    (4)

and

\Delta E^{B3LYP}_{HCOO^- \cdot H_2O} = \sum_{i=1}^{4} \sum_{j=1}^{3} \left( \frac{A_{ij}}{r_{ij}^{4}} + \frac{B_{ij}}{r_{ij}^{5}} + C_{ij} \exp(-D_{ij} r_{ij}) + \frac{q_i q_j}{r_{ij}} \right)    (5)

where A, B, C and D are fitting parameters, r_{ij} denotes the distance between the i-th atom
of HCOO⁻ and the j-th atom of the water molecule, and q are the atomic net charges. In this
work, the charges on C, O and H of HCOO⁻ were obtained from Natural Bond Orbital (NBO)
analysis[15-17] of the corresponding HF and B3LYP calculations, as 0.8635, -0.9201 and
-0.0234 (HF), and 0.6583, -0.8197 and -0.0190 (B3LYP), respectively. The charges on O and H
of the water molecule were adopted from the BJH-CF2 water model[18] as -0.6598 and 0.3299,
respectively. The optimized parameters for the intermolecular potentials (4) and (5) are
listed in Table 1.




Table 1. Optimized parameters of the analytical pair potentials for the interactions of water
with HCOO⁻ (interaction energies in kcal mol^-1 and distances in Å).

HF method: A in kcal mol^-1 Å^4, B in kcal mol^-1 Å^8, C in kcal mol^-1, D in Å^-1
Pair      A                  B                  C                  D
C-O(W)    5.161390 x 10^5    -3.019987 x 10^4   -7.559447 x 10^5   1.955784
O-O(W)    -2.665243 x 10^5   -5.276995 x 10^6   6.778200 x 10^7    3.556668
H-O(W)    3.248683 x 10^5    -4.992158 x 10^5   -1.720448 x 10^5   1.318893
C-H(W)    -2.368418 x 10^5   4.358412 x 10^5    7.864110 x 10^5    2.016779
O-H(W)    5.356162 x 10^4    1.264105 x 10^4    -5.753688 x 10^5   3.105469
H-H(W)    -3.889529 x 10^4   6.586156 x 10^3    1.236351 x 10^6    3.485256

B3LYP method: A in kcal mol^-1 Å^4, B in kcal mol^-1 Å^5, C in kcal mol^-1, D in Å^-1
Pair      A                  B                  C                  D
C-O(W)    4.136774 x 10^5    -2.883495 x 10^5   -1.557409 x 10^4   0.429679
O-O(W)    1.336006 x 10^6    -5.599642 x 10^6   5.529706 x 10^7    3.128274
H-O(W)    9.683479 x 10^5    -9.298010 x 10^5   -3.549124 x 10^5   1.323072
C-H(W)    2.720650 x 10^3    5.285593 x 10^4    2.761879 x 10^4    1.427650
O-H(W)    -1.358645 x 10^5   1.516713 x 10^5    9.597746 x 10^4    1.431782
H-H(W)    -8.338373 x 10^4   4.962553 x 10^4    7.038974 x 10^5    2.987790
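To make the use of the fitted potentials concrete, the short sketch below (our own illustration; only the HF C-O(W) row of Table 1 and the charges quoted in the text are used, and the Coulomb term is written literally as in Eq. (4), i.e., without the unit-conversion factor of about 332.06 kcal Å mol^-1 e^-2 that a production code would apply) evaluates one site-site contribution.

import math

def site_site_energy(r, A, B, n, C, D, qi, qj):
    # One site-site term of Eqs. (4)/(5):
    # A/r^4 + B/r^n + C*exp(-D*r) + qi*qj/r, with n = 8 (HF) or 5 (B3LYP).
    return A / r**4 + B / r**n + C * math.exp(-D * r) + qi * qj / r

# HF C-O(W) parameters from Table 1; NBO charge on C and BJH-CF2 charge on O(W).
A, B, C, D = 5.161390e5, -3.019987e4, -7.559447e5, 1.955784
q_C, q_Ow = 0.8635, -0.6598
print(site_site_energy(3.0, A, B, 8, C, D, q_C, q_Ow))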

In this work, two QM/MM MD simulations, namely HF/MM and B3LYP/MM, were performed
separately in a canonical ensemble at room temperature (298 K) with a time step of 0.2 fs.
The periodic box, with a box length of 18.17 Å, contains one ion and 199 water molecules,
corresponding to the experimental density of pure water. The system's temperature was kept
constant using the Berendsen algorithm[19]. The reaction-field method[20] was employed for
the treatment of long-range interactions. The system was initially equilibrated by
performing HF/MM and B3LYP/MM MD simulations for 30,000 time steps. Then, the simulations
were continued for 450,000 (HF/MM) and 250,000 (B3LYP/MM) time steps, collecting
configurations every 10th step.

3. RESULTS AND DISCUSSION
The structural properties of the hydrogen bonds between HCOO⁻ and water can be interpreted
through the O-Ow and O-Hw RDFs, together with their corresponding integration numbers, as
shown in Figures 1a and 1b, respectively. In this context, the first and second atoms in the
RDFs refer to the atoms of HCOO⁻ and water, respectively. In comparison to water-water
hydrogen bonds,[21] both the HF/MM and B3LYP/MM simulations clearly indicate that the
hydrogen bonds between HCOO⁻ and water are relatively strong. Comparing the HF/MM and
B3LYP/MM results, the B3LYP/MM simulation reveals shorter O-Ow and O-Hw distances, as a
consequence of the overestimated ion-water interactions. Integrations up to the first
minimum of the corresponding O-Ow and O-Hw RDFs yield average coordination numbers of 3.45
and 3.14 for the HF/MM simulation, and of 2.90 and 2.71 for the B3LYP/MM simulation,
respectively. These numbers indicate that the first-shell waters are linearly hydrogen
bonded to each of the HCOO⁻ oxygen atoms.
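The coordination numbers quoted above follow from integrating the RDF up to its first minimum. A minimal sketch of that integration (our own illustration; the g(r) array and the density are toy placeholders, not the simulation data) is:

import numpy as np

def coordination_number(r, g, rho, r_min):
    # Running integration number of an RDF up to the first minimum r_min:
    # n = 4*pi*rho * integral_0^{r_min} g(r) r^2 dr
    mask = r <= r_min
    return 4.0 * np.pi * rho * np.trapz(g[mask] * r[mask]**2, r[mask])

# Toy g(r) with a first peak near 2.7 Angstrom and bulk-water density;
# the real inputs would be the O-Ow RDFs of Figure 1.
r = np.linspace(0.01, 6.0, 600)
g = 1.0 + 2.0 * np.exp(-((r - 2.7) / 0.3)**2)
rho = 0.0334   # molecules per cubic Angstrom
print(coordination_number(r, g, rho, r_min=3.4))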

Figure 1. a) O-Ow and b) O-Hw radial distribution functions and their corresponding
integration numbers.

The distributions of the oxygen atoms of first-shell waters are shown in Figure 2. Both
HF/MM and B3LYP/MM simulations show large fluctuations in the hydration number, ranging
between 2 and 6 (HF/MM) and 1 and 5 (B3LYP/MM), with the most frequent value being 3.


Figure 2. Distributions of the number of water oxygen atoms at each of the HCOO⁻ oxygens,
calculated within the first minimum of the O-Ow RDFs.
According to the plots of the time dependence of the hydration number at each of the HCOO⁻
oxygen atoms, shown in Figure 3, both HF/MM and B3LYP/MM simulations clearly indicate that
the two HCOO⁻ oxygen atoms simultaneously form asymmetrical solvation shells, i.e., each of
them hydrogen bonds to a different number of water molecules.


Figure 3. Time dependence of the number of first-shell waters at a) the first, b) the second
and c) both HCOO⁻ oxygen atoms, selected from the first 10 ps of the HF/MM and B3LYP/MM
simulations.

The dynamics properties of the HCOO⁻-water hydrogen-bonded complexes can be visualized
through the time dependence of the O-Ow distances, as shown in Figures 4 and 5. In both the
HF/MM and B3LYP/MM simulations, it is obvious that numerous water molecules interchange
between the first shell and the bulk. Of particular interest, each of the first-shell waters
was either loosely or tightly bound to an HCOO⁻ oxygen, i.e., some of them temporarily form
a hydrogen bond with an HCOO⁻ oxygen, then leave or even enter again, while others form
longer-lived hydrogen bonds to the respective HCOO⁻ oxygen.




Figure 4. Time dependence of O---Ow distances, selected from the first 10 ps of the HF/MM
trajectories, together with examples of HCOO⁻-water hydrogen bonding configurations obtained
for particular simulation periods (structural arrangements at 1.0, 3.0, 6.7 and 9.2 ps).

Figure 5. Time dependence of O---Ow distances, selected from the first 10 ps of the
B3LYP/MM trajectories, together with examples of HCOO⁻-water hydrogen bonding configurations
obtained for particular simulation periods (structural arrangements at 2.6, 4.2, 8.4 and
9.8 ps).
The rate of water exchange processes at each of the HCOO⁻ oxygen atoms was evaluated with
respect to the mean residence times (MRTs) of the surrounding water molecules. According to
the direct method[22], the MRT values can be calculated as

\tau_{MRT} = \frac{CN \cdot t_{sim}}{N_{ex}}    (6)

where CN is the coordination number, t
sim
is the simulation time and N
ex
is the number of
events. The calculated MRT values with respect to time parameters t
*
(i.e., the minimum
duration of a ligands displacement from its original coordination shell to be accounted) of 0.0
and 0.5 ps are summarized in Table 2. It should be noted that the MRT data obtained using t
*
=
0.0 ps correspond to the estimations of hydrogen bond lifetimes, whereas the data obtained
with t
*
= 0.5 ps are regarded as the estimates for sustainable ligand exchange processes.
[22]
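Eq. (6) reduces to a one-line computation once CN, t_sim and N_ex are known; as a check, the sketch below (our own illustration) reproduces the HF/MM O1 entry of Table 2 for t* = 0.5 ps.

def mean_residence_time(cn, t_sim, n_ex):
    # Direct method of Eq. (6): MRT = CN * t_sim / N_ex, where N_ex counts
    # ligand displacements lasting at least t* (0.0 or 0.5 ps).
    return cn * t_sim / n_ex

# HF/MM, O1 row of Table 2: CN = 3.45, t_sim = 70.0 ps, N_ex = 132 at t* = 0.5 ps.
print(mean_residence_time(3.45, 70.0, 132))   # ~1.83 ps, matching Table 2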


Table 2. Mean residence times of water molecules in the bulk and in the vicinity of the
HCOO⁻ oxygens, calculated within the first minimum of the O-Ow RDFs.

Atom/solute     CN     t_sim (ps)   N_ex (t*=0.0)   tau (ps, t*=0.0)   N_ex (t*=0.5)   tau (ps, t*=0.5)
HF/MM MD
O1              3.45   70.0         970             0.25               132             1.83
O2              3.44   70.0         1042            0.23               112             2.15
Pure H2O[21]    4.6    12.0         292             0.2                31              1.80
Pure H2O[18]    4.2    40.0         -               0.33               -               1.51
B3LYP/MM MD
O1              2.84   50.0         810             0.17               57              2.49
O2              2.97   50.0         799             0.19               63              2.36
Pure H2O[18]    4.20   30.0         -               1.07               -               7.84

In the HF/MM simulation, the calculated MRT values with respect to t* = 0.0 and 0.5 ps are
slightly larger than those of pure water,[21] which corresponds to the observed stronger
HCOO⁻-water hydrogen bonds. The B3LYP/MM MRT data are in good agreement with the HF/MM
results, showing slightly higher MRT values as a consequence of the observed stronger
hydrogen bonds between the HCOO⁻ and its surrounding water molecules. Considering the
B3LYP/MM results for pure water, however, the B3LYP method predicted too slow water exchange
rates compared to the experimental data. In a recent CPMD study of aqueous HCOO⁻ using the
PW91 method,[5] slow dynamics of HCOO⁻ rotation were also reported. The failure of the B3LYP
method to predict the dynamics details of pure water can be seen as an inadequacy of DFT
methods in correctly describing the dynamics of this particular system.

4. CONCLUSION
Characteristics of HCOO⁻-water hydrogen bonds in aqueous solution have been investigated by
means of combined HF/MM and B3LYP/MM MD simulations. According to both the HF/MM and
B3LYP/MM results, the hydrogen bonds between the HCOO⁻ oxygens and first-shell waters are
relatively strong, especially when compared to water-water interactions. The structure of
the HCOO⁻-water complex was found to be rather flexible: the first-shell water molecules can
be either loosely or tightly bound to each HCOO⁻ oxygen, showing a varying number of
hydrogen bonds, with a prevalent value of 3. In the present study, the ab initio HF method
has been demonstrated to be more reliable than the B3LYP method, as the latter completely
fails to describe the dynamics of this ion's hydration.

REFERENCES
1. Kuntz, I. D. J., Journal of the American Chemical Society, 1971, 93, 514-516.
2. Kameda, Y., Mori, T., Nishiyama, T., Usuki, T., and Uemura, O., Bulletin of the Chemical Society of Japan, 1996, 69, 1495-1504.
3. Kameda, Y., Fukuhara, K., Mochiduki, K., Naganuma, H., Usuki, T., and Uemura, O., Journal of Non-Crystalline Solids, 2002, 312-314, 433-437.
4. Jorgensen, W. J., and Gao, J., Journal of Physical Chemistry, 1986, 90, 2174-2182.
5. Leung, K., and Rempe, S. B., Journal of the American Chemical Society, 2004, 126, 344-351.
6. Rode, B. M., Schwenk, C. F., and Tongraar, A., Journal of Molecular Liquids, 2004, 110, 105-122.
7. Kerdcharoen, T., Liedl, K. R., and Rode, B. M., Chemical Physics, 1996, 211, 313-323.
8. Tongraar, A., Liedl, K. R., and Rode, B. M., Journal of Physical Chemistry A, 1997, 101, 6299-6309.
9. Tongraar, A., and Rode, B. M., Physical Chemistry Chemical Physics, 2003, 5, 357-362.
10. Tongraar, A., Tangkawanwanit, P., and Rode, B. M., Journal of Physical Chemistry A, 2006, 110, 12918-12926.
11. Brooks, B. R., Bruccoleri, R. E., Olafson, B. D., States, D. J., Swaminathan, S., and Karplus, M., Journal of Computational Chemistry, 1983, 4, 187-217.
12. Stillinger, F. H., and Rahman, A., Journal of Chemical Physics, 1978, 68, 666-670.
13. Bopp, P., Jancso, G., and Heinzinger, K., Chemical Physics Letters, 1983, 98, 129-133.
14. Frisch, M. J., Trucks, G. W., Schlegel, H. B., Scuseria, G. E., Robb, M. A., Cheeseman, J. R., Zakrzewski, V. G., Montgomery, J. A., Stratmann, R. E., Burant, J. C., Dapprich, S., Millam, J. M., Daniels, A. D., Kudin, K. N., Strain, M. C., Farkas, O., Tomasi, J., Barone, V., Cossi, M., Cammi, R., Mennucci, B., Pomelli, C., Adamo, C., Clifford, S., Ochterski, J., Petersson, G. A., Ayala, P. Y., Cui, Q., Morokuma, K., Malick, D. K., Rabuck, A. D., Raghavachari, K., Foresman, J. B., Cioslowski, J., Ortiz, J. V., Stefanov, B. B., Liu, G., Liashenko, A., Piskorz, P., Komaromi, I., Gomperts, R., Martin, R. L., Fox, D. J., Keith, T., Al-Laham, M. A., Peng, C. Y., Nanayakkara, A., Gonzalez, C., Challacombe, M., Gill, P. M. W., Johnson, B. G., Chen, W., Wong, M. W., Andres, J. L., Head-Gordon, M., Replogle, E. S., and Pople, J. A., GAUSSIAN98, Gaussian, Inc., Pittsburgh, Pennsylvania, 1998.
15. Carpenter, J. E., and Weinhold, F., Journal of Molecular Structure: THEOCHEM, 1988, 169, 41-62.
16. Reed, A. E., Weinstock, R. B., and Weinhold, F., Journal of Chemical Physics, 1985, 83, 735-746.
17. Reed, A. E., Curtiss, L. A., and Weinhold, F., Chemical Reviews, 1988, 88, 899-926.
18. Xenides, D., Randolf, B. R., and Rode, B. M., Journal of Chemical Physics, 2005, 122, 174506-174516.
19. Berendsen, H. J. C., Postma, J. P. M., van Gunsteren, W. F., DiNola, A., and Haak, J. R., Journal of Chemical Physics, 1984, 81, 3684-3690.
20. Adams, D. J., Adams, E. M., and Hills, G. J., Molecular Physics, 1979, 38, 387-400.
21. Tongraar, A., and Rode, B. M., Chemical Physics Letters, 2004, 385, 378-383.
22. Hofer, T. S., Tran, H. T., Schwenk, C. F., and Rode, B. M., Journal of Computational Chemistry, 2004, 25, 211-217.


ACKNOWLEDGMENT
This work was supported by the Thailand Research Fund (TRF), under the Royal Golden
Jubilee Ph.D. Program (Contract Number PHD/0211/2547).
B00008
Quantum Mechanics Simulation on the Structure of the 7-Azaindole(Methanol)2 Cluster and Excited-State Triple-Proton Transfer Reactions in the Gas Phase

R. Daengngern^1, M. Barbatti^2, and N. Kungwan^1,C

^1 Department of Chemistry, Faculty of Science, Chiang Mai University, Chiang Mai, 50200, Thailand
^2 Institute for Theoretical Chemistry, University of Vienna, Waehringerstrasse 17, A-1090 Vienna, Austria
^C E-mail: naweekung@hotmail.com; Tel. 053-943341 ext 121



ABSTRACT
The excited-state triple-proton transfer (ESTPT) reaction in 7-azaindole(methanol)2,
7AI(MeOH)2, is studied by quantum mechanics simulation. The ground-state cluster of
7AI(MeOH)2 in the gas phase is optimized with the correlated second-order RI-ADC(2)
method using a split valence polarized basis mixed with a split valence basis
(SVP-SV). The ground-state structure with the lowest energy is explored and
presented. Furthermore, on-the-fly dynamics simulations for the first excited state
are carried out using the Newton-X program. From our electronic ground-state and
dynamics simulation results, we found that the ESTPT of the 7AI(MeOH)2 cluster
proceeds along the intermolecular hydrogen-bonded network, a mechanism called
methanol-assisted proton transfer. A detailed analysis is discussed.

Keywords: 7-Azaindole, Excited-State Proton Transfer/Hydrogen Atom Transfer
Reaction, Methanol-Assisted Proton Transfer.



1. INTRODUCTION
Proton/hydrogen atom transfer (PT/HT) processes play an important role in chemistry and
biochemistry [1-3]. The excited-state proton/hydrogen atom transfer reaction (ESPT/HT), a
subclass of PT/HT, has been intensively studied due to its applications in laser dyes,
photostabilizers [4], fluorescent probes [5-7] and light-emitting devices [8, 9].
7-Azaindole (7AI) is an important bicyclic aza-aromatic molecule that can form two hydrogen
bonds, by donating the pyrrole proton and accepting a proton on the pyridine nitrogen
[10, 11]. Hydrogen-bonded systems of 7AI with solvent molecules have been shown to form
multiple hydrogen-bonded cyclic structures [12-16]. The observation of PT/HT in 7AI-solvent
clusters has been intensively studied by many groups [12, 16-20]. Recently, Sakota and
co-workers [17, 18, 20-25] investigated the tautomerization of 7AI(MeOH)2, which undergoes
an excited-state triple-proton transfer (ESTPT). The tautomeric form produced by the ESTPT
reaction was observed based on dispersed fluorescence and resonance-enhanced multiphoton
ionization spectra. However, their experimental results did not give the exact timescale of
the ESPT/HT. Therefore, the theoretical investigation in our study supports their
experiments by showing the time evolution and reaction pathway of the ESTPT for 7AI(MeOH)2.


2. COMPUTATIONAL DETAILS
2.1 Ground-State Calculation
The ground-state optimization in the gas phase was carried out for 7AI(MeOH)2 using the
correlated second-order RI-ADC(2) method [26] with the split valence polarized (SVP) basis
[27] for the heavy atoms and for the hydrogen atoms involved in the hydrogen-bonded network
of the complex, and a split valence (SV) basis for the remaining hydrogens in the cluster.
The optimized structure was
confirmed by frequency calculations yielding all positive frequencies, and it was further
used for the dynamics calculations with the Turbomole program package [28, 29].

2.2 Excited-State Dynamics Simulation
Classical dynamics simulations for 7AI(MeOH)2 were carried out on the energy surface of the
first excited singlet state, S1. The initial conditions for each trajectory were generated
using a Wigner distribution of the ground-state vibrational quantum harmonic oscillator,
and the simulations were performed at the RI-ADC(2)/SVP-SV level using the Newton-X program
[30] interfaced with Turbomole. One hundred trajectories were simulated with a time step of
1 fs and a maximum time of 300 fs to investigate the proton transfer occurring along the
hydrogen-bonded network.


3. RESULTS AND DISCUSSION
3.1 Ground-State Structure
When two methanol molecules are added to 7AI, a 7AI(MeOH)2 complex is formed, as illustrated
in Figure 1; hydrogen bond interactions are shown as dashed lines. The structure of
7AI(MeOH)2 is optimized at the RI-ADC(2)/SVP-SV level. The optimized structure shows that
7AI acts as a proton donor (the H1 atom forms a hydrogen bond to the first methanol) and as
a proton acceptor (for the H3 atom of the second methanol). Simultaneously, the first and
second methanol molecules also form a hydrogen bond through H2---O2. Three hydrogen bonds
thus form between the methanols and the 7AI molecule, with bond lengths of 1.730 Å
(H1---O1), 1.698 Å (H2---O2) and 1.788 Å (H3---N2), respectively.


Figure 1. Ground-state optimized structure of 7AI(MeOH)2 from the RI-ADC(2)/SVP-SV
calculation.

3.2 Dynamics Simulation Results
In Figure 1, the important atoms of 7AI(MeOH)2 are labeled; the proton transfer process is
summarized in the following three steps: (1) H1 moves from N1 to O1, (2) H2 moves from O1
to O2, and (3) H3 moves from O2 to N2 (tautomerization with methanol assistance is
achieved). A selected trajectory in the first excited state during 0-150 fs reveals that the
reaction is the ESTPT through the hydrogen-bonded network, completed within 80 fs. The
proton transfers occur at 42 fs, 52 fs, and 57 fs for the first, second, and third proton,
respectively. The complete proton transfer reaction is obtained after 57 fs, as shown in
Figure 2.


Figure 2. A selected trajectory along the ESTPT pathway through the hydrogen-bonded
network, with snapshots at 0, 42, 52, 57, 75 and 150 fs.

The dynamics simulations were carried out for 7AI(MeOH)2 starting from 100 different initial
conditions. From the summarized results of the on-the-fly dynamics simulations (Table 1),
67 trajectories show the ESTPT reaction, 7 of which exhibit a recrossing between 7AI and the
second methanol molecule. The remaining 33 trajectories show no proton transfer reaction,
6 of which stopped due to errors in the calculations. Therefore, a total of 67 trajectories
was used for the analysis.

Table 1. Summarized reaction patterns of the dynamics simulations of the 7AI(MeOH)2 cluster
(number of trajectories).

Proton transfer reactions          No reactions
ESTPT      Recrossing              No ESTPT      Error
60         7                       27            6


Figure 3A shows the excited-state time evolution of the three forming distances (N1---H1,
O1---H2 and O2---H3) and the three breaking distances (H1---O1, H2---O2 and H3---N2) along
the proton transfer pathway, averaged over the 67 trajectories. For all processes we adopt
the definition that an X-H···Y proton transfer occurs at the time when the X-H and Y-H
distances become equal. According to the dynamics, the three forming distances shorten
rapidly to covalent bond lengths. Simultaneously, the three breaking interatomic distances
increase rapidly as the covalent bonds are broken. At 57 fs, the first proton transfer
occurs, when the average N1---H1 and H1---O1 distances are equal. The second proton is
transferred at 70 fs, when the average O1---H2 and H2---O2 distances are equal, and the last
proton transfer is achieved at 83 fs, when the average O2---H3 and H3---N2 distances are
equal.
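The "X-H equals Y-H" criterion can be applied directly to the averaged distance curves. The following sketch (our own illustration; the distance arrays are toy placeholders, not the simulation data) finds the first time at which a forming distance crosses below the corresponding breaking distance.

import numpy as np

def transfer_time(t, d_forming, d_breaking):
    # First time at which the forming X...H distance drops to or below the
    # breaking H...Y distance, i.e. the X-H = Y-H crossing that defines
    # the proton transfer time.
    crossed = np.nonzero(d_forming <= d_breaking)[0]
    return t[crossed[0]] if crossed.size else None

# Toy placeholder trajectories (fs and Angstrom), not the simulation data:
t = np.arange(0.0, 150.0, 1.0)
d_forming = 1.8 - 0.012 * t     # e.g. averaged N1...H1 distance
d_breaking = 1.0 + 0.010 * t    # e.g. averaged H1...O1 distance
print(transfer_time(t, d_forming, d_breaking))   # 37.0 fs for this toy data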
After the complete proton transfer process, the relative S1-S0 energy difference gradually
decreases, as shown in Figure 3B. This relative energy difference is always above 2 eV,
suggesting that the structure is almost planar throughout the process. In addition, the
average value of the dihedral angle (C1N1C2N2) of 7AI(MeOH)2 does not change, confirming
that its skeletal structure remains close to planar.



Figure 3. A) Time evolution of the three forming distances (N1---H1, O1---H2 and O2---H3)
and the three breaking distances (H1---O1, H2---O2 and H3---N2) along the proton transfer
pathway, averaged over 67 trajectories. B) The average S1 and S0 energies and the relative
S1-S0 energy.


4. CONCLUSION
We presented the electronic ground state of 7AI(MeOH)2, which forms a hydrogen-bonded
network. The ESTPT reactions are achieved, with bond forming and breaking at 57 fs, 70 fs
and 83 fs, averaged over 67 trajectories, for the first, second and third proton transfer
processes, respectively. The ESTPT of the 7AI(MeOH)2 cluster is thus a very fast reaction,
complete within 83 fs. This study reveals the importance of dynamics in methanol-assisted
proton transfer.


REFERENCES
1. Arnaut, L. G., and Formosinho, S. J., J. Photochem. Photobiol., A, 1993, 75 (1), 1-20.
2. Formosinho, S. J., and Arnaut, L. G., J. Photochem. Photobiol., A, 1993, 75 (1), 21-48.
3. Miyazaki, T., Atom Tunneling Phenomena in Physics, Chemistry and Biology, Ed,
Springer, Berlin, 2004.
4. Chou, P. T., Studer, S. L., and Martinez, M. L., Appl. Spectrosc., 1991, 45 (3), 513-15.
5. Rodembusch, F. S., Leusin, F. P., da Costa Medina, L. F., Brandelli, A., and Stefani, V.,
Photochem. Photobiol. Sci., 2005, 4 (3), 254-259.
6. Kim, T.-I., Kang, H. J., Han, G., Chung, S. J., and Kim, Y., Chem. Commun. (Cambridge,
U. K.), 2009, (39), 5895-5897.
7. Morales Alma, R., Schafer-Hales Katherine, J., Yanez Ciceron, O., Bondar Mykhailo, V.,
Przhonska Olga, V., Marcus Adam, I., and Belfield Kevin, D., Chemphyschem, 2009, 10
(12), 2073-81.
8. Chang, S. M., Tzeng, Y. J., Wu, S. Y., Li, K. Y., and Hsueh, K. L., Thin Solid Films,
2005, 477 (1-2), 38-41.
9. Chang, S. M., Hsueh, K. L., Huang, B. K., Wu, J. H., Liao, C. C., and Lin, K. C., Surf.
Coat. Technol., 2006, 200 (10), 3278-3282.
10. Cane, E., Palmieri, P., Tarroni, R., and Trombetti, A., J. Chem. Soc., Faraday Trans.,
1994, 90 (21), 3213-19.
11. Ilich, P., J. Mol. Struct., 1995, 354 (1), 37-47.
12. Kim, S. K., and Bernstein, E. R. 7-Azaindole and its clusters with argon, methane, water,
ammonia, and alcohols: molecular geometry and nature of the first excited singlet
electronic state, Fort Collins, CO, USA, 1989, p 44.
13. Gai, F., Rich, R. L., Chen, Y., and Petrich, J. W., ACS Symp. Ser., 1994, 568, 182-95.
14. Nosenko, Y., Kyrychenko, A., Thummel, R. P., Waluk, J., Brutschy, B., and Herbich, J.,
Phys. Chem. Chem. Phys., 2007, 9 (25), 3276-3285.
15. Beernink, M. B., Erickson, N. R., Swenson, N. K., and Smith, J. M., Abstracts of Papers,
235th ACS National Meeting, New Orleans, LA, United States, April 6-10, 2008, 2008,
PHYS-594.
16. Chen, H. Y., Young, P. Y., and Hsu, S. C. N., J. Chem. Phys., 2009, 130 (16), 165101.
17. Sakota, K., and Sekiya, H., J. Phys. Chem. A, 2005, 109 (12), 2718-2721.
18. Sakota, K., Inoue, N., Komoto, Y., and Sekiya, H., J. Phys. Chem. A, 2007, 111 (21),
4596-4603.
19. Kina, D., Nakayama, A., Noro, T., Taketsugu, T., and Gordon, M. S., J. Phys. Chem. A,
2008, 112 (40), 9675-9683.
20. Kageura, Y., Sakota, K., and Sekiya, H., J. Phys. Chem. A, 2009, 113 (25), 6880-6885.
21. Sakota, K., Okabe, C., Nishi, N., and Sekiya, H., J. Phys. Chem. A, 2005, 109 (24), 5245-
5247.
22. Sakota, K., and Sekiya, H., J. Phys. Chem. A, 2005, 109 (12), 2722-2727.
23. Sakota, K., Komoto, Y., Nakagaki, M., Ishikawa, W., and Sekiya, H., Chem. Phys. Lett.,
2007, 435 (1-3), 1-4.
24. Sakota, K., Kageura, Y., and Sekiya, H., J. Chem. Phys., 2008, 129 (5), 054303/1-
054303/10.
25. Sakota, K., Komure, N., Ishikawa, W., and Sekiya, H., J. Chem. Phys., 2009, 130 (22),
224307/1-224307/7.
26. Haettig, C., Adv. Quantum Chem., 2005, 50, 37-60.
27. Schaefer, A., Horn, H., and Ahlrichs, R., J. Chem. Phys., 1992, 97 (4), 2571-7.
28. Ahlrichs, R., Baer, M., Haeser, M., Horn, H., and Koelmel, C., Chem. Phys. Lett., 1989,
162 (3), 165-9.
29. Ahlrichs, R., Baer, M., Haeser, M., Koelmel, C., and Sauer, J., Chem. Phys. Lett., 1989,
164 (2-3), 199-204.
30. Barbatti, M., Granucci, G., Persico, M., Ruckenbauer, M., Vazdar, M., Eckert-Maksic,
M., and Lischka, H., J. Photochem. Photobiol., A, 2007, 190 (2-3), 228-240.



ACKNOWLEDGMENTS
R. Daengngern gratefully thanks the Human Resource Development in Science Project (Science
Achievement Scholarship of Thailand, SAST) of the Commission on Higher Education, Thailand.
All calculations were performed on the Chang Noi cluster at the Department of Chemistry,
Faculty of Science, Chiang Mai University, Chiang Mai.

B00011
Computational Studies on the Structural Conformations of N-Benzoyl-N'-p-Substituted Phenylthiourea Derivatives

Rafie Deraman^1, Mohamed Ismail Mohamed-Ibrahim^2,C, Shukri Sulaiman^3, Lee Sin Ang^3, M. Hafiz Hussim^1

^1 Faculty of Applied Science, Universiti Teknologi MARA, Kuala Terengganu Campus, Malaysia
^2 Chemical Sciences Programme, School of Distance Education, Universiti Sains Malaysia, 11800 Penang, Malaysia
^3 Physical Sciences Programme, School of Distance Education, Universiti Sains Malaysia, 11800 Penang, Malaysia
^C E-mail: mi-mi@usm.my; Fax: 604-657600; Tel. 604-6533939



ABSTRACT
Several thiourea derivatives, namely N-benzoyl-N'-phenylthiourea (1),
N-benzoyl-N'-(4-chlorophenyl)thiourea (2), N-benzoyl-N'-(4-bromophenyl)thiourea (3),
N-benzoyl-N'-(4-nitrophenyl)thiourea (4), N-benzoyl-N'-(4-methylphenyl)thiourea (5) and
N-benzoyl-N'-(4-methoxyphenyl)thiourea (6), have been studied employing Density Functional
Theory at the B3LYP level. Conformational analysis shows that all the compounds have a
rotational barrier at the thiourea moiety caused by the intramolecular hydrogen bond that
forms a pseudo-six-membered ring, C1-N1-C2-O1---H-N2. The optimized parameters agree well
with the experimental data. The general trend observed in the optimized geometries of all
the compounds is influenced by the electronic properties of the substituent on the phenyl
ring.

Keywords: Thiourea derivative, Density Functional Theory, Structural conformation



1. INTRODUCTION
In recent years, thiourea derivatives have been widely studied for their applications as
antiviral, antibacterial and antifungal agents, in the extraction and separation of
transition metals, in the fluorometric determination of Hg(II), and as thermal stabilizers
and co-stabilizers for rigid polyvinyl chloride [1-4]. Conformational studies on several
thiourea derivatives have been reported [5-9]. All of these molecules have a rotational
barrier arising from the formation of a pseudo-six-membered ring formed by an intramolecular
hydrogen bond between O---H-N on the thiourea moiety. However, the effects of the
substituent group on the thiourea moiety have not yet been reported in the literature.


2. THEORY AND RELATED WORKS
In this paper, we present the results of our theoretical investigations on six
N-benzoyl-N'-p-substituted phenylthiourea derivatives: N-benzoyl-N'-phenylthiourea (1),
N-benzoyl-N'-(4-chlorophenyl)thiourea (2), N-benzoyl-N'-(4-bromophenyl)thiourea (3),
N-benzoyl-N'-(4-nitrophenyl)thiourea (4), N-benzoyl-N'-(4-methylphenyl)thiourea (5) and
N-benzoyl-N'-(4-methoxyphenyl)thiourea (6) (Figure 1). In all the compounds, the phenyl and
benzoyl groups lie in the cis and trans positions, respectively, relative to the S atom
across the thiourea C-N bond [10-15]. A fundamental difference among the thiourea
derivatives in this investigation is whether their p-substituent group acts as an
electron-withdrawing (Cl, Br and NO2) or electron-donating (CH3 and OCH3) entity.
B00011
ANSCSE14 Mae Fah Luang University, Chiang Rai, Thailand
March 23-26, 2010
136

The objectives of this study are to determine the most stable conformer of each structure
and to examine the effects of the electronic properties of the substituent group on the
thiourea moiety. These structures were chosen because of the completeness of the available
structural data set and because we can systematically compare their electronic properties.

[Chemical structures: the N-benzoyl-N'-phenylthiourea core with para substituent X]
Structure 1: X = H
Structure 2: X = Cl
Structure 3: X = Br
Structure 4: X = NO2
Structure 5: X = CH3
Structure 6: X = OCH3

Figure 1. The structures of the thiourea derivatives used in this study.


3. COMPUTATIONAL DETAILS
The crystallographic data for all the structures were obtained from the Cambridge
Crystallographic Database. Theoretical calculations were performed using the Gaussian 03W
software package [16] on an AMD Athlon 64 X2 processor. Becke's three-parameter hybrid
method, B3LYP, was used to optimize the structures with the 6-31G(d) basis set. To obtain
the conformational energy profiles, the energy was first calculated for a clockwise rotation
of the dihedral angle (formed by N2-C1-N1-C2; Figure 2) from 0° to 360° in steps of 30°
(every 10° from 100° to 270°), while keeping the other dihedral angles fixed. The most
stable conformer obtained was then re-optimized using DFT B3LYP/6-311G(d) [7]. The Mulliken
charge population on the molecules was calculated at the same level of theory but with the
3-21G* basis set [17].
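As a sketch of how such a conformational profile can be generated and post-processed, the Python fragment below (our own illustration; the energies are placeholder values shaped like Figure 3, not the B3LYP scan output) builds the scan grid described above and locates the interior local minima and the overall barrier.

# Dihedral grid described in the text: every 30 deg from 0 to 360,
# refined to every 10 deg between 100 and 270 deg.
grid = sorted(set(range(0, 361, 30)) | set(range(100, 271, 10)))
print(len(grid), "scan points, starting:", grid[:6])

def profile_features(angles, energies):
    # Locate interior local minima and the overall barrier of a
    # relative-energy-vs-dihedral profile (energies in kJ/mol, global min = 0).
    e = list(energies)
    minima = [(angles[i], e[i]) for i in range(1, len(e) - 1)
              if e[i] < e[i - 1] and e[i] < e[i + 1]]
    return minima, max(e) - min(e)

# Placeholder energies on the coarse 30-degree grid only (global minimum
# at 0 deg, two local minima, maximum near 180 deg, as in Figure 3).
coarse = list(range(0, 361, 30))
energies = [0, 25, 60, 85, 90, 86, 99, 87, 91, 85, 60, 25, 0]
print(profile_features(coarse, energies))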
Figure 2. The representative atom numbering for (a) the stable conformation and (b) the
maximum-energy state of structures 1-6.


4. RESULTS AND DISCUSSION
The general plot of potential energy versus dihedral angle (Figure 3) shows a global minimum
at 0°, two local minima, and a maximum-energy state at 180° for all the structures. This
indicates that all the structures have a rotational barrier caused by the intramolecular
hydrogen bond between O1---H-N2 (Figure 1), forming a pseudo-six-membered ring,
C2-O1---H-N2-C1-N1. The most stable conformation is the cis-trans configuration, shown in
Figure 2.



Figure 3. Representative relative potential energy curve for structures 1-6.

The energy differences between the global minimum and the highest-energy state are presented
in Table 1. The energy at the global minimum (0° dihedral angle) is taken to be 0 kJ/mol.
Structure 4 has the smallest energy difference between the global minimum and maximum,
98.391 kJ/mol; this can be regarded as the energy barrier between the cis-trans and cis-cis
conformations [9]. For structure 6, no minimum exists in the energy profile other than the
global minimum at 0°. For structures 1 and 5, only one local minimum exists between 0° and
360°. The conformational energy profile and the locations of the local minima depend on the
orientation of the benzoyl ring relative to the phenylthiourea.

Table 1. Local minima and relative energies for structures 1-6

             Local minimum 1             Local minimum 2             Global max - global min*
Structure    Dihedral (°)  E (kJ/mol)    Dihedral (°)  E (kJ/mol)    Energy difference, ΔE (kJ/mol)
1            130           86.63         -             -             98.849
2 (Cl)       130           87.8          230           90.19         99.376
3 (Br)       130           90.18         230           87.8          99.323
4 (NO2)      130           88.5          220           91.1          98.391
5 (CH3)      -             -             250           86.7          99.046
6 (OCH3)     -             -             -             -             100.55
* The global minimum energy is taken to be 0 kJ/mol.


The calculated structural parameters are presented in Table 2 and show reasonable agreement
with the experimental data. All the calculated bond lengths are larger than those from
experiment, whereas the calculated N2-C1-N1 bond angles are smaller than the experimental
ones. The most obvious difference is in the N2-C1-N1-C2 dihedral angle: the calculated
dihedral angles for structures 1, 2 and 4 are shifted in the opposite direction, whereas
those for structures 3, 5 and 6 stay on the same side but differ considerably from the
experimental data. This could be attributed to packing effects in the crystal lattice, which
are not accounted for in the current investigation. To observe the effect of the substituent
group, structure 1 is used as a control structure, and the effects of the electronic
properties of the substituent group on the thiourea moiety are shown in Table 2. Significant
differences are observed for structure 4 compared to the other structures; the strong
electron-withdrawing character of the NO2 group is probably a significant contributor to
this phenomenon.


Table 2. Selected structural parameters for structures 1-6.



The Mulliken charges of all atoms in the thiourea moiety are shown in Table 3. N2 is the
hydrogen bond donor and O1 is the acceptor; the charge variation on these atoms will affect
the relative strength of the intramolecular hydrogen bond. The strong electron-withdrawing
group (NO2) and the moderate electron-donating group (OCH3) have opposite effects on the N2,
O1 and sulphur atomic charges, but the calculated values are not significantly different
between the structures studied. This is also supported by the intramolecular hydrogen bond
lengths, suggesting that the p-substituent groups do not affect the strength of the
intramolecular hydrogen bond.

Table 3. Mulliken population of atoms in the thiourea moiety for structures 1-6

Structure 1 2 (Cl) 3 (Br) 4 (NO2) 5 (CH3) 6 (OCH3)
Atom Mulliken charge
S1 -0.158 -0.153 -0.154 -0.133 -0.165 -0.171
C1 0.429 0.440 0.440 0.439 0.440 0.439
N1 -0.792 -0.790 -0.790 -0.790 -0.789 -0.790
N2 -0.889 -0.890 -0.890 -0.895 -0.888 -0.888
C2 0.724 0.702 0.702 0.703 0.701 0.702
O1 -0.520 -0.509 -0.509 -0.507 -0.509 -0.511


5. CONCLUSION
From our investigation, we found that for all six thiourea derivative structures the
cis-trans conformation is the most stable conformer, and the calculated structural
parameters agree reasonably with the experimental data. The relative orientation of the
benzoyl and phenyl rings determines the profile of the local minima. Significant changes in
the structural parameters occur for structure 4, with the NO2 p-substituent group. The
intramolecular hydrogen bond strength is similar for all structures, but the conformational
energy barrier of structure 4 is the smallest. Further investigations are currently being
carried out to study the effects of intermolecular interactions between neighbouring
molecules on the conformational energy profile and the electron distribution.



REFERENCES
1. Bryantsev, V. S., and Hay, B. P., J. Phys. Chem. A, 2006, 110, 4678-4688.
2. Woldu, M. G., and Dillen, J., Theor. Chem. Acc., 2008, 121, 71-82.
3. Sandor, M., Geistmann, F., and Schuster, M., Analytica Chimica Acta, 1999, 388, 19-26.
4. Sabaa, M. W., Mohamed, R. R., and Yassin, A. A., Polymer Degradation and Stability, 2003, 81, 431-440.
5. Zhou, W., Leng, K., Zhang, Y., and Lude, L., Journal of Molecular Structure, 2003, 657, 215-223.
6. Zhou, W., Li, B., Zhu, I., Ding, J., Zhang, Y., Lu, L., and Yang, X., Journal of Molecular Structure, 2003, 657, 215-223.
7. Yang, W., Zhou, W., and Zhang, Z., Journal of Molecular Structure, 2007, 828, 46-53.
8. Zhou, W., Lu, J., Zhang, Z., Zhang, Y., Cao, Y., Lu, L., and Yang, X., Vibrational Spectroscopy, 2004, 34, 199-204.
9. Zhou, W., Li, B., Cao, Y., Zhang, Y., Lu, L., and Yang, X., Journal of Molecular Structure, 2004, 690, 145-150.
10. Yamin, B. M., and Yusof, M. S. M., Acta Crystallogr., Sect. E: Struct. Rep., 2003, 59, o151.
11. Rauf, M. K., Badshah, A., Florke, U., and Saeed, A., Acta Crystallogr., Sect. E: Struct. Rep., 2006, 62, o1419.
12. Yamin, B. M., and Yusof, M. S. M., Acta Crystallogr., Sect. E: Struct. Rep., 2003, 59, o340.
13. Zhang, D., Zhang, Y., Cao, Y., and Zhao, B., Acta Crystallogr., Sect. C: Cryst. Struct. Commun., 1996, 52, 1716.
14. Zhou, W., Lu, L., Yang, X., and Cao, Y., Chin. Chem. Res. Appl., 2004, 16, 369.
15. Cao, Y., Zhao, B., Zhang, Y., and Zhang, D., Acta Crystallogr., Sect. C: Cryst. Struct. Commun., 1996, 52, 1772.
16. Frisch, M. J., Trucks, G. W., Schlegel, H. B., Scuseria, G. E., Robb, M. A., Cheeseman, J. R., Zakrzewski, V. G., Montgomery, J. A., Jr., Stratmann, R. E., Burant, J. C., Dapprich, S., Millam, J. M., Daniels, A. D., Kudin, K. N., Strain, M. C., Farkas, O., Tomasi, J., Barone, V., Cossi, M., Cammi, R., Mennucci, B., Pomelli, C., Adamo, C., Clifford, S., Ochterski, J., Petersson, G. A., Ayala, P. Y., Cui, Q., Morokuma, K., Malick, D. K., Rabuck, A. D., Raghavachari, K., Foresman, J. B., Cioslowski, J., Ortiz, J. V., Stefanov, B. B., Liu, G., Liashenko, A., Piskorz, P., Komaromi, I., Gomperts, R., Martin, R. L., Fox, D. J., Keith, T., Al-Laham, M. A., Peng, C. Y., Nanayakkara, A., Gonzalez, C., Challacombe, M., Gill, P. M. W., Johnson, B., Chen, W., Wong, M. W., Andres, J. L., Head-Gordon, M., Replogle, E. S., and Pople, J. A., GAUSSIAN98, Rev. A.6 (Program and Manual), Gaussian, Pittsburgh, PA, 1998.
17. Latosinska, J. N., Pajzderska, A., and Wasicki, J., Journal of Molecular Structure, 2006, 786, 76-83.



B00012
Crystal structures and DFT studies on [Tp^Ph2Ni(S2CNR2)] (R = Et, Bz) and [Tp^Ph2Ni(S2Cpyr)]

Supaporn Dokmaisrijan^1,2,C, Phimphaka Harding^1,2 and David Harding^1,2,C

^1 School of Science, Walailak University, Tha Sala, Nakhon Si Thammarat, 80161, Thailand
^2 Molecular Technology Research Unit (MTRU), Walailak University, Tha Sala, Nakhon Si Thammarat, 80161, Thailand
^C E-mail: dsupapor@wu.ac.th, hdavid@wu.ac.th; Fax: 075-672004; Tel. 075-672045, 075-672094



ABSTRACT
Our continued interest in redox-active tris(pyrazolyl)borate complexes for applications as
molecular switches has led to the discovery of a series of new dithiocarbamate complexes
which show remarkably low oxidation potentials. The reaction of Nadtc (dtc =
dithiocarbamate) with [Tp^Ph2NiBr] yields the complexes [Tp^Ph2Ni(S2CNR2)] (R = Et, Bz) and
[Tp^Ph2Ni(S2Cpyr)] (pyr = pyrrolidine). Crystallographic results show five-coordinate metals
in all cases, with asymmetrically coordinated dtc ligands and a geometry intermediate
between trigonal bipyramidal and square pyramidal. Theoretical studies in the form of
UB3LYP/SDD calculations have been undertaken on the title compounds. The calculated results
show that the predicted geometries accurately reproduce the structural parameters.
Comparison of the theoretical values with the experimental ones indicates that most of the
optimized parameters are slightly larger than the experimental values, because the
theoretical calculations represent an isolated molecule in vacuum while the experimental
results are for the molecule in the solid state. The relative structural energies of the
complexes show that [Tp^Ph2Ni(S2CNR2)] (R = Et, Bz) are very much lower in energy than
[Tp^Ph2Ni(S2Cpyr)]. In addition, [Tp^Ph2Ni(S2CN(Bz)2)] has the lowest energy in this series,
with the structural energy increasing as the Bz groups are replaced by Et groups.

Keywords: DFT, Dithiocarbamate, Ni complex.



REFERENCES
1. Harding, D. J., Harding, P., and Adams, H., Acta Cryst. Section E, 2009, E65, m773.
2. Harding, D. J., Harding, P., Daengngern, R., Yimklan, S., and Adams, H., Dalton Trans., 2009, 1314.
3. Harding, P., Harding, D. J., Phonsri, W., Saithong, S., and Phetmung, H., Inorg. Chim. Acta, 2009, 362, 78.
B00014
Computer Study for the Characterization of Porous Solids using the Accessible Pore Volume Concept

N. Klomkliang^1, A. Wongkoblap^1,C, C. Tangsathitkulchai^1 and D. D. Do^2

^1 School of Chemical Engineering, Institute of Engineering, Suranaree University of Technology, Muang, Nakhon Ratchasima, 30000, Thailand
^2 School of Chemical Engineering, University of Queensland, St Lucia, Brisbane, QLD 4072, Australia
^C To whom correspondence should be addressed. E-mail: atichat@sut.ac.th; Fax: +6644-224609; Tel. +6644-224496



ABSTRACT
Pore size distribution (PSD) is one of the properties used to characterize porous solids.
In the determination of the PSD, the pore volume is one of the variables used to present
adsorption properties. This volume is mostly determined from the helium expansion experiment
or from adsorption of certain vapors at their corresponding boiling points. However, pore
volumes derived from these methods do not correspond to any specific geometrical volume. The
accessible volume proposed by Do et al. [1-3] has been introduced to solve this issue. The
accessible volume is defined as the volume that is accessible to the centre of mass of the
molecular probe of interest, i.e., the solid-fluid potential energy of that probe is
non-positive everywhere in that volume [1, 3]. The porous solid is modeled as an infinite
carbon slit pore with either perfect or defective surfaces; for comparison, both the
absolute pore volume and the accessible pore volume are used in this study, while the fluid
is modeled as a single Lennard-Jones (LJ) methane molecule. A Grand Canonical Monte Carlo
(GCMC) simulation method is used to calculate the adsorption isotherms of methane in various
pore widths at 273 K. The simulated isotherms and the experimental data of methane in
activated carbon are used to determine the PSD of that carbon. The isotherms obtained with
the accessible volume are greater than those from the common slit pore model. In the
low-pressure region, the isotherm for the defective surface is greater than that for the
perfect surface; however, the opposite is true as the pressure increases. The PSD obtained
for the defective surface is smoother than that for the perfect surface, with no gap in the
PSD, which is more reasonable.

Keywords: Absolute and Accessible volumes, Activated carbon, Defective surface,
GCMC simulation, Perfect surface, Pore size distribution.



1. INTRODUCTION
Physical and chemical characterization of porous solids has attracted the attention of
chemists, scientists and engineers, as it is essential in the description of porous solids
[3-8]. This is because many new materials have been synthesized; these materials have
complex interiors, and the need for an unambiguous characterization is more important than
ever, especially for their physical properties [3]. Physical characterization of pore volume
and surface area has been carried out with appropriate analysis of adsorption data. In this
paper, the pore volume is the focus, as it is important to know how large the void space is
that can accommodate adsorbate molecules. When pores are of molecular dimensions, the
definition of the void volume is somewhat fuzzy and any choice of void volume is subject to
criticism [3].
Molecular Dynamics (MD) and Monte Carlo (MC) simulation are two popular methods applied to
solve numerous adsorption problems involving pore characterization. They allow us to
investigate adsorption from basic information on molecular parameters, such as the
collision diameter and the well depth of the interaction energy. They are widely applied,
and in principle, if the molecular interactions between atoms or molecules and the solid
configuration are known, these methods can describe the adsorption properties of the system
of interest. The reverse problem is of great interest: given information on adsorption
measurements, can one derive some information about the solid properties? This is the
physical characterization problem, and it is still a challenging issue to derive such
information from the adsorption data of the system of interest.
In the characterization problem, one has to match the experimental measurements against the
simulation results to derive useful information. This seems to be a straightforward task,
but a problem remains. The principal issue is that, conventionally, the experimental
measurements are obtained in terms of excess concentration versus pressure, while the
simulation results are obtained in absolute quantities. Therefore, to match the two, one has
to convert one form into the other; for example, some convert the absolute concentration of
the simulation to an excess concentration. The conversion, unfortunately, is subject to
possible errors because of the various assumptions involved. In this paper, the void volume
is addressed to avoid this issue and to present experimental adsorption isotherms in terms
of absolute quantities. This reconciles the simulation results and the experimental data.
The pore size distribution (PSD) derived from the accessible pore volume concept will be
compared with that derived from the excess concentration conversion method.


2. METHODOLOGY OF CHARACTERIZATION
A Monte Carlo integration method is used to determine the accessible volume and the
geometrical surface area. The values of these physical parameters depend on the choice of
the probe particle. In this study, methane is used as the probe molecule because it is
spherical and non-polar and its adsorption is relatively non-specific. To obtain the
adsorption isotherm, a Grand Canonical Monte Carlo (GCMC) simulation is used. The Monte
Carlo integration and GCMC methods have been popular in simulation since the pioneering work
of Metropolis and co-workers in 1953, and are widely applied to determine the physical
properties of bulk fluids and of adsorption on surfaces as well as in pores, as described in
the literature [9-11]. The essential equations used in the characterization of porous media
are the potential equations between fluid particles and those between a fluid particle and a
solid surface. The solid surface used in this study is either a structureless carbon
surface, i.e., a continuum surface with constant surface density, or a structured surface
composed of discrete carbon atoms, for a defective surface.

2.1 Fluid Potential
The pairwise interaction potential for methane is described by the 1-centre Lennard-Jones
(LJ) model. The LJ parameters \sigma_{ff} = 0.373 nm and \varepsilon_{ff}/k = 148.0 K
[9-11], and a cut-off radius for the interaction of two methane particles of five times the
collision diameter (5\sigma_{ff}), are used in this study. The potential energy of
interaction between two particles is calculated using the following Lennard-Jones 12-6
equation [9-11]:

\varphi_{ff}(r) = 4 \varepsilon_{ff} \left[ \left( \frac{\sigma_{ff}}{r} \right)^{12} - \left( \frac{\sigma_{ff}}{r} \right)^{6} \right]    (1)

where r is the separation distance between the two particles.
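A direct transcription of Eq. (1) with the methane parameters and the 5\sigma_{ff} cut-off quoted above (our own sketch, not the authors' code) is:

def lj_potential(r, sigma=0.373, eps_over_k=148.0, cutoff=5.0):
    # Lennard-Jones 12-6 potential of Eq. (1) for methane: r and sigma in nm,
    # energy returned as phi/k_B in K; truncated at 5*sigma as in the text.
    if r > cutoff * sigma:
        return 0.0
    sr6 = (sigma / r)**6
    return 4.0 * eps_over_k * (sr6**2 - sr6)

# The well minimum lies at r = 2^(1/6)*sigma, with depth -eps/k = -148 K.
print(lj_potential(2**(1.0/6.0) * 0.373))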

2.2 Solid-Fluid Potentials
This work studies carbon-based adsorbent structures whose pores typically have a slit-shaped
geometry. The pore width H is defined as the distance between a plane passing through the
carbon atom centres of the outermost layer of one wall and the corresponding plane of the
other wall. The interlayer spacing between two adjacent layers is constant and denoted
\Delta. When the solid surface is made up of discrete carbon atoms, the interaction energy
between a methane particle j and the carbon surface is calculated by summing all the
pairwise interaction energies between the methane particle j and each carbon atom k, as
shown below:

\varphi_{j,s} = \sum_{k} 4 \varepsilon_{\alpha\beta} \left[ \left( \frac{\sigma_{\alpha\beta}}{r_{j,k}} \right)^{12} - \left( \frac{\sigma_{\alpha\beta}}{r_{j,k}} \right)^{6} \right]    (2)

Here r_{j,k} is the distance between the particle j and the carbon atom k, and the
subscripts \alpha and \beta denote the adsorbate and the species of carbon atom k,
respectively. The cross parameters are computed from the Lorentz-Berthelot rules,
\sigma_{\alpha\beta} = (\sigma_{\alpha\alpha} + \sigma_{\beta\beta})/2 and
\varepsilon_{\alpha\beta} = \sqrt{\varepsilon_{\alpha\alpha} \varepsilon_{\beta\beta}}.
The Steele 10-4-3 potential can also be used to calculate the solid-fluid interaction
potential \varphi_{sf} in the case of an infinite slit pore with a structureless carbon
surface [12]:

\varphi_{sf}(z) = 2 \pi \rho_s \varepsilon_{sf} \sigma_{sf}^{2} \Delta \left[ \frac{2}{5} \left( \frac{\sigma_{sf}}{z} \right)^{10} - \left( \frac{\sigma_{sf}}{z} \right)^{4} - \frac{\sigma_{sf}^{4}}{3 \Delta (z + 0.61 \Delta)^{3}} \right]    (3)

where z is the distance from the graphite surface, \rho_s is the solid density and \Delta is
the interlayer graphene spacing. For a slit pore, the fluid molecule interacts with both
walls of the pore; the total external potential is then given by

U_{ext}(z) = \varphi_{sf}(z) + \varphi_{sf}(H - z)    (4)

where H is the pore width defined as the distance between a plane passing through the carbon
atom centres of the outermost layer of one wall and the corresponding plane of the other
wall. For a homogeneous graphite surface, \Delta = 0.3354 nm, \sigma_{ss} = 0.34 nm,
\varepsilon_{ss}/k = 28 K and \rho_s = 114 x 10^27 m^-3.
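Eqs. (3) and (4) translate directly into code. The sketch below (our own illustration; the cross parameters \varepsilon_{sf} and \sigma_{sf} are formed from the Lorentz-Berthelot rules of Section 2.2 using the graphite and methane parameters quoted above) evaluates the external potential inside a slit pore.

import math

# Parameters quoted in the text (nm and K); cross terms via Lorentz-Berthelot.
DELTA, RHO_S = 0.3354, 114.0          # interlayer spacing (nm), density (nm^-3)
SIG_SS, EPS_SS = 0.34, 28.0           # carbon
SIG_FF, EPS_FF = 0.373, 148.0         # methane
SIG_SF = 0.5 * (SIG_SS + SIG_FF)
EPS_SF = math.sqrt(EPS_SS * EPS_FF)

def steele_10_4_3(z):
    # One-wall Steele 10-4-3 potential of Eq. (3); z in nm, result in K.
    pref = 2.0 * math.pi * RHO_S * EPS_SF * SIG_SF**2 * DELTA
    return pref * (0.4 * (SIG_SF / z)**10 - (SIG_SF / z)**4
                   - SIG_SF**4 / (3.0 * DELTA * (z + 0.61 * DELTA)**3))

def slit_pore_potential(z, H):
    # Total external potential of Eq. (4): contributions from both walls.
    return steele_10_4_3(z) + steele_10_4_3(H - z)

# Example: external potential at the centre of a 1 nm slit pore.
print(slit_pore_potential(0.5, 1.0))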

2.3 Simulation Methodology
The Grand Canonical Monte Carlo (GCMC) ensemble [10] is used in the adsorption study. It is
the natural choice for the simulation of adsorption because in this ensemble the
temperature, volume and chemical potential of the system are specified. Given the equality
of the bulk-phase and adsorbed-phase chemical potentials at equilibrium, we can obtain the
adsorption isotherm through an appropriate equation of state which relates the chemical
potential to the bulk-phase pressure. One GCMC cycle consists of one thousand displacement
moves and attempts of either insertion or deletion with equal probability. For an adsorption
branch of the isotherm, 20,000 GCMC cycles are typically needed for the system to reach
equilibrium, and an additional 20,000 cycles are used to obtain ensemble averages. For each
point on the adsorption branch, we use an empty box as the initial configuration, and the
simulation is carried out until the number of particles in the box does not change (in a
statistical sense).
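For orientation, a heavily simplified sketch of one GCMC cycle (our own illustration; energy(box) is a placeholder total-energy function, z_act = exp(beta*mu)/Lambda^3 is the reservoir activity, and the move mix is simplified relative to the 1000-displacement protocol described above) is:

import math, random

def gcmc_cycle(box, beta, z_act, volume, energy, n_moves=1000):
    # One simplified GCMC cycle: n_moves attempted moves, each a particle
    # displacement, an insertion or a deletion chosen at random.
    for _ in range(n_moves):
        move = random.choice(("displace", "insert", "delete"))
        u_old = energy(box)
        if move == "displace" and box:
            i = random.randrange(len(box))
            old = box[i]
            box[i] = tuple(x + random.uniform(-0.05, 0.05) for x in old)
            # Metropolis acceptance: min(1, exp(-beta*dU))
            if random.random() >= math.exp(min(0.0, -beta * (energy(box) - u_old))):
                box[i] = old
        elif move == "insert":
            box.append(tuple(random.uniform(0.0, 1.0) for _ in range(3)))
            acc = z_act * volume / len(box) * math.exp(-beta * (energy(box) - u_old))
            if random.random() >= min(1.0, acc):
                box.pop()
        elif move == "delete" and box:
            i = random.randrange(len(box))
            removed = box.pop(i)
            acc = (len(box) + 1) / (z_act * volume) * math.exp(-beta * (energy(box) - u_old))
            if random.random() >= min(1.0, acc):
                box.insert(i, removed)
    return box

# Toy usage: an ideal gas (zero energy) equilibrates towards <N> = z_act * V.
box = []
for _ in range(200):
    gcmc_cycle(box, beta=1.0, z_act=50.0, volume=1.0, energy=lambda b: 0.0)
print(len(box))

Running 20,000 such cycles from an empty box for equilibration and a further 20,000 for averaging would mirror the protocol described in the text.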
Pore Density
The average pore density \rho_{av} is defined as the ratio of the average number of
particles to the pore volume:

\rho_{av} = \frac{\langle N \rangle}{V_{pore}}    (5)

With the accessible volume concept, Monte Carlo integration is used to calculate the pore
volume of the solid model with either perfect or defective surfaces. In the case of the
infinite slit pore model, the volume is defined as L_x L_y W, where L_x and L_y are the
linear dimensions of the graphene layer in x and y, respectively, and W is the experimental
pore width, obtained by subtracting the collision diameter of a solid atom from the pore
width.


3. MODEL OF ACTIVATED CARBON
The solid surface is a source of an external force field which strongly affects the behavior
of adsorbate molecules. In computer simulation, the molecular structure of the activated
carbon is assumed and completely known, which is different from the experiments. The
micropores in activated carbons are slit-shaped, with two parallel pore walls consisting of
many graphene
layers. Therefore, the molecular model of activated carbon is usually assumed to be the slit
pore model of infinite extent in x and y directions and two walls of the slit pore are separated
by a distance H in z direction. In this study two solid models of activated carbon are used one
has a perfect surface while another is the defective surface.
A more realistic model of the pore wall structure for an infinite slit pore model has been
developed by a number of workers in the literature [13]. The pore width is visualized as the
gap between the two walls, denoted H. Each wall is taken to be the basal plane of graphene layers made up of Lennard-Jones (LJ) carbon atoms with collision diameter $\sigma_{ss}$ and an interlayer spacing between adjacent layers of 0.3354 nm, denoted $\Delta$. These graphene layers are arranged in parallel and offset from each other as shown in Figure 1. The stacking of the graphene layers is in a hexagonal arrangement. This structural pore wall model makes it possible to calculate the interaction between the adsorbate molecule and the individual carbon atoms. A linear combination of theoretical isotherms obtained for
different pore sizes is used to determine pore size distributions of real activated carbons from
experimental data.


Figure 1. A model of a carbon slit-like pore with two structural pore walls.

Another infinite slit pore model, proposed by Do and co-workers [14], is one in which the graphene layers are defective. To model a surface with defects, a carbon atom in the surface is randomly selected and removed together with all surrounding neighbors whose distances to the selected atom are less than an effective defect radius, $R_c$. This random selection of carbon atoms is repeated until the percentage of carbon atoms removed reaches a given value. The two important parameters for modeling a non-graphitized surface are the percentage of defects and the size of the defects, measured by the effective radius [14]. This model shows the interplay not only between the surface heterogeneity and the fluid-fluid interaction but also with the overlapping of the potentials exerted by the two walls of the pore. The entire set of local isotherms obtained for different pores with defective surfaces is used to characterize activated carbons against experimental adsorption isotherms. Figure 2 shows the carbon pore model with defective walls proposed by Do & Do [14].

Figure 2. Schematic model of an infinite slit pore with defective walls.

4. RESULTS AND DISCUSSION
We shall start our discussion by presenting the accessible volumes obtained from the new concept of Do et al. [1, 3] and those obtained from the physical pore width of infinite pores. The adsorption isotherms of methane at 273.15 K from the proposed method will then be presented and compared with those obtained for the normally assumed infinite pore. Finally, the simulated isotherms will be compared against the experimental data to obtain the pore size distribution of the activated carbon used in this study.

4.1 Pore Characteristic Properties
The accessible pore volume is defined as the volume accessible to the centre of a particle at zero loading. It is determined by Monte Carlo integration. We insert a methane particle at a random position in the simulation box, calculate its potential energy with the solid, and then remove that particle from the box. If the solid-fluid potential of that particle is either zero or negative, the insertion is counted as a success; if it is positive, the insertion is counted as a failure. Repeating this insertion process many times gives the fraction of successes, f, and the accessible volume is simply f times the volume of the simulation box, $V_{pore} = f\,V_{box}$. For example, for the H = 0.7 nm pore in Table 1, $V_{acc}$ = 0.0499 × 416.07 ≈ 20.78. This is the volume accessible to the centre of a particle.
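
The insertion test just described translates into a short Monte Carlo integration loop. The sketch below reuses the `slit_potential` function from the earlier Steele sketch as a stand-in for the full structural-wall energy; for the structural or defective walls, the wall energy would instead be summed over explicit carbon atoms at each random (x, y, z) point.

```python
import numpy as np

rng = np.random.default_rng(1)

def accessible_fraction(u_wall, H, n_trials=100_000):
    """Fraction of random test-particle insertions with zero or negative
    solid-fluid energy; V_pore = f * V_box then follows (Section 4.1)."""
    z = rng.uniform(1e-4, H - 1e-4, n_trials)   # avoid the singular walls
    return np.mean(u_wall(z) <= 0.0)

H = 1.0                                          # physical width (nm)
f = accessible_fraction(lambda z: slit_potential(z, H), H)
v_acc = f * (2.754 * 3.292 * H)                  # f * V_box, with Lx, Ly from the text
```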
In this study, we choose values of the physical width H from 0.7 to 4 nm to represent the micropores and mesopores in activated carbons; the number of pores is 39. The linear dimensions in the x and y directions are 2.754 and 3.292 nm, respectively, for H of 0.7 to 2 nm, while those for H of 2.1 to 3.0 nm are 3.606 and 4.276 nm. For H = 3.5 nm, $L_x$ = 5.310 nm and $L_y$ = 5.752 nm, and for H = 4.0 nm, $L_x$ = 6.162 nm and $L_y$ = 6.244 nm. The conventional pore volume equals $L_x L_y (H - \sigma_{ss})$, as presented in Section 2.3. Selected accessible pore volumes and conventional pore volumes for pores with perfect surfaces and for pores with defective surfaces are shown in Tables 1 and 2, respectively.
As expected, the accessible width $H_{acc}$ for the pore with a defective surface is greater than that for the pore without defects. The geometrical surface area obtained by the proposed method is also greater than that calculated from the linear dimensions in the x and y directions, because a fluid particle can access the grooves between solid atoms. In the case of the perfect surface it is greater by about 3%, while in the case of the defective surface it is greater by about 10%. The defective surface used in this study has a surface area about 7% greater than the perfect surface. Therefore the volume used to calculate the pore density for the proposed method is less than that normally assumed for the infinite pore model, while the surface area for the proposed model is greater than that for the conventional method. Next we shall show the differences between the adsorption isotherm obtained from the accessible volume and that from the conventional pore volume.


Table 1. Pores with perfect surface and the physical properties obtained from the Monte
Carlo integration for methane at 273.15K
H (nm)   V_box   fraction, f   V_acc   Surface   S_geo   H_acc   H_acc (nm)
0.7 416.0697 0.04994542 20.780776 130.32751 134.362 0.309325 0.1153782
0.8 433.5399 0.08469351 36.718012 130.32751 134.362 0.546553 0.2038643
0.9 451.01 0.11814733 53.285633 130.32751 134.362 0.793165 0.2958504
1.0 468.4802 0.15004891 70.294948 130.32751 134.362 1.046351 0.3902888
1.1 485.9504 0.18009047 87.515044 130.32751 134.362 1.302674 0.4858976
1.2 503.4206 0.2082422 104.8334 130.32751 134.362 1.560461 0.5820519
1.3 520.8908 0.23460522 122.20369 130.32751 134.362 1.81902 0.6784945
1.4 538.3609 0.25933984 139.61844 130.32751 134.362 2.078241 0.775184
1.5 555.8311 0.28253358 157.04095 130.32751 134.362 2.337578 0.8719166
1.6 573.3013 0.30435574 174.48754 130.32751 134.362 2.597273 0.9687828
1.7 590.7715 0.32490357 191.94374 130.32751 134.362 2.857111 1.065702
1.8 608.2416 0.34429678 209.41565 130.32751 134.362 3.117183 1.162709
1.9 625.7118 0.36261498 226.89246 130.32751 134.362 3.377328 1.259743
2.0 643.182 0.37989961 244.34457 130.32751 134.362 3.637105 1.35664
2.5 1242.451 0.45375335 563.76649 221.65409 228.028 4.944714 1.844378
3.0 1917.539 0.51195776 981.69879 305.55447 313.977 6.253316 2.332487
3.5 3049.656 0.55899494 1704.742 439.06188 450.553 7.567334 2.822615
4.0 4212.392 0.59777108 2518.0461 553.09142 567.202 8.878836 3.311806

Table 2. Pores with defective surface and the physical properties obtained from the Monte
Carlo integration for methane at 273.15K
H (nm)   V_box   fraction, f   V_acc   Surface   S_geo   H_acc   H_acc (nm)
0.7 416.0697 0.074231 30.88527 130.32751 143.267 0.431158 0.160822
0.8 433.5399 0.108115 46.872164 130.32751 143.267 0.654335 0.244067
0.9 451.01 0.141122 63.64744 130.32751 143.267 0.888518 0.331417
1.0 468.4802 0.172218 80.680727 130.32751 143.267 1.126302 0.420111
1.1 485.9504 0.201527 97.932126 130.32751 143.267 1.367132 0.509940
1.2 503.4206 0.228805 115.18514 130.32751 143.267 1.607984 0.599778
1.3 520.8908 0.254363 132.49533 130.32751 143.267 1.849634 0.689913
1.4 538.3609 0.278289 149.81992 130.32751 143.267 2.091485 0.780124
1.5 555.8311 0.300862 167.22846 130.32751 143.267 2.334508 0.870772
1.6 573.3013 0.322089 184.65404 130.32751 143.267 2.577769 0.961508
1.7 590.7715 0.342108 202.10764 130.32751 143.267 2.821421 1.052390
1.8 608.2416 0.361044 219.60199 130.32751 143.267 3.065642 1.143484
1.9 625.7118 0.379038 237.16855 130.32751 143.267 3.310871 1.234955
2.0 643.182 0.395849 254.60294 130.32751 143.267 3.554255 1.325737
2.5 1242.451 0.459564 570.98594 221.65409 242.196 4.715078 1.758724
3.0 1391.013 0.517477 719.8173 221.65409 242.196 5.944095 2.217147
3.5 3049.656 0.57192721 1744.181 439.06188 483.222 7.21897 2.692676
4.0 4212.392 0.60713151 2557.4757 553.09142 610.574 8.37729 3.124729

4.2 Adsorption isotherm of methane at 273.15 K
The simulated isotherms versus pressure for methane at 273.15 K in pores of various sizes, calculated using the accessible pore volume, were obtained with the GCMC method; for clarity, only selected isotherms are shown in Figures 3 and 4 for the perfect and defective surfaces, respectively. For comparison, the isotherms for methane in the infinite pore of the same physical width, calculated using the conventional pore volumes, are also shown in the same figures.




Figure 3. Simulated adsorption isotherms of methane at 273.15 K in pores with perfect surfaces, calculated using a) the accessible volume and b) the conventional volume.

Figure 4. Simulated adsorption isotherms of methane at 273.15 K in pores with defective surfaces, calculated using a) the accessible volume and b) the conventional volume.


As one can see from Figures 3 and 4, the adsorption behavior of methane at 273.15 K calculated using either the accessible volume or the conventional volume shows a monotonic pattern with respect to pressure. For smaller pores, the change in density is continuous, owing to the continuous filling of a monolayer in those pores. The isotherm decreases with increasing pore width, due to the packing effect, which leads to a different maximum density in each pore. However, for pores with a physical width of 1.0 nm, we observed that at pressures greater than 1 MPa the maximum density is greater than that of the 0.9 nm pore, because methane can pack tightly as two layers within that pore.
However, the following differences are observed when the accessible volume is used: (i) a greater adsorption isotherm and (ii) a greater maximum density. This is due to the lower value of the volume used in the calculation of density: the accessible volume of a pore is less than the pore volume estimated from the physical width, as discussed in Section 4.1. The isotherm of the pore with a defective surface is greater than that of the perfect surface at pressures below 0.1 MPa, after which the reverse is true. This is due to the effect of surface heterogeneity, whereby methane molecules are initially adsorbed at the stronger sites on the defective surface [15].

4.3 Pore size distribution obtained by using experimental data and simulation
isotherms using accessible volume
Having seen the simulation studies of the new concept for obtaining the accessible volume and the geometrical surface area, we now turn to the comparison of our simulation results against the experimental data to determine the pore size distribution. The experimental data for methane adsorption on Ajax activated carbon, with a BET surface area of 1200 m²/g, were obtained at 273.15 K using a volumetric adsorption rig built in The University of Queensland laboratory. Details of the rig and the experimental measurements can be found in the literature [2]. With this rig, the total amount of methane in the experiment is unambiguously known, since the volume, temperature and pressure of the reference section are known, and the total amount of methane in the adsorption cell is then easily and unambiguously calculated.
We use a Matlab optimization method to match the theoretical results against the experimental data and so determine the pore volume distribution. The pore size distribution, as a histogram of specific pore volume versus the characteristic size of pore j, $(H_{acc})_j$, is shown in Figure 5. The PSD obtained for the defective pores is smoother than that for the perfect pores. The comparison between the simulation results and the experimental data for methane at 273.15 K is presented in Figure 6; the agreement between the combined isotherms and the experimental data is satisfactory.
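
The paper performs this matching with a Matlab optimization; an equivalent and common formulation, sketched below, is a non-negative least-squares fit of the specific pore volumes $v_j$ in $n_{exp}(P_i) \approx \sum_j \rho_{sim}(P_i; H_j)\,v_j$. The arrays `rho_sim` and `n_exp` are hypothetical stand-ins for the simulated local isotherms and the experimental adsorbed amounts.

```python
import numpy as np
from scipy.optimize import nnls

def fit_psd(rho_sim, n_exp):
    """Non-negative least-squares PSD fit.

    rho_sim : (n_pressures, n_pores) simulated pore densities at each pressure
    n_exp   : (n_pressures,) experimental adsorbed amounts
    Returns the pore volumes v_j >= 0 and the residual norm."""
    v, residual = nnls(rho_sim, n_exp)
    return v, residual

# Toy demonstration with made-up numbers (two pores, three pressures):
rho_sim = np.array([[0.2, 0.5],
                    [0.4, 0.9],
                    [0.6, 1.1]])
n_exp = np.array([0.45, 0.85, 1.15])
v, res = fit_psd(rho_sim, n_exp)
```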

Figure 5. Pore size distributions obtained from the pores with perfect surfaces (a) and those with defective surfaces (b), using the experimental data at 273.15 K.

Figure 6. Adsorbed amount in the adsorption cell at 273.15 K from experiment (circles), the simulated isotherm for perfect pores (solid line) and that for defective pores (dashed line), using the pore size distributions in Figure 5.


5. CONCLUSION
In this paper, Monte Carlo integration is used to calculate the accessible volume, while GCMC is used to obtain the adsorption isotherms of methane at 273.15 K calculated from both the accessible volume and the conventional volume. The accessible volume is less than the normally assumed volume, and this leads to greater isotherms when the density is determined from the accessible volume. The PSDs for the perfect and defective surfaces are similar; however, that obtained for the defective surface is smoother. The simulation results agree very well with the experimental data.



REFERENCES
1. Do, D.D., Do, H.D., Nicholson, D., Herrera, L. and Wongkoblap, A., The 5th Pacific Basin Conference on Adsorption Science and Technology, Book of Abstracts, 25-27 May 2009, Singapore.
2. Birkett, G. and Do, D.D., Langmuir, 2006, 22, 622-7630.
3. Do, D.D. and Do, H.D., J. Colloid and Interface Sci., 2007, 316, 317-330.
4. Jaroniec, M. and Kaneko, K., Langmuir, 1997, 13, 6589-96.
5. Kaneko, K., J. Membrane Science, 1994, 96, 59-89.
6. Kaneko, K., Ishii, C., Kanoh, H., Hanzawa, Y., Setpyama, N. and Suzuki, T., Adv. Coll.
Inter. Sci., 1998, 76, 295-320.
7. Neimark, A. and Ravikovitch, P., Langmuir, 1997, 13, 5148-60.
8. Sunaga, M., Ohba, T., Suzuki, T., Kanoh, H., Hagiwara, S. and Kaneko, K., J. Phys.
Chem., 2004, 108, 10651-7.
9. Nicholson, D. and Parsonage, N., Computer Simulation and the Statistical Mechanics of
Adsorption, Academic Press, New York, 1982.
10. Allen, M.P. and Tildesley, D., Computer Simulation of Liquids, Oxford University Press,
Oxford, 1987.
11. Frenkel, D. and Smit, B., Understanding Molecular Simulation, 2nd edition, Academic Press, New York, 2002.
12. Steele, W.A., Surf. Sci., 1973, 36, 317-352.
13. Nguyen, C. and Do, D.D., Langmuir, 1999, 15, 3608-15.
14. Do, D.D. and Do, H.D., J. Phys. Chem. B, 2006, 110, 17531-8.
15. Wongkoblap, A. and Do, D.D., Carbon, 2007, 45, 1527.



ACKNOWLEDGMENTS
We acknowledge Suranaree University of Technology, Thailand for the financial support and
The University of Queensland, Australia for the experimental work. This research was also
made possible by the Thailand Research Fund and the Australian Research Council.

B00016
Direct QM/MM simulations of excited state dynamics of
Rhodopsin chromophore in different environments

C. Punwong¹,† and T. J. Martínez¹,‡,C

¹ Center for Biophysics and Computational Biology and Department of Chemistry, University of Illinois at Urbana-Champaign, Urbana, Illinois, 61801, USA
† Present address: Department of Physics, Faculty of Science, Prince of Songkla University, Hatyai, Songkhla, 90112, Thailand
‡ Present address: Department of Chemistry, Stanford University, Stanford, California 94305, USA
C E-mail: Todd.Martinez@stanford.edu; Fax: +1 650-736-7981; Tel. +1 650-736-8860



ABSTRACT
11-cis Retinal Protonated Schiff Base (RPSB) is the chromophore employed by
Rhodopsin (Rh), the membrane protein responsible for visual perception in the eye.
Upon photoexcitation, RPSB isomerizes from 11-cis to all-trans conformation and
triggers the protein conformational changes that lead to the visual perception cascade.
The photoisomerization of RPSB in Rh is very efficient with an ultrafast rate and a high
isomerization quantum yield. In contrast, the rate and yield of isomerization in solution are lower than in Rh. An important unresolved question is the role of the protein environment in altering the photochemical mechanism. We have investigated the detailed photochemical mechanism of RPSB using dynamics simulations carried out in isolation as well as in solvated (methanol) and protein environments. We use the full multiple spawning method to describe quantum
mechanical effects of the nuclear degrees of freedom. A reparameterized multireference
semiempirical method is used to describe the ground and excited electronic states of the
chromophore and the environment is represented with an empirical force field
(QM/MM). The potential energy surfaces and their couplings are determined on the
fly, i.e. simultaneously with the dynamic evolution. We find dramatic differences
between the photodynamics of RPSB in isolation and in solution. The isomerization
occurs in hundreds of femtoseconds for the isolated case, but is slowed down to several
picoseconds in MeOH solution. We compare our results in MeOH to experimental
results and find good agreement. The results in the protein environment also agree well with the experiments and are compared with the gas-phase and solvated results to provide a much more complete picture of the role of complex environments in influencing the photochemical mechanism and achieving bond-selective isomerization.

Keywords: Rhodopsin, Retinal protonated Schiff base, photochemistry, QM/MM, full
multiple spawning method


B00017
Virtual Screening for Neuraminidase Inhibitory Activity of Plant-Derived Natural Products Using Pharmacophore Modelling and Docking

Muchtaridi¹, Habibah A. Wahab¹,²,C

¹ Laboratory of Pharmaceutical Design and Simulation, School of Pharmaceutical Sciences, Universiti Sains Malaysia, 11800, Penang, Malaysia.
² Malaysian Institute of Pharmaceuticals and Nutraceuticals, Ministry of Science, Technology and Innovation, Block A, SAINS@USM, 11900, Penang, Malaysia.
Permanent office: Pharmaceutical Chemistry Laboratory, Faculty of Pharmacy, Universitas Padjadjaran, Jl. Bandung-Sumedang KM 21,5, Jatinangor, Indonesia
C Email: habibahw@usm.my



ABSTRACT
Neuraminidase (NA) plays an important role in the replication and release of new avian influenza virions. NA has therefore been considered a valid target in drug design against the influenza virus. The aim of this study is to identify new neuraminidase inhibitors from natural compounds of Malaysia using pharmacophore modeling and docking-based virtual screening. A variety of natural compounds from Malaysian plant sources collected in the NADI database were screened for substantial neuraminidase-inhibitory properties. The pharmacophore model was developed using the Catalyst software embedded in Discovery Studio 2.1, using sialic acid derivatives that act as N1 inhibitors. The selected pharmacophore hypothesis had five features (one hydrogen-bond donor (D), one negative ionizable (N), one positive ionizable (P), and two hydrophobic moieties (Hy)) and also included two excluded volumes. The best pharmacophore was validated by HypoRefine in DS 2.1, having the lowest total cost value (92.055), the highest cost difference (107.807), the lowest RMSD (1.197), and the best correlation coefficient (0.944651). The X-ray crystal structure of neuraminidase N1 (1F8B) was used in the docking studies with the AutoDock 3.0.5 software, and the free energy of binding was used to rank the hits, with oseltamivir and DANA used as the ligands in control docking. In silico screening was carried out on the compounds of our laboratory's database (NADI). Mapping of the NADI natural compounds shows that 46 of the 2350 NADI compounds map to three features of the common pharmacophore of the training set. MSC458 (Morinda citrifolia) has the highest fit value, with a docking free energy of -9.48 kcal/mol (Ki = 1.12 × 10⁻⁷) for neuraminidase, whereas MSC605 has the most negative free energy of binding (-12.10 kcal/mol) and Ki = 1.35 × 10⁻⁹.


Keywords: Neuraminidase, Pharmacophore, Docking, Virtual Screening

Abbreviation : NADI : Natural-Based Drug Discovery Intelligent, N1 : Neuraminidase 1,
DANA: 2-deoxy-2,3-dehydro-Neu5Ac





1. INTRODUCTION
The influenza virus has affected the world for a long time, infecting nearly 20% of the world's human population [1]. The virus is membrane-enveloped with a segmented, negative-strand RNA genome and two glycoproteins on its surface, hemagglutinin (HA) and neuraminidase (NA). HA and NA play major roles in viral replication: HA is a trimeric macromolecule responsible for attachment to cell-surface receptors terminated with sialic acid [2], whereas NA is an important viral enzyme that plays a role in virus proliferation and infectivity. Therefore, blocking its activity generates antiviral effects, making this viral glycoprotein an attractive target for the design and development of anti-influenza drugs. In influenza viruses, the enzyme active site of viral NA of both virus A and B has been found to be highly conserved in amino acid sequence.

To date, analog NA inhibitors have been developed starting from synthetic chemical lead compounds, and there are few cases in which bioactive compounds from natural plants have been used as starting materials for developing NA-inhibitor leads, even though Malaysia and Indonesia have great potential to develop their abundant natural resources and grow the market for herbal products.

NADI (Natural Based Drug Discovery) [3] is a database of Malaysian medicinal plants which aims to be a one-stop centre for in silico drug discovery from natural products. NADI was developed to assist researchers in performing drug discovery at Universiti Sains Malaysia. It provides structural information on 3000 different compounds, along with information on the botanical sources of the plant species, which can be used in virtual screening.



2. COMPUTATIONAL DETAILS

Materials
Database of Bioactive Compounds in NADI (USM)
The diverse collection of bioactive compounds in NADI was used as the test set in this virtual screening. The database at present contains 3000 compounds from Malaysian natural products, whose geometries have been built and optimized using the MOPAC semi-empirical program.

Training Set: 24 representative structures obtained from the literature (Fig. 1) were taken as the training set in this study.

Software for Virtual Screening: HyperChem 7.0 (Hypercube, Inc.), AutoDock 3.0.5 (Molecular Graphics Laboratory, The Scripps Research Institute), Discovery Studio 2.5 (Accelrys Inc.).

Hardware: The computational molecular modeling studies were carried out using Catalyst in Discovery Studio 2.5 (Accelrys, San Diego, CA) running on Windows XP on a 2 GHz dual-core processor (Intel, Santa Clara, CA), whereas AutoDock 3.0.5 ran on Linux Fedora on a 3 GHz dual-core processor.





Methods
Generation of Conformation Library of Bioactive Compounds: For the training and test set molecules, conformational models representing their available conformational space were calculated. All molecules were built using the 2D and 3D sketchers of HyperChem 7.0 and optimized using MM2 in HyperChem 7.0. A conformational set was generated for each molecule using the poling algorithm with the best-energy option, based on the CHARMm force field in Discovery Studio 2.5 [4]. The molecules with their associated conformational models were mapped onto the pharmacophore model using the best-fit option to obtain the bioactive conformation of each molecule.

Generation of Pharmacophore Hypothesis: All pharmacophore modeling calculations were carried out using the Discovery Studio 2.1 software package (Accelrys, San Diego, USA). The HipHop and HypoGen modules within Catalyst in DS 2.1 were used for the construction of the qualitative and quantitative models, respectively. The features considered were H-bond donor (D), hydrophobic (Hy), H-bond acceptor (A), positive ionizable (P) and negative ionizable (N), with excluded volumes (E) also included.

Validation of Pharmacophore Models: Based on the information from the qualitative models, the quantitative pharmacophore models were created by HypoGen within Catalyst in the DS 2.1 package. A maximum of 255 conformers per molecule was generated with the best-conformation option, and 20 kcal/mol above the global energy minimum was set as the energy threshold for the conformational search [5]; this protocol is available in the DS 2.1 package. The best pharmacophore models were validated according to Deng et al. [6] in terms of cost functions and other statistical parameters, which were calculated by the HypoRefine module during hypothesis generation. A good pharmacophore model should have a high correlation coefficient and low total cost and RMSD values, and its total cost should be close to the fixed cost and far from the null cost. The best pharmacophore model was further validated by the test set method and Fischer's randomization test [7].

Pharmacophore-Based Virtual Screening of the NADI database: Of the 3000 molecules contained in the NADI database, 2350 were filtered as drug-like molecules and converted into separate Catalyst libraries. Using the Ligand Pharmacophore Mapping protocol, Best Mapping was performed with the rigid fitting method, with the maximum omitted features set to zero and to one [8].

Molecular Docking: A Linux operating system (Fedora 6, Red Hat) on a dual-core processor was used to screen the potential bioactive compounds from the NADI database. The ligand dataset was already available as PDB files. The neuraminidase protein of subtype N1 in complex with DANA (PDB code: 1F8B) [9] and with oseltamivir (2HUO) [10] was used as the target. Docking simulations were performed with AutoDock [11]. The AutoDockTools (ADT) script was used to convert the ligand PDB files to the pdbq format by adding Gasteiger charges, checking polar hydrogens and assigning ligand flexibility. ADT was also used to prepare the protein targets for the simulations. Using the ADT interface, Kollman charges were added to the macromolecule, and a grid box of 60 x 60 x 60 points with a spacing of 0.375 Å, centred on the binding site of the co-crystallized ligand (26.507; 17.972; 57.828), was set up for the AutoGrid and AutoDock calculations.
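
For orientation, a grid set up this way corresponds to an AutoGrid parameter file containing lines such as the following. This is only a hedged fragment: the file name is hypothetical, the keyword list is abbreviated, and the spellings follow common AutoGrid conventions, which may differ in detail from those of AutoDock 3.0.5.

```
receptor 1f8b.pdbqs                 # charged receptor prepared with ADT (hypothetical file name)
npts 60 60 60                       # number of grid points in x, y, z
spacing 0.375                       # grid spacing in Angstroms
gridcenter 26.507 17.972 57.828     # centre of the co-crystallized ligand binding site
```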


3. RESULTS AND DISCUSSION
The common-feature pharmacophore model was generated by HipHop from the four most active compounds. The resulting HipHop model identified five features (one hydrogen-bond donor (D), two hydrophobic moieties (Hy), one negative ionizable (N), and one positive ionizable (P)), as shown in Figure 2. These features were then chosen as the initial chemical features in the quantitative pharmacophore modeling generated by HypoGen. To account for steric effects, the number of excluded volumes was set to 2 [5].
[Figure 1 shows the 2D chemical structures of the 24 sialic acid derivatives in the training set; the drawings are not recoverable from the text extraction. The compound numbers, activities (IC50) and references are: 1 (1 nM) [12]; 2 (1 nM) [12]; 3 (3.019 nM) [12]; 4 (5.011 nM) [13]; 5 (10 nM) [13]; 6 (10.715 nM) [14]; 7 (25.118 nM) [13]; 8 (60.256 nM) [13]; 9 (87.096 nM) [14]; 10 (100 nM) [12]; 11 (128 nM) [13]; 12 (223.87 nM) [13]; 13 (251.188 nM) [12]; 14 (1288.0 nM) [15]; 15 (1584.89 nM) [15]; 16 (1995.2 nM) [15]; 17 (2089.30 nM) [15]; 18 (2187.76 nM) [13]; 19 (11749.0 nM) [12]; 20 (19054.0 nM) [15]; 21 (18848.0 nM) [15]; 22 (45708 nM) [15]; 23 (20893.0 nM) [15]; 24 (128825 nM) [15].]

Figure 1. Sialic acid derivative structures, activities (IC50) and reference data for the training set.

The 24 compounds selected (Fig. 1) were used as the training set in the HypoGen run in the 3D-QSAR Pharmacophore protocol of the DS 2.1 package. Ten hypotheses were produced, and their statistical parameters are given in Table 1. The best hypothesis (Hypo1), shown in Fig. 3(i), is characterized by the lowest total cost value (92.055), the highest cost difference (84.395), the lowest RMSD (1.197), and the best correlation coefficient (0.944651). The fixed cost and null cost are 97.2168 and 197.498 bits, respectively. Hypo1 contains five features: one hydrogen-bond donor (D), two hydrophobic aliphatic moieties (Hy), one negative ionizable (N), and one positive ionizable (P). Two excluded volumes are also included in Hypo1. The 3D spatial and distance constraints of these pharmacophore features are shown in Fig. 3(ii).

Figure 2. HipHop pharmacophore model for NA inhibitors. (i) The HipHop pharmacophore model. (ii)
The HipHop model mapped with the most active compound 1 in the training set.
Pharmacophore features are color coded; magenta: hydrogen-bond donor (D), blue
hydrophobic feature (Hy), dark blue negative ionizable(N), red positive ionizable
(P), and grey excluded volume.
B00017
ANSCSE14 Mae Fah Luang University, Chiang Rai, Thailand
March 23-26, 2010
155
Table 1. Statistical parameters of the top 10 hypotheses of Neuraminidase inhibitors generated by
HypoRefine program
Hypo no.   Total cost   Cost diff.¹   RMSD (Å)   Correlation (r)   Features²
1. 113.103 84.395 1.03907 0.944357 DHyHyNPEE
2. 114.131 83.367 0.98385 0.952229 ADHyPEE
3. 116.433 81.065 1.10535 0.937879 ADHyHyPE
4. 120.136 77.362 1.31857 0.907315 ADHyHyPEE
5. 121.363 76.135 1.29581 0.912158 DHyNPE
6. 121.742 75.756 1.42300 0.889973 ADHyHyPEE
7. 122.472 75.026 1.44230 0.886802 ADHyHyPEE
8. 123.657 73.841 1.41536 0.892491 DHyHyNPE
9. 123.752 73.746 1.32968 0.908464 ADHyPE
10. 123.917 73.581 1.32585 0.909342 DHyNPEE
¹ (Null cost − total cost); null cost = 197.498, fixed cost = 97.2168, configuration cost = 15.3653.
² A, D, Hy, N, P, and E represent hydrogen-bond acceptor, hydrogen-bond donor, hydrophobic feature, negative ionizable, positive ionizable, and excluded volume, respectively.

Figure 3. Best pharmacophore with validation by the HypoRefine run in DS 2.1. (i) The best HypoRefine pharmacophore model, Hypo1. (ii) 3D spatial relationships and geometric parameters of Hypo1. (iii) Hypo1 aligned with the most active compound, 1 (IC50: 1 nM). (iv) Hypo1 aligned with the least active compound, 24 (IC50: 128825 nM). Pharmacophore features are colour-coded; magenta: hydrogen-bond donor (D), blue: hydrophobic feature (Hy), dark blue: negative ionizable (N), red: positive ionizable (P), and grey: excluded volume.

Validation of Pharmacophore Model
1. Fischer randomization test
Fischer randomization is provided in the DS 2.1 package to evaluate the best pharmacophore models. The confidence level was set to 95%, producing a total of 19 random spreadsheets (Figure 4). From the figure, we can see that the correlations (r²) of all pharmacophore models generated using the 19 random spreadsheets are much lower than the correlation of the corresponding original pharmacophore (blue line), which has an r² value greater than 0.9. These results provide confidence in our pharmacophore model.



Figure 4. The difference correlation of hypotheses between the initial spreadsheet and 19
random spreadsheets after CatScramble run.
2. Test set
Hypo1 was applied to the 96 test set compounds (www.bindingDB.org), giving a correlation coefficient of 0.84 between experimental and estimated activities, as shown in Fig. 5.

[Figure: scatter plot of predicted activity (pIC50) versus experimental activity (pIC50); training set, 24 compounds (r = 0.94); test set, 96 compounds (r = 0.84).]


Figure 5. Plot of the correlation (r) between the experimental activity and the activity predicted by Hypo1 for the test set molecules (red) and the training set molecules (blue).

Figure 5 shows that there is a significant correlation between the experimental and estimated values: the correlation value for the test set is 0.84, which means that 84% of the total variation in the test set can be explained by the linear relationship between experimental and predicted activity. This observation further demonstrates that our pharmacophore model is a good model.

Virtual Screening of the NADI database based on the Feature Pharmacophore of the Training Set
In this study, 45 compounds from the NADI database were captured using four features, with the maximum omitted features set to 1 (one missing feature allowed). No NADI compounds showed high affinity for all five features (D, Hy, N and P) when the maximum omitted features was set to zero (no missing feature).
All compounds captured by the feature pharmacophore model have carboxylic acid and hydroxyl groups. The carboxylic groups of the NADI compounds mapped onto the positive ionizable feature, as shown by the sialic acid derivatives. The hydroxyl groups mapped satisfactorily onto the H-bond donor feature. The carbonyl or enol groups, and the aromatic moieties and alkyl chains, mapped onto the hydrogen-bond acceptor and hydrophobic features, respectively.

Docking Study
All 45 compounds that mapped onto the pharmacophore model were docked into the neuraminidase enzyme (PDB code: 1F8B) using AutoDock 3.0.5. The docking orientations of the NADI compounds were compared to oseltamivir (PDB code: 2HUO) [10].
In the molecular docking study, MSC605 showed the most negative free energy of binding (-12.10 kcal/mol). In Figure 5, the -COOH groups of oseltamivir (grey carbon) and MSC605 (black carbon) give similar charge and H-bond interactions with Arg371, Arg118 and Arg292; the N-acetyl group of oseltamivir and OH-C24 of CGA interact with the Asp151 and Glu227 residues, while MSC605 does not interact with Glu119 and Asp151 as the NH3+ group of oseltamivir (OSTM) does. The isopentyl group of oseltamivir and the aromatic moiety of MSC605 give strong hydrophobic interactions with Glu276, Glu277, Ala246 and Arg224.


Figure 5. Conformation of MSC605 (black carbon) and the OSTM X-ray structure (pink carbon) bound to N1; surface visualized with DS 2.1.

In addition, the binding interactions of MSC927 with N1 (1F8B) [9] were found to be similar to those of the DANA X-ray structure, although its fit value (mapping) and docking free energy are poorer than those of oseltamivir (2HUO) [10] and the DANA X-ray structure (1F8B) [9]. MSC927 also maps similarly to OSTM, but MSC927 does not have a positive ionizable feature. Figure 6 shows the CGA and OSTM interactions at N1.

(i) (ii)
Figure 6. (i) Conformation of MSC927 (green) and the OSTM X-ray structure (orange) bound to N1; surface visualized with DS 2.1. Non-polar hydrogen atoms of the ligands are not shown, for clarity. (ii) Pocket area of the active site of neuraminidase A (green: NI, red: strong hydrophobic, purple: PI, blue: weak hydrophobic), visualized with VMD 1.8.5 on Linux.



4. CONCLUSION
The best quantitative pharmacophore model, Hypo1, showed the lowest total cost value (92.055), the highest cost difference (107.807), the lowest RMSD (0.966317), and the best correlation coefficient (0.941732) compared to the other models. Hypo1 contains four features: one hydrogen-bond donor (D), one hydrophobic aliphatic moiety (Hy), one negative ionizable (N), and one positive ionizable (P). Of the 3000 compounds in the NADI database, 45 showed high predicted affinity when matched to Hypo1's HBD, Hy, NI and PI features, with the minimum predicted activity optimized to 1 mM. Docking results further showed that MSC605 and MSC927 possess the best binding interactions with the active site of neuraminidase A (1F8B).

REFERENCES
1. Moscona, A., Oseltamivir resistance--disabling our influenza defenses. N Engl J
Med, 2005. 353(25): p. 2633-6.
2. Colman, P.M., Influenza virus neuraminidase: structure, antibodies, and
inhibitors. Protein Sci, 1994. 3(10): p. 1687-96.
3. Wahab, H.A., et al., Nature based drug discovery (NADI) & its application to novel neuraminidase inhibitors identification by virtual screening, pharmacophore modelling and mapping of Malaysian medicinal plants, in Drug Design and Discovery for Developing Countries, E. Megnassan, L. Owono Owono, and S. Miertus, Editors. 2009, ICS-UNIDO: Trieste, Italy.
4. Brooks, B.R., et al., CHARMM: A Program for Macromolecular Energy,
Minimization, and Dynamics Calculations. J Comput Chem, 1983. 4(2): p. 187-
217.
5. Wang, H.Y., et al., Pharmacophore modeling and virtual screening for designing
potential PLK1 inhibitors. Bioorg Med Chem Lett, 2008. 18(18): p. 4972-7.
6. Deng, X.Q., et al., Pharmacophore modelling and virtual screening for
identification of new Aurora-A kinase inhibitors. Chem Biol Drug Des, 2008.
71(6): p. 533-9.
7. Abu Hammad, A.M. and M.O. Taha, Pharmacophore modeling, quantitative
structure-activity relationship analysis, and shape-complemented in silico
screening allow access to novel influenza neuraminidase inhibitors. J Chem Inf
Model, 2009. 49(4): p. 978-96.
8. Adane, L., D.S. Patel, and P.V. Bharatam, Shape- and chemical feature-based
3D-pharmacophore model generation and virtual screening: identification of
potential leads for P. falciparum DHFR enzyme inhibition. Chem Biol Drug Des,
2010. 75(1): p. 115-26.
9. Smith, B.J., et al., Analysis of inhibitor binding in influenza virus neuraminidase.
Protein Sci, 2001. 10(4): p. 689-96.
10. Russell, R.J., et al., The structure of H5N1 avian influenza neuraminidase
suggests new opportunities for drug design. Nature, 2006. 443(7107): p. 45-9.
11. Morris, G.M., R. Huey, and A.J. Olson, Using AutoDock for ligand-receptor
docking. Curr Protoc Bioinformatics, 2008. Chapter 8: p. Unit 8 14.
12. Kim, C.U., et al., Structure-activity relationship studies of novel carbocyclic
influenza neuraminidase inhibitors. J Med Chem, 1998. 41(14): p. 2451-60.
13. Williams, M., et al., Structure-activity relationships of carbocyclic influenza
neuraminidase inhibitors Bioorganic & Medicinal Chemistry Letters, 1997. 7(14):
p. 1837
14. Yi, X., Z. Guo, and F.M. Chu, Study on molecular mechanism and 3D-QSAR of
influenza neuraminidase inhibitors. Bioorg Med Chem, 2003. 11(7): p. 1465-74.
15. Wang, G.T., et al., Design, synthesis, and structural analysis of inhibitors of
influenza neuraminidase containing a 2,3-disubstituted tetrahydrofuran-5-
carboxylic acid core. Bioorg Med Chem Lett, 2005. 15(1): p. 125-8.



B00019
Electronic and Mechanical Properties on B-N Doped
Single-wall Carbon Nanotubes

Arthit Vongachariya¹, Vudhichai Parasuk¹,C, Thiti Bovornratanaraks²

¹ Department of Chemistry, Faculty of Science, Chulalongkorn University, Bangkok 10330, Thailand
² Department of Physics, Faculty of Science, Chulalongkorn University, Bangkok 10330, Thailand
C E-mail: Vudhichai.P@Chula.ac.th; Fax: 02-2187603; Tel. 02-2187602



ABSTRACT
Electronic structures of undoped and B-N doped zigzag (8,0) and armchair (5,5) single-walled carbon nanotubes (SWCNTs) were investigated using density functional calculations. The generalized gradient approximation (GGA) with the Perdew, Burke and Ernzerhof (PBE) functional and the double numerical plus polarization (DNP) basis set were employed. Various electron spin states and B-N doping positions were considered. The calculations revealed that the energy band gap of B-N doped zigzag SWCNTs depends only on the B-N XY doping positions, while for armchair SWCNTs it depends on both the XY and Z B-N doping positions. The effects of pressure on B-N doped zigzag (8,0) and armchair (5,5) SWCNTs were also studied, with applied pressures corresponding to strain ratios between -0.1 and 0.1. The energy band gaps of B-N doped zigzag and armchair SWCNTs vary significantly with the strain ratio.

Keywords: DFT, condensed matter, carbon nanotube



B00020
Mutation of Hemagglutinin H5 can change recognition of human sialic acid-α2,6-galactose using in silico techniques

Nopporn Kaiyawet¹, Thanyada Rungrotmongkol¹, Mathuros Malaisree¹, Panita Decha¹, Pornthep Sompornpisut¹, Supot Hannongbua¹,C

¹ Department of Chemistry, Faculty of Science, Chulalongkorn University, Bangkok, Thailand 10330
C E-mail: supot.h@chula.ac.th, Tel: +66-2-218-7602




ABSTRACT
Influenza virus hemagglutinin is an essential protein for binding to the host cell receptor. The rarity of human infection with avian influenza virus subtype H5N1 is thought to be associated with poor binding of avian viral H5 to the human sialic acid-α2,6-galactose receptor (SAα2,6Gal). It has been found that mutations at positions 222 and 224 enhance the specificity of H5N1 viruses for recognizing SAα2,6Gal. In this study, we employed molecular modeling and simulation techniques to investigate the binding and recognition roles of these amino acids towards SAα2,6Gal. Binding energies of the complexes between the SAα2,6Gal receptor and the mutated H5 proteins were computed and compared. The simulated results should be useful for predicting the critical amino acid residues that could lead to efficient transmission of the H5N1 subtype from avian to human hosts.

Keywords: Hemagglutinin, H5N1, recognition, molecular dynamics simulation



REFERENCES
1. J. Stevens, O. Blixt, J. C. Paulson, I. A. Wilson, Nat. Rev. Microbiol., 2006, 4, 857-64.
2. R. J. Connor, Y. Kawaoka, R. G. Webster, J. C. Paulson, Virology, 1994, 205, 17-23.
3. A. S. Lipatov, E. A. Govorkova, R. J. Webby, H. Ozaki, M. Peiris, Y. Guan, L. Poon, R.
G. Webster, J. Virol., 2004, 78, 8951-59.
4. H. Beigel, J. Farrar, A. M. Han, F. G. Hayden, R. Hyer, M. D. de Jong, S. Lochindarat,
N. T. K. Tien, N. T. Hien, T. T. Hien, A. Nicoll, S. Touch, K. Yuen, N. Engl. J. Med.,
2005, 353, 1374-85.




B00021
A Comparative Study of Structural and Binding Affinity of
Pyrrolidinyl PNA and DNA Using MD Simulations

A. Meeprasert¹, N. Kaiyawet¹, T. Rungrotmongkol¹, P. Sompornpisut¹ and S. Hannongbua¹,C

¹ Computational Chemistry Unit Cell, Department of Chemistry, Faculty of Science, Chulalongkorn University, Bangkok, 10330, Thailand
C E-mail: supot.h@chula.ac.th; Fax: +662-2187603; Tel. +662-2187602



ABSTRACT
Pyrrolidinyl peptides nucleic acid (PNA) is synthetic deoxyribonucleic acid (DNA)
mimic in which the nucleobases are attached to pyrrolidinyl 2S-amino-cyclopentane-
1S-carboxylic acid backbone. The neutral charge of PNA backbone has an effect on
binding efficiency of the duplex that permits PNA to recognize its complementary base
pairs in DNA with strong and specific binding. The molecular dynamics simulations
have been applied for DNADNA and DNAPNA duplex both parallel and anti-parallel
fashion to study the structural properties and binding efficiency. To analysis 1500
snapshots taken from the last 1.5 ns were created. The binding free energy of -40.36
kcal/mol obtained from DNAPNA duplex in anti-parallel form demonstrates higher
stability than that parallel form and DNADNA hybrid. Interestingly, substituting DNA
backbone with PNAs one was increased the stability by reduced the electrostatic
repulsion between two negative charges on DNA backbone. It can be concluded that
the electrostatic interactions play a crucial role in stability of the duplex.

Keywords: Peptide nucleic acid, Deoxyribonucleic acid and MD simulations.



REFERENCES
1. Ray, A. and Norden, B., Faseb J, 2000, 14, 1041-1060.
2. Sen, S. and Nilsson, L., J. Am. Chem. Soc., 1998, 120, 619-631.
3. Siriwong, K., Chuichay, P., Saen-Oon, S., Suparpprom, C., Vilaivan, T. and Hannongbua,
S., Biochem. Biophys. Res. Commun., 2008, 372, 765-771.
4. Vilaivan, T. and Srisuwannaket, C., Org. Lett., 2006, 8, 1897-1900.

B00022
Theoretical Study of Organic Molecules Used in Dye-Sensitized Solar Cells (DSSC) Based on Time-Dependent Density Functional Theory (TD-DFT)

T. Yakhantip¹, S. Jungsuttiwong², and N. Kungwan¹,C

¹ Department of Chemistry, Faculty of Science, Chiang Mai University, Chiang Mai, 50200, Thailand
² Department of Chemistry, Faculty of Science, Ubon Rajathanee University, Ubon Ratchathani, 34190, Thailand
C E-mail: naweekung@hotmail.com; Fax: 053-892277; Tel. 053-943341 ext. 101



ABSTRACT
Five oligomeric dyes in the UB series used in dye-sensitized solar cell (DSSC) applications were theoretically studied using density functional theory (DFT). The ground-state structures were fully optimized using DFT/B3LYP with the 6-31G(d,p) basis set. The lowest excitation energies and absorption wavelengths were obtained using time-dependent DFT (TD-DFT) with the same basis set on the optimized structures. The results show that the HOMO-LUMO gaps of UB09, UB10, UB11, UB12 and UB13 are 2.56, 2.37, 2.26, 2.48 and 2.37 eV, respectively. These values will be compared with experimental data.

Keywords: TD-DFT, HOMO-LUMO gaps, DSSC, Organic dye.



REFERENCES
1. Brian, O. R., Michael, G., Nature, 1991, 353, 737-740.
2. Michael, G., J. Photoch. Photobio. A, 2004, 164, 3-14.
3. Hitoshi, K., Hironori, A., Hideki, S., J. Photoch. Photobio. A, 2005, 171, 197-204.
4. Sanghoon, K., Hyunbong, C., Duckhyun, K., Kihyung, S. Sang, O. K., Jaejung, K.,
Tetrahedral, 2007, 63, 9206-9212.
5. Osamu, K., Hideki, S., Inorg. Chim. Acta, 2008, 361, 712-728.


B00027
Classification of Thai Fragrant Rice (Oryza sativa) Using
Gas Chromatographic Profiles in Conjunction with
Statistical Methods

K. Prasittichok¹, S. Prasitwattanaseree¹,²,C and S. Wongpornchai³

¹ Bioinformatics Research Laboratory (BiRL), Faculty of Science, Chiang Mai University, Chiang Mai, Thailand 50200
² Department of Statistics, Faculty of Science, Chiang Mai University, Chiang Mai, Thailand 50200
³ Rice Chemistry Research Laboratory, Department of Chemistry, Faculty of Science, Chiang Mai University, Chiang Mai, Thailand 50200
C E-mail: sprasitwattanaseree@gmail.com; Tel. 053-943381


ABSTRACT
The purpose of this research was to classify rice varieties based on chemical profiles obtained using the headspace gas chromatography (HS-GC) technique combined with statistical methods. Three Thai fragrant milled rice varieties, Khao Dawk Mali 105 (KDML 105), RD 15 and Pathum thani, were selected for this research. All samples were obtained from the Pathum thani Rice Research Center. Eighteen samples of each variety were randomly collected at predefined intervals. Forty-one peaks were identified from the gas chromatographic profiles, with peak areas corrected using an internal standard. Seventy-two profiles were analyzed by pattern recognition with principal component analysis (PCA), stepwise linear discriminant analysis (SLDA) and multinomial logistic regression (MLR). The results showed that the profile patterns of Pathum thani rice were clearly different from those of KDML 105 and RD 15 rice, whereas the profile patterns of KDML 105 and RD 15 were not completely distinguished. The correct classification rates for Thai fragrant rice using SLDA and MLR were 94.4 and 100.0%, respectively.

Keywords: Rice, Classification, Gas chromatographic profiles, Statistical methods.



1. INTRODUCTION
Thailand has long had an agriculture-based economy, with rice the most important food product in Thai life and the main calorie source for many Thais [1]. With more than 40,000 varieties in the world, rice is known to have many different characteristics, such as taste, smell, grain morphology, colour and nutrition. Economically, rice prices are based on the properties of specific rice varieties; in particular, fragrant rice such as Thai Hom Mali and Basmati is more expensive than non-scented rice and is in high demand from consumers who are more interested in rice quality than in price [1, 2]. Thai Hom Mali is a favourite high-quality rice variety in Europe, Middle Asia, Hong Kong, Singapore and the United States. The high price of Hom Mali rice carries with it a high expectation of quality, which can cause problems when the rice is contaminated, leading to decreased quality and reduced consumer confidence. To ensure that the highest-quality rice is provided, methods are needed to classify related rice varieties. There are many methodologies used to classify rice, including basic phenotypic techniques (grain characteristics such as colour and length-to-width ratio), amylose content, chemical methods, and DNA fingerprinting [3, 4]. However, those techniques are hampered by limited measurement accuracy, the need for experts, and the very high expense of inspection. A chemical method could therefore be a useful common approach for extracting chemical profiles to identify or classify rice varieties [5].
B00027
ANSCSE14 Mae Fah Luang University, Chiang Rai, Thailand
March 23-26, 2010
164
In this study, we propose to determine the feasibility of using the headspace gas chromatography (HS-GC) technique in conjunction with statistical methods to classify Thai fragrant rice varieties, because this technique can measure the aromatic compound 2-acetyl-1-pyrroline [6] and other chemical profiles, is cheaper than other techniques (DNA fingerprinting, liquid chromatography, solid-phase chromatography), produces no toxic waste, and can be applied to study other issues (environment, time) [7-9].


2. MATERIALS AND METHODS
2.1 Rice samples
We selected three milled Thai fragrant rice varieties: Khao Dawk Mali 105 (KDML 105), RD 15 and Pathum thani. The rice samples were obtained from the Pathum thani Rice Research Center in August 2009. All samples were chemically profiled using the headspace gas chromatography (HS-GC) technique and were then randomly tested every two weeks up to 12 weeks. The samples were preserved in polyethylene (PE) bags in a black box at room temperature.
2.2 HS-GC conditions
An Agilent 6890 equipped with an Agilent G1888 headspace sampler and an FID (Agilent Technologies, Palo Alto, CA) was used to measure chemical profiles in the rice seed extracts. A fused silica capillary column (HP-5, 5% phenyl methyl siloxane, 60.0 m × 0.32 mm i.d., 1.0 μm film thickness) was used, with the oven programme starting at 45 °C. The temperature was ramped to 240 °C at a rate of 3 °C min⁻¹, resulting in an overall separation time of 70 min. The injector temperature was set at 150 °C and operated in splitless mode. Purified helium was used as the GC carrier gas at a flow rate of 3.0 ml min⁻¹.


2.3 Chemical profile data
The chemical profile data in this study comprise rice chemical component profiles obtained using the HS-GC technique (Fig. 1). This work was performed in the Rice Chemistry Research Laboratory, Department of Chemistry, Faculty of Science, Chiang Mai University. For the datasets, consisting of seventy-two samples, the area under the peak of each component was obtained for the many chemical components at each retention time (t1, t2, t3, ..., t41). From this, the chemical profile data in the first dataset were normalized using an internal standard, yielding the second dataset. A third dataset was then produced by normalizing against the cumulative total of the forty-one chemical profiles.


Figure 1. The HS-GC profiles of extracts of the three Thai fragrant rice varieties.


B00027
ANSCSE14 Mae Fah Luang University, Chiang Rai, Thailand
March 23-26, 2010
165
2.4 Statistical methods
2.4.1 Principal component analysis (PCA)
The PCA method is widely used in chemometrics to address a variety of problems [10-12]. Most commonly, PCA is used to extract factors that form uncorrelated linear combinations of the observed variables, where the first component accounts for the maximum variance. Successive components explain progressively smaller portions of the variance and are all uncorrelated with each other. The PCA used to obtain the initial factor solution can be used even when a correlation matrix is singular.
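
A minimal sketch of this factor-extraction step, assuming a hypothetical samples-by-peaks matrix `X` (here 72 × 41) and the eigenvalue-greater-than-one retention criterion used in Section 3.1:

```python
import numpy as np

def pca_scores(X):
    """PCA via eigendecomposition of the correlation matrix."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)    # autoscale the peaks
    corr = np.corrcoef(Z, rowvar=False)                 # 41 x 41 correlation matrix
    evals, evecs = np.linalg.eigh(corr)
    order = np.argsort(evals)[::-1]                     # sort by decreasing variance
    evals, evecs = evals[order], evecs[:, order]
    keep = evals > 1.0                                  # eigenvalue-> 1 criterion
    scores = Z @ evecs[:, keep]                         # PC scores per sample
    explained = evals[keep] / evals.sum()               # fraction of variance per PC
    return scores, explained

X = np.random.default_rng(3).normal(size=(72, 41))      # stand-in for the peak areas
scores, explained = pca_scores(X)
```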
2.4.2 Linear discriminant analysis (LDA)
The LDA method is the most common statistical method used for classification by discriminating varieties or classes [13, 14]. It builds a predictive model for group membership. The model is composed of a discriminant function for two groups (or m − 1 functions for m groups) based on linear combinations of the predictor variables that provide the best discrimination between the groups. The functions are generated from a sample of cases for which group membership is known, and they can then be applied to new cases that have measurements for the predictor variables but unknown group membership. The LDA model constructs a set of linear functions over the predictors, known as discriminant functions, of the form

$$Y_j = b_{0j} + b_{1j}t_1 + b_{2j}t_2 + \cdots + b_{pj}t_p$$

where the $t_i$ are the scalar input variables, the $b_{ij}$ are the discriminant coefficients, and $Y_j$ is the predicted group or class score. LDA minimizes the variance within each group and maximizes the variance between the groups.
Stepwise linear discriminant analysis (SLDA) [11, 14] is a variable selection method that chooses variables for entry into the equation on the basis of how much they reduce Wilks' lambda. At each step, the variable that minimizes the overall Wilks' lambda is added.
2.4.3 Multinomial logistic regression (MLR)
The MLR method is useful for situations in which one wants to classify subjects into categories based on the values of a set of predictor variables [15, 16]. Statisticians have shown that logistic regression (LR) is comparable with the DA method. The MLR method can be used to classify systems without the multivariate normal distribution assumption implicit in DA models. MLR can be fitted with a full factorial or a user-specified model, and parameter estimation is performed through a maximum likelihood algorithm.
In this study, we used these statistical methods to classify Thai fragrant rice varieties: the PC data with LDA, LDA, SLDA and MLR. In all, there were three datasets obtained from the experiment, including the normalized data.
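
The classifiers compared in this study can be sketched with scikit-learn stand-ins as follows (the original analysis used different software, and the stepwise variable selection of SLDA is not reproduced here); `X` and `y` are hypothetical names for the 72 × 41 peak-area matrix and the variety labels.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-ins for the real data: 72 samples x 41 corrected peak areas.
X = np.random.default_rng(2).normal(size=(72, 41))
y = np.repeat(["KDML 105", "RD 15", "Pathum thani"], 24)

pcs_lda = make_pipeline(StandardScaler(), PCA(n_components=10),
                        LinearDiscriminantAnalysis())
lda = make_pipeline(StandardScaler(), LinearDiscriminantAnalysis())
mlr = make_pipeline(StandardScaler(),
                    LogisticRegression(multi_class="multinomial", max_iter=1000))

for name, model in [("PCs+LDA", pcs_lda), ("LDA", lda), ("MLR", mlr)]:
    # resubstitution accuracy, as reported in Table 1
    print(name, model.fit(X, y).score(X, y))
```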

3. RESULTS AND DISCUSSION
Table 1. Results of the Thai rice variety classification using the statistical methods.

Dataset                                             PCs data with LDA (%)   SLDA (%)   LDA (%)   MLR (%)
1. Chemical profiles                                        88.9              94.4       98.6     100.0
2. Normalized by the internal standard                      80.6              98.6      100.0     100.0
3. Normalized by the cumulative from all profiles           84.7              93.1       98.6     100.0

In the present case, these statistical methods classify the varieties based on the maximum membership probability of each sample. All models in this study classified the profile pattern of Pathum thani rice as clearly different from KDML 105 and RD 15. However, the profile patterns of KDML 105 and RD 15 were not completely distinguishable: for these two varieties, the membership probabilities in the misclassified cases were close to, or greater than, the membership probability of the true class. Nevertheless, the MLR method could completely classify the varieties in each dataset. The classification function in each model gave a score used to plot each model, to explain and show the distribution of each variety in each dataset.
RD 15 was developed from the KDML 105 variety to improve disease resistance. KDML 105 and RD 15 are closely related and share similar profiles for factors including low amylose content and slender grains with low gelatinization temperature and soft gel consistency [4]. Hence, food markets have accepted both varieties as similar, as Khao Hom Mali or jasmine rice.

3.1 PCA result
The three datasets contained 41, 40 and 41 variables, from which 10, 12 and 13 PCs were
retained, respectively. The number of PCs was chosen using the eigenvalue-greater-than-one
criterion. The first ten PCs of the chemical profiles in the first dataset explained 82.20% of
the total variance; for the second and third datasets, the first twelve and thirteen PCs
explained 81.66% and 82.71%, respectively. The first two PCs of each dataset explained
46.88, 33.09 and 32.14% of the variance, respectively, and were plotted and labeled by
variety in different colours (fig. 2). The varieties in each dataset could be distinguished from
the first two PCs. Therefore, we used the PC scores from the three datasets with LDA.
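The retention rule described above can be sketched as follows (hypothetical data and sizes; not the authors' script). Standardizing the variables first makes the eigenvalue-greater-than-one criterion meaningful:

```python
# PCA with the eigenvalue-greater-than-one retention rule (hypothetical data).
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(72, 41))                 # samples x profile variables (assumed)
Xs = StandardScaler().fit_transform(X)
pca = PCA().fit(Xs)
keep = pca.explained_variance_ > 1.0          # retain PCs with eigenvalue > 1
pc_scores = pca.transform(Xs)[:, keep]        # PC scores then passed to LDA
print(keep.sum(), pca.explained_variance_ratio_[keep].sum())
```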


Figure 2. The first two PCs of each dataset, plotted and labeled by variety; blue, green and
red circles are KDML 105, RD 15 and Pathum thani, respectively.

3.2 LDA result
3.2.1 PCs data with LDA
The PC scores of each dataset were used with LDA to classify the rice varieties. The two
linear discriminant (LD) scores from the LD functions were plotted to compare the rice
varieties (fig. 3). The accuracy of this method was 88.9, 80.6 and 84.7% for the respective
datasets (table 1).


Figure 3. LD scores from the model functions for each PCs dataset, plotted and labeled by
variety; blue, green and red circles are KDML 105, RD 15 and Pathum thani. The square box
marks the group centroid.
3.2.2 LDA
The 41, 40 and 41 chemical profiles of the three datasets were used directly with LDA to
classify the varieties. The two LD scores from the LD functions are plotted in fig. 4. The LD
scores of the Pathum thani profile pattern were clearly different from those of the other two
varieties, but KDML 105 was not clearly separated from RD 15. Nevertheless, the
classification accuracies of LDA (98.6, 100.0 and 98.6%, respectively) were better than
those of the PCs data with LDA (88.9, 80.6 and 84.7%, respectively) (table 1).


Figure 4. LD scores from the model functions of each dataset, plotted and labeled by
variety; blue, green and red circles are KDML 105, RD 15 and Pathum thani. The square box
marks the group centroid.
3.2.3 SLDA
SLDA was applied to the three datasets to classify the varieties. The variable selection
retained 11, 12 and 9 chemical profiles from the respective datasets. The two stepwise linear
discriminant (SLD) scores from the SLD functions of each sample were plotted to compare
the varieties (fig. 5). The SLDA accuracies (94.4, 98.6 and 93.1%, respectively) were better
than those of the PCs data with LDA (table 1). The accuracy of the SLD models was close to
that of the LD models, but the SLD models used fewer variables than LDA to classify the
varieties on the same three datasets.


Figure 5. SLD scores from the model functions of each dataset, plotted and labeled by
variety; blue, green and red circles are KDML 105, RD 15 and Pathum thani. The square box
marks the group centroid.
3.2.4 MLR result
The three datasets were also classified using the MLR method, which gave 100.0% correct
classification for every dataset in this study (table 1). The LR functions of each dataset gave
LR scores, which were plotted (fig. 6); the varieties are clearly separated in each dataset.



Figure 6. LR scores from the model functions of each dataset, plotted and labeled by
variety; blue, green and red circles are KDML 105, RD 15 and Pathum thani.


4. CONCLUSION
The three Thai fragrant rice varieties, KDML 105, RD 15 and Pathum thani, were
successfully classified based on chemical composition profiles obtained using the headspace
gas chromatography (HS-GC) technique in conjunction with statistical methods. From the
results of this work, Thai rice classification using the LDA and MLR methods can provide
a high accuracy percentage. However, the assumption of discriminant analysis that the data
are multivariate normally distributed was not satisfied, so the DA-based methods seem
inappropriate for rice variety classification. Nevertheless, even with this limitation, the
MLR method still provided reasonable results.
In this study, the results show that the statistical methods can be used to classify the
fragrant rice varieties. However, the classification accuracies depend on many factors,
specifically the rice samples, the parameters of the chemical methods, the parameters of the
statistical models and the number of variables used in the classification.




REFERENCES
1. Laksanalamai, V., Doctoral Dissertation, No. AE-93-3, Asian Institute of Technology,
Bangkok, 1993.
2. Petrov, C.M., Danzart, M., Giampaoli, P., Faure, J. and Richard, H., Sci. Aliments, 1996,
16(4), 347-360.
3. Dallas, J.F., Proc. Natl. Acad. Sci. USA, 1998.
4. Dziezak, J.D., Food Technol., 1991, 45(5), 74-80.
5. Jaisieng, N., Wongpornchai, S., and Prasitwattanaseree, S., Graduate School, Chiang Mai
University, 2008.
6. Chichester, C.O., Advances in Food Research, Academic Press, Boston, 1986.
7. Buttery, R.G., Juliano, B.O., and Ling, L.C., Chem. Ind. (London), 1983, 478.
8. Buttery, R.G., Juliano, B.O., Ling, L.C., and Turnbaugh, J.G., Journal of Agricultural and
Food Chemistry, 1983, 31, 823-826.
9. Wongpornchai, S., Dumri, K., Jongkaewwattana, S., Siri, B., Food Chem., 2004, 87,
407-414.
10. Dirinck, P., and De Winne, A., Journal of Chromatography A, 1999, 847, 203-208.
11. Javier, M.O.U., Bart, N., Jeroen, L., Maria, C.B., Antonio, J.M. and Pablo, J.F.T.,
Postharvest Biology and Technology, 2009, 54, 146-155.
12. Marengo, E., Aceto, M., and Maurino, V., Journal of Chromatography A, 2001, 943,
123-137.
13. Yolanda, G.M., Jose, L.P.P., Bernardo, M.C. and Carmelo, G.P., Analytica Chimica Acta,
1999, 384, 83-94.
14. Thomas, W.O. and Robert, F., The American Statistician, 1991, 45(3), 187-193.
15. Bradley, E., Journal of the American Statistical Association, 1975, 70(352), 892-898.
16. Shelley, B.B., and Allan, D., Journal of the American Statistical Association, 1987,
82(400), 1118-1122.



ACKNOWLEDGMENTS
The author would like to thank M. Yangthavorn and T. Sriseadka, who performed the
experimental work in this research; the Rice Chemistry Research Laboratory, Department of
Chemistry, Faculty of Science, Chiang Mai University, where the experimental data were
obtained; the Pathum thani Rice Research Center for supplying the rice samples; and the
Research Professional Development Project under the Science Achievement Scholarship of
Thailand (SAST) for supporting the scholarship.

B00030
The effect of electron-donating groups on the conducting
property of polythiophene derivatives using PBC calculation

T. S. Krasienapibal^1, P. Itngom^1, S. Ekgasit^1, V. Ruangpornvisuti^1 and V. Vchirawongkwin^1,*

^1 Department of Chemistry, Faculty of Science, Chulalongkorn University, Phayathai Road, Patumwan,
Bangkok 10330, Thailand

E-mail: v_viwat@hotmail.com; Fax: 0-2218-7598; Tel. 0-2218-7633



ABSTRACT
The structural and electronic properties of the local minima of the thiophene oligomer,
the thiophene polymer and the 3- and 3,4-substituted derivatives were evaluated using
periodic boundary conditions (PBC) to treat the infinite polymer systems. These
calculations employed the hybrid density functional of Becke, three-parameter,
Lee-Yang-Parr (B3LYP) with the 6-31G(d) basis set. Because the energy gap (Eg)
between the HOMO and LUMO levels is related to the conductivity of a material, it is
used to assess the effectiveness of conducting materials. The results indicated that
electron-donating groups decrease the band gap of the 3-substituted derivatives but
increase it in the 3,4-disubstituted ones with respect to unsubstituted polythiophene,
a consequence of the predominant steric effect. The calculated Eg values obtained from
the PBC method are in good agreement with the experimental data. Our results identify
poly(3-hydroxythiophene) and poly(3,4-ethylenedioxythiophene) as the best conducting
polymers of the 3-substituted and 3,4-disubstituted series, respectively.

Keywords: thiophene, thiophene derivatives, band gaps, PBC.



1. INTRODUCTION
In recent years, polymer-based materials have received much attention due to their photo-
physical properties [1]. They have been used in a new generation of electric and optical
devices, e.g., polymer-based light-emitting devices (LEDs) [1-3]. Intensive research has
been dedicated to exploring new conjugated polymers with narrow band gaps (Eg), an
essential property for high conductivity. A fundamental model of a conducting polymer is
trans-polyacetylene. Other well-known polyaromatic polymers are poly(para-
phenylenevinylene) (PPV), poly(thiophene) (PT) and poly(pyrrole) (PPy). Among them, PT
and its derivatives (PTs) have attracted wide interest in both experimental and theoretical
studies because of their chemical stability, high yield, good conductive properties and easy
processability [4-6]. Modification of conjugated polymers is one possible way to generate a
narrow band gap. Therefore, the molecular and electronic structures of PT and PTs have
been investigated in the search for effective conducting polymers.

According to previous work, the photophysical properties of 3-substituted polythiophenes
have been investigated using periodic boundary condition (PBC) calculations: poly(3-
methylthiophene) has an energy gap of 2.07 eV and an S-C-C-S dihedral angle of 169.25°,
and poly(3-hydroxythiophene) has 2.10 eV and 163.74° [7]. 3,4-Disubstituted polythiophenes
have also been studied with one hexyl group and various other groups, for example
poly(3-hexyl-4-(p-tolyl)thiophene) [7].
In this study, we investigated the effect of several substituents on the photophysical
properties of 3,4-disubstituted PTs. The symmetric 3,4-disubstituted PTs, which are less
complicated to study and easier to design an experimental procedure for, were selected as
the basic moiety of these materials. In order to investigate the band gaps of PT and the PTs,
the periodic boundary condition (PBC) was utilized to treat the infinite polymer systems;
thus, the PBC was employed to set up the infinite system from a periodic PT unit.



2. COMPUTATIONAL DETAILS
3-Methylthiophene (3MT), 3,4-dimethoxythiophene (DMT), 3,4-diethoxythiophene (DET),
3,4-ethylenedioxythiophene (EDOT), 3-hydroxythiophene (3HT) and 3,4-dihydroxythiophene
(DHT) were selected as the thiophene-derivative monomers with infinite extension. To
investigate the substituent effect in polythiophene, the 3- and 3,4-substituted thiophene
monomers were constructed with head-to-tail connection between adjacent thiophene rings.
The geometry of the thiophene and thiophene-derivative monomers, taken as unit cells, was
first optimized with extension along the polymer axis to infinite length using the PBC model.
The geometries and electronic structures of polythiophene were optimized with the HF,
LSDA, PBEPBE and B3LYP methods using the 3-21g, 6-31g and 6-31g(d) basis sets. The
results were then compared with the experimental data to identify a suitable method and
basis set, which were subsequently used for the polythiophene-derivative calculations with
the PBC-DFT model.

[Structure: polythiophene backbone, substituent R at the 3-position and R' at the 4-position of each ring]
R = -H, R' = -H: PTH
R = -CH3, R' = -H: P3MT
R = -OCH3, R' = -OCH3: PDMT
R = -OCH2CH3, R' = -OCH2CH3: PDET
R,R' = -OCH2CH2O-: PEDOT
R = -OH, R' = -H: OHPTH
R = -OH, R' = -OH: OH2PTH
Figure 1. Molecular structure of 3- and 3,4-substituted polythiophene.



3. RESULTS AND DISCUSSION
The optimized geometries and electronic structures of polythiophene were calculated with
the HF, LSDA, PBEPBE and B3LYP methods and various basis sets. The results were
compared with the experimental data (Eg = 2.20 eV [8]) to identify a suitable method and
basis set. B3LYP/6-31G(d) was selected for the calculation of the polythiophene derivatives
based on the PBC-DFT model.

Table 1. Energy gap (eV) of polythiophene calculated by the HF, LSDA,
PBEPBE and B3LYP methods with several basis sets.

Method    Basis set    Eg (eV)
HF        6-31g(d)     7.71
LSDA      6-31g(d)     1.04
PBEPBE    6-31g(d)     1.00
B3LYP     3-21g        1.88
B3LYP     6-31g        1.85
B3LYP     6-31g(d)     2.05

Conducting polymers possess a π-conjugated system, which the Hartree-Fock (HF) method,
a single-determinant method, is unable to describe; DFT is more suitable for this kind of
system, especially the B3LYP method, which gave a value close to experiment. According to
Table 1, the energy gap (Eg) calculated at the HF level was far too large, supporting the
assumption that HF cannot describe the delocalized electrons. The LSDA and PBE methods
underestimated the band gap of polythiophene, while B3LYP provided a value close to the
experimental data owing to its treatment of electron correlation. The 6-31g(d) basis set
combined with the B3LYP level gave the Eg value (2.05 eV) closest to the experimental
value (2.20 eV [8]). This can be explained by the sulfur atom of polythiophene, which should
be treated with polarization functions; hence, 6-31g(d) is a more suitable basis set than
3-21g and 6-31g.

Table 2. Torsion angle (degrees) and band gap (eV) of optimized polythiophene and its
derivatives.

Polymer   Torsion angle   Eg (eV)   Other works (eV)
PTH          179.97        2.05       2.20 [8]
P3MT         179.95        2.00       2.26 [10]
PDMT         180.00        2.08       -
PDET         180.00        2.08       -
PEDOT        179.00        1.83       1.5-1.6 [9]
OHPTH        179.96        1.89       2.25 [6]
OH2PTH       161.54        2.13       -


The torsion angle was defined as the angle between two adjacent thiophene rings, measured
as the S-C-C-S dihedral; it characterizes the molecular planarity. In previous work, P3HT
was computed to have an Eg of 2.10 eV and a torsion angle of 163.75°, and P3MT an Eg of
2.07 eV and a torsion angle of 169.25° [8]. Our results for these two polymers do not agree
with that work in either Eg or torsion angle, although the same method and basis set were
employed.
The calculated Eg values show that monosubstitution of polythiophene at the 3-position
gives a narrower Eg than polythiophene, while disubstitution at the 3,4-position gives a
larger Eg. For PEDOT, which is 3,4-substituted with an ethylenedioxy ring, the Eg is smaller
than that of any other disubstituted derivative. This may be caused by less steric hindrance
and a stronger donating effect, resulting in more delocalization of electrons and a better
π-conjugated system. According to Table 2, OHPTH and PEDOT have the smallest Eg
among the 3-substituted and 3,4-disubstituted derivatives, respectively.
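For reference, the S-C-C-S dihedral discussed above can be computed from four atomic positions with the standard torsion formula; the sketch below is generic (hypothetical coordinates, not the authors' code):

```python
# Dihedral (torsion) angle from four atomic positions, e.g. S-C-C-S (hypothetical coords).
import numpy as np

def dihedral(p0, p1, p2, p3):
    b0 = p0 - p1
    b1 = (p2 - p1) / np.linalg.norm(p2 - p1)
    b2 = p3 - p2
    v = b0 - np.dot(b0, b1) * b1          # component of b0 perpendicular to b1
    w = b2 - np.dot(b2, b1) * b1          # component of b2 perpendicular to b1
    return np.degrees(np.arctan2(np.dot(np.cross(b1, v), w), np.dot(v, w)))

s1 = np.array([0.0, 1.2, 0.1]); c1 = np.array([0.0, 0.0, 0.0])
c2 = np.array([1.4, 0.0, 0.0]); s2 = np.array([1.4, -1.1, 0.3])
print(dihedral(s1, c1, c2, s2))           # dihedral in degrees
```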


4. CONCLUSION
The geometrical and electronic structures of polythiophene derivatives were computed using
the PBC-DFT method at the B3LYP/6-31g(d) level. The calculated energy gaps indicated
that steric hindrance affects the π-conjugation in the backbone of substituted polythiophene.
Three polymers have a smaller band gap than polythiophene: poly(3-methylthiophene),
poly(3-hydroxythiophene) and poly(3,4-ethylenedioxythiophene). Poly(3-hydroxythiophene)
and poly(3,4-ethylenedioxythiophene) are the best conducting polymers among those studied
for the 3-substituted and 3,4-disubstituted series, respectively.




REFERENCES
1. M.R. Anderson, M. Berggren, O. Inganäs, G. Gustafsson, J.C. Gustafsson-Carlberg, D.
Selse, T. Hjertberg, O. Wennerström, Macromolecules, 1995, 28, 7525.
2. M.R. Anderson, O. Thomas, W. Mammo, M. Svensson, M. Theander, O. Inganäs, J.
Mater. Chem., 1999, 9, 1933.
3. J. Pei, W.L. Yu, W. Huang, A.J. Heeger, Macromolecules, 2000, 33, 2462.
4. C. Kitamura, S. Tanaka, Y. Yamashita, Chem. Mater., 1996, 8, 570.
5. E. Bundgaard, F.C. Krebs, Macromolecules, 2006, 39, 2823.
6. Y. Li, G. Vemvounis, S. Holdcroft, Macromolecules, 2002, 35, 6900-6906.
7. Y.M. Chou, W.H. Chen, C.C. Liang, J. Mol. Struct.: Theochem, 2009, 894, 117-120.
8. H. Cao, J. Ma, G. Zhang, Y. Jiang, Macromolecules, 2005, 38, 1123.
9. Q. Pei, G. Zuccarello, M. Ahlskog and O. Inganäs, Polymer, 1994, 35, 1347.
10. S. Rughooputh, D.D.V. Rughooputh, A.J. Heeger, F. Wudl, Macromolecules, 1987, 20,
212.
11. Ma, J., Li, S., Jiang, Y., Macromolecules, 2002, 35, 1109-1115.


ACKNOWLEDGMENTS
The authors would like to thank the Department of Chemistry, Faculty of Science,
Chulalongkorn University for supporting the computational infrastructure.

B00033
QM/MM Study On The Catalytic Mechanism
of Family 18 Chitinase

J. Jitonnom^1, P. Nimmanpipug^1, A.J. Mulholland^2, and V.S. Lee^1,C

^1 Computational Simulation and Modeling Laboratory (CSML), Department of Chemistry and Center
for Innovation in Chemistry, Chiang Mai University, Chiang Mai, 50200 Thailand
^2 Centre for Computational Chemistry, School of Chemistry, University of Bristol, Bristol BS8 1TS, UK
^C E-mail: vannajan@gmail.com; Fax: 66-53-892277; Tel. 66-53-943341 ext. 117



ABSTRACT
Combined QM/MM (quantum mechanics/molecular mechanics) calculations have been
used to study the reaction mechanism of the glycoside hydrolase (GH) family 18 chitinase B
from Serratia marcescens (SmGH18) in atomic detail. The QM/MM calculations were
performed at the B3LYP/6-31+G(d)//AM1-CHARMM22 level, providing molecular-level
detail of the general mechanism proposed in the literature for this family of enzymes. The
SmGH18 mechanism was investigated by generating a 2D potential energy surface as a
function of two reaction coordinates, describing the proton transfer from Glu144 to the
glycosidic oxygen linkage and the nucleophilic attack of the carbonyl oxygen of the
N-acetamido group at the anomeric center. The mechanism of this process is shown to be
concerted. The calculated barriers for this reaction based on multiple snapshots are
15.6-19.2 kcal/mol, in relatively good agreement with the experimental barrier of
16.2 kcal/mol. Interestingly, the enzyme tends to stabilize the intermediate (INT) more than
the transition state (TS), indicating an important role for the reaction intermediate in
catalysis by this enzyme. In addition, the observed TS structure adopts significant
oxocarbenium-ion-like character. A ^1,4B [^4H5/^4E] → ^4C1 → B3,O conformational
itinerary along the reaction coordinate was found, inconsistent with Stoddart's
pseudorotational itinerary. This provides evidence for more complicated conformational
behaviour in GH enzymes that utilize a substrate-assisted catalytic mechanism. These
results may be useful for future inhibitor design.

Keywords: family 18 chitinase, QM/MM, enzyme reaction, glycoside hydrolase.


REFERENCES
1. Senn Hans, M.; Thiel, W. Angew. Chem. Int. Ed. Engl. 2009, 48, 1198-229.
2. Greig, I. R., Zahariev, F., Withers, S. G. J. Am. Chem. Soc. 2008, 130, 17620-8.
3. Vocadlo, D. J., Davies, G. J. Curr. Opin. Chem. Biol. 2008, 12, 539-555.
4. Van Aalten, D. M. F., Komander, D., Synstad, B., Gaseidnes, S., Peter, M. G., Eijsink,
V. G. H. PNAS, 2001, 98, 8979-84.
5. Davies, G., Henrissat, B. Structure, 1995, 3, 853-9.

B00034
Theoretical Study of Li and Li+ Intercalated in
Double-Walled Carbon Nanotubes

A. Udomvech^1,C, A. J. Page^2, T. Kerdcharoen^3 and K. Morokuma^2,4

^1 Department of Physics, Faculty of Science, Thaksin University, Songkhla, 90000, Thailand
^2 Fukui Institute for Fundamental Chemistry, Kyoto University, Kyoto 606-8103, Japan
^3 Material Science and Engineering Programme, Faculty of Science, Mahidol University,
Bangkok 10400, Thailand
^4 Cherry L. Emerson Center for Scientific Computation and Department of Chemistry, Emory
University, Atlanta, GA 30322, U.S.A.
^C E-mail: nu_fermi1@yahoo.com; Tel. 081-4812830


ABSTRACT
We have performed a theoretical investigation of Li and Li+ intercalation in double-
walled carbon nanotubes (DWNTs). An all-electron CAM-B3LYP variant of density
functional theory was employed in conjunction with the 6-31G(d) basis set for all
geometry optimizations and energy calculations; a description of the long-range
dispersion forces present in our model system was therefore included implicitly. In this
study, the intercalation of Li/Li+ within an off-centered (6,0)@(20,0) DWNT model
system was investigated. The radial potential energy profile revealed three energetically
favorable sites for Li/Li+ adsorption, viz. outside the (20,0) nanotube, in the
(6,0)@(20,0) interlayer region, and inside the (6,0) tube. This profile also demonstrated
that both the adsorption and diffusion of Li/Li+ within the interlayer region of a
(6,0)@(20,0) DWNT are energetically/geometrically feasible processes. Moreover,
coupling effects between the two nanotube walls presented no perceivable obstacle to
Li/Li+ adsorption and diffusion in this interlayer region. It follows that a larger
interlayer spacing would provide a greater capacity for Li/Li+ intercalation inside
DWNTs. Analysis of the natural bond orbital atomic net charges of this Li-DWNT model
system showed that the formal charge located on the Li moiety is largely irrelevant to
the energetics of Li/Li+ adsorption within the (6,0)@(20,0) DWNT. This is consistent
with previous investigations of Li adsorption on/within single-walled carbon nanotubes.

Keywords: DWNT, CAM-B3LYP, Li/Li+ doping, Intercalation, Li-ion Battery.


REFERENCES
1. B. J. Landi, M. J. Ganter, C. D. Cress, R. A. DiLeo, and R. P. Raffaelle, Energy Environ.
Sci., 2009, 2, 638-654.
2. N. A. Kaskhedikar and J. Maier, Adv. Mater., 2009, 21, 2664-2680.
3. G. G. Wallace, J. Chen, A. J. Mozer, M. Forsyth, D. R. MacFarlane, and C. Wang,
Materials Today, 2009, 12, 20-27.
4. T. Yanai, D. P. Tew, N. C. Handy, Chemical Physics Letters, 2004, 393, 51-57.
5. Y. A. Kim, M. Kojima, H. Muramatsu, S. Umemoto, T. Watanabe, K. Yoshida, K. Sato, T.
Ikeda, T. Hayashi, M. Endo, M. Terrones, and M. S. Dresselhaus, Small, 2006, 2(5),
667-676.
6. Y.-W. Wen, H.-J. Liu, L. Pan, X.-J. Tan, and J. Shi, Chin. Phys. Lett., 2009, 26(8),
087102.
7. A. Udomvech, T. Kerdcharoen, and T. Osotchan, Chemical Physics Letters, 2005, 406,
161-166.
B00035
Molecular Calculation of Plasma Treatment Efficiency on
PMMA and FRC as Denture Materials

W. Sangprasert^1,3, V.S. Lee^1,3, D. Boonyawan^2,3, and P. Nimmanpipug^1,3,C

^1 Department of Chemistry, Faculty of Science, Chiang Mai University, Chiang Mai, 50200, Thailand
^2 Department of Physics, Faculty of Science, Chiang Mai University, Chiang Mai, 50200, Thailand
^3 Thailand Center of Excellence in Physics, Commission on Higher Education, 328 Si Ayutthaya Road,
Bangkok 10400, Thailand

^C E-mail: npiyarat@chiangmai.ac.th; Fax: 053-892277; Tel. 6653-943341 ext. 117,136



ABSTRACT
The continuous development of fiber-reinforced composites (FRC) and the repair of
fractured denture bases have been pursued using various approaches for artificial teeth
applications. The plasma technique is one powerful way to address these problems. It was
found that mixed Ar/air and He/N2 plasmas can introduce hydrophilicity to the denture
polymer chain. In this study, molecular dynamics simulation of PMMA was used to
analyze the radial distribution function (RDF), and quantum mechanics calculation, by
virtue of the lowest unoccupied molecular orbital (LUMO), located the reactive site of
the polymer at the methacrylate groups.

Keywords: Plasma, Denture, Computational calculation.



REFERENCES
1. Tumma, S., Yavirach, P., Daungsuriya, S., Sermchaiwong, U., Umongno, C., and
Boonyawan, D., International Workshop on Plasma Diagnostics & Applications, 2-3 July,
2009, Nanyang Technological University, Singapore.
2. Tang, S. and Choi, H.S., J. Phys. Chem. C, 2008, 112, 4712-4718.
3. Wrbas, K. T., Schirrmeister, J. F., Altenburger, M. J., Agrafioti, A., and Hellwig, E.,
Int. Endod. J., 2007, 40, 538-543.
4. Sane, S.B., Cagin, T., Goddard, W.A., and Knauss, W.G., J. Comput. Aided Mater. Des.,
2001, 8, 1573-4900.
5. Lu, K.T. and Tung, K.L., Korean J. Chem. Eng., 2005, 22(4), 512-520.
6. Todorova, T. and Delley, B., Mol. Simulat., 2008, 34, 1013-1017.


B00037
To the best estimation of reaction barriers for proton
exchange reactions of C1-C4 alkanes in ZSM-5 zeolite

K. Sukrat^1, V. Parasuk^1,C, D. Tunega^2, A. J. A. Aquino^2, and H. Lischka^2

^1 Department of Chemistry, Faculty of Science, Chulalongkorn University, Patumwan, Thailand 10900
^2 Institute of Theoretical Chemistry, University of Vienna, Währingerstraße 17, A-1090, Vienna,
Austria
^C E-mail: parasuk@act.actcu.chula.ac.th; Fax: 02-2187603; Tel. 02-2187603



ABSTRACT
Theoretical calculations were used to study the mechanism of the proton exchange
reactions of C2-C4 alkanes (ethane, propane, n-butane, and iso-butane) catalyzed by the
Brønsted acid site of ZSM-5 zeolite. All transition state structures and the corresponding
activation energies were determined using various methods and basis sets. In total, five
cluster models, 5T, 20T, 28T, 38T, and 96T, where T represents an alumina or silica
tetrahedron, were employed for the calculations; the 96T model contains the whole unit
cell. In addition, a periodic calculation, denoted P, was also considered: the TS structures
optimized with the 38T model were embedded into the unit cell and the periodic
structure, and single-point calculations were then performed. Effects such as zero-point
energy, cluster size, electron correlation, basis set, and entropic contributions have been
considered. The obtained results showed that proton exchange at the primary and
secondary carbons is very similar, with barriers in the range of 9.5-28.0 kcal/mol for all
alkanes, the exception being iso-butane, for which exchange of the H atom at the tertiary
carbon gives a reaction barrier of 37.7 kcal/mol. Using the extrapolation scheme, the best
results, in good agreement with experiments, were obtained.

Keywords: Proton exchange, ZSM-5, Extrapolated method



REFERENCES
1. Stepanov, A.G., Ernst, H., and Freude, D., Catalysis Letters, 1998, 54, 1-4.
2. Esteves, P.M., Nascimento, M.A.C., and Mota, C.J.A., J. Phys. Chem. B, 1999, 103(47),
10417-10420.
3. Arzumanov, S.S., Stepanov, A.G., and Freude, D., J. Phys. Chem. C, 2008, 112(31),
11869-11874.


B00038
Neuraminidase Inhibitor Identification by Pharmacophore
Modelling and Docking from NADI-VA compound

Mohd Razip Asaruddin^1 and Habibah A Wahab^1,2

^1 Pharmaceutical Design and Simulation Laboratory,
School of Pharmaceutical Sciences, Universiti Sains Malaysia, 11800 Minden, Penang, Malaysia
^2 Centre of Advanced Drug Delivery, Institute of Pharmaceuticals and Neutraceuticals, Malaysian
Ministry of Science and Technology and Innovation, Level 1, J05 Science Complex, Penang, Malaysia
E-mail: ra2420@yahoo.com



ABSTRACT
Influenza infection has been the cause of some of the worst epidemics in human history
and continues to be a major health concern. The synthesis of NADI-VA as an influenza
neuraminidase (NA) inhibitor is described. Starting from the basic material in the NADI
database and using the available structural information on the NA active site as a guide,
NADI-VA was successfully synthesized. These studies led to the identification of
NADI-VA as a potent NA inhibitor, with an IC50 of 3.7 nM against NA in the Amplex
Red assay. The pharmacophore features and docking analysis against NA revealed the
interaction of NADI-VA in the NA enzyme active site.

Keywords: Influenza, neuraminidase, inhibitor, pharmacophore, docking, NADI-VA,
Amplex Red.

B00040
MD simulation of Nafion surface modification
by Ar+ bombardment

Janchai Yana^1,2, Vannajan Sanghiran Lee^1,2, Sornthep Vannarat^3, Supaporn Dokmaisrijan^4,
Min Medhisuwakul^2,5, Thirapat Vilaithong^2,5 and Piyarat Nimmanpipug^1,2,*

^1 Computational Simulation and Modeling Laboratory (CSML),
Department of Chemistry and Center for Innovation in Chemistry, Faculty of Science,
Chiang Mai University, Chiang Mai, Thailand 50200
^2 Thailand Center of Excellence in Physics, Commission on Higher Education,
328 Si Ayutthaya Road, Bangkok, Thailand 10400
^3 National Electronics and Computer Technology Center (NECTEC), Thailand Science Park,
Paholyothin Rd., Klong 1, Klong Luang, Pathumthani, Thailand 12120
^4 Institute of Science, Walailak University, Thaiburi, Thasala District, Nakhonsithammarat 80160
^5 Fast Neutron Research Facility, Department of Physics, Faculty of Science,
Chiang Mai University, Chiang Mai, Thailand 50200
*E-mail: piyaratn@gmail.com; Fax: 66-53-892277; Tel. +66-53-943-341 ext 136



ABSTRACT
Fuel cell performance can be improved by increasing the hydrophilicity of the Nafion
surface. Molecular dynamics (MD) simulations of ion bombardment on a Nafion side-chain
cluster model were carried out to clarify the phenomenon at the molecular level. Ar+
bombardment was simulated with classical MD at initial energies of 5, 10, 20, and 40 eV
with a time step of 1 femtosecond. The MD trajectories were analyzed in terms of the
probability of bond breaking; breaking of the C-S bond was used to assess the possibility
of sulfonate-group sputtering. The results of the Ar+ bombardment correlate with the
hydrophilicity change of the Nafion surface measured by contact angle, as reported by
Cho et al.

Keywords: Fuel cell, Ion bombardment, MD simulation



REFERENCES
1. Cho, S. A.; Cho, E. A.; Oh, I. H.; Kim, H.-J.; Ha, H. Y.; Hong, S.-A.; Ju, J. B.,
Journal of Power Sources, 2006, 155, 286-290.
2. Paddison, S. J. and Zawodzinski Jr., T. A., Solid State Ionics, 1998, 113, 333-340.
3. Yana, J.; Lee, V. S.; Nimmanpipug, P.; Aukkaravittayapun, S.; Vilaithong, T.,
J. Solid Mechanics and Materials Engineering (JMME), 2007, 1, 556-563.
4. Yana, J.; Nimmanpipug, P.; Dokmaisrijan, S.; Aukkaravittayapun, S.; Lee, V. S.,
Proceeding of the Asian Symposium on Materials and Processing 2006 (ASMP2006), 2006,
165.
5. Yana, J.; Nimmanpipug, P.; Lee, V. S., Proceeding of the 9th Annual National Symposium
on Computational Science and Engineering (ANSCSE9), 2005, 391-396.
B00041
Influence of the silanol groups on the external surface of
silicalite-1 on the adsorption dynamics of methane

S. Thompho^1, R. Chanajaree^3, T. Remsungnen^2, S. Fritzsche^3 and S. Hannongbua^1

^1 Department of Chemistry, Faculty of Science, Chulalongkorn University, Bangkok 10330, Thailand
^2 Department of Mathematics, Faculty of Science, Khon Kaen University, Khon Kaen 40002, Thailand
^3 Institute of Theoretical Physics, Leipzig University, Faculty of Physics and Geosciences, Postfach
100920, D-04009 Leipzig, Germany
E-mail: sthompho@yahoo.com; Fax: 02-2187602; Tel. 086-6485484


ABSTRACT
When modelling a zeolite membrane, the unsaturated bonds at the surface have to be
saturated by additional hydrogens, forming silanol groups. This is usually neglected by
simply considering an unmodified sheet of an infinite zeolite lattice as the membrane
model. This paper examines the influence of these silanol groups. The adsorption dynamics
of methane molecules through the external surface of silicalite-1 has been investigated
using a Control Volume Grand Canonical Molecular Dynamics method, with periodic
boundary conditions applied in all directions.
An ab initio fitted methane/silicalite-1/silanol potential was used that has been newly
developed using ONIOM (MP2/6-31G(d):HF/6-31G(d)) calculations with the BSSE
correction [1,2,3,4]. In the present investigation, we have examined the entering dynamics
with and without silanol groups at temperatures between 250 and 600 K. A slowing down
of the adsorption process due to the silanol groups is found.

Keywords: silanol, diffusion, surface.



Figure 1: Schematic diagram of the simulation cell for gas diffusion through a zeolite


REFERENCES
1. Saengsawang, O.; Remsungnen, T.; Fritzsche, S.; Haberlandt, R.; Hannongbua, S., J. Phys.
Chem. B, 2005, 109, 5684.
2. Remsungnen, T.; Kormilets, V.; Loisruangsin, A.; Schüring, A.; Fritzsche, S.; Haberlandt,
R.; Hannongbua, S., J. Phys. Chem. B, 2006, 110, 11932.
3. Saengsawang, O.; Remsungnen, T.; Loisruangsin, A.; Fritzsche, S.; Haberlandt, R.;
Hannongbua, S., Studies in Surface Science and Catalysis, 2005, 158, 947.
4. Chanajaree, R.; Remsungnen, T.; Loisruangsin, A.; Fritzsche, S.; Hannongbua, S.,
The Computational Study of Methane on a Silanol-Covered (010) Silicalite-1 Surface:
Development of Ab Initio Fitted Potentials, The 10th Annual National Symposium on
Computational Science and Engineering (ANSCSE10), Chiang Mai, 22-24 March 2006,
Thailand.
B00042
Virtual Screening and Binding Free Energy Calculation for
Inhibitors of Dengue Virus NS2B/NS3 Protease

Kanin Wichapong^1,2, Somsak Pianwanit^1, Wolfgang Sippl^2, and Sirirat Kokpol^1,c

^1 Department of Chemistry, Faculty of Science, Chulalongkorn University, Bangkok, 10330, Thailand
^2 Department of Pharmaceutical Chemistry, Martin-Luther-University of Halle-Wittenberg, 06120,
Halle (Saale), Germany
^c E-mail: siriratkokpol@gmail.com; Fax: 02-2187603; Tel. 02-2187603



ABSTRACT
Dengue Virus (DV) infection is a severe public-health problem not only in Thailand
but throughout tropical and subtropical regions. DV uses a protease, the NS3 protease
complexed with its essential cofactor (a central 40-amino-acid hydrophilic domain of
NS2B), to cleave a polyprotein precursor into individual functional proteins. Therefore,
the NS2B/NS3 protease is a promising enzyme target for drug development against DV
infection. In this work, small-molecule inhibitors [1-2] were docked into the
representative conformation of the DV NS2B/NS3 protease derived from our previous
work [3]. These complexes were then subjected to molecular dynamics (MD)
simulations for 6 ns using the Amber 9 program. Snapshots from 4-6 ns were
subsequently used to calculate binding free energies with the MM-PBSA approach. The
derived binding free energies correlated well with the experimental binding affinities.
In addition, the docking solution and the 4-6 ns snapshots from the MD simulation of
the most selective compound were used to generate static and dynamic pharmacophore
models. A stepwise virtual screening was then performed, starting with drug-likeness
filtering and followed by pharmacophore search, similarity search, molecular docking,
ranking and post-docking filtering. Finally, hit compounds were obtained, and their
binding free energies were calculated and compared with those of known inhibitors.
Compounds with favorable binding free energies were proposed as novel inhibitors of
the DV NS2B/NS3 protease and suggested for biological activity testing.
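For orientation, the MM-PBSA estimate referred to above follows the usual bookkeeping, dG_bind = G_complex - G_receptor - G_ligand with G = E_MM + G_PB + G_SA (entropy terms optionally added), each averaged over MD snapshots. A schematic sketch of this arithmetic (dummy numbers, not results from this work):

```python
# Schematic MM-PBSA bookkeeping over snapshots (dummy energies in kcal/mol;
# real values would come from the Amber/MM-PBSA output).
import numpy as np

def g_mmpbsa(e_mm, g_pb, g_sa):
    """Average G = E_MM + G_PB (polar) + G_SA (nonpolar) over snapshots."""
    return np.mean(np.asarray(e_mm) + np.asarray(g_pb) + np.asarray(g_sa))

g_complex  = g_mmpbsa([-2510.0, -2498.0], [480.0, 475.0], [-12.0, -12.5])
g_receptor = g_mmpbsa([-2320.0, -2311.0], [450.0, 447.0], [-10.0, -10.2])
g_ligand   = g_mmpbsa([-155.0, -154.0],  [40.0, 39.5],   [-3.0, -3.1])
print(g_complex - g_receptor - g_ligand)   # dG_bind estimate
```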

Keywords: Dengue Virus, NS2B/NS3 protease, Virtual Screening



REFERENCES
1. Ganesh V. K., Muller N., Judge K., Luan C. H., Padmanabhan R., Murthy K.H.,
Bioorg. Med. Chem., 2005, 13, 257-264.
2. Mueller N. H., Pattabiraman N., Ansarah-Sobrinho C., Viswanathan P., Pierson T.
C., Padmanabhan R., Antimicrob. Agents. Chemother., 2008, 52, 3385-93.
3. Wichapong K., Pianwanit S., Sippl W., Kokpol S., J. Mol. Recognit., 2009, In
Press

B00045
Loading of Doxorubicin on Single-Walled Carbon Nanotube
by MD Simulations

P. Sornmee^1, U. Arsawang^1, T. Rungrotmongkol^2, O. Saengsawang^3, P. Intharathep^3,
K. Sukrat^3, T. Remsungnen^4 and S. Hannongbua^3,C

^1 Department of Mathematics, Faculty of Science, Chulalongkorn University, Bangkok, 10330,
Thailand
^2 Center of Innovative Nanotechnology, Chulalongkorn University, Bangkok, 10330, Thailand
^3 Department of Chemistry, Faculty of Science, Chulalongkorn University, Bangkok, 10330, Thailand
^4 Department of Mathematics, Faculty of Science, Khonkean University, Khonkean, 40002, Thailand
^C E-mail: supot.h@chula.ac.th; Tel. 02-2187603



ABSTRACT
Carbon nanotubes have many unique properties and are widely used in various
applications. One application under development is a nanocontainer that delivers a drug
to its target and improves cancer chemotherapy, using a single-walled carbon nanotube
(SWNT) as the drug delivery system. We show the applicability of computer-assisted
molecular dynamics simulations in understanding the intermolecular interaction between
the (28,0) SWNT and doxorubicin, an anticancer drug. Two systems, doxorubicin bound
inside the SWNT and doxorubicin in the bulk phase, were investigated; the models were
simulated separately with the AMBER10 programme over 10 ns. Based on the RMSD
plots, the systems were found to reach equilibrium at 0.5 ns, and therefore the 1000 MD
snapshots extracted from the remainder of the simulations were used for analysis. The
structural properties of the drug were examined by comparing the drug inside the tube
with the drug in the bulk phase. From the MD results, the glycosidic linkage that attaches
the daunosamine sugar of DOX is likely to be more stable than in the free form. The
torsion angle of the alkyl side chain shows high flexibility for DOX both inside the
SWNT and free, while the main part of the drug molecule retains its planar form.
Moreover, based on the RDF plots, the RDF from water oxygens to the ammonium
nitrogen of DOX shows a sharper peak for DOX encapsulated in the SWNT than for free
DOX, while the RDFs from water oxygens to the oxygen atoms of DOX show the
opposite behaviour.


Keywords: Carbon nanotube, Doxorubicin, Molecular dynamics Simulations.



1. INTRODUCTION
The emerging field of nanotechnology has generated considerable research in various
areas of science such as biology, physics, medicine and mechanics [9], which has led to the
development of many applications for diagnostic and therapeutic purposes. In particular,
new approaches to site-specific drug targeting using nanoparticle drug-carrier systems have
been developed [10]. One such proposal, which may assist in the treatment of cancer and
other terminal illnesses, is the use of fullerenes and carbon nanotubes as drug delivery
vehicles [1,9] that may release the drug by some kind of chemical trigger such as a change
in pH [9].
Carbon nanotubes are attractive for drug delivery due to their unique properties. Carbon
nanotubes not only offer a large inner volume, enabling more of the drug to be encapsulated,
and removable end caps that make the inner volume more accessible [1,9], but also have the
advantage of easily alterable surface characteristics without significant change to their
inherent nature [8]. Nevertheless, pristine carbon nanotubes are highly toxic [9] and
generally insoluble in most common solvents, including
water [5], which impedes the separation and manipulation of carbon nanotubes for specific
applications. Thus, chemical functionalization of carbon nanotubes is an especially
attractive target, as it can improve solubility and processability [2,6]. However, many other
issues still need to be investigated; for example, it is important to understand the underlying
physical principles in order to load the nanotube effectively with drug molecules [9]. Thus,
in this study, we investigate an anticancer drug molecule with a far more complicated
molecular structure, namely doxorubicin (DOX), encapsulated in a single-walled carbon
nanotube (SWNT). Doxorubicin is commonly used as an anticancer drug for a wide range of
tumours and has been used in recent nanoscale drug-delivery applications [9]. General side
effects of the cancer treatment are nausea and vomiting, loss of appetite, diarrhea, and
swelling, which could probably be reduced by the development of a drug delivery system.
The present study focuses on the basic knowledge of a drug delivery system in which the
SWNT is used as the drug carrier. Here, molecular dynamics (MD) simulations have been
performed on two systems: doxorubicin bound inside the (28,0) zigzag SWNT, and
doxorubicin in aqueous solution. The structural and dynamic properties of the drug were
analyzed in terms of drug conformation and water accessibility.


2. MATERIALS AND METHOD
2.1 System preparation
The (28,0) zigzag SWNT, with a diameter of 21.94 Å and a length of 38.89 Å, was built
with the Nanotube Modeler program [7] and terminated by adding hydrogen atoms at the
ends of the tube. The atomic coordinates of doxorubicin (DOX) were retrieved from the
DrugBank database (entry DB00997) [11]. The compound was then positioned at the center
of mass of the SWNT to construct the starting structure of the DOX-SWNT complex.
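The quoted diameter is consistent with the standard relation d = a·sqrt(n^2 + nm + m^2)/pi for an (n,m) tube, with a = sqrt(3)·d_CC; a quick check (generic sketch, assuming d_CC = 1.421 Å):

```python
# Diameter of an (n,m) carbon nanotube from d = a*sqrt(n^2 + n*m + m^2)/pi.
import math

def swnt_diameter(n, m, d_cc=1.421):
    a = math.sqrt(3.0) * d_cc                       # graphene lattice constant (Angstrom)
    return a * math.sqrt(n * n + n * m + m * m) / math.pi

print(swnt_diameter(28, 0))                         # ~21.94 Angstrom for the (28,0) tube
```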




Figure 1. Schematic view of doxorubicin encapsulated in the pristine SWNT (left); the
atomic labels and torsion definitions are also shown (right).

2.2 Molecular dynamics simulations
All simulations were performed using the AMBER10 software package [3]. The AMBER03
force field was applied to the SWNT, while the starting structure and empirical force-field
parameters of doxorubicin were developed in the following steps. (i) Hydrogen atoms were
added to the atomic coordinates of the drug, taking into account the hybridization of the
covalent bonds. (ii) To refine the drug geometry for the electrostatic potential calculations,
it was partially optimized at the B3LYP/6-31G** level of theory using the Gaussian03
program [4]. (iii) The RESP charge-fitting procedure was used to reproduce the calculated
electrostatic potential (ESP) around the drug compound [4] with the HF/6-31G* basis set,
with the partial charges of equivalent atoms fitted to identical values.
Water molecules were added to the systems of DOX in the free state (aqueous solution)
and DOX encapsulated in the pristine SWNT (Fig. 1) by distributing them over a truncated
octahedral box. Each system was then neutralized with a chloride ion. The SHAKE
algorithm [3] was employed for all hydrogen atoms, with a time step of 2 fs and a pressure
of 1 atm. Periodic boundary conditions were applied, with a 12 Å cutoff for nonbonded
interactions and the particle mesh Ewald method [3] to account for long-range interactions.
The structures were heated from 10 K to 300 K, subsequently equilibrated for 500 ps, and
finally run for a 9.5 ns production phase, during which the MD trajectories were stored
every 200 steps.


3. RESULTS AND DISCUSSION
3.1 System stability
The root mean square displacements (RMSDs) of the pristine SWNT and doxorubicin
were evaluated and plotted in Fig. 2. The complex was found to reach an equilibrium state
at 0.5 ns, with the drug showing the higher fluctuation (RMSD of ca. 1.0 Å). Therefore, the
MD trajectories extracted from the last 9.5 ns of the simulations were used for analysis.
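Such an RMSD trace can be generated from the stored trajectories in a few lines; the sketch below uses MDAnalysis with hypothetical file names and atom selections (an assumed post-processing route, not necessarily the authors' workflow):

```python
# RMSD vs. time for the drug (hypothetical file names; MDAnalysis assumed installed).
import MDAnalysis as mda
from MDAnalysis.analysis.rms import RMSD

u = mda.Universe("dox_swnt.prmtop", "dox_swnt.nc")  # Amber topology + trajectory (assumed)
r = RMSD(u, select="resname DOX")                   # drug atoms; first frame as reference
r.run()
for frame, time_ps, rmsd_A in r.results.rmsd:       # columns: frame, time (ps), RMSD (A)
    print(time_ps, rmsd_A)
```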


Figure 2. RMSDs relative to the initial structure for pristine SWNT (black) and DOX
(red).


3.2 Structural change of doxorubicin
The structural conformation of DOX encapsulated in the SWNT was compared with that of
DOX in the free state in terms of the probability distributions of the torsion angles of its
side chains. The results are given in Fig. 3. In Fig. 3a, the torsion angle Tor1 of DOX (see
Fig. 1 for the definition) inside the SWNT has a single sharp peak at -79 degrees, while free
DOX has three peaks, at -77, -125 and -156 degrees. The situation is similar for the Tor2
angle (Fig. 3b), where DOX inside the SWNT shows one sharp peak at -79 degrees whereas
in the free state there are two peaks, at -70 and -154 degrees. These results indicate that, in
the encapsulated form, the glycosidic linkage that attaches the daunosamine sugar of the
drug is considerably more stable than in the free form. For Tor3 (Fig. 3c), the alkyl side
chain of DOX, whether bound inside the SWNT or free, shows notably high flexibility.
Finally, the Tor4 torsion angle of 0 degrees (Fig. 3d) indicates the planar form of the drug
in both systems.





Figure 3. Probability distributions of the torsion angles (Tor1-4, defined in Fig. 1) of DOX
encapsulated in the SWNT (black) and free DOX in water (red).

3.3 Solvation structure
To monitor the solvation of DOX with and without the SWNT, the atom-atom radial
distribution functions (RDFs), expressed as g_ij(r), the probability of finding a particle of
type j in a sphere of radius r around a particle of type i, were calculated. The water-DOX
RDFs were evaluated separately for the water molecules inside the SWNT and in bulk
solution, where i runs over the ligand atoms N, O5, O8 and O10 of DOX (see Fig. 1 for the
atomic labels) and j is the O atom of water (Ow). Although all the RDFs of DOX inside the
SWNT and in bulk solution peak at almost the same position of 3 Å, slight differences in
peak intensity and shape were found. Note that a sharp peak centered at 3 Å indicates strong
hydrogen bonding between the ligand and water molecules. In Fig. 4, only the ammonium
nitrogen N of the SWNT-bound system showed a higher intensity than in the free state. In
addition, the lower first minimum after the sharp peak in the bound complex (Fig. 4a-d)
suggests that the water molecules bind strongly to the ligand (less water exchange into the
first solvation shell).
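A site-site RDF of this kind can be accumulated directly from the trajectory; the following is a hedged sketch using MDAnalysis (file names and atom names are assumptions, not taken from the paper):

```python
# g(r) between a DOX atom and water oxygens (hypothetical selections).
import MDAnalysis as mda
from MDAnalysis.analysis.rdf import InterRDF

u = mda.Universe("dox_swnt.prmtop", "dox_swnt.nc")
n_dox = u.select_atoms("resname DOX and name N")    # ammonium nitrogen (assumed atom name)
o_wat = u.select_atoms("resname WAT and name O")    # water oxygens (typical Amber naming)
rdf = InterRDF(n_dox, o_wat, nbins=150, range=(0.0, 15.0))
rdf.run()
print(rdf.results.bins[:5], rdf.results.rdf[:5])    # r (Angstrom) and g(r)
```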




Figure 4. Radial distribution functions g(r) from ligand atoms to the oxygen atoms of water
molecules for the two systems: DOX bound inside the SWNT and DOX in aqueous solution.

4. CONCLUSION
The structure of DOX was investigated in terms of the probability distributions of its
torsion angles. Considering the torsion-angle probabilities of the drug side chains,
encapsulation significantly changes the torsion angles of DOX. The glycosidic linkage that
attaches the daunosamine sugar of DOX is likely to be more stable than in the free form.
The torsion angle of the alkyl side chain shows high flexibility for DOX both inside the
SWNT and free, while the main part of the drug molecule retains its planar form. The water
accessibility of the drug in the two states was also observed. Based on the RDF plots, the
solvation properties of DOX encapsulated in the SWNT and of free DOX in water are
somewhat different. Both systems display a sharp peak at the same distance; however, the
RDF from water oxygens to the ammonium nitrogen of DOX is sharper for the encapsulated
form, while the RDFs from water oxygens to the oxygen atoms of DOX show the opposite
trend. Taken together, the simulation results demonstrate how the drug binds inside the
SWNT and lead us to suggest that DOX encapsulated inside a SWNT may remain in a
stable form until it reaches the target site.




REFERENCES
1. C.R. Martin and P. Kohli, Nat. Rev. Drug Discov., 2003, 2, pp. 29-37.
2. C. Wongchoosuk, The electronic structure of functionalized single-walled carbon
nanotubes, 31st Congress on Science and Technology, Suranaree University, Thailand,
2005.
3. D.A. Case, T.A. Darden, T.E. Cheatham, III, C.L. Simmerling, J. Wang, R.E. Duke, R.
Luo, M. Crowley, Ross C. Walker,W. Zhang, K.M. Merz, B.Wang, S. Hayik, A.
Roitberg, G. Seabra, I. Kolossváry, K.F. Wong, F. Paesani, J. Vanicek, X. Wu, S.R.
Brozell, T. Steinbrecher, H. Gohlke, L. Yang, C. Tan, J. Mongan, V. Hornak, G. Cui,
D.H. Mathews, M.G. Seetin, C. Sagui, V. Babin, and P.A. Kollman (2008), AMBER 10,
University of California, San Francisco.
4. Gaussian 03, Revision C.02, Frisch, M. J.; Trucks, G. W.; Schlegel, H. B.; Scuseria, G.
E.; Robb, M. A.; Cheeseman, J. R.; Montgomery, Jr., J. A.; Vreven, T.; Kudin, K. N.;
Burant, J. C.; Millam, J. M.; Iyengar, S. S.; Tomasi, J.; Barone, V.; Mennucci, B.; Cossi,
M.; Scalmani, G.; Rega, N.; Petersson, G. A.; Nakatsuji, H.; Hada, M.; Ehara, M.; Toyota,
K.; Fukuda, R.; Hasegawa, J.; Ishida, M.; Nakajima, T.; Honda, Y.; Kitao, O.; Nakai, H.;
Klene, M.; Li, X.; Knox, J. E.; Hratchian, H. P.; Cross, J. B.; Bakken, V.; Adamo, C.;
Jaramillo, J.; Gomperts, R.; Stratmann, R. E.; Yazyev, O.; Austin, A. J.; Cammi, R.;
Pomelli, C.; Ochterski, J. W.; Ayala, P. Y.; Morokuma, K.; Voth, G. A.; Salvador, P.;
Dannenberg, J. J.; Zakrzewski, V. G.; Dapprich, S.; Daniels, A. D.; Strain, M. C.; Farkas,
O.; Malick, D. K.; Rabuck, A. D.; Raghavachari, K.; Foresman, J. B.; Ortiz, J. V.; Cui,
Q.; Baboul, A. G.; Clifford, S.; Cioslowski, J.; Stefanov, B. B.; Liu, G.; Liashenko, A.;
Piskorz, P.; Komaromi, I.; Martin, R. L.; Fox, D. J.; Keith, T.; Al-Laham, M. A.; Peng, C.
Y.; Nanayakkara, A.; Challacombe, M.; Gill, P. M. W.; Johnson, B.; Chen, W.; Wong, M.
W.; Gonzalez, C.; and Pople, J. A.; Gaussian, Inc., Wallingford CT, 2004.
5. J.H. Walther, R. Jaffe, T. Halicioglu and P. Koumoutsakos, J. Phys. Chem. B 2001, 105,
41, pp. 9980-9987.
6. L.L. Huang, L.Z. Zhang, Q. Shao, J. Wang, L.H. Lu, X.H. Lu, S.Y. Jiang and W.F. Shen,
J. Phys. Chem. B 2006, 110, pp. 25761-25768.
7. Nanotube Modeler, JCrystalSoft, 2004-2005 (http://www.jcrystal.com/products/wincnt/)
8. R. Sirdeshmukh, K. Teker and B. Panchapakesan, Functionalization of carbon nanotubes
with antibodies for breast cancer detection applications, The international Conference on
MEMs, NANO and Smart Systems. Proc., 2004.
9. T.A. Hilder, and J.M. Hill, Micro & Nano Lett., 2008 , 3, 2, pp. 41-49.
10. V.E. Kagan, H. Bayir and A.A. Shvedova, Nanomed.: Nanotech., Bio., and Med., 2005,
1, pp. 313-316.
11. http://www.drugbank.ca/


ACKNOWLEDGMENTS
The authors thank the Computational Chemistry Unit Cell (CCUC), Chulalongkorn
University and the National Nanotechnology Center (NANOTEC) for computing facilities.

B00046
Molecular Dynamics Simulations of GEMZAR®
encapsulated in carbon nanotube

U. Arsawang^1, P. Sornmee^1, O. Saengsawang^2, T. Rungrotmongkol^2, P. Intharathep^2,
A. Pianwanit^2, T. Remsungnen^3, S. Hannongbua^2,C

^1 Department of Mathematics, Faculty of Science, Chulalongkorn University, Bangkok 10330, Thailand
^2 Department of Chemistry, Faculty of Science, Chulalongkorn University, Bangkok 10330, Thailand
^3 Department of Mathematics, Faculty of Science, Khonkaen University, Khonkaen 40002, Thailand
^C E-mail: supot.h@chula.ac.th; Fax: 02-2187603; Tel. 02-2187602



ABSTRACT
Molecular dynamics (MD) simulations were carried out for a zigzag (18,0) single-
walled carbon nanotube (CNT) functionalized with carboxylic (-COOH) or hydroxyl
(-OH) groups. Each CNT is complexed with GEMZAR®, the anticancer drug. The
Cornell force field is applied to the CNT and GEMZAR®, while the SPC model is used
for water. All simulations were performed for 10.5 ns using the AMBER programme
package. Based on the RMSD plots of the drug and CNT, all systems were found to
reach the equilibrium state at 0.5 ns; therefore, the MD trajectories from 0.5 to 10.5 ns
were used to analyze the properties of the systems. The structural and dynamical
properties showed that the drug molecule remains inside the nanotube throughout the
simulation period. Such information is a primary requirement in drug delivery
technology for bringing a drug molecule to its specific target.

Keywords: Carbon nanotube, Molecular dynamics simulations, Drug delivery.



REFERENCES
1. Zhu et al., J. Phys. Chem. C, 2009, 113, 882-889.
2. Rana M. and Chandra A., J. Chem. Sci., 2007, 119, 367-376.
3. Huang L. et al., Phys. Chem. Chem. Phys., 2006, 8, 3836-3844.
4. Hummer G., Nature, 2001, 414, 188-190.

B00050
Computational studies of HIV-1 Reverse Transcriptase
Inhibitors: as a Molecular Basis for Drug Development

P. Decha^1, P. Intharathep^2, T. Udommaneethanakit^2, P. Sompornpisut^2, S. Hannongbua^2,
P. Wolschann^3 and V. Parasuk^1

^1 Computational Chemistry Research Unit, Department of Chemistry, Faculty of Science, Thaksin
University, Phatthalung 93110, Thailand
^2 Computational Chemistry Unit Cell, Department of Chemistry, Faculty of Science, Chulalongkorn
University, Phayathai Road, Patumwan, Bangkok 10330, Thailand
^3 Institute of Theoretical Chemistry, University of Vienna, Waehringer Strasse 17, Vienna 1090, Austria
E-mail: panita487@hotmail.com; Fax: +667-4609634; Tel. +667-4609634



ABSTRACT
Molecular dynamics (MD) simulations of HIV-1 RT complexed with four NNRTIs,
EFV, EMV, ETV and NVP, were performed to examine the structures, binding free
energies and the importance of water molecules in the binding site. In terms of hydrogen
bonding, EFV and EMV bind to the surrounding residues of HIV-1 RT, while for the
NVP system no such binding was found. The interaction energy of the protein-inhibitor
complexes was found to be essentially associated with a cluster of seven hydrophobic
residues, L100, V106, Y181, Y188, F227, W229 and P236, and two basic residues, K101
and K103. These results support clinical data showing that these residues are the most
frequently mutated residues when resistance against NNRTIs arises. The binding free
energy, calculated using MM-PBSA, was found to decrease in the following order:
EFV ~ ETV > EMV > NVP. This ordering of the stability of the HIV-1 RT/NNRTI
complexes is in good agreement with the experimentally derived IC50 values. In addition,
the distribution and binding of water molecules, in terms of hydrogen bonding to the
donor atoms of the inhibitors, were investigated and show that the three inhibitors are
increasingly solvated in the order NVP > EMV > EFV.

Keywords: Molecular Dynamics Simulations, Reverse Transcriptase, HIV-1, NNRTIs.
B00051

Docking of Dengue Virus Methyltransferase Inhibitor from
Nadi Database (In House Malaysian Medicinal Plant
Database)

Mohamed Sufian M. Nawi^1,C, Habibah A. Wahab^1, Noorsaadah Abd. Rahman^2 and
Shafida Abd. Hamid^3

^1 Pharmaceutical Design and Simulation (PhDS) Laboratory, School of Pharmaceutical Sciences,
Universiti Sains Malaysia, 11800 Minden, Pulau Pinang, Malaysia.
^2 Department of Chemistry, Faculty of Science, University of Malaya, Lembah Pantai, 50603 Kuala
Lumpur, Malaysia.
^3 Kulliyyah of Science, International Islamic University, Bandar Indera Mahkota, 25020 Kuantan,
Pahang, Malaysia.
^C E-mail: msufian_uia@yahoo.com; Fax: 604-6570017; Tel. 604-6532238.

ABSTRACT
Dengue, of the genus Flavivirus, is the most prevalent arthropod-borne virus affecting
humans today. In viral replication, the methyltransferase of the non-structural protein
NS5 is an important enzyme that catalyses the methylation of the 5'-cap structure of
genomic RNA, which involves the transfer of a methyl group from S-adenosyl-L-
methionine onto the N7 position of the cap guanine and the ribose 2'-O position of the
first nucleotide, adenosine. Thus, methyltransferase is an attractive target for dengue and
other flavivirus therapy. In this study, we docked a Eurycoma longifolia compound,
MSC379 from the NADI database, against the S-adenosyl-L-methionine binding pocket
of the Dengue-2 virus methyltransferase using AutoDock 3.0.5. The protein structure was
prepared from the crystal structure of the complex of Dengue-2 virus methyltransferase
and S-adenosyl-L-homocysteine (PDB code: 3EVG). The docking gave a free energy of
-13.76 kcal/mol with a Ki value of 8.16 x 10^-11.
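The quoted Ki follows from the docking free energy through the relation Ki = exp(dG/RT) used by AutoDock; a quick numerical check (assuming T = 298.15 K):

```python
# Inhibition constant from docking free energy: Ki = exp(dG / (R*T)).
import math

R = 1.987e-3                      # gas constant, kcal/(mol*K)
dG = -13.76                       # docking free energy, kcal/mol
Ki = math.exp(dG / (R * 298.15))
print(f"{Ki:.2e}")                # ~8.2e-11, matching the reported 8.16 x 10^-11
```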

Keywords: Dengue virus, Methyltransferase inhibitor, Viral RNA capping, NS5
protein, Docking.


REFERENCES
1. Milani, M., Mastrangelo, E., Bollati, M., Selisko, B., Decroly, E., Bouvet, M., Canard, B.,
and Bolognesi, M., Antiviral Research, 2009, 83, 28-34.
2. Deen, J. L., Harris, E., Wills, B., Balmaseda, A., Hammond, S. N., Rocha, C., Dung N.
M., Hung, N. T., Hient, T. T., and Farrar, J. J., The Lancet, 2006, 368, 170-173.
3. Mukhopadhyay, S., Kuhn, R. J., and Rossmann, M. G., Nat. Rev. Microbiol., 2005, 3(1),
13-22.
4. Kyle, J. L., and Harris, E., Annu. Rev. Microbiol., 2008, 62, 71-92.
5. Sampath, A., and Padmanabhan, R., Antiviral Res., 2009, 81(1), 6-15.
6. Dong, H., Ren, S., Li, H., and Shi, P. Y., Virology, 2008a, 377(1), 1-6.
7. Dong, H., Zhang, B., and Shi, P. Y., Antiviral Res., 2008b, 80(1), 1-10.
8. Geiss, B. J., Thompson, A. A., Andrews, A. J., Sons, R. L., Gari, H. H., Keenan, S. M.,
and Peersen, O. B., J. Mol. Biol., 2009, 385(5), 1643-1654.
9. Egloff, M. P., Decroly, E., Malet, H., Selisko, B., Benarroch, D., Ferron, F., and Canard,
B., J. Mol. Biol., 2007, 372(3), 723-736.
10. Luzhkov, V. B., Selisko, B., Nordqvist, A., Peyrane, F., Decroly, E., Alvarez, K., Karlen,
A., Canard, B., and Qvist, J., Bioorg. Med. Chem., 2007, 15(24), 7795-7802.
11. Zhou, Y., Ray, D., Zhao, Y., Dong, H., Ren, S., Li, Z., Guo, Y., Bernard, K. A., Shi, P.
Y., and Li, H., J. Virol., 2007, 81(8), 3891-3903.
B00053
Investigating the Binding of Arylamide Derivatives as
Tuberculosis Agent in InhA using
Molecular Dynamics Simulations

Auradee Punkvang 1, Peter Wolschann 2, Anton Beyer 2 and Pornpan Pungpo 1,C

1 Faculty of Science, Ubonratchathani University, Ubon Ratchathani, Thailand
2 Institute for Theoretical Chemistry, University of Vienna, A-1090 Vienna, Austria
C E-mail: pornpan_ubu@yahoo.com; Tel. 0066453534014124



ABSTRACT
Arylamides were identified as novel inhibitors of the enoyl ACP reductase enzyme
(InhA) involved in the type II fatty acid biosynthesis pathway of M. tuberculosis. In
order to study the binding of arylamide derivatives in the InhA binding pocket,
molecular dynamics (MD) simulations were performed using the Gromacs program.
Moreover, the estimated binding free energies of the arylamides in InhA were also calculated
using the linear interaction energy (LIE) method. The results show that the Gromacs program
successfully simulates the binding modes of arylamides in the InhA binding pocket.
Analysis of the arylamide/InhA complexes reveals that only Tyr158 and NADH are within
hydrogen-bonding distance of the carbonyl oxygen of the
arylamides. These results indicate that the arylamides are tightly held in the InhA binding
pocket by these hydrogen bonds. The estimated binding free energy of each arylamide
corresponds well with its inhibitory activity. Therefore, the obtained results
might be helpful for a better understanding of the binding mechanism of arylamide
derivatives in the InhA binding pocket.

Keywords: Arylamide, InhA, MD simulations.



1. INTRODUCTION
Enoyl-acyl ACP reductase (InhA) is one of the potential enzyme targets in the FAS-II pathway
for developing antibacterial drugs. InhA has been identified as the primary target of isoniazid
(INH), the frontline drug for tuberculosis chemotherapy [1]. As a prodrug, INH must first be
activated by catalase-peroxidase (KatG) to generate the reactive acyl radical [2]. Then, the
reactive species binds covalently to nicotinamide adenine dinucleotide (NAD+) to form the
active adduct (the INH-NAD adduct), which functions as a highly potent inhibitor of InhA [3].
However, the high potency of INH for tuberculosis treatment is diminished by drug
resistance. High levels of resistance to INH are caused by mutations in katG, the mutations most commonly
found in M. tuberculosis clinical isolates [4]. To address the resistance to INH associated with
mutations in the katG enzyme, a series of arylamides was identified as a novel class of potent
InhA inhibitors [5]. Arylamides target InhA directly, without a requirement for KatG
activation, and function as direct InhA inhibitors. Therefore, to understand the important drug-
enzyme interactions for the binding of arylamides to InhA and to obtain some information
about the dynamics behaviour, molecular dynamics simulations were employed in the present
study. The obtained results should provide information on the correlation of
arylamide structures with their inhibition activities, the binding modes and the binding free
energies.


2. COMPUTATIONAL DETAILS
The GROMOS96 43a2 force field was applied for the MD simulations, which were carried
out with the Gromacs 4.0.4 software package. Two arylamides, the most active compound P2 and
the less active compound B3, were selected for this study. The arylamide/InhA complexes were
immersed in a box extending 3 nm from all atoms of the complexes and were then
solvated with SPC216 water molecules. Energy minimization with 2000 steps of the
steepest descent algorithm was performed for these systems with all bonds constrained. After
energy minimization, a position-restrained simulation of each system was performed. Then, 6 ns
MD simulations with a time step of 0.002 ps were performed, using the leapfrog algorithm in the
NVT ensemble at 300 K. To calculate the binding free energies, MD
simulations of the arylamides in water were also performed. The last 1 ns of simulation (5 to 6 ns) was
selected for detailed analysis.
The linear interaction energy (LIE) method was employed to estimate the relative binding
free energies of the arylamides in the InhA enzyme. The binding free energy of an inhibitor to a
receptor target based on the LIE method can be expressed using the following equation:

$\Delta G_{\mathrm{bind}} = \alpha\left(\langle V_{\mathrm{LJ}}\rangle_{\mathrm{bound}} - \langle V_{\mathrm{LJ}}\rangle_{\mathrm{free}}\right) + \beta\left(\langle V_{\mathrm{CL}}\rangle_{\mathrm{bound}} - \langle V_{\mathrm{CL}}\rangle_{\mathrm{free}}\right)$   (1)

where
⟨V_LJ⟩_bound = average Lennard-Jones energy for the ligand/protein interaction,
⟨V_LJ⟩_free = average Lennard-Jones energy for the ligand/water interaction,
⟨V_CL⟩_bound = average electrostatic energy for the ligand/protein interaction,
⟨V_CL⟩_free = average electrostatic energy for the ligand/water interaction,
α, β = scaling factors, with α = 0.18 and β = 0.50.
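
As a concrete reading of Eq. (1), here is a minimal Python sketch of the LIE estimate; the four average energies passed in are hypothetical stand-ins for the time averages that would come from the protein-bound and water-only MD trajectories.

```python
# Linear interaction energy (LIE) estimate, Eq. (1).
# The example averages (kJ/mol) are hypothetical placeholders; in practice they
# are time averages over the bound-state and free-state MD trajectories.
ALPHA, BETA = 0.18, 0.50  # scaling factors used in this work

def lie_binding_energy(vlj_bound, vlj_free, vcl_bound, vcl_free,
                       alpha=ALPHA, beta=BETA):
    """dG_bind = alpha*(<V_LJ>_bound - <V_LJ>_free)
               + beta*(<V_CL>_bound - <V_CL>_free)."""
    return alpha * (vlj_bound - vlj_free) + beta * (vcl_bound - vcl_free)

# Example call with made-up averages:
print(lie_binding_energy(vlj_bound=-160.0, vlj_free=-110.0,
                         vcl_bound=-55.0, vcl_free=-40.0))  # -16.5 kJ/mol
```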


3. RESULTS AND DISCUSSION
The root mean square deviations (RMSD) of all atoms of the InhA/NADH/B3 and
InhA/NADH/P2 complexes relative to the initial coordinates were examined as a function of
simulation time, as shown in Figures 1 and 2. NADH and the two arylamides reach
equilibrium at an early time and stay very stable thereafter. InhA in the two complexes
became stable over the remaining simulation time after an initial RMSD increase of about
0.30 nm.

Figure 1. RMSD of InhA, NADH and B3 with respect to their initial configurations as a
function of simulation time


Figure 2. RMSD of InhA, NADH and P2 with respect to their initial configurations as a
function of simulation time

Compounds B3 and P2 in the InhA binding pocket, taken from the average structure over the last 1 ns of
simulation time, are shown in Figures 3 and 4. Hydrogen bonds for compounds B3 and P2 in
the complex structures over the last 1 ns of the simulation were investigated. Two hydrogen
bonds could be observed for each of compounds B3 and P2. The hydrogen bonds between the carbonyl
oxygen atoms of compounds B3 and P2 and the hydroxyl group of Tyr158 are strongly formed.
Another hydrogen bond is formed between these oxygen atoms of the arylamides and the
hydroxyl group of the nicotinamide ribose of NADH. In the case of compound P2, hydrophobic
interactions with Met232, Trp222, Pro151 and Tyr158 were observed, whereas compound B3 could
not form these interactions. These results are consistent with the higher activity of compound
P2 as compared with compound B3.






Figure 3. Compound B3 (yellow) in the InhA binding pocket obtained from MD
simulation. NADH and Tyr158 are represented in yellow and orange,
respectively.


Figure 4. Compound P2 (yellow) in the InhA binding pocket obtained
from MD simulation. NADH and Tyr158 are represented in yellow and pink,
respectively.

The estimated binding free energies of compounds B3 and P2 were calculated by the LIE
equation. The calculated binding energies of these compounds are -24.99 and -9.18 kJ/mol,
respectively, indicating the better binding of compound P2 compared with compound B3.
These results are consistent with the higher activity of compound P2 (IC50 = 0.09 μM) as
compared with compound B3 (IC50 = 5.16 μM). To reveal the key inhibitor-enzyme interactions,
the interaction energies between compounds B3 and P2 and each amino acid residue in the InhA
binding pocket were calculated. The interactions of compounds B3 and P2 with NADH and
Tyr158 show the highest attractive energies, as shown in Table 1. These high attractive energies
correlate well with the two strong hydrogen bonds among them. These results confirm that the two
hydrogen bonds are important to hold the arylamides in the InhA binding pocket.

Table 1. Average interaction energies between compounds B3, P2 and residues
in the InhA binding pocket.

Residue   Interaction energy (kJ/mol)
             B3        P2
NADH      -98.69   -104.95
Gly96      -6.49     -7.43
Phe97      -5.17     -1.67
Met98      -2.64     -0.62
Phe149    -20.94    -32.79
Ser152     -0.04     -6.31
Arg153     -2.23     -0.32
Met155    -11.80     -7.27
Pro156    -11.75    -21.27
Ala157     -2.76    -11.32
Tyr158    -68.36    -57.51
Pro193     -8.92    -10.67
Thr196     -2.53     -2.87
Ala198     -8.07    -21.05
Ile202    -13.69     -1.86
Ile215     -5.35    -17.05
Leu218     -4.08     -2.28
Trp222     -0.06     -9.06

4. CONCLUSION
MD simulations with the Gromacs program are convenient for simulating the binding mode of
arylamides in the InhA binding pocket. The binding free energies of the arylamides to InhA
estimated by the LIE equation correlate well with their inhibition activities. Moreover, the
important drug-enzyme interactions are also delineated. These results provide insight into the
dynamics of the arylamide/InhA system, which may be useful for rational drug design.


REFERENCES
1. D.A. Rozwarski, G.A. Grant, D.H. Barton, W.R. Jacobs Jr., and J.C. Sacchettini, Science, 279 (1998), pp. 98-102.
2. B. Lei, C.-J. Wei and S.-C. Tu, J. Biol. Chem., 275 (2000), pp. 2520-2526.
3. C. Vilcheze, F. Wang, M. Arai, M.H. Hazbon, R. Colangeli, L. Kremer, T.R. Weisbrod, D. Alland, J.C. Sacchettini, and W.R. Jacobs Jr., Nat. Med., 12 (2006), pp. 1027-1029.
4. A.I. De La Iglesia and H.R. Morbidoni, Rev. Argent. Microbiol., 38 (2006), pp. 97-109.
5. X. He, A. Alian and P.R. Ortiz de Montellano, Bioorg. Med. Chem., 15 (2007), pp. 6649-6658.


ACKNOWLEDGMENTS
This research was supported by a grant under the Strategic Scholarships for
Frontier Research Network program for the Ph.D. and by the Thailand Research Fund (DBG5180022,
RTA5080005 and MRG5080267).

B00054
Proton transfer reactions and dynamics at
sulfonic acid groups of Nafion®

M. Phonyiem and K. Sagarik C

School of Chemistry, Institute of Science, Suranaree University of Technology, Nakorn Ratchasima 30000, Thailand
C E-mail: kritsana@sut.ac.th; Fax: (6644)224635; Tel. (6644) 224635



ABSTRACT
The polymer electrolyte membrane widely used in proton exchange membrane fuel cells
(PEMFCs) is Nafion®. The sulfonic acid groups (-SO3H) in Nafion® are preferentially
hydrated and play important roles in proton conduction in PEMFCs. Although
extensively studied using various theoretical and experimental techniques, the mechanisms
of proton transfer in Nafion® are not well understood. In the present work, elementary
reactions of the proton transfer processes at a -SO3H group were investigated, using
complexes formed from triflic acid (CF3SO3H), H3O+ and nH2O, n = 1-3, as model
systems. The elementary reactions and dynamics of the H-bond protons susceptible to
proton transfer were analyzed based on Born-Oppenheimer MD (BOMD) simulations at
350 K. It was observed that quasi-dynamic equilibriums were established between
precursors and transition state complexes, and these could be considered as the rate
determining steps.

Keywords: proton transfer, sulfonic acid, Nafion®, PEMFCs, ab initio MD simulations.





1. INTRODUCTION
The proton exchange membrane fuel cell (PEMFC) is considered to be one of the most
promising clean and high-efficiency power generators for diverse applications such as
transportation, residential, and portable devices [1]. Although some basic information on
proton transfer reactions in aqueous solutions has been quite well documented [2-5], the
mechanisms of proton transport in Nafion®, especially at the molecular level, are not well
understood. This is extremely important for the development of the next generation of PEMFCs.
The PEM widely utilized in PEMFCs is Nafion®. Separation of hydrated Nafion® into
hydrophilic and hydrophobic domains allows theoretical and experimental investigations to
focus attention only on the hydrophilic domains, where the proton transfer reactions take place
[6]. In order to obtain information on the characteristics of proton transfer processes at a
hydrophilic functional group of Nafion®, the structures, energetics and dynamics of the precursors
and transition states in the proton transfer pathways were theoretically studied, using the H-
bond complexes formed from CF3SO3H, H3O+ and nH2O, n = 1-3, as model systems. Since
vibrational spectroscopy has been one of the most powerful techniques in H-bond research
[7], due to the fact that the most evident effects of A-H···B H-bond formation are the red
shifts of the A-H stretching mode, accompanied by an intensity increase and band broadening
[8-9], it was the interest of the present work to analyze the characteristics of the A-H
stretching frequencies in the gas phase and in aqueous solutions. These could lead to valuable
information on proton transfer mechanisms in H-bonds [10].


2. THEORY AND RELATED WORKS
Proton dissociations in minimally hydrated Nafion® were theoretically studied by
performing ab initio molecular dynamics (MD) simulations on triflic acid monohydrate solid
((CF3SO3−·H3O+)4) [11]. The MD results showed a relay-type mechanism, in which a proton
defect represents an intermediate state, as the most important step in the proton transfer process;
the defect involves the formation of the Zundel complex (H5O2+) and the reorganization of the
neighboring groups, which share a proton between the oxygen atoms of the anionic sites. The
proposed mechanism also showed a possibility for proton transfer along the hydrophilic head
groups, -SO3H and -SO3−. Elementary steps of proton transfer reactions at -SO3H of Nafion®
were investigated by performing BOMD simulations on the H-bond complexes formed from
triflic acid (CF3SO3H), H3O+ and H2O. It was found that, at 298 K, proton transfer reactions at
-SO3H are not concerted, due to thermal energy fluctuations, leading to quasi-dynamic
equilibriums among precursors, transition state complexes and products. Most importantly,
-SO3H could directly and indirectly mediate proton transfer reactions through the formation of
proton defects, as well as the SO3− and SO3H2+ transition states [12].


3. COMPUTATIONAL DETAILS
Our experience [12-13] indicated that the H-bonds at -SO3H of Nafion® could be studied
reasonably well by taking the following three basic steps: (1) searching for all possible
precursors and transition state complexes in the proton transfer pathways using appropriate
pair potentials, such as the T-model potentials [14]; (2) refinement of the computed
structures using DFT calculations at the B3LYP/TZVP level; (3) BOMD simulations using
the refined structures as the starting configurations.
In the present work, the conductor-like screening model (COSMO) was used to take into
account the electrostatic effects introduced by the aqueous solvent. Proton transfer reactions were
characterized and analyzed based on the asymmetric stretching coordinates and IR
frequencies of the transferring protons. All calculations were made using the
TURBOMOLE 6.0 software package [15].
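
To illustrate the kind of trajectory analysis described above, the following Python sketch computes an asymmetric stretching coordinate and a crude IR estimate from a proton trajectory. It assumes one common convention for d_DA (the difference of the two O-H distances in the O-H···O bridge) and approximates the spectrum by the Fourier transform of the proton velocity autocorrelation function; the paper's exact definitions may differ.

```python
# Sketch: asymmetric stretching coordinate and a crude IR estimate for the
# transferring proton from a BOMD trajectory (assumed conventions; see text).
import numpy as np

def d_da(r_donor_h, r_acceptor_h):
    """Asymmetric stretching coordinate as the difference of the two O-H
    distances in the O-H...O bridge (negative while the proton stays on
    the donor, under this sign convention)."""
    return r_donor_h - r_acceptor_h

def ir_spectrum(velocities, dt_fs):
    """Crude IR estimate: power spectrum of the proton velocity
    autocorrelation.  velocities: (n_steps, 3) array; dt_fs: step in fs."""
    v = velocities - velocities.mean(axis=0)
    acf = sum(np.correlate(v[:, i], v[:, i], mode="full") for i in range(3))
    acf = acf[acf.size // 2:]                 # keep non-negative lags only
    power = np.abs(np.fft.rfft(acf))
    # Frequency axis: cycles/fs -> Hz -> wavenumbers (cm^-1), c = 2.9979e10 cm/s.
    freqs_cm = np.fft.rfftfreq(acf.size, d=dt_fs) * 1e15 / 2.9979e10
    return freqs_cm, power
```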


4. RESULTS AND DISCUSSION

Equilibrium structures and vibrational frequencies
Fig. 1 shows the refined equilibrium structures and interaction energies (ΔE) of the H-bond
complexes in the gas phase and in continuum aqueous solutions, together with the
asymmetric stretching coordinates (d_DA) and asymmetric O-H stretching frequencies (ν_OH)
of the H-bond proton. The static proton transfer potentials (B3LYP/TZVP) suggested two
basic H-bond structures (cyclic and linear) as the potential transition state
complexes. The tendency of proton transfer at -SO3H seems to be higher in continuum
aqueous solutions. The relationship between d_DA and R_O-O is shown in Fig. 2a. The internal
and external H-bonds could be approximated by linear functions, with a separation of 2.55 Å.
The relationship between ν_OH and R_O-O is shown in Fig. 2b. It could be approximated by an
exponential function similar to the integral rate expression for a first-order reaction.













Figure 1. Refined equilibrium structures, interaction energies (ΔE), asymmetric stretching
coordinates (d_DA) and asymmetric O-H stretching frequencies (ν_OH) for the
CF3SO3H-H3O+-H2O 1:1:1 complexes (1a)-(1c), 1:1:2 complexes (2a)-(2c) and
1:1:3 complexes (3a)-(3c), obtained from B3LYP/TZVP geometry optimizations.
H-bond distances, energies and IR frequencies are in Å, kJ/mol and cm^-1,
respectively. The values in parentheses are the results in continuum aqueous solutions.
[The per-structure ΔE, O···O, d_DA and ν_O-H annotations appear in the figure itself.]
















Figure 2. (a) Plot of asymmetric stretching coordinates (d_DA) against O-H···O H-bond
distances (R_O-O), obtained from B3LYP/TZVP calculations. (b) Plot of asymmetric O-H
stretching frequencies (ν_OH) against O-H···O H-bond distances (R_O-O).

Dynamics of proton transfer reactions
Here, the BOMD results on the proton transfer reactions in the CF3SO3H, H3O+ and H2O
complexes are explained, using the CF3SO3H-H3O+-H2O 1:1:2 complex (structure 2a in Fig.
1) as an example. For structure 2a, two important proton transfer mechanisms were observed
in the course of the BOMD simulations: the pass-by and pass-through mechanisms. They are
labeled (1) and (2) in Fig. 1, respectively. The IR spectra of the transferring proton are
shown in Fig. 3. For the pass-by mechanism in the gas phase (Fig. 3a), two asymmetric O-H
stretching frequencies are seen, at ν_OH = 1115 and 2053 cm^-1, whereas in
continuum aqueous solutions (Fig. 3b), ν_OH are located at 967 and 1683 cm^-1. These could be
considered as the IR spectral signatures of the transferring proton in the gas phase and
continuum aqueous solutions, respectively, with considerable red shifts in continuum aqueous
solutions. For the pass-through mechanism in structure 2a, the IR spectral signatures in the
gas phase and continuum aqueous solutions are shown in Figs. 3c and 3d, respectively. It
appeared that the characteristic IR spectral signatures are seen only in continuum aqueous
solutions, at ν_OH = 972 and 1717 cm^-1. Based on the IR spectral signatures, one could
conclude that, for structure 2a, proton transfer reactions through the pass-by and pass-through
mechanisms are more preferential in continuum aqueous solutions.






















Figure 3. Symmetric, asymmetric O-H and O-O stretching frequencies of the H-bond proton
in the CF3SO3H-H3O+-H2O 1:1:2 complex (2a), obtained from BOMD simulations at 350 K.
(a)-(b) For the pass-by mechanism in the gas phase and continuum aqueous solution,
respectively. (c)-(d) For the pass-through mechanism in the gas phase and continuum aqueous
solution, respectively.


5. CONCLUSION
Dynamics and mechanisms of proton transfer reactions in Nafion® were studied using the
H-bond complexes formed from CF3SO3H, H3O+ and nH2O, n = 1-3, as model systems.
Based on the static information obtained from DFT calculations at the B3LYP/TZVP level,
several transition states, with linear and cyclic H-bond arrangements, were proposed to be
potentially involved in the proton transfer reactions. BOMD simulations at 350 K suggested
two important proton transfer pathways, namely the pass-by and pass-through
mechanisms. The analyses of the characteristic IR spectral signatures obtained from BOMD
simulations at 350 K, between 800 and 2200 cm^-1, indicated that proton transfer reactions
through the pass-by and pass-through mechanisms are generally more preferential in
continuum aqueous solutions. In the present theoretical study, BOMD simulations with
appropriate IR analyses have proved to be powerful tools for the investigation of proton
transfer reactions and could be applied to more complex systems.
REFERENCES
1. Larminie, J. and Dicks, A., Fuel Cell Systems, John Wiley & Sons Ltd, Chichester, 2001.
2. Kim, E., Weck, P., Balakrishnan, N. and Bae, C., J. Phys. Chem. B, 2008, 112(11), 3283-3286.
3. Kreuer, K. D., Paddison, S. J., Spohr, E. and Schuster, M., Chem. Rev., 2004, 104, 4637-4678.
4. Kreuer, K. D., Solid State Ionics, 2000, 149, 136-137.
5. Agmon, N., Chem. Phys. Lett., 1995, 244, 456.
6. Mauritz, K. A. and Moore, R. B., Chem. Rev., 2004, 104, 4535-4585.
7. Okumura, M., Yeh, L. I., Myers, J. D. and Lee, Y. T., J. Phys. Chem., 1990, 94, 3416-3418.
8. Wu, C. C., Chaudhuri, C. J., Jiang, C., Lee, Y. T. and Chang, H. C., J. Phys. Chem. A, 2004, 108, 2859.
9. Asbury, J. B., Steinel, T. and Fayer, M. D., J. Lumin., 2004, 107, 27.
10. Buzzoni, R., Bordiga, S., Ricchiardi, G., Spoto, G. and Zecchina, A., J. Phys. Chem., 1995, 99, 11937.
11. Eikerling, M., Paddison, S. J., Pratt, L. R. and Zawodzinski Jr., T. A., Chem. Phys. Lett., 2001, 368, 108-114.
12. Sagarik, K., Phonyiem, M., Lao-ngam, C. and Chaiwongwattana, S., Phys. Chem. Chem. Phys., 2008, 10, 2098-2112.
13. Chaiwongwattana, S. and Sagarik, K., Phys. Chem. Chem. Phys., 2009, 335, 103-117.
14. Sagarik, K. P. and Rode, B. M., Chem. Phys., 2000, 260, 159.
15. TURBOMOLE V6.0 2009, a development of University of Karlsruhe and Forschungszentrum Karlsruhe GmbH, 1989-2007, TURBOMOLE GmbH, since 2007; available from http://www.turbomole.com.


ACKNOWLEDGMENTS
Financial support from the Thailand Research Fund (TRF) through the Advanced Research
Scholarship, Grant No. BRG5180022, and the Royal Golden Jubilee (RGJ) Ph.D. Program, Grant
No. PHD/0110/2548, to Mayuree Phonyiem and Prof. Kritsana Sagarik is gratefully
acknowledged. All facilities and computer resources provided by the following organizations
are also gratefully acknowledged: the Computational Chemistry Research Laboratory (CCRL),
School of Chemistry, SUT; the National Electronics and Computer Technology Center (NECTEC);
and the Thai National Grid Center (THAIGRID).


B00055
Proton Conduction at Sulfonate Group of Nafion®

Ch. Lao-ngam and K. Sagarik C

School of Chemistry, Institute of Science, Suranaree University of Technology, Nakorn Ratchasima, Thailand 30000
C E-mail: kritsana@sut.ac.th; Fax: (6644)224635; Tel. (6644)224635



ABSTRACT
Proton transfer reactions at a hydrophilic group of a polymer electrolyte membrane
(PEM) were studied using theoretical methods. The study began with investigations
of the dynamics of the most basic units of the hydrated proton in aqueous solutions, namely
the Zundel and Eigen complexes. Density functional theory (DFT) at the
B3LYP/TZVP level was applied in BOMD simulations, from which the IR
spectroscopic signatures and diffusion coefficients of the active protons were analyzed.
Based on the same approach, proton transfer reactions at a sulfonate group (-SO3−) of
Nafion® were studied, using the microhydrated complexes formed from the triflate ion
(CF3SO3−), H3O+ and H2O as model systems. At low hydration levels, Born-
Oppenheimer MD (BOMD) simulations suggested that quasi-dynamic equilibriums
among these species represent the elementary reactions of proton transfer.
BOMD simulations also suggested that, in continuum aqueous solutions, the proton
transfer pathways involve the CF3SO3−-H3O+-H3O+ 1:1:1 complex as the most active
transition state complex, and that fluctuations of the coordination number at H3O+ could
help promote proton conduction, as in the case of the hydrated proton in aqueous solutions.
The IR spectroscopic signatures of the transferring proton are represented by well defined
asymmetric O-H stretching frequencies, with threshold frequencies at 2000 cm^-1.

Keywords: proton transfer, Nafion®, BOMD simulations, sulfonate group, PEMFC.





REFERENCES
1. Larminie, J., and Dicks, A., Fuel Cell Systems, John Wiley & Sons Ltd., Chichester, 2001.
2. Vincent, C.A., and Scrosati, B., Modern Batteries: An introduction to electrochemical
power sources, John Wiley & Sons Ltd., New York, 1997.
3. Koppel, T., Powering the Future: The ballard fuel cell and the race to change the world,
John Wiley & Sons Ltd., New York, 1999.
4. Mauritz, K.A., and Moore, R.B., Chem. Rev., 2004, 104, 4535.
5. Paddison, S.J., Annu. Rev. Mater. Res., 2003, 33, 289.
6. Kreuer, K.D., Chem. Mater., 1996, 8, 610.
7. Kreuer, K.D., Paddison, S.J., Spohr, E., and Schuster, M., Chem. Rev., 2004, 104, 4637.
8. Agmon, N., Chem. Phys. Lett., 1995, 244, 456.





Computational Mathematics

C00002
A Numerical Computation for Water Quality Model in a
Non-Uniform Flow Stream Using MacCormack Scheme

Nopparat Pochai 1,C, Sureerat A. Konglok 2 and Suwon Tangmanee 3

1 Department of Mathematics, King Mongkut's Institute of Technology Ladkrabang, 10520, Thailand
2 Department of Mathematics, Statistics and Computer, Ubon Ratchathani University, 34190, Thailand
3 Center of Excellence in Mathematics, Mahidol University, 10400, Thailand
C E-mail: konoppar@kmitl.ac.th; Fax: 02-3264341 ext 284; Tel. 02-3264341 ext 283



ABSTRACT
Water quality assessment problems often involve a mathematical model. The
governing equation for uniform flow is most commonly the one-dimensional advection-dispersion-
reaction equation (ADRE). In this research, the effect of non-uniform water flow in
a stream is considered. A couple of mathematical models are used to simulate
pollution due to sewage effluent. The first is a hydrodynamic model that provides the
velocity field and elevation of the water flow. The second is an advection-dispersion-
reaction model that gives the pollutant concentration fields after inputting the velocity data
from the first model. As numerical schemes, the Crank-Nicolson method for the
hydrodynamic model system and implicit schemes for the dispersion model are used. The
MacCormack implicit schemes are revised from two computation techniques for
uniform flow stream problems: the forward time forward space (FTFS) and backward time
central space (BTCS) schemes for the dispersion model. The advantage of this scheme is
its stability. An application to a real-world problem is presented.

Keywords: Finite differences, Crank-Nicolson scheme, Implicit scheme, Forward time
forward space scheme, Backward time central space scheme, MacCormack scheme,
One-dimensional, Hydrodynamic model, Advection-dispersion-reaction.
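
As a sketch of the predictor-corrector idea named in the abstract above, the following Python code implements the classical explicit MacCormack step for the one-dimensional advection-dispersion-reaction equation; the revised implicit variant described in the paper is not reproduced here.

```python
# Classical explicit MacCormack predictor-corrector for the 1-D
# advection-dispersion-reaction equation  c_t + u c_x = D c_xx - k c.
# This is the textbook scheme, not the paper's revised implicit variant.
import numpy as np

def maccormack_step(c, u, D, k, dx, dt):
    """Advance the concentration field c one time step (ends held fixed)."""
    cp = c.copy()
    # Predictor: forward difference in space for the advective term.
    cp[1:-1] = (c[1:-1]
                - u * dt / dx * (c[2:] - c[1:-1])
                + D * dt / dx**2 * (c[2:] - 2*c[1:-1] + c[:-2])
                - k * dt * c[1:-1])
    cn = c.copy()
    # Corrector: backward difference in space, averaged with the old level.
    cn[1:-1] = 0.5 * (c[1:-1] + cp[1:-1]
                      - u * dt / dx * (cp[1:-1] - cp[:-2])
                      + D * dt / dx**2 * (cp[2:] - 2*cp[1:-1] + cp[:-2])
                      - k * dt * cp[1:-1])
    return cn
```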



REFERENCES
1. Dehghan, M., Numerical schemes for one-dimensional parabolic equations with
nonstandard initial condition, Applied Mathematics and Computation, 2004, 147(2), 321-
331.
2. Li, G. and Jackson, C.R., Simple, accurate and efficient revisions to MacCormack and
Saulyev schemes: High Peclet numbers, Applied Mathematics and Computation, 2007,
186, 610-622.
3. Pochai, N., Tangmanee, S., Crane, L.J. and Miller, J.J.H., A mathematical model of water
pollution control using the finite element method, Proceedings in Applied Mathematics
and Mechanics, 2006, 6(1), 755-756.
4. Pochai, N., Tangmanee, S., Crane, L.J. and Miller, J.J.H., A Water Quality Computation
in the Uniform Channel, Journal of Interdisciplinary Mathematics, 2008, 11(6), 803-814.
5. Pochai, N., A Numerical Computation of Non-dimensional Form of Stream Water Quality
Model with Hydrodynamic Advection-Dispersion-Reaction Equations, Journal of
Nonlinear Analysis: Hybrid System, 2009, 3, 666-673.
C00004
Tidal Analysis with Error Estimates:
Local and Repositories Variations

S. Sirisup C, S. Tomkratoke and N. Harnsamut
Large-Scale Simulation Research Laboratory, National Electronics and Computer Technology Center, 112 Thailand Science Park, Klong 1, Klong Luang, Pathumthani 12120, Thailand
C E-mail: sirod.sirisup@nectec.or.th; Fax: 662-5646776; Tel. 662-5646900



ABSTRACT
The tidal signal records from tide gauge stations are very useful for understanding
estuarine and coastal evolution. However, non-tidal noise can be regularly
introduced into the tidal signals by recording and transcription errors as well as by other climatic
events, which can alter the results of the tidal analysis. In this work, we employ
tidal analysis with error estimates in order to investigate the local tidal variability from
each tidal gauge in the Gulf of Thailand. We also compare the variability of the
tidal signals recorded by nearby stations but from different tidal repositories.
Keywords: Tidal analysis, Error estimate, Coastal evolution.



1. INTRODUCTION
The need to understand the dynamics of the coastal ocean and estuaries has
increased recently. One of the triggering points is to understand how these areas will react and
respond to the changing climate. Understanding both coastal area and estuarine evolution will
greatly benefit management purposes [1]. These, however, pose some dynamical
challenges. Tides provide the longest instrumental records available to address such problems.
However, to fully utilize tidal records, it is necessary to develop better analysis strategies,
because these records are often short, sparse and/or noisy. Non-tidal noise can be regularly
introduced into the tidal signal by recording and transcription errors or other causes.
Tide records are also used as validation tools in many coastal ocean and regional
ocean simulations. They thus play an important role in determining the governing parameters
used in those simulations. Tidal records with less interference from non-tidal noise are
preferred at this stage.
In this study, we carry out the harmonic analysis of tidal signals with error estimates
for the tidal model standard parameters (amplitudes and the Greenwich phase) of tidal records
located in the Gulf of Thailand. The results of the analysis can benefit both the coastal ocean
simulation and coastal ocean management communities.
The paper is organized as follows: in the next section we discuss the tidal model
used in this investigation, as well as techniques to derive the error estimate of the non-tidal noise
and the confidence intervals of the standard parameters. In section 3, the data
sets used in the current analysis are discussed. In section 4, we provide the analysis results and
discussion. Finally, we close the paper with a brief summary.

2. MODEL AND ERROR ESTIMATE
2.1 Tide model
Following the concept of [2], in this paper we assume that the tidal response can be modeled as

$x(t) = b_0 + b_1 t + \sum_{k=1}^{N} \left[ (a_k + \bar a_k)\cos(\sigma_k t) + i\,(a_k - \bar a_k)\sin(\sigma_k t) \right]$   (1)

where N is the number of constituents included in the tidal model, $\sigma_k$ is the frequency of
constituent k, and for tidal signals (real time series) $\bar a_k$ is the complex conjugate
of $a_k$. Here, the first two terms are introduced to handle any offset and linear trend that may
exist in the tidal signals. Also, for a tidal signal (scalar time series), the above formula can be
converted to the standard (ellipse) parameters, amplitude and phase, as

$A_k = |a_k| + |\bar a_k|$   (2)
$g_k = v_k - \mathrm{ang}(a_k)$   (3)

where $v_k$ is the equilibrium phase.
For a given tidal signal, the coefficients $a_k$ and $\bar a_k$ can be found by imposing a least
squares fit to that particular tidal signal. After obtaining these coefficients, many post-processing
steps can also be applied in order to improve the estimation of these coefficients
if more information becomes available. However, we do not perform those steps in this
investigation.
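
A minimal least-squares harmonic fit in the spirit of Eq. (1) can be sketched in Python as follows; the constituent frequencies are assumed known inputs, and the astronomical (equilibrium-phase) correction of Eq. (3) that yields the Greenwich phase is omitted.

```python
# Least-squares harmonic fit of the tidal model: offset, linear trend, and one
# cos/sin pair per constituent.  A sketch, not the T_TIDE implementation.
import numpy as np

def fit_tide(t, x, omegas):
    """t: times (hours); x: sea level; omegas: angular frequencies (rad/hour).
    Returns offset, trend, and per-constituent amplitude and local phase."""
    cols = [np.ones_like(t), t]
    for w in omegas:
        cols += [np.cos(w * t), np.sin(w * t)]
    A = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(A, x, rcond=None)
    b0, b1 = beta[0], beta[1]
    coeffs = beta[2:].reshape(-1, 2)                       # rows: [C_k, S_k]
    amp = np.hypot(coeffs[:, 0], coeffs[:, 1])             # A_k
    phase = np.degrees(np.arctan2(coeffs[:, 1], coeffs[:, 0]))
    return b0, b1, amp, phase
```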

2.2 Error estimate and confidence interval
In order to make better estimates of tidal behavior in a coastal or estuary region of interest,
one must seek coefficients that provide the best representation of the true tidal energy
and drop all interference from non-tidal signals that may coalesce during the sampling period.
To justify this, we follow the work of [2]: we must first estimate the
characteristics of the non-tidal or residual noise that affects the tide model coefficients.
Secondly, we need to transform that estimate into confidence intervals for the standard tidal
parameters (amplitude and phase) via a non-linear mapping.
If we assume that the residuals are statistically Gaussian and uncorrelated in time, the
total residual power, or standard error of the residuals, is $\sigma_x^2 = P/\Delta t$, where P is the two-
sided spectral density and $\Delta t$ is the time between two samplings. The amplitude coefficients of
the tide model can be faultily derived because of errors from unresolved noise components
within the frequency interval $\Delta f = 1/(N\,\Delta t)$. Thus, the standard errors of the amplitude
coefficients are $\sigma^2_{(a_k+\bar a_k)} = \sigma^2_{i(a_k-\bar a_k)} = 2\sigma_x^2/N$. However, for statistically non-Gaussian
residuals, the value of P can be numerically derived by making a spectral estimate from the
residual time series and averaging the frequency power over a window around the frequency
of a given constituent.
Next, the transformation of the previous errors to the standard (ellipse) parameters
(amplitude and Greenwich phase) is done by a non-linear function,
$\eta = F\!\left(a_k + \bar a_k,\ i(a_k - \bar a_k)\right)$, where $\eta$ is either standard parameter. In order to derive the
standard errors of the standard parameters, the parametric bootstrap technique [3] is
introduced to handle this non-linear function. The 95% confidence interval is then
derived after the standard errors are obtained.
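
The parametric bootstrap step can be sketched as follows, assuming Gaussian coefficient errors with the standard error derived above; this illustrates the idea only and is not the T_TIDE implementation.

```python
# Parametric bootstrap for 95% confidence intervals of amplitude and phase:
# perturb the fitted cos/sin coefficients with Gaussian noise of the estimated
# coefficient variance, then map through the non-linear amplitude/phase
# transformation.  A sketch of the idea in Section 2.2.
import numpy as np

def bootstrap_ci(C, S, sigma_coef, n_boot=10000, rng=None):
    """C, S: fitted cos/sin coefficients of one constituent;
    sigma_coef: coefficient standard error (from the residual spectrum)."""
    rng = rng or np.random.default_rng(0)
    Cb = C + sigma_coef * rng.standard_normal(n_boot)
    Sb = S + sigma_coef * rng.standard_normal(n_boot)
    amps = np.hypot(Cb, Sb)
    phases = np.degrees(np.arctan2(Sb, Cb))
    lo_a, hi_a = np.percentile(amps, [2.5, 97.5])
    lo_p, hi_p = np.percentile(phases, [2.5, 97.5])
    return (lo_a, hi_a), (lo_p, hi_p)
```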



3 DATA AND COMPUTATIONAL DETAILS
The Marine Department and the Hydrographic Service Department of the Royal Thai Navy
have provided us with tidal signals recorded hourly from the tide gauge stations deployed
throughout the Gulf of Thailand region. In order to investigate the local variations in the
data for each year, we have investigated two one-year hourly-recorded data sets: 2005 and
2008. In the current study, we use the following stations from the Marine Department:
Prasae, Si Chol, Pak Panang, and Lang Suan. In order to measure the inter-repository
variabilities across different repositories, the following tidal gauge stations from the Hydrographic
Service Department of the Royal Thai Navy have also been investigated: Laem Sing, Ko Lak
and Ko Prap. The stations are chosen so that the distance between two stations, one from the
Marine Department repository and the other from the Hydrographic Service Department of
the Royal Thai Navy, is relatively small. For the inter-repository variability study, we
perform the investigation for the year 2005 only. The locations of the stations are
shown in Figure 1.

Figure 1. Location of the tide gauge stations.

We have also compared our investigation with the global model of ocean tides, OTIS
TPXO7.2 [4]. However, the confidence interval for this data set cannot be determined, so
we use this information as a reference only.
In order to perform the tidal analysis with error estimates, we apply the T_TIDE toolbox [2]
to our data sets. The results of the analysis are presented and discussed in the next section.



4 RESULTS AND DISCUSSION
The results of the analysis of the tide gauge stations from the Marine Department are
presented in Figures 2, 3 and 4 for the tide model amplitudes and Figures 5, 6 and 7 for the tide
model phases. It is noted here that negative degrees in phase are to be read modulo 360 degrees. In
Figure 2, the means of the predicted tidal model amplitudes together with their 95% confidence
intervals are presented for the year 2005; Figure 3 shows the same for the year 2008. From
these figures, the confidence intervals of each constituent are small, indicating that there is no
large interference from non-tidal noise. However, this is not true for the Sa constituent at Si Chol:
even though, in the current analysis, we can detect the Sa constituent only for the year
2008, its 95% confidence interval is wide. If we compare the two nearest tidal gauge
stations, Si Chol and Pak Panang, we can see that the 95% confidence intervals of the
coefficients of the semi-diurnal constituents lie close to each other but those of the diurnal
constituents do not, especially in the year 2005. There is a tendency for those of Pak
Panang to be lower than those of Si Chol.

Figure 2. Means of predicted amplitudes and 95% CIs for tide signals in year
2005.

Figure 3. Means of predicted amplitudes and 95% CIs for tide signals in year
2008

Figure 4. OTIS: Amplitude.
C00004
ANSCSE14 Mae Fah Luang University, Chiang Rai, Thailand
March 23-26, 2010
207

Figure 5. Means of predicted phases and 95% CIs for tide signals in year 2005.


Figure 6. Means of predicted phases and 95% CIs for tide signals in year 2008

Figure 7. OTIS: Phase.

However, the situation improves and good agreement with the OTIS data, Figure 4, is
found in the year 2008. We also compare the harmonic analysis with [5], which
gives the harmonic results for the tidal signals in the year 1997. However, only the coefficients
and the phases for the M2, S2, K1 and O1 constituents are available. We find that the coefficients of the
semi-diurnal constituents in the year 1997 fall into the 95% confidence intervals of those in
the years 2005 and 2008 as well. However, this is not true for the coefficients of the
diurnal constituents. The magnitudes of the coefficients of these constituents are about 10
centimeters lower. In some cases, this can lead to an error in the form number.
In Figures 5 and 6, the means of the predicted tidal model phases together with their 95%
confidence intervals are presented for the years 2005 and 2008, respectively. Most of the
intervals are small, except for the shallow water constituents and the less significant semi-diurnal
and diurnal constituents. Now let us focus on the two nearby locations in the current data set:
Si Chol and Pak Panang. In both years the 95% confidence intervals of the tide model phases
for these locations are not really close. In contrast, the results from the OTIS in Figure 7
indicate that the tide model phases for these locations should be close to each other.
So far, we have investigated the local variations in the tide data set obtained from the
Marine Department, and we have compared the analysis results with those from the
OTIS data. However, in order to make the inter-repository comparison, the comparison with
the other repository, the Hydrographic Service Department of the Royal Thai Navy, must also
be done. The results of this comparison will, however, be deferred until the conference.



5 CONCLUSION
In this study, we perform harmonic analysis with error estimates on the tide data in
order to investigate the local tidal variability at each of the tide gauge stations scattered around
the Gulf of Thailand. The means of the predicted standard parameters and the 95% confidence
interval for each have been determined from the spectral estimation of the non-tidal noise
and the parametric bootstrap technique. We have performed the analysis on tidal data sets
from the years 2005 and 2008 for four tide gauge stations (from the Marine Department). The
95% confidence intervals for the coefficients and phases of the constituents in the tide model are
provided and compared. It is found that almost all of the 95% confidence intervals for the
coefficients are narrow, except that for the Sa constituent. With the harmonic analysis with
error estimates, we can also compare and observe whether there are any irregular characteristics in
the standard parameters of the tide signals from the two nearest stations. The results on the
inter-repository variations together with the full discussion will be reported at the
conference.



REFERENCES
1. Schwartz, M. (Eds), Encyclopedia of Coastal Science, Springer, 2005.
2. R. Pawlowicz, B. Beardsley and S. Lentz, Computers & Geosciences, 2002, 28, 929937.
3. B. Efron and R.J. Tibshirani, An introduction to the Bootstrap, Chapman & Hall, 1993
4. G. D. Egbert and S. Y. Erofeeva, Journal of Atmospheric and Oceanic Technology, 2002,
19 (2), 183-204
5. J. Phaksopa, Storm surge in the gulf of Thailand generated by typhoon Linda in 1997
using POM, M.Sc Thesis, Chulalongkorn University, 2003.



ACKNOWLEDGMENTS
The authors would like to thank both the Marine department and the Hydrographic service
department of the Royal Thai Navy for providing us the data from their tidal gauge stations.
C00009
Comparison of Reversible Feature Extraction Techniques
Applied to Anatomical Shape Modelling

J. Chiverton 1,C
1 School of Information Technology, Mae Fah Luang University, 333, Moo 1, Thasud, Muang, Chiang Rai, 57100, Thailand
C E-mail: johnc@mfu.ac.th



ABSTRACT
This paper compares the ability of various reversible feature extraction techniques to
accurately retain the relevant information required to describe anatomical shapes. The
particular emphasis of this work is on anatomical shapes and whether their features
can be effectively extracted using linear and non-linear feature extraction and
reconstruction techniques.
Modelling the shape of anatomy is useful for the biomedical field and in particular can
improve computerized medical image analysis tasks. Early shape modelling techniques
often described the variation of a set of shapes with respect to a single mean shape
using Principal Component Analysis (PCA). These techniques work well for data sets
with linear covariations.
More recent popular shape modelling techniques have utilized various manifold
learning techniques to extract relevant features. These techniques describe the variation
of a set of shapes with respect to each other rather than global means and have been
used to learn the principal modes of variations of data sets exhibiting non-linear
covariations. These non-linear shape variations have been found to be modelled more
accurately with kernel based techniques where a kernel is used to approximate the local
linearity in the feature space, and hence identify variables that differ most between
points in the feature space that have close proximity in the manifold.
This paper therefore presents a comparison of these techniques applied to white matter
extracted from MRI data. Shape is represented by the signed distance function as this
has been found to be a computationally and mathematically convenient and popular
representation of shape.

Keywords: Linear and non-linear shape modelling, Principal Component Analysis,
Kernel PCA



1. INTRODUCTION
Modelling the shape of objects is an important component in many computational
systems, including medical equipment with applications such as radiotherapy planning and
pre-surgical planning. Anatomical shapes can be particularly complex to model as the shape
of anatomy can often fully deform in three dimensions and will usually vary between
individuals and even within individuals at different times in their lives. Nevertheless the shape
of anatomy, e.g. the white matter of the human brain possesses significant similarity between
individuals and over the course of a person's life. Furthermore medical professionals are able
to recognize and or extract relevant information from the complex structures of the human
body which they can use in a diagnosis or as part of developing an understanding of a
patient's condition.

2. THEORY AND RELATED WORKS
Principal Component Analysis (PCA) and Kernel Principal Component Analysis (KPCA)
[1] can be used to describe shape using various different approaches. PCA is a very popular
technique, e.g. [2], and has been used extensively for modelling linear and even some non-
linear problems. However, the data are usually assumed to be Gaussian if PCA is being used.
KPCA, along with many other techniques, is able to model non-linear, non-Gaussian
distributed data, see e.g. [1,2,4,5,6]. KPCA shares many similarities with PCA in that both
can be derived using a dot product formulation. A review of the theory behind KPCA is
given shortly and closely follows [1,2].

The shape of human anatomy can be described by various different data structures. One
popular representation of shape is the signed distance function, which can also be used to a
good approximation as a level set. A shape can be represented by a signed distance function
as it describes the distance from the boundary of the shape. There are various computationally
efficient algorithms to compute the signed distance function. Many algorithms also exist to
compute the unsigned distance function, see e.g. [3]. The work in [3] is extended here to
signed distances, where points inside the shape take positive distances from the shape
boundary and points outside of the shape take negative distances. This extension is trivial but
it nevertheless encodes further implicit information into the shape representation. A signed
distance is therefore defined here by

$\phi(\vec w) = s(\vec w)\, d(\vec w)$   (1)

where $d(\vec w)$ is the distance to the closest boundary point of the shape from point $\vec w$ and
$s(\vec w) = \pm 1$, which is negative if the point $\vec w$ is not inside the shape. Each point $\vec w$ is defined
here as 2 dimensional (2D), in Cartesian form.
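
One standard way to realize Eq. (1) on a binary mask is via two Euclidean distance transforms, as in this minimal Python sketch (the paper instead extends the algorithm of [3]):

```python
# Signed distance function of a binary shape mask, Eq. (1):
# positive inside the shape, negative outside.
import numpy as np
from scipy.ndimage import distance_transform_edt

def signed_distance(mask):
    """mask: 2D boolean array, True inside the shape."""
    inside = distance_transform_edt(mask)    # distance of inside pixels to boundary
    outside = distance_transform_edt(~mask)  # distance of outside pixels to shape
    return inside - outside                  # s(w) * d(w)
```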
Next, the signed distance function is considered to define a matrix $\Psi$ with
$\Psi(i,j) = \phi(\vec w_{ij})$, as the shape information is considered fixed for an instance of an anatomical
shape. This matrix is then vectorized, $\vec\psi = (\Psi_{11}\ \Psi_{21}\ \ldots\ \Psi_{PQ})^T$, to enable feature
extraction with Principal Component Analysis (PCA) or Kernel Principal Component Analysis (KPCA).
If there are M observed anatomical shapes ($\vec\psi_1, \ldots, \vec\psi_M$) then PCA can analyze and extract up
to M eigenvectors from the covariance matrix of the anatomical shapes. The sample
covariance matrix is calculated here by (assuming zero means)

$S = \frac{1}{M}\sum_{k=1}^{M} \vec\psi_k \vec\psi_k^{\,T}.$   (2)

PCA can be defined as diagonalizing the covariance matrix, which is the same as projecting
the information (or variations) in the data onto a small number of dimensions. In Kernel PCA
(KPCA) the eigenvectors are calculated from a matrix of the form

$C = \frac{1}{M}\sum_{k=1}^{M} \Phi(\vec\psi_k)\,\Phi(\vec\psi_k)^T$   (3)

where $\Phi(\vec\psi_k)$ is a function that takes an anatomical shape as the input and transforms it
to an alternative representation known here as the feature space. As for (2), (3)
assumes zero-mean samples, but in the feature space, i.e. $\frac{1}{M}\sum_k \Phi(\vec\psi_k) = 0$. $\Phi(\vec\psi_k)$ can be
nonlinear, and if $\Phi(\vec\psi_k) = \vec\psi_k$ then (3) is the same as (2). Also, (3) is the calculation for a
sample covariance matrix, but in the feature space rather than the raw data space.
An important point to note is that the feature space $\Phi(\vec\psi_k)$ may have a different
dimensionality in comparison to the dimensionality of the original data.
As for PCA, KPCA reduces to an eigenvalue problem but in feature space,
$C \vec v = \lambda \vec v$   (4)

for eigenvalues $\lambda > 0$ and eigenvectors $\vec v \in \mathbb{R}^N \setminus \{0\}$. This problem can be re-written as

$\lambda \vec v_i = C \vec v_i = \frac{1}{M}\sum_{k=1}^{M} \Phi(\vec\psi_k)\Phi(\vec\psi_k)^T \vec v_i = \frac{1}{M}\sum_{k=1}^{M} \left(\Phi(\vec\psi_k)^T \vec v_i\right) \Phi(\vec\psi_k).$   (5)

Therefore, if $\lambda_i > 0$, then the eigenvector $\vec v_i$ is given by a linear combination,

$\vec v_i = \sum_{k=1}^{M} a_{i,k}\, \Phi(\vec\psi_k).$   (6)

Combining equations (5) and (6),

$\lambda_i \sum_{k=1}^{M} a_{i,k}\, \Phi(\vec\psi_k) = \frac{1}{M} \sum_{k=1}^{M} \sum_{j=1}^{M} a_{i,j} \left(\Phi(\vec\psi_k)^T \Phi(\vec\psi_j)\right) \Phi(\vec\psi_k).$   (7)

Multiplying both sides by $\Phi(\vec\psi_l)^T$ gives

$\lambda_i \sum_{k=1}^{M} a_{i,k}\, \Phi(\vec\psi_l)^T \Phi(\vec\psi_k) = \frac{1}{M} \sum_{k=1}^{M} \sum_{j=1}^{M} a_{i,j} \left(\Phi(\vec\psi_k)^T \Phi(\vec\psi_j)\right) \Phi(\vec\psi_l)^T \Phi(\vec\psi_k),$   (8)

and replacing $\Phi(\vec\psi_k)^T \Phi(\vec\psi_j) = k(\vec\psi_k, \vec\psi_j)$ with a kernel function,

$\lambda_i \sum_{k=1}^{M} a_{i,k}\, k(\vec\psi_l, \vec\psi_k) = \frac{1}{M} \sum_{k=1}^{M} \sum_{j=1}^{M} a_{i,j}\, k(\vec\psi_k, \vec\psi_j)\, k(\vec\psi_l, \vec\psi_k).$   (9)

In matrix form, (9) becomes

$\mathbf{K}^2 \vec a_i = M \lambda_i \mathbf{K} \vec a_i,$   (10)

where $\vec a_i = (a_{i,1}\ a_{i,2}\ \ldots\ a_{i,M})^T$ are the eigenvectors of $\mathbf{K}$ and $\lambda_i$ are the eigenvalues. As
described in [1], the solution for $\vec a_i$ can be found by solving the eigenvalue problem

$\mathbf{K} \vec a_i = M \lambda_i \vec a_i,$   (11)

where the $\vec a_i$ are treated as eigenvectors in anatomical shape space which have to be normalized.
They can be normalized by normalizing the eigenvectors in the feature space, via $\vec v_i^{\,T} \vec v_i = 1$.
Therefore, using (6) and (11),

$1 = \vec v_i^{\,T} \vec v_i = \sum_{k=1}^{M} \sum_{j=1}^{M} a_{i,k}\, a_{i,j}\, k(\vec\psi_k, \vec\psi_j) = \vec a_i^{\,T} \mathbf{K}\, \vec a_i = M \lambda_i\, \vec a_i^{\,T} \vec a_i.$   (12)

Then a test anatomical shape $\vec\psi_h$ may be projected onto eigenvector i with

$y_i = \vec v_i^{\,T}\, \Phi(\vec\psi_h) = \sum_{j=1}^{M} a_{i,j}\, \Phi(\vec\psi_j)^T \Phi(\vec\psi_h) = \sum_{j=1}^{M} a_{i,j}\, k(\vec\psi_j, \vec\psi_h).$   (13)

Or, in full matrix form,

$Y(\vec\psi_h) = V^T \Phi(\vec\psi_h) = A^T \vec k_h, \quad (\vec k_h)_j = k(\vec\psi_j, \vec\psi_h), \quad A = (\vec a_1\ \ldots\ \vec a_M).$   (14)

These equations can be used as follows:
1. Compute the matrix $\mathbf{K}$.
2. Compute the eigenvectors $\vec a_i = (a_{i,1}\ a_{i,2}\ \ldots\ a_{i,M})^T$ and eigenvalues $\lambda_i$ of $\mathbf{K}$.
3. Normalize the eigenvectors, using (12).
4. Compute the projections of a test anatomical shape onto the eigenvectors, using (14).
This derivation follows [1,2] and corresponds to data that is zero mean in the feature
space. A similar derivation can be found in [1,2] for the non-zero mean case, where
the data is centred in the feature space implicitly via the formulation.
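
Steps 1-4 above can be sketched in Python for a Gaussian kernel as follows (zero-mean feature-space case only; a real application would use the centred-kernel variant mentioned above):

```python
# Kernel PCA with a Gaussian kernel, following steps 1-4 above.
import numpy as np

def gaussian_kernel(X, Y, sigma):
    """Pairwise Gaussian kernel between rows of X (m, D) and Y (n, D)."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def kpca_fit(X, sigma):
    """X: (M, D) vectorized shapes.  Returns normalized coefficient vectors."""
    K = gaussian_kernel(X, X, sigma)          # step 1: kernel matrix
    lam_M, A = np.linalg.eigh(K)              # step 2: K a = (M*lambda) a
    lam_M, A = lam_M[::-1], A[:, ::-1]        # sort descending
    keep = lam_M > 1e-12
    # Step 3 (Eq. 12): since a^T K a = lam_M for unit-norm a, rescale so that
    # the feature-space eigenvectors have unit norm.
    A = A[:, keep] / np.sqrt(lam_M[keep])
    return A, lam_M[keep]

def kpca_project(X_train, A, x_test, sigma):
    """Step 4 (Eqs. 13-14): project a test shape onto the eigenvectors."""
    k = gaussian_kernel(x_test[None, :], X_train, sigma)[0]
    return A.T @ k
```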

C00009
ANSCSE14 Mae Fah Luang University, Chiang Rai, Thailand
23-26 March, 2010

212
3. EXPERIMENTAL DETAILS
The main aim of this work was to investigate the ability of PCA and KPCA to describe
variations in complex anatomical shapes. White Matter (WM) in the human brain
possesses some complex anatomical shapes, and the variations seen between individuals in
Magnetic Resonance Imaging (MRI) data can be quite large. Therefore a single
corresponding 2D transaxial image slice (180) was selected from each of 20 different
anatomical models of the human brain available from [7]. These are based on real anatomy,
expertly constructed with state of the art imaging and image processing techniques. The
discrete anatomical model was selected for each subject and the WM pixels were isolated via
a simple two-level threshold. Next, signed distances were calculated for each image slice
using the technique described earlier. Then each image slice was downsampled to 100x120
pixels to ease the computational burden, as higher resolutions were not required for the
purposes of the work described here.


Figure 1. Example 2D transaxial image slice (180) from human brain model
subject 54 (left). Signed distance function for the same image slice (right).

The image slices were vectorized as described earlier, and the eigenvectors and
eigenvalues were calculated using PCA and then KPCA with a Gaussian kernel. These
eigenvectors were then used to project the data vectors into the PCA and KPCA feature
spaces using a reduced number of eigenvectors, to test the ability of the two techniques to
capture the important variations in the data with as few eigenvectors as possible.

The data were then projected back from feature space into the data space. This process is a
simple step for PCA as the inverse of the matrix of eigenvectors can be used, which can even
be the transpose of the matrix due to the properties of the matrix. However KPCA has what is
known as the preimage problem, where the reverse mapping from feature space to data space
is not usually one to one and therefore requires techniques described in e.g. [8], which was the
technique used here to reconstruct the data.

The reconstructed image slices were then compared with the original data by thresholding the
signed distances to reconstruct the original binary images and then calculating the number of
false positive, true positive, false negative and true negative pixels. A reconstruction error
(sensitivity) was then calculated using these values.
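
A minimal sketch of this error computation, assuming sensitivity (true positive rate) as the reported measure, could look like:

```python
# Reconstruction error used here: threshold the reconstructed signed
# distances back to a binary mask and compare pixels against the original.
# Sensitivity (TP rate) is shown as one plausible reading of the paper's
# "reconstruction error (sensitivity)".
import numpy as np

def reconstruction_sensitivity(sdf_recon, mask_true):
    """sdf_recon: reconstructed signed distances; mask_true: original mask."""
    mask_recon = sdf_recon > 0                # inside = positive distance
    tp = np.sum(mask_recon & mask_true)       # true positives
    fn = np.sum(~mask_recon & mask_true)      # false negatives
    return tp / float(tp + fn)                # sensitivity in [0, 1]
```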

4. RESULTS AND DISCUSSION
Results comparing the reconstruction error for PCA and KPCA using 5 eigenvectors
can be seen in Figure 2. The plot illustrates variable reconstruction errors for both PCA and
KPCA. PCA reconstructs the original data quite accurately, whereas KPCA reconstructed
some data less accurately and some data more accurately. Something that is
difficult to convey with a single numerical value is the shape of the reconstructed data. The
results obtained with KPCA seemed interesting in that complex variations, particularly around the
gyri, were retained. These were notably lacking in some of the results obtained with PCA,
particularly when fewer eigenvectors were used for the projection of the test images into the
feature space. Figure 3 gives an example of this difference.

Figure 2. Comparison of the shape reconstruction error for PCA and KPCA.


Figure 3. Comparison of the shape reconstruction for subject 5 using 5 eigenvectors.
The original data is shown to the left. Middle is the reconstruction obtained with PCA
and right is the KPCA reconstruction.

PCA is easier to use than KPCA because it does not use any parameters. Furthermore, the
preimage problem does appear to complicate the reconstruction process. Results obtained
with PCA, with all eigenvectors retained, reconstructed the original data perfectly; however,
the results obtained with KPCA always contained some reconstruction error.

5. CONCLUSIONS
KPCA is an interesting technique for feature extraction of non-linear features in e.g.
anatomical shapes but it does appear to be more difficult to use in comparison to PCA.

REFERENCES
1. B. Scholkopf, A. Smola and K-R. Muller, Neural Computation, 1998, 10, 1299-1319.
2. C.M. Bishop, Pattern Recognition and Machine Learning, Springer 2006.
3. P.F. Felzenszwalb and D.P. Huttenlocher, Cornell Computing and Information
Science,TR2004-1963, 2004.
4. M. Leventon, W. Grimson, O. Faugeras, In Proc. IEEE CS Conf. Computer Vision and
Pattern Recognition (CVPR'00), IEEE, 2000, 316-323.
5. S. Dambreville, Y. Rathi, and A. Tannenbaum, IEEE Trans. Patt. Anal. Mach. Intell.
2008, 30(8), 1385-1399.
6. S. Roweis and L. Saul, Science, 2000, 290, 2323-2326.
7. B. Aubert-Broche, M. Griffin, G.B. Pike, A.C. Evans and D.L. Collins, IEEE Trans.
Medical Imaging, 2006, 25(11), 1410-1416.
8. B. Scholkopf and A.J. Smola, Learning with Kernels, MIT Press, 2002.

ACKNOWLEDGMENTS
The author gratefully acknowledges Mae Fah Luang University for providing a research grant
that has funded this work.
C00011

A Comparative Study of Conjugate Gradient Method for
Unconstrained Optimization

Mohd Rivaie^1, Mustafa Mamat^2, Ismail Mohd^3, Muhammad Fauzi^4
^{2,3} Department of Mathematics, Faculty of Science and Technology,
Universiti Malaysia Terengganu (UMT), Malaysia
^{1,4} Department of Computer Sciences and Mathematics,
Universiti Teknologi MARA (UiTM) Terengganu, Campus Kuala Terengganu, Malaysia
^1 rivaie75@yahoo.com, ^2 mus@umt.edu.my, ^3 ismail@umt.edu.my, ^4 fauziembong@yahoo.com



ABSTRACT
Conjugate gradient methods are popular in unconstrained optimization, and numerous
studies have recently been devoted to improving them. In this paper we compare
three of our newly proposed conjugate gradient coefficients \beta_k with the six most
common \beta_k proposed by earlier researchers. Numerical results show that our new
formulas for \beta_k perform far better than the original formulas. They are also shown
to possess global convergence properties.

Keywords: conjugate gradient method, conjugate gradient coefficient, convergence.



1. INTRODUCTION
Conjugate gradient (CG) methods are among the popular methods for finding the minimum
value of a function in unconstrained optimization. Generally, the problem has the form

    \min_{x \in \mathbb{R}^n} f(x)    (1.1)

where f : \mathbb{R}^n \to \mathbb{R} is continuously differentiable. The iteration for solving (1.1) is

    x_{k+1} = x_k + \alpha_k d_k,    k = 0, 1, 2, ...    (1.2)

where x_k is the current iterate, \alpha_k > 0 is a stepsize computed using a line search
technique, and d_k is the search direction, defined by

    d_k = -g_k                      if k = 0,
    d_k = -g_k + \beta_k d_{k-1}    if k >= 1,    (1.3)

where g_k is the gradient of f(x) at the point x_k. The parameter \beta_k \in \mathbb{R} is known as the
conjugate gradient coefficient. Some well known formulas are given as follows:

    \beta_k^{FR} = \frac{g_k^T g_k}{\|g_{k-1}\|^2}    (1.4)

    \beta_k^{PR} = \frac{g_k^T (g_k - g_{k-1})}{\|g_{k-1}\|^2}    (1.5)

    \beta_k^{HS} = \frac{g_k^T (g_k - g_{k-1})}{d_{k-1}^T (g_k - g_{k-1})}    (1.6)

    \beta_k^{LS} = \frac{g_k^T (g_k - g_{k-1})}{-d_{k-1}^T g_{k-1}}    (1.7)

    \beta_k^{DY} = \frac{g_k^T g_k}{d_{k-1}^T (g_k - g_{k-1})}    (1.8)

    \beta_k^{CD} = \frac{g_k^T g_k}{-d_{k-1}^T g_{k-1}}    (1.9)

where g_k and g_{k-1} are the gradients of f(x) at the points x_k and x_{k-1} respectively. The
corresponding methods are known as FR (Fletcher and Reeves [11]), PR (Polak and
Ribiere [16]), HS (Hestenes and Stiefel [13]), LS (Liu and Storey [14]), DY (Dai and Yuan
[7]) and, lastly, CD, which denotes the conjugate descent method of Fletcher [10]. We denote
the norm of a vector by \|\cdot\|. Dai and Yuan [6] and Yuan and Sun [23] have shown that all
these methods are equivalent for strictly convex quadratic functions, but behave differently
for general non-quadratic functions.

The most studied property of CG methods is their global convergence. The best known
studies are by Zoutendijk [24], Powell [17], Powell [18], Al-Baali [1], Touati-Ahmed and
Storey [21], and Gilbert and Nocedal [12]. For further reading and recent findings on
CG methods, refer to Sun and Zhang [20], Birgin and Martinez [5], Dai and Yuan [8],
Yuan and Wei [22], Andrei [3], and Shi and Gao [19].

In this paper we present three new \beta_k and compare their performance with the six
formulas mentioned above. The general algorithm for our new methods and the standard CG
methods is given in Section 2. All the problems are solved using the exact line search. Some
interesting numerical results comparing our new methods with formulas (1.4) to (1.9) are
presented in Section 3. Lastly, our discussion and conclusions based on these comparisons
are presented in Sections 4 and 5 respectively.



2. THE ALGORITHM AND NEW PROPOSED METHODS
We propose our first method based on the thesis of Battaglia [4]. We name this method
the Eigen Conjugate Gradient (ECG) method, where

    \beta_k^{ECG} = \frac{1}{E_1 + E_2 + ... + E_{n-1} + E_n}    (2.1)

and E_1, E_2, ..., E_{n-1}, E_n are eigenvalues.
Our second modification is based on the original HS method and is known as the New
Version Hestenes-Stiefel method (NHS), where

    \beta_k^{NHS} = \frac{g_k^T (g_k - g_{k-1})}{d_{k-1}^T (d_{k-1} - g_{k-1})}    (2.2)

Our third modification is based on the original PR method and is known as the New Version
Polak-Ribiere method (NPR), where

    \beta_k^{NPR} = \frac{g_k^T (g_k - g_{k-1})}{d_{k-1}^T d_{k-1}}    (2.3)

The new point is then computed using (1.2) and (1.3). The complete algorithm is as follows:





Algorithm of the CG method
Step 1: Given x_0, set k = 0.
Step 2: Compute \beta_k based on (2.1), (2.2) or (2.3).
Step 3: Compute d_k based on (1.3). If g_k = 0, then stop.
Step 4: Solve \alpha_k = \arg\min_{\alpha \ge 0} f(x_k + \alpha d_k).
Step 5: Update the new point by x_{k+1} = x_k + \alpha_k d_k.
Step 6: If f(x_{k+1}) < f(x_k) and \|g_k\| < \epsilon, then stop;
        otherwise set k = k + 1 and go to Step 2.

For the convergence properties of general CG methods, we refer the reader to Dai et al. [9] and
Mohd et al. [15].
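To make the generic iteration concrete, the following Python sketch implements the CG loop with a few of the coefficient formulas discussed above. It is only a sketch: the backtracking loop is a crude stand-in for the exact line search used in this paper, and all function and variable names are illustrative.

    import numpy as np

    def cg_descent(f, grad, x0, beta_rule="FR", tol=1e-6, max_iter=1000):
        x = np.asarray(x0, dtype=float)
        g = grad(x)
        d = -g
        k = 0
        for k in range(max_iter):
            if np.linalg.norm(g) < tol:
                break
            a = 1.0                          # crude backtracking line search,
            while f(x + a * d) >= f(x) and a > 1e-12:
                a *= 0.5                     # not the exact minimization of Step 4
            x_new = x + a * d
            g_new = grad(x_new)
            if beta_rule == "FR":            # Fletcher-Reeves, Eq. (1.4)
                beta = (g_new @ g_new) / (g @ g)
            elif beta_rule == "PR":          # Polak-Ribiere, Eq. (1.5)
                beta = g_new @ (g_new - g) / (g @ g)
            else:                            # NPR-style coefficient, Eq. (2.3)
                beta = g_new @ (g_new - g) / (d @ d)
            d = -g_new + beta * d            # direction update, Eq. (1.3)
            x, g = x_new, g_new
        return x, k

    # Example: Rosenbrock's function for two variables (problem 1, Table 1)
    f = lambda x: 100 * (x[1] - x[0]**2)**2 + (1 - x[0])**2
    grad = lambda x: np.array([-400 * x[0] * (x[1] - x[0]**2) - 2 * (1 - x[0]),
                               200 * (x[1] - x[0]**2)])
    x_min, iters = cg_descent(f, grad, [13.0, 13.0], beta_rule="PR")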



3. NUMERICAL RESULTS
To test and analyze the efficiency of ECG, NHS and NPR, we use the test problems
considered in Andrei [2], as shown in Table 1.

Table 1: List of problem functions.

Problem  Function
1        Rosenbrock's function for two variables
2        Rosenbrock's function for four variables
3        Cube function for two variables
4        Shallow function for two variables
5        Shallow function for four variables
6        Wood function for four variables
7        Strait function for two variables
8        Six Hump Camel Back function for two variables
9        Three Hump Camel Back function for two variables


Comparisons with the other CG methods given in (1.4) to (1.9) are based on the number of
iterations needed to reach the minimizer. The results are shown in Table 2. The word "Fail"
in Table 2 means that the run was stopped because the line search procedure failed to find a
positive stepsize. We did not use CPU time for comparison, since the time taken for a single
iteration did not show any significant difference between methods. We used \epsilon = 10^{-6} and
the stopping criterion \|g_k\| < 10^{-6}. The program is stopped if the iteration count exceeds
one thousand. All the problems mentioned above were solved with a Maple 12 subroutine
program using the exact line search.



Table 2: Performance comparison of the different CG methods, based on number of iterations

No. Initial point       FR     PR    HS    LS    DY     CD     NHS   NPR   ECG
1   (13,13)             563    23    23    23    619    524    12    12    174
    (50,50)             >1000  29    29    29    >1000  >1000  23    23    156
    (100,100)           >1000  30    30    30    >2000  >2000  24    24    170
    (200,200)           >1000  41    41    41    >3000  >3000  29    29    367
2   (13,13,13,13)       585    23    23    23    587    841    13    13    468
    (50,50,50,50)       >1000  29    29    29    >1000  >1000  23    23    366
    (100,100,100,100)   >1000  30    30    30    >1000  >1000  24    24    314
    (200,200,200,200)   >1000  41    41    41    >1000  >1000  32    29    885
3   (3,-6)              145    33    33    33    145    145    39    39    266
    (10,-10)            240    31    31    31    240    240    33    33    >1000
    (-10,-10)           235    31    31    31    234    235    33    33    598
    (-15,15)            328    43    43    45    491    491    21    21    >1000
4   (10,10)             163    Fail  Fail  Fail  Fail   Fail   Fail  Fail  70
    (50,50)             866    Fail  Fail  Fail  Fail   Fail   Fail  Fail  104
    (100,100)           >1000  Fail  Fail  Fail  Fail   Fail   Fail  Fail  124
    (200,200)           >1000  Fail  Fail  Fail  Fail   Fail   Fail  Fail  180
5   (10,10,10,10)       172    Fail  Fail  Fail  Fail   Fail   Fail  Fail  41
    (50,50,50,50)       888    Fail  Fail  Fail  Fail   Fail   Fail  Fail  584
    (100,100,100,100)   >1000  Fail  Fail  Fail  Fail   Fail   Fail  Fail  775
    (200,200,200,200)   >1000  Fail  Fail  Fail  Fail   Fail   Fail  Fail  449
6   (2,2,2,2)           26     Fail  Fail  Fail  Fail   Fail   170   188   >1000
    (5,5,5,5)           30     Fail  Fail  Fail  Fail   Fail   131   222   >1000
    (10,10,10,10)       33     Fail  Fail  Fail  Fail   Fail   204   178   >1000
    (50,50,50,50)       >1000  Fail  Fail  Fail  Fail   Fail   259   278   >1000
7   (10,10)             9      5     5     5     9      9      18    18    159
    (50,50)             37     7     7     7     37     37     14    14    80
    (100,100)           117    8     8     8     117    117    28    28    30
    (200,200)           284    9     9     9     284    284    18    18    96
8   (10,-10)            105    6     6     6     105    105    6     6     Fail
    (50,-50)            8      6     6     6     8      8      6     6     Fail
    (100,-100)          8      6     6     6     8      8      6     6     Fail
    (200,-200)          8      6     6     6     8      8      6     6     Fail
9   (10,-10)            6      4     4     4     6      6      6     6     12
    (50,-50)            5      4     4     4     5      5      4     4     27
    (100,-100)          3      3     3     3     3      3      3     3     32
    (200,-200)          5      3     3     3     5      5      3     3     18


Table 2 is further summarized in Table 3, which shows the percentage performance of ECG,
NHS and NPR compared with the other methods. The words successful, equivalent and
unsuccessful in Table 3 mean that the method reached the minimizer with fewer iterations,
the same number of iterations, or more iterations than the other method, respectively.




Table 3: Performance comparison of NHS, NPR and ECG with the other CG methods,
in percentages

Method NHS NPR ECG
Comparison S Eq Un S Eq Un S Eq Un
FR 65% 5% 30% 65% 5% 30% 47.5% - 52.5%
PR 40% 30% 30% 32.5% 37.5% 30% 30% - 70%
HS 32.5% 37.5% 30% 32.5% 37.5% 30% 30% - 70%
LS 32.5% 37.5% 30% 32.5% 37.5% 30% 30% - 70%
DY 70% 25% 5% 70% 25% 5% 55% - 45%
CD 70% 25% 5% 62.5% 25% 12.5% 55% - 45%
*S = successful Eq = equivalent Un = unsuccessful


4. DISCUSSION
From Table 3, if we combine the percentages of the successful and equivalent rates, the
combined percentage for NHS and NPR exceeds 70%. Hence our newly suggested methods,
NHS and NPR, are both superior to the other CG methods.

For the ECG method, the performance is quite fair when compared to the others, although it
is inferior to PR, HS and LS. However, Table 2 shows that for some problems where the
exact line search fails to produce a result, this method can serve as an alternative. For
problem 8, the failed result is due to the Hessian matrix not being positive definite, which
leads to an insufficient descent condition. Therefore, this method is only suitable when the
generated Hessian matrix is positive definite.



5. CONCLUSION
In this paper we have shown, using the standard test problem functions, that two of our
newly proposed methods (NHS and NPR) are superior to the other conjugate gradient
methods. For ECG, although the success rate is low, it can serve as an alternative when the
other methods fail under the exact line search. Our numerical results also suggest that our
new methods converge globally.



REFERENCES
1. Al-Baali, M. (1985). Descent property and global convergence of Fletcher-Reeves
method with inexact line search. IMA. J.Numer.Anal., 5, 121-124.
2. Andrei, N. (2008). An unconstrained optimization test functions collection. Advanced
Modelling and Optimization, 10(1), 147-161.
3. Andrei, N. (2009). Accelerated conjugate gradient algorithm with finite difference
Hessian/vector product approximation for unconstrained optimization.
J,Comput.Appl.Math., 230, 570-582
4. Battaglia, J.P. (2005). The eigenstep method: A new iterative method for unconstrained
quadratic optimization. Master Thesis, University of Windsor.
5. Birgin, E.G. and Martinez. J.M. (2001). A spectral conjugate gradient method for
unconstrained optimization. J.Appl.Maths.Optim, 43, 117-128.
6. Dai, Y. and Yuan, Y. (1998). Nonlinear conjugate gradient method. Beijing: Shanghai
Scientific and Technical Publishers.
7. Dai, Y. and Yuan, Y. (2000). A nonlinear conjugate gradient with a strong global
convergence properties. SIAM J. Optim., 10, 177-182.
8. Dai, Y.H. and Yuan, Y. (2002). A note on the nonlinear conjugate gradient method.
J,Comput.Appl.Math., 18(6), 575-582.
9. Dai, Y.H., Han, J.Y., Liu, G.H., Sun, D.F., Yin, X. and Yuan, Y. (1999). Convergence
properties of nonlinear conjugate gradient method. SIAM J. Optim., 10, 348-358.
10. Fletcher, R. (1987). Practical method of optimization, vol 1, unconstrained optimization,
John Wiley & Sons, New York.
11. Fletcher, R. and Reeves, C. (1964). Function minimization by conjugate gradients.
Comput. J., 7, 149-154.
12. Gilbert, J.C. and Nocedal,J.(1992). Global convergence properties of conjugate gradient
methods for optimization. SIAM J. Optim., 2(1), 21-42.
13. Hestenes, M.R. and Stiefel, E. (1952). Methods of conjugate gradients for solving linear
systems. J. Res. Nat. Bur. Stand., 49, 409-436.
14. Liu, Y. and Storey,C. (1992). Efficient generalized conjugate gradient algorithms
part1:theory. J,Comput.Appl.Math., 69, 129-137.


15. Mohd, R., Mustafa, M., Ismail, M., and Mohd, F. (2009). A new conjugate gradient
coefficient for unconstrained optimization, Proceeding of ICREM4 (International
Conference on Research and Education in Maths), Universiti Putra Malaysia.
16. Polak, E. and Ribiere, G. (1969). Note sur la convergence de méthodes de directions
conjuguées. Rev. Française Informat. Recherche Opérationnelle, 3e Année(16), 35-43.
17. Powell, M.J.D. (1977). Restart procedures for the conjugate gradient method.
Mathematical Programming, 12, 241-254.
18. Powell, M.J.D. (1984). Nonconvex minimization calculations and the conjugate
gradient method. Lecture notes in mathematics, 1066, 122-141. Berlin: Springer.
19. Shi, Z.J. and Gao, J. (2009). A new family of conjugate gradient methods.
J,Comput.Appl.Math., 224, 444-457.
20. Sun, J. and Zhang, J. (2001). Global convergence of conjugate gradient methods
without line search. Annals. Opr. Rch, 103, 161-173.
21. Touati-Ahmed, D. and Storey, C . (1990). Efficient hybrid conjugate gradient
techniques, J.Optim.Theory Appl., 64, 379-397.
22. Yuan, G. and Wei, Z. (2009). New line search methods for unconstrained optimization.
J.Korean Stat.Soc., 38, 29-39.
23. Yuan, Y. and Sun, W. (1999). Theory and methods of optimization. Beijing: Science
Press Of China.
24. Zoutendijk, G. (1970). Nonlinear programming computational methods. In:Abadie
J.(Ed.) Integer and nonlinear programming, 37-86.

C00013
A Matrix Partitioning Technique for Distributed Solving
Large Linear Dense Equations

Pongwit Promsuwan* and Peerayuth Charnsethikul
Operations Research and Management Science Units
Industrial Engineering Department, Kasetsart University, Bangkok, 10903, Thailand.
* E-mail: pongwit1983@hotmail.com; Tel. 086-2069196


ABSTRACT
This paper proposes and investigates a distributed computing process for large
linear dense systems of equations, Ax = b, in matrix partitioned form. A computational
study on a set of randomly generated data using a single desktop computer was
conducted under the constraints of the Random-Access Memory (RAM) and related
secondary storage of the computers used. Matrix A and vectors x and b are partitioned
into a general sub-matrix form so that each part can be distributed and feasibly
processed on each computer in the network environment. Blocks Gauss elimination
with back substitution is used to formulate a resource-constrained project scheduling
decision problem. An application in computational electromagnetics is illustrated and
preliminarily tested.

Keywords: Matrix Partitioning, Large Linear Dense Equation, Computational
Electromagnetic


1. Introduction
At present, computers are used to solve a wide range of problems, including massive
production measurement, mathematics and engineering problems. However, there are
limitations when a single computer is used to process a large-scale computing problem. First,
the Random-Access Memory (RAM) may not possess a sufficient amount of memory to store
the relevant data for the calculations [5]. Secondly, some computational routines must be
processed repetitively. In this work, we propose and investigate a method for solving large
linear dense systems Ax = b in sub-matrix form, based on the concepts of matrix partitioning
[9, 12] and distributed computing [13]. To find the vector x, matrix A and vectors x and b are
partitioned into a general sub-matrix form so that each sub-matrix can be processed within a
single processor's capacity using blocks Gauss elimination with back substitution (BGB) [3].
As the number of central processing units increases, the sub-procedures of blocks Gauss
elimination can be distributed computationally and then continued with the sub-procedures of
the back substitution process. This decision problem can be formulated as a resource-constrained
project scheduling problem [4] of how to assign each sub-computing activity, or job, to each
computer/server. Thus the computing process can be operated in parallel [2, 7, 10, 11]. Each
server performs its computations and stores the resulting data in secondary memory, from
which the data are retrieved/uploaded for computing the final results. The approach is then
applied to a classical case in computational electromagnetics [1]. Preliminary results are
illustrated and discussed.

2. Theory and related works
In general, a linear equations system can be represented algebraically as follows.




11 1 12 2 1 1
...
n n
a x a x a x b + + + =

21 1 22 2 2 2
...
n n
a x a x a x b + + + = (1)


1 1 2 2
...
m m mn n m
a x a x a x b + + + =

Alternatively, the system can be represented as the following matrix form, Ax = b.


11 12 1 1
21 22 2 2
1 2
n
n
m m mn n
a a a x
a a a x
a a a x
| || |
| |
| |
| |
| |
\ .\ .

=
1
2
m
b
b
b
| |
|
|
|
|
\ .

(2)

where A \in \mathbb{R}^{m \times n} is the coefficient matrix, x \in \mathbb{R}^{n \times 1} the vector of variables, and
b \in \mathbb{R}^{m \times 1} the constant right-hand-side vector. In this study the case m = n is considered.
Consider a large and dense linear system Ax = b where A = [A_{ij}], with A_{ij} a partitioned
block of size n_i \times n_j; x = [x_j], with x_j a partitioned vector of size n_j \times 1; and b = [b_i],
with b_i a partitioned vector of size n_i \times 1, for i, j = 1, 2, ..., p. To solve the system, BGB
can be adapted from [3] and described algorithmically as follows.

For k = 1:p
    Compute A_{kk}^{-1}
    For j = k+1:p
        A_{kj} = A_{kk}^{-1} A_{kj}
    End
    b_k = A_{kk}^{-1} b_k
    For i = k+1:p
        For j = k+1:p
            A_{ij} = A_{ij} - A_{ik} A_{kj}
        End
        b_i = b_i - A_{ik} b_k
    End
End
For k = p:1, step -1
    x_k = b_k
    For l = k+1:p
        x_k = x_k - A_{kl} x_l
    End
End
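For concreteness, a minimal NumPy transcription of the BGB procedure above is sketched below. The paper's experiments use MATLAB; this Python version, including the function name and the p x p list-of-blocks data layout, is only an illustrative assumption.

    import numpy as np

    def block_gauss_solve(A_blocks, b_blocks):
        # A_blocks: p x p list of sub-matrices A[i][j]; b_blocks: p sub-vectors.
        A = [[np.array(blk, dtype=float) for blk in row] for row in A_blocks]
        b = [np.array(v, dtype=float) for v in b_blocks]
        p = len(b)
        for k in range(p):                       # block elimination
            Akk_inv = np.linalg.inv(A[k][k])
            for j in range(k + 1, p):
                A[k][j] = Akk_inv @ A[k][j]
            b[k] = Akk_inv @ b[k]
            for i in range(k + 1, p):
                for j in range(k + 1, p):
                    A[i][j] = A[i][j] - A[i][k] @ A[k][j]
                b[i] = b[i] - A[i][k] @ b[k]
        x = [None] * p
        for k in range(p - 1, -1, -1):           # block back substitution
            x[k] = b[k].copy()
            for l in range(k + 1, p):
                x[k] = x[k] - A[k][l] @ x[l]
        return np.concatenate(x)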

Based on the above algorithm step by step, for each k, a work breakdown structure can be
developed as follows.







Activities and Predecessors

J1: Compute A_{kk}^{-1}                       Predecessors: all computations at stage k-1

J2: For j = k+1:p                             Predecessor: J1
        A_{kj} = A_{kk}^{-1} A_{kj}
    End
    b_k = A_{kk}^{-1} b_k

J3: For i = k+1:p                             Predecessor: J2
        For j = k+1:p
            A_{ij} = A_{ij} - A_{ik} A_{kj}
        End
        b_i = b_i - A_{ik} b_k
    End

J4: x_k = b_k                                 Predecessor: J3

J5: For l = k+1:p                             Predecessor: J4
        x_k = x_k - A_{kl} x_l
    End

From the above formulation, J1-J2-J3 is the critical path for each k during the
block elimination process, while J4-J5 is the critical path for the back substitution process.
Also, the cycle J1-J2-J3 must be completed for all k before the cycle J4-J5 begins. Within
each breakdown activity, parallel computing can be applied in order to shorten the job
makespan. To illustrate the whole process, consider the following example.


    A = \begin{pmatrix} a_{11} & a_{12} & a_{13} & a_{14} & a_{15} & a_{16} \\ a_{21} & a_{22} & a_{23} & a_{24} & a_{25} & a_{26} \\ a_{31} & a_{32} & a_{33} & a_{34} & a_{35} & a_{36} \\ a_{41} & a_{42} & a_{43} & a_{44} & a_{45} & a_{46} \\ a_{51} & a_{52} & a_{53} & a_{54} & a_{55} & a_{56} \\ a_{61} & a_{62} & a_{63} & a_{64} & a_{65} & a_{66} \end{pmatrix}
      = \begin{pmatrix} A_{11} & A_{12} & A_{13} \\ A_{21} & A_{22} & A_{23} \\ A_{31} & A_{32} & A_{33} \end{pmatrix}    (3)

Eq. (3) shows the partitioning of a 6 x 6 matrix A into 3 x 3 sub-matrices. To find the
vector x, the partitioned matrix A is processed using BGB, with the result shown in Figure 1.


    \begin{pmatrix} A_{11} & A_{12} & A_{13} \\ A_{21} & A_{22} & A_{23} \\ A_{31} & A_{32} & A_{33} \end{pmatrix}
    \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}
    = \begin{pmatrix} b_1 \\ b_2 \\ b_3 \end{pmatrix}
    \quad \longrightarrow \quad
    \begin{pmatrix} I & A_{12} & A_{13} \\ 0 & I & A_{23} \\ 0 & 0 & A_{33} \end{pmatrix}
    \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}
    = \begin{pmatrix} b_1 \\ b_2 \\ b_3 \end{pmatrix}

(the blocks and right-hand sides on the right denote the updated quantities produced by the
elimination)

Figure 1: Example of blocks Gauss elimination

To obtain the result shown in the figure above, the following set of computing
activities can be formed using BGB, as shown in the example in Table 1.



Table 1: Activities of blocks Gauss elimination

Activity                                  Predecessors    Processing time (s)
 1. Compute A_{11}^{-1}                   -               23
 2. A_{12} = A_{11}^{-1} A_{12}           1               40
 3. A_{13} = A_{11}^{-1} A_{13}           1               40
 4. b_1 = A_{11}^{-1} b_1                 1               10
 5. b_2 = b_2 - A_{21} b_1                4               15
 6. b_3 = b_3 - A_{31} b_1                4               16
 7. A_{22} = A_{22} - A_{21} A_{12}       2               50
 8. A_{23} = A_{23} - A_{21} A_{13}       3               51
 9. A_{32} = A_{32} - A_{31} A_{12}       2               51
10. A_{33} = A_{33} - A_{31} A_{13}       3               50
11. Compute A_{22}^{-1}                   7               24
12. A_{23} = A_{22}^{-1} A_{23}           11              39
13. b_2 = A_{22}^{-1} b_2                 11              13
14. b_3 = b_3 - A_{32} b_2                13              16
15. A_{33} = A_{33} - A_{32} A_{23}       12              51
Total processing time (s)                                 489
In the normal process on a single computer processor, each activity must be computed
sequentially. To reduce the processing time, a better way to manage the activities is to
schedule them on multiprocessing units [7, 11], distributing the activities to additional
servers and processing them in parallel [2, 6, 10]. The processing sequence respects only the
predecessors, and the server with the lowest total assigned workload is scheduled with the
remaining activity with the longest processing time (LPT; see [4]) first; a sketch of this rule
is given below. Each server stores its computed results in hard disk memory, and these are
later used to compute the vector x. Next, the improvement in processing time between
sequential computing and the distributed approach with two parallel processors is illustrated
in the following figures.
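A minimal Python sketch of this precedence-respecting LPT rule follows; it ignores data-communication time and uses illustrative names, so it is a simplification of the scheduling actually performed, not the authors' implementation.

    def lpt_schedule(durations, predecessors, n_servers):
        # Assign, among ready activities, the longest one to the least-loaded server.
        n = len(durations)
        finish = [None] * n
        load = [0.0] * n_servers                  # total time assigned per server
        done, remaining = set(), set(range(n))
        while remaining:
            ready = [a for a in remaining if set(predecessors[a]) <= done]
            a = max(ready, key=lambda i: durations[i])          # LPT choice
            s = min(range(n_servers), key=lambda j: load[j])    # least loaded
            start = max([load[s]] + [finish[q] for q in predecessors[a]])
            finish[a] = start + durations[a]
            load[s] = finish[a]
            done.add(a)
            remaining.remove(a)
        return max(finish)                        # makespan

    # Activities of Table 1 (0-indexed), scheduled on two servers:
    durations = [23, 40, 40, 10, 15, 16, 50, 51, 51, 50, 24, 39, 13, 16, 51]
    preds = [[], [0], [0], [0], [3], [3], [1], [2], [1], [2],
             [6], [10], [10], [12], [11]]
    makespan = lpt_schedule(durations, preds, n_servers=2)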


Figure 2: Sequential computing

Figure 3: Distributed computing with two parallel servers

As an application, large linear dense systems arise in solving electromagnetic problems
related to Maxwell's integral equations after reformulation using the moment method [1, 8].
In this study we consider the classical problem of determining the charge distribution across
a conductive strip on which point potentials can be measured but the charge values are
unknown. The related electromagnetic equation [1] is:


    \phi(p) = \frac{1}{2\pi\epsilon_0} \int_L q \, \ln\!\left(\frac{1}{|p - p'|}\right) dl'    (4)

where \epsilon_0 \approx 8.854223 \times 10^{-12} Farads/meter, \phi represents the potential, q_L represents
the charge density, p represents the location vector of the observed potential, and p'
represents the source location vector.

When considering the conductive strip, the charge strip of the conductor can be divided into
N regions of charge [1], also called subdomains, with the n-th subdomain spanning (a_n, b_n)
along the strip.

Figure 4: The divided charge strip

As shown in Figure 4, the use of pulse functions assumes (or measures) the charge to be
constant over each subdomain. This produces one equation with N unknowns. We obtain the
same number of equations as unknowns by selecting observation regions along the strip,
which are all known to be at a specific potential.


From Eq. (4) and Figure 4, after using the moment method, the resulting approximated field
equation is:

    \sum_{n=1}^{N} Q_n \int_{a_n}^{b_n} G(x, x') \, dx' = g(x),    x = x_1, x_2, x_3, ..., x_n    (5)

Here G(x, x') is the kernel, or Green's function, of the corresponding integral equation [1];
N represents the number of pulse functions which assume or measure the charge over each
region along the charge-bearing strip; g(x) represents an observable quantity located at x;
and Q_n are variables representing the values of charge per unit. When rewriting Eq. (5) in
matrix form, the equation becomes


    \begin{pmatrix} a_{11} & a_{12} & a_{13} & \cdots & a_{1N} \\ a_{21} & a_{22} & a_{23} & \cdots & a_{2N} \\ a_{31} & a_{32} & a_{33} & \cdots & a_{3N} \\ \vdots & & & & \vdots \\ a_{N1} & a_{N2} & a_{N3} & \cdots & a_{NN} \end{pmatrix}
    \begin{pmatrix} Q_1 \\ Q_2 \\ Q_3 \\ \vdots \\ Q_N \end{pmatrix}
    = \begin{pmatrix} g(x_1) \\ g(x_2) \\ g(x_3) \\ \vdots \\ g(x_N) \end{pmatrix}    (6)

Transforming Eq. (6) into the form Ax = b, A \in \mathbb{R}^{N \times N} represents the large dense
coefficient charge matrix, x \in \mathbb{R}^{N \times 1} represents the vector of charges at each area
point, and b \in \mathbb{R}^{N \times 1} represents the vector of related point potentials.
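To illustrate how such a system can be assembled, the Python sketch below builds the moment-method matrix for the strip using a crude quadrature of the logarithmic kernel of Eq. (4); the strip length, the matching points and the quadrature are illustrative assumptions, not the authors' MATLAB code.

    import numpy as np

    eps0 = 8.854223e-12                      # permittivity of free space (F/m)

    def assemble_strip_system(N, L=1.0, V=1.0, quad=50):
        edges = np.linspace(0.0, L, N + 1)   # subdomain boundaries a_n, b_n
        centers = 0.5 * (edges[:-1] + edges[1:])
        A = np.zeros((N, N))
        for m in range(N):                   # observation (matching) points x_m
            for n in range(N):               # source subdomain n
                xs = np.linspace(edges[n], edges[n + 1], quad)
                w = (edges[n + 1] - edges[n]) / quad
                r = np.abs(centers[m] - xs) + 1e-9   # avoid the log singularity
                A[m, n] = w * np.sum(np.log(1.0 / r)) / (2.0 * np.pi * eps0)
        b = np.full(N, V)                    # strip held at constant potential
        Q = np.linalg.solve(A, b)            # charges at each area point
        return A, b, Q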

3. Experimental Details
A MATLAB program for modeling and solving large linear dense systems from
electromagnetic scattering in matrix form was developed. Dense coefficient charge matrices
of various sizes were generated using Eq. (5), and all values in the vector of potentials were
assumed to be 1. To compute the value of the charge at each area point, the first step is to
partition the matrix A generated from the electromagnetic field equation for distributed
computing under the constraints of Random-Access Memory (RAM) and processing time. In
this study we used a notebook computer with a CPU speed of 1.73 GHz and 1.5 GB of RAM
for each processor/server. Data communication among servers, where necessary, was carried
out using an ordinary thumb drive. The experiments generated charge matrices of size 400 x
400 and 2500 x 2500, with the matrix partitioning fixed at 4 x 4 sub-matrices. Next, the
corresponding set of activities was scheduled and assigned using LPT in the cases of a single
server, 3 servers, 4 servers and 5 servers. The processing time of each computing activity was
measured using MATLAB timing commands. To analyze the experiment, the processing time
of each case was compared with the others, and also with the estimated time, measured as the
processing time for solving a randomly generated large linear dense system under the same
conditions of matrix partitioning and number of servers.

4. Result and Discussion
The results of this experiment can be divided into two main cases, with matrix size 400 x 400
and matrix size 2500 x 2500. After distributed processing, the processing times of every case,
compared with the estimated times, are shown in the following table.


Table 2: Processing times for solving large linear dense systems

                     Matrix size 400 x 400           Matrix size 2500 x 2500
Number of servers    Estimated (s)   Measured (s)    Estimated (s)   Measured (s)
1                    21.2697         28.2958         1071.6740       1377.9760
3                    10.6251         12.8541          545.1232        634.2870
4                     8.5396         10.8184          443.0807        512.5975
5                     7.8930         10.3964          419.2703        484.6373

From Table 2, the processing results for sequential processing versus distributed
processing are plotted below:


Figure 5: Processing time of matrix size 400 400


Figure 6: Processing time of matrix size 2500 2500

From Figures 5 and 6, when distributed computing is utilized, a clear improvement in
computational processing time is found. However, the improvement diminishes as the number
of servers grows, owing to the increase in required data-communication time. Another remark
on these results is that the real/actual processing time is longer than the estimated time
because of the additional computations required in generating the matrix A of the
electromagnetic scattering problem.

5. Conclusion
This paper illustrates some advantages of a distributed computing process for solving large
linear dense systems of equations. Blocks Gauss elimination with back substitution is used to
generate a set of distributed computing activities, which are scheduled using the LPT rule
with various numbers of processors. Experimentally, the approach is applied to a case of
electromagnetic scattering, resulting in strong supporting evidence for the improvements
stated.















REFERENCES
1. Bancroft, Randy, Understanding electromagnetic scattering using the Moment Method: a
practical approach, 1 ed., Artech House, Norwood, Massachusetts, 1996, 13-33.
2. Buchau, André, Hafla, Wolfgang, Groh, Friedemann and M. Rucker, Wolfgang,
COMPEL, 2005, 24(2), 468-479.
3. Gantmacher, F. R., Matrix Theory, Vol. I, Chelsea Publishing, New York, 1977, 41-49.
4. Garey M. and Graham R., SIAM Journal on Computing 1975, 4, 187-200.
5. Hagen, Jürgen V. and Wiesbeck, Werner, ACES Journal, 2002, 17(2), 166-169.
6. Iwashita, T., Shimasaki, M. and Lu, J., ACES Journal, 2007, 22(2), 195-199.
7. Li, Keqin, Journal of Parallel and Distributed Computing, 2005, 65(2005), 1406-1408.
8. Mickle, M. H. and Vogt, W. G., Kybernets, 1978, 8, 293-297.
9. Piziak, Robert and Odell, P.L., Matrix theory: from generalized inverses to Jordan form.,
1 ed., Taylor & Francis Group, Boca Raton, Florida, 2007, 503-505.
10. Takahashi, N., Nakano, T., Fujiwara, K. and Muramatsu, K., COMPEL, 1998, 17(5/6),
726-731.
11. Zeng, Zeng and Veeravalli, Bharadwaj, Journal of Parallel and Distributed Computing,
2006, 66(2006), 1404-1418.
12. Zhou, B.B. and Brent, R.P. Journal of Parallel and Distributed Computing, 2003,
63(2003), 638-648.
13. http://en.wikipedia.org/wiki/Distributed_computing
C00014
Filling Incomplete Wind Speed Data by Using Kriging
Interpolation

S. Siridejachai^C, C. Ruttanapun, and S. Vannarat
National Electronics and Computer Technology Center (NECTEC)
112 Thailand Science Park, Phahonyothin Road, Klong 1, Klong Luang, Pathumthani 12120, Thailand
^C E-mail: songkran.siridejachai@nectec.or.th; Fax: 02-5646776; Tel. 02-5646900 ext. 2275



ABSTRACT
The aim of this work is to present a numerical technique for filling in missing data.
Wind speed data over the Gulf of Thailand were collected, but some values are missing
at unobserved sites. Kriging interpolation is used to estimate the value at an undetermined
location from observations of its value at nearby locations. Various variogram
functions associated with the kriging scheme are examined to determine the optimal
choice. The results show that the kriging variance depends strongly on the selection of
the variogram model.

Keywords: geostatistics, kriging, interpolation, missing data, wind speed.



1. INTRODUCTION
Nowadays, beach erosion and coastal erosion are severe environmental problems. Beach
erosion affects humans as waves crash up on the sand, carrying away a majority of the sand
and leaving only sediment in its place. Waves, generated by storms and wind, cause coastal
erosion, which may take the form of long-term losses of sediment and rocks, or merely the
temporary redistribution of coastal sediments; erosion in one location may result in accretion
nearby. For this reason, a project to study the effect of waves on the seaside in the Gulf of
Thailand has been initiated.
To investigate the effect of wave-induced beach erosion, wind speed data are necessary as
input. Two sources provide wind speed data in the public domain. The Thai Meteorological
Department contributes data detected by physical sensors measured close to shore, while the
National Aeronautics and Space Administration (NASA) shares remotely sensed data
observed from satellites orbiting the earth.
Combining the data from these two sources, however, is not straightforward. There are areas
where the merged data are incomplete or empty, and the locations with missing data are
spread across the gulf terrain. This shortage becomes a considerable stumbling block in
studying coastal erosion. To deal with the difficulty, a mathematical basis of estimation is
adopted: points without data are resolved via interpolation. With this arithmetical
approximation, based on the minimization of error, optimal wind speed values can be found
for all points of the domain. This allows the coastal erosion simulation study to progress to
its next step as intended.
In this paper, a geostatistical technique, namely kriging interpolation, is used to evaluate the
value of the wind speed at an unobserved location from observations of its value at
neighboring locations. The main objective of the study is to examine the effect of the
variogram function on the smoothness of the kriging variance. The mathematical derivation
and numerical details of the method are also presented, with explanation.

2. THEORY AND RELATED WORKS
The main result in kriging interpolation [1-9] concerns the estimation of an attribute value
at a location where the true value is unknown:

    V^*(x) = \sum_{i=1}^{n} w_i V(x_i)    (1)

where x refers to a spatial location, V^*(x) is the estimate at location x, V(x_i), i = 1, ..., n,
are the n data values, and the w_i are weights.
Let E{X} denote the expected value of X. The error variance of the linear estimator (1) can
be manipulated as

    \sigma_E^2 = Var{V^*(x) - V(x)}    (2)
             = E{[V^*(x) - V(x)]^2}    (3)
             = E{[V^*(x)]^2} - 2 E{V^*(x) V(x)} + E{[V(x)]^2}    (4)
             = \sum_{i=1}^{n} \sum_{j=1}^{n} w_i w_j E{V(x_i) V(x_j)} - 2 \sum_{i=1}^{n} w_i E{V(x) V(x_i)} + C_0    (5)
             = \sum_{i=1}^{n} \sum_{j=1}^{n} w_i w_j C(x_i, x_j) - 2 \sum_{i=1}^{n} w_i C(x, x_i) + C_0    (6)

where C(x_i, x_j) is the covariance between data points x_i and x_j, and C_0 is a pertinent
constant.
The goal is to determine the weights w_i that minimize the variance \sigma_E^2 of the estimator.
To obtain them, we take partial derivatives of the error variance with respect to each of the
kriging weights and set each derivative to zero. This leads to the following system of
equations:

    \frac{\partial \sigma_E^2}{\partial w_i} = 0    (7)

    \frac{\partial \sigma_E^2}{\partial w_i} = 2 \sum_{j=1}^{n} w_j C(x_i, x_j) - 2 C(x, x_i),    i = 1, 2, ..., n    (8)

    0 = 2 \sum_{j=1}^{n} w_j C(x_i, x_j) - 2 C(x, x_i),    i = 1, 2, ..., n    (9)

    \sum_{j=1}^{n} w_j C(x_i, x_j) = C(x, x_i),    i = 1, 2, ..., n    (10)

This can be written in matrix form as

    [C]{w} = {c}    (11)

If the covariance model is admissible and no two data points are collocated, then the data
covariance matrix is positive definite and we can solve for the kriging weights using
elementary linear algebra:

    {w} = [C]^{-1} {c}    (12)

After substituting the kriging weights {w} into (1), the estimate V^*(x) can be computed
as desired.
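A compact Python sketch of this procedure, using the variogram-to-covariance link C(h) = C(0) - \gamma(h) derived in the next section, is given below; the exponential model and its range are hypothetical choices, not the fitted functions of Table 1.

    import numpy as np

    def simple_krige(coords, values, targets, variogram, sill):
        # Pairwise distances between two point sets (n x 2 and m x 2 arrays).
        dist = lambda P, Q: np.linalg.norm(P[:, None, :] - Q[None, :, :], axis=-1)
        C = sill - variogram(dist(coords, coords))   # data-data covariances
        c = sill - variogram(dist(coords, targets))  # data-target covariances
        w = np.linalg.solve(C, c)                    # kriging weights, Eq. (12)
        return w.T @ values                          # estimates V*(x), Eq. (1)

    # e.g. a hypothetical exponential model gamma(h) = s (1 - exp(-h / a)):
    gamma = lambda h: 7.0 * (1.0 - np.exp(-h / 800000.0))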



3. COMPUTATIONAL DETAILS
The covariance matrix [C] in (11) is hard to obtain directly. In practice, the variogram
function is used instead of the covariance relationship. The variogram \gamma(x_1, x_2) is a
function describing the degree of spatial dependence of a spatial variable. In the stationary
case, the variogram can be represented as a function of one variable, the difference
h = x_2 - x_1 between locations only, by the relation

    \gamma(x_1, x_2) = \gamma(x_2 - x_1) = \gamma(h)    (13)

The variogram function is defined as

    2\gamma(h) = E{[V(x) - V(x+h)]^2}    (14)

The covariance is defined as a function of one variable in the same way as the variogram:

    C(h) = E{V(x) V(x+h)}    (15)

The link between the variogram and the covariance can be derived as follows:

    2\gamma(h) = E{[V(x) - V(x+h)]^2}    (14)
              = E{[V(x)]^2} + E{[V(x+h)]^2} - 2 E{V(x) V(x+h)}    (16)
              = Var{V(x)} + Var{V(x+h)} - 2 C(h)    (17)
              = 2 [C(0) - C(h)]    (18)

Thus we can indirectly evaluate the covariance dependency via the variogram function by

    C(h) = C(0) - \gamma(h)    (19)
The combined wind speed data from the two sources consist of 1,540 stations, with a sample
standard deviation of 2.3923. Values are missing at 83,073 spatial positions. In Figure 1, the
five selected variogram functions are plotted against the scatter of the field variables.
Expressions for the fitted variogram functions are shown in Table 1. It is clearly noticeable
that the functions whose shapes are more similar to the scatter plot result in a lower variance.
It can be concluded that, to minimize the variance of the estimate, we should choose the
variogram function most nearly equivalent to the characteristics of the measured data values.



Table 1. Effect of various variogram functions on the variance.

Figure   Variogram function, \gamma(800,000 h) =                        Variance
1.1      7 (1 - \exp(-h^2))                                             2.03e-10
1.2      5 \cdot \frac{2}{\pi} (\arcsin h + h \sqrt{1 - h^2})           0.266
1.3      7 (1 - \exp(-h))                                               0.293
1.4      5.3 (\frac{3}{2} h - \frac{1}{2} h^3)                          0.333
1.5      5 (\frac{15}{8} h - \frac{5}{4} h^3 + \frac{3}{8} h^5)         0.395















































Figure 1. Comparison of the various variogram functions (panels 1.1-1.5, as listed in Table 1)
with the scatter plot of the field variables.
4. RESULTS AND DISCUSSION
This paper presents a practical, mathematically based approach to filling in incomplete
numeric data. The method of kriging interpolation is explicitly explained, and the specific
case study of completing wind speed data over the Gulf of Thailand is given as an example.
In this study we focused on the effect of the variogram function in minimizing the variance.
The outcome shows that variogram functions more nearly equivalent to the field data result
in a smaller variance.



REFERENCES
1. Chiles, J.P., Delfiner, P., Geostatistics, John Wiley & Sons, New York, 1999, 150-230.
2. Webster, R., Oliver, M.A., Geostatistics for Environmental Scientists, 2nd Ed., John
Wiley & Sons, Cornwall, 2007, 153-266.
3. Gaetan, C., Guyon, X., Spatial Statistics and Modeling, Springer, New York, 2009, 43-52.
4. Diggle, P.J., Ribeiro Jr., P.J., Model-based Geostatistics, Springer, New York, 2007, 134-
154.
5. Burrough, P.A., Spatial Data Quality, Taylor & Francis, New York, 2002, 18-34.
6. Sinclair, A.J., Blackwell, G.H., Applied Mineral Inventory Estimation, Cambridge
University Press, Cambridge, 2004, 215-240.
7. O'Sullivan, D., Unwin, D., Geographic Information Analysis, John Wiley & Sons, New
Jersey, 2003, 274-283.
8. Trauth, M.H., MATLAB Recipes for Earth Sciences, Springer, New York, 2007, 206-222.
9. Pebesma, E.J., Wesseling, C.G., Computers and Geosciences, 1998, 24(1), 17-31.
C00015
On linearization of stochastic ordinary differential equations

S.V. Meleshko^1 and E. Shulz
School of Mathematics, Suranaree University of Technology
Nakhon Ratchasima, 30000, Thailand
^1 E-mail: sergey@math.sut.ac.th; Fax: 044-224185; Tel. 044-224382



ABSTRACT
Many mathematical models in chemistry and physics are based on the second-order
equation

    \ddot{X} = f(t, X, \dot{X}) + g(t, X, \dot{X}) \dot{B}_t,

where \dot{B}_t is white noise. While solving problems related to nonlinear (deterministic)
ordinary differential equations, it is often expedient to simplify the equations by a
suitable change of variables. The simplest form of a second-order ordinary differential
equation \ddot{x} = f(t, x, \dot{x}) is the linear form. S. Lie showed that a second-order
ordinary differential equation \ddot{x} = f(t, x, \dot{x}) is linearizable by a change of the
independent and dependent variables if, and only if, it is a polynomial of third degree
with respect to the first-order derivative and the coefficients satisfy two partial
differential equations. In general, the change of variables in stochastic differential
equations differs from that in (deterministic) ordinary differential equations owing to
the necessity of using the Itô formula instead of the formula for differentiating a
composite function. The present research gives complete criteria for the linearization
of a second-order stochastic ordinary differential equation. In particular, if a second-order
stochastic ordinary differential equation

    \ddot{X} = f(t, X, \dot{X}) + g(t, X, \dot{X}) \dot{B}_t

is linearizable, then the deterministic second-order ordinary differential equation
\ddot{x} = f(t, x, \dot{x}) is also linearizable.

Keywords: Brownian motion, Linearization, Stochastic ordinary differential equation



ACKNOWLEDGMENTS
This research is supported by the Center of Excellence in Mathematics, the Commission on
Higher Education, Thailand.




C00020
A Study on Numerical Methods for Mean-Reverting
Square Root Processes with Jumps

S. Sirisup^1, R. Tanadkithirun^2, and K. Wong^2
^1 National Electronics and Computer Technology Center, Pathumthani, 12120, Thailand
^2 Department of Mathematics, Faculty of Science, Chulalongkorn University, Bangkok, 10330, Thailand
E-mail: sirod.sirisup@nectec.or.th, xraywat@gmail.com, kittipat.w@gmail.com;
Tel. 02-5646900 Ext. 2276, 086-9864328, 02-2185154



ABSTRACT
The mean-reverting square root process is a stochastic differential equation widely
used in finance. Introducing a jump process into it, however, makes the model more
realistic. Unfortunately, the mean-reverting square root process with jumps has no
general explicit solution. In this work we study three numerical methods: the Euler-Maruyama
method, the compensated split-step backward Euler method, and the jump-adapted Euler
method, investigating numerically their performance and accuracy in solving this particular
model in the weak sense. Rigorous error bounds for the Euler-Maruyama and compensated
split-step backward Euler methods are also provided.

Keywords: stochastic differential equations with jumps, numerical methods for
stochastic differential equations, mean-reverting square root processes with jumps.



1. INTRODUCTION
The mean-reverting square root process [1] is a stochastic differential equation (SDE)
that has found considerable use in mathematical finance as an alternative to geometric
Brownian motion. It is used as a model for volatility, interest rates and other financial
quantities, and forms the stochastic volatility component of Heston's asset price model [2].
Moreover, it can be used for pricing bonds and barrier options [4]. Introducing a jump
process into this process, however, makes the model more realistic. The mean-reverting
square root process with jumps on which we focus in this work has the form

    dS(t) = \alpha(\theta - S(t^-)) dt + \sigma \sqrt{S(t^-)}\, dW(t) + \delta S(t^-)\, d\tilde{N}(t),    (1.1)

where S(t^-) denotes \lim_{r \to t^-} S(r), W is a Wiener process and \tilde{N} is a compensated
Poisson process.
Unfortunately, this SDE with jumps has no general explicit solution, so we seek its
numerical solution. Note that we cannot directly apply the standard convergence theory for
numerical simulations to this model because of the non-Lipschitz diffusion coefficient. In
this work we study three numerical methods: the Euler-Maruyama (EM) method, the
compensated split-step backward Euler (CSSBE) method, and the jump-adapted Euler (JAE)
method, investigating numerically their performance and accuracy in solving this particular
model. Computable error bounds for the EM and CSSBE methods are also provided.
The rest of the paper is organized as follows. The next section describes the three
numerical schemes. We provide error bounds for the EM and CSSBE methods and other
relevant theorems in the third section. Our numerical experiments are presented in the fourth
section, and we conclude our work in the fifth section.

2. NUMERICAL SCHEMES
Throughout this paper, let (\Omega, F, P) be a complete probability space with a filtration
{F_t}_{t \ge 0} satisfying the usual conditions. Let W(t) be a Wiener process and N(t) a
Poisson process with intensity \lambda, so that \tilde{N}(t) = N(t) - \lambda t is the corresponding
compensated process. Assume that W and N are independent, and that all of these processes
are defined on this probability space. This paper considers Eq. (1.1) in which \alpha, \sigma and
\lambda are positive, \theta is nonnegative, \delta + 1 > 0, and S(0) > 0.
For any given initial value S(0) = S_0 > 0, the condition \delta + 1 > 0 forces (1.1) to have a
unique solution which never becomes negative, with probability one [5]. The following
theorem gives the expectation of the solution of (1.1) at any time t.
Theorem 2.1. [5] For Eq. (1.1), ES(t) = \theta + e^{-\alpha t}(ES_0 - \theta), so that
\lim_{t \to \infty} ES(t) = \theta.
Now we focus on our three numerical schemes. We use s_n to denote the numerical solution
of (1.1). Given a fixed time step \Delta > 0, we construct an equidistant time discretization
{t_0, t_1, ...} with t_n = n\Delta. We define the EM approximation to (1.1) by s_0 = S(0) and

    s_{n+1} = s_n + \alpha(\theta - s_n)\Delta + \sigma \sqrt{|s_n|}\, \Delta W_n + \delta s_n \Delta\tilde{N}_n
            = (1 - \alpha\Delta) s_n + \alpha\theta\Delta + \sigma \sqrt{|s_n|}\, \Delta W_n + \delta s_n \Delta\tilde{N}_n,    (2.1)

where \Delta W_n = W((n+1)\Delta) - W(n\Delta) is a Wiener process increment and
\Delta\tilde{N}_n = \tilde{N}((n+1)\Delta) - \tilde{N}(n\Delta) is a compensated Poisson process increment.
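As a quick consistency check of scheme (2.1), the Python sketch below simulates EM paths and compares the sample mean at time T with the exact expectation of Theorem 2.1; the chosen parameter values and all names are illustrative.

    import numpy as np

    def em_jump_paths(alpha, theta, sigma, delta, lam, s0, T, N, n_paths, rng):
        dt = T / N
        s = np.full(n_paths, float(s0))
        for _ in range(N):
            dW = rng.normal(0.0, np.sqrt(dt), n_paths)        # Wiener increment
            dNc = rng.poisson(lam * dt, n_paths) - lam * dt   # compensated jump
            s = (s + alpha * (theta - s) * dt
                   + sigma * np.sqrt(np.abs(s)) * dW
                   + delta * s * dNc)                         # scheme (2.1)
        return s

    rng = np.random.default_rng(0)
    sT = em_jump_paths(4.0, 100.0, 0.5, 0.1, 8.0, 50.0, 0.25, 2**8, 10**4, rng)
    exact = 100.0 + np.exp(-4.0 * 0.25) * (50.0 - 100.0)      # Theorem 2.1
    print(sT.mean(), exact)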
Next, with the equidistant time discretization, the CSSBE scheme for (1.1) introduced in
[3] is defined by s_0 = S(0) and

    s^*_{n+1} = s_n + \alpha(\theta - s^*_{n+1})\Delta,
    s_{n+1}   = s^*_{n+1} + \sigma \sqrt{|s^*_{n+1}|}\, \Delta W_n + \delta s^*_{n+1} \Delta\tilde{N}_n.    (2.2)

We note here that

    s^*_{n+1} = \frac{s_n + \alpha\theta\Delta}{1 + \alpha\Delta}.
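A one-step Python sketch of (2.2) makes the split explicit: the drift is treated implicitly, which yields the closed form for s* noted above. The function name is illustrative.

    import math

    def cssbe_step(s_n, dW, dNc, alpha, theta, sigma, delta, dt):
        # Implicit drift half-step, then explicit diffusion and jump parts.
        s_star = (s_n + alpha * theta * dt) / (1.0 + alpha * dt)
        return s_star + sigma * math.sqrt(abs(s_star)) * dW + delta * s_star * dNc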

Our last method, based on time discretizations that include all jump times, was originally
introduced in [6]. Note that the waiting time between two consecutive jump times of a
Poisson process with intensity \lambda is exponentially distributed with parameter \lambda, and
thus has mean 1/\lambda. We construct a jump-adapted time discretization by merging the jump
times {\tau_1, \tau_2, ...} generated by the Poisson process N with the old equidistant time
discretization of step size \Delta, and then renaming all points of this new jump-adapted
discretization in order, namely {t_0, t_1, ...}. Let \Delta_n = t_{n+1} - t_n. The JAE scheme for
(1.1) is then given by s_0 = S(0) and

    s^-_{n+1} = s_n + \alpha(\theta - s_n)\Delta_n + \sigma \sqrt{|s_n|}\, \Delta W_n - \delta\lambda s_n \Delta_n,
    s_{n+1} = s^-_{n+1},                        if t_{n+1} is not a jump time;
    s_{n+1} = s^-_{n+1} + \delta s^-_{n+1},     if t_{n+1} is a jump time.    (2.3)

3. ERROR BOUNDS
We first deal with the EM method. From now on, unless otherwise stated, s_n denotes the
EM numerical solution of (1.1). Let us define the continuous-time EM approximation

    s(t) = s_0 + \int_0^t \alpha(\theta - \bar{s}(r))\, dr + \int_0^t \sigma \sqrt{|\bar{s}(r)|}\, dW(r) + \int_0^t \delta\, \bar{s}(r)\, d\tilde{N}(r),    (3.1)

where \bar{s}(t) is the step function \bar{s}(t) := s_n for t \in [t_n, t_{n+1}). Note that \bar{s}(t)
is a simple process. At the grid points t = t_n, \bar{s}(t) = s_n = s(t_n). Hence an error bound
for s(t) automatically implies an error bound for our numerical solution s_n, and we therefore
aim at the error bound for s(t).
Theorem 3.1. Es_n = \theta + (1 - \alpha\Delta)^n (Es_0 - \theta). Hence \lim_{n \to \infty} Es_n = \theta
for \Delta < 2/\alpha.

The proof is trivial on taking expectations in (2.1). Note that from Theorem 2.1 we also
know that ES(t_n) - Es_n = (e^{-\alpha t_n} - (1 - \alpha\Delta)^n)(ES_0 - \theta).
Lemma 3.2. For any \beta > 0, let

    A = 2\alpha\theta\Delta (1 - \alpha\Delta)(Es_0 - \theta),
    B = \frac{\sigma^2 \beta \Delta}{2} + 2\alpha\theta^2\Delta(1 - \alpha\Delta) + \alpha^2\theta^2\Delta^2,
    r = (1 - \alpha\Delta)^2 + \frac{\sigma^2 \Delta}{2\beta} + \delta^2 \lambda \Delta,

and

    M_{\beta,n} = r^n Es_0^2 + B \sum_{i=0}^{n-1} r^i + A \sum_{i=0}^{n-1} (1 - \alpha\Delta)^{n-1-i} r^i.

Then Es_n^2 \le M_{\beta,n}.
Proof. From (2.1),

    Es_{n+1}^2 = (1 - \alpha\Delta)^2 Es_n^2 + \sigma^2\Delta\, E|s_n| + \delta^2\lambda\Delta\, Es_n^2
                 + 2\alpha\theta\Delta(1 - \alpha\Delta)\, Es_n + \alpha^2\theta^2\Delta^2.

By Lyapunov's inequality and the AM-GM inequality,

    E|s_n| \le \sqrt{Es_n^2} \le \frac{1}{2}\left(\beta + \frac{Es_n^2}{\beta}\right).

It is then straightforward to show that Es_{n+1}^2 \le r\, Es_n^2 + A (1 - \alpha\Delta)^n + B.
Define a sequence a_n by setting a_0 = Es_0^2 and a_{n+1} = r a_n + A (1 - \alpha\Delta)^n + B, so

    a_n = r^n Es_0^2 + B \sum_{i=0}^{n-1} r^i + A \sum_{i=0}^{n-1} (1 - \alpha\Delta)^{n-1-i} r^i.

Since r > 0, we have Es_n^2 \le a_n for all n, which completes the proof.

Remark that if r \ne 1 and r \ne 1 - \alpha\Delta, then

    \sum_{i=0}^{n-1} r^i = \frac{r^n - 1}{r - 1}    and    \sum_{i=0}^{n-1} (1 - \alpha\Delta)^{n-1-i} r^i = \frac{r^n - (1 - \alpha\Delta)^n}{r - (1 - \alpha\Delta)}.

This simplifies the formula for M_{\beta,n}; however, computers can easily calculate these
summations within a second. We can see from this proof that a suitable \beta should be around
the numerical solution s_n, so that the approximation using the AM-GM inequality gives a good
result. Thus, when computing this bound on a finite time interval divided into N steps, we
suggest setting

    \beta = \frac{1}{N+1} \sum_{n=0}^{N} Es_n = \theta + \frac{(Es_0 - \theta)\left(1 - (1 - \alpha\Delta)^{N+1}\right)}{(N+1)\,\alpha\Delta},

which is usually well defined and positive, unless the parameters are extreme.
Now we examine Eq. (1.1) on the finite time interval [0, T].
Lemma 3.3. For any \beta > 0 and M_{\beta,n} defined in Lemma 3.2, let

    Q_{\beta,n} = \alpha^2\Delta^2\left(\theta^2 - 2\theta\,[\theta + (1 - \alpha\Delta)^n(Es_0 - \theta)] + M_{\beta,n}\right)
                  + \sigma^2\Delta \sqrt{M_{\beta,n}} + \delta^2\lambda\Delta\, M_{\beta,n}

and D_\beta = \max_{n \in \{0, 1, ..., \lceil T/\Delta \rceil\}} Q_{\beta,n}. Then
\sup_{0 \le t \le T} E|s(t) - \bar{s}(t)|^2 \le D_\beta, and hence
\sup_{0 \le t \le T} E|s(t) - \bar{s}(t)| \le \sqrt{D_\beta}.
Proof. Let t \in [0, T] and n = \lfloor t/\Delta \rfloor, the integer part of t/\Delta. From (3.1) we have

    s(t) - \bar{s}(t) = \alpha(\theta - \bar{s}(t))(t - n\Delta) + \sigma\sqrt{|\bar{s}(t)|}\,(W(t) - W(n\Delta))
                        + \delta\,\bar{s}(t)\,(\tilde{N}(t) - \tilde{N}(n\Delta)).

Hence, by Lyapunov's inequality, Theorem 3.1, Lemma 3.2, and noting that t - n\Delta \le \Delta,
we have E|s(t) - \bar{s}(t)|^2 \le Q_{\beta,n} \le \max_i Q_{\beta,i} = D_\beta. Thus
\sup_{0 \le t \le T} E|s(t) - \bar{s}(t)|^2 \le D_\beta, as desired.
We can see from this lemma that D_\beta \to 0 as \Delta \to 0.
Theorem 3.4. For any \beta > 0, with D_\beta defined in Lemma 3.3 and any k > 0,
\sup_{0 \le t \le T} E|S(t) - s(t)| is bounded by an explicitly computable expression in
\sqrt{D_\beta}, T, k and the model parameters, built from the exponential constants
a_k = e^{-k(k+1)/2} used in the proof below; since D_\beta \to 0 as \Delta \to 0, the bound can be
made arbitrarily small.
Proof. We use the technique in [5] to show this result. Let a_i = \exp(-i(i+1)/2) for
i \in \mathbb{N}, so that a_i decreases to 0 and \int_{a_k}^{a_{k-1}} \frac{du}{u} = k for each k > 0.
For k > 0 and \varepsilon \in (0, a_{k-1} - a_k), there exists a continuous function f with support
in (a_k, a_{k-1}) such that 0 < f(u) \le \frac{2}{k u} + h_\varepsilon for u \in (a_k, a_{k-1}), with
\int_{a_k}^{a_{k-1}} f(u)\, du = 1, where h_\varepsilon is a constant depending on \varepsilon. Define

    F(x) = \int_0^{|x|} \int_0^y f(u)\, du\, dy.

Then it is straightforward to verify that F is twice continuously differentiable, |F'(x)| \le 1
for all x \in \mathbb{R}, |F''(x)| \le \frac{2}{k|x|} + h_\varepsilon for a_k < |x| < a_{k-1}, and
F''(x) = 0 for x \notin (a_k, a_{k-1}). Observe that F(0) = 0 and that |x| is controlled by
F(x) up to a term of order a_{k-1}. Now let t \in [0, T]. From (1.1) and (3.1), applying the
Itô-Doeblin formula for one jump process, recalling the facts about F'(x) and F''(x), using
the mean value theorem, Lemma 3.3 and Gronwall's inequality, and lastly taking the limit
\varepsilon \to 0, we acquire our desired result.
We can see that \sup_{0 \le t \le T} E|S(t) - s(t)| \to 0 as \Delta \to 0. By this theorem, we also
know that this bound governs \sup_{0 \le t \le T} |ES(t) - Es(t)|, the error bound in the weak
sense.
Now we turn to the CSSBE method. From now on, s_n denotes the CSSBE numerical
solution of (1.1). We define the continuous-time CSSBE approximation by

    s(t) = s_0 + \int_0^t \alpha(\theta - \bar{s}(r))\, dr + \int_0^t \sigma \sqrt{|\bar{s}(r)|}\, dW(r) + \int_0^t \delta\, \bar{s}(r)\, d\tilde{N}(r),    (3.2)

where \bar{s}(t) is the step function \bar{s}(t) := s^*_{n+1} for t \in [t_n, t_{n+1}). Consequently,
s_n = s(t_n). As for the EM method, we seek an error bound for s(t) in order to obtain an
error bound for our numerical solution s_n. The proofs of the subsequent theorems and
lemmas use the same techniques as before, so they are omitted.
Theorem 3.5. Es_n = \theta + (1 + \alpha\Delta)^{-n}(Es_0 - \theta). Hence \lim_{n \to \infty} Es_n = \theta
for any size of \Delta.

From Theorem 2.1 we immediately have
ES(n\Delta) - Es_n = (e^{-\alpha n \Delta} - (1 + \alpha\Delta)^{-n})(ES_0 - \theta).
Lemma 3.6. For each \beta > 0, let

    r = \frac{1 + \frac{\sigma^2\Delta}{2\beta} + \delta^2\lambda\Delta}{(1 + \alpha\Delta)^2},
    A = 2 r \alpha\theta\Delta (Es_0 - \theta),
    B = 2 r \alpha\theta^2\Delta + r \alpha^2\theta^2\Delta^2 + \frac{\sigma^2\beta\Delta}{2},

and

    M_{\beta,n} = r^n Es_0^2 + B \sum_{i=0}^{n-1} r^i + A \sum_{i=0}^{n-1} (1 + \alpha\Delta)^{-(n-1-i)} r^i.

Then Es_n^2 \le M_{\beta,n}.
When calculating this bound on a finite time interval divided into N steps, we suggest
setting

    \beta = \frac{1}{N+1} \sum_{n=0}^{N} Es_n = \theta + \frac{(Es_0 - \theta)(1 + \alpha\Delta)\left(1 - (1 + \alpha\Delta)^{-(N+1)}\right)}{(N+1)\,\alpha\Delta},

which is always finite and positive.


Lemma 3.7. For any \beta > 0 and M_{\beta,n} defined in Lemma 3.6, let P_{\beta,n} and Q_{\beta,n}
be the explicitly computable quantities built from M_{\beta,n}, Es_0 and the model parameters in
the same way as in Lemma 3.3 (P_{\beta,n} bounding the second moment of the implicit
intermediate value s^*_{n+1}, and Q_{\beta,n} bounding E|s(t) - \bar{s}(t)|^2 on [t_n, t_{n+1})),
and let D_\beta = \max_{n \in \{0, 1, ..., \lceil T/\Delta \rceil\}} Q_{\beta,n}. Then
\sup_{0 \le t \le T} E|s(t) - \bar{s}(t)|^2 \le D_\beta, and hence
\sup_{0 \le t \le T} E|s(t) - \bar{s}(t)| \le \sqrt{D_\beta}.

Theorem 3.8. For any \beta > 0, with D_\beta defined in Lemma 3.7 and k > 0, the same
explicitly computable bound as in Theorem 3.4 holds for \sup_{0 \le t \le T} E|S(t) - s(t)|,
with this D_\beta in place of the one from Lemma 3.3; again the bound tends to 0 as \Delta \to 0.

4. COMPUTATIONAL DETAILS
First, in Table 1 we show computable error bounds for the EM and CSSBE methods,
coming from Theorem 3.4 and Theorem 3.8 respectively. Specifically, we demonstrate the
bounds in the case \alpha = 1, \theta = 100, \sigma = 0.5, \delta = 2, \lambda = 0.1, s_0 = 50 and T = 1
for a range of \Delta values. In each case we minimize over k > 0.

Table 1. Error bounds when \alpha = 1, \theta = 100, \sigma = 0.5, \delta = 2, \lambda = 0.1, s_0 = 50
and T = 1.

          \Delta = 2^{-5}   \Delta = 2^{-6}   \Delta = 2^{-7}   \Delta = 2^{-8}   \Delta = 2^{-9}
EM        19.1276           14.0272           10.4778           7.9416            6.0963
CSSBE     19.1214           14.0256           10.4773           7.9415            6.0963

Next, in Figure 1, graphs showing the weak-sense order of convergence of our three
methods are presented. Here we fix \alpha = 4, \theta = 100, \sigma = 0.5, \delta = 0.1, s_0 = 50 and
T = 0.25, and vary \lambda over 8, 32, 64 and 128. For each \lambda, the expected number of jumps
per sample path is \lambda T, so we examine the cases in which the expected numbers of jumps
of the sample paths are 2, 8, 16 and 32, correspondingly. A reference line with slope one is
also plotted, dashed.

Figure 1. Weak error plots when \alpha = 4, \theta = 100, \sigma = 0.5, \delta = 0.1, s_0 = 50, T = 0.25
and \lambda = 8 (top left), 32 (top right), 64 (bottom left), 128 (bottom right).

5. CONCLUSION AND DISCUSSION
In this paper we have provided rigorous numerical bounds for the EM and CSSBE methods
for the mean-reverting square root process with jumps. Numerical investigations have been
carried out for both methods and also for the JAE method. It is found numerically that
all three methods tend to have weak order of convergence 1.0, which coincides with the
general theory for SDEs with jumps with Lipschitzian coefficients.
Although the JAE method would be expected to give a better numerical solution than the
EM and CSSBE methods on other models, the experiments show that it gives higher errors
in some cases of our simulation. Thus we turned our attention to the effect of the number of
jumps in the sample paths on its accuracy. Figure 2 shows some sample paths simulated by
the JAE method with the same parameters as used in Figure 1. The dashed line in each
picture represents the expectation of the exact solution obtained from Theorem 2.1.























Figure 2. Sample paths when \alpha = 4, \theta = 100, \sigma = 0.5, \delta = 0.1, s_0 = 50, T = 0.25
and \lambda = 8 (top left), 32 (top right), 64 (bottom left), 128 (bottom right).

We can see from Figure 2 that when the number of jumps is too low, the JAE method can
produce paths that do not generate any jumps at all, and thus it cannot provide a good
estimate of the solution of the SDE with jumps. On the contrary, if the number of jumps is
too high, the JAE method can create paths that undergo too many jumps before they approach
the strike time T, and so drift far from the expectation of the exact solution at time T. Hence
the JAE method should be used for scenarios with a reasonable expected number of jumps.
Based on our intensive numerical investigations, among the three methods the CSSBE
method outperforms the other two in reproducing the expectation of the exact solution of the
mean-reverting square root process with jumps.


REFERENCES
1. Cox, J.C., Ingersoll Jr., J.E., and Ross, S.A., Econometrica, 1985, 53, 385-407.
2. Heston, S.L., Review of Financial Studies, 1993, 6, 327-343.
3. Higham, D.J. and Kloeden, P.E., Numerische Mathematik, 2005, 101, 101-119.
4. Higham, D.J. and Mao, X., Journal of Computational Finance, 2005, 8(3), 35-61.
5. Wu, F., Mao, X., and Chen, K., Applied Mathematics and Computation, 2008, 206, 494-505.
6. Platen, E., Liet. Mat. Rink., 1982, 22(2), 124-136.
G00072
Filter Rules and Thai big capital stocks trading

S. Naknoi, K. Kittiwutthisakdi
School of Science, Walailuk University, 222 Tha Sala, Nakhon Si Thammarat 80160, Thailand
E-mail: sutthinun@gmail.com, kkowit@wu.ac.th
Mobile: 66 894 692 935, Tel: 66 7567 2005-6, Fax: 66 7567 2004



ABSTRACT
Two years of daily closing prices of some Thai big capital stocks (PTT, PTTEP,
SCC, SHIN, BBL) were tested with Alexander's filter technique [1]. Serial
correlation coefficients [2] for successive daily price changes were also
computed to indicate the degree of random-walk behavior. These stock prices were
found to follow a random walk: PTT had the lowest serial correlation and SCC the
greatest. The return of this trading rule and the random-walk character of each
stock are discussed [3].

Keywords: Random walk hypothesis, Filter rule.



1. INTRODUCTION
Various mechanical trading rules have been proposed and developed by economists, mathematicians, statisticians and professional analysts. One of the earliest simple rules, the filter rule, was invented to study the behavior of stock prices, and it was used to refute the random walk theory of stock prices [1]. To measure the randomness of stock prices statistically, the serial correlation of their first differences was calculated. Serial correlation (autocorrelation) is the correlation of a variable with itself over successive time intervals. Technical analysts use serial correlation to determine how well the past price of a security predicts its future price. The resulting number ranges from +1 to -1. A serial correlation of +1 represents perfect positive correlation (an increase in one time series leads to a proportionate increase in the other), while a value of -1 represents perfect negative correlation (an increase in one series results in a proportionate decrease in the other). A serial correlation of 0 indicates that the time series is random or unpredictable.
In 1961, Alexander [1] created a simple mechanical trading rule called the filter rule. The concept is that an investor buys or sells a stock when its price movement reverses direction by a minimally acceptable percentage. For example, a technician may decide on a filter of 10%. If a stock subsequently reverses a downtrend and rises by 10% from its low price, the filter rule indicates that the stock should be bought; a 10% decline from a high price indicates that the stock should be sold or sold short. The size of the filter is determined by the technician. The filter rule is supposed to let an investor participate in a security's major price trends without being misled by small fluctuations. Without considering dividends and trading fees, Alexander concluded from the larger profits of smaller filters that there are trends in stock price time series. Later, Fama and Blume [4] improved the calculation by adding dividends and trading fees and compared the result with the buy-and-hold method; the filter method could not outperform buy-and-hold.
A similar methodology was applied here to some Thai big capital stocks (PTT, PTTEP, BBL, SCC, and SHIN). Two years of closing prices of these stocks were back-tested with the filter method [1] and compared with buy-and-hold. The serial correlation for each stock was calculated to indicate the degree of randomness.

2. THEORY AND RELATED WORKS
The random walk theory of stock prices rests on two points: (1) the independence of successive price changes and (2) the conformance of price changes to some probability distribution [3]. In statistical terms, independence of successive price changes means that the price change at the present time is independent of the price change at the previous time. The first-difference serial correlation measures this degree of independence. A positive serial correlation means that the value in one time period is positively correlated with the value in the next time period; a negative serial correlation means that it is negatively correlated with the value in the next time period.
The first-difference equation [5] is

$$ X_t - X_{t-1} = \Delta_t \qquad (1) $$

where $X_t$ is the daily closing price on day $t$, $X_{t-1}$ is the daily closing price on day $t-1$, and $\Delta_t$ is the first difference for day $t$. Kendall's tau correlation for each time series was calculated with SPSS. If past successive price changes carry no information for predicting future successive price changes [5], Kendall's tau correlation will be close to zero, and we assume that the behavior of the successive price changes is a random walk.
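As an illustration of this test, the sketch below (in Python, with hypothetical prices rather than the study's data) computes the first differences of equation (1) and Kendall's tau between successive differences:

```python
import numpy as np
from scipy.stats import kendalltau

def first_difference_serial_corr(prices):
    """Compute first differences of daily closing prices (equation (1))
    and Kendall's tau between successive differences."""
    diffs = np.diff(np.asarray(prices, dtype=float))
    tau, p_value = kendalltau(diffs[:-1], diffs[1:])
    return diffs, tau, p_value

# Hypothetical closing prices; a tau near zero suggests a random walk.
prices = [250, 252, 249, 255, 254, 258, 257, 260, 256, 259]
_, tau, p = first_difference_serial_corr(prices)
print(f"Kendall tau = {tau:.3f}, p = {p:.3f}")
```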
Fama and Blume [4] applied the filter rule to series of daily closing prices for individual securities. The samples, of about 1,200 to 1,700 observations each, came from the thirty securities of the Dow Jones Industrial Average between 1956 and 1962. The filter sizes were varied from 0.5 percent to 50 percent over twenty-four different filters. Rates of return before commissions under the filter rule were compared with those under a buy-and-hold policy. The returns were computed in several ways, such as gross and net of broker fees, or with and without dividends. When commissions were included, the profits from the filter technique were consumed by the fees. On the other hand, when commissions were omitted, the returns from the filter rules were larger, but still not larger than those of the simple buy-and-hold policy, because the total profit of buy-and-hold comes from the cumulative price change over the period plus the dividends; the dividends thus increased the profitability of holding shares.
The aim of this research was to find a relationship between returns from filter rule trading and the serial correlations of stock prices, which indicate degrees of randomness. Histograms of the first differences were also drawn to show deviations that serial correlations fail to uncover.
A similar study was done by Leuthold [5] on the live cattle futures market. The random walk character was measured by spectral analysis, and returns from the filter rule were calculated. The results showed both random walk behavior and larger-than-expected profits from filter rule trading. This raised doubt about the assumption that the random walk hypothesis allows no profits from technical trading rules: prices that appear to behave randomly may still generate profits for an investor who uses filter rule trading.


3. EXPERIMENTAL
Two years of daily closing prices and dividend data were obtained from Kim Eng Thailand. The securities were PTT, PTTEP, BBL, SCC, and SHIN. The samples ran from October 16, 2007 to October 16, 2009 for PTTEP, BBL, SCC and SHIN (490 daily prices), except that the PTT data ran from September 7, 2007 to September 7, 2009 (486 daily prices).
The initial study tested for random walk behavior. The first differences for PTT, PTTEP, BBL, SCC, and SHIN were calculated. Figures 1 to 5 show the distribution of first differences for each stock. The distributions were computed with basic statistical tools that also give the maximum, minimum, mean, variance, and standard deviation. The extreme negative and positive values and the sample frequency of each difference were also calculated.
Serial correlations were calculated with SPSS using Kendall's tau formula; the closer these values are to zero, the more random the series. Table 1 shows the first-difference descriptive statistics and Table 2 shows the random walk test results.
After the random walk hypothesis was tested, the filter rule was simulated. For an x percent filter: if the price moves up x percent, buy the stock and hold until the price moves down x percent from a subsequent high, then sell the stock and wait for a new x percent rise; movements of less than x percent are ignored. The filter sizes employed were 0.005, 0.010, 0.015, 0.020, 0.025, 0.030, 0.035, 0.040, 0.045, 0.050, 0.060, 0.070, 0.080, 0.090, 0.100, 0.120, 0.140, 0.160, 0.180, 0.200, 0.250, 0.300, 0.400 and 0.500, the same as in previous work [4].
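A minimal Python sketch of this buy/sell logic is given below (hypothetical prices; dividends and broker fees, which the actual study includes, are omitted here for brevity):

```python
def filter_rule_returns(prices, x):
    """Simulate an x-fraction filter rule on a price series: buy after a
    rise of x from the running low, sell after a fall of x from the
    running high; return the total fractional return."""
    position = None                     # entry price when holding, else None
    low, high = prices[0], prices[0]
    total = 0.0
    for p in prices[1:]:
        low, high = min(low, p), max(high, p)
        if position is None and p >= low * (1 + x):
            position, high = p, p       # buy; reset the running high
        elif position is not None and p <= high * (1 - x):
            total += (p - position) / position   # sell
            position, low = None, p     # reset the running low
    if position is not None:            # close any open position at the end
        total += (prices[-1] - position) / position
    return total

# Hypothetical prices with a 5% filter.
print(filter_rule_returns([100, 103, 106, 101, 99, 104, 110, 104], 0.05))
```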
Table A1 shows the returns from the different filter sizes and from the buy-and-hold policy, with dividends and broker fees included in the simulation. The average returns over all filters, compared with buy-and-hold, are shown in the last row, reported for each stock.



4. RESULTS AND DISCUSSION

4.1 The results from statistical tools

The bar charts of first differences in Figures 1 to 5 show the distribution frequency, number of samples, mean and standard deviation for PTT, PTTEP, BBL, SCC and SHIN respectively.

Figure 1. Bar chart of first differences for PTT (frequency versus daily price change).

Figure 2. Bar chart of first differences for PTTEP (frequency versus daily price change).






Figure 3. Bar chart of first differences for BBL (frequency versus daily price change).

Figure 4. Bar chart of first differences for SCC (frequency versus daily price change).
Figure 5. Bar chart of first differences for SHIN (frequency versus daily price change).

The bar charts show the distributions of daily price changes under market fluctuation. Compared with a normal curve, each distribution is asymmetrical, and each chart shows the frequency of first-difference price changes by magnitude.

Table 1. First Difference Descriptive Statistics.

Security    N     Min     Max    Mean    Std. Deviation
PTT         485   -24.0   32     -0.14   7.50
BBL         489   -12.5    7      0.00   2.53
PTTEP       489   -15.5   16     -0.03   4.14
SCC         489   -14.0   14     -0.11   3.47
SHIN        489   -24.2   24      0.00   1.62

The mean was essentially zero for BBL and SHIN. The other securities had negative means, reflecting the high frequency of negative first differences. The fluctuation of the price changes is indicated by the standard deviation: PTT had the highest at 7.50, followed by PTTEP at 4.14, SCC at 3.47, BBL at 2.53 and SHIN at 1.62, which shows how far the first-difference distribution spreads from the mean.

Table 2. First-difference serial correlation with Kendall's tau-b formula.

Security   Test description            Value
PTT        Correlation Coefficient     .035
           Sig. (2-tailed)             .263
BBL        Correlation Coefficient     .044
           Sig. (2-tailed)             .158
PTTEP      Correlation Coefficient     .051
           Sig. (2-tailed)             .101
SCC        Correlation Coefficient     .105(**)
           Sig. (2-tailed)             .001
SHIN       Correlation Coefficient     .096(**)
           Sig. (2-tailed)             .002

Note: ** Correlation is significant at the 0.01 level (2-tailed).

The correlation coefficients were 0.035, 0.044, 0.051, 0.096 and 0.105 for PTT, BBL, PTTEP, SHIN and SCC respectively. All values show a small positive correlation. Compared with past work [1-5], coefficients this close to zero mean that the price movements were nearly a random walk, though with some small price trends. To probe this further, the filter rule was employed.

4.2 The results from the filter technique

The filter rule was used to find returns and compare them with those of the buy-and-hold policy. The study used twenty-four filters in accordance with previous work [4]. Broker fees were included for each security in testing the rate of return from the filter rule, and dividends were included in the calculation for both the filter rule and the buy-and-hold policy. Buy-and-hold started at the first time the filter rule required buying the security; this point was the starting buy point of the buy-and-hold policy. Likewise, the final selling point of the filter became the selling point of the buy-and-hold policy. Broker fees were charged on every buy and sell, for both the filter rule and the buy-and-hold policy. The rates of return from the filter rule are reported in the order PTT, PTTEP, BBL, SCC and SHIN. The highest profits occurred at the 14%, 20%, 4.5%, 18% and 30% filter sizes, and the biggest losses occurred at the 30%, 3.5%, 40%, 8% and below-4% filter sizes. The average rates of return from the filter were -0.95%, -21.47%, -5.71%, -4.37% and -46.69%, compared with average buy-and-hold returns of -17.47%, -4.11%, 6.34%, -16.7% and 14.29% respectively. The results show that buy-and-hold obtained a positive return only for BBL and SHIN.



5. CONCLUSION
The random walk hypothesis and the filter rule were tested on some big capital securities from the Stock Exchange of Thailand. The calculations showed that these securities follow the random walk to some degree. The filter technique was then applied to test whether the series had moving trends. We had hoped that the filter technique could generate trading profits; however, the technique proved ineffective.



REFERENCES
1. Alexander, S.S., Price Movements in Speculative Markets: Trends or Random Walks, Industrial Management Review, 1961, 2, 7-26.
2. Kendall, M.G., The Analysis of Economic Time Series, Journal of the Royal Statistical Society, Ser. A, 1953, 96, 11-25.
3. Fama, E.F., The Behavior of Stock-Market Prices, The Journal of Business, 1965, 38(1), 34-105.
4. Fama, E.F. and Blume, M.E., Filter Rules and Stock-Market Trading, The Journal of Business, 1966, 39(1), 226-241.
5. Leuthold, R.M., Random Walk and Price Trends: The Live Cattle Futures Market, The Journal of Finance, 1972, 27(4), 879-889.










APPENDIX

Table A1. Rates of return (percent) from each filter size under the filter rule (F) and the buy-and-hold policy (B), by security.

FILTER SIZE   PTT            PTTEP          BBL            SCC            SHIN
              F       B      F        B     F        B     F       B      F         B
0.005 2.990 -16.35 -30.800 -4.11 0.640 7.7 -7.270 -17.3 -111.130 14.15
0.010 2.990 -16.35 -30.800 -4.11 0.640 7.7 -7.270 -17.3 -111.130 14.15
0.015 2.990 -16.35 -30.800 -4.11 0.640 7.7 -7.270 -17.3 -111.130 14.15
0.020 2.990 -16.35 -30.800 -4.11 0.640 7.7 -7.270 -17.3 -111.130 14.15
0.025 2.990 -16.35 -30.800 -4.11 0.640 7.7 -7.270 -17.3 -111.130 14.15
0.030 2.990 -16.35 -30.800 -4.11 1.100 7.7 -7.410 -17.3 -111.130 14.15
0.035 3.460 -16.35 -37.310 -4.11 1.530 7.7 -7.410 -17.3 -111.130 14.15
0.040 3.460 -16.35 -36.500 -4.11 1.030 7.7 -9.540 -17.3 -111.130 14.15
0.045 5.930 -16.35 -33.010 -4.11 4.200 7.7 -8.220 -17.3 -32.960 14.15
0.050 0.640 -16.35 -33.010 -4.11 0.260 7.7 -9.370 -17.3 -21.150 14.15
0.060 2.150 -16.35 -26.580 -4.11 -4.830 7.7 -14.590 -17.3 -26.120 14.15
0.070 0.100 -17.92 -27.190 -4.11 -14.770 7.7 -13.620 -17.3 -30.720 14.15
0.080 -0.570 -17.92 -19.330 -4.11 0.370 7.7 -15.770 -17.3 -30.720 14.15
0.090 -3.660 -17.92 -19.330 -4.11 -5.400 5.03 -9.310 -17.3 -14.140 12.16
0.100 -4.910 -17.92 -23.660 -4.11 -11.040 5.03 -5.950 -17.3 -6.670 12.16
0.120 12.980 -17.92 -29.050 -4.11 -5.510 5.03 -2.190 -17.3 -1.760 12.16
0.140 19.260 -17.92 -12.910 -4.11 -18.200 5.03 -1.560 -17.3 4.920 12.16
0.160 5.410 -17.92 -14.380 -4.11 -21.880 5.03 7.530 -18 9.710 12.16
0.180 -6.270 -17.92 -0.800 -4.11 -6.090 5.03 8.730 -18 -6.130 12.16
0.200 0.650 -17.92 3.980 -4.11 -5.160 5.03 8.350 -18 7.500 12.16
0.250 -16.550 -17.92 -5.250 -4.11 -11.980 5.03 6.860 -18 0.400 12.16
0.300 -31.540 -18.93 -13.100 -4.11 -9.060 5.03 8.990 -18 14.860 11.19
0.400 -16.850 -19.18 -5.490 -4.11 -26.270 3.33 4.330 -8.05 -87.990 26.46
0.500 -14.520 -22.06 2.330 -4.11 -8.550 3.33 -8.480 -8.84 -10.650 24.02
AVERAGE RETURN -0.954 -17.47 -21.470 -4.11 -5.710 6.335 -4.374 -16.70 -46.694 14.29

Note: Values in the table are percentages.





Computational Physics

C00012
Two-Dimensional Bisoliton Model in Cuprates
T. Chanpoom
Department of Physics and General Science, Faculty of Science and Technology,
Nakhon Ratchasima Rajabhat University, Nakhon Ratchasima 30000, Thailand.
E-mail: Nakonrachasima@yahoo.co.th, Tel. 089-4252486


ABSTRACT
The two-dimensional bisoliton model in cuprates has been studied with the nonlinear Schrodinger equation without Coulomb interaction. It is shown that in a square crystal lattice containing plane layers of alternating copper and oxygen ions, the formation of bisolitons and their bound states is of quasi-one-dimensional character, so the one-dimensional bisoliton model is confirmed in its application to high-temperature superconductors.

Keywords: The Bisoliton Model, Two-Dimensional Bisoliton Model.



1. INTRODUCTION
Cuprate superconductors, discovered in 1986 and superconducting when cooled below about 40 K, are called high-temperature superconductors. These new superconductors differ from conventional superconductors in their great anisotropy, small density of charge carriers and very weak isotope effect [1]. The BCS theory, whose main idea is a weak and linear electron-phonon interaction, cannot explain the peculiar properties of cuprate superconductors. A. S. Davydov proposed a mechanism for high-Tc superconductivity based on a one-dimensional bisoliton model in 1988 [2]. The model is built on the concept of moderately strong electron-phonon coupling, with electron pairs in a singlet state moving along a chain of alternating oxygen and copper atoms. The bisoliton model was adapted from the soliton model, which explained the mechanism of superconductivity in organic crystals [3]. The one-dimensional bisoliton model reproduces the absence of the isotope effect and gives the critical temperature as a function of doping level.



2. THEORY AND RELATED WORKS
The mechanism of high-temperature superconductivity in the one-dimensional bisoliton model starts from the nonlinear Schrodinger equation

$$ i\hbar\,\frac{\partial \psi_j(x,t)}{\partial t} + \frac{\hbar^2}{2m}\,\frac{\partial^2 \psi_j(x,t)}{\partial x^2} - \sigma\rho(x,t)\,\psi_j(x,t) = 0 \qquad (1) $$

which determines the wave function $\psi_j(x,t)$ of one soliton at an arbitrary position $x$ and time $t$ in the spin state $j = \alpha, \beta$, and from the equation characterizing the deformation $\rho(x,t)$ of the chain,

$$ \left(\frac{\partial^2}{\partial t^2} - v_0^2\,\frac{\partial^2}{\partial x^2}\right)\rho(x,t) + \frac{\sigma a^2}{M}\,\frac{\partial^2}{\partial x^2}\Big(|\psi_\alpha(x,t)|^2 + |\psi_\beta(x,t)|^2\Big) = 0. \qquad (2) $$


Substituting the solution of (2) into (1), we obtain the equation for the amplitude function $\Phi(\zeta)$ of one bisoliton,

$$ \frac{d^2\Phi(\zeta)}{d\zeta^2} + \Big(\varepsilon + 4g\,\Phi^2(\zeta)\Big)\,\Phi(\zeta) = 0 \qquad (3) $$

where $\varepsilon$ is the energy of the bisoliton and the dimensionless quantity $\zeta$ is defined by

$$ \zeta = \frac{x - vt}{a}. \qquad (4) $$

The solution of (3) in periodic form,

$$ \Phi(\zeta) = \frac{g}{2E(k)}\,\mathrm{dn}(u,k), \qquad (5) $$

where $\mathrm{dn}$ is a Jacobi elliptic function and $E(k)$ is the complete elliptic integral of the second kind, is the wave function of the Bose-condensed bisoliton state in the one-dimensional model.



3. COMPUTATIONAL DETAILS
In the real system, a cuprate superconductor consists of plane layers of CuO ions forming two mutually perpendicular chains along the x and y directions. To apply the two-dimensional bisoliton model we consider bisolitons moving on the lattice layers, starting from the two-dimensional nonlinear Schrodinger equation

$$ i\hbar\,\frac{\partial \psi_j(x,y,t)}{\partial t} + \frac{\hbar^2}{2m}\left(\frac{\partial^2 \psi_j}{\partial x^2} + \frac{\partial^2 \psi_j}{\partial y^2}\right) - \sigma\rho(x,y,t)\,\psi_j(x,y,t) = 0 \qquad (6) $$

and

$$ \left(\frac{\partial^2}{\partial t^2} - v_0^2\left(\frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2}\right)\right)\rho(x,y,t) + \frac{\sigma a^2}{M}\left(\frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2}\right)|\psi_j(x,y,t)|^2 = 0. \qquad (7) $$

The amplitude function for one two-dimensional bisoliton is obtained by solving equations (6) and (7) in the form [4]

$$ \Phi(\zeta) = \sqrt{\frac{g'}{2}}\;\mathrm{sech}(g'\zeta) \qquad (8) $$

where

$$ \zeta = \frac{\vec{v}\cdot(x\hat{i} + y\hat{j}) - v^2 t}{v\,a}, \qquad (9) $$

$v$ is the bisoliton velocity and $a$ is the lattice constant. The coupling parameter $g'$ in the two-dimensional case is four times the one-dimensional $g$. The distributions of one bisoliton along the x and y directions on the lattice layers are given by equations (10) and (11) respectively,

$$ \Phi^2(\zeta_x,t) = \frac{g'}{2}\,\mathrm{sech}^2(g'\zeta_x) \qquad (10) $$

$$ \Phi^2(\zeta_y,t) = \frac{g'}{2}\,\mathrm{sech}^2(g'\zeta_y) \qquad (11) $$

and are shown in Figure 1 and Figure 2 respectively.




Figure 1. Distribution of one bisoliton moving on the lattice layers along the x direction.

Figure 2. Distribution of one bisoliton moving on the lattice layers along the y direction.


At low temperature a lattice layer containing many bisolitons can form a Bose condensation state. The properties of this state are defined by a periodic solution depending on x, y and k [5]. The functions describing the distribution of Bose-condensed bisolitons along the x and y directions on the lattice layers are given by equations (12) and (13) respectively,

$$ \Phi^2(\zeta_x,k) = \left(\frac{g'}{2E(k)}\right)^{2} \mathrm{dn}^2(\zeta_x,k) \qquad (12) $$

$$ \Phi^2(\zeta_y,k) = \left(\frac{g'}{2E(k)}\right)^{2} \mathrm{dn}^2(\zeta_y,k) \qquad (13) $$

and are shown in Figure 3 and Figure 4 respectively.
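As a numerical illustration, a short Python sketch (based on the reconstructed equations (12)-(13) above, not code from the paper) evaluates this profile with SciPy's Jacobi elliptic functions; note that SciPy parameterizes them by $m = k^2$:

```python
import numpy as np
from scipy.special import ellipj, ellipe

def bisoliton_density(zeta, k, g_prime):
    """Evaluate |Phi|^2 = (g'/(2E(k)))^2 dn^2(zeta, k), the periodic
    Bose-condensed bisoliton profile of equations (12)-(13)."""
    m = k**2                     # SciPy's elliptic-function parameter
    _, _, dn, _ = ellipj(zeta, m)
    amplitude = g_prime / (2.0 * ellipe(m))
    return (amplitude * dn)**2

# Profile along one chain direction for k = 0.9 (cf. Figure 3).
zeta = np.linspace(-10.0, 10.0, 401)
density = bisoliton_density(zeta, k=0.9, g_prime=1.0)
print(density.max(), density.min())
```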










Figure 3. Distribution of Bose-condensed bisolitons moving on the lattice layers along the x direction for k = 0.9.

Figure 4. Distribution of Bose-condensed bisolitons moving on the lattice layers along the y direction for k = 0.7.

The binding energy of the two solitons forming a bisoliton at rest is defined by

$$ \Delta(k) = \frac{2 g'^2 J}{E^2(k)}\left\{ \frac{1}{4E(k)}\Big[\,2(2-k^2)\,E(k) - (1-k^2)\,K(k)\,\Big] - \frac{2-3k^2}{3} \right\} \qquad (14) $$

where $K(k)$ is the complete elliptic integral of the first kind. From this, the critical temperature $T_c$ at which a bisoliton is destroyed into two free solitons follows as

$$ T_c(k) = \frac{2 g'^2 J}{k_B\,E^2(k)}\left\{ \frac{1}{4E(k)}\Big[\,2(2-k^2)\,E(k) - (1-k^2)\,K(k)\,\Big] - \frac{2-3k^2}{3} \right\}. \qquad (15) $$

The dependence of $T_c$ on $k$ is shown in Figure 5.
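Under the caveat that equations (14)-(15) are reconstructed from a garbled source, a short Python sketch (with $g'$, $J$ and $k_B$ set to unity for illustration) evaluates $T_c(k)$ with SciPy's complete elliptic integrals:

```python
from scipy.special import ellipk, ellipe

def critical_temperature(k, g_prime=1.0, J=1.0, kB=1.0):
    """Evaluate T_c(k) from the reconstructed equation (15); SciPy's
    ellipk/ellipe take the parameter m = k**2."""
    m = k**2
    E, K = ellipe(m), ellipk(m)
    bracket = (2*(2 - k**2)*E - (1 - k**2)*K) / (4*E) - (2 - 3*k**2) / 3
    return 2 * g_prime**2 * J * bracket / (kB * E**2)

for k in (0.3, 0.5, 0.7, 0.82):
    print(k, critical_temperature(k))
```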





Figure 5. The critical temperature Tc as a function of k. The upper limit of k for the stability of the Bose-condensed bisolitons is about 0.82.



4. RESULTS AND DISCUSSION
The functions describing the distribution of Bose-condensed bisolitons on the plane of the lattice layers are equation (12) for a bisoliton moving in the x direction and equation (13) for a bisoliton moving in the y direction, with the coupling parameter $g'$ four times the one-dimensional $g$.

These results show that each two-dimensional bisoliton wave function depends on only the x or the y variable; in other words, the motion in two dimensions is quasi-one-dimensional. The superconductivity on the plane of the lattice layers can therefore be treated as charge carriers moving along a single chain in the x or y direction.

The binding energy of the bisolitons calculated from the wave function shows that the critical temperature depends on k, which corresponds to the experimental finding that the critical temperature depends on the doping level [1].









REFERENCES
1. Mourachkine, A., Room-Temperature Superconductivity, Cambridge International Science Publishing, Cambridge, 2004.
2. Davydov, A. S., Bisoliton mechanism of high-temperature superconductivity, Physica Status Solidi (b), 1988, 146, 619.
3. Brizhik, L. S. and Davydov, A. S., Soliton mechanism of superconductivity in organic quasi-one-dimensional crystals, Physica Status Solidi (b), 1987, 143, 689.
4. Ablowitz, M. J. and Segur, H., Solitons and the Inverse Scattering Transform, Society for Industrial and Applied Mathematics, Philadelphia, 1981.
5. Inc, M. and Ergut, M., Periodic wave solutions for the generalized shallow water wave equation by the improved Jacobi elliptic function method, Applied Mathematics E-Notes, 2005, 5, 89-96.

D00003
Vacancy-mediated dynamics with quenched disorder in
binary alloy: Monte Carlo simulations and dynamic scaling

B. Pattanasiri(1,2), N. Nuttavut(1), D. Triampo(1,2), and W. Triampo(1,3,C)

1 R&D Group of Biological and Environmental Physics (BIOPHYSICS), Department of Physics, Faculty of Science, Mahidol University, Bangkok 10400, Thailand
2 Center of Excellence for Innovation in Chemistry, Department of Chemistry, Faculty of Science, Mahidol University, Bangkok 10400, Thailand
3 Center of Excellence for Vector and Vector-Borne Diseases, Mahidol University, Bangkok 10400, Thailand
C E-mail: wtriampo@gmail.com; Fax: 02-4419322; Tel. 084-0042689



ABSTRACT
Vacancy-mediated dynamic processes with quenched disorder in a binary alloy were investigated using Monte Carlo simulations. We study an initially phase-segregated binary alloy described by an Ising-like model with vacancies. The time evolution of the interface destruction and bulk disordering processes, followed through a time-dependent disorder parameter at sufficiently high temperature, was analyzed under various conditions. It was found that the dynamic behavior could be characterized by dynamic scaling and the associated characteristic exponents.

Keywords: Vacancy-mediated dynamic, Binary alloys, Disorder parameter, Dynamic
scaling.



REFERENCES
1. Schmittmann, B., Zia, R.K.P., and Triampo, W., Brazilian Journal of Physics, 2000, 30, 139-151.
2. Triampo, W., Aspelmeier, T., and Schmittmann, B., Phys. Rev. E, 2000, 61(3), 2386-2396.

D00005
Numerical Investigations of the Distributions of Elementary
Excitations of the Bimodal Ising Spin Glass

N. Jinuntuya(1,*) and J. Poulter(2)

1 Department of Physics, Faculty of Science, Mahidol University, Bangkok 10400, Thailand
2 Department of Mathematics, Faculty of Science, Mahidol University, Bangkok 10400, Thailand
* E-mail: fscinpr@ku.ac.th, Tel: 081-772-1955



ABSTRACT
We investigate numerically the distributions of elementary excitations of the bimodal
Ising spin glass for a large number of disorder realizations. The Pfaffian method is used
to calculate the degeneracies of the systems. The distributions are analyzed according
to a fat-tailed power law distribution. Extrapolations of the distribution parameters to the thermodynamic limit are attempted and discussed.

Keywords: Spin Glass, Ising Model, Elementary Excitations, Fat-tailed Distribution.



1. INTRODUCTION
The bimodal Ising spin glass model was proposed by Edwards and Anderson 35 years ago, yet its properties are still not fully understood. One of those properties is its elementary excitations: there has been a long debate about the value of the elementary excitation of this model [1]. By considering a local spin flip inside the lattice, one may naively expect a 4J excitation on the square lattice; however, this may not be the case. Our recent studies suggest that the elementary excitation in the thermodynamic limit should be 2J. To see this, we start from the expression for the specific heat

$$ c = \frac{2J^2}{L^2\,kT^2}\left[\, 2\,\frac{M_1}{M_0}\,e^{-2J/kT} + \left( 8\,\frac{M_2}{M_0} - 2\left(\frac{M_1}{M_0}\right)^{2} \right) e^{-4J/kT} + \cdots \right], \qquad (1) $$

where $M_i$ is the degeneracy of the $i$-th excited state. We may expect that at low temperature $c$ is dominated by the first few terms, so by examining the ratios $M_i/M_0$ we should be able to decide what the elementary excitation gap is. It was shown [1-3] that the distributions of the degeneracies of the first few excited states are fat-tailed and do not develop sharp peaks as $L$ increases. However, the scaling property of the peak positions supports the value of a 2J excitation gap.

In this work we try to understand more about this 2J excitation. We revisit the system of [1] to gain more information about the origin of 2J excitations. Since the distributions are fat-tailed, the value of $M_1/M_0$ can deviate far from the most likely value, so it is interesting to study the properties of the tail in the hope that they will provide more information about the system.

2. MODEL AND COMPUTATIONAL METHOD
The system of interest is the bimodal Ising spin glass on an $L \times L$ square lattice with odd $L$. The ferro- and antiferromagnetic bonds are distributed randomly with equal probability. The boundary conditions are periodic in one dimension; in the other dimension the lattice is embedded in an infinite unfrustrated nest. This choice of boundary conditions allows 2J excitations in a finite system; figure 1 shows an example for $L = 3$. We use the Pfaffian method [2, 4] to calculate the degeneracies of the first and second excited states. We then analyze the
distribution parameters according to a fat-tailed power law distribution. Finally,
extrapolations of the distribution parameters to the thermodynamic limit are discussed.













Figure 1. Ground state (a) and first excited state (b) configurations for L = 3. Thin and thick lines are ferro- and antiferromagnetic bonds respectively. The dashed line shows the periodic boundary condition, and the marked lines are the unsatisfied bonds. The energy difference between these configurations is 2J.


3. RESULTS AND DISCUSSION
It was shown in [1] that the most likely value of $M_1/M_0$ tends to zero as the system size increases. From these data we expect that direct 2J excitations are unlikely when the system size is large enough. However, the story is different when we look at the tails. Figure 2(a) shows log-log plots of the tails of $M_1/M_0$. We can see that the probability of finding $M_1/M_0$ far from the most likely value increases with $L$: the tails are pulled out as $L$ increases, and large $M_1/M_0$ can be expected even in the thermodynamic limit. We have fitted the tails with power law distributions $f(x) \propto x^{-a}$, where $x = (M_1/M_0)/L^2$. The results are shown in figure 2(b): $a$ approaches $a_\infty \approx 1.2378$ as $1/L$ approaches zero. In the thermodynamic limit the distribution can be normalized, but its mean does not exist [5].
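As an illustration of this fitting procedure, the sketch below (Python, on synthetic Pareto-tailed data rather than the paper's degeneracy data) estimates the tail exponent from a log-log histogram:

```python
import numpy as np

def fit_tail_exponent(samples, x_min):
    """Fit the tail of an empirical distribution to f(x) ~ x**(-a) by a
    least-squares line on the log-log histogram (cf. figure 2(b))."""
    tail = np.asarray(samples, dtype=float)
    tail = tail[tail >= x_min]
    bins = np.logspace(np.log10(x_min), np.log10(tail.max()), 20)
    counts, edges = np.histogram(tail, bins=bins, density=True)
    centers = np.sqrt(edges[:-1] * edges[1:])   # geometric bin centers
    keep = counts > 0
    slope, _ = np.polyfit(np.log(centers[keep]), np.log(counts[keep]), 1)
    return -slope                               # tail exponent a

# Synthetic data with a known tail exponent of about 1.24 for illustration.
rng = np.random.default_rng(0)
data = 1.0 + rng.pareto(0.24, size=50000)       # classical Pareto, a = 1.24
print(fit_tail_exponent(data, x_min=1.0))
```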

It is interesting to compare the values of $M_1$ and $M_2$. Figure 3 shows the distributions of $M_2/M_1$ (scaled with $L$); the most likely values move to zero as $L$ increases. Figure 4 shows log-log plots of the tails of $M_2/M_1$ together with power law fits. We can see that the tails grow fatter, and it becomes more likely that $M_2$ is much larger than $M_1$ as $L$ increases.
















Figure 2. (a) Log-log plots of the tails of $(M_1/M_0)/L^2$ for various values of L and (b) power law fitting results.



Figure 3. Distributions of $(M_2/M_1)/L$ for various values of L.


















Figure 4. (a) Log-log plots of the tails of $(M_2/M_1)/L$ for various values of L and (b) power law fitting results.



4. CONCLUSION
From the results above, the origin of the 2J excitations appears more complicated than we first expected. For the system studied here there are two sources of the 2J excitation gap: the boundary condition, and the complicated contribution of an infinite number of 4J excitations. The boundary effect should vanish for a large system, and the distributions of $M_2/M_1$ support this expectation. What we plan to study next is how the first source merges into the second in the thermodynamic limit.

The fat-tailed distributions raise another question. Although the most likely value vanishes in the thermodynamic limit, values in the tail may still contribute to a physical quantity. This means that the 2J excitation gap may receive contributions not only from direct 2J excitations but also from complicated combinations of 4J excitations, and also from some higher excited states. Answering this question requires more analytical and numerical work; these lines of study are in progress.


REFERENCES
1. Jinuntuya, N. and Poulter, J., 13th Annual Symposium on Computational Science and Engineering, Kasetsart University, 2009.
2. Atisattapong, W. and Poulter, J., New J. Phys., 2008, 10, 093012.
3. Atisattapong, W. and Poulter, J., New J. Phys., 2009, 11, 063039.
4. Blackman, J. A. and Poulter, J., Phys. Rev. B, 1991, 44(9), 4374-86.
5. Newman, M.E.J., Contemp. Phys., 2005, 46(5), 323-351.



ACKNOWLEDGMENTS
NJ thanks the Thai Government Science and Technology Scholarship for his scholarship.
Some of the computations were performed on the Tera Cluster at the Thai National Grid
Center and the cluster at the Department of Physics, Kasetsart University.

D00007
The Effects of Dangling Bond Terminators in MOF-5

Muhammad Hafiz Hussim(1), Shukri Sulaiman(2,C), Mohamed Ismail Mohamed-Ibrahim(3), Rafie Deraman(1), and Lee Sin Ang(2)

1 Faculty of Applied Sciences, Universiti Teknologi MARA, Kuala Terengganu Campus, Malaysia
2 Physical Sciences Programme, School of Distance Education, Universiti Sains Malaysia, 11800 Penang, Malaysia
3 Chemical Sciences Programme, School of Distance Education, Universiti Sains Malaysia, 11800 Penang, Malaysia
C E-mail: shukri@usm.my; Fax: 604-657600; Tel. 604-6533639


ABSTRACT
The effects of different dangling bond terminators, namely H and CH3, in molecular clusters representing MOF-5 have been studied using density functional theory [1-4]. A column of two Zn corners connected by a benzenedicarboxylate (BDC) linker, (Zn4O)2(COOCH3)10(COO)2C6H4, was constructed and its dangling bonds terminated with H or CH3. The geometry of the chosen cluster was optimised with either full or partial constraints on the coordinates of the terminators. For the constrained optimization, the terminator bond lengths were adjusted so that the binding parameters of the atoms in the cluster are close to the experimental values. The variation of the terminator bond lengths resulted in significant differences in the optimised geometry of the cluster. From these studies, the optimum terminator bond length can be inferred and suggested. The effects of the different types of terminators on the electron distribution and electronic state of the cluster are also presented.

Keywords: Metal Organic Frameworks; MOF-5; Density Functional Theory; dangling bonds.


1. INTRODUCTION
After the Yaghi group [1] first reported the high H2 uptake capacity of MOF-5 (Figure 1) in 2003, Huber et al. promptly investigated H2 binding to the MOF using a theoretical approach [2]. They focused on the interaction of H2 with the aromatic linkers using second-order Moller-Plesset (MP2) calculations, which show that larger aromatic linkers increase the interaction energy with H2. A related study by Sagara et al. found the binding energy of multiple hydrogen molecules to organic BDC linkers [3].

Mulder et al. [4], and Mueller and Ceder [5], applied the DFT method to study the H2 binding energies in MOF crystals. Both papers showed that the strongest interactions with hydrogen are located near the ZnO4 clusters, although they reported different H2 binding energies to the ZnO4 clusters. Their work was later validated by experimental [6-10] and theoretical studies [11-13]. Results from DFT computations are very helpful for calculating H2 binding energies to MOFs and also provide good information on the H2 adsorption sites in MOFs.

A modeling study of MOF-5 that simulates a seamless representation, including the metal clusters and the organic BDC linker, to trace all possible adsorption sites was later performed by Lee et al. [14].

Figure 1: MOF-5 molecule.

In their approach, which was inspired by the previous work of Sagara et al.
[15], they constructed a simplified cluster model by using acetate groups to replace the sides of the MOF-5 crystal structure that are attached to 10 BDCs. This work was further extended by Pianwanit et al. [16] to include the complete MOF-5 molecule using ONIOM calculations, also exploring possible binding sites for CH4 and CO2.

In our studies, we seek a compromise between high computational cost and producing reliable calculated geometries of MOF-5 that are in reasonable agreement with experimental data. Various H terminator bond lengths were explored to investigate their effects on the electronic structure of the cluster.


2. COMPUTATIONAL DETAILS
A column (Figure 2) that consists of two ZnO4 corners connected by an organic BDC linker was constructed and fully optimized using two quantum mechanical methods, namely Hartree-Fock (HF/6-31G*) and density functional theory (DFT/MPW1PW91/6-311G*). The latter model is considered to represent a higher level of accuracy than the former, and is also regarded as adequate, in terms of quantitative precision, for investigating the weak interactions between hydrogen and MOF-5, as suggested by Lee et al. The ten dangling bonds at the ZnO4 corners were terminated by hydrogen atoms instead of acetates to investigate the effects of different types of terminators. This type of termination was inspired by a previous study by Okulik et al., in which H atoms were used to terminate zeolite clusters [17]; that study showed how the Si-H distance affects the electronic structure of zeolite clusters and can be used to derive a universal Si-H distance that reproduces a realistic cluster.

Figure 2: ZnO4 corner and BDC linker.

Using the constructed cluster to represent MOF-5, we first performed full geometry optimization calculations, allowing all geometrical parameters, including those of the H terminators, to change. Subsequently, we performed a series of calculations in which the H terminator bond lengths were fixed at various values, and the effects of the various H terminator positions on the electronic structure of the cluster were analyzed. The structure was further refined with constrained optimization, in which the terminator bond length is adjusted so that the binding parameters of the atoms in the cluster are close to the experimental values [18]. The optimum terminator bond length can then be inferred from the minimum relative error in the binding parameters compared with the experimental values. All optimizations and energy calculations were carried out with the Gaussian 03 package [19].
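The error measure behind Figures 3 and 4 can be illustrated with a short Python sketch (the helper and data layout are our own; the numbers are taken from Table 1 and the experimental values of [18]):

```python
# Experimental bond lengths (angstroms) and bond angles (degrees) [18].
EXP = {"Zn-O1": 1.936, "Zn-O2": 1.941, "O2-C1": 1.252, "C1-C2": 1.498,
       "O1-Zn-O2": 111.1, "Zn-O2-C1": 132.3, "O2-C1-C2": 118.1}

def average_relative_error(calc):
    """Mean |calc - exp| / exp over all tabulated bonds and angles."""
    errs = [abs(calc[key] - ref) / ref for key, ref in EXP.items()]
    return sum(errs) / len(errs)

# Restricted H-terminator geometry at 1.44 angstroms (HF/6-31G*, Table 1).
calc_144 = {"Zn-O1": 1.993, "Zn-O2": 1.952, "O2-C1": 1.246, "C1-C2": 1.494,
            "O1-Zn-O2": 111.1, "Zn-O2-C1": 133.0, "O2-C1-C2": 117.9}
print(f"average relative error: {average_relative_error(calc_144):.4f}")
```

Scanning this quantity over a grid of terminator bond lengths and taking the minimum gives the optimum value reported in the text.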


3. RESULTS AND DISCUSSION
The results of our investigations are summarized in Figure 3, Figure 4 and Table 1. Figure 3 plots the H terminator bond length against the relative error in bond lengths and bond angles calculated with the HF/6-31G* method, while Figure 4 is the corresponding plot for DFT/MPW1PW91/6-311G*. The relative error is calculated by taking the average difference between the calculated and experimental values of the bond lengths and bond angles tabulated in Table 1. Table 1 summarizes the results from the different techniques for terminating the dangling bonds.
As can be seen from Figure 3, for HF/6-31G*, fixing the H terminator bond length at 1.44 Å produces the lowest discrepancies between the intramolecular geometries of the MOF-5 column and the experimental ones. With the higher-accuracy DFT/MPW1PW91/6-311G* model, an H terminator bond length of 0.79 Å gave the most accurate model compared with the experimental geometries. Although the MOF-5 column terminated with acetate groups provided higher accuracy relative to the experimental geometries, its computational time is about 10 times longer.

From the electron population densities and the HOMO-LUMO analysis in Figure 5, we observe only a minimal discrepancy between the model with the restricted H terminator bond length and the model terminated with acetate groups. The results also show no significant change in the calculated Mulliken charges between the two models.


















Figure 3: Average relative error in bond lengths and bond angles between the restricted model geometries (HF/6-31G*) and the experimental geometries.

Figure 4: Average relative error in bond lengths and bond angles between the restricted model geometries (MPW1PW91/6-311G*) and the experimental geometries.
Table 1: Comparison between restricted and unrestricted model geometries with H terminators, the model with CH3 terminators, and experimental values (a). Method A is HF/6-31G* and Method B is DFT/MPW1PW91/6-311G*.

                                Terminator        Relative  Bond Lengths (Å)               Bond Angles (°)
Model             Method  Bond Length (Å)  Time      Zn-O1  Zn-O2  O2-C1  C1-C2    O1-Zn-O2  Zn-O2-C1  O2-C1-C2
H Terminator       A      1.44             1.00      1.993  1.952  1.246  1.494    111.1     133.0     117.9
(Restricted)       B      0.79             0.51      1.964  1.941  1.258  1.488    111.4     131.7     117.4
H Terminator       A      1.08             0.49      1.985  1.961  1.245  1.496    110.4     132.9     117.8
(Unrestricted)     B      1.10             0.53      1.969  1.937  1.258  1.488    111.6     131.7     117.4
CH3 Terminator     A      1.51             11.65     1.978  1.969  1.245  1.497    110.1     132.8     117.7
                   B      1.50             5.10      1.957  1.946  1.258  1.489    110.9     131.6     117.3
Experimental data                                    1.936  1.941  1.252  1.498    111.1     132.3     118.1

a. H. Li, M. Eddaoudi, M. O'Keeffe, O. M. Yaghi, Nature, 1999, 402, 276.
Figure 5: Comparison between the model terminated with restricted H atoms (left) and the model terminated with methyl groups (right): electron density at 0.2, HOMO, and LUMO.
4. CONCLUSION
The HF/6-31G* and DFT/MPW1PW91/6-311G* methods were used to study the effect of using H atoms as the terminator atoms in MOF-5 structures. A marked difference was observed in the optimum H terminator bond length produced by the two methods. Termination of the dangling bonds with acetate groups produces the best results in terms of conformance with the experimental geometry; however, the effects of the different types of terminators on the distribution of charges in the cluster were small. Further studies are being carried out on the effects of the proposed model on the binding energy and the optimal binding sites, for comparison with the results of previous works [14-16].


REFERENCES

1. N. L. Rosi, J. Eckert, M. Eddaoudi, D. T. Vodak, J. Kim, M. O'Keeffe and O. M. Yaghi, Science, 2003, 300, 1127.
2. O. Huber, A. Gloss, M. Fitchtner and W. Klopper, J. Phys. Chem. A, 2004, 108, 3019.
3. T. Sagara, J. Klassen, and E. Ganz, J. Chem. Phys., 2005, 123, 014701.
4. F. M. Mulder, T. J. Dingemans, M. Wagemaker and G. J. Kearley, Chem. Phys., 2005,
317, 113.
5. T. Mueller and G. Ceder, J. Phys. Chem. B, 2005, 109, 17974.
6. J. L. C. Rowsell, E. C. Spencer, J. Eckert, J. A. K. Howard and O. M. Yaghi, Science,
2005, 309, 1350.
7. J. L. C. Rowsell, J. Eckert and O. M. Yaghi, J. Am. Chem. Soc.,2005, 127, 14904.
8. T. Yildirim and M. R. Hartman, Phys. Rev. Lett., 2005, 95,215504.
9. S. Bordiga, J. G. Vitillo, G. Ricchiardi, L. Regli, D. Cocina,A. Zecchina, B. Arstad, M.
Bjorgen, J. Hafizovic and K. P. Lillerud, J. Phys. Chem. B, 2005, 109, 18237.
10. E. C. Spencer, J. A. K. Howard, G. J. McIntyre, J. L. C. Rowsell and O. M. Yaghi, Chem.
Commun., 2006, 278.
11. T. Sagara, J. Klassen and E. Ganz, J. Chem. Phys., 2004, 121,12543.
12. S. S. Han, W.-Q. Deng and W. A. Goddard III, Angew. Chem. Int. Ed., 2007, 46, 6289.
13. L. Zhang, Q. Wang and Y.-C. Liu, J. Phys. Chem. B, 2007, 111,4291.
14. T. B. Lee, D. Kim, D. H. Jung, S. B. Choi, J. H. Yoon, J. Kim, K. Choi and S. H. Choi,
Catalysis Today, 2007, 120, 330.
15. T. Sagara, J. Klassen, E. Ganz, J. Chem Phys. 2004, 121, 12543.
16. A. Pianwanit, C. Kritayakornupong, A. Vongachariaya, N. Selphusit, T. Ploymeerusmee, T. Remsugnen, D. Nuntasri, S. Fritzsche, S. Hannongbua, Chem. Phys., 2008, 249, 77.
17. N. B. Okulik, R. P. Diez, A. H. Jubert, Comp. Materials Sc. 2000, 17, 88.
18. H. Li, M. Eddaoudi, M. O'Keeffe, O. M. Yaghi, Nature, 1999, 402, 276.
19. M.J. Frisch, G.W. Trucks, H.B. Schlegel, G.E. Scuseria, M.A. Robb, J.R. Cheeseman,
J.A. Montgomery, T. Vreven, K.N. Kudin, J.C. Burant, J.M. Millam, S.S. Iyengar, J.
Tomasi, V. Barone, B. Mennucci, M. Cossi, G. Scalmani, N. Rega, G.A. Petersson, H.
Nakatsuji, M. Hada, M. Ehara, K. Toyota, R. Fukuda, J. Hasegawa, M. Ishida, T.
Nakajima, Y. Honda, O. Kitao, H. Nakai, M.L. Klene, X. Li, J.E. Knox, H.P. Hratchian,
J.B. Cross, V. Bakken, C. Adamo, J. Jaramillo, R. Gomperts, R.E. Stratmann, O. Yazyev,
A.J. Austin, R. Cammi, C. Pomelli, J.W. Ochterski, P.Y. Ayala, K. Morokuma, G.A.
Voth, P. Salvador, J.J. Dannenberg, V.G. Zakrzewski, S. Dapprich, A.D. Daniels, M.C.
Strain, O. Farkas, D.K. Malick, A.D. Rabuck, K. Raghavachari, J.B. Foresman, J.V.
Ortiz, Q. Cui, A.G. Baboul, S. Clifford, J. Cioslowski, B.B. Stefanov, G. Liu, A.
Liashenko, P. Piskorz, I. Komaromi, R.L. Martin, D.J. Fox, T. Keith, M.A. Al-Laham,
C.Y. Peng, A. Nanayakkara, M. Challacombe, P.M.W. Gill, B. Johnson, W. Chen, M.W.
Wong, C. Gonzalez, J.A. Pople, Gaussian 03, Revision C.02, Gaussian, Inc., Wallingford
CT, 2004.
D00008
Hyperfine Interactions of Muonium in Graphene

Lee Sin Ang(1), Shukri Sulaiman(1,C), Mohamed Ismail Mohamed-Ibrahim(2)

1 Physical Sciences Programme, School of Distance Education, Universiti Sains Malaysia, 11800 Penang, Malaysia
2 Chemical Sciences Programme, School of Distance Education, Universiti Sains Malaysia, 11800 Penang, Malaysia
C E-mail: shukri@usm.my; Fax: 604-657600; Tel. 604-6533639


ABSTRACT
We have performed theoretical investigations of the hyperfine interactions of a muonium attached at different sites in two types of graphene nanoribbons (GNRs): the zigzag-edged graphene nanoribbon (ZGNR) and the armchair-edged graphene nanoribbon (AGNR). The electronic properties of the GNRs with an added muonium were calculated from geometries optimized at the B3LYP/3-21G level of theory. Three possible sites of muonium attachment to the GNR were considered: (i) directly bonded to a carbon (A), (ii) above the bond between two carbon atoms (B), and (iii) above the centre of a carbon ring (C). From the energy point of view, the most stable site for the muonium is site A, followed by site C; site B is the least stable. The calculated isotropic hyperfine coupling constant (hfcc) a_H of the muonium in graphene is largest at site C. However, the sign of the Fermi contact term is opposite for ZGNR and AGNR, with a_H(H) = -352.384 MHz for C126H33 (ZGNR), whereas for C132H35 (AGNR), a_H(H) = 671.755 MHz. The anisotropic contribution to the hyperfine interaction is small, in the range 0.004 MHz to 0.954 MHz. Thus it can be concluded that the main contribution to the hyperfine interactions is the Fermi contact term.

Keywords: Graphene nanoribbons, Density functional theory, Spin density, Hyperfine interactions



1. INTRODUCTION
Graphene is a two-dimensional carbon network of hexagonal mesh that has been
discovered recently [1-5]. Identification of the edges of the graphene nanoribbons
(abbreviated GNR), either as zigzag (ZGNR) or armchair (AGNR) has been performed by
employing scanning tunneling microscopy (STM) or atomic force microscopy (AFM) [6-13],
or a combination of both [14]. A video on how the edges are formed has also been recorded
[15]. It was also proposed that the specific edges can be determined by the spectra of the
bright exciton state of the optical absorption [16] or using Raman peaks [17].

In this work, we suggest another way to identify the edges of a GNR by exploiting the
hyperfine interactions between the nucleus of a muonium and the conduction electrons from
graphene. Our approach is different from the hyperfine interactions between the nucleus of
13
C isotope and the conduction electrons [18, 19]. Hyperfine interactions in naturally occuring
carbon are very weak because of the low percentage of the
13
C isotope, which would provide
the nucleus spin that interacts with the electrons. Yazyev [18] used a first-principle method to
study the hyperfine interactions of
13
C isotope in graphene, modeled by a few small graphene
flakes. Apart from the reported non-zero hyperfine coupling constants, Yazyev found that the
spin of the conduction electrons and the local atomic structure affects the hyperfine
interactions in graphene. Furthermore, the hyperfine constant is weaker and more anisotropic
than those heavier elements in the solid state environment [18]. Using a bigger graphene
model in the shape of a quantum dot, Fisher et al. [19] reported that the isotropic hyperfine
D00008
ANSCSE14 Mae Fah Luang University, Chiang Rai, Thailand
March 23-26, 2010
264
constant is zero when the graphene size is extended to its limit, and the contributions of the
hyperfine constants arise from the anisotropic hyperfine interactions.



2. METHODOLOGY
We employed the DFT/B3LYP/3-21G method for all calculations. ZGNR is represented by a strip of 126 carbon atoms (C126H33) with a length of 24.6 Å and a width of 11.4 Å, excluding the edge hydrogen terminators. For AGNR, C132H35 was chosen as our model, a strip with a length of 24.2 Å and a width of 12.3 Å. The dimensions of the strips were chosen to minimize the interactions between the edges. The muonium was added at three different sites: directly bonded to a carbon (A), above the bond between two carbon atoms (B), and above the center of a ring (C). In all three cases, the initial distance of the attached muonium from the graphene sheet was 1.091 Å. Geometry optimizations were performed for these three possible muonium attachment sites. Since the number of electrons in these systems is odd, open-shell calculations were employed. The configurations of sites A, B, and C are shown in Figure 1 for zigzag edges and in Figure 2 for armchair edges.






Figure 1: Configurations of the possible muonium attachment sites (coloured red) for ZGNR: (a) directly bonded to a carbon (A); (b) above the bond between two carbon atoms (B); (c) above the center of a ring (C).






Figure 2: Configurations of the possible muonium attachment sites (coloured red) for AGNR: (a) directly bonded to a carbon (A); (b) above the bond between two carbon atoms (B); (c) above the center of a ring (C).

3. RESULTS AND DISCUSSION
From the energy point of view, the muonium prefers site A for both AGNR and ZGNR; the results are shown in Table 1. This agrees with the first-principles calculations of Zhu et al. [20] and Ferro et al. [21]. Site B is the least stable of the three. In the following, we discuss our calculated hyperfine interactions.

Table 2 shows the values of the isotropic coupling. Only sites A and C have non-negligible values. Between sites A and C, the values for site C are more distinctive: one is positive (for AGNR) and the other negative (for ZGNR). From Table 3 we conclude that all the anisotropic terms are small, which means that the isotropic term is the dominant parameter of the hyperfine interactions. The values of the isotropic constants are related to the geometry of the systems: for site C the muonium sits about 3 Å from the basal plane, while for sites A and B the C-H bond is 1.12 Å. Also, of all the configurations considered, only the attachment at site B exhibits symmetry. The results obtained here differ from those using the 13C isotope as the source of the nuclear spin, where the hyperfine interactions have a non-negligible anisotropic part. In our case, the isotropic hyperfine constants show clear differences between configurations A and C, so it is possible to detect the type of edge of a GNR by using muonium as a probe.
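For reference, and as standard hyperfine theory rather than anything specific to this paper, the isotropic coupling in Table 2 is the Fermi contact term, proportional to the electron spin density at the muon site, while Table 3 lists the principal values of the traceless dipolar tensor:

$$ a_{\mathrm{iso}} = \frac{2\mu_0}{3}\, g_e \mu_B\, g_\mu \mu_\mu\, \rho_s(\mathbf{r}_\mu) $$

where $\rho_s(\mathbf{r}_\mu)$ is the net electron spin density at the muon position; a vanishing spin density at the site therefore gives a negligible isotropic constant, as found here for site B.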
Table 1. Total energy (eV) for the optimized geometry of configurations A, B, and C.

Configuration    ZGNR             AGNR
A                -130488.97811    -136710.98677
B                -130486.43802    -136709.28182
C                -130487.01127    -136710.15087


Table 2. Isotropic hyperfine constants for configurations A, B, and C.

              Isotropic Fermi contact coupling (MHz)
        A                      B                     C
        ZGNR       AGNR        ZGNR       AGNR       ZGNR        AGNR
1H      108.987    95.198      -14.753    0.748      -352.384    671.755
(c)
D00008
ANSCSE14 Mae Fah Luang University, Chiang Rai, Thailand
March 23-26, 2010
267

Table 3. Anisotropic hyperfine constants for configurations A, B, and C.

              Anisotropic spin dipole couplings in principal axis (MHz)
        A                      B                     C
1H      ZGNR       AGNR        ZGNR       AGNR       ZGNR       AGNR
Baa     -0.954     -0.393      -0.219     0.284      -0.150     -0.025
Bbb      0.259     -0.274      -0.075     0.105      -0.119      0.004
Bcc      0.695      0.667       0.294     0.179       0.269      0.020




4. CONCLUSION
Among the three sites of muonium attachment, configuration A is the most probable, followed by C and then B, in line with the results of other first-principles calculations. At the DFT/B3LYP/3-21G level of theory, the dominant contributions to the hyperfine interactions for ZGNR and AGNR come from the Fermi contact terms. The calculated isotropic hyperfine constants show marked differences between configurations A and C, particularly for ZGNR, where the sign changes. Thus, in principle, it is possible to detect the type of GNR edge by using muonium as a probe. The geometry of the configuration as well as the choice of basis set are important in predicting the values of the hyperfine coupling constants. We are currently investigating the effects of basis sets on the strength of the hyperfine interactions for trapped muonium in GNRs.





REFERENCES
1. Geim, A.K. and Novoselov, K.S., Nature Materials, 2007. 6(3): p. 183-191.
2. Ando, T., NPG Asia Materials, 2009. 1(1): p. 17-21.
3. Geim, A.K., Science, 2009. 324(5934): p. 1530-1534.
4. Neto, A.H.C., Guinea, F., Peres, N.M.R., Novoselov, K.S., and Geim, A.K., Reviews
of Modern Physics, 2009. 81(1): p. 109.
5. Allen, M.J., Tung, V.C., and Kaner, R.B., Chemical Reviews, 2010. 110(1): p. 132-
145.
6. Kobayashi, Y., Fukui, K.-i., Enoki, T., and Kusakabe, K., Physical Review B, 2006.
73(12): p. 125415.
7. Kobayashi, Y., Kusakabe, K., Fukui, K.-i., and Enoki, T., Physica E: Low-
dimensional Systems and Nanostructures, 2006. 34(1-2): p. 678-681.
8. Brar, V.W., Zhang, Y., Yayon, Y., Ohta, T., McChesney, J.L., Bostwick, A.,
Rotenberg, E., Horn, K., and Crommie, M.F., Applied Physics Letters, 2007. 91(12):
p. 122102.
9. Enoki, T., Kobayashi, Y., Katsuyama, C., Osipov, V.Y., Baidakova, M.V., Takai, K.,
Fukui, K.-i., and Vul, A.Y., Diamond and Related Materials, 2007. 16(12): p. 2029-
2034.
10. Ishigami, M., Chen, J.H., Cullen, W.G., Fuhrer, M.S., and Williams, E.D., Nano
Letters, 2007. 7(6): p. 1643-1648.
11. Rutter, G.M., Crain, J.N., Guisinger, N.P., Li, T., First, P.N., and Stroscio, J.A.,
Science, 2007. 317(5835): p. 219-222.
12. Campos-Delgado, J., Romo-Herrera, J.M., Jia, X., Cullen, D.A., Muramatsu, H., Kim,
Y.A., Hayashi, T., Ren, Z., Smith, D.J., Okuno, Y., Ohba, T., Kanoh, H., Kaneko, K.,
Endo, M., Terrones, H., Dresselhaus, M.S., and Terrones, M., Nano Letters, 2008.
8(9): p. 2773-2778.
13. Tapaszto, L., Dobrik, G., Lambin, P., and Biro, L.P., Nature Nanotechnology, 2008.
3(7): p. 397-401.
14. Jia, X., Hofmann, M., Meunier, V., Sumpter, B.G., Campos-Delgado, J., Romo-
Herrera, J.M., Son, H., Hsieh, Y.-P., Reina, A., Kong, J., Terrones, M., and
Dresselhaus, M.S., Science, 2009. 323(5922): p. 1701-1705.
15. Girit, C.O., Meyer, J.C., Erni, R., Rossell, M.D., Kisielowski, C., Yang, L., Park, C.-
H., Crommie, M.F., Cohen, M.L., Louie, S.G., and Zettl, A., Science, 2009.
323(5922): p. 1705-1708.
16. Yang, L., Cohen, M.L., and Louie, S.G., Physical Review Letters, 2008. 101(18): p.
186401.
17. Kudin, K.N., ACS Nano, 2008. 2(3): p. 516-522.
18. Yazyev, O.V., Nano Letters, 2008. 8(4): p. 1011-1015.
19. Fischer, J., Trauzettel, B., and Loss, D., Physical Review B, 2009. 80(15): p. 155401.
20. Zhu, Z.H., Lu, G.Q., and Wang, F.Y., Journal of Physical Chemistry B, 2005.
109(16): p. 7923-7927.
21. Ferro, Y., Marinelli, F., and Allouche, A., Journal of Chemical Physics, 2002.
116(18): p. 8124-8131.


ACKNOWLEDGMENTS
The authors would like to thank Universiti Sains Malaysia for the financial support for this
research through the Research University grant: 1001/PJJAUH/811062



D00009
First Principle Investigations of Electronic Structures and
Hyperfine Interactions of Muonium in Tetraphenylmethane

Shukri Sulaiman(1,C), Mohamed Ismail Mohamed-Ibrahim(2), Pek-Lan Toh(1), Lee Sin Ang(1), Upali A. Jayasooriya(3)

1 Physical Sciences Programme, School of Distance Education, Universiti Sains Malaysia, 11800 Penang, Malaysia
2 Chemical Sciences Programme, School of Distance Education, Universiti Sains Malaysia, 11800 Penang, Malaysia
3 School of Chemistry, University of East Anglia, Norwich NR4 7TJ, United Kingdom
C E-mail: shukri@usm.my; Fax: 604-657600; Tel. 604-6533639



ABSTRACT
Tetraphenylmethane has recently been the subject of experimental and theoretical investigations because of its important role in areas such as optoelectronics [1]. An experimental study of positive muon implantation into tetraphenylmethane has been conducted using the muon spin rotation (μSR) technique. It was observed that the trapping of muonium near one of the phenyl rings causes the ring to rotate within the lifetime of the muonium. To understand the mechanism of this dynamic behaviour of Mu, the trapping site must be known, but it could not be determined through the μSR experiments, which indicated only that there are three possible sites for Mu trapping, namely the ortho, meta and para positions on the phenyl rings. We have performed first principles investigations employing the density functional theory technique to examine these three possible Mu trapping sites from the energetics aspect, as well as the associated Mu hyperfine interactions. The results show that local minima exist in the energy profile at all three positions, and that the three local minima all lie within 0.05 eV of each other. For the three positions, the major contributor to the hyperfine interaction is the isotropic component, which is 130.06, 140.44, and 132.69 MHz respectively; the corresponding anisotropic contributions are 3.80, 3.72 and 3.59 MHz.

Keywords: First Principles Investigations, Tetraphenylmethane, Muonium, Hyperfine Interaction Constants.



1. INTRODUCTION
The studies of muonium (Mu) in organometallic compounds, semiconductors and other materials have long been carried out using the muon spin rotation (μSR) spectroscopic technique [1-7]. Recently, a μSR experiment was conducted to probe the properties of the Group 14 tetraphenyl derivatives XPh4, where X = C, Si, Ge and Sn [8]. XPh4 has been the subject of both experimental and theoretical investigations because of its potential in optoelectronics and applications in nonlinear optics materials [9-11]. The crystal structures of XPh4 have also been determined with X-ray crystallography techniques [11-17].
The implantation of muons into XPh4 results in the formation of muonium, which is trapped in the vicinity of a phenyl ring. The exact trapping site, however, could not be ascertained through the μSR experiment. Further, it was observed that the attachment of Mu to the phenyl ring causes the Mu-attached phenyl ring to rotate about the C-X bond within the short lifetime of Mu. To determine the Mu trapping site and to study its associated hyperfine interactions, we have carried out first principles Density Functional Theory (DFT) investigations for the case of Mu in tetraphenylmethane (CPh4). In our investigations, we have examined three possible trapping sites for Mu from the energetics aspect, as well as the associated Mu hyperfine interactions.



2. COMPUTATIONAL METHOD
We have employed the Density Functional Theory molecular cluster method at the B3LYP/6-311G level to investigate the trapping of Mu in tetraphenylmethane. As the cluster in our calculations, we used a single CPh4 molecule (C25H20) to simulate the tetraphenylmethane host environment. A hydrogen atom was used to represent the Mu, and three Mu trapping sites were considered, namely the ortho, meta and para sites on a phenyl ring, as shown in Figure 1. To simulate the crystal environment effect, restricted geometry optimisation calculations were performed to find the stable sites for Mu. The optimised geometries were then used to calculate the wavefunctions of the C25H20-Mu cluster. The isotropic and anisotropic components of the Mu hyperfine coupling constants were evaluated using the converged wavefunctions. All calculations were performed using the Gaussian 03 package [18].




Figure 1: The numbering system used and the three possible Mu trapping sites near a phenyl ring of tetraphenylmethane: (a) ortho, (b) meta, and (c) para.



3. RESULTS AND DISCUSSION
The results for the total energies and bond lengths corresponding to the three possible Mu positions are summarised in Table 1. The total energies of the clusters are normalised to the one corresponding to the ortho site; hence the energy for that site is taken to be zero. As can be seen from Table 1, the energies for Mu at the meta and para sites are lower than that of the ortho site. Based on the total energy consideration, the para site has the lowest energy and might be considered the most probable site. However, it should be noted that the energy differences between the three sites are quite small.
The Mu-C bond length at the ortho, meta and para sites is 1.09942 Å, 1.1040 Å, and 1.10315 Å respectively. These values are all very similar and close to the typical C-H (sp3 carbon) bond length of 1.1 Å in organic compounds. The C1-C2 bond length before geometry optimisation and without Mu is 1.54912 Å. The trapping of Mu in the vicinity of the phenyl ring seems to cause this bond to be slightly elongated, as evidenced by the results tabulated in Table 1. Even though the elongation of the C1-C2 bond length is quite small (1.1%, 1.0% and 0.4% respectively for the ortho, meta and para Mu sites), it may contribute significantly to the factors that cause the rotation and dynamics of the Mu-attached phenyl ring.
The hyperfine interactions and the calculated coupling constants are tabulated in Table 2. The isotropic hyperfine interaction constant A_iso is 130.06, 140.44, and 132.69 MHz respectively for the ortho, meta and para sites. The anisotropic hyperfine interaction constant A_aniso is much smaller than the isotropic contribution and is opposite in sign (-3.80, -3.72, and -3.59 MHz respectively), as shown in Table 2.

Table 1: Total energies and bond lengths in the optimised clusters corresponding to the three Mu trapping sites on the phenyl rings of CPh4

Site    Total Energy (eV)   C1-C2 (Å)   C3-Mu (Å)   C4-Mu (Å)   C5-Mu (Å)
ortho   -26253.54           1.56558     1.09942     -           -
meta    -26253.58           1.56507     -           1.1040      -
para    -26253.59           1.55576     -           -           1.10315

Table 2: The calculated isotropic and anisotropic hyperfine interaction coupling constants for Mu

        Isotropic A_iso (MHz)   Anisotropic A_aniso (MHz)
                                Baa      Bbb     Bcc
ortho   130.06                  -3.80    0.40    3.40
meta    140.44                  -3.72    0.21    3.51
para    132.69                  -3.59    0.11    3.48
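
The relation between these tabulated quantities can be made concrete. The following minimal Python sketch (an editorial illustration, not part of the original work) assembles the full Mu hyperfine tensor for the ortho site from Table 2 as the isotropic constant times the identity plus the traceless dipolar part, and verifies that the dipolar part indeed carries no trace.

    import numpy as np

    # Mu hyperfine tensor (MHz) for the ortho site, from Table 2:
    # isotropic part A_iso plus the traceless dipolar principal values.
    A_iso = 130.06                       # MHz, isotropic (Fermi-contact) term
    B = np.diag([-3.80, 0.40, 3.40])     # MHz, anisotropic (dipolar) part

    A = A_iso * np.eye(3) + B            # total tensor in its principal axes

    assert abs(np.trace(B)) < 1e-9       # the dipolar part carries no trace
    print(np.trace(A) / 3.0)             # recovers A_iso -> 130.06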



4. CONCLUSION
In this work, we have investigated the three possible trapping sites for Mu in tetraphenylmethane using the DFT/B3LYP/6-311G method. The para site has the lowest energy, followed by the meta and ortho sites. The Mu-C bond length is close to 1.1 Å, and the C1-C2 bond length appears to be slightly elongated in all three cases. The major contribution to the Mu hyperfine interaction is from the isotropic component, and the anisotropic component has the opposite sign. Further investigations are currently being carried out to study the effects of cluster size and to determine the rotational barrier.


REFERENCES
1. J. S. Lord, R. Scheuermann, S. F. J. Cox, and A. Stoykov, Physica B, 2006, 374-5, 395-7.
2. U. A. Jayasooriya, J. A. Stride, G.M. Aston, G. A. Hopkins, S. F. J. Cox, S. P. Cottrell,
and C. A. Scott, Hyperfine Interactions, 1997, 106, 27-32.
3. H. Li, T. M. Briere, K. Shimomura, R. Kadono, K. Nishiyama, K. Nagamine, and T. P.
Das, Physica B, 2003, 326, 133-8.
4. I. McKenzie, J. C. Brodovitch, K. Ghandi, B. M. McCollum, and P. W. Percival, Journal
of Physical Chemistry A, 2007, 111, 10625-34.
5. R. H. Scheicher, T. P. Das, E. Torikai, F. L. Pratt, and K. Nagamine, Physica B, 2006,
374-5, 448-50.
6. S. L. Thomas, and I. Carmichael, Physica B, 2006, 374-5, 290-4.
7. Vasily S. Oganesyan, Andrew N. Cammidge, Gareth A. Hopkins, Fiona M. Cotterill, Ivan
D. Reid, and Upali A. Jayasooriya, Journal of Physical Chemistry A, 2004, 108, 1860-6.
8. Upali A Jayasooriya, private communication.
9. K. Claborn, B. Kahr, and W. Kaminsky, CrystEngComm, 2002, 4(46), 252-6.
10. S. Sengupta, and S. K. Sadhukhan, Tetrahedron Letters, 1999, 40, 9157-61.
11. T. T. Lin, X. M. Liu, and C. B. He, Journal of Physical Chemistry B, 2004, 108 (45),
17361-8.
12. A. Robbins, G. A. Jeffrey, J. P. Chesick, J. Donohue, F. A. Cotton, B. A. Frenz, and C. A.
Murillo, Acta Crystallography, 1975, B31, 2395-9.
13. G. Filippini, and C. M. Gramaccioli, Acta Crystallography, 1986, B42, 605-9.
14. H. T. Sumsion, and J. D. McLachlan, Acta Crystallography, 1950, 3, 217-9.
15. M. Gomberg, Journal of American Chemical Society, 1898, 20(10), 773-80.
16. M. Gomberg, and O. Kamm, Journal of American Chemical Society, 1917, 39(9), 2009-
15.
17. N. A. Ahmed, A. I. Kitaigorodsky, and K. V. Mirskaya, Acta Crystallography, 1971, B27,
867-70.
18. M. J. Frisch, G.W. Trucks, H.B. Schlegel, G.E. Scuseria, M.A. Robb, J.R. Cheeseman,
J.A. Montgomery, T. Vreven, K.N. Kudin, J.C. Burant, J.M. Millam, S.S. Iyengar, J.
Tomasi, V. Barone, B. Mennucci, M. Cossi, G. Scalmani, N. Rega, G.A. Petersson, H.
Nakatsuji, M. Hada, M. Ehara, K. Toyota, R. Fukuda, J. Hasegawa, M. Ishida, T.
Nakajima, Y. Honda, O. Kitao, H. Nakai, M.L. Klene, X. Li, J.E. Knox, H.P. Hratchian,
J.B. Cross, V. Bakken, C. Adamo, J. Jaramillo, R. Gomperts, R.E. Stratmann, O. Yazyev,
A.J. Austin, R. Cammi, C. Pomelli, J.W. Ochterski, P.Y. Ayala, K. Morokuma, G.A.
Voth, P. Salvador, J.J. Dannenberg, V.G. Zakrzewski, S. Dapprich, A.D. Daniels, M.C.
Strain, O. Farkas, D.K. Malick, A.D. Rabuck, K. Raghavachari, J.B. Foresman, J.V. Ortiz,
Q. Cui, A.G. Baboul, S. Clifford, J. Cioslowski, B.B. Stefanov, G. Liu, A. Liashenko, P.
Piskorz, I. Komaromi, R.L. Martin, D.J. Fox, T. Keith, M.A. Al-Laham, C.Y. Peng, A.
Nanayakkara, M. Challacombe, P.M.W. Gill, B. Johnson, W. Chen, M.W. Wong, C.
Gonzalez, and J.A. Pople, Gaussian 03, Revision C.02, Gaussian, Inc., Wallingford CT, 2004.

D00010
Diffusion of Galactic Cosmic Rays in an Interplanetary
Magnetic Flux Rope

W. Krittinatham1,2,C, D. Ruffolo1, and J. W. Bieber4

1 Department of Physics, Faculty of Science, Mahidol University, Bangkok, 10400, Thailand
2 National Astronomical Research Institute of Thailand, Chiang Mai, 50300, Thailand
3 ThEP Center, CHE, 328 Si Ayutthaya Road, Bangkok, 10400, Thailand
4 Bartol Research Institute, Department of Physics and Astronomy, University of Delaware, 19716, USA

C E-mail: watcharawuth.krittinatham@gmail.com; Fax: 02-2015762; Tel. 02-2015762



ABSTRACT
Interplanetary magnetic flux ropes (IMFRs) released by solar storms can strongly affect the transport of energetic charged particles, as evidenced by second-stage Forbush decreases in Galactic cosmic rays (GCRs). Considering the particles' drift orbits, with streaming along the magnetic field and guiding center drifts (curvature and gradient drifts) perpendicular to the magnetic field, particles are expected to preferentially enter along one leg of the loop and exit along the other, with a unidirectional anisotropy [1]. In the present work, we also consider guiding center diffusion and pitch angle scattering due to magnetic fluctuations in the IMFR. We use a numerical treatment to calculate the drift orbits of GCR test particles with an initial position at the IMFR surface, and then trace their drift motions by a fourth-order Runge-Kutta numerical method. In addition, we include random variables to trace the particle position subject to perpendicular diffusion and the parallel velocity subject to pitch-angle scattering. This allows us to model how particles penetrate the interior of the IMFR, for better comparison with observations.

Keywords: Galactic cosmic rays, magnetic fluctuations, magnetic flux rope, diffusion,
pitch-angle scattering
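
To make the numerical treatment concrete, the following minimal Python sketch (an editorial illustration, not the authors' code) advances a test particle through fourth-order Runge-Kutta steps of the Newton-Lorentz equation in a prescribed magnetic field. The uniform 5 nT field, initial conditions and fixed step size are illustrative assumptions; the flux-rope field model, relativistic factors and the stochastic diffusion and pitch-angle scattering terms of the actual study are omitted.

    import numpy as np

    # dv/dt = (q/m) v x B(r), dr/dt = v, for a test particle; y = (r, v).
    def lorentz_rhs(y, q_over_m, B_field):
        r, v = y[:3], y[3:]
        a = q_over_m * np.cross(v, B_field(r))
        return np.concatenate([v, a])

    def rk4_step(y, dt, q_over_m, B_field):
        k1 = lorentz_rhs(y, q_over_m, B_field)
        k2 = lorentz_rhs(y + 0.5 * dt * k1, q_over_m, B_field)
        k3 = lorentz_rhs(y + 0.5 * dt * k2, q_over_m, B_field)
        k4 = lorentz_rhs(y + dt * k3, q_over_m, B_field)
        return y + (dt / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)

    # Example: proton gyration in a uniform 5 nT field along z.
    B0 = lambda r: np.array([0.0, 0.0, 5e-9])        # T
    y = np.array([0.0, 0.0, 0.0, 1e5, 0.0, 1e5])     # r (m) and v (m/s)
    for _ in range(1000):
        y = rk4_step(y, dt=0.05, q_over_m=9.58e7, B_field=B0)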



REFERENCES
1. Krittinatham, W. and Ruffolo, D., Astrophysical Journal, 2009, 704, 831-841

D00011
Effect of Magnetic Turbulence Structure on the Parallel
Transport of High Energy Particles

C. Buachan1,2,C, D. Ruffolo1,2, A. Sáiz1,2, A. Seripienlert1,2, and W. Matthaeus3

1 Department of Physics, Faculty of Science, Mahidol University, Bangkok, Thailand 10400
2 ThEP Center, CHE, 328 Si Ayuthaya Road, Bangkok, Thailand 10400
3 Bartol Research Institute and Department of Physics and Astronomy, University of Delaware, Newark, DE 19716, USA
C E-mail: b.charong@gmail.com; Tel. 02-201-5762



ABSTRACT
Solar energetic particles (SEPs) are accelerated by solar flares and coronal mass ejections (CMEs). They are transported along interplanetary magnetic field (IMF) lines, resulting in intermittent dropout features [1, 2]. We use the 2D + slab model of magnetic turbulence in this study. We perform particle trajectory simulations by solving the Newton-Lorentz equation in a spherical geometry, using a fourth-order Runge-Kutta method with adaptive step size control. We fit the simulation results to solutions of a transport equation [3] by least squares to determine the best-fit mean free path of interplanetary scattering due to magnetic fluctuations. We expect that the magnetic turbulence structure may lead to a mean free path that depends on position. If so, this work may have implications for high energy particle transport in space physics and astrophysics. Partially supported by TRF, NASA and NSF.

Keywords: Turbulence, Interplanetary transport, SEPs, High energy particles



REFERENCES
1. J. E. Mazur, G. M. Mason, J. R. Dwyer, J. Giacalone, J. R. Jokipii, & E. C. Stone,
Astrophysical Journal, 2000, 532, L79
2. D. Ruffolo, W. H. Matthaeus, and P. Chuychai, Astrophys. J. Lett., 2003, 597, L169.
3. D. Ruffolo, Astrophysical Journal, 1995, 442, 861.

D00012
Secondary Neutrons from Cosmic Rays in Earth's Atmosphere above the Princess Sirindhorn Neutron Monitor

N. Kamyan1,2,C, D. Ruffolo1,2, A. Sáiz1,2, and P. Tooprakai2,3

1 Department of Physics, Faculty of Science, Mahidol University, Bangkok, 10400, Thailand
2 ThEP Center, CHE, 328 Si Ayuthaya Road, Bangkok, 10400, Thailand
3 Department of Physics, Faculty of Science, Chulalongkorn University, Bangkok, 10330, Thailand
C E-mail: pchang24@gmail.com; Fax: 02-2015762; Tel. 087-1774647



ABSTRACT
Neutron monitors are recognized as a key tool for studying the time variations of
galactic cosmic rays, especially with regard to solar effects, including advance warning
of space weather effects on Earth [1]. We have installed the Princess Sirindhorn
Neutron Monitor (PSNM) at a high vertical cutoff rigidity of 16.8 GV, which provides
unique data on the energy dependence of solar synodic variations, cosmic ray
anisotropy, Forbush decreases, and solar modulation. We show results from a
simulation of the atmospheric structure effects [2] on secondary neutron counts by
using the FLUKA program [3]. This work will give us data for further analysis and
better understanding of those effects. We will also obtain information that can be
applied to improve space physics and astrophysics knowledge. The results from the full
simulation with FLUKA will be presented at the meeting.

Keywords: FLUKA, Neutron Monitors, Space Physics.



1. INTRODUCTION
Cosmic rays are energetic particles or gamma rays from outer space. The most commonly detected primary cosmic rays are protons (90%), alpha particles (9%), and ions of other elements. They move along the interplanetary magnetic field to Earth. When a cosmic ray strikes an air molecule in Earth's atmosphere, it can produce secondary neutrons, which we can detect using a neutron monitor. Our group recently installed the first neutron monitor station in Thailand, the Princess Sirindhorn Neutron Monitor (PSNM). It was set up at Doi Inthanon in Chiang Mai province in order to measure the count rate of secondary neutrons, which is related to the flux of cosmic rays in space.



2. THEORY AND RELATED WORKS
2.1 Earth's Atmosphere
Earth's atmosphere contains many types of molecules. Their mass density decreases with altitude, while their temperature is typically modeled as a piecewise linear function of altitude with a temperature lapse rate (the slope of temperature vs. height), as shown in Table 1. When a primary cosmic ray strikes an air molecule, it can produce many more particles, called secondary cosmic rays. Some particles are absorbed by the atmosphere, but those with sufficient energy can reach the ground. We can study these particles by using neutron monitors. Typically a cosmic ray must have a rigidity of at least 1 GV (the atmospheric cutoff) to yield detectable neutrons at sea level (Figure 1).


Table 1. Standard model of Earth's atmosphere vs. altitude [2].

Subscript b   Height above sea level h (m)   Mass density ρ (kg/m³)   Static pressure P (Pa)   Standard temperature T (K)   Temperature lapse rate L (K/m)
0             0                              1.225                    101325                   288.15                       -0.0065
1             11000                          0.36391                  22632.1                  216.65                       0
2             20000                          0.08803                  5474.89                  216.65                       0.001
3             32000                          0.01322                  868.019                  228.65                       0.0028
4             47000                          0.00143                  110.906                  270.65                       0
5             51000                          0.00086                  66.9389                  270.65                       -0.0028
6             71000                          0.000064                 3.95642                  214.65                       -0.002
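
To make the piecewise-linear model concrete, the following Python sketch (an editorial illustration) evaluates the temperature and pressure at a given altitude from the layer parameters of Table 1, using the standard barometric formulas for zero and non-zero lapse rates; the example altitude of 2565 m is roughly that of the Doi Inthanon summit.

    import math

    G, M, R = 9.80665, 0.0289644, 8.3144598     # gravity, molar mass of air, gas constant
    # (h_b [m], P_b [Pa], T_b [K], L_b [K/m]) per layer, from Table 1
    LAYERS = [(0, 101325.0, 288.15, -0.0065), (11000, 22632.1, 216.65, 0.0),
              (20000, 5474.89, 216.65, 0.001), (32000, 868.019, 228.65, 0.0028),
              (47000, 110.906, 270.65, 0.0), (51000, 66.9389, 270.65, -0.0028),
              (71000, 3.95642, 214.65, -0.002)]

    def atmosphere(h):
        """Return (T [K], P [Pa]) at altitude h [m]."""
        hb, Pb, Tb, L = max(l for l in LAYERS if l[0] <= h)   # layer containing h
        T = Tb + L * (h - hb)
        if L == 0.0:
            P = Pb * math.exp(-G * M * (h - hb) / (R * Tb))   # isothermal layer
        else:
            P = Pb * (Tb / T) ** (G * M / (R * L))            # linear lapse rate
        return T, P

    print(atmosphere(2565.0))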


Figure 1. Cosmic ray showers. A primary cosmic proton strikes an air molecule and
produces many secondary cosmic rays at different energy levels (Image source: John Clem,
2004 Annual CRONUS Collaboration Meeting)

2.2 Neutron Monitors

The first worldwide network of neutron monitors used the IGY (International Geophysical Year) design, similar to the design by John Simpson in 1948. The first neutron monitors were used in the US and Peru to study geomagnetic effects. In 1957-1958, 50 IGY monitor stations were installed at different locations around the world. After that, in 1964, a new generation of neutron monitors was developed, with the NM64 design, which gives a higher counting rate than the IGY design. An important component of the NM64 is a 10BF3 (boron trifluoride) counter, in which nuclear reactions between 10B and neutrons produce Li and He nuclei: n + 10B → 7Li + 4He. (Nowadays 10BF3 is sometimes replaced with 3He.) The product nuclei can be detected in the proportional counter. The lead producer can produce more neutrons when atmospheric neutrons impact Pb nuclei. Polyethylene is used to trap neutrons inside and to protect against disturbance by other particles. A bare counter is like the NM64 but has no reflector or lead producer.
Figure 2. Components of an NM64 neutron monitor: reflector, producer, moderator, tube alignment piece and neutron detector, with incoming secondary neutrons.


2.3 PSNM: Princess Sirindhorn Neutron Monitor station

The station is set up at the summit of Doi Inthanon in Chiang Mai province, within a
Royal Thai Air Force base. This station monitors the secondary neutron count rate, which is
proportional to the cosmic ray intensity from space. The summit of Doi Inthanon is the
highest point of Thailand. This is important because the secondary neutrons are absorbed by
the atmosphere. At Doi Inthanon, the count rate is about 5 times higher than that at sea level.





Figure 3. The Princess Sirindhorn Neutron Monitor building at Doi Inthanon.




D00012
ANSCSE14 Mae Fah Luang University, Chiang Rai, Thailand
March 23-26, 2010
279
3. COMPUTATIONAL DETAILS
FLUKA is a program used to calculate particle transport and interactions with matter. We use it by creating an input file that divides the atmosphere into 100 layers. Each layer has the same material composition but a different pressure and temperature.


Table 2. Altitude range for the layers that we defined.

Number of layers   Layer indices i   Subscript b   h (m)           L_b (K/m)
77                 1 - 77            0             0 - 11000       -0.0065
17                 78 - 94           1             11000 - 22000   0
4                  95 - 98           2             20000 - 32000   0.001
1                  99                3             32000 - 47000   0.0028
1                  100               4             47000 - 51000   0
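
For illustration, the 100-layer grid of Table 2 can be generated and tabulated with the atmosphere() helper sketched after Table 1 (an editorial sketch; the band edges are taken as contiguous, smoothing over the overlapping ranges printed in the table).

    import numpy as np

    # (h_lo, h_hi, n_layers) per band, following Table 2
    bands = [(0, 11000, 77), (11000, 22000, 17), (22000, 32000, 4),
             (32000, 47000, 1), (47000, 51000, 1)]
    edges = np.concatenate([np.linspace(lo, hi, n, endpoint=False)
                            for lo, hi, n in bands] + [[51000.0]])
    layers = [(h, *atmosphere(h)) for h in edges]   # (altitude, T, P) per boundary
    assert len(edges) == 101                        # 100 layers -> 101 boundaries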






Table 3. Materials and compounds for each layer [2].

Gas              Form   % by volume   % by weight   Molecular weight
Nitrogen         N2     78.08         75.47         28.01
Oxygen           O2     20.95         23.2          32
Argon            Ar     0.93          1.28          39.95
Carbon dioxide   CO2    0.038         0.059         44.01
Neon             Ne     0.0018        0.0012        20.18
Helium           He     0.0005        0.00007       4
Krypton          Kr     0.0001        0.0003        83.8
Hydrogen         H2     0.00005       Negligible    2.02
Xenon            Xe     8.70E-06      0.00004       131.3


4. RESULTS AND CONCLUSION
As preliminary results, we obtain information about the particles produced in the interactions, including the velocity, kinetic energy, direction and position of each particle in each layer. An analysis of the results regarding effects of the atmosphere on the neutron monitor count rate will be presented at the meeting.



REFERENCES
1. K. Leerungnavarat, D. Ruffolo, and J. W. Bieber, Astrophys. J. 593, 587 (2003)
2. U.S. Standard Atmosphere, 1976, NOAA, NASA and USAF, U.S. Government Printing
Office, Washington, D.C., 1976.
3. Alfredo Ferrari, Paola R. Sala, Alberto Fassò, and Johannes Ranft, FLUKA: a multi-particle transport code (Program version 2008), CERN-2005-10, INFN/TC_05/11, SLAC-R-773, 12 October 2005.
D00015
Collimation of Particle Beams by Two-Dimensional
Turbulent Structure

A. Seripienlert1,2,a, P. Tooprakai1,2,3, D. Ruffolo1,2, P. Chuychai2,4 and W. H. Matthaeus5

1 Department of Physics, Faculty of Science, Mahidol University, Bangkok, 10400, Thailand
2 ThEP Center, CHE, 328 Si Ayutthaya Road, Bangkok, 10400, Thailand
3 Department of Physics, Faculty of Science, Chulalongkorn University, Bangkok, 10330, Thailand
4 School of Science, Mae Fah Luang University, 333, Moo 1, Ta-Sud, Muang, Chiang Rai, 57100, Thailand
5 Bartol Research Institute and Department of Physics and Astronomy, University of Delaware, Newark, DE 19716, USA
a E-mail: achara.seri@gmail.com; Fax: 02-2015762; Tel. 02-2015762



ABSTRACT
We study charged particle motion in the interplanetary medium using a turbulent magnetic field model composed of a mean field and a turbulent fluctuating field. We use a two-component (2D+slab) model of the turbulent magnetic field. We find that charged particles of relativistic energies are systematically drawn toward potential maxima of the 2D turbulence structure. We perform simulations of particle trajectories at various energies in 2D MHD + slab magnetic turbulence by solving the Newton-Lorentz equations in spherical geometry, using a fourth-order Runge-Kutta method with adaptive step size control. We use the 2D MHD procedure because we believe it provides a more accurate description of the 2D fluctuations in an astrophysical plasma. We find that particle trajectories from a localized source describe interesting structures related to the 2D turbulence. This work may have implications for high energy particles in space physics and astrophysics. Partially supported by TRF, NASA and NSF.

Keywords: beam collimation, magnetic turbulence, particle simulations



REFERENCES
1. Ruffolo, D., Matthaeus, W. H., and Chuychai, P., Astrophys. J., 2003, 597, L169-172.
2. Chuychai, P., Ruffolo, D., Matthaeus, W.H., and Meechai, J., Astrophys. J, 2007, 659,
1761-1776.
3. Kittinaradorn, R., Ruffolo, D., and Matthaeus, W. H., Astrophys. J., 2009, 702, L138-141.


D00016
Computational Classification of Cloud Forest Using
Atmospheric Data from Field Sensors

P. Sangarun, W. Pheera, K. Jaroensutasinee, and M. Jaroensutasinee
Center of Excellence for Ecoinformatics and Computational Science Graduate Program, School of
Science, Walailak University 222, Thaiburi, Thasala, Nakhon Si Thammarat, 80161, Thailand
E-mail: ppsangarun@gmail.com, wittayapheera@gmail.com, krisanadej@gmail.com,
jmullica@gmail.com; Fax: 086-4795011; Tel. 075673993



ABSTRACT
In theory, cloud forests exhibit distinct atmospheric characteristics. This study attempted to quantify such distinctive features so that we can computationally classify them. In this work, we collected atmospheric data from weather stations at eight study sites, which can be grouped into three classes: (1) four cloud forest sites (Duan Hok, Dadfa, Mt. Nom, and Doi Intanon stations), (2) two tropical forest sites (Huilek and Khao Nan headquarters stations), and (3) two coastal sites (Khanom and Muang Nakhonsithammarat stations). The atmospheric data comprised temperature, rainfall, humidity, solar radiation, solar energy, UV index, and heat index from January to October 2009. Our results indicate that such computational classification can be achieved, and hence indices can be constructed that allow us to monitor the effects of climate change on these cloud forests.

Keywords: Pattern Detection, Atmospheric data, Cloud Forest, Automatic Weather
Station, Field Sensor.



1. INTRODUCTION
Tropical montane cloud forests (TMCFs) occur in a mountainous altitudinal band frequently enveloped by orographic clouds [1-2]. These forests obtain more moisture from deposited fog water in addition to bulk precipitation [3-5]. The main climatic characteristics of cloud forests include frequent cloud presence, usually high relative humidity (RH) and low irradiance [4-5]. TMCFs are normally found at altitudes between 1,500 and 3,300 m a.s.l., occupying an altitudinal belt of approximately 800 to 1,000 m at each site. The lowermost occurrence of low-statured cloud forest (300-600 m a.s.l.) is reported from specific locations such as small islands, where the cloud base may be very low and the coastal slopes are exposed to both high rainfall and persistent wind-driven clouds [1]. On small tropical islands, TMCFs can thus be found at lower altitudes, obtaining more moisture from deposited fog water in addition to precipitation [1-3]. All tropical forests are under threat, but cloud forests are uniquely threatened both by human pressures and by climate change impacting temperature, rainfall and the formation of clouds in mountain areas [6].

Many studies have been done on cloud forest climatic characteristics throughout the world. Most of them focus on only two main climatic factors: temperature and relative humidity. Unfortunately, these parameters are not always available, particularly in remote areas where no meteorological stations are installed. For this reason, several researchers have been interested in developing approaches for generating such parameters, and some recent works predict them using artificial intelligence techniques. This study attempted to quantify temperature so that we can computationally classify habitat types.



2. THEORY AND RELATED WORKS
With the systematic accumulation of various climatic data and weather records over long periods, analytical distribution models which fit the observed distributions well have been proposed by many climatologists and statisticians [7]. The following theoretical distribution models have been proposed:
Temperature: Normal distribution, Pearson I types.
Precipitation: Gamma, Log-Normal, Kappa distribution.
Relative Humidity: Beta distribution.
Wind speed: Gamma, Weibull, Log-Normal distribution.
Wind rose: Circular distribution model and an empirical non-negative PDF.
Some other climatic elements: Poisson and binomial distribution.

Normal Distribution
The distribution most frequently encountered in meteorology and climatology is the normal distribution. Many variables studied in climatology are averaged or integrated quantities of some type. The law of large numbers implies that random variables of this type are nearly normally distributed regardless of the distribution of the variables that are averaged or integrated [8].
The form of the normal distribution is entirely determined by the mean and the variance. Thus we write X ~ N(\mu, \sigma^2) to indicate that X has a normal distribution with parameters \mu and \sigma^2.
The normal density function is given by

    f_N(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{(x-\mu)^2}{2\sigma^2}\right]

The bimodal density is given by

    f_N(x) = \frac{c_1}{\sigma_1\sqrt{2\pi}} \exp\left[-\frac{(x-\mu_1)^2}{2\sigma_1^2}\right] + \frac{c_2}{\sigma_2\sqrt{2\pi}} \exp\left[-\frac{(x-\mu_2)^2}{2\sigma_2^2}\right]

where \mu_1, \mu_2 and \sigma_1^2, \sigma_2^2 are the means and variances of the two component distributions, and c_1, c_2 are their weights.



3. COMPUTATIONAL DETAILS
Study Site
We installed automatic weather stations (Davis Vantage Pro II Plus) at eight study sites grouped into three classes: (1) four cloud forest sites (Duan Hok, Dadfa, Mt. Nom, and Doi Intanon stations), (2) two tropical forest sites (Huilek and Khao Nan headquarters stations), and (3) two coastal sites (Khanom and Muang Nakhonsithammarat stations). We used the data logger with a 30-min storage interval for temperature, humidity, wind speed, wind direction, solar radiation, rain and atmospheric pressure.

Bimodal Distribution Fit
We used a bimodal (two-component normal) distribution to fit the temperature distributions.
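
The fit can be sketched in a few lines of Python (an editorial illustration; the data file name and initial guesses are placeholders): histogram the 30-min temperature records and fit the two-component density with nonlinear least squares to recover the parameters reported in Table 1.

    import numpy as np
    from scipy.optimize import curve_fit

    # Two-component normal density f_N(x) with weights c1, c2.
    def bimodal(x, mu1, s1, c1, mu2, s2, c2):
        g = lambda x, mu, s: np.exp(-(x - mu)**2 / (2 * s**2)) / (s * np.sqrt(2 * np.pi))
        return c1 * g(x, mu1, s1) + c2 * g(x, mu2, s2)

    temps = np.loadtxt("mt_nom_2009.txt")            # hypothetical data file
    freq, edges = np.histogram(temps, bins=40, density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])
    p0 = [18.0, 1.0, 0.5, 21.0, 2.0, 0.5]            # initial guesses near Table 1
    params, _ = curve_fit(bimodal, centers, freq, p0=p0)
    print(dict(zip(["mu1", "s1", "c1", "mu2", "s2", "c2"], params)))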



4. RESULTS AND DISCUSSION

Table 1. Parameter values from fitting the bimodal distribution. μ1, μ2, σ1, σ2, c1, c2 represent the means, standard deviations, and constant weights of modal distributions 1 and 2.

Site           μ1      μ2      σ1     σ2     c1     c2
Kanom          25.90   30.00   1.50   2.10   0.56   0.44
Nakhon Si      25.60   29.60   1.15   2.92   0.42   0.55
Mt. Nan        24.30   28.60   1.19   2.62   0.54   0.51
Hui Lek        21.80   26.00   1.25   3.02   0.59   0.40
Mt. Nom        18.30   20.30   0.72   3.06   0.42   0.52
Duan Hok       19.80   21.70   0.90   1.51   0.60   0.42
Dadfa          21.80   24.90   0.91   1.94   0.62   0.38
Doi Intanon    7.10    13.10   1.06   1.89   0.10   1.10





Figure 1. Temperature distributions with bimodal curves at the eight study sites: (a-b) coastal sites, (c-d) tropical forests, and (e-h) cloud forests.

A bimodal distribution is composed of two subpopulations, each of which is normal. The tails of the bimodal distribution fitted the observed distributions well (Table 1, Figure 1a-h). The Kanom and Muang Nakhonsithammarat sites had higher μ1 and μ2 than the other sites (Table 1). The Mt. Nom and Duan Hok cloud forests had lower mean temperatures than the tropical forests (Mt. Nan and Hui Lek) (Table 1). On the other hand, the Dadfa cloud forest had a slightly higher temperature than the Mt. Nom and Duan Hok cloud forests. This could be due to the fact that the Dadfa cloud forest is located near the coast at low elevation (i.e. 700 m a.s.l.); on small tropical islands, TMCFs can be found at lower altitudes because they obtain more moisture from deposited fog water in addition to precipitation [1-3]. The Mt. Nom, Duan Hok and Dadfa cloud forests had σ1 less than 1 (Table 1). This indicates a spiked curve for mode 1 and can be used as an indicator of cloud forests located near the equator. On the other hand, this could not be applied to a high latitude cloud forest like Doi Intanon (Table 1).
The temperature distributions of the Kanom and Muang Nakhonsithammarat weather stations were similar because both are located near coastal areas at an elevation of 8 m a.s.l. (Figure 1a,b). The relative frequency of the first peak was approximately 0.16, lower than at the tropical forest and cloud forest sites (Figure 1a,b).
The temperature distributions of the Mt. Nan headquarters and Hui Lek weather stations were similar because both are located in tropical rain forest at an elevation of 200 m a.s.l. (Figure 1c,d). The relative frequency of the first peak was approximately 0.20, an intermediate value (Figure 1c,d).
The temperature distributions of the Mt. Nom, Duan Hok, and Dadfa weather stations were similar because they are located in cloud forest at elevations above 700 m a.s.l. (Figure 1e-g). The relative frequency of the first peak was approximately 0.28, the highest among the three habitat types (Figure 1e-h).
The temperature distribution of the Doi Intanon weather station differed from the rest because it is located in high latitude cloud forest at an elevation above 1300 m a.s.l. (Figure 1h). The relative frequency of the first peak was approximately 0.40 and that of the second peak 0.23 (Figure 1h). This could be due to the fact that Doi Intanon has a winter period with lower temperatures than the other sites. Nevertheless, the relative frequency at the second peak was higher than 0.20. This indicates that a cloud forest temperature distribution should exhibit a sharp peak with relative frequency of more than 0.20.
The distributions of the cloud forest sites had two marked maxima (Figure 1e-h). This bimodality might be due to the annual cycle of summer and rainy seasons. The summer peak is shorter than the rainy season peak because the rainy season is less variable than summer weather. Cloud forest with saturated fog and lower temperature occurs during the rainy season peak.
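
The two indicators suggested above can be combined into a simple decision rule; the following Python sketch is an editorial illustration using the paper's thresholds, not a procedure from the original work.

    # Rule of thumb from the discussion: sigma1 < 1 (a spiked first mode)
    # suggests a near-equator cloud forest, and a second-peak relative
    # frequency above 0.20 suggests a cloud forest generally.
    def classify(sigma1, second_peak_rel_freq):
        if sigma1 < 1.0 or second_peak_rel_freq > 0.20:
            return "cloud forest"
        return "tropical forest or coastal"

    print(classify(0.72, 0.28))   # Mt. Nom values -> "cloud forest"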



5. CONCLUSION
The bimodal distribution of temperature can be used to separate forest types. For the eight study sites, the means and variances of the bimodal distributions group them into three classes: (1) four cloud forest sites (Duan Hok, Dadfa, Mt. Nom, and Doi Intanon stations), (2) two tropical forest sites (Huilek and Khao Nan headquarters stations), and (3) two coastal sites (Khanom and Muang Nakhonsithammarat stations). Further studies on other climatic data, such as rainfall, relative humidity, and solar radiation, should be carried out.



REFERENCES
1. Bruijnzeel, L. A., and Proctor, J., Tropical Montane Cloud Forest, 1995, 38-78.
2. Still, C.J., Foster, P.N., and Schneider, S.H. Nature., 1999, 398, 608-610.
3. Weathers, K.C., Tree., 1999, 14, 214-215.
4. Foster P., Earth-Science Rev., 2001, 55, 73-106.
5. Chang, S.C., Lai, I.-L., and Wu., J.-T., Atmos. Res., 2002, 64, 159-167.
6. Bubb, P., May, I., Miles, L., and Sayer, J., Cloud Forest Agenda, UNEP-WCMC, Cambridge, UK, 2004.
7. Suzuki, E., Stat. Climatol. Dev. Atm. Sci., 1980, 13, 1-381.
8. Von Storch, H., and Zwiers, F. W. Statistical Analysis in Climate Research. Cambridge
University Press, Cambridge, 1999, 34-5.



ACKNOWLEDGMENTS
This work was supported in part by PTT Public Company Limited, TRF/Biotec special
program for Biodiversity Research Training grant BRT R351151, BRT T351004, BRT
T351005, Walailak University Fund 05/2552 and 07/2552, WU50602, and Center of
Excellence for Ecoinformatics, the Institute of Research and Development, Walailak
University and NECTEC. We thank Mt. Nan National Park staff for their invaluable
assistance in the field.

D00017
Band structures and thermoelectric properties of CuAlO2 from first-principles calculations

P. Poopanya1,c and A. Yangthaisong2

1,2 Computational Materials and Device Physics Group, Department of Physics, Ubon Ratchathani University, Ubon Ratchathani 34190, Thailand
c E-mail: lookdok@hotmail.com; Fax: 045-288381; Tel. 086-640179



ABSTRACT
The thermoelectric properties of the delafossite material CuAlO2 have been investigated based on semiclassical Boltzmann theory, starting from the first principles electronic structure. In particular, the lattice constants and band structures are calculated using the total energy plane-wave pseudopotential method as implemented in the CASTEP code. It is shown that the lattice constants of CuAlO2 in space group R-3m are a = 2.888 Å and c = 17.128 Å. By inspecting the calculated band structures, it is found that the material has an indirect band gap (Γ-F) of 1.133 eV. The calculated band structures are then used in combination with a Boltzmann transport equation solver under the constant relaxation time approximation to calculate electrical transport quantities. The Seebeck coefficient decreases exponentially at low temperature, while the power factor with respect to scattering time, S²σ/τ, rises dramatically when the temperature exceeds 160 K. At room temperature, the Seebeck coefficient and the power factor are 389.321 μV K⁻¹ and 1.565×10¹¹ W m⁻¹ K⁻² s⁻¹ respectively. Doping strongly affects the conductivity of the material at low temperature, but becomes less important at high temperature.

Keywords: Band structures, CASTEP, Thermoelectric properties.



1. INTRODUCTION
Delafossite structure oxides ABO2 have been attracting much interest due to their layered structure and large variation in chemical and physical properties. The delafossite CuAlO2 is a typical p-type transparent conducting oxide. Its applications include transparent electrodes in flat panel displays and touch panels. In order to study the electrical, optical and thermal properties of the material, the energy band structure has been investigated. Shi et al. [1] calculated the electronic structures, optical properties, and hole conductivities of CuAlO2, finding that CuAlO2 has a direct optical band gap of 3.2 eV and an indirect band gap of 2.08 eV (Γ-F). Hamada et al. [2] have performed ab initio calculations of the electronic structure and energetics of native defects in CuAlO2. From the density of states (DOS) of CuAlO2, it was concluded that the valence states are mainly composed of the hybridized band of Cu-d states and O-p states, and that the band structure of CuAlO2 is indirect with a band gap of 2.1 eV. In addition, the copper vacancy and the oxygen interstitial are the relevant defects in CuAlO2. Falabretti et al. [3] calculated the band structure to investigate the charge neutrality level (CNL) in various transparent conducting oxides (SnO2, CuAlO2, and CuInO2). The CNL energy describes the doping polarity achievable in the systems and can be used to predict the systems' response to doping.

There has been considerable interest in nontraditional thermoelectric materials since the surprising discovery of high thermoelectric performance in oxides, particularly NaxCoO2. Recently, delafossite materials such as CuAlO2 have been found to exhibit a large thermopower [4-5], and promising thermoelectric performance may be achieved. The performance of a thermoelectric material can be assessed by the dimensionless figure of merit ZT, defined as

    ZT = \frac{S^2 \sigma T}{\kappa}    (1)

where \sigma, \kappa and S are the electrical conductivity, thermal conductivity and Seebeck coefficient respectively. Thus, a good thermoelectric material should have a high power factor S²σ and low thermal conductivity. However, an increase in the Seebeck coefficient generally leads to a decrease in the electrical conductivity, and hence a low figure of merit. Therefore, these parameters have been widely studied in an attempt to achieve a high value of ZT. Banerjee et al. [4] synthesized CuAlO2 thin films by dc sputtering. The films showed high Seebeck coefficients in the range 120-230 μV K⁻¹ at room temperature, with power factors of 1.1×10⁻⁷ and 7.5×10⁻⁵ W m⁻¹ K⁻² at 300 K and 500 K respectively. Park et al. [5] prepared polycrystalline samples of CuAlO2 by a solid state reaction method; the power factor measured at 1140 K increased from 4.98×10⁻⁵ to 6.62×10⁻⁵ W m⁻¹ K⁻² when the sample preparation temperature was increased from 1433 to 1473 K. Brahimi et al. [6] used the flux method to grow single crystal CuAlO2. The results showed a conductivity of 5.7×10⁻⁵ S m⁻¹ and a thermopower of 980 μV K⁻¹ at room temperature; the Seebeck coefficient decreased exponentially and reached saturation at 260 K.



2. COMPUTATIONAL DETAILS
The space group of CuAlO2 studied here is R-3m, with hexagonal atom positions Cu (0, 0, 0), Al (0, 0, 0.5), O (0, 0, z) [7]. There are two steps in the calculation of the thermoelectric properties of CuAlO2. First, the lattice constants, density of states and band structure are calculated using the plane-wave pseudopotential method based on density functional theory (DFT), as implemented in the CASTEP code. The local density approximation (LDA) is used for the exchange and correlation interactions. Only valence electrons are taken into account, represented as Cu 3d¹⁰4s¹, Al 3s²3p¹ and O 2s²2p⁴. A plane-wave cutoff energy of 600 eV is employed throughout the calculation. Geometry optimisation uses the Broyden-Fletcher-Goldfarb-Shanno (BFGS) minimisation technique, with the thresholds for a converged structure: energy change per atom less than 1×10⁻⁵ eV/atom, residual force less than 0.03 eV/Å, stress below 0.05 GPa and displacement of atoms during the geometry optimisation less than 0.001 Å. The tolerance in the self-consistent field (SCF) calculation is 1.0×10⁻⁶ eV/atom. The Brillouin zone was sampled by a 12×12×12 mesh generated according to the Monkhorst-Pack scheme to ensure good convergence of the computed structures and energies. Second, the calculated band structures are used in combination with a Boltzmann transport equation solver under the constant relaxation time approximation to calculate the Seebeck coefficient and the power factor. Our approach is the same as that reported by D. J. Singh [8]. The diagram of our approach is illustrated in Figure 1.



3. RESULTS AND DISCUSSION
Figure 2(a) shows the optimised structure of the conventional cell of CuAlO2, obtained with lattice constants a = 2.888 Å and c = 17.128 Å. Note that the lattice parameters a and c of this work differ from the experimental data [5] by 1.12% and 1.09% respectively. Our results are in agreement with other studies, as shown in Table 1.
D00017
ANSCSE14 Mae Fah Luang University, Chiang Rai, Thailand
March 23-26, 2010
289
It is worthwhile to mention that the key problem with all LDA (GGA) calculations on semiconductors and insulators is that they underestimate the band gap compared to experiment. Methods beyond the LDA that give the correct band gap are required; these include the GW approximation [13], B3LYP [10], LDA plus U [14], screened exchange (sX) [15], and the weighted density approximation (WDA) [16]. It would be very instructive to investigate further; in fact, calculations utilizing sX are under way and will be reported elsewhere. In this work, we correct this problem by an empirical upward shift of the conduction band to the experimental value of 3.0 eV [12] to obtain more realistic thermoelectric properties.
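
This empirical shift amounts to a "scissor" correction. The following Python sketch is an editorial illustration (bands is a hypothetical array of eigenvalues in eV on a k-point grid, assuming a clean gap at the Fermi level) of rigidly shifting the conduction bands so the gap matches a target value.

    import numpy as np

    def scissor(bands, e_fermi, target_gap):
        """Shift conduction bands of a (n_bands, n_kpoints) array (eV)."""
        is_cond = bands.min(axis=1) > e_fermi          # bands entirely above E_F
        vbm = bands[~is_cond].max()                    # valence band maximum
        cbm = bands[is_cond].min()                     # conduction band minimum
        shifted = bands.copy()
        shifted[is_cond] += target_gap - (cbm - vbm)   # open the gap to target
        return shifted

    # e.g. shift the calculated 1.133 eV gap up to the experimental 3.0 eV [12]:
    # corrected = scissor(bands, e_fermi=0.0, target_gap=3.0)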


Figure 1. Diagram of the calculations. The first step, the first principles calculations, outputs the lattice constants, energy band structure and density of states; these are the input to the second step, the semiclassical Boltzmann theory, whose output is the thermoelectric properties.



Table 1. The lattice parameters of CuAlO2 (R-3m).

Reference                 a (Å)    c (Å)
[5] Experimental          2.856    16.943
[1] Calculated            2.828    16.852
[2] Calculated            2.831    16.871
This work (calculated)    2.888    17.128

D00017
ANSCSE14 Mae Fah Luang University, Chiang Rai, Thailand
March 23-26, 2010
290

Figure 2. (a) Structure used in the first principles calculations of CuAlO2. (b) Energy band structure of the delafossite material CuAlO2.




Figure 3. Total (a) and partial (b) density of states of CuAlO2.


The calculated total density of states and partial density of states (PDOS) at the equilibrium lattice constants are illustrated in Figures 3(a) and 3(b) respectively. Note that the PDOS of each state contains contributions for that state from all atoms. The Fermi energy (E_F) is located at 0 eV. The bottom of the conduction band is dominated by p states (Al-3p) and s states (Cu-4s), and the top of the valence band is dominated by d states (Cu-3d) and p states (O-2p), as seen from the analysis of the partial density of states in Figure 3(b).

D00017
ANSCSE14 Mae Fah Luang University, Chiang Rai, Thailand
March 23-26, 2010
291

Figure 4. Transport coefficients as a function of temperature: (a) Seebeck coefficient; (b) power factor with respect to scattering time.


The calculated Seebeck coefficient is plotted as a function of temperature in Figure 4. It can be seen that the Seebeck coefficient decreases dramatically at low temperature and remains nearly constant at high temperature, in agreement with the experimental results [6]. At room temperature, the Seebeck coefficient is 389.321 μV K⁻¹. On the other hand, the power factor increases with increasing temperature above a threshold temperature of 160 K. Thus the material is expected to behave as a thermoelectric material above this temperature.
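
As a quick consistency check (an editorial back-of-envelope calculation, not from the paper), the two room-temperature values quoted above fix the ratio σ/τ implied by the constant relaxation time approximation:

    # Infer sigma/tau for CuAlO2 at 300 K from the reported values.
    S = 389.321e-6            # V/K, Seebeck coefficient from the abstract
    pf_over_tau = 1.565e11    # W m^-1 K^-2 s^-1, S^2*sigma/tau from the abstract
    sigma_over_tau = pf_over_tau / S**2
    print(f"sigma/tau ~ {sigma_over_tau:.2e} (ohm m s)^-1")   # ~1.0e18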
The temperature dependence of the conductivity is calculated and illustrated in Figure 5(b). To consider the effects of doping, different values of the Fermi level, varied from 2.55 to 2.70 eV, are studied, as depicted in the band scheme in Figure 5(a). Note that if the Fermi energy level is close to the top of the valence band, the conductivity is large, because electrons in the valence band can easily be excited into the acceptor level, leaving holes in the valence band. These holes act as carriers moving in the valence band. However, this effect becomes less important at high temperature, since more electrons can be thermally excited to the acceptor level.

Figure 5. (a) Band scheme for a p-type semiconductor, indicating the conduction band, valence band, Fermi level E_F, the middle of the gap, and the energies 2.48 eV, 3.98 eV and 4.48 eV. (b) Temperature dependence of the conductivity.

4. CONCLUSION
The lattice constants and the energy band structure were calculated by a first principles method. The calculated energy band structure was then used to calculate the thermoelectric properties of CuAlO2, and the obtained results are comparable to the experimental results.



REFERENCES
1. Shi, L.J., Fang, Z.J., and Li, J., J. Appl. Phys., 2008, 104(073527), 1-5.
2. Hamada, I., and Katayama-Yoshida, H., Physica B, 2006, 376-377, 808-11.
3. Falabretti, B., and Robertson, J., J. Appl. Phys., 2007, 102(123703), 1-5.
4. Banerjee, A.N., Maity, R., Ghosh, P.K., and Chattopadhyay, K.K., Thin Solid Films, 2005, 474, 261-6.
5. Park, K., Ko, K.Y., and Seo, W.-S., J. Euro. Cer. Soc., 2005, 25, 2219-22.
6. Brahimi, R., Bellal, B., Bessekhouad, Y., Bouguelia, A., and Trari, M., J. Cryst. Growth, 2008, 310, 4325-9.
7. Li, J., Sleight, A.W., Jones, C.Y., and Toby, B.H., J. Sol. State Chem., 2005, 178, 285-94.
8. Madsen, G.K.H., and Singh, D.J., Comp. Phys. Comm., 2006, 175, 67-71.
9. Robertson, J., Xiong, K., and Clark, S.J., Thin Solid Films, 2006, 496, 1-7.
10. Dittrich, Th., Dloczik, L., Guminskaya, T., and Lux-Steiner, M.Ch., Appl. Phys. Lett., 2004, 85(5), 742-4.
11. Kim, D.S., Park, S.J., Jeong, E.K., Lee, H.K., and Choi, S.Y., Thin Solid Films, 2007, 515, 5103-8.
12. Pellicer-Porres, J., Segura, A., and Kim, D., Semicond. Sci. Technol., 2009, 24, 01502.
13. Aryasetiawan, F., and Gunnarsson, O., Rep. Prog. Phys., 1998, 61, 237-312.
14. Dovesi, R., Orlando, R., Roetti, C., Pisani, C., and Saunders, V.R., Phys. Stat. Sol. (b), 2000, 217, 63-88.
15. Anisimov, V.I., Zaanen, J., and Andersen, O.K., Phys. Rev. B, 1991, 44(3), 943-54.
16. Bylander, B.M., and Kleinman, L., Phys. Rev. B, 1990, 41(11), 7868-71.



ACKNOWLEDGMENTS
This work has partially been supported by the National Nanotechnology Center (NANOTEC), National Science and Technology Development Agency (NSTDA), Ministry of Science and Technology, Thailand, through its Computational Nanoscience Consortium (CNC). A.Y. is very grateful to S. J. Clark of Durham University, UK, for the use of his codes.
D00018
Electronic Structures and Thermoelectric Properties of SrTiO3

T. Chanapote1, A. Yangthaisong1,c, S. Vannarat2

1 Computational Materials and Device Physics Group, Department of Physics, Faculty of Science, Ubon Ratchathani University, Ubon Ratchathani, 34190, Thailand
2 National Electronics and Computer Technology Center, 111 Thailand Science Park, Paholyothin Rd, Klong 1, Klong Luang, Pathumthani, 12120, Thailand

c E-mail: a.yangthaisong@physics.org; Tel: 086-8657805



ABSTRACT
The electronic structures and density of states of SrTiO3 have been calculated from first principles using density functional theory, within the frameworks of both the local density approximation (LDA) and the generalized gradient approximation (GGA), as in most other first principles calculations. The calculated band energies are then used in combination with the Boltzmann transport equations to obtain thermoelectric properties such as the Seebeck coefficient, electrical conductivity, electronic thermal conductivity and an approximation to the figure of merit.

Keywords: Electronic structure; Density functional theory; Boltzmann transport equation


1. INTRODUCTION
The search for new thermoelectric materials is a quest to maximize the dimensionless figure of merit zT = σS²T/κ, where S is the Seebeck coefficient and σ and κ are the electrical and thermal conductivity, respectively. zT quantifies the performance of a thermoelectric, and one must therefore maximize the power factor S²σ and minimize κ. As S, σ and κ are coupled and all depend strongly on the detailed electronic structure, carrier concentration, and crystal structure, the task of finding new compounds with large values of zT is very challenging.

ABO3-type perovskite materials are important for numerous technological applications in electro-optics, waveguides, laser frequency doubling, high capacity computer memory cells, etc. [1]. Recently, it has been reported that a single crystal of La-doped SrTiO3 exhibits a power factor comparable to conventional thermoelectrics at room temperature [2]. In fact, a solid solution of SrTiO3 and BaTiO3 has been proposed to reduce the thermal conductivity, so that an enhanced thermoelectric performance might be achieved [3].

The main purpose of this paper is to investigate the thermoelectric properties of SrTiO3 by employing the band structure calculated from density functional theory in conjunction with the Boltzmann transport equation. The paper is organized as follows. Section 2 describes the theory used and related works. Section 3 presents computational details. The results and discussion of our investigation are presented in Section 4. Conclusions are drawn in Section 5.


2. THEORY AND RELATED WORKS
In order to predict thermoelectric properties of materials, the BoltzTraP program based on Boltzmann theory, developed by the group of D. J. Singh [4], has been modified and used in this paper. In summary, using the calculated band structure in conjunction with the Boltzmann transport equation and the rigid band approach, as described in detail in [4], the conductivity of the material is based on the transport distribution

    \sigma_{\alpha\beta}(\varepsilon) = \frac{1}{N} \sum_{i,k} \sigma_{\alpha\beta}(i,k) \frac{\delta(\varepsilon - \varepsilon_{i,k})}{d\varepsilon}    (1)

The transport tensors can then be calculated from the conductivity distribution of Eq. (1):

    \sigma_{\alpha\beta}(T;\mu) = \frac{1}{\Omega} \int \sigma_{\alpha\beta}(\varepsilon) \left[-\frac{\partial f_\mu(T;\varepsilon)}{\partial\varepsilon}\right] d\varepsilon    (2)

    \nu_{\alpha\beta}(T;\mu) = \frac{1}{eT\Omega} \int \sigma_{\alpha\beta}(\varepsilon)\,(\varepsilon-\mu) \left[-\frac{\partial f_\mu(T;\varepsilon)}{\partial\varepsilon}\right] d\varepsilon    (3)

    \kappa^0_{\alpha\beta}(T;\mu) = \frac{1}{e^2 T\Omega} \int \sigma_{\alpha\beta}(\varepsilon)\,(\varepsilon-\mu)^2 \left[-\frac{\partial f_\mu(T;\varepsilon)}{\partial\varepsilon}\right] d\varepsilon    (4)

where \sigma is the electrical conductivity, \kappa^0 is the electronic part of the thermal conductivity, f_\mu is the Fermi-Dirac distribution function, T is the absolute temperature, and \mu is the chemical potential. It is worthwhile to mention that in the rigid band approach the bands, and hence \sigma(\varepsilon), are left fixed. This means that only band structure calculations are required, simplifying the calculations. To further simplify the problem, one can also assume that the Seebeck coefficient S is independent of the relaxation time \tau, so that it can be written as S = \sigma^{-1}\nu. Furthermore, only the electronic thermal conductivity is considered here, and it can be defined as

    \kappa^e_{ij} = \kappa^0_{ij} - T \nu_{i\alpha} (\sigma^{-1})_{\alpha\beta} \nu_{\beta j}    (5)

In fact, the electronic thermal conductivity \kappa^e can be approximated by the Wiedemann-Franz law,

    \kappa^e_{ij} = \frac{\pi^2}{3} \left(\frac{k_B}{e}\right)^2 T \sigma_{ij}    (6)

In this paper we use \kappa^0 as the approximation to \kappa^e to simplify the calculation. Since the electrical conductivity and the electronic thermal conductivity can only be calculated with respect to the relaxation time \tau, the approximation to zT can be calculated, and is limited by

    zT = \frac{S^2 \sigma T}{\kappa^e} = \frac{(S^2\sigma/\tau)\,T}{\kappa^e/\tau} < \frac{S^2}{(\pi^2/3)(k_B/e)^2} \approx \left(\frac{S}{156\ \mu\mathrm{V\,K^{-1}}}\right)^2    (7)
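
Numerically, Eqs. (2)-(3) are just weighted integrals of the transport distribution against the derivative of the Fermi-Dirac function. The following Python sketch (an editorial illustration with a toy, scalar σ(ε) and schematic units, not the modified BoltzTraP code) evaluates σ(T;μ) and S = σ⁻¹ν on an energy grid.

    import numpy as np

    kB = 8.617e-5                            # Boltzmann constant in eV/K
    e = 1.0                                  # charge in units of e (schematic)

    def fermi_deriv(eps, mu, T):
        """-df/d(eps) for the Fermi-Dirac distribution."""
        x = (eps - mu) / (kB * T)
        return np.exp(x) / (kB * T * (1.0 + np.exp(x))**2)

    def transport(eps, sigma_eps, mu, T):
        w = fermi_deriv(eps, mu, T)
        sigma = np.trapz(sigma_eps * w, eps)                      # Eq. (2)
        nu = np.trapz(sigma_eps * (eps - mu) * w, eps) / (e * T)  # Eq. (3)
        return sigma, nu / sigma                                  # (sigma, S)

    eps = np.linspace(-1.0, 1.0, 4001)                 # energy grid (eV)
    sigma_eps = np.where(np.abs(eps) > 0.25, 1.0, 0.0) # toy sigma(eps) with a gap
    print(transport(eps, sigma_eps, mu=0.3, T=300.0))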

3. COMPUTATIONAL DETAILS

The SrTiO3 structure is displayed in Fig. 1. In this structure the oxygen atoms, located at the face-center positions of a cubic unit cell, form a perfect octahedron with the titanium atom at its center, while the strontium atoms lie outside the oxygen octahedron, at the corners of the cube; a0 is the cubic lattice constant (we used the experimental value a0 = 3.90 Å as our initial input [5]).

Fig. 1. Cubic perovskite structure of SrTiO3. The length a0 is the lattice constant.

The calculations of the electronic structures of cubic perovskite SrTiO3 have been performed using the plane-wave pseudopotential code CASTEP [6]. In this code the Kohn-Sham equations are solved within the framework of density functional theory by expanding the wavefunctions of the valence electrons in a basis set of plane waves with kinetic energy smaller than a specified cut-off energy. The interaction between the valence electrons and the core is represented by norm-conserving pseudopotentials [8]. The states Sr(4s²4p⁶5s²), Ti(3d²4s²) and O(2s²2p⁴) are treated as valence states. The plane wave basis set cutoff energy is 600 eV. The electronic exchange and correlation effects were described by the local density approximation (LDA) and the generalized gradient approximation (GGA). The structural parameters of SrTiO3 were determined using the Broyden-Fletcher-Goldfarb-Shanno (BFGS) minimization technique, with the thresholds for a converged structure: energy change per atom less than 1×10⁻⁵ eV, residual force less than 0.03 eV/Å, stress below 0.05 GPa and displacement of atoms during the geometry optimisation less than 0.001 Å. The Brillouin zone was sampled by a 4×4×4 mesh generated according to the Monkhorst-Pack scheme to ensure good convergence of the computed structures and energies.

4. RESULTS AND DISCUSSION
The optimised lattice constants of cubic SrTiO3 are 3.88674 Å (LDA) and 3.92551 Å (GGA). Note that the experimental value is 3.90 Å [1]. The electronic structure of cubic SrTiO3 and its density of states from the GGA calculation are shown in Fig. 2. They show that SrTiO3 is a semiconductor, with an energy gap of 2.10 eV (GGA) and 1.964 eV (LDA, not shown). The experimental indirect band gap of SrTiO3 is 3.25 eV [7]. Nonetheless, our calculations agree well with other reported calculations [1].

Figure 2. Electronic structure of cubic perovskite SrTiO3 from the GGA calculation (Egap = 2.10 eV).


It is worthwhile to mention that the key problem with all LDA (GGA) calculations on semiconductors and insulators is that they underestimate the band gap compared to experiment. Methods beyond the LDA that give the correct band gap are required; these include the GW approximation [8], B3LYP [9], LDA plus U [10], and screened exchange (sX) [11]. It would be very instructive to investigate further; in fact, calculations utilizing sX are under way and will be reported elsewhere. In this work, we correct this problem by an empirical upward shift of the conduction band to the experimental value of 3.25 eV to obtain more realistic thermoelectric properties.

Since the BoltzTraP code developed by D. J. Singh is designed to work with WIEN2k, an augmented plane wave plus local orbitals program, a modification was required to interface it with the CASTEP code employed here. Figure 3 shows the calculated Seebeck coefficients using band structures imported from the WIEN2k and CASTEP codes.
The structure studied for this comparison is a well-known thermoelectric material, CoSb3. It can be seen that our results agree very well with those obtained from the original code.

Figure 3. Comparison of the Seebeck coefficient of CoSb3 calculated from WIEN2k and CASTEP band structures.

The electronic thermal conductivity κ⁰ calculated using Eqs. (5) and (6) is shown in Fig. 4. It can be seen that the electronic thermal conductivities obtained from both expressions are almost the same; hence we employ κ⁰ as κᵉ in our calculations.

Figure 4. The electronic thermal conductivity at zero electric current, κ⁰, and κᵉ from the Wiedemann-Franz law.

The thermoelectric properties of SrTiO3 calculated at T = 300 K, in comparison with other well-known thermoelectric materials, β-FeSi2 (from the CASTEP code, gap shifted) and Bi2Te3 and CoSb3 (from the WIEN2k code [3], gap not shifted), are shown in Fig. 5.

Figure 5. Thermoelectric properties: (a) carrier concentration, (b, c) Seebeck coefficient,
(d, e) electrical conductivity, (f) electronic thermal conductivity, (g) power factor, and
(h) dimensionless figure of merit.

Fig. 5 shows that SrTiO3 is a good thermoelectric material with zT of about 1. Our
calculated Seebeck coefficient is in the range −2000 μV/K to 2000 μV/K at T = 300 K,
depending on carrier concentration.
From [5], the Seebeck coefficient of rare-earth-doped Sr0.9R0.1TiO3 (R = La, Sm, Gd, Dy, Y)
is about −100 μV/K at T = 300 K. We can approximate the corresponding carrier concentration
by shifting the Fermi energy to that condition; the carrier concentration calculated for
T = 300 K to 1000 K is in the range −1.75×10⁻⁴ to −3.50×10⁻⁴ e/u.c., showing that it is an
n-type doped semiconductor. The calculated Seebeck coefficients range from −200 μV/K to
−100 μV/K, comparable to the experimental values of −250 μV/K to −100 μV/K [5]. The
electrical conductivity is of order 10⁴ 1/Ω·m (with a constant relaxation time of
1×10⁻¹⁴ s), in very good agreement with [5], which reports values of order 10³-10⁴ 1/Ω·m.
For κe, our calculation is in the range 2-20 W/m·K (with the same constant relaxation time
of 1×10⁻¹⁴ s), which is higher than the 2-6 W/m·K reported in [5] but very close at low
temperature. Our calculations predict a figure of merit (Z) of the n-type doped
Sr0.9R0.1TiO3 (R = La, Sm, Gd, Dy, Y) in the range 5×10⁻⁴ to 10×10⁻⁴ 1/K, which is higher
than the reported range of 1×10⁻⁴ to 10×10⁻⁴ 1/K [5]. This may be the effect of the lattice
thermal conductivity, which is not considered here.

5. CONCLUSION
Our calculations show that the band structure from CASTEP, used in combination with
the Boltzmann transport equations in the modified BoltzTraP program, can predict the
electronic properties of crystals effectively, and that doped SrTiO3 can be considered a
high-performance thermoelectric material.

REFERENCES
1. S. Piskunov et al., Comput. Mater. Sci., 2004, 29, 165-178.
2. T. Okuda, K. Kurosaki, S. Miyasaka, Y. Tokura, Phys. Rev. B, 2001, 63, 113104.
3. H. Muta, K. Kurosaki, S. Yamanaka, J. Alloys Compd., 2004, 368, 22-24.
4. G.K.H. Madsen, D.J. Singh, Comput. Phys. Commun., 2006, 175, 67-71.
5. Y.A. Abramov, V.G. Tsirelson, Acta Cryst. B, 1995, 51, 942.
6. S.J. Clark et al., Z. Kristallogr., 2005, 220(5-6), 567-570.
7. K. van Benthem, C. Elsässer, R.H. French, J. Appl. Phys., 2001, 90(12), 6156.
8. F. Aryasetiawan and O. Gunnarsson, Rep. Prog. Phys., 1998, 61, 237-312.
9. R. Dovesi, R. Orlando, C. Roetti, C. Pisani, and V.R. Saunders, Phys. Stat. Sol. (b),
2000, 217, 63-88.
10. V.I. Anisimov, J. Zaanen, and O.K. Andersen, Phys. Rev. B, 1991, 44(3), 943-954.
11. D.M. Bylander and L. Kleinman, Phys. Rev. B, 1990, 41(11), 7868-7871.


ACKNOWLEDGMENTS
T. C. acknowledges financial support from the National Science and Technology
Development Agency through the YSTP project. This work has been partially supported by the
National Nanotechnology Center (NANOTEC), National Science and Technology
Development Agency (NSTDA), Ministry of Science and Technology, Thailand, through its
Computational Nanoscience Consortium (CNC). A. Y. acknowledges Dr S. J. Clark, Durham
University, UK, for providing his code. The computing resources provided through the SILA
clusters at Ramkhamhaeng University are gratefully acknowledged.
D00020
The Study of Illuminance and Thermal Effect in High Power LED Arrays

P. PremvaranonC, Y. Pratumwal, A. Teralapsuwan and J. Soparat
National Metal and Materials Technology Center, 114 Thailand Science Park, Paholyothin Rd., Klong 1, Klong Luang, Pathumthani 12120
C E-mail: piyapp@mtec.or.th; Fax: 02-5646370; Tel. 02 564 6500 ext 4355


ABSTRACT
In this paper, the illuminance and thermal distribution of a high power light emitting
diode (LED) array on a heat dissipater have been explored using ray tracing and the finite
element method. To evaluate the illuminance and beam pattern of the LED array, the
ray tracing method was used to calculate the ray paths and the illumination of the LED
array on the working area. Then, a 3D finite element model was used to predict the thermal
distribution on the LEDs and heat dissipater, compared with the IR thermal measurements
obtained from prototype testing. Next, the LED power density was varied to
further investigate the impact on beam quality and thermal performance of the LED
array system. Finally, with a suitable heat convection coefficient, the heat dissipater
can be designed to reduce the high operating temperatures at the LED junction,
resulting in better light output and longer lifetime.

Keywords: High Power LED, Thermal Effect, Illuminance, Heat Dissipater.



1. INTRODUCTION
Light emitting diodes (LEDs) presently play a very important role in various illumination
applications, due to their advantages in reliability, efficiency, durability and power
consumption, as well as the available colors. Typically, LEDs are used in display
instruments such as mobile phone screens, LCD displays and automotive interior lighting. In
the near future, LEDs will move into the more challenging arena of general illumination.
Currently, a high power LED can generate a maximum of about 170 lm/bulb, which falls short
of traditional light sources. To compete with traditional light sources such as incandescent
or fluorescent lamps, multiple high power LEDs are therefore inevitably used in order to
obtain more lumens. With such high power LED arrays, the excessive heat concentration
within the semiconductor chips increases. Consequently, the higher temperature at the LED
junction reduces the light output and speeds up chip degradation; the lifetime of an LED can
decrease from 42,000 h to 18,000 h if the junction temperature increases from 40 °C to
50 °C. Therefore, both the illuminance and the thermal management of LED arrays must be
carefully designed to meet the illumination demand and thermal performance.
In this paper, 30 high power LEDs are mounted on an aluminum heat dissipater for use as a
30 W-90 W variable light source for a street lamp. First, the beam quality, such as the
illuminance and beam pattern of the LED array on the road area, is investigated by applying
the ray tracing method. Next, the thermal distribution on the LEDs and heat dissipater is
analyzed by a 3D finite element heat transfer model. The appropriate heat convection
coefficient is then determined by comparing the experimental results obtained from an IR
thermal camera and thermocouples with the simulation results. The heat dissipater is then
redesigned to improve the thermal performance of the LED system at varied power input. The
results gained from this study will be used as a guideline in evaluating the beam quality
and thermal performance of new street lamp designs prior to real physical prototyping.

D00020
ANSCSE14 Mae Fah Luang University, Chiang Rai, Thailand
March 23-26, 2010
300
2. THEORY AND RELATED WORKS

2.1. Monte Carlo Ray Tracing
In the ray tracing method, light is considered as an electromagnetic wave traveling through
space. A light ray is defined as a line normal to the wavefront, i.e., along the direction
of wave propagation. A light ray obeys the laws of geometrical optics and can be
transmitted, reflected, and refracted through an optical-mechanical system following
Snell's law:

$n_1 \sin\theta_1 = n_2 \sin\theta_2$   (1)

where $n_1$ and $n_2$ are the refractive indices of medium 1 and medium 2, respectively,
$\theta_1$ is the incident angle of the light ray with respect to the normal, and
$\theta_2$ is the refracted angle.
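
A small numeric check of Eq. (1) (the PMMA index used here is a typical handbook value,
n ≈ 1.49):

    import math

    def refract(theta1_deg, n1, n2):
        """Refraction angle from Snell's law, Eq. (1); returns None on
        total internal reflection (|sin(theta2)| would exceed 1)."""
        s = n1 * math.sin(math.radians(theta1_deg)) / n2
        return None if abs(s) > 1.0 else math.degrees(math.asin(s))

    print(refract(30.0, 1.0, 1.49))   # air into PMMA -> about 19.6 degrees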
For Monte Carlo ray tracing, the luminous flux of the light source is represented by photon
bundles P. The emission direction of the luminous flux can be directly measured or
determined randomly based on the emission characteristics of the light source. The emitted
photon bundles (rays) travel through the system following the ray paths determined by the
laws of geometrical optics. When a photon bundle encounters a surface, its reflection,
refraction or absorption is determined with a random number. After each ray has passed
through all surfaces in the system, the illuminance E of each surface is calculated from the
number of photon bundles:

$E = \frac{P}{P_0}\,\frac{F}{A}$   (2)

where $P_0$ is the total number of photon bundles emitted from the light source, P is the
number of photon bundles intercepted by the surface, F is the total luminous flux of the
light source, and A is the area of the surface.
The principle of the inverse square law is used in the simulation. The luminous intensity I
is related to the illuminance as follows:

$I = E\,D^2 = \frac{P}{P_0}\,\frac{F}{A}\,D^2$   (3)

where D is the distance between the light source and the detector surface.
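
A toy Monte Carlo estimate of Eq. (2) for a single downward-pointing bundle source above a
square detector plane illustrates the procedure; the cosine-weighted emission model and all
dimensions here are illustrative assumptions, not the LED model used in this paper:

    import math, random

    def mc_illuminance(flux_lm=100.0, h=7.0, half=5.0, n_bundles=200_000):
        """Average illuminance (lux) on a (2*half x 2*half) m^2 plane a
        height h (m) below the source, via Eq. (2): E = (P/P0) * F/A."""
        hits = 0
        for _ in range(n_bundles):
            # cosine-weighted downward direction: sin^2(theta) uniform on [0,1)
            theta = math.asin(math.sqrt(random.random()))
            phi = 2.0 * math.pi * random.random()
            r = h * math.tan(theta)             # radius where the ray meets the plane
            x, y = r * math.cos(phi), r * math.sin(phi)
            if abs(x) <= half and abs(y) <= half:
                hits += 1
        area = (2.0 * half) ** 2
        return (hits / n_bundles) * flux_lm / area

    print(mc_illuminance())   # one 100 lm source at 7 m over a 100 m^2 area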

2.2. Heat Transfer
In this paper, the LEDs and the finned aluminum heat dissipater were modeled as solids that
transport energy by conduction. For the LED, heat is generated in the semiconductor chip and
conducted through the epoxy lens before leaving the LED's outer surface by natural
convection. The main heat transfer path is by conduction through the bottom of the LED and
out to the ambient air through the finned heat dissipater; radiative effects have been
ignored in this analysis. Therefore, the heat transfer equations for conduction and
convection are used in the calculation.

Figure 1. General heat transfer in three dimensions.
D00020
ANSCSE14 Mae Fah Luang University, Chiang Rai, Thailand
March 23-26, 2010
301

Governing heat transfer equation:

$\frac{\partial q_x}{\partial x} + \frac{\partial q_y}{\partial y} + \frac{\partial q_z}{\partial z} + Q = \rho c_p \frac{\partial T}{\partial t}$   (4)

where $q_x$, $q_y$ and $q_z$ are the heat flow rates in the x, y and z directions,
respectively, Q is the internal heat generation, $\rho$ is the mass density, $c_p$ is the
specific heat, and T is the temperature varying with time t.
The essential ingredient of forced convection heat transfer analysis is given by
Newton's law of cooling,

$\dot{Q} = hA\,(T_w - T_\infty) = hA\,\Delta T$   (5)

The rate of heat $\dot{Q}$ transferred to the surrounding fluid is proportional to the
object's exposed area A and the difference between the object temperature $T_w$ and the
fluid free-stream temperature $T_\infty$. The constant of proportionality h is termed the
convection heat-transfer coefficient.
By applying the method of weighted residuals (MWR) with the Bubnov-Galerkin technique
and the boundary conditions, the finite element equations can be represented in the
following form [1]:

$[C]\{\dot{T}\} + \left([K_c] + [K_h] + [K_r]\right)\{T\} = \{Q_Q\} + \{Q_q\} + \{Q_h\} + \{Q_r\}$   (6)
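
A lumped two-node sketch (chip and heat sink) conveys how Eqs. (4) and (5) combine:
conduction moves heat from chip to sink, and Newton cooling releases it to the ambient air.
All thermal parameters below are illustrative placeholders, not the paper's finite element
model:

    def lumped_led(q_w=2.0, h=6.85, a_sink=0.09, t_amb=25.0,
                   g_cond=1.0, c_chip=0.5, c_sink=50.0,
                   dt=0.1, t_end=3600.0):
        """Explicit-Euler two-node thermal network: chip (source q_w, W)
        -> conduction g_cond (W/K) -> sink -> convection h*a_sink
        (Eq. (5)) -> ambient. Returns final temperatures in C."""
        t_chip = t_sink = t_amb
        for _ in range(int(t_end / dt)):
            q_cs = g_cond * (t_chip - t_sink)       # chip -> sink conduction
            q_sa = h * a_sink * (t_sink - t_amb)    # sink -> ambient, Newton cooling
            t_chip += dt * (q_w - q_cs) / c_chip
            t_sink += dt * (q_cs - q_sa) / c_sink
        return t_chip, t_sink

    # h = 6.85 W/m^2 C is the convection coefficient fitted in Section 4.2
    print(lumped_led())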

3. EXPERIMENTAL AND COMPUTATIONAL DETAILS

3.1. Ray Tracing Analysis
In the ray tracing analysis, each LED is modeled in 3D geometry as an assembly of
semiconductor chip, PC solder, aluminum base, aluminum ring, silicone encapsulant and PMMA
bulb. For the LED system model, the 30 W to 90 W variable white light source, consisting of
30 high power LEDs with a 90 degree viewing angle (FWHM), is mounted on the finned aluminum
heat dissipater. Figure 2 shows the model of the LED assembly and the LED array on the heat
dissipater. To evaluate the beam quality of this LED array, the LED system was installed at
7 m and 10 m height and illuminated a 10 × 10 m² working area as shown in Figure 3. First,
the CAD model was imported and the materials' optical properties, including the surface
characteristics of each component, were applied. After launching the ray tracing
calculation, the positions and directions of the photon bundles were calculated; these
photon bundles were traced from the LED light source to the reflector ring surface until
they left the PMMA bulb and struck the sensor on the working area. At the sensor panel, the
intersections of each ray with the detector surface were identified and accumulated into the
beam pattern, illuminance and intensity following the formulae above.

D00020
ANSCSE14 Mae Fah Luang University, Chiang Rai, Thailand
March 23-26, 2010
302

Figure 2. CAD model of the high power LED source and the array of 30 LEDs on the heat
dissipater.

Figure 3. Simulation setup for illuminance and beam pattern evaluation.


3.2. Heat Transfer Analysis and Experimental

In order to assess the thermal performance of the system, the 30 cm × 15 cm aluminum heat
dissipater with the 30 mounted LEDs, exposed to natural convection, was analyzed using the
finite element method. Only one fourth of the LED system in Figure 2 was used in the heat
transfer analysis, owing to the symmetry of the layout. In our model, heat is generated by
the semiconductor chip and conducted through the encapsulant, lens, and fins of the heat
dissipater before being exposed to the ambient air by natural convection. Ignoring the
effect of radiation, a range of convection coefficients was used in the calculations and
compared with the data obtained from thermocouple and IR camera measurements, as shown in
Figure 4, to select the appropriate convection heat-transfer coefficient for the heat
dissipater redesign.


Figure 4. The LED system prototype and thermal measurement by IR camera.



4. RESULTS AND DISCUSSION

4.1. Ray Tracing Analysis Result

For the illumination assessment, the array of 30 LEDs is used as a variable light source
with drive currents of 350 mA, 600 mA and 875 mA, giving 100%, 150% and 200% luminous flux,
respectively, where the 100% luminous flux of an LED is 100 lm. This LED system can
therefore generate 3000 lm to 6000 lm at a power consumption of 30 W-90 W. To explore the
possibility of using high power LEDs for street lighting applications, the main criteria in
beam quality evaluation, i.e. illuminance, beam pattern and intensity distribution on the
10 × 10 m² working area at 7 m and 10 m installation height, were calculated by the ray
tracing method. Table 1 shows the ray tracing calculation results, and Figures 5 and 6 show
the beam pattern and intensity distribution at 7 m and 10 m height.


Table 1. Illuminance from the ray tracing calculation at various drive currents for 7 m and
10 m installation height.

Distance  Quantity    Drive 100%, 350 mA  Drive 150%, 600 mA  Drive 200%, 875 mA
                      illumination (lux)  illumination (lux)  illumination (lux)
7 m       Max.        34.48               51.38               68.19
          Min.        5.203               7.816               10.46
          Ave.        16.85               25.28               33.71
          Min./Ave.   0.3088              0.3092              0.3103
          Min./Max.   0.1509              0.1521              0.1534
10 m      Max.        17.02               25.26               33.68
          Min.        5.328               7.936               10.64
          Ave.        11.16               16.74               22.32
          Min./Ave.   0.4774              0.4741              0.4767
          Min./Max.   0.3130              0.3142              0.3159

D00020
ANSCSE14 Mae Fah Luang University, Chiang Rai, Thailand
March 23-26, 2010
304

Figure 5. The beam pattern and intensity distribution of the LED system at 7 m height.

Figure 6. The beam pattern and intensity distribution of the LED system at 10 m height.

According to the ray tracing analysis, the illuminance of the LED system is nearly directly
proportional to the drive current, owing to the LED's luminous flux characteristic, while
the system gives the same circular beam pattern and down-light intensity distribution for
all drive currents, as shown in Figures 5 and 6. Therefore, the E min/E average and
E min/E max ratios of this LED array remain constant at any power density. From Table 1,
comparing the illuminance, beam pattern and intensity distribution of the LED array at the
7 m and 10 m installation heights, the results show that the illuminance is still
proportional to the drive current but decreases with height, following the inverse square
law mentioned previously. From the simulation, the uniformity of light at 10 m height is
better and the beam pattern is broader than those at 7 m. From the above results, the 30
high power LEDs used in this study can illuminate the 100 m² area at an average illuminance
of 11.2-33.7 lux, depending on the drive current and installation height. In order to use
this LED system in a street lamp application, the LED array should be driven with at least
600 mA (approximately 63 W at a forward voltage of 3.5 V) to obtain an average illuminance
of 20 lux on the working area at the 10 m installation height.

4.2. Thermal Analysis Result

To assess the thermal behavior of the LED array, the temperature distribution on the heat
dissipater surface was measured by thermocouple and IR thermal camera. In the experiment, a
prototype of the LED system was built and each LED was driven with 600 mA, i.e. a 2 W power
density. The temperature at the center of the heat dissipater's surface read from the
thermocouple was 46.86 °C at an ambient air temperature of 24.3 °C, while the thermal camera
showed a maximum temperature of approximately 42 °C at the LEDs with an ambient air
temperature of 21.4 °C. Figure 7 shows the temperature distribution on the LED array and
aluminum heat dissipater, and Figure 8 shows the temperature distribution of each LED array.
By comparing the finite element analysis results with the measurements, a convection
heat-transfer coefficient of 6.85 W/m²·°C was determined by trial and error at 25 °C ambient
air. The resulting distribution of temperature on the LEDs and heat dissipater is shown in
Figure 9.

Figure 7. The thermal distribution of the LED system at 600 mA drive current.



Figure 8. Temperature of each LED array.



Figure 9. The thermal analysis of the symmetric LED system model

As shown in Figure 9, the temperature at the center of the heat sink surface is 46 °C, which
is close to the temperature read from the thermocouple, while the maximum temperature,
51.3 °C, is located at the semiconductor chip of an LED, and the temperature on the heat
dissipater is approximately 43-46 °C. The same finite element model was then employed to
predict the thermal distribution of the LED system at 350 mA and 875 mA, as shown in
Figures 10 and 11.



Figure 10. The thermal analysis of the symmetric LED system at 350 mA drive current



Figure 11. The thermal analysis of the symmetric LED system at 875 mA drive current

From the analysis results, when the drive current was varied to 350 mA and 875 mA, the
maximum temperature at the chip is 38 °C and 64 °C, respectively. The temperature on the
heat dissipater surface is 35 °C for the 350 mA drive current (1 W) and 56 °C for 875 mA
(3 W). The thermal data gained from this simulation can be used as a guideline in passive
cooling design, including the housing design of this LED system.


5. CONCLUSION
In this paper, the ray tracing method is employed to evaluate the beam quality of a high
power LED array for street lamp application. According to the simulation, although the array
of 30 LEDs can give sufficient average illuminance on the 100 m² working area at 10 m
height, the down-light intensity distribution is not well suited to highway lighting, so an
additional optical system is required. With the data gained from the ray tracing analysis,
engineers can determine the illuminance and beam pattern of an LED lighting system in the
early design stage, and the information gained can guide an optical redesign. Furthermore,
with the finite element method, the thermal performance of the LED system can be predicted
and used as a guideline in heat dissipater and housing design. By using the ray tracing and
finite element methods together, the lighting quality and thermal performance of an LED
lighting system can be predicted at the design step without the costly fabrication and
testing of multiple prototypes.

REFERENCES
1. M. Alan, Solid state lighting - a world of expanding opportunities at LED 2002, III-V
Review, 2003, 16(1), 30-33.
2. X.B. Luo, T. Cheng, W. Xiong, Z.Y. Gan, S. Liu, IET Optoelectronics, 2007, 1(5), 191-196.
3. L. Kim, M.W. Shin, Thermal analysis and design of high power packages and systems,
Proceedings of SPIE, 2006, 6337, 63370U-1-63370U-9.
4. J. Petroski, Spacing of high-brightness LEDs on metal substrate PCBs for proper thermal
performance, in: Thermomechanical Phenomena in Electronic Systems, Proceedings of the
Intersociety Conference, Las Vegas, NV, USA, 2004, 507-514.
5. Dechaumphai, P., Finite Element Method in Engineering, 3rd ed., Chulalongkorn University
Press, Bangkok, Thailand, 2004, 217-220, 261-266.
6. Kays, W.M., and Crawford, M.E., Convective Heat and Mass Transfer, 3rd ed., McGraw-Hill
Inc., Singapore, 1993, 398.
7. Siegel, R., and Howell, J.R., Thermal Radiation Heat Transfer, 2nd ed., Hemisphere
Publishing Corporation, USA, 1981, 31.
8. Walther, A., The Ray and Wave Theory of Lenses, 1st ed., Cambridge University Press,
Cambridge, UK, 1995, 283-285.
9. Hecht, E., Optics, 4th ed., Addison Wesley, San Francisco, USA, 2002, 87-103, 128.
10. Murdoch, J.B., Illumination Engineering - From Edison's Lamp to the Laser, 1st ed.,
Macmillan Publishing Company, New York, USA, 1985, 27-35.

ACKNOWLEDGMENTS
The authors are grateful to Asst. Prof. Wiroj Limtrakarn, Department of Mechanical
Engineering, Faculty of Engineering, Thammasat University, for the software and high
performance computers used in this project. The authors also wish to thank the National
Metal and Materials Technology Center (MTEC) for funding and the CAD software used in this
project.
D00022
The Critical Temperature of Transition Energy of Single Quantum Well

W. Techitdheera1,C, K. Kulsirirat1, and W. Pecharapa2
1 Physics Department, Faculty of Science, King Mongkut's Institute of Technology Ladkrabang, Bangkok 10520, Thailand
2 College of KMITL Nanotechnology, King Mongkut's Institute of Technology Ladkrabang, Bangkok 10520, Thailand
C E-mail: wdheera@gmail.com


ABSTRACT
The transition energy of a GaAs/AlGaAs single quantum well heterostructure was studied
theoretically. The temperature-dependent energy gaps of GaAs and AlGaAs were calculated
employing the Viña and Pässler models, and the effect of temperature on the energy gap of
both materials was then taken into account. The calculated results revealed that the
temperature dependence of the band gap has a significant influence at temperatures above
100 K. A numerical method was utilized to calculate the transition energy corresponding to
recombination between the first-level electron and the first-level heavy hole (e1-hh1) in
the quantum well formed by these two compounds. The simulated results show a critical
temperature of the transition energy: the calculated transition energy increases with
increasing temperature from 12 K to 100 K, while the opposite behavior is obtained as the
temperature rises above this critical point, up to 200 K. The calculated results were in
good agreement with experimental photoluminescence results. Further details of the results
will be discussed at the conference.

Keywords: Single Quantum well, Photoluminescence, Computational Physics.



1. INTRODUCTION
The GaAs/AlxGa1-xAs single quantum well belongs to a very important family of semiconductor
materials used as active elements of high performance optoelectronic and high-speed
electronic devices [1]. One of the most important parameters of any semiconductor material
is the temperature-dependent energy gap. The temperature dependence of the band-gap energy
can be explained by a distinct mechanism, the electron-phonon interaction [2], which
significantly influences the first-level electron (e1) and first-level heavy-hole (hh1)
states in the quantum well. In this work the transition energy of a single quantum well has
been calculated by a numerical method, and a critical temperature around 100 K has been
found from our investigation.



2. THEORY AND RELATED WORKS
The AlGaAs/GaAs heterostructure considered here is a single quantum well (SQW) with 20 nm
thick Al0.3Ga0.7As barriers and a 20 nm thick well. In this case the problem reduces to
solving Schrödinger's equation in one dimension, which is just a second-order differential
equation. The one-dimensional time-independent Schrödinger equation can be written as

$-\frac{\hbar^2}{2m}\frac{\partial^2\psi(z)}{\partial z^2} + V(z)\,\psi(z) = E\,\psi(z)$   (1)
With this aim, consider expanding the second-order derivative in terms of finite
differences; put mathematically, the second derivative follows as

$\frac{d^2 f}{dz^2} \approx \frac{f(z+2\delta z) - 2f(z) + f(z-2\delta z)}{(2\delta z)^2}$

This finite difference representation of the second derivative can be simplified slightly by
substituting $\delta z$ for $2\delta z$, i.e.

$\frac{d^2 f}{dz^2} \approx \frac{f(z+\delta z) - 2f(z) + f(z-\delta z)}{(\delta z)^2}$

Using this form for the second derivative in the original Schrödinger equation
(equation (1)), and taking the step length $\delta z$ sufficiently small that the
approximation is good, the $\approx$ can be dropped in favour of an $=$; then

$-\frac{\hbar^2}{2m^*}\,\frac{\psi(z+\delta z) - 2\psi(z) + \psi(z-\delta z)}{(\delta z)^2} + V(z)\,\psi(z) = E\,\psi(z)$

which can finally be written as

$\psi(z+\delta z) = \left[\frac{2m^*(\delta z)^2}{\hbar^2}\left(V(z) - E\right) + 2\right]\psi(z) - \psi(z-\delta z)$   (2)
In the case of a potential symmetric about some origin (z = 0), V(z) = V(-z), the wave
functions are either symmetric, $\psi(z) = \psi(-z)$, or antisymmetric,
$\psi(z) = -\psi(-z)$. The odd-parity solutions take the initial conditions $\psi(0) = 0$,
$\psi(\delta z) = 1$. Now consider the starting conditions for the symmetric (even-parity)
solutions, for which $\psi(-z) = \psi(z)$. Choosing z = 0 in the shooting equation (2) gives

$\psi(\delta z) = \left[\frac{2m^*(\delta z)^2}{\hbar^2}\left(V(0) - E\right) + 2\right]\psi(0) - \psi(-\delta z)$   (3)

Now, of course, $\psi(\delta z) = \psi(-\delta z)$, and therefore

$2\psi(\delta z) = \left[\frac{2m^*(\delta z)^2}{\hbar^2}\left(V(0) - E\right) + 2\right]\psi(0)$   (4)

As discussed above, the wave function can be scaled by a constant factor without changing
the energy eigenvalues; it then follows from equation (4), summarizing with $\psi(0) = 1$:

in the barrier,

$\psi(\delta z) = \frac{m^*(\delta z)^2}{\hbar^2}\left[V(0) - E\right] + 1$   (5)

in the well,

$\psi(\delta z) = -\frac{m^*(\delta z)^2}{\hbar^2}\,E + 1$   (6)
Analytical models are used to describe the temperature dependence of the band gap energy or
of the excitonic transition energy [3]. The Pässler model is characterized by the expression

$E_g(T) = E_g(0) - \frac{\alpha\Theta_p}{2}\left[\left(1 + \left(\frac{2T}{\Theta_p}\right)^p\right)^{1/p} - 1\right]$   (7)

where $E_g(T{=}0,x) = 1.517 + 1.23x$ (eV) is the energy gap at zero kelvin;
$\alpha \equiv S(\infty) \equiv -(dE_g(T)/dT)_{T\to\infty}$, with
$\alpha(x) = (4.9 + 0.7x + 3.7x^2)\times10^{-4}$ eV/K, is the high-temperature limit value
of the forbidden-gap entropy; $\Theta_p$ is a characteristic temperature parameter of the
material representing the effective phonon energy $k_B\Theta_p$, with
$\Theta(x) = 202 + 5x + 260x^2$ (K); p is an empirical parameter related to the shape of the
electron-phonon spectral function; and T is the temperature [2]. The Viña model is
characterized by the expression

$E_g(T) = E_B - a_B\left[1 + \frac{2}{\exp(\Theta_B/T) - 1}\right]$   (8)
where $a_B$ represents the strength of the electron-phonon interaction, $\Theta_B$ is the
characteristic temperature parameter representing the effective phonon energy on the
temperature scale, and $E_g(T{=}0) = E_B - a_B$ [1]. The temperature dependence of the band
gap in the well of the quantum well structure is described by the expression

$E_g^{GaAs}(T) = 1.519 - \frac{0.0005405\,T^2}{T + 204}$   (9)

In this work, equations (7) and (8), where $E_g(T)$ represents the temperature-dependent
band gap, are fitted to the barrier of the quantum well structure, and equation (9),
$E_g^{GaAs}(T)$, is fitted to the temperature-dependent band gap in the well [4].
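
A short numeric sketch of Eqs. (7) and (9) with the parameters quoted above (x = 0.3 for the
barrier); the Pässler shape parameter p is not given in the text, so the value used here is
a placeholder:

    def eg_barrier_passler(T, x=0.3, p=2.5):
        """Eq. (7), Paessler model for the Al_x Ga_(1-x) As barrier (eV);
        p is an empirical shape parameter (placeholder value)."""
        e0 = 1.517 + 1.23 * x                          # eV at T = 0
        alpha = (4.9 + 0.7 * x + 3.7 * x**2) * 1e-4    # eV/K
        theta = 202.0 + 5.0 * x + 260.0 * x**2         # K
        return e0 - 0.5 * alpha * theta * ((1.0 + (2.0 * T / theta)**p)**(1.0 / p) - 1.0)

    def eg_well_gaas(T):
        """Eq. (9), temperature-dependent gap of the GaAs well (eV)."""
        return 1.519 - 5.405e-4 * T**2 / (T + 204.0)

    for T in (12.0, 100.0, 200.0):
        print(T, round(eg_barrier_passler(T), 4), round(eg_well_gaas(T), 4))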


3. COMPUTATIONAL DETAILS
In the computational implementation, equations (7) and (8), the Pässler and Viña models
respectively, were used for the temperature dependence of the optical band gap energy in the
barrier (AlxGa1-xAs), and equation (9) was used for the temperature dependence of the band
gap energy in the well (GaAs), together with equations (5) and (6), where m* is the
effective mass. For the electron we used m*c1 = 0.067 m0 in the well (GaAs) [5] and
m*c2 = (0.067 + 0.083x) m0 in the barrier (Al0.3Ga0.7As); for the heavy hole we used
m*hh1 = 0.45 m0 in the well (GaAs) and m*hh2 = (0.51 + 0.25x) m0 in the barrier [6], where
x = 0.3 is the aluminum mole fraction, and we used a band offset ratio of 60:40 [4]. The
shooting method and a graphical method were then used to find the energy level e1 in the
conduction band and the energy level hh1 in the valence band.
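
A compact sketch of the shooting iteration, Eq. (2), for the electron levels follows. The
geometry and masses are those quoted above (20 nm well, Al0.3Ga0.7As barriers, x = 0.3); the
barrier height takes 60% of the T = 0 gap difference of Eqs. (7) and (9),
0.6 × (1.886 − 1.519) ≈ 0.22 eV, while the grid and energy scan resolutions are illustrative
choices:

    import numpy as np

    HB2_2M0 = 0.0380998   # hbar^2 / (2 m0) in eV nm^2

    def psi_end(E, z, V, m):
        """March the shooting relation, Eq. (2), across the grid and
        return psi at the far boundary; bound states drive it to zero."""
        dz = z[1] - z[0]
        psi_prev, psi = 0.0, 1.0          # decaying start deep in the barrier
        for i in range(1, len(z) - 1):
            coef = (m[i] / HB2_2M0) * dz**2 * (V[i] - E) + 2.0
            psi_prev, psi = psi, coef * psi - psi_prev
        return psi

    z = np.linspace(0.0, 60.0, 1201)                   # nm: barrier / well / barrier
    in_well = (z >= 20.0) & (z <= 40.0)
    V = np.where(in_well, 0.0, 0.22)                   # eV conduction band offset
    m = np.where(in_well, 0.067, 0.067 + 0.083 * 0.3)  # effective masses (units of m0)

    E_scan = np.linspace(0.001, 0.2, 400)
    vals = [psi_end(E, z, V, m) for E in E_scan]
    levels = [0.5 * (E_scan[i] + E_scan[i + 1])
              for i in range(len(vals) - 1) if vals[i] * vals[i + 1] < 0.0]
    print("e1 estimate (eV):", levels[0] if levels else None)

The heavy-hole level hh1 follows analogously, using the hole masses and the remaining 40%
valence band share of the offset.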


4. RESULTS AND DISCUSSION
The experimental photoluminescence measurements are shown in Fig. 1.

Figure 1. Variation of the PL intensity of GaAs/Al0.3Ga0.7As at different temperatures.

The temperature-dependent energy gap of GaAs is shown in Fig. 2, which clearly shows that
the energy gap of GaAs decreases as the temperature increases. The first-level electron (e1)
and first-level heavy hole (hh1) in the quantum well were calculated using the Viña model
and the Pässler model, as shown in Figs. 3 and 4, respectively. The calculated transition
energies were in good agreement with the experimental photoluminescence results, as shown in
Fig. 5. Whether calculated with the Viña or the Pässler model, both figures indicate that
the (e1 + hh1) energy increases with increasing temperature from 12 K to 100 K, but on
increasing the temperature from 100 K to 200 K the (e1 + hh1) energy decreases. This may
arise from the influence of the electron-phonon interaction on the band gap, as in the prior
report by Sarkar [7], who noted that on increasing the temperature from 12 K to 100 K the
(e1 + hh1) energy increases because the influence of the electron-phonon interaction is
small, while on increasing the temperature from 100 K to 200 K the (e1 + hh1) energy
decreases because the influence of the electron-phonon interaction becomes progressively
stronger.


Figure 2. The temperature dependence of the band gap energy in the well (GaAs).

Figure 3. The critical temperature of the first-level electron (e1) plus first-level heavy
hole (hh1) energy using the Viña model.

Figure 4. The critical temperature of the first-level electron (e1) plus first-level heavy
hole (hh1) energy using the Pässler model.

Figure 5. Comparison of the transition energy between experimental data and calculated data
(Pässler model).



5. CONCLUSION
The temperature-dependent energy gaps of the GaAs/AlGaAs single quantum well were calculated
employing the Viña and Pässler models. The calculated results revealed that the temperature
dependence of the band gap has a significant influence when the temperature exceeds 100 K. A
numerical method was utilized to calculate the transition energy corresponding to
recombination between the first-level electron and first-level heavy hole (e1-hh1) in the
quantum well formed by these two compounds. The simulated results show a critical
temperature of the e1 + hh1 energy: the calculated transition energy increases with
increasing temperature from 12 K to 100 K and decreases with increasing temperature from
100 K to 200 K. The calculated results were in good agreement with the experimental
photoluminescence results.


REFERENCES
[1] Lourenço, S.A., Dias, I.F.L., Duarte, J.L., Laureto, E., Poças, L.C., Toginho, F.D.O.,
and Leite, J.R., Brazilian Journal of Physics, 2004, 34(2A).
[2] Pässler, R., Phys. Stat. Sol. (b), 1999, 216, 975.
[3] Harrison, Paul, Quantum Wells, Wires and Dots, 2nd Edition, Chichester, West Sussex,
England, 2001, 31-32.
[4] Jukgoljun, B., Pecharapa, W., and Techitdheera, W., 2007, Proceedings of the
International Workshop and Conference on ICPN 2007 (Edited by P. Yupapin and P. Saeung).
[5] Goldberg, Y.A., Handbook Series on Semiconductor Parameters, Vol. 2, Rensselaer
Polytechnic Institute, 1999, chapter 1, 1-2.
[6] Wu, S.D., Wang, W.X., Guo, L.W., Li, Z.H., Shang, X.Z., Liu, F., Huang, Q., and
Zhou, J.M., Journal of Crystal Growth, 2005, 278, 548-552.
[7] Sarkar, N. and Ghosh, S., School of Physical Sciences, Jawaharlal Nehru University,
New Delhi 110067, 2005.



ACKNOWLEDGMENTS
This research project has been supported by the Computational Physics Research Laboratory
(CPRL), Physics Department, Faculty of Science, King Mongkut's Institute of Technology
Ladkrabang, Bangkok 10520, Thailand.

D00028
Refinement of a One-Dimensional Modulated Structure

W. Somphon1, K.J. Haller2 and O.M. Oeckler3
1 Department of Chemistry, Faculty of Liberal Arts and Science, Kasetsart University, Kamphaeng Saen Campus, Nakhon Pathom 73140, Thailand
2 School of Chemistry, Institute of Science, Suranaree University of Technology, Nakhon Ratchasima 30000, Thailand
3 Department Chemie und Biochemie, Anorganische Festkörperchemie, Ludwig-Maximilians-Universität, Butenandtstraße 5-13 (Haus D), D-81377 München, Germany
1 E-mail: weenawan.s@ku.ac.th; Tel. (+66) 89-7215409



ABSTRACT
The structure of InMo4O6 has been reported [1]. Preliminary investigation in the tetragonal
space group P4/mbm, Z = 2, with a = 9.6890(14) Å and c = 2.8695(6) Å gives a parent
structure with apparent excess atomic displacement, U33 = 0.158 Å², parallel to c for the In
atoms (SHELXL refinement [2], R = 0.0668 for 283 reflections). The structure consists of
μ3-O face-capped Mo6 octahedra linked into infinite columns parallel to c by sharing Mo-Mo
edges. Octahedra in adjacent columns are linked by μ3-O connecting the perpendicular Mo-Mo
edge with the Mo vertex of a four-fold related chain. Examination of the three-dimensional
intensity data shows satellite peaks described by the modulation vector q = (0, 0, 0.1528),
which would be consistent with a modulation dominated by displacement of the In atoms along
z. The structure has been modeled again using the superspace methods implemented in the
JANA2006 program suite [3].

Keywords: Modulated structure, InMo4O6, Four-dimensional refinement, X-ray diffraction



REFERENCES
1. McCarley, R.E., Lii, K.-H., Edwards, P.A., and Brough, L.F., J. Solid State Chem., 1985,
57, 17-24.
2. Sheldrick, G.M., SHELXL-97, Program for the Refinement of Crystal Structures, University
of Göttingen, Germany, 1997.
3. Petricek, V., Dusek, M., and Palatinus, L., JANA2006, The Crystallographic Computing
System, Institute of Physics, Praha, Czech Republic, 2006.



D00031
Solvation in 3-[(2-hydroxyethoxy)-methyl]-6-methyl-3H-imidazolo[1,2-a]purin-9(5H)-one
dihydrate; C11H13N5O3·2H2O

M. MeepripruekC and K.J. Haller
School of Chemistry, Institute of Science, Suranaree University of Technology, Nakhon Ratchasima 30000, Thailand
C E-mail: montha_mee@hotmail.com; Tel. +(66)4422-3150



ABSTRACT
Cocrystals, solvates, and polymorphs are of intense interest in current pharmaceutical
research for their ability to modify the physical properties of active pharmaceutical
ingredients (drug molecules). 3-[(2-hydroxyethoxy)-methyl]-6-methyl-3H-
imidazolo[1,2-a]purin-9(5H)-one, tricyclic acyclovir, has been reported as the dihydrate
[1], and the complex hydrogen bond network of water and tricyclic acyclovir molecules has
been suggested to be related to the solvation of the molecules in solution. The asymmetric
unit in the crystal contains two independent molecules of tricyclic acyclovir and four
solvent water molecules. The water molecules form an (H2O)8 cluster, with a strong hydrogen
bond between two water molecules across an inversion center. The disordered hydrogen bond
may potentially affect the 2-hydroxyethoxy-methyl side chain disorder in one of the
molecules. This and other supramolecular features will be discussed.

Keywords: disorder, solvation, supramolecular structure.



REFERENCES
1. Suwinska, K., Golankiewicz, B., and Zielenkiewicz, W., Acta Cryst. Sect. C, 2001, 57,
767-769.
2. Mooibroek, T.J., Gamez, P., and Reedijk, J., CrystEngComm, 2008, 10, 1501-1515.
3. Steiner, T., Angew. Chem. Int. Ed., 2002, 41, 48-76.

D00037
Molecular and Supramolecular Structure of Fe(OEP)picrate

R. Puntharod1, K.J. Haller1,C, and B.R. Wood2
1 School of Chemistry, Institute of Science, Suranaree University of Technology, Nakhon Ratchasima 30000, Thailand
2 Centre for Biospectroscopy and School of Chemistry, Monash University, Victoria 3800, Australia
C E-mail: ken.haller@gmail.com; Fax: 66+44-224-185; Tel. 66+81-547-5377


ABSTRACT
The structure of five-coordinate high-spin iron(III) Fe(OEP)picrate has been determined by
single crystal X-ray crystallography to study the molecular structure and supramolecular
interactions. The complex crystallizes in the monoclinic space group C2/c, with unit cell
dimensions a = 26.3997(20), b = 13.7806(18), c = 25.4126(20) Å, β = 119.955(9)°,
V = 8010.2(14) Å³, Z = 8, and Dcalc = 1.354 Mg m⁻³ at 298(1) K. The porphyrin core
conformation of this five-coordinate iron(III) heme shows an unusual folded distortion from
planarity, rather than the normal doming or S4 ruffling. In addition, there are significant
differences in the Fe-Npor bond distances: 2.040(2), 2.028(2), 2.055(2), and 2.053(2) Å.
Supramolecular interactions are studied to examine the unusual conformation of the porphyrin
core and the unequal Fe-N bond distances.

Keywords: iron porphyrin, doming, ruffling, supramolecular interaction.




D00040
Redetermination of the Structure of the Radical Cation of
9,9'-Bis-9-azabicyclo[3.3.1]nonane

K.J. Haller and P. Boonkon
School of Chemistry, Institute of Science, Suranaree University of Technology,
Nakhon Ratchasima, 30000, Thailand
E-mail: ken.haller@gmail.com; Fax: +(66)4422-4185; Tel. +(66)4422-3150



ABSTRACT
The structure of the radical cation of 9,9'-bis-9-azabicyclo[3.3.1]nonane has been reported
from crystallographic analysis of the hexafluorophosphate salt in space group P42/mnm
(No. 136) to contain planar nitrogen atoms with d[N-N] = 1.269(7) Å [1]. The reported
structure exhibits a regular pattern of oriented atomic displacement parameters that is
consistent with a statistical disordering of nonplanar nitrogen atoms in the structure.
Recalculation, assuming an independent atom approximation, gives a more reasonable
d[N-N] = 1.384(6) Å along with pyramidal geometry for the nitrogen atoms. The pattern could
also be consistent with the actual space group having lower symmetry than the space group
used for the refinement. A redetermination of the structure model, based on the original
data, will be presented, and comment will be offered on recognizing similar problems from
examination of the model parameters.

Keywords: disorder, structure redetermination, atomic displacement parameter analysis.



REFERENCES
1. Nelsen, S.F., Hollinsed, W.C., and Calabrese, J.C., J. Am. Chem. Soc., 1978, 100(25),
7876-7881.
2. Busing, W.R. and Levy, H.A., ORFFE, Report ORNL-TM-306, Oak Ridge National Laboratory,
Oak Ridge, Tennessee, USA.

D00041
Phase Characterization and Saturation Modeling of the
Calcium Phosphate-Arsenate Apatite System

W. Dungkaew, O. Saisa-ard, and K.J. HallerC
School of Chemistry, Institute of Science, Suranaree University of Technology, Nakhon Ratchasima 30000, Thailand
C E-mail: ken.haller@gmail.com; Tel. +(66) 4422 3150


ABSTRACT
Precipitates from mixed phosphate-arsenate solutions were studied in this work to support
previous results which demonstrated nearly complete arsenate removal by precipitation with
Ca²⁺ in the presence of PO4³⁻. X-ray diffraction results obey Vegard's law, indicating
solid-solution behavior, as the d-spacings expand with higher percent arsenic incorporated
into the calcium hydroxyapatite structure. Saturation modeling of the calcium
hydroxyapatite-arsenate system was carried out using PHREEQC with a modified Minteq.v4
database. Even though the calculations at the concentration levels studied show that several
calcium- and arsenate-containing phases reach saturation, arsenic removal in the absence of
phosphate is poor. However, in the presence of phosphate, a calcium hydroxyapatite
precipitate is identified, and the arsenic removal is greatly improved, presumably due to
incorporation of the arsenate in solid solution with the calcium hydroxyapatite. This work
supports understanding of arsenate removal by the calcium-phosphate-arsenate precipitation
system.

Keywords: Arsenic removal, Coprecipitation, Incorporation, PHREEQC, Solid solution


1. INTRODUCTION
Apatite structures have the empirical formula A5(BO4)3X, where usually A is a divalent
cation, BO4 is a trivalent anion, and X is a monovalent anion. Calcium hydroxyapatite,
Ca5(PO4)3OH, is a naturally occurring apatite material that can incorporate various ions,
including Cd²⁺ and Zn²⁺ [1, 2] and Pb²⁺ [3], leading to its use as a host for immobilizing
heavy metal contaminants by converting the toxic substances into a chemically stable form. A
previous study [4] examined anionic toxicant (arsenic) removal via coprecipitation of
calcium phosphate-arsenate hydroxyapatite compounds, providing up to 98% arsenic removal
from a 25 ppm synthetic arsenic solution. Due to the similarity of the phosphate and
arsenate ions, the high removal efficiency was suspected to occur through the formation of a
calcium phosphate-arsenate hydroxyapatite solid solution. The current study demonstrates
Vegard's law behavior of the calcium phosphate-arsenate hydroxyapatite solids, thereby
confirming solid solution formation. PHREEQC computer simulation for saturation modeling of
the saturated phases in the system was also utilized in the present work to support better
understanding of the removal process, which will be useful for further application purposes.



2. THEORY AND RELATED WORKS

2.1 Solid Solution
Solid solutions are solid state phenomena wherein the composition of a material varies
significantly within the framework of the same overall crystal structure [5]. Incomplete or
partial substitution in the calcium hydroxyapatite structure may result in the formation of
a hydroxyapatite solid solution.
Vegard's law has often been used when characterizing homogeneous solid solution phases and
other materials resulting from coprecipitation of insoluble materials. The law states that
unit cell parameters change linearly with composition, as seen in Fig. 1a. When Vegard's law
holds over the composition range studied, the material is generally assumed to be a solid
solution. However, the law often holds only approximately, and accurate measurements reveal
both negative (Fig. 1b) and positive (Fig. 1c) departures from linearity [5].
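
In its simplest form the law is a linear interpolation between the end-member lattice
spacings; using the end members implied by the linear fit of Fig. 4(b) below (2.7945 Å at 0%
As and about 2.8545 Å at 100% As):

    def vegard_d(frac_as, d_p=2.7945, d_as=2.8545):
        """Vegard's law d-spacing (Angstrom) for an arsenate fraction
        frac_as between the phosphate (d_p) and arsenate (d_as) end members."""
        return (1.0 - frac_as) * d_p + frac_as * d_as

    print(vegard_d(0.5))   # 50% arsenate -> 2.8245 Angstrom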

Figure 1. a) Vegard's law behavior, b) negative departures, and c) positive departures.

2.2 Saturation indices
A saturation index (SI), indicating the supersaturation state of an individual phase in the
solution, is calculated for each phase made up of ions in the solution as determined by
speciation and other solution processes:

$SI = \log\frac{IAP}{K_{sp}}$   (1)

where IAP is the ion activity product and Ksp is the thermodynamic solubility product
constant of the precipitate phase. When SI = 0, the solution is in equilibrium; when SI < 0,
the solution is undersaturated; and when SI > 0, the solution is supersaturated and
precipitation can occur. The Ksp of a compound is calculated based on the dissolution
equilibrium,
$A_xB_y(s) \rightleftharpoons xA^{y+}(aq) + yB^{x-}(aq)$

$K_{sp} = a_{A^{y+}}^{x}\;a_{B^{x-}}^{y}$

where $a_i$ denotes the thermodynamic activity of an aqueous species, obtained from

$a_i = \gamma_i m_i$   (2)

where
i
= activity coefficient of species i, and
i
m = molality of species i. In nonideal
aqueous solution activity, coefficients for charged species are defined by the Davies equation,

|
|
.
|

\
|

+
= u
u
u
3 . 0
1
log
2
i i
Az .. (3)

and for uncharged species the activity coefficient is defined by the WATEQ Debye-Hckel
equation,
u
u
u

i
o
i
i
i
b
Ba
Az
+
+
=
1
log
2
.. (4)
D00041
ANSCSE14 Mae Fah Luang University, Chiang Rai, Thailand
March 23-26, 2010
320
where the first term is zero and the activity coefficient equation is thus reduced to the
Setchenow equation:

u
i i
b = log .. (5)

where
i
z is the ionic charge of aqueous species i, A and B are constants dependent only on
temperature.
0
i
a and
i
b in the WATEQ Debye-Hckel equation are ion-specific parameters
fitted from mean-salt activity-coefficient data. For the Setchenow equation
i
b is assumed to
be 0.1 for all uncharged species.
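
A numeric sketch of Eqs. (1)-(3) shows how the pieces combine: Davies activity coefficients
turn molalities into activities, the activities form the IAP, and SI = log10(IAP/Ksp). The
composition below is invented for the example; A ≈ 0.509 (25 °C) is the usual Debye-Hückel
constant:

    import math

    A_DH = 0.509   # Debye-Hueckel A constant at 25 C

    def log_gamma_davies(z, mu):
        """Eq. (3): Davies log activity coefficient for charge z at
        ionic strength mu (mol/kg)."""
        s = math.sqrt(mu)
        return -A_DH * z**2 * (s / (1.0 + s) - 0.3 * mu)

    def saturation_index(ions, log_ksp, mu):
        """Eq. (1): SI = log10(IAP/Ksp); `ions` lists
        (molality, charge, stoichiometric coefficient) for the phase."""
        log_iap = sum(nu * (math.log10(m) + log_gamma_davies(z, mu))
                      for m, z, nu in ions)
        return log_iap - log_ksp

    # Illustrative: CaHAsO4.H2O = Ca+2 + HAsO4-2 + H2O, log K = -4.79
    # (Table 1 below); water activity taken as 1, molalities hypothetical.
    ions = [(1e-3, +2, 1), (1e-3, -2, 1)]
    print(saturation_index(ions, log_ksp=-4.79, mu=0.004))   # < 0: undersaturated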


3. EXPERIMENTAL AND COMPUTATIONAL DETAILS

3.1 Precipitation Experiment
Arsenic removal experiments were performed in the previous work by coprecipitation of
calcium-arsenate-phosphate [4]. The precipitate products were characterized by powder X-ray
diffraction (XRD) for phase identification and for study of the shift of the d-spacings. The
correlation between composition and shift of d-spacing was plotted and follows Vegard's law,
indicating solid-solution behavior.

3.2 Saturation Modeling
The computer program PHREEQC version 2 [6] was used to calculate ion speciation and
saturation indices of phases in solution. The calculations are based on the equilibrium
chemistry of an aqueous solution interacting with solids and gases. The database used in
this work is based on the Minteq.v4 database distributed with the PHREEQC version 2 program.
Some of the existing values in the database were updated with more recent values, and
relevant values not previously included in the database were added. These modifications and
additions are easily made through the input data set. The equilibrium constant values of the
relevant phases taken into account in the calculations are given in Table 1.

Table 1. Equilibrium constant values of the relevant phases used in the saturation index
calculations.

Reaction                                                log K        log K      Reference
                                                        (Minteq.v4)  (updated)
Ca5(PO4)3OH + H⁺ = 5Ca²⁺ + 3PO4³⁻ + H2O                 -44.333      -53.28     [7]
Ca4(OH)2(AsO4)2·4H2O = 4Ca²⁺ + 2OH⁻ + 2AsO4³⁻ + 4H2O    nd           -29.20     [8]
Ca5(AsO4)3OH = 5Ca²⁺ + 3AsO4³⁻ + OH⁻                    nd           -38.04     [8]
Ca3(AsO4)2·3H2O = 3Ca²⁺ + 2AsO4³⁻ + 3H2O                nd           -21.00     [8]
Ca3(AsO4)2·4H2O = 3Ca²⁺ + 2AsO4³⁻ + 4H2O                nd           -21.00     [8]
CaHAsO4·H2O = Ca²⁺ + HAsO4²⁻ + H2O                      nd           -4.79      [8]

nd = not previously included in the database.


4. RESULTS AND DISCUSSION
The previous work suggests that the arsenic removal efficiency from
calcium-phosphate-arsenate coprecipitation is affected by several factors (Fig. 2) [4].
Calcium-arsenate precipitation without phosphate in the system (P/As = 0) provides poor
arsenic removal, while adding phosphate to the system improves the arsenic removal
efficiency; the more phosphate added (P/As = 3, 5), the better the removal. However, the
molar ratio Ca/(P+As) is another significant parameter in this coprecipitation system. Poor
arsenic removal efficiency is obtained at a low calcium molar ratio (Ca/(P+As) = 1), while
arsenic removal is greatly improved at higher calcium molar ratios (Ca/(P+As) = 2, 3).

Figure 2. Arsenic removal by coprecipitation of calcium phosphate-arsenate hydroxyapatite
from 25 ppm arsenate solution, for Ca/(P+As) = 1, 2, 3 and P/As = 0, 1, 3, 5 [4].

The saturation index calculations show that several calcium-arsenate phases,
Ca3(AsO4)2·3H2O, Ca3(AsO4)2·4H2O, Ca4(OH)2(AsO4)2·4H2O, and Ca5(AsO4)3OH, are supersaturated
in all the precipitation systems studied. When phosphate is added to a system, calcium
hydroxyapatite becomes supersaturated as well. The saturation index calculations support the
explanation of the arsenic removal process. When there is no phosphate in the system,
removal is controlled by the formation of arsenate solid phases only, and the low removal
efficiency most likely results from one of two causes: first, during the short (20 hr)
period of the experiment the solution remains supersaturated in the phases for which SI is
only slightly positive; and second, the phase (or phases) that nucleates first (kinetic
control) reduces the concentration of components such that less soluble phases do not
precipitate. The quantities of precipitate formed were insufficient to determine which phase
or phases were formed. However, when phosphate was included, the saturation index of calcium
hydroxyapatite was considerably higher than those of the arsenate-containing phases,
implying a considerably higher driving force for nucleation of the phosphate-containing
apatite. In the phosphate-containing experiments, visible white precipitates were obtained
under the conditions studied.

The known solid solution nature of the calcium phosphate/arsenate hydroxyapatite system [9]
suggests a mechanism: as the calcium hydroxyapatite nucleates and/or grows, the arsenate
ions are swept from the solution, obviating the need for nucleation of any pure calcium
arsenate phase. This is further supported by the increasing efficiency of removal on
increasing either the phosphate or the calcium concentration in the mixed
phosphate/arsenate systems studied.

The correlation of arsenic removal efficiency versus the saturation indices of the
calculated calcium-arsenate and calcium-phosphate phases in the Ca/(P+As) = 2 system is
shown in Fig. 3. While the same trend is observed in the Ca/(P+As) = 3 system, no such
correlation is observed in the Ca/(P+As) = 1 system (results not shown). In addition, at
most tiny amounts of precipitate form in the Ca/(P+As) = 1 system, even though the
saturation indices suggest calcium hydroxyapatite is supersaturated; there was therefore
insufficient solid material to confirm the identity of calcium hydroxyapatite. This may be
due to kinetic control of calcium hydroxyapatite formation, with the solution remaining
supersaturated within the overnight reaction time, and consequently poor arsenic removal
under this precipitation condition.
D00041
ANSCSE14 Mae Fah Luang University, Chiang Rai, Thailand
March 23-26, 2010
322
Figure 3. Correlation of arsenic removal efficiency versus saturation indices of the
calculated calcium-arsenate and calcium-phosphate phases (Ca3(AsO4)2·3H2O, Ca3(AsO4)2·4H2O,
Ca4(OH)2(AsO4)2·4H2O, Ca5(AsO4)3OH, and Ca5(PO4)3OH) in the Ca/(P+As) = 2 system.


XRD was used for precipitated phase identification. Calcium hydroxyapatite is found as the
major phase in the calcium-arsenate-phosphate precipitation, corresponding to the saturation
index calculation results. The formation of the calcium hydroxyapatite precipitate induces
removal of arsenate. It is expected that the arsenate ions removed are incorporated into the
calcium hydroxyapatite structure at phosphate positions, due to the similarity in geometry
and size of the arsenate and phosphate ions. The XRD characterization shows shifts in the
[211] peak of calcium hydroxyapatite corresponding to an expansion of the d-spacing in the
lattice when arsenate is incorporated into the structure (Fig. 4a). The expansion of the
d-spacing in the calcium hydroxyapatite structure correlates linearly with the As/P ratio in
the starting solution, in accordance with Vegard's law (Fig. 4b).



Figure 4. (a) Shift of the calcium hydroxyapatite XRD peaks (2θ = 30-35°) when arsenate is
incorporated into the structure, and (b) correlation between the As/P ratio in the starting
solution and the expansion of the d-spacing at [211] in the calcium hydroxyapatite structure
(linear fit: y = 0.0006x + 2.7945, R² = 0.9941).






D00041
ANSCSE14 Mae Fah Luang University, Chiang Rai, Thailand
March 23-26, 2010
323


5. CONCLUSION
Speciation and saturation index calculations with the PHREEQC computer program suggest that
several calcium-arsenate and calcium-phosphate phases are supersaturated in the system.
These calculations support understanding of arsenic removal through
calcium-arsenate-phosphate precipitation. Moreover, the XRD characterization results suggest
that the arsenate removed from solution is incorporated into the calcium hydroxyapatite
structure, as the d-spacings in the calcium hydroxyapatite materials expand with higher
percent arsenic incorporated into the solid products. This observation obeys Vegard's law
for solid solutions, supporting the conclusion that calcium hydroxyapatite can form a solid
solution with the arsenate anion, enhancing the arsenate removal process.


REFERENCES
1. Peld, M., Tonsuaadu, K., and Bender, V., Environ. Sci. Technol., 2004, 33, 5626-5631.
2. Marchat, D., Bernache, A. D., and Champion, E., J. Hazard. Mater., 2007, 139(3), 453-
460.
3. Chen, X., Wright, J. V., Conca, J. L., and Peurrung, L. M., Environ. Sci. Technol., 1997,
31, 624-631.
4. Dungkaew, W., Haller, K. J., Flood, A. E., and Scamehorn, J. F., Proceedings of the
AIChE Annual Meeting, held on 12-17 November 2006, San Francisco, CA.
5. West, A. R., Solid State Chemistry and Its Applications. John Wiley & Sons, Singapore,
1984.
6. Parkhurst, D. L. and Appelo, C. A. J., User's Guide to PHREEQC (version 2): A
Computer Program for Speciation, Batch-reaction, One-dimensional Transport, and
Inverse Geochemical Calculations, Water-Resources Investigations Report 99-4259,
1999, U.S. Geological Survey.
7. Zhu, Y., Zhang, X., Chen, Y., Xie, Q., Lan, J., Qian, M., and He, N., Chem. Geol., 2009,
268, 89-96.
8. Bothe, J. V. and Brown, P. W., J. Hazard. Mater., 1999, B69, 197-207.
9. Mahapatra, P. P., Mahapatra, L. M., and Mishra, B., Bull. Chem. Soc. Jpn., 1989, (62),
3272-3277.


ACKNOWLEDGEMENTS
The authors gratefully acknowledge the Thailand Research Fund (TRF) for financial
support through the Royal Golden Jubilee Ph.D. scholarship program (3.C.TS/44/B.1), and
Suranaree University of Technology for grants and instrumentation support.
D00042
Phase Characterization and Saturation Modeling of the
Calcium-Lead Phosphate Apatite System

O. Saisa-ard*, W. Dungkaew, and K.J. Haller
School of Chemistry, Institute of Science, Suranaree University of Technology,
Nakhon Ratchasima 30000, Thailand.
*E-mail: Oratai_phasai@yahoo.com; Tel. +(66)4422-3150

ABSTRACT
Calcium phosphate hydroxyapatite (CaHAP, Ca10(PO4)6(OH)2) is the main component of mammalian
hard tissues such as teeth and bones. CaHAP is a typical apatite mineral belonging to space
group P63/m. The structure is susceptible to ionic substitution at both anion and cation
sites: Ca²⁺ can be replaced by various divalent cations such as Ba²⁺, Mg²⁺, Cd²⁺, Sr²⁺, and
Pb²⁺ [1-3]; PO4³⁻ can be replaced by AsO4³⁻ and VO4³⁻; and OH⁻ can be replaced by F⁻ and Cl⁻
[4-5]. Because of the ease of substitution and the stability of the resulting apatite
structures, CaHAP has been used as a material for the removal and immobilization of heavy
metals from contaminated water. However, a recent study has shown the interesting result of
CaHAP dissolution and PbHAP precipitation in the same system [6-7]. This, and the fact that
Pb²⁺ in the human body accumulates in the bones, suggests further study of the structural
stability and solution equilibria. In an effort to better understand the system, we have
utilized PHREEQC for saturation modeling and have examined published structure results. Bond
valence calculations were used to try to rationalize unusual interatomic bond parameters in
the published structure of PbHAP. Comparison of the M-O and P-O distances in the two
apatites reveals little difference in corresponding bond lengths; the Pb-O distances are
longer than the corresponding Ca-O distances because of the larger cationic size of the Pb²⁺
ions.

Keywords: Apatite, PHREEQC, Bond valence calculation


REFERENCES
1. Sugiyama, S., Fukuda, N., Massumoto, H., Shigemoto, N., Hiraga, Y., and Moffat, J. B.,
J. Colloid Interface Sci., 1999, 220, 324-328.
2. Yasukawa, A., Yokoyama, T., Kandori, K., and Ishikawa, T., Colloids Surf., 2007, 238,
203-208.
3. Barinova, A.V., Lusvardi, G., Menabue, L., and Saladini, M., Kristallografiya. 1998, 43,
224- 227.
4. Dong, Z., White, J. T., Wei, B., and Laursen, K., J. Am. Ceram. Soc. 2002, 85, 2515-
2522.
5. Dai, Y. and Hughes, M. J., Can. Mineral. 1989, 27, 189-192.
6. Mavropoulos, E., Rossi, A. M., and Costa, A. M., Environ. Sci. Technol. 2002, 36, 1625-
1629.
7. Mavropoulos, E., Rocha, C. C. N., Moreira, C. J., Rossi, M. A., and Soares, A. G., Mater.
Charact. 2004, 53, 71-78.
D00043
Order-Disorder Structure in a New Zinc Oxovanadate, [Zn(Im)4][V2O6]

S. Krachodnok1,C, K.J. Haller1 and I.D. Williams2
1 School of Chemistry, Institute of Science, Suranaree University of Technology,
Nakhon Ratchasima 30000 Thailand
2 Department of Chemistry, Hong Kong University of Science and Technology,
Clear Water Bay, Hong Kong
C E-mail: dsk4610137@live.com; Fax: 044-224185; Tel. 083-9333228



ABSTRACT
Hydrothermal synthesis from vanadium pentoxide, zinc acetate dihydrate, and
imidazole in water with mole ratio 1:1:4:222 (pH 11) at 110 °C for 2 days affords a
new zinc-imidazole-oxovanadate compound, [Zn(Im)4][V2O6]. The structure has been
investigated by single crystal X-ray diffraction at 100(2) K and 293(2) K. The space
group is triclinic P-1 at both temperatures, but the cell volume doubles at the lower
temperature. The structure consists of infinite one-dimensional chains of corner-sharing
VO4 tetrahedra. At 100 K the chains are ordered with every fourth O atom lying on
an inversion center, with the second and sixth O atoms ordered on positions
alternating up and down along the chain propagation axis. As the temperature is
increased the ordered structure transforms to a disordered structure with additional
inversion centers appearing between those of the 100 K structure, thereby cutting the
unit translation in the chain direction in half. The two crystallographically independent,
structurally different Zn(Im)4^2+ complexes at 100 K rearrange and reorient to inversion-
related complexes at 293 K.

Keywords: Order-disorder structure, zinc oxovanadate compound



REFERENCES
1. Sheldrick, G. M., SHELXL-97 Program for the Refinement of Crystal Structures,
University of Göttingen, Germany, 1997.


D00044
The Defect Generated in PN Junction Analysis by
the Arrhenius Activation Energy Technique

W. Pengchan1,C, S. Cheirsirikul1, T. Phetchakul1, A. Ruangphanit2 and A. Poyai2
1 Department of Electronics Engineering, Faculty of Engineering,
King Mongkut's Institute of Technology Ladkrabang, Bangkok, 10520, Thailand
2 Thai Microelectronics Center (TMEC), NECTEC, NSTDA, Pathumthani, Thailand
C E-mail: kpweera@kmitl.ac.th; Tel. 081-7000058



ABSTRACT
This paper presents an improved method for analyzing the implantation-induced defects
generated in p-n junctions, based on graphical determination of the Arrhenius
activation energy. The Arrhenius plot is used to study the activation energy
by graphing the logarithm of the generation current density versus the inverse
temperature. The resulting negatively sloped line yields the activation
energy (E_a). The leakage current relates to the defects in the depletion region of the p-n
junction. Among the various process steps, the implantation step may generate defects.
Therefore, the implantation-induced defects have been studied from the activation
energy obtained from the leakage current of the p-n junction. P-n junctions of different
geometry have been fabricated by a standard CMOS technology. The
current-voltage (I-V) and high-frequency capacitance-voltage (C-V) characteristics of
the p-n junctions were measured as a function of temperature. The electrically active
defects from the implantation process can be extracted from the junction generation current
density versus temperature. Based on this analysis, it is demonstrated that more
implantation-induced defects are found in the p+/n-well junction than in the n+/p-substrate
junction.

Keywords: Defect, PN Junction, Arrhenius plot, Activation Energy, CMOS Technology



1. INTRODUCTION
The development of modern complementary metal-oxide-semiconductor (CMOS)
technology aims to increase performance per chip and reduce cost. This technology is not
only used to produce microchips but can also be applied to fabricate smart sensors. Unique
features of such devices are the integration of several kinds of sensors and low power
consumption. To produce these devices at reasonable cost, 0.8 μm CMOS technology is
usable. In this technology a high substrate doping concentration is required in order to
control short-channel effects and lower the leakage current. The p-well or n-well
concentration is increased by increasing the ion implantation dose. This process introduces
substrate damage, which is expected to be removed by annealing. A low-temperature,
short-time thermal treatment is needed to maintain the junction depth after ion implantation,
which may not be sufficient to remove the implantation-induced defects. These defects can
be a source of leakage current in each p-n junction.
The leakage current in a p-n junction is one of the main parameters affecting device
performance. This leakage current is related to electrically active defects in the silicon. The
defects determine the generation lifetime (τ_g). Therefore, one way to study defects is by
analyzing the lifetime. Usually, it can be extracted from the generation current.

2. EXPERIMENTAL
2.1 Sample preparation
For this study, shallow p-n junction diodes compatible with 0.8 micron CMOS technology
were fabricated on 150 mm, 5 Ω-cm p-type silicon substrates. The n-well was obtained by
140 keV, 4x10^12 ions/cm^2 phosphorus implantation. The n+ region was made by 50 keV,
5x10^15 ions/cm^2 arsenic implantation, and 40 keV, 3x10^15 ions/cm^2 boron implantation
was used for the p+ region. Finally, the junction was contacted by aluminum metallization.
In order to study the leakage current components, diodes of different area (A) and
perimeter (P) have been fabricated on the wafer, as shown in Table 1.

TABLE 1 Description of the diode geometry

Diode Type                  Area (cm^2)   Perimeter (cm)
Square (SQ) (n+ - Psub)     8x10^-4       0.12
Meander (ME) (n+ - Psub)    8x10^-4       8.04x10^-4
Square (SQ) (p+ - Nwell)    8x10^-4       0.12
Meander (ME) (p+ - Nwell)   8x10^-4       8.04x10^-4

2.2 Measurement procedure
The current-voltage (I-V) characteristics of the different geometry diodes were measured
on wafer with a bias step of 0.01 V from reverse (V_R) to forward (V_F) voltage, in the
range of -5 to +1 V, whereby the bias was applied to the back p-type substrate and the
current was measured at the top n-type contact. The temperature (T) was controlled at
25-50 °C in a dark shielded box.
The capacitance-voltage (C-V) characteristics were measured on the same diodes at a
frequency of 1 MHz at 25-50 °C. The area depletion width in the substrate can be extracted
from the C-V characteristics.

3. RESULTS AND DISCUSSION
The analysis strategy can be summarized as follows. The total reverse current (I_R) of a p-n
junction consists of different geometrical components, the area leakage current (I_A) and the
peripheral leakage current (I_P), and is given by

I_R = I_A + I_P = A J_A + P J_P    (1)

where J_A (A/cm^2) is the area current density, scaling with the diode area (A), and J_P (A/cm)
is the perimeter current density, scaling with the perimeter (P). These can be split up further
into physical components such as the generation (J_g) and diffusion (J_d) current densities.
This follows from the relationships

J_A = J_Ag + J_Ad    (2)

and

J_P = J_Pg + J_Pd    (3)
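Because the SQ and ME diodes in Table 1 share the same area but differ in perimeter, writing Eq. (1) for the two geometries gives a 2x2 linear system that separates J_A from J_P. A minimal sketch in Python, assuming hypothetical measured reverse currents:

import numpy as np

# Eq. (1) for two diode geometries: I_R = A*J_A + P*J_P.
# Geometry values follow Table 1; the measured reverse currents
# below are hypothetical placeholders for illustration.
A_sq, P_sq = 8e-4, 0.12        # square diode: area (cm^2), perimeter (cm)
A_me, P_me = 8e-4, 8.04e-4     # meander diode
I_sq, I_me = 2.5e-11, 1.0e-11  # measured I_R at one bias (A), hypothetical

M = np.array([[A_sq, P_sq],
              [A_me, P_me]])
J_A, J_P = np.linalg.solve(M, [I_sq, I_me])
print(f"J_A = {J_A:.3e} A/cm^2, J_P = {J_P:.3e} A/cm")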

Figure 1 is a plot of current versus voltage for a) the n+ - Psub junction diode and b) the
p+ - Nwell junction diode.



















a) n+ - Psub junction diodes   b) p+ - Nwell junction diodes

Figure 1 The current-voltage characteristics of the different diodes


The area and perimeter current densities are obtained from the different geometrical diodes.
From (2), J_A can be expressed as

J_A = (q n_i W_A) / τ_g + J_Ad    (4)

but J_Ad << J_A. Therefore, J_A can be rewritten, and the generation lifetime follows as

τ_g = (q n_i W_A) / J_A    (5)

where q is the electron charge (= 1.602x10^-19 C), n_i (cm^-3) is the intrinsic carrier
concentration (= 1.08x10^10 cm^-3), W_A (cm) is the area depletion width, and τ_g (s) is the
generation lifetime.
As shown in (5), the generation lifetime depends on the area depletion width and the area
current density. So, a method to obtain τ_g is proposed as follows. By considering (6), the
area depletion width can be obtained from the area capacitance density (C_A):

W_A = ε_si ε_0 / C_A    (6)

where ε_si is the dielectric constant of silicon (= 11.8) and ε_0 (F/cm) is the permittivity of
vacuum (= 8.854x10^-14 F/cm).
Similarly to the current components, and assuming that the total junction capacitance C_j is
a linear combination of the different capacitance components,

C_j = A C_A + P C_P    (7)

where C_A (F/cm^2) is the area capacitance density, which scales with the diode area (A), and
C_P (F/cm) is the perimeter capacitance density, which scales with the diode perimeter (P).
Normally, the junction capacitance (C_j) is measured by monitoring the response of the
junction to a small-signal voltage superimposed upon the dc voltage. The area capacitance
density (C_A) and the perimeter capacitance density (C_P) can then be separated as the linear
combination of geometrical components given in (7).
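Combining Eqs. (4)-(6), the generation lifetime follows directly from the measured area capacitance and area current densities. A minimal sketch of this extraction chain in Python (the C_A and J_A values are hypothetical placeholders, not measured data):

# Constants as defined in the text
q      = 1.602e-19   # electron charge (C)
n_i    = 1.08e10     # intrinsic carrier concentration (cm^-3)
eps_si = 11.8        # relative dielectric constant of silicon
eps_0  = 8.854e-14   # vacuum permittivity (F/cm)

# Hypothetical measured densities at one reverse bias
C_A = 8.0e-9         # area capacitance density (F/cm^2)
J_A = 1.2e-8         # area current density (A/cm^2)

W_A   = eps_si * eps_0 / C_A   # Eq. (6): area depletion width (cm)
tau_g = q * n_i * W_A / J_A    # Eq. (5): generation lifetime (s)
print(f"W_A = {W_A * 1e4:.3f} um, tau_g = {tau_g * 1e6:.2f} us")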



[Figure 1 plots: Current (A) versus Bias (V) for the SQ and ME diodes of each junction type]

From (6), the area depletion width (W_A) can be calculated from the area capacitance (C_A).
Figure 2 is a plot of the area depletion width versus reverse bias, comparing the differently
doped n+ - Psub junction diode with the p+ - Nwell junction diode.















Figure 2 The area depletion width versus reverse bias of the different junction diodes
Figure 3 The generation lifetime versus the area depletion width of the different junction diodes

Figure 2 indicates that the n+ - Psub junction doping is uniform, which is not the case
in the p+ - Nwell junction. This may relate to the phosphorus doping profile.
The area current density can be calculated from (4) and the generation lifetime (τ_g) can be
obtained from (5). The plot of the generation lifetime versus the area depletion width is
shown in Fig. 3.
Figure 3 shows that the generation lifetime of the n+ - Psub junction is less than that of the
p+ - Nwell junction. This implies that more defects can be generated in the p+ - Nwell
junction than in the n+ - Psub junction. That τ_g is not constant may relate to n-well
implantation-induced defects.
The implantation of phosphorus dopant can introduce defects. These defects can be
analyzed with the Arrhenius plot of area current density versus temperature at different
biases, as shown in Fig. 4.
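The activation energy extraction amounts to a straight-line fit of ln(J_A) against 1/kT, with E_a given by the magnitude of the slope. A minimal sketch in Python on synthetic data (the prefactor J_0 and E_a below are hypothetical placeholders, not fitted values from this work):

import numpy as np

k_B = 8.617e-5                      # Boltzmann constant (eV/K)
T = np.linspace(298.0, 323.0, 6)    # the 25-50 C measurement range (K)
inv_kT = 1.0 / (k_B * T)            # abscissa of the Arrhenius plot (eV^-1)

# Synthetic area current density obeying J_A = J_0 * exp(-E_a/kT);
# J_0 and E_a are hypothetical placeholders.
J_0, E_a_true = 5.0e4, 0.78         # A/cm^2, eV
J_A = J_0 * np.exp(-E_a_true * inv_kT)

# Linear fit of ln(J_A) versus 1/kT: the slope equals -E_a.
slope, intercept = np.polyfit(inv_kT, np.log(J_A), 1)
print(f"extracted E_a = {-slope:.3f} eV")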





















Figure 4 The Arrhenius plot of area current density versus temperature at different biases

[Figure 2 plot: Depletion Width (μm) versus Reverse Bias (V) for Nwell-P+ and Psub-N+]
[Figure 3 plot: Generation Lifetime (μs) versus Depletion Width (μm) for Nwell-P+ and Psub-N+]
[Figure 4 plot: Area Current Density (A/cm^2) versus 1/kT (eV^-1) at biases of 1-4 V]

From the slope of the plot, the activation energy can be obtained. The activation energy
versus depletion width is shown in Fig. 5. A clear difference in activation energy below the
junction is indicated, implying that different defects are formed in the two cases. This may
relate to phosphorus-doping-induced defects. Further investigation is needed to identify the
types of defect.



















Figure 5 The defect energy level (eV) versus the depletion width


4. CONCLUSION
This paper demonstrated a method to check for remaining electrically active defects after
processing by considering the generation lifetime and the activation energy. The generation
lifetime can be obtained from the I-V and C-V characteristics of a p-n junction. The
activation energy can be calculated from the Arrhenius plot of area current density versus
temperature at different biases.

REFERENCES
1. W. Pengchan, T. Phetchakul, and A. Poyai, The Leakage Current of Doping Silicon
Effects on the Generation Lifetime Profile, The 7th International Conference on Materials
Processing for Properties and Performance (MP3-2008), MIDAS-9008 (2008).
2. Y. Tamai, M.M. Oka, A. Nakada and T. Ohmi, Influence of substrate dopant
concentration on electrical properties and residual defects in pn junction formed by low-
temperature post-implantation annealing, J. Appl. Phys., Vol. 87(7), pp. 3488-3496,
2000.
3. A. Poyai, C. Claeys and E. Simoen, Improved extraction of carrier concentration and
depletion width from capacitance-voltage characteristics of silicon n+-p-well junction
diodes, Appl. Phys. Lett., Vol. 80(7), pp. 1192-1194, 2002.
4. A. Czerwinski, E. Simoen, C. Claeys, K. Klima, D. Tomaszewski, J. Gibki and J. Katcki,
Optimized diode analysis of electrical silicon substrate properties, J. Electrochem. Soc.,
Vol. 146(6), pp. 2107-2112, 1998.


ACKNOWLEDGMENTS
The authors are thankful for the continuous help of Mr. Anucha Ruangphanit and Mr. Nopphon
Phongphanchantra of the Thai MicroElectronics Center (TMEC), National Electronics and
Computer Technology Center, Thailand.

D00046
Diagnostics of Ion Implantation with 0.8 micron
CMOS Technology based on TCAD Simulation

W. Pengchan1,C, S. Cheirsirikul1, T. Phetchakul1, A. Ruangphanit2 and A. Poyai2
1 Department of Electronics Engineering, Faculty of Engineering,
King Mongkut's Institute of Technology Ladkrabang, Bangkok, 10520, Thailand
2 Thai Microelectronics Center (TMEC), NECTEC, NSTDA, Pathumthani, Thailand
C E-mail: kpweera@kmitl.ac.th; Tel. 081-7000058



ABSTRACT
This paper investigates the ion implantation technology and mathematical modeling of
the impurity distributions obtained with ion implantation. The ion implantation is
widely used in semiconductor process fabrication. In the TCAD simulation, the effect
of the former ion implant on the impurity profile has been studied in detail. Analysis of
the alternative implant conditions such as energy, dose and tilting of beam is proposed.
In addition, the channeling effect of shallow junction with 0.8 micron CMOS
technology will be discussed.

Keywords: Ion Implantation, CMOS, Process, TCAD, Simulation


1. INTRODUCTION
Ion implantation is widely used in semiconductor process fabrication. In the integrated
circuit (IC) fabrication industry, ion implantation is used for well and drain/source
formation, threshold voltage adjustment, channel-stop implantation and graded drain/source
formation in CMOS fabrication. Each implant application can be achieved by controlling the
dosage, beam current, implant energy, angle and dopant species.
The channeling effect of shallow junctions in 0.8 micron CMOS technology limits the
ideality of the profiles, with encroachment due to resist shadowing during the implant
process. The shadowing creates shadow regions (lateral encroachment) of the profile, as
shown in Fig. 1.













Figure 1 Shadowing effect by the photoresist

Thus, there is an interest in studying the impact of implant angle and resist shadowing on
sub-micron technology. This research focuses on various tilt angles and implantation
energies, since this implant regime has more severe issues with resist shadowing and dopant
channeling.


2. THEORY AND RELATED WORKS
2.1 MATHEMATICAL MODEL FOR ION IMPLANTATION
As an ion enters the surface of the wafer, it collides with atoms in the lattice and
interacts with electrons in the crystal. Each nuclear or electronic interaction reduces the
energy of the ion until it finally comes to rest within the target. Interaction with the crystal is a
statistical process, and the implanted impurity profile can be approximated by the Gaussian
distribution function illustrated in Fig. 2.
















Figure 2 Gaussian distribution resulting from ion implantation. The impurity
is shown implanted completely below the wafer surface (x=0)

The distribution is described mathematically by

N(x) = N_p exp[-(x - R_p)^2 / (2 ΔR_p^2)]    (1)

R_p is called the projected range and is equal to the average distance an ion travels before
it stops. The peak concentration N_p occurs at x = R_p. Because of the statistical nature of the
process, some ions will penetrate beyond the projected range R_p, and some will not make it
as far as R_p. The spread of the distribution is characterized by the standard deviation, ΔR_p,
called the straggle.
The area under the impurity distribution curve is the implanted dose Q, defined by

Q = ∫_0^∞ N(x) dx    (2)

For an implant completely contained within the silicon, the dose is equal to

Q = √(2π) N_p ΔR_p    (3)
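A minimal numerical sketch of Eqs. (1)-(3) in Python: given a dose and range statistics, the peak concentration follows from Eq. (3) and the profile from Eq. (1). The dose matches the experiment of Section 3; R_p and dR_p are hypothetical placeholders, not tabulated range data:

import numpy as np

Q    = 5e15      # implanted dose (ions/cm^2)
R_p  = 0.05e-4   # projected range (cm), hypothetical placeholder
dR_p = 0.02e-4   # straggle (cm), hypothetical placeholder

N_p = Q / (np.sqrt(2.0 * np.pi) * dR_p)   # Eq. (3): peak concentration (cm^-3)

x = np.linspace(0.0, R_p + 5.0 * dR_p, 1000)        # depth grid (cm)
N = N_p * np.exp(-(x - R_p)**2 / (2.0 * dR_p**2))   # Eq. (1)

print(f"N_p = {N_p:.3e} cm^-3")
print(f"recovered dose = {np.trapz(N, x):.3e} ions/cm^2")   # ~Q, cf. Eq. (2)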


2.2 CHANNELING EFFECT
The projected range of an ion in an amorphous material always follows a Gaussian
distribution, also called a normal distribution. In single-crystal silicon, the lattice atoms have
an orderly arrangement, and many channels can be seen at certain angles. If an ion enters a
channel at the right implantation angle, it can travel a long distance with very little energy
loss, as illustrated in Fig. 3. This is called the channeling effect.











Figure 3 Channeling effect in single-crystal silicon

The channeling effect causes some ions to penetrate deeply into the single-crystal
substrate. This can form a tail on the normal dopant distribution curve, as shown in Fig. 4.
This is an undesired dopant profile, which could affect microelectronic device performance.
Therefore, several methods have been used to minimize this effect.












Figure 4 Dopant distribution with channeling effect

One way to minimize the channeling effect is ion implantation on a tilted wafer,
typically with a tilt angle of θ = 7°. By tilting the wafer, the ions impact the wafer at an
angle and cannot reach the channel, as shown in Fig. 5. The incident ions undergo nuclear
collisions right away, effectively reducing the channeling effect. Most ion implantation
processes use this technique to minimize the channeling effect, and most wafer holders of
ion implanters have the capability to adjust the tilt angle of the wafer.












Figure 5 A conventional well structure by (a) tilt implantation and (b) channeling
implantation
3. EXPERIMENTAL
The investigation was initiated by a requirement to find a compromise between dopant
channeling and resist shadowing for the implantation. The channeling effect and the
shadowed region were characterized with TCAD simulation tools. The MOS device
structure is simulated on a p-type wafer at an implant dose of 5x10^15 cm^-2, an energy of
40 keV and various angles.



4. RESULTS AND DISCUSSION
Results of the simulations are shown in Fig. 6. There is good qualitative agreement
between the shadowed region and implantation condition predicted by simulation and those
seen in the experimental data.













(a) angle = 0°   (b) angle = 7°   (c) angle = 14°   (d) angle = 21°


Figure 6 Phosphorus impurity profiles for 40 keV implantations at various angles


















Figure 7 Phosphorus impurity profiles for 40 keV implantations at various angles


[Figure 7 plot: Impurity Concentration (atoms/cm^3) versus Depth (μm)]


5. CONCLUSION
The channeling effect of shallow junctions in 0.8 micron CMOS technology has been
studied for various implantation angles using TCAD simulation tools. For 40 keV
implantations at various angles, it is demonstrated that shadowed regions from the
channeling effect are found for implantations with tilt angles of more than θ = 7°.
Implantations at angles θ = 0° and 7° enter the channel and can penetrate a longer distance
than those at θ = 14° and 21°.


REFERENCES
1. G. Dearnaley, J.H. Freeman, G.A. Card, and M.A. Wilkins., Implantation Profiles of P
channeled into Silicon Crystals., Canadian Journal of Physics, 46, 587-595 (March 15,
1968).
2. W.K. Hofker., Implantation of Boron in Silicon., Philips Research Reports Supplements,
No.8, 1975
3. J.W. Mayer, L. Eriksson, and J.A. Davies., Ion-Implantation in Semiconductors.,
Academic Press, New York, 1970.
4. G. Dearnaley, J.H. Freeman, R.S. Nelson, and J. Stephen., Ion-Implantation., North-
Holland, New York, 1973.
5. R.C. Jaeger., Volume V-Introduction to Microelectronic Fabrication., Addison-Wesley
Publishing Company Inc, 1988.


ACKNOWLEDGMENTS
The authors are thankful for the continuous help of Mr. Anucha Ruangphanit and Mr. Nopphon
Phongphanchantra of the Thai MicroElectronics Center (TMEC), National Electronics and Computer
Technology Center, Thailand.


D00047
Electronic structures of CoSb3 calculated by a first-principles method

A. Srisaikum1, A. Yangthaisong1,C, and N. Tanpipat2
1 Computational Materials and Device Physics Laboratory, Department of Physics, Faculty of Science,
Ubon Ratchathani University, Ubonratchathani, 34190, Thailand
2 National Science and Technology Development Agency, 111 Thailand Science Park, Paholyothin Rd,
Klong 1, Klong Luang, Pathumthani, 12120, Thailand
C E-mail: A.Yangthaisong@gmail.com; Tel. 086-8657805



ABSTRACT
Electronic structures and the density of states (DOS) of CoSb3 were calculated using first-
principles calculations based on density functional theory under the local density
approximation (LDA) and the generalized gradient approximation (GGA). Ultrasoft and
norm-conserving pseudopotentials were used to represent the interactions between valence
electrons and ion cores. The LDA calculations suggest that CoSb3 is a narrow-gap
semiconductor with an energy gap of 0.54 eV. Our calculations are in agreement with
other reported calculations, whose values lie between 0.05 and 0.70 eV. Note that the
experimental value obtained by transport measurement is 0.55 eV, whilst that obtained via
optical measurement is 0.07 eV. By employing the calculated band structure in conjunction
with the Boltzmann transport equation, we are able to predict thermoelectric transport
coefficients.

Keywords: First-principles calculations, CoSb3, Thermoelectric properties


1. INTRODUCTION
Thermoelectric devices have gained interest for environmentally benign power generation.
They convert thermal energy directly to electrical energy via the Seebeck effect from a
temperature difference in solid materials. The performance of thermoelectric materials can
be defined through the dimensionless thermoelectric figure of merit, ZT (= TS^2 σ/κ), where
T is the temperature, S is the Seebeck coefficient, σ is the electrical conductivity, and κ is the
thermal conductivity. A large magnitude of ZT indicates a high efficiency. Recently,
skutterudite compounds have been highlighted as thermoelectric conversion materials with
high conversion efficiency [1]. Generally, these compounds show very high carrier
mobilities and large thermoelectric power. In this study, the thermoelectric properties of
CoSb3 were investigated.


2. COMPUTATIONAL DETAILS
Cobalt triantimonide (CoSb3) belongs to the space group Im-3 (No. 204). The structural
parameters used in these calculations are taken from reference [2]. In particular, the lattice
constant a = 9.0713 Å, the Co atoms lie in the 8c position (0.25, 0.25, 0.25) and the Sb atoms
in the 24g position (0, y = 0.3349, z = 0.1583). Electronic structure calculations are performed
using the plane-wave pseudopotential method within density functional theory (DFT) as
implemented in CASTEP [3]. The electronic exchange-correlation potential is described in
terms of the local density approximation (LDA) and the generalized gradient approximation
(GGA). The atoms are represented by ultrasoft and norm-conserving pseudopotentials. Note
that a plane-wave energy cutoff of 300 eV is required for the former, and a higher value of
550 eV is used for the latter. The states Co(3d7 4s2) and Sb(5s2 5p3) are treated as valence
states. The structural parameters of CoSb3 were determined using the Broyden-Fletcher-
Goldfarb-Shanno (BFGS) minimisation technique, with the thresholds for a converged
structure: energy change per atom less than 1x10^-5 eV, residual force less than 0.03 eV/Å,
stress below 0.05 GPa, and displacement of atoms during the geometry optimisation less
than 0.001 Å. The Brillouin zone was sampled by a 4x4x4 mesh generated according to the
Monkhorst-Pack scheme to ensure good convergence of the computed structures and
energies.


3. RESULTS AND DISCUSSION
The cubic CoSb3 structure used is shown in Figure 1. The structure is composed of eight
corner-sharing TX6 octahedra. As seen in the figure, the linked octahedra produce a void at
the center of the (TX6)8 clusters, and this vacant site occupies a body-centred position of the
cubic lattice. The calculated lattice constant of CoSb3 from the LDA (GGA) calculations
using the ultrasoft pseudopotential is 9.0385 (9.1188) Å, and from the LDA (GGA)
calculations using the norm-conserving pseudopotential it is 8.7956 (8.9035) Å; H. Takizawa
showed that the lattice constant from the XRD technique is 9.0350 Å [2].









Figure 1 Crystal structure of CoSb3.

The LDA (GGA) calculations with the ultrasoft pseudopotential provide a direct band gap
of 0.20 (0.41) eV. On the other hand, similar calculations employing the norm-conserving
pseudopotential reveal that CoSb3 has a direct band gap of 0.23 (0.54) eV. Note that other
reported calculations predict that the band gap of CoSb3 lies between 0.05 and 0.70 eV [4].
The experimental band gap of CoSb3 obtained by transport measurement is 0.55 eV, whilst
that obtained via optical measurement is 0.07 eV [5]. Figure 2(a) shows the band structure of
CoSb3 with its direct gap of 0.54 eV. The conduction band minimum is located at the zone
center, in agreement with the value obtained by transport measurement [5].











Figure 2. LDA calculation results of (a) the band structure and (b) the partial density of
states (PDOS) of CoSb3. The norm-conserving pseudopotential is used.

The PDOS calculated for Co 3d7 4s2 and Sb 5s2 5p3 are also shown in Figure 2(b). The
lower valence band derives from the Sb 5s orbitals, and the higher valence band consists of
Co 3d and Sb 5p orbitals. The conduction band is composed of Sb 5s 5p and Co 3d orbitals.
It is worthwhile to mention that CoSb3 contains 192 valence electrons; 97 bands are fully
occupied and a gap is generated between the 97th and 98th bands. Inspecting Figure 2(b), it
can be seen that the highest occupied band comes from the Sb 5p states. The corresponding
charge density map is reported in Figure 3. The important point is that the density is not
situated on the Sb-Sb bonding [Figure 3(b)]. The total valence density is displayed in the
(100) plane containing the Sb and Co atoms.












Figure 3. CoSb3 charge density map from CASTEP in (a) the (100) plane
and (b) the (400) plane.

In addition to the electronic structure calculations, we have investigated the thermoelectric
properties of CoSb3. By using the calculated band structure in conjunction with the
Boltzmann transport equation and the rigid band approach, as described in detail in [6], the
conductivity of the material is based on the transport distribution

σ_αβ(ε) = (1/N) Σ_{i,k} σ_αβ(i,k) δ(ε - ε_{i,k})/dε    (1)

The transport tensors can then be calculated from the conductivity distribution of Eq. (1):

σ_αβ(T; μ) = (1/Ω) ∫ σ_αβ(ε) [-∂f_μ(T; ε)/∂ε] dε    (2)

ν_αβ(T; μ) = (1/(e T Ω)) ∫ σ_αβ(ε) (ε - μ) [-∂f_μ(T; ε)/∂ε] dε    (3)

κ⁰_αβ(T; μ) = (1/(e² T Ω)) ∫ σ_αβ(ε) (ε - μ)² [-∂f_μ(T; ε)/∂ε] dε    (4)

where κ⁰ is the electronic part of the thermal conductivity, f_μ is the Fermi-Dirac distribution
function, T is the absolute temperature, and μ is the chemical potential. It is worthwhile to
mention that in the rigid band approach the bands, and hence σ(ε), are left fixed. This means
that only band structure calculations are required, simplifying the calculations. To further
simplify the problem, one can also assume that the Seebeck coefficient (S) is independent of
the relaxation time (τ), hence it can be written as S = σ^{-1} ν. Eventually, the calculated
Seebeck coefficients and electrical conductivities are used to predict the thermoelectric
power factor (PF = S²σ).
Figure 4 shows the calculated power factor of CoSb3. It can be seen that PF increases with
increasing temperature, in agreement with experiment for temperatures below 650 K [7]. The
electronic thermal conductivity at zero electric current can be calculated using the following
expression:

κ^e_ij = κ⁰_ij - T ν_iα (σ^{-1})_αβ ν_βj    (5)

or, employing the well-known Wiedemann-Franz law for degenerate charge carriers,

κ^e_ij = (π²/3) (k_B/e)² T σ_ij    (6)
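As a minimal numerical sketch of the last two steps in Python (the S, sigma and T values are hypothetical placeholders, not the calculated transport coefficients), the power factor and the Wiedemann-Franz estimate of Eq. (6) reduce to:

import numpy as np

k_B = 1.380649e-23   # Boltzmann constant (J/K)
e   = 1.602177e-19   # elementary charge (C)

# Hypothetical transport values for illustration
S     = 150e-6       # Seebeck coefficient (V/K)
sigma = 1.0e5        # electrical conductivity (S/m)
T     = 600.0        # absolute temperature (K)

PF = S**2 * sigma                       # power factor (W m^-1 K^-2)
L = (np.pi**2 / 3.0) * (k_B / e)**2     # Lorenz number from Eq. (6)
kappa_e = L * T * sigma                 # electronic thermal conductivity (W m^-1 K^-1)

print(f"PF = {PF:.3e} W m^-1 K^-2")
print(f"kappa_e = {kappa_e:.2f} W m^-1 K^-1")
# A figure-of-merit estimate would then follow as zT = PF*T/(kappa_e + kappa_lattice).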
















Figure 4. Thermoelectric power factor of CoSb3

The comparison between the electronic thermal conductivities calculated from the two
approaches has been performed and is shown in Figure 5. It can be seen that both models
provide almost the same values for all Fermi energies E_F considered. Eventually, the
predicted figure of merit (zT) of CoSb3 is calculated by using κ⁰ from Eq. (5) and the
thermoelectric power term.
















Figure 5. Calculated electronic thermal conductivity of CoSb3 using κ⁰ and κ_el from
Eqs. (5) and (6).

The calculated zT values are shown in Figure 6. Note that our results are comparable to
those reported by G. J. Snyder for temperatures of 300-700 K [8].


4. CONCLUSION
The calculated electronic structure shows that CoSb3 is a semiconductor with a direct gap
of 0.54 eV, in good agreement with experiment. From the calculated band structure, we can
predict the thermoelectric properties of CoSb3. Our calculations are comparable with the
experimental investigations.




























Figure 6. Calculated figure of merit of CoSb3 as a function of temperature.


REFERENCES
1. Y. Kawaharada et al., J. Alloy. Compd., 2001, 315, 193-197.
2. H. Takizawa, K. Miura, M. Ito, T. Suzuki, T. Endo, J. Alloy. Compd., 1999, 282, 79-83.
3. S.J. Clark et al., Zeitschrift fuer Kristallographie, 2005, 220(5-6), 567-570.
4. H. Rakoto, M. Respaud, J.M. Broto, E. Arushanov, T. Caillat, Physica B, 1999, 269, 13-16.
5. K.T. Wojiechowski, J. Alloy. Compd., 2007, 439, 18-24.
6. Madsen, G. K. H., J. Am. Chem. Soc., 2006, 128, 12140-12146.
7. J.L. Mi et al., J. Alloy. Compd., 2008, 452, 225-229.
8. G. Jeffrey Snyder and Eric S. Toberer, Nature Materials, 2008, 7, 105-114.
9. Z.J. Pan, L.T. Zhang, J.S. Wu, Mat. Lett., 2007, 61, 2648-2651.
10. K. Takegahara, H. Harima, Physica B, 2003, 328, 74-76.


ACKNOWLEDGMENTS
The authors acknowledge the Cooperation on Science and Technology Researcher
Development Project of the Office of the Permanent Secretary, Ministry of Science and
Technology, for financial support. This work has been partially supported by the National
Nanotechnology Center (NANOTEC), National Science and Technology Development
Agency (NSTDA), Ministry of Science and Technology, Thailand, through its Computational
Nanoscience Consortium (CNC). A. Y. acknowledges Dr S. J. Clark, Durham University,
UK, for providing his code. We are very grateful for the computing resources provided
through the SILA clusters at Ramkhamhaeng.



D00048
The Effect of the Coulomb Interaction and Exchange Interaction
on Spin Magnetic Moment of MnO

C. Thassana1,* and W. Techitdheera1
1 Department of Physics, Faculty of Science, King Mongkut's Institute of Technology Ladkrabang,
Bangkok, Thailand 10520
* E-mail: ct2709@gmail.com; Tel. 0858429843


ABSTRACT
The effect of the Coulomb interaction U and the exchange interaction J on the spin magnetic
moment (m_s) of MnO was studied using the local spin density approximation plus the
Coulomb interaction U (LSDA+U) within the linear muffin-tin orbital (LMTO) method. Our
calculations show that the spin magnetic moment increases with increasing of both the
Coulomb interaction U and the exchange interaction J. However, when m_s was fixed at
4.58 μ_B, according to experiment, and we calculated with U increasing in steps of 1 eV, the
value of J decreased by 0.249 eV per step, becoming zero at U = 5.5 eV. Based on this
LSDA+U model, it is very clear that the linear relation between U and J for MnO is
J = -0.249U + 1.346 eV.

Keywords: Coulomb interaction U, exchange interaction J, spin magnetic moment, MnO.


1. INTRODUCTION
The spin magnetic moment of MnO was reported with the value 4.58 μ_B [1], and several research
groups have used theoretical models to verify this value, such as SIC-LSDA [2, 4], constrained LDA
[3], SIC-LDA+U [5, 7], hybrid exchange-correlation [6], QP-GW [8], LSIC [9], and EXX [10]. The
calculation methods used different values of the Coulomb interaction U and the exchange interaction J,
for example U = 6.9 eV, J = 0.86 eV [3], or U = 4, 6, 8 eV, J = 0.9 eV [5]. However, the values of the
spin magnetic moment obtained from these methods are both overestimated [3,4,9,10] and
underestimated [2,5,6] compared with the experimental value [1], as shown in Table 1.
At normal pressure and room temperature, MnO has the NaCl structure with lattice constant 4.44 Å.
The Mn2+ ion has a 3d5 configuration and, below T_N ≈ 122 K, MnO orders in the type-II
antiferromagnetic structure in which the magnetic moments of the Mn2+ ions align ferromagnetically
on every (111) plane. This crystal structure can be viewed as a face-centered cubic (fcc) lattice with
four atoms in the unit cell: an Mn2+ spin-up ion at (0.0, 0.0, 0.0), an Mn2+ spin-down ion at
(1.0, 1.0, 1.0), and O2- ions at (0.5, 0.5, 0.5) and (1.5, 1.5, 1.5).

2. COMPUTATIONAL DETAILS
The LSDA+U calculations were performed using muffin-tin radii of 2.346 and 1.843 a.u. for the
Mn2+ and O2- ions, respectively. The spin magnetic moment was then investigated using the lattice
constant 0.443 nm and temperature 122 K. In this work, we studied the effect of both U and J on the
spin magnetic moment of MnO, with the calculations divided into two cases as follows.
In the first, the spin magnetic moment was calculated with U and J varied from 1.0 to 5.0 eV and
0 to 1.0 eV, respectively; in the second, we investigated the relation between U and J that gives the
spin magnetic moment corresponding to the experimental value of 4.58 μ_B [1].

3. RESULTS AND DISCUSSIONS
The values of the spin magnetic moment of MnO were calculated self-consistently by LSDA+U.
Figure 1 shows the dependence of the spin magnetic moment on J, with J varied from 0 to 1.0 eV and
U = 3.0, 4.0 and 5.0 eV. Since the exchange interaction should be analyzed by means of quantum
theory, it strongly concerns spin-spin interactions. More specifically, on the atomic scale the exchange
interaction J tends to align neighboring spins, so the spin magnetic moment increases with increasing J.
Furthermore, the spin magnetic moment also depends on U, as shown in Fig. 2. Our calculation
shows that the spin magnetic moment increases with increasing U, since the Coulomb interaction U
enhances the electron localization, which leads to a magnetic moment gain.
Table 1 shows the calculated values of the spin magnetic moment, which increase from 4.37 μ_B
(U = 1.0 eV, J = 0 eV) to 4.67 μ_B (U = 5.0 eV, J = 1.0 eV). Therefore we can conclude that the spin
magnetic moment depends on both U and J.


Figure 1. The spin magnetic moment as a function of J, with J varied from 0 to 1.0 eV and U = 3.0,
4.0 and 5.0 eV.


Figure 2. The spin magnetic moment as a function of U, with U varied from 0 to 6 eV and J = 0, 0.86
and 1.0 eV.
Table 1. Methods, Coulomb interaction U (eV), exchange interaction J (eV) and spin magnetic
moment m_s (μ_B) of MnO.

Calculating Method       U (eV)   J (eV)   m_s (μ_B)
Experiment [1]             -        -        4.58
This work                0 - 5    0 - 1    4.37 - 4.67
SIC-LSDA [2]               -        -        4.49
Constrained LSDA [3]      6.9     0.86       4.61
SIC-LSDA ASA [4]           -        -        4.64
Hybrid [6]                 -        -        4.46
LDA+U d+p [7]             6.9     0.86       4.59
QP+GW [8]                  -        -        4.50
LSIC [9]                   -        -        4.63
EXX [10]                   -        -        4.81

Fig. 3 shows the relation between the Coulomb interaction U and the exchange interaction J when
the value of the spin magnetic moment is fixed at 4.58 μ_B. We found that the exchange interaction J
decreases linearly by about 0.249 eV for each 1.0 eV increase of the Coulomb repulsion U, a trend
we cannot yet explain. Over the full range, shown in Table 2, the value of U increases from 1.0 eV to
5.5 eV while J decreases from 1.11 eV to 0 eV. Some previous papers reported values of the spin
magnetic moment that are overestimated [3,4,9,10] or underestimated [2,5,6] compared with the
experimental data [1]. Meanwhile the LDA+U p+d [7] and constrained LSDA [3] methods, which
both used the same U = 6.9 eV and J = 0.86 eV, reported spin magnetic moments of 4.59 μ_B [7] and
4.61 μ_B [3], very close to the experimental value in ref. [1]. Our calculated spin magnetic moment is
4.58 μ_B if we use J = 0.86 eV with U = 2 eV; but when we use U = 5.0 eV then J must equal 0.14 eV,
and so on, as shown in Table 2.


Figure 3. The relation between the Coulomb interaction U and the exchange interaction J for which
the spin magnetic moment is 4.58 μ_B.


Table 2. The values of the Coulomb interaction U (eV) and the exchange interaction J (eV) for which
the spin magnetic moment of MnO corresponds to the experimental value 4.58 μ_B [1].

U (eV)   5.5   5.0    4.0    3.0    2.0    1.0
J (eV)   0     0.14   0.38   0.62   0.86   1.11
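A quick consistency check of the reported linear relation: a least-squares fit of the (U, J) pairs in Table 2 recovers a slope and intercept close to J = -0.249U + 1.346 eV. A minimal sketch in Python:

import numpy as np

# (U, J) pairs from Table 2 (eV)
U = np.array([5.5, 5.0, 4.0, 3.0, 2.0, 1.0])
J = np.array([0.0, 0.14, 0.38, 0.62, 0.86, 1.11])

slope, intercept = np.polyfit(U, J, 1)
print(f"J = {slope:.3f} U + {intercept:.3f} eV")   # close to the reported relation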
4. CONCLUSIONS
The effect of the Coulomb interaction U and the exchange interaction J on the spin magnetic
moment of MnO was studied using LSDA+U within the linear muffin-tin orbital (LMTO) method.
The resulting spin magnetic moment of MnO increases with increasing of both the Coulomb
interaction U and the exchange interaction J. When m_s was fixed at 4.58 μ_B and the values of U and
J were varied, we found the relation between U and J to be J = -0.249U + 1.346 eV.

REFERENCES
[1] A.K. Cheetham and D.A. Hope, Magnetic ordering and exchange effects in the antiferromagnetic
solid solutions MnxNi1-xO, Phys. Rev. B, 27, 1983, pp 6964-6967.
[2] A. Svane and O. Gunnarsson, Transition-metal oxides in the self-interaction-corrected
density-functional formalism, Phys. Rev. Lett., 65(9), 1990, pp 1148-1151.
[3] V.I. Anisimov, J. Zaanen and O.K. Andersen, Band theory and Mott insulators: Hubbard U
instead of Stoner I, Phys. Rev. B, 44(3), 1991, pp 943-954.
[4] Z. Szotek and W.M. Temmerman, Application of the self-interaction correction to transition-
metal oxides, Phys. Rev. B, 47(7), 1992, pp 4029-4032.
[5] D.W. Boukhvalov, A.I. Lichtenstein and V.I. Anisimov, Effect of local Coulomb interactions on
the electronic structure and exchange interactions in Mn12 magnetic molecules, Phys. Rev. B, 65,
2002, pp 184435-1 - 184435-6.
[6] F. Tran, P. Blaha, K. Schwarz and P. Novak, Hybrid exchange-correlation energy functionals for
strongly correlated electrons: Applications to transition-metal monoxides, Phys. Rev. B, 74,
2006, pp 155108-155117.
[7] I.A. Nekrasov, M.A. Korotin and V.I. Anisimov, cond-mat/0009107v1, 2008.
[8] C. Rodl, F. Fuchs, J. Furthmuller and F. Bechstedt, Quasiparticle band structures of the
antiferromagnetic transition-metal oxides MnO, FeO, CoO, and NiO, Phys. Rev. B, 79, 2009,
pp 235114-235121.
[9] G. Fischer, M. Dane, W. Temmerman and W. Hergert, Exchange coupling in transition metal
monoxides: Electronic structure calculations, Phys. Rev. B, 80, 2009, pp 014408-1 - 014408-11.
[10] E. Engel and R.N. Schmid, Insulating ground states of transition-metal monoxides from exact
exchange, Phys. Rev. Lett., 103, 2009, pp 036404-1 - 036404-4.


ACKNOWLEDGMENTS
The authors thank Department of Physics, Faculty of Science, King Mongkuts Institute of
Technology Ladkrabang, Bangkok, Thailand for their facilities support.

D00056
First-principles study of cubic perovskites Ba1-xSrxTiO3

W. Chaiyarat1 and A. Yangthaisong2,C
1,2 Computational Materials and Device Physics Group, Department of Physics, Faculty of Science,
Ubon Ratjathanee University, Ubon Ratchathani, 34190, Thailand
C E-mail: a.yangthaisong@physics.org; Fax: ++66-45-288381; Tel. ++66-086-867805



ABSTRACT
A theoretical study of the structural, electronic and optical properties of the cubic
perovskites Ba1-xSrxTiO3 (x = 0, 0.25, 0.50, 0.75 and 1) is presented using the total-
energy plane-wave pseudopotential method as implemented in the CASTEP code [1].
The calculations suggest that the energy gaps tend to increase with increasing Sr
fraction content. This can be determined by considering the Ti-O bonding [2]. In fact,
the calculated electronic density of states reveals that the peak at about -15 eV,
dominated by the Sr 4p state, increases with increasing Sr content in the materials.
This is in good agreement with the experimental result [3].

Keywords: First-principles calculations, BSTO, High dielectric constant.



REFERENCES
1. Clark, S.J. et al., Zeitschrift fuer Kristallographie, 2005, 220(5-6), 567-570.
2. Tang, Y. H., Tsai, M. H., Jan, J. C., and Pong, W. F., Chinese J. of Phys., 2003, 41(2),
167-176.
3. Samantaray, C. B., Hyunjun, S. and Hyunsang, H., Microelec. J., 2005, 36, 725-728.


D00061
On the Estimation of Solar Particle Fluence at Jupiter's Orbit

A. Sáiz1,2,C, D. Ruffolo1,2, J. W. Bieber3, and P. Evenson3
1 Department of Physics, Faculty of Science, Mahidol University, Bangkok, 10400, Thailand
2 ThEP Center, CHE, 328 Si Ayutthaya Road, Bangkok 10400, Thailand
3 Bartol Research Institute and Department of Physics and Astronomy, University of Delaware, Newark,
DE, United States
C E-mail: scasa@mahidol.ac.th; Fax: 022015762; Tel. 022015756



ABSTRACT
Solar energetic particles (SEPs) are one major hazard concern for astronauts in space
missions, and their possible effects need to be evaluated before planning long-term
missions such as eventual manned trips to Mars. Although particle transport between the
Sun and the Earth is currently well understood, an accurate modeling technique for
transport to larger distances is still needed in order to predict potential damage to
spacecraft and crew by SEPs, especially during extreme events. A common consensus is
that the pitch-angle scattering radial mean free path can be assumed to be constant, but new
results in simulations of solar wind turbulence suggest that there is a dependence on
distance to the Sun. In this work we model the radial transport of SEPs in the inner
heliosphere and out to the orbit of Jupiter by specifying a different radial dependence for
the pitch-angle scattering mean free path. We estimate time profiles and fluence at different
distances from the Sun for different particle energies, and compare the results with those
corresponding to the previous assumptions. We use the new technique to model the
intensity of SEPs at Jupiter during selected past events. Partially supported by the Thailand
Research Fund and NASA's Living With a Star program under grant NNX08AQ18G.

Keywords: solar energetic particles, astronaut safety, equation of transport.



REFERENCES
1. Palmer, I. D., Rev. Geophys. Space Phys., 1982, 20, 335.
2. Ruffolo, D., Astrophys. J., 1995, 442, 861.
3. Nutaro, T., Riyavong, S., and Ruffolo, D., Comp. Phys. Comm., 2001, 134, 209.

D00062
First-principles study on the optical band-edge absorption of Fe-doped SnO2

S. Pabchanda1,C, J. Putpan1, R. Laopaiboon2 and A. Yangthaisong2
1 Department of Chemistry and Center of Excellence for Innovation in Chemistry (PERCH-CIC),
Faculty of Science, Ubonratchathani University, Ubonratchathani, 34190, Thailand
2 Department of Physics, Faculty of Science, Ubonratchathani University,
Ubonratchathani, 34190, Thailand
C E-mail: pabchanda@gmail.com; Fax: 045-288379; Tel. 089-9077250



ABSTRACT
The optical properties of pure and Fe-doped SnO2 were characterized in terms of the
optical band-edge absorption. In this work, the optical properties were studied employing
first-principles calculations. The electron-electron exchange and correlation effects were
described by the Perdew-Burke-Ernzerhof (PBE) functional in the generalized gradient
approximation (GGA) within density functional theory (DFT). The optical band gaps of
pure and iron-doped SnO2 were calculated to be 3.75 and 3.58 eV, respectively. Our
calculations were compared with UV-visible absorption spectroscopy measurements. The
sample films were prepared by the spray pyrolysis deposition method onto glass substrates
with the iron doping level fixed at 6 at%. The direct optical band gaps of the pure and
iron-doped SnO2 thin films were determined to be 3.78 and 3.65 eV, respectively. For the
pure and iron-doped SnO2 thin films we found qualitatively good agreement of the
calculated optical band-gap energy, as well as the optical absorption, with the experimental
results.

Keywords: SnO2, Fe-doped SnO2, band gap, GGA, DFT, spray pyrolysis.



REFERENCES
1. Roman, L.S., Valaski, R., Canestraro, C.D., Magalhães, E.C.S., Persson, C., Ahuja, R., da
Silva Jr., E.F., Pepe, I., and Ferreira da Silva, A., Applied Surface Science, 2006, 252, 5361-
5364.
2. Errico, L. A., Physica B: Condensed Matter, 2007, 389, 140-144.
3. Xu, J., Huang, S., and Wang, Z., Solid State Communications, 2009, 149, 527-531.
4. Rantala, T. S., Lantto, V., and Rantala, T. T., Sensors and Actuators B: Chemical, 1994, 19,
716-719.
5. Mäki-Jaskari, M. A., Rantala, T. T., and Golovanov, V. V., Surface Science, 2005, 577, 127-
138.
6. Bagheri-Mohagheghi, M.-M., Shahtahmasebi, N., Alinejad, M. R., Youssefi, A., and
Shokooh-Saremi, M., Solid State Sciences, 2009, 11, 233-239.
7. Qin, G., Li, D., Feng, Z., and Liu, S., Thin Solid Films, 2009, 517, 3345-3349.




Computational Fluid Dynamics and Solid Mechanics

C00005
Pressure Distribution along the Silo Wall

W. Chuayjan1, B. Wiwatanapataphee1,C, Y.H. Wu2, and I.M. Tang3
1 Department of Mathematics, Faculty of Science, Mahidol University, 272 Rama 6 Road, Bangkok,
10400, Thailand
2 Department of Mathematics and Statistics, Curtin University of Technology, Perth, WA, 6845,
Australia
3 Department of Physics, Faculty of Science, Mahidol University, 272 Rama 6 Road, Bangkok, 10400,
Thailand
C E-mail: scbww@mahidol.ac.th; Fax: 02-2015343; Tel. 02-2015541



ABSTRACT
We present a numerical simulation of granular flow in a vertical-sided silo with a conical
hopper bottom during discharge. A mathematical model has been developed, and the
discrete element method is used to solve the model. The pressure distribution along the silo
wall for various hopper-bottom angles is investigated. The simulation results obtained in the
present work show reasonable accuracy compared with existing experimental observations.

Keywords: Silo, Wall Pressure, Granular Flow, Discrete Element Method.



REFERENCES
1. Cundall, P. A. and Strack, O. D. L., Geotechnique, 1979, 29, 47-65.
2. Guaita, M., Couto, A., and Ayuga, F., Biosystems Engineering, 2003, 85(1), 101-109.
3. Ristow, G. H., Physica A, 1997, 235, 319-326.
4. Rotter, J. M., Brown, C. J., and Lahlouh, E. H., Engineering Structures, 2002, 24, 135-
150.
5. Ooi, J. Y., Pham, L., and Rotter, J. M., Engineering Structures, 1990, 12, 74-87.
6. Cleary, P. W., Engineering Computations, 2004, 21(2/3/4), 169-204.
7. Cleary, P. W., Engineering Computations: International Journal for Computer Aided
Engineering and Software, 2009, 26(6), 698-743.
8. Chen, J. F., Rotter, J. M., and Ooi, J. Y. and Zhong, Z., Engineering Structures, 2007, 29,
2308-2320.

E00001
Linear and weakly nonlinear solutions of
subcritical free-surface flow over submerged obstacles

P. Guayjarernpanishk1,C and J. Asavanant2
1 Department of Mathematics, Statistics and Computer, Faculty of Science, Ubon Rajathanee University,
Ubon Ratchathani, 34190, Thailand
2 Advanced Virtual and Intelligent Computing (AVIC) Research Center, Faculty of Science,
Chulalongkorn University, Bangkok, 10330, Thailand
C E-mail: gpanat@sci.ubu.ac.th; Tel. 089-1065125


ABSTRACT
Two-dimensional subcritical flows over submerged obstacles are considered. The
fluid is assumed to be inviscid and incompressible, and the flow is irrotational.
Both gravity and surface tension are included in the dynamic boundary
condition. Far upstream, the flow is assumed to be uniform. Numerical solutions
of the linear and weakly nonlinear problems show that there exist two branches
of subcritical flow. It is found that the solutions are characterized by the Froude
number, F, the Bond number, B, and the height of the obstacles. Finally, unsteady
solutions of the weakly nonlinear problem are presented.

Keywords: Free-surface Flow, Submerged Obstacles, Gravity, Surface Tension, Linear
and Weakly Nonlinear Problems.



REFERENCES
1. Binder, B. J., Vanden-Broeck, J.-M., and Dias, F., Chaos, 2005, 15, 037106.
2. Faltas, M. S., Hanna, S. N., and Abd-el-Malek, M. B., Acta Mechanica, 1989, 78, 219-33.
3. Forbes, L. K., J. Fluid Mech., 1983, 127, 283-97.
4. Fornberg, B., A Practical Guide to Pseudospectral Methods, Cambridge University Press,
New York, 1998.
5. Lamb, H., Hydrodynamics, 4th ed., Cambridge University Press, London, 1932.
6. Shen, S. S. P., Quar. App. Math., 1995, 53, 701-19.

E00003
Coastal Simulation of the Gulf of Thailand:
Effects of tidal forcing

S. Tomkratoke, S. Vannarat and S. Sirisup C
Large-Scale Simulation Research Laboratory,
National Electronics and Computer Technology Center,
112 Thailand Science Park, Klong 1, Klong Luang, Pathumthani 12120, Thailand
C E-mail: sirod.sirisup@nectec.or.th; Fax: 662-5646776; Tel. 662-5646900



ABSTRACT
In this study, we provide high-fidelity simulation results for the Gulf of Thailand and
nearby areas. The high fidelity includes high-resolution (30 arc-second) bathymetry,
finer-scale capturing, and the ability to preserve the realistic, complex coastline and
islands. Here, we employ the finite-volume method for its geometric flexibility and
computational efficiency, as well as its assurance of volume and mass conservation. We
focus mainly on the effects of the tidal forcing in the Gulf of Thailand and nearby areas,
as well as on the analysis of local tidal characteristics. The dependency on resolution
and on critical parameters such as the bottom drag coefficient will be fully discussed.

Keywords: Coastal Simulation, Gulf of Thailand, Finite-volume method.



1. INTRODUCTION
The Gulf of Thailand is located between longitudes 99 and 105 degrees east and latitudes
6 and 13 degrees north. The Gulf occupies the northwest of the Sunda shelf of the South
China Sea. The east coast of the gulf is adjacent to the coastline of Vietnam and the South
China Sea, and the west coast is surrounded by the southern coastline of Thailand and
Malaysia. The coastal area along the northern boundary of the gulf borders the Central
Plain of Thailand (Chao Phraya Delta); see Figure 1. The propagation of the tidal wave
across the deep basin can be easily recognized because its tidal patterns are very simple and
the tidal currents are very weak. In contrast, when the tidal wave propagates onto the Sunda
shelf, the tidal regimes on the shelf, especially the semidiurnal tides, are complex and the
tidal currents may be strong. The character of the tidal wave on the Sunda shelf, including
the Gulf of Thailand, has been receiving great attention from oceanography researchers.
The complexity of the coastline and the sea-bed topography of the Sunda shelf gives the
response to the tidal wave a unique character. The most important feature of the tidal wave
is the amphidromic systems. If the existence and locations of the amphidromic points of the
major tidal constituents, together with their rotary directions, can be identified in a tidal
system, it is easy to get a clear picture of the tidal wave; otherwise, an uncertain or wrong
amphidromic point location will lead to uncertain or wrong tidal charts. For many decades,
investigations of tides in the South China Sea have been pursued by numerous
oceanographers, such as [1], [2], [3] and many others. The co-tidal charts drawn by
different researchers before the 1980s revealed great diversity in shelf areas. The
discrepancies among the published co-tidal charts have been significantly reduced since the
1980s. However, noticeable differences can still be observed. The recent numerical study of
Fang, 1999, [3] revealed that there is a clockwise-rotating M2 amphidromic system with its
point lying in the center of the Gulf of Thailand, another near the Central Plain of Thailand,
and an S2 amphidromic system with its point lying near the Natuna islands. There has been
some debate on whether there is an S2 amphidromic point near the Natuna islands. From a
recent study of altimetry data for the tidal wave in the South China Sea, especially near that
area, Qingwen et al., 2006, [4] confirmed that there is no S2 amphidromic point but rather a
nodal line of a standing wave around the Natuna islands. In the present study, we aim to
gain a detailed picture of the mechanism of tidal wave propagation, as well as the
amphidromic systems, for the four major constituents M2, S2, K1, and O1, which are nearly
equally important in generating shallow-water tides in the shelf areas. In order to obtain the
most realistic simulation results with complicated coastlines, islands, and well-represented
bathymetry, the investigation is based on simulation using the finite-volume method.





















Figure 1. Topography and bathymetry of the Gulf of Thailand and nearby areas


2. COMPUTATIONAL DETAILS
The governing equations used in the present study are the Boussinesq, hydrostatic approximations of the primitive equations of mass and momentum conservation, which can be written in Cartesian coordinates as

$$\frac{\partial \eta}{\partial t} + \frac{\partial\left[(h+\eta)u\right]}{\partial x} + \frac{\partial\left[(h+\eta)v\right]}{\partial y} = 0,$$

$$\frac{\partial u}{\partial t} + u\frac{\partial u}{\partial x} + v\frac{\partial u}{\partial y} - fv = -g\frac{\partial \eta}{\partial x} - \frac{1}{\rho_0}\frac{\partial p_a}{\partial x} + N_h\nabla^2 u + \frac{\tau_{sx} - \tau_{bx}}{\rho_0 (h+\eta)},$$

$$\frac{\partial v}{\partial t} + u\frac{\partial v}{\partial x} + v\frac{\partial v}{\partial y} + fu = -g\frac{\partial \eta}{\partial y} - \frac{1}{\rho_0}\frac{\partial p_a}{\partial y} + N_h\nabla^2 v + \frac{\tau_{sy} - \tau_{by}}{\rho_0 (h+\eta)},$$

where u and v are the depth-averaged currents in the x and y directions, $\eta$ is the sea surface elevation, h is the undisturbed depth, f is the Coriolis parameter, g is the acceleration due to gravity, $p_a$ is the atmospheric pressure at sea level, $\rho_0$ is the water density, $N_h$ is the horizontal eddy viscosity coefficient, and $\tau_{sx}$, $\tau_{sy}$ and $\tau_{bx}$, $\tau_{by}$ are the surface wind stresses and the bottom friction stresses in the x and y directions, respectively.
We employ the unstructured-grid, finite-volume coastal ocean model (FVCOM) to simulate the propagation of the tidal waves of the four major constituents M2, K1, S2, and O1. FVCOM solves the governing equations using flux calculations integrated over each model grid control volume, with a mode-splitting method that separates external and internal mode time steps to accommodate the faster barotropic and slower baroclinic responses. More detailed descriptions of the FVCOM code can be found in Chen et al. (2003, 2006) [5, 6].
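The flux-based update that FVCOM applies on each triangular control volume can be illustrated, in heavily simplified form, by a one-dimensional linearized shallow-water step. The sketch below only illustrates the finite-volume idea; the grid size, depth, and time step are assumed values, and this is not FVCOM code.

```python
# Minimal 1-D finite-volume shallow-water sketch (illustration only, not FVCOM).
import numpy as np

g, h = 9.81, 50.0                 # gravity (m/s^2), assumed uniform depth (m)
nx, dx = 200, 5000.0              # number of cells and cell width (m), assumed
dt = 0.5 * dx / np.sqrt(g * h)    # CFL-limited "external mode" time step

eta = np.exp(-((np.arange(nx) - nx // 2) * dx / 5e4) ** 2)  # initial elevation bump
u = np.zeros(nx + 1)              # face-centred velocities

for _ in range(200):
    # momentum: du/dt = -g * d(eta)/dx, evaluated at interior cell faces
    u[1:-1] -= dt * g * (eta[1:] - eta[:-1]) / dx
    # continuity: d(eta)/dt = -h * du/dx, a flux difference per control volume
    eta -= dt * h * (u[1:] - u[:-1]) / dx
```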
Figure 2. Computational domain (left) and mesh overlaid on bathymetry (right) used to simulate the propagation of the tidal wave in the Gulf of Thailand

The computational domain and the bathymetry are shown in Figure 2. The domain covers the area from latitude 1.35 to 13.53 degrees north and longitude 99.27 to 101.86 degrees east (141030N to 1496482N and 519691E to 1875413E in the UTM datum), covering the Sunda Shelf and the South China Sea. The domain is decomposed into 69,000 triangular elements; high-fidelity bathymetry data with a resolution of 30 seconds from GEBCO [7] are used. The coastlines and the bathymetry data near the coast are further enhanced with ENC data from the Hydrographic Department of the Royal Thai Navy. The boundary conditions at the open boundaries (the edge of the Sunda Shelf and the Karimata Strait) are prescribed with the effect of each constituent derived from OTIS [8]. Resolution dependency is studied by comparing the phases and amplitudes obtained on the present mesh and on a refined one. The effect of the bottom drag coefficient on the propagation of the tidal wave is also investigated, with values in the range 0.002 to 0.004.
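For illustration, an open-boundary elevation time series of the kind prescribed here can be synthesized from constituent amplitudes and phases; in the sketch below the amplitude and phase values are placeholders, not the OTIS-derived values actually used in this study.

```python
# Synthesize a tidal elevation series from harmonic constituents (sketch).
import numpy as np

# constituent: (amplitude [m] - placeholder, phase [deg] - placeholder, period [h])
constituents = {
    "M2": (0.30, 120.0, 12.4206),
    "S2": (0.10,  95.0, 12.0000),
    "K1": (0.40,  60.0, 23.9345),
    "O1": (0.35,  30.0, 25.8193),
}

t = np.arange(0.0, 30 * 24 * 3600.0, 600.0)   # 30 days at a 10-minute step (s)
eta = np.zeros_like(t)
for amp, phase_deg, period_h in constituents.values():
    omega = 2.0 * np.pi / (period_h * 3600.0)  # angular frequency (rad/s)
    eta += amp * np.cos(omega * t - np.radians(phase_deg))
```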


3. RESULTS AND DISCUSSION
3.1 Results verification: The amphidromic systems in the Gulf of Thailand for the M2, S2, K1, and O1 constituents are depicted in Figure 3. As the figure shows, the patterns of co-tidal lines for both the semi-diurnal and the diurnal tides are in quite good agreement with the results of [3], with some exceptions. The amphidromic point for M2 is found near the center of the Gulf of Thailand and rotates in the clockwise direction. Our simulations also produce a nodal band oriented in the west-east direction near the upper north of the Gulf of Thailand, where [3] instead reports an amphidromic point; the reason for this difference is discussed fully in the next section. In [3], the amphidromic system for the S2 constituent is rather similar to that of M2, but an amphidromic point rotating counterclockwise is shifted eastward to the Ca Mau Peninsula, while the tidal regime in the upper north of the Gulf of Thailand remains a nodal band.
For the S2 system, however, we find a nodal band of S2, instead of an amphidromic point, along the southern part of the Gulf of Thailand, and an amphidromic point rotating in the clockwise direction near the one found in the M2 amphidromic system. We therefore compared our result with other solutions to examine this mismatch for S2. We found that our result is actually in good agreement with the solution from OTIS [8] and, especially, with [4]. The clockwise phase-propagation feature of the semi-diurnal tide, which is quite unusual for the Northern Hemisphere, is also discussed fully in the next section.

For the case of diurnal tides, the waves propagate from the northeast of the South China Sea into the Gulf of Thailand. The amphidromic systems are found in the middle region of the Gulf but slightly shifted toward the southern Thai coast, and the phase propagation is very close to that of [3]. Both systems rotate counterclockwise; this is normal and in good agreement with Kelvin wave theory for the Northern Hemisphere.

3.2 Tidal dynamics: Here, the mechanisms of the clockwise phase propagation of the semi-diurnal tide and the counterclockwise phase propagation of the diurnal tide are discussed. For the M2 tide, as can be seen in Figure 3, the clockwise amphidromic system exists close to the tip of the Ca Mau Peninsula, far away from the southern coast of Thailand. To explain this mechanism, the effect of the seabed slope, the influence of the Coriolis force, and the natural oscillation must be considered. The geometry of the Gulf can be characterized as an L-shaped channel whose longest extension points toward the Zhongsha Islands and whose deepest bottom lies in the middle; see also [4]. Supported by the bottom slope, the incoming M2 wave acts as an edge wave and propagates along the southern coast to the northern coast. The propagating edge waves are then reflected at the northern and western coasts, generating a natural oscillation in the east-west direction along the deepest bottom. The M2 tidal wave propagating along the northern coast decreases in amplitude along the eastern coast due to the node of the natural oscillation along the deepest bottom. On the other hand, the rate of amplitude decrease of the M2 tidal wave propagating along the southern coast is not as large as that of the wave propagating along the northern coast. The phase propagation direction of the M2 amphidromic system is therefore clockwise. A large nodal band also appears along the deepest bottom slope of the Gulf, in the direction of the west tip of Borneo; in this region, the interference between the incoming wave from the deep basin and the incoming wave from the Karimata Strait dominates. Another nodal band, located near the upper north of the Gulf of Thailand, is resolved in the current simulation. This nodal band, instead of developing into an amphidromic point as found in [3], results from the interference of the northward tidal wave associated with the clockwise amphidromic system and the southward wave reflected from the Central Plain of Thailand. It is also noted here that the periods of the natural oscillations (defined by $T_i = 2L_b / (i\sqrt{gH})$) in the south-north and east-west directions in the upper North Gulf are much smaller than the period of the M2 tide. For this reason, the semi-diurnal tide should not resonate in this region.
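As a rough check of this resonance argument, the seiche periods T_i can be evaluated for illustrative basin dimensions; the length and depth below are assumed round numbers for the upper Gulf, not the paper's exact values.

```python
# Evaluate natural-oscillation periods T_i = 2*L_b/(i*sqrt(g*H)) (sketch).
import math

g = 9.81
L_b, H = 100e3, 15.0          # assumed basin length (m) and mean depth (m)
T_M2 = 12.42 * 3600.0         # M2 tidal period (s)

for i in (1, 2, 3):
    T_i = 2.0 * L_b / (i * math.sqrt(g * H))
    print(f"mode {i}: T = {T_i/3600:.2f} h  (M2 period = {T_M2/3600:.2f} h)")
```

With these assumed values the fundamental mode is a few hours, well below the 12.42 h M2 period, consistent with the statement that the semi-diurnal tide should not resonate there.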

For the diurnal tides, the phases of the K1 and O1 tidal waves propagate counterclockwise under the influence of the Coriolis force, with Kelvin-wave characteristics. This follows from the fact that the periods of the diurnal tides are longer than those of the semi-diurnal tides, so the effect of the Coriolis force on the K1 and O1 tides is much larger than on the M2 and S2 tides.
3.3 Resolution and bottom friction parameter dependency: In this section, the resolution and bottom friction parameter dependency are presented.
Figure 3. Co-tidal charts for M2, S2, K1 and O1 (top to bottom). Left: the present study. Right: the results from [3]

We have found that when the number of mesh elements is increased from two to five times the original count, the overall profiles (amplitude and phase variations and wave propagation patterns) of the amphidromic systems do not show any significant difference. The bottom friction parameter plays a more important role in changing the magnitude of the amplitudes in the co-tidal charts; however, it does not change the general characteristics of the phases in the co-tidal charts.
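A sensitivity comparison of this kind can be summarized in a single number per constituent by combining amplitude and phase into a complex amplitude. The sketch below is hypothetical and assumes the two meshes' fields have already been interpolated to common comparison points.

```python
# RMS difference between two co-tidal solutions (hypothetical sketch).
import numpy as np

def rms_complex_difference(A1, phi1_deg, A2, phi2_deg):
    """Combine amplitude A and phase phi as A*exp(i*phi) and return the RMS
    magnitude of the difference, capturing both amplitude and phase changes."""
    z1 = np.asarray(A1) * np.exp(1j * np.radians(phi1_deg))
    z2 = np.asarray(A2) * np.exp(1j * np.radians(phi2_deg))
    return np.sqrt(np.mean(np.abs(z1 - z2) ** 2))

# e.g., for the M2 constituent (arrays below are assumed, not from the paper):
# d = rms_complex_difference(A_coarse, phi_coarse, A_fine, phi_fine)
```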



4. CONCLUSION
In this study, the effects of the tidal forcing of the four major constituents M2, K1, S2, and O1 in the Gulf of Thailand and nearby areas are investigated. The results can be summarized as follows:
- The effects of the bottom slope, the period of the incoming tidal waves, and the natural oscillation produce clockwise phase propagation for the M2 and S2 tides but counterclockwise amphidromic systems for the K1 and O1 tides.
- The M2 and S2 tides produce large nodal bands, mainly over the deepest bottom of the southern part of the Gulf, and a narrow nodal band appears near the upper North of the Gulf.



REFERENCES
1. Ye, A.L. and Robinson, I.S., Geophysical Journal of the Royal Astronomical Society, 1983, 72, 691-707.
2. Yu, M., Acta Oceanologica Sinica, 1984, 6, 293-300.
3. Fang, G.H., Kwok, Y.K., Yu, K.J., et al., Continental Shelf Research, 1999, 19, 845-869.
4. Qingwen, M., Yiquan, Q., Ping, S., Haigang, Z., Zijun, G., Chinese Science Bulletin, 2006, 51, Supp. II, 26-30.
5. Chen, C., Liu, H., Beardsley, R.C., J. Atmos. Ocean. Technol., 2003, 20, 159-186.
6. Chen, C., Huang, H., Beardsley, R.C., Liu, H., Xu, Q., Cowles, G., Technical Report 06-0602, 2006, 315 pp., School of Marine Science and Technology, University of Massachusetts-Dartmouth, New Bedford, MA.
7. IOC, IHO and BODC, 2003, "Centenary Edition of the GEBCO Digital Atlas", published on CD-ROM on behalf of the Intergovernmental Oceanographic Commission and the International Hydrographic Organization as part of the General Bathymetric Chart of the Oceans; British Oceanographic Data Centre, Liverpool.
8. Egbert, G.D. and Erofeeva, S.Y., Journal of Atmospheric and Oceanic Technology, 2002, 19 (2), 183-204.



ACKNOWLEDGMENTS
The authors would like to express their gratitude to the Thai National Grid Center and the Large-Scale Simulation Research Laboratory of the National Electronics and Computer Technology Center for providing the HPC resources used in this project.
E00004
ANSCSE14 Mae Fah Luang University, Chiang Rai, Thailand
March 23-26, 2010
Multiphysics Analysis of Gas Turbine Blade Cooling using
Computational Fluid Dynamics (CFD)

A. Srimungkala^1, P. Dechaumphai^2, V. Juntasaro^3,C
^1 Department of Mechanical Engineering, Faculty of Engineering, Kasetsart University, Thailand
^2 Department of Mechanical Engineering, Faculty of Engineering, Chulalongkorn University, Thailand
^3,C Department of Mechanical Engineering, Faculty of Engineering, Kasetsart University, Thailand
E-mail: fengvrj@ku.ac.th; Fax: +66-02579-4576; Tel. +66-892017160



ABSTRACT
The computational fluid dynamics (CFD) simulation of a gas turbine blade is important for predicting the life of the blade. However, the flow occurring in the first stage of the gas turbine blade is complicated and difficult to predict due to the combined effects of turbulence, complex geometry, and multiphysics phenomena. This paper presents a fluid-thermal analysis of the gas turbine blade. The analysis concerns the flow and heat transfer of the internal cooling system inside the turbine blade, computed separately for four components. The predicted results obtained from this study can be further used to estimate the life of the coating and the life of the blade.

Keywords: Computational Fluid Dynamics (CFD), Turbine Blade Cooling, Gas Turbine,
Aerothermal Analysis, Heat Transfer.



1. INTRODUCTION
Gas turbines are playing an increasingly important role throughout the industrialized
world. While these engines are most notably used for aircraft propulsion and land based power
generation, they are also used for marine propulsion, and a variety of other industrial applications.
As the demand for power, in the form of electricity or thrust, continues to increase, engineers must develop engines to meet this demand. The power output can be increased by raising the temperature of the gas entering the turbine. However, increasing the gas temperature must be done cautiously. The temperature of this hot mainstream gas is limited by the turbine components, namely the turbine blades and vanes. The extremely hot gases create excessive thermal stresses and result in premature failure of a blade or vane, which is detrimental to the operation of the engine.
Various cooling techniques have been implemented in the engine design to increase the
life of the turbine components. Air is extracted from the compressor (air which has not passed
through the combustor) and injected into the blades and vanes of the turbine. This cooling air
passes internally through the components. This coolant air removes heat from the blade before it
is expelled out of the blade through discrete holes, known as film cooling holes. This relatively
cool air forms a protective film on the surface of the blade protecting the blade from the hot
mainstream gas. Sophisticated cooling techniques must be employed to cool the components to
maintain the performance requirements. A widely used method for cooling turbine blades is to
bleed lower-temperature air from the compressor and circulate it within and around each blade.
The coolant typically flows through a series of straight ducts connected by 180° turns and roughened with ribs to enhance heat transfer. Figure 1 shows the basic concept of a common cooling technique in a gas turbine. The channel bend complicates the flow physics, and the presence of turbulators (ribs) adds further complexity, since these turbulators produce complex flow fields such as flow separation, reattachment, and secondary flow, which produce a high turbulence level that leads to high heat transfer coefficients.


Figure 1 Conceptual View of Internal Cooling Passage in a Gas Turbine

The aim of this work is to predict the internal cooling flow in a gas turbine blade by using CFD to compute each component of the blade cooling system. Since no experiment is available for the complete cooling system, four test cases are computed separately, and each CFD result is compared with the corresponding component experiment. The accuracy of the fluid flow and heat transfer analysis is then assessed with respect to the problem of predicting the life of the gas turbine blade.

2. Governing Equations
Transport Equations
For steady-state turbulent flows, the averaged governing equations in an arbitrary coordinate system are written as

$$\frac{\partial}{\partial x_i}\left(\rho u_i\right) = 0 \qquad (1)$$

$$\frac{\partial}{\partial x_j}\left(\rho u_i u_j\right) = -\frac{\partial P}{\partial x_i} + \frac{\partial \tau_{ij}}{\partial x_j} + \frac{\partial}{\partial x_j}\left(-\rho\,\overline{u_i' u_j'}\right) \qquad (2)$$

where P is the static pressure and the stress tensor $\tau_{ij}$ is given by

$$\tau_{ij} = \mu\left[\left(\frac{\partial u_i}{\partial x_j} + \frac{\partial u_j}{\partial x_i}\right) - \frac{2}{3}\,\delta_{ij}\,\frac{\partial u_l}{\partial x_l}\right] \qquad (3)$$

The Reynolds stresses, $-\rho\,\overline{u_i' u_j'}$, must be modeled in order to close the momentum equations. The most popular method is to employ the Boussinesq hypothesis to relate the Reynolds stresses to the mean velocity gradients:

$$-\rho\,\overline{u_i' u_j'} = \mu_t\left(\frac{\partial u_i}{\partial x_j} + \frac{\partial u_j}{\partial x_i}\right) - \frac{2}{3}\left(\rho k + \mu_t\frac{\partial u_l}{\partial x_l}\right)\delta_{ij} \qquad (4)$$
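For a concrete reading of Eq. (4), the sketch below builds the modeled Reynolds-stress tensor from a prescribed mean-velocity-gradient tensor; the numerical values of the gradient, mu_t, rho, and k are arbitrary placeholders.

```python
# Evaluate the Boussinesq relation of Eq. (4) for a given gradient (sketch).
import numpy as np

def boussinesq_stress(grad_u, mu_t, rho, k):
    """Return the modeled -rho*<u_i' u_j'> tensor per Eq. (4)."""
    S = mu_t * (grad_u + grad_u.T)                      # mu_t*(du_i/dx_j + du_j/dx_i)
    iso = (2.0 / 3.0) * (rho * k + mu_t * np.trace(grad_u)) * np.eye(3)
    return S - iso

grad_u = np.array([[0.0, 10.0, 0.0],    # simple shear dU/dy = 10 1/s (assumed)
                   [0.0,  0.0, 0.0],
                   [0.0,  0.0, 0.0]])
print(boussinesq_stress(grad_u, mu_t=1e-3, rho=1.2, k=0.5))
```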

3. NUMERICAL METHOD
The computational fluid dynamics software FLUENT is employed in this work. The calculation domain is divided into discrete control volumes by an unstructured grid, which has high flexibility to fit complex geometry. The governing equations are discretized using the finite volume method. The pressure-velocity coupling is achieved through the SIMPLE algorithm. The discretized equations are solved using pointwise Gauss-Seidel iterations. The nonlinear k-ε turbulence model of Craft et al. is implemented separately in FLUENT using a user-defined function.
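The pointwise Gauss-Seidel iteration mentioned above can be sketched on a generic discretized system Ax = b; the tridiagonal system below is a simple stand-in for a finite-volume equation set, not FLUENT's actual solver.

```python
# Pointwise Gauss-Seidel on a small tridiagonal system (illustration only).
import numpy as np

def gauss_seidel(A, b, iters=200):
    x = np.zeros_like(b)
    n = len(b)
    for _ in range(iters):
        for i in range(n):   # sweep point by point, using freshly updated values
            s = A[i, :i] @ x[:i] + A[i, i + 1:] @ x[i + 1:]
            x[i] = (b[i] - s) / A[i, i]
    return x

n = 10
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)   # 1-D diffusion-like matrix
b = np.ones(n)
print(gauss_seidel(A, b))
```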

4. RESULTS AND DISCUSSION
The first test case is the fully-developed turbulent flow through a rotating square duct. The results are compared with the DNS data of Martensson (2005) at a Reynolds number of 4400, based on the bulk velocity and the hydraulic diameter. Figure 2 shows the duct with the reference axes: the x axis designates the streamwise direction, the cross-stream direction is parallel to the y axis, and the spanwise direction is parallel to the z axis. The rotation number is Ro = 0.055.

Figure 2 Geometry of the fully-developed turbulent flow through a rotating square duct.

The dimensionless mean streamwise velocity profiles at z/h = 0.5 and Ro = 0.055 in Figure 3 show that the nonlinear k-ε turbulence model and DES give the same results and perform better than the LES model in both cases.

Figure 3. The dimensionless mean streamwise velocity profiles at Ro = 0.055 for (a) z/h = 0.5 and (b) y/h = 0.5

The second test case is turbulent flow and heat transfer in a rectangular channel with a turbulator. The results are compared with the data of Acharya (1998) at a Reynolds number of 14,000 based on the bulk velocity and the hydraulic diameter. Figure 4(a) shows the geometry of the turbulent flow and heat transfer in a rectangular channel with a turbulator. The rib turbulator width (w) and height (h) are both 6.35 mm, the distance from the inlet to the rib turbulator is 15h, the distance from the rib turbulator to the outlet is 30h, the input uniform heat flux is 280 W/m^2, and the height of the rectangular channel (H) is 61 mm.
The comparison with the measured data in Figure 4(b) shows that the linear k-ε model gives an accurate result in the first range of x/h, up to x/h = 6; beyond this range the k-omega and RSM models are more accurate.


Figure 4. (a) Geometry of turbulent flow and heat transfer in a rectangular channel with a turbulator and (b) comparison between calculated and measured Nusselt numbers

The third test case is turbulent flow in stationary and rotating two-pass rectangular channels (Figure 7). The results are compared with the experimental data of Iacovides (1996). The upstream and downstream bend lengths are assumed to be 1.5D and 6D, respectively. The Reynolds number (Re = U_b D/ν), based on the width of the duct and the streamwise bulk velocity, is 100,000. The rotational axis is 4.5D away from the bend axis. The rotation number (Ro = ΩD/U_b) is 0.2. No-slip boundary conditions are applied on the inner, outer, and lateral walls.
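As a back-of-envelope check of these definitions, the script below recovers Re and Ro from assumed duct dimensions and speeds; D, U_b, and the derived ν and Ω are illustrative values chosen only so the numbers come out at the stated targets.

```python
# Nondimensional numbers for the two-pass channel case (illustrative values).
D = 0.05                         # duct width (m), assumed
U_b = 2.0                        # streamwise bulk velocity (m/s), assumed
nu = U_b * D / 1.0e5             # kinematic viscosity implied by Re = 100,000
Omega = 0.2 * U_b / D            # angular speed implied by Ro = Omega*D/U_b = 0.2

print(f"Re = {U_b * D / nu:.0f}, Ro = {Omega * D / U_b:.2f}")
```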

Figure 7. Geometry of the two-pass rectangular channels

Figure 8 compares the mean streamwise velocity predictions and measurements at θ = 0° and 90°. All turbulence models give results close to the experimental data, but LES is more accurate in case (c), Ro = 0.2 and θ = 0°.



Figure 8. Comparisons of the mean streamwise velocity predictions and measurements: (a) Ro = 0 at θ = 0°, (b) Ro = 0 at θ = 90°, and (c) Ro = 0.2 at θ = 0°

The fourth test case is turbulent flow for film cooling of a gas turbine blade (Figure 9). The results are compared with the experimental data of Gritsch (1998). The boundary conditions were chosen to match the experimental test case of Gritsch et al. as closely as possible. This study is carried out for an internal (coolant) Mach number of Ma_c = 0 and an external (mainstream) Mach number of Ma_m = 0.6. The total temperature is 540 K at the primary channel inlet and 290 K at the secondary channel inlet. Thus, the coolant-to-main flow temperature ratio is 0.54, which can be assumed to be representative of typical gas turbine applications. The total pressure in the plenum was set to 109,750 Pa, the total pressure at the main flow inlet is 100,400 Pa, and the static pressure at the outlet is 68,000 Pa.


Figure 9 Computational domain of a 3-D gas turbine end wall with one fan-shaped cooling hole

The results from DES and LES are the same and follow the same trend as the linear k-ε model; all three models are quite accurate against the experimental data, as shown in Figure 10.

Figure 10 Comparison of computed centerline film cooling effectiveness

CONCLUSION
A study of the multiphysics analysis of gas turbine blade cooling using computational fluid dynamics is summarized as follows. For fully-developed turbulent flow through a rotating square duct, the fluid swirl inside the square duct makes prediction more difficult; in this case the nonlinear model and DES are better choices than LES. For turbulent flow in stationary and rotating two-pass rectangular channels, the flow at θ = 0° is fully developed; at θ = 90° the fluid velocity increases strongly at the inner surface, and at S/D = 3 the velocity is high at the outer surface. Using the linear k-ε model, LES, and DES, the results do not differ much, but LES is more accurate at Ro = 0.2 and θ = 0°. For turbulent flow in film cooling of a gas turbine blade, the results obtained with the linear k-ε model, LES, and DES agree fairly well from x/d = 4 to x/d = 10.

REFERENCES
1. Craft, T.J., Launder, B.E., and Suga, K., 1996. Development and Application of a Cubic Eddy-Viscosity Model of Turbulence. International Journal of Heat and Fluid Flow, 17, 108-115.
2. Gritsch, M., Schulz, A., and Wittig, S., Adiabatic wall effectiveness measurements of film cooling holes with expanded exits, ASME J. Turbomach., 120 (1998), 549-556.
3. Cheah, S.C., Iacovides, H., Jackson, D.C., Ji, H., and Launder, B.E., 1996. LDA investigation of the flow development through rotating U-ducts. J. Turbomach., 118 (3), 590-596.
4. Suga, K., 2003. Predicting turbulence and heat transfer in 3-D curved ducts by near-wall second moment closures. Int. J. Heat Mass Transfer, 46, 161-173.
5. Gururatana, S., Juttijudata, V., and Juntasaro, V., 2006. Evaluation of turbulence models for combined effects of rotating and secondary flows in square duct. Submitted to ASME Journal of Fluids Engineering.
6. Acharya, S., Dutta, S., and Myrum, T.A., Heat transfer in turbulent flow past a surface-mounted two-dimensional rib. ASME Journal of Heat Transfer, 120 (1998), 724-734.


ACKNOWLEDGMENTS
This research work is partly supported by the Thai National Grid Project and the Electricity Generating Authority of Thailand (EGAT). The financial support from the Kasetsart University Research and Development Institute (KURDI), the Commission on Higher Education, and the Thailand Research Fund (TRF) for Senior Scholar Professor Pramote Dechaumphai and Scholar Associate Professor Varangrat Juntasaro is also acknowledged.
E00005
ANSCSE14 Mae Fah Luang University, Chiang Rai, Thailand
March 23-26, 2010
Computational Study of Totally Enclosed Fan Cooled
System in an Electric Induction Motor


J. Soparat, C. Benyajati, N. Pitaksapsin, P. Wattanawongsakun and A. Phuchamnong
National Metal and Materials Technology Center, 114 Thailand Science Park, Pahonyothin Rd., Klong 1, Klong Luang, Pathumthani 12120
E-mail: jenwits@mtec.or.th; Fax: 02-5646370; Tel. 02-5646500 ext 4357


ABSTRACT
In order to use an electric induction motor to power an automotive vehicle, the consequential heat in the motor is one of the key issues. Generally, an induction motor may operate under high load for extensive periods. The heat generated in the motor can damage the motor or its parts, subsequently decreasing their useful lifetime. One example is the overheating of the electrical insulators coated on the stator wires, which can cause an electrical short circuit and motor failure. A Totally Enclosed Fan Cooled (TEFC) system is often used in small induction motors. Its general configuration consists of a fan attached to the rear side of the housing case. The rotation of the fan generates an air flow along the surface of the housing case to remove heat from the induction motor. The objective of the current study is to demonstrate that the capacity of an induction motor can be improved by using a higher-efficiency fan cooling system. The study is concerned with the effect of the air flow generated by the fan, which in turn affects the heat transfer of the induction motor. The main parameters of the TEFC cooling system considered here are the fan and end cap geometries. For each case, the effect on the heat transfer of the induction motor is simulated. The simulation results are compared with those obtained experimentally from a prototype 3-phase, 4-pole induction motor rated at 5 horsepower. Finally, the studied parameters are discussed to compare the effects of their variation.

Keywords: Fan cooled, convection, induction motor


1. INTRODUCTION
Due to the demanding nature of various applications, modern induction motors need to come in a compact size while performing with high efficiency and being capable of running at high speed. All these demands imply a higher working temperature of the stator wiring. As a result, computational analysis becomes an important tool in enhancing the working performance of existing motor models. In the current study, a thermal analysis was carried out by means of a numerical method. The calculated results were then compared with those obtained experimentally from an actual induction motor.
The objective of the current study was to demonstrate that the capacity of a TEFC induction motor could be improved by using a higher-efficiency fan cooling system. The air flow generated by the fan and its effect on the heat transfer of the induction motor were studied. The fan and end cap geometries were the main parameters of interest. The simulation results were compared with those obtained experimentally from a prototype 3-phase, 4-pole induction motor rated at 5 horsepower.


2. THEORY AND RELATED WORKS
Previously, there have been various studies on the thermal performance of electrical motors using a thermal network method [1,2,3]. The majority of the proposed models showed that the highest temperature occurs in the wiring on the stator. However, one limitation of such methods was that no detail of the temperature distribution over the motor parts was provided. Also, the part geometry was not taken into account either.
These issues become apparent when the convection effect needs to be determined for a motor with a finned housing or other configurations. This was shown experimentally by Farsane et al. [4], who employed a Laser Doppler Anemometry technique to measure the air velocity around the motor housing. It was reported that the shape of the motor housing had a significant effect on both the air velocity and the flow pattern along the housing. Thus, the performance of a motor cooling system could be directly affected by the design of the housing and other parts.


3. EXPERIMENTAL AND COMPUTATIONAL DETAILS

Experimental - induction motor efficiency measurement
The induction motor efficiency testing was carried out according to the international standards IEEE 112 and IEC 34-2 [5]. Generally, the test motor was connected to a dynamometer which acted as a controlled load for the system. The load, or the torque required to operate the dynamometer, could be varied by means of a current adjustment while the rotational speed of the motor was kept at its rated value. Thermocouples were attached on the outer surface of the housing to monitor the corresponding temperature at various locations. Additionally, an infrared camera was employed to capture the overall temperature distribution pattern at constant intervals during the test. Each test was carried out under uniform conditions until thermal equilibrium was observed.


Figure 1. Setup of the induction motor efficiency measurement

Computational analysis - convection simulation on the TEFC motor
A computational model for the convection analysis of the induction motor is displayed in Figure 2. The model mainly consisted of a finned motor housing case, a fan, and a fan cap. All components were assembled and placed within a cube-shaped space filled with air. The total air region was set to 2 x 3 m with a height of 1 m. The air properties were assumed to be at atmospheric conditions. The air velocity was generated by the movement of the fan, which was mounted at the rear of the motor. The elements of the fan and the nearby air were set to be the smallest, approximately 3 mm, in order to resolve the steep gradients in the impeller region. Similarly, all spaces in the vicinity of the vessel walls were meshed with a dense grid. The total mesh comprised approximately 400,000 nodes and 2,000,000 elements. The fan speed was set at 1,450 RPM in the clockwise direction, in accordance with the experiment. A heat flux boundary condition was applied on the inner surface of the motor housing where it contacts the stator core. Referring to the motor efficiency test results, a heating value of 9800 W/m^2 was used. Furthermore, a k-ε turbulence model was employed in the fluid dynamics part of the analysis.
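As a rough consistency check (not part of the CFD model itself), the applied heat flux can be related to an average convective coefficient through Newton's law of cooling, q = h(T_s - T_air); the surface and air temperatures below are assumed illustrative values.

```python
# Rough energy-balance sketch relating the heat-flux boundary condition to an
# implied average convective coefficient (illustration only).
q = 9800.0      # applied heat flux (W/m^2), from the motor efficiency test
T_air = 30.0    # ambient air temperature (deg C), assumed
T_s = 90.0      # representative housing surface temperature (deg C), assumed

h_conv = q / (T_s - T_air)
print(f"implied average convective coefficient: {h_conv:.0f} W/(m^2 K)")
```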


Figure 2. A computational model for a convection analysis of the induction motor


4. RESULTS AND DISCUSSION

Motor efficiency testing results
Total heat loss occurred in an induction motor could be considered to comprise four
different heat sources i.e. stator, rotor, friction, and stray losses. Variation of losses
determined under three different loading conditions, 100%, 125%, and 150%, is displayed in
Figure 3.

Figure 3. Losses in an induction motor at 100% , 125% and 150% of rated loads

The resulting total loss was found to be directly proportional to the applied load. It can be seen from the results that the majority of the heat came from the stator and rotor components. These values include the resistive losses occurring in the copper wiring and the induction bars of the stator and rotor, respectively. The obtained heat losses were used as a boundary condition for the thermal analysis. On the other hand, the friction loss seemed to be minimally affected by the load variation.

Computational Results
The results from the computational analysis mainly consisted of the resulting air flow pattern around the motor and the corresponding temperature distribution on the outer surface of the housing. Three different cases were considered: the normal motor configuration, variation of the number of fan blades, and variation of the end cap. For the initial case, in which the motor used in the calculation was identical to that employed in the experiment, the calculated air flow pattern is shown in Figure 4. It can be seen that the air velocity was relatively high near the tips of the fan blades and around the gap between the end cap and the housing.
Furthermore, a velocity vector plot indicated that the movement of the fan drew the surrounding air through the end cap before propelling it onto the housing surface towards the front end of the motor (Figure 4b). Areas of circulating air flow were also observed in the frontal part of the motor. This might explain the relatively high temperature zone observed in this area via the thermal camera (Figure 5).


Figure 4. The computed velocity of the original model: (a) velocity contour, (b) velocity vector


Figure 5. The measured temperature of the original model at 100% rated load.



Figure 6. The air velocity along the housing between the fins: (a) measurement positions, (b) comparison of velocities

The profiles of the generated air velocity along the housing between the fins at three different locations are displayed together in Figure 6. Generally, the air velocity was relatively high coming out of the end cap at the rear side of the housing before dropping rapidly to a much lower value and steadily declining towards the front part of the housing. However, the velocity profile from the bottom part of the motor (location 3) seemed to remain high over a longer distance compared to the other locations. This could be due to the tunnel-like air path that the motor housing formed with the floor. The velocity profile at the top part of the motor (location 1) displayed another significant dip around halfway along the housing, most likely because of the presence of a mounting holder for a carrying hook, which effectively acted as an air flow blockage. Furthermore, good agreement could be seen in the velocity profile comparison between the calculated and measured results at location 2, as shown in Figure 7.


Figure 7. The velocity comparison between calculated and measured results for the original model

A comparison of the resulting temperature distribution on the housing surface obtained via computational and experimental means is shown in Figure 8. It can be seen that the computational results displayed a distribution pattern similar to that observed in the experiment, despite some discrepancies in the temperature values. This is supported by the temperature profile comparison between the computational and experimental results shown in Figure 9 for location 2. Good agreement between the two sets of results can be clearly seen. The discontinuity in the experimental temperature profile was due to the technical difficulty an infrared camera has in measuring the temperature of reflective surfaces, such as the adhesive tapes that were used to attach the thermocouples to the housing surface and all the wiring.


Figure 8. The temperature distribution comparison between simulation and measurement


Figure 9. The temperature profile comparison at location 2 of the original model

After verifying that the computational model shown in Figure 2 could satisfactorily predict the convection phenomena occurring on the TEFC induction motor, the effect of the number of fan blades on the corresponding heat transfer was studied. The original fan was modified using commercial CAD software such that five different numbers of fan blades were studied, i.e. 3, 5, 8, 10, and 15 blades. The calculated air velocity and surface temperature profiles along the housing for each case are shown in Figure 10. All the results were taken at location 2. For all numbers of fan blades considered, the air velocity results displayed a trend similar to that obtained for the original case of 10 blades. However, a small difference between the sets of results could be seen, such that the higher the number of blades, the higher the resulting air velocity. Similarly, a small difference between the temperature results could also be seen: a higher number of blades yielded a slightly lower surface temperature of the housing. Furthermore, a significant difference between the temperatures at the front and the rear sections of the housing can be seen. This was most likely due to the rapid drop in the air velocity profile as the air flowed from the end cap to the front section of the housing, as explained earlier.


Figure 10. The temperature profile comparison for fan blade numbers of 3, 5, 8, 10 and 15

In order to study the effect of the end cap on the heat transfer of the induction motor, the end cap itself was removed from the computational model. The resulting air flow and temperature results were compared with those obtained from the original case, as shown in Figure 11. The distinction between the cases with and without the end cap can be clearly observed. Without the end cap, there seemed to be almost no air flowing along the housing compared to the case with an end cap present. This is displayed graphically by the streamline plot shown in Figure 12: without the end cap, the air was simply propelled outwards in the radial direction rather than being guided onto the housing. This resulted in a much higher housing surface temperature.

Figure 11. The velocity profile (a) and temperature profile (b) comparison between the original model and the no-end-cap model


Figure 12. The streamlines generated by the rear fan: (a) no end cap, (b) original model

5. CONCLUSIONS
A computational study was carried out to investigate the convective heat transfer phenomena in an electric induction motor. A 3-phase, 4-pole TEFC induction motor rated at 5 horsepower was chosen as the model of interest. The comparison between the analysis results and those observed from the experiment, under the same settings, was found to be satisfactory. The small discrepancies present, especially in the corresponding temperature distribution, were probably due to a lack of knowledge of the exact heat sources inside the motor. Nonetheless, it was demonstrated that the resulting air flow pattern was very important, and that the shape of the housing could also have a significant effect on the flow pattern.

The analysis of the effect of the fan blades showed that the number of fan blades seemed to have only a slight effect on the convection process. Another possible relevant parameter that would be interesting to investigate further is the shape of the blades themselves. Furthermore, the importance of the end cap as part of the heat removal system in the TEFC configuration was highlighted: it is the geometry formed between the end cap and the housing that generates an air flow sufficient to cool the motor.

Even though no practical recommendation regarding motor design can be drawn at present, the computational approach has been shown to be an effective tool for investigating the heat transfer process of an induction motor. It is therefore strongly recommended as a main design tool, either for inventing a new type of motor or for improving the performance of existing ones.


REFERENCES

1. Y. Huai, R.V.N. Melnik, P.B. Thogersen, Computational Analysis of Temperature Rise Phenomena in Electric Induction Motors, Applied Thermal Engineering, 2003, 23, 779-795.
2. M.K. Yoon, S.K. Kauh, Thermal Analysis of a Small, Totally Enclosed, Fan-Cooled Induction Motor, Heat Transfer Engineering, 2005, 26(4), 77-86.
3. J. Bellettre, V. Sartre, F. Biaist, A. Lallemand, Transient State Study of Electric Motor Heating and Phase Change Solid-Liquid Cooling, Applied Thermal Engineering, 1997, 17(1), 17-31.
4. K. Farsane, P. Desevaux, P.K. Panday, Experimental Study of the Cooling of a Closed Type Electric Motor, Applied Thermal Engineering, 2000, 20, 1321-1334.
5. Anonymous, IEEE Standard Test Procedure for Polyphase Induction Motors and Generators, IEEE Std 112-1996, The Institute of Electrical and Electronics Engineers, USA, 1996.
6. Y. Jaluria, Design and Optimization of Thermal Systems, 2nd Edition, CRC Press, USA, 753 pp.
7. S. Poncet, E. Serre, High-Order LES of Turbulent Heat Transfer in a Rotor-Stator Cavity, International Journal of Heat and Fluid Flow, 2009, 30, 590-601.
8. J. Saari, Thermal Analysis of High-Speed Induction Machines, Acta Polytechnica Scandinavica, Electrical Engineering Series, 1998, 90.


ACKNOWLEDGMENTS

The authors specially thank the National Metal and Materials Technology Center (MTEC), KMITL, and Thammasat University for all of their support during this study.
E00006
ANSCSE14 Mae Fah Luang University, Chiang Rai, Thailand
March 23-26, 2010
Numerical Simulation of Two-Phase Flows and Heat
Transfer in Continuous Steel Casting Process

T. Mookum^1, B. Wiwatanapataphee^1,C, Y.H. Wu^2, and S. Orankitjaroen^1
^1 Department of Mathematics, Faculty of Science, Mahidol University, 272 Rama 6 Road, Bangkok 10400, Thailand
^2 Department of Mathematics and Statistics, Curtin University of Technology, Perth, WA 6845, Australia
^C E-mail: scbww@mahidol.ac.th; Fax: 02-2015343; Tel. 02-2015541



ABSTRACT
In this paper, we develop a mathematical model of lubricant oil-molten steel flow and heat transfer in the continuous casting process under an electromagnetic force. The level set method is applied to solve the two-phase flow problem. The influence of the two-phase flow on the transport of momentum is modeled by the addition of the surface tension force. The complete set of field equations is established and solved numerically. The influence of the electromagnetic force on the flow patterns of the lubricant oil and molten steel, on the meniscus shape, and on the steel solidification is presented in the paper.

Keywords: Continuous Casting, Two-Phase Flows, Heat Transfer, Level Set Method.






E00007
ANSCSE14 Mae Fah Luang University, Chiang Rai, Thailand
March 23-26, 2010
Forecasting Tropical Cyclone Movement by Neural Network


W. Kanbua^1,C, C. Khetchaturat^2 and K. Visuthsiri^3
^1 Marine Meteorological Center, Thai Meteorological Department, Bangkok 10260, Thailand
^2 Department of Mathematics, Faculty of Science, Kasetsart University, Bangkok, Thailand
^3 Department of Mathematics, Faculty of Science, Mahidol University, Bangkok, Thailand
^C E-mail: wattkan@gmail.com; Fax: 023669375; Tel. 023994561


ABSTRACT
In this study, a decision-making approach for forecasting tropical cyclone movements using a neural network is adopted. An artificial neural network (ANN) was developed as a learning mechanism for track prediction through training with a huge sample of tropical cyclone tracks from 1945 to 2008. The prediction of tropical cyclone movement must consider how to appropriately capture the physical change of the environment that the tropical cyclone passes through. Parallel comparative analyses of the forecasting ability of the ANN against multiple forecast methods were conducted for unusual tropical cyclone tracks in recent years. Several types of cases were employed in the comparison, consisting of many tropical cyclones over several years, characterized by long lifetimes over the ocean and over the mainland with severe effects on the Philippines, China, Vietnam, and Thailand, as well as tropical cyclone tracks with landfall that were born in the Northwestern Pacific and the South China Sea, including 24-hour-ahead and 48-hour-ahead forecasts. The decision-making model developed in this study was trained on tropical cyclone data from 1945-2009 and tested using tropical cyclone forecast verification data from 2001-2009. The model has been very successful in predicting cyclone movement in Thailand's area of responsibility.

Keywords: Neural Network, Tropical Cyclone, GTS, Tropical Cyclone Movement.

1. INTRODUCTION
Prediction of the tracks of tropical cyclones is one of the most difficult and challenging problems of current international tropical cyclone research. Many places around the world are exposed to tropical cyclones (TCs) and associated storm surges. The focal point of this research is to minimize the forecast errors to the extent that the forecast can be used effectively for issuing appropriate warnings for disaster management purposes. In spite of massive efforts, a great number of people die each year as a result of tropical cyclone events. To mitigate this damage, improved forecasting techniques must be developed. The level of importance is reflected in the large number of forecast techniques that have been developed using a wide range of approaches, from empirical through statistical to dynamical. However, due to the complexity of the problem, no single technique has proven to have outstanding performance relative to the others. The technique presented here uses artificial neural networks to predict tropical cyclone movement. A multi-layer neural network, resembling the human visual system, was trained to forecast the movement of cyclones based on historical best tracks. The trained network produced correct directional forecasts for the test TC tracks, thus showing good generalization capability. The results indicate that multi-layer neural networks could be further developed into an effective tool for tropical cyclone track forecasting using various types of models.


2. THEORY AND RELATED WORKS
In this paper we use an Artificial Neural Network (ANN), a parallel and dynamic system of highly interconnected interacting parts based on neurobiological models. The biological nervous system consists of individual but highly interconnected nerve cells called neurons, which typically receive information or stimuli from the external environment. Similar to its biological counterpart, an ANN is designed to emulate the human pattern recognition function through parallel processing of multiple inputs; that is, ANNs have the ability to scan data for patterns and can be used to construct non-linear models. Multi-Layer Perceptron ANNs have become widespread in recent years. Three-layer networks with a sufficient number of hidden nodes are usually applied due to the continuity of the relevant function. Every network contains an appropriate number of input and output nodes, equal to the number of input and output variables, and an assumed number of hidden nodes. There is no effective rule for estimating the number of hidden nodes.

Figure 1. Multi-Layer Perceptron Artificial Neural Network scheme (input layer, hidden layers, output layer)

A multi-layer feed-forward neural network can describe complex data, with more layers used as the problem becomes more complex. The training process amounts to finding the combination of link weights; one popular method for this is backpropagation.

Figure 2. Backpropagation Artificial Neural Network scheme

The backpropagation algorithm works in much the same way as the name suggests: after propagating an input through the network, the error is calculated, and the error is propagated back through the network while the weights are adjusted in order to make the error smaller. We explain the algorithm here for fully connected ANNs, but the theory is the same for sparsely connected ANNs. Although we want to minimize the mean square error over all the training data, the most efficient way of doing this with the backpropagation algorithm is to train on the data sequentially, one input at a time, instead of training on the combined data. This means that the order in which the data are given matters, but it also provides a very efficient way of avoiding getting stuck in a local minimum.

We now describe the backpropagation algorithm in sufficient detail to allow an implementation from this explanation.

First the input is propagated through the ANN to the output. After this, the error $e_k$ on a single output neuron $k$ can be calculated as

$$e_k = d_k - y_k \qquad (1)$$

where $y_k$ is the calculated output and $d_k$ is the desired output of neuron $k$. This error value is used to calculate a $\delta_k$ value, which is in turn used for adjusting the weights. The $\delta_k$ value is calculated by

$$\delta_k = e_k \, g'(y_k) \qquad (2)$$

where $g'$ is the derivative of the activation function; the need to calculate this derivative is why a differentiable activation function is required.

When the $\delta_k$ value is calculated, we can calculate the $\delta_j$ values for the preceding layers. The $\delta_j$ values of the previous layer are calculated from the $\delta_k$ values of this layer by the following equation:

$$\delta_j = g'(y_j) \sum_{k=0}^{K} \delta_k \, w_{jk} \qquad (3)$$

where $K$ is the number of neurons in this layer, and $\eta$ (appearing in Eq. (4) below) is the learning rate parameter, which determines how much the weight should be adjusted. More advanced gradient descent algorithms do not use a fixed learning rate, but a set of more advanced parameters that make a more qualified guess at how much the weight should be adjusted.

Using these values, the $\Delta w$ values by which the weights should be adjusted can be calculated by

$$\Delta w_{jk} = \eta \, \delta_k \, y_j \qquad (4)$$

The $\Delta w_{jk}$ value is used to adjust the weight $w_{jk}$ by $w_{jk} \leftarrow w_{jk} + \Delta w_{jk}$, and the backpropagation algorithm moves on to the next input, adjusting the weights according to the output. This process goes on until a certain stop criterion is reached. The stop criterion is typically determined by measuring the mean square error of the training data during training; when this mean square error reaches a certain limit, the training is stopped. More advanced stopping criteria involving both training and testing data are also used.
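A compact sketch of the per-sample update defined by Eqs. (1)-(4) is given below for one hidden layer with a tanh activation and a linear output layer. The layer sizes match Table 1, but the learning rate and the random training arrays are placeholders, and this is an illustration rather than the authors' code.

```python
# Sequential backpropagation per Eqs. (1)-(4) (illustrative sketch).
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid, n_out, lr = 9, 18, 3, 0.001    # sizes as in Table 1; lr assumed here

W1 = rng.normal(0.0, 0.1, (n_in, n_hid)); b1 = np.zeros(n_hid)
W2 = rng.normal(0.0, 0.1, (n_hid, n_out)); b2 = np.zeros(n_out)

X = rng.random((100, n_in))   # placeholder inputs (lat/lon/wind at 0/+6/+12 h)
D = rng.random((100, n_out))  # placeholder desired outputs (+18 h values)

for x, d in zip(X, D):                           # one sample at a time
    y_h = np.tanh(x @ W1 + b1)                   # hidden layer, g = tanh
    y = y_h @ W2 + b2                            # linear output layer
    e = d - y                                    # Eq. (1): e_k = d_k - y_k
    delta_k = e                                  # Eq. (2): g'(y) = 1 for linear g
    delta_j = (1.0 - y_h**2) * (W2 @ delta_k)    # Eq. (3): tanh' = 1 - tanh^2
    W2 += lr * np.outer(y_h, delta_k)            # Eq. (4): dw_jk = lr*delta_k*y_j
    b2 += lr * delta_k
    W1 += lr * np.outer(x, delta_j)
    b1 += lr * delta_j
```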


3. EXPERIMENTAL
We used the backpropagation algorithm to train the network, with the input and output nodes as follows:
Table 1. Input and output layers

Input layer              Output layer
Latitude 0 hr.           Latitude +18 hr.
Latitude +6 hr.          Longitude +18 hr.
Latitude +12 hr.         Max Wind Speed +18 hr.
Longitude 0 hr.
Longitude +6 hr.
Longitude +12 hr.
Max Wind Speed 0 hr.
Max Wind Speed +6 hr.
Max Wind Speed +12 hr.

The network has 9 input nodes and 18 hidden nodes with the hyperbolic tangent activation function, and 3 output nodes with a linear activation function. The number of iterations is 300 and the learning rate is 0.0010.
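One way to reproduce this configuration with an off-the-shelf library is sketched below using scikit-learn's MLPRegressor (18 tanh hidden nodes, linear outputs, learning rate 0.001, 300 iterations). The training arrays are placeholders, and the original study used its own backpropagation training rather than this library.

```python
# Equivalent network configuration in scikit-learn (sketch, placeholder data).
import numpy as np
from sklearn.neural_network import MLPRegressor

# X: rows of [lat, lon, wind] at 0/+6/+12 h; y: [lat, lon, wind] at +18 h.
X = np.random.rand(200, 9)
y = np.random.rand(200, 3)

model = MLPRegressor(hidden_layer_sizes=(18,), activation="tanh",
                     solver="sgd", learning_rate_init=0.001,
                     max_iter=300, random_state=0)
model.fit(X, y)

rmse = np.sqrt(np.mean((model.predict(X) - y) ** 2))
print(f"training RMSE = {rmse:.4f}")
```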

4. RESULTS AND DISCUSSION
In this paper we used tropical cyclone track data for the Western Pacific Ocean area, shown in Figure 3. For the latitude output, the training and testing sets are shown in Figure 4. The Root Mean Square Error (RMSE) is 0.5438 on the training data and 0.6866 on the testing data.


Figure 3. Tropical cyclone tracks in the Western Pacific Ocean and Northern Indian Ocean: (a) year 2008, (b) year 2009


Figure 4. Time series of training set (a) and testing set (b)

Figure 5. Scatter plot of training set (a) and testing set (b)

For the longitude output, the training and testing sets are shown in Figure 6. The Root Mean Square Error (RMSE) is 0.6644 on the training data and 1.0128 on the testing data.



Figure 6. Time series of training set (a) and testing set (b)



Figure 7. Scatter plot of training set (a) and testing set (b)

For the maximum wind speed output, the training and testing sets are shown in Figure 8. The Root Mean Square Error (RMSE) is 6.86565 on the training data and 10.2909 on the testing data.


(a)

(b)
Figure 8. Time series of training set (a) and testing set (b)



Figure 9. Scatter plot of training set (a) and testing set (b)

Table 2. Track forecasting example

Row   Latitude   Longitude   Date        Wind speed   Status
10    13.40      112.40      04/16/06Z   65           Typhoon
11    13.70      112.40      04/16/12Z   75           Typhoon
12    14.70      112.30      04/16/18Z   75           Typhoon
13    15.20      112.10      04/17/00Z   80           Typhoon

Table 2 describes the tropical cyclone (TC) information: row 10 is 0 hr., row 11 is +6 hr., row 12 is +12 hr., and row 13 is +18 hr., the time for which we want to forecast the location of the TC.



Figure 10. The application software interface.







Table 3. Comparison of the forecast results with the real data

Data +18 hr.      Real Data   Forecast
Latitude          15.20       16.02
Longitude         112.10      112.93
Max Wind Speed    80          69.43

Table 3 shows the forecast TC location 18 hr. ahead, with the forecast latitude, longitude, and maximum wind speed compared against the observations.

5. CONCLUSION
Tropical cyclones (TCs) are fundamental to the everyday weather of the mid-latitudes. They provide essential rainfall for human activities such as agriculture, but can also cause large amounts of damage through their strong winds and heavy precipitation. It is therefore very important that tropical cyclones are predicted as accurately and as far in advance as possible, using the many forecast techniques that have been developed with a wide range of approaches, from empirical through statistical to dynamical. However, due to the complexity of the problem, no single technique has proven to have outstanding performance relative to the others. This paper proposed a soft-computing approach, namely a neural network technique, to forecast tropical cyclone movement. The aim of this paper is to explore the prediction of tropical cyclones by ANN. When we compare the computed output with the observed best tracks, the results are good enough and reasonable for the direction of movement of the tropical cyclone, but the wind speed forecasting output is not good enough to be used properly. The reason is that the large-scale circulation is a key factor in determining the movement of TCs, and wind data at mid-troposphere correlate best with both the direction and the speed of TC movement. Future work includes extension of the present network to handle a wider range of tropical cyclones and to take into account supplementary information, such as wind speeds, water temperature, humidity, and air pressure.




ACKNOWLEDGMENTS
I would like to express my sincere gratitude and deep appreciation to the Director General of the Thai Meteorological Department for guidance, invaluable advice, supervision, and encouragement throughout this research, which enabled me to complete it successfully. He was never lacking in kindness and support.

E00008
ANSCSE14 Mae Fah Luang University, Chiang Rai, Thailand
March 23-26, 2010
Analysis of Coastal Erosion by Using Wave Spectrum

W. Kanbua^1,C, C. Khetchaturat^2 and S. Chuai-aree^3
^1 Marine Meteorological Center, Thai Meteorological Department, Bangkok 10260, Thailand
^2 Department of Mathematics, Faculty of Science, Kasetsart University, Bangkok, Thailand
^3 Department of Mathematics & Computer Science, Faculty of Science & Technology, Prince of Songkla University, Pattani, Thailand
^C E-mail: wattkan@gmail.com; Fax: 023669375; Tel. 023994561


ABSTRACT
Waves are the main factor inducing coastal erosion. The main objective of this study is to investigate coastal erosion over a short period, using the wave spectrum for this purpose. Wave spectrum output was taken for 1997-2009. The wave spectrum refraction patterns are modeled with the WAM-cycle4 model. The model simulations and the Radon transform were used to model the shoreline change, and the rate of shoreline change was modeled for different dates. The shoreline vectors were also used to find the rate of shoreline change, and the different methods used to detect shoreline change were compared using a statistical model. The wave spectra were extracted from the WAM-cycle4 model at a resolution of 0.083 x 0.083 degrees. Different wave spectrum patterns induced different patterns of wave refraction. The rate of erosion modeled from the wave spectrum data is reasonable: the erosion correlates well with the wave spectrum model and the shoreline rate-of-change vectors. The coastal erosion occurred due to the change of the wave spectrum refraction pattern. The wave spectrum could thus be used as a method to detect shoreline change.

Keywords: Coastal erosion; Ocean Wave Modeling; Shallow water, WAM-cycle4,
Wave spectra.



1. INTRODUCTION
Ocean waves are produced by the wind: the faster the wind, the longer the wind blows,
and the bigger the area over which the wind blows, the bigger the waves. In designing ships
or offshore structures we wish to know the biggest waves produced by a given wind speed.
Suppose the wind blows at 20 m/s for many days over a large area of the North Atlantic;
what will be the spectrum of ocean waves at the downwind side of the area? It is important
to realise that the spectra presented in this section are attempts to describe the ocean wave
spectra in very special conditions, namely after a wind with constant velocity has been
blowing for a long time. A typical ocean wave spectrum will be much more complicated and
variable; for example, it may have two peaks, one from distant swell and the other generated
by the local wind. The concept of a wave spectrum can be quite abstract and is described in
Waves and the Concept of a Wave Spectrum. The wave spectra information was then used to
model shoreline changes by investigating the wave refraction patterns. From these patterns,
the volume transport at several locations was estimated, and the locations of sedimentation
and erosion along the shoreline of the Gulf of Thailand were estimated. The wave spectra
extracted from the ocean wave analysis model data showed a range of wavelengths, and the
main direction of the waves given by the spectra was from the upper Gulf. The wave
refraction patterns varied, showing both convergence and divergence, indicating erosion and
sedimentation locations, respectively. Finally, the regression model showed that erosion
occurred.



2. THEORY AND RELATED WORKS
Wave spectra are derived from the WAM-TMD model. The WAM model is a third-generation
wave model which solves the wave transport equation explicitly, without any presumptions on
the shape of the wave spectrum. It represents the physics of the wave evolution, in accordance
with present knowledge, for the full set of degrees of freedom of a 2D wave spectrum:

\frac{\partial E}{\partial t} + \nabla \cdot (C_g E) = S_i    (1)

S_i = S_{in} + S_{nl} + S_{ds} + S_{bt}    (2)

where E is the two-dimensional wave spectrum. The value of E depends on the spatial
coordinates x and y, the temporal variable t, the frequency f, and the direction \theta. The
parameter C_g is the group velocity, a function of x, y, f and \theta. The parameter S_i
collects the source/sink terms: the atmospheric input S_{in}, the nonlinear wave-wave
interaction S_{nl}, the high-frequency dissipation S_{ds}, and the bottom friction S_{bt}.
The goal is to solve for the time rate of change of E, i.e. the directional spectra, on a
prescribed grid.
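As a concrete illustration of how Eq. (1) can be marched forward on a grid, the following
minimal Python sketch advances a one-dimensional reduction of the transport equation with a
first-order upwind step. The grid, group velocity and source term are illustrative assumptions,
not values from the WAM configuration used here.

import numpy as np

# 1-D reduction of Eq. (1): dE/dt + d(Cg*E)/dx = S, first-order upwind step.
# All values below are illustrative assumptions.
nx, dx, dt = 121, 9200.0, 60.0                 # points, spacing (m), step (s)
x = np.arange(nx)
E = np.exp(-((x - 30) / 8.0) ** 2)             # initial spectral density
Cg = np.full(nx, 7.5)                          # group velocity (m/s), Cg > 0
S = np.zeros(nx)                               # net source Sin+Snl+Sds+Sbt

for _ in range(100):                           # time marching
    flux = Cg * E
    E[1:] -= dt / dx * (flux[1:] - flux[:-1])  # upwind difference for Cg > 0
    E[0] = 0.0                                 # simple inflow boundary
    E += dt * S
print(np.argmax(E))                            # the energy packet has advected

The time step satisfies the CFL condition (Cg*dt/dx is well below one), which is the same
stability consideration that constrains explicit schemes in the full model.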
The bottom may induce wave energy dissipation in various ways, e.g. friction,
percolation (water penetrating the bottom), wave-induced bottom motion and breaking.
Outside the surf zone, bottom friction is usually the most relevant; it is essentially nothing but
the effort of the waves to maintain a turbulent boundary layer just above the bottom. Several
formulations have been suggested for the bottom friction. A fairly simple expression, in terms
of the energy balance, is due to [51] in the JONSWAP project:

S_{bt}(\omega,\theta) = -\Gamma \, \frac{\omega^2}{g^2 \sinh^2(kd)} \, F(\omega,\theta)    (3)

where \Gamma is an empirically determined coefficient.

The wave energy spectral density E(f), or simply the wave spectrum, may be
obtained directly from a continuous time series of the surface elevation \eta(t). Using Fourier
analysis, the wave profile time trace can be written as an infinite sum of sinusoids of
amplitude A_n, frequency \omega_n, and relative phase \varepsilon_n, that is

\eta(t) = \sum_{n=0}^{\infty} A_n \cos(\omega_n t + \varepsilon_n)
        = \sum_{n=0}^{\infty} \left[ a_n \cos(n\omega t) + b_n \sin(n\omega t) \right]    (4)
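The decomposition in Eq. (4) is what makes the spectrum computable from a measured record.
The short Python sketch below estimates a one-sided variance spectrum E(f) from a
surface-elevation time series via the finite Fourier transform; the synthetic record and the
sampling rate are assumptions standing in for buoy data.

import numpy as np

# Estimate E(f) of a surface-elevation record eta(t), Eqs. (4)-(5).
fs, n = 2.0, 4096                              # assumed sampling rate (Hz), samples
t = np.arange(n) / fs
rng = np.random.default_rng(0)
eta = 0.5 * np.cos(2 * np.pi * 0.1 * t) + 0.1 * rng.standard_normal(n)

eta -= eta.mean()                              # remove a0, the record mean
A = 2.0 * np.abs(np.fft.rfft(eta)) / n         # amplitudes A_n of Eq. (4)
f = np.fft.rfftfreq(n, d=1.0 / fs)
df = f[1] - f[0]
E = A ** 2 / (2.0 * df)                        # spectral density E(f)
print(np.sum(E) * df, eta.var())               # Eq. (5): the two nearly agree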



Figure 1. A schematic for a two-dimensional wave spectrum E(f, \theta)

The coefficients a_n and b_n in the above equation may be determined explicitly from
the orthogonality properties of circular functions. Note that a_0 is the mean of the record.
Because real observations are of finite length, the finite Fourier transform is used and the
number of terms in the summation is finite.
By an intuitive extension of this simple wave, the variance of a random signal with
zero mean may be considered to be made up of contributions at all possible frequencies,
so that

\sigma^2 = \int_0^{\infty} E(f)\,df = \sum_{n=1}^{\infty} \frac{a_n^2}{2} = m_0    (5)

where m_0 is the zero-th moment of the spectrum. Physically, m_0 represents the area under
the curve of E(f). The area under the spectral density represents the variance of a random
signal whether the one-sided or the two-sided spectrum is used.
The moments of a spectrum can be obtained from

m_i = \int_0^{\infty} f^i E(f)\,df, \qquad i = 0, 1, 2, \ldots    (6)


Figure 2. Sketches of wave spectral energy and energy density

We now use the definition of the variance of a random signal (Equation (5)) to define
the significant wave height. As stated earlier, this gives an estimate of the significant wave
height using the wave spectrum. For Rayleigh-distributed wave heights, H_s may be
approximated by

H_s = 3.8\sqrt{m_0} \approx 4\sqrt{m_0}    (7)

Therefore the zero-th moment m_0, the total area under the wave energy density
spectrum, determines the significant wave height for a given E(f).
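Eqs. (5)-(7) translate directly into a few lines of code once the spectrum is discretised. The
sketch below computes the spectral moments and the significant wave height for an assumed
Pierson-Moskowitz-like spectral shape; the shape and its constants are illustrative only.

import numpy as np

# Spectral moments and significant wave height, Eqs. (5)-(7), for an assumed
# Pierson-Moskowitz-like spectrum (illustrative constants only).
f = np.linspace(0.03, 0.5, 400)                       # frequency (Hz)
fp = 0.1                                              # assumed peak frequency
E = 8.0e-4 * f ** -5 * np.exp(-1.25 * (fp / f) ** 4)  # E(f) in m^2/Hz

def moment(i):
    # m_i = integral of f^i * E(f) df, Eq. (6), via the trapezoid rule
    return np.trapz(f ** i * E, f)

m0 = moment(0)                                        # variance of the record
Hs = 4.0 * np.sqrt(m0)                                # Eq. (7)
print("m0 = %.4f m^2, Hs = %.2f m" % (m0, Hs))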
The mathematical model of shoreline changes is based on the rate of change of the sediment
volume and utilises several types of data. In order to predict the shoreline changes over one
hundred years or more, the probability distribution function of the different data sources was
used. This was done under the following assumptions:
- Human activities were neglected.
- The time interval between successive erosion and sedimentation events is equalled or
exceeded.
- The shoreline change occurred seasonally.
- The shoreline change rate for different periods is estimated from the following relation:

R = Ay + B    (8)

where y is given by the Weibull distribution,

y = \{\ln(ST)\}^{1/k}

where S is the number of erosion and sedimentation events per year, T is the return period
(years) and k is the length of the record in years. The probability of erosion and sedimentation
occurrence could be expressed as a percentage chance of occurrence, given by

P = 100\left[1 - (1 - 1/T)^{L}\right]    (9)

where P is the chance of erosion and sedimentation occurrence and L is time. The model
has been used to estimate the return period of the wave percentage occurrence. In this study we
have used the above model to predict shoreline change over one hundred years.
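Eqs. (8)-(9) can be evaluated directly; the following sketch computes the Weibull reduced
variate and the percentage chance of occurrence. The example values of S, T, k and L are
assumptions chosen for illustration.

import math

# Direct evaluation of Eqs. (8)-(9); the parameter values are illustrative.
def weibull_reduced_variate(S, T, k):
    # y = [ln(S*T)]^(1/k), the variate entering R = Ay + B in Eq. (8)
    return math.log(S * T) ** (1.0 / k)

def occurrence_probability(T, L):
    # P = 100 [1 - (1 - 1/T)^L], Eq. (9): chance (%) of at least one event
    # of return period T years occurring within a horizon of L years
    return 100.0 * (1.0 - (1.0 - 1.0 / T) ** L)

print(weibull_reduced_variate(S=4, T=10, k=30))
print(occurrence_probability(T=10, L=100))     # about 99.997 %

The study's hundred-year horizon corresponds to L = 100 in Eq. (9).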


3. EXPERIMENTAL
One of the areas reported to experience rapid erosion is the coastline of Thailand. This area
is exposed to the highest waves during the north-east monsoon, compared with the south-west
monsoon and the transitional periods. The study area is located in the Gulf of Thailand and the
Andaman Sea between latitudes 5N and 15N and longitudes 95E and 105E.
The bathymetry grid is taken from ETOPO5, covering the region 95E to 105E and 5N to
15N (see Figure 3) with 0.083 degree resolution in both latitude and longitude (121 x 121
grid points). The wind data initially employed (from NOGAPS model archives) were provided
by NRLMRY. The winds cover the period 00Z 1-8-1997 to 00Z 31-12-2009, with 1.0 degree
resolution, and are linearly interpolated to specify wind components at each wave grid point.
Wave data were obtained from 3 moored buoys of GISTDA (HHN, KCH, and KSI stations)
and 1 automatic marine meteorological station (UNC station).



Figure 3. Study area


Figure 4. Wave fields on November 3rd, 1997, every 6 hours (WAM output)


(a) (b)
Figure 5. Wave spectrum at UNOCAL Platform


(a) (b)
Figure 6. Wave spectrum at Koh Chang



Figure 7. Wave spectrum near coastline in the Gulf of Thailand



Figure 8. Wave spectrum near coastline in Andaman Sea.


4. RESULTS AND DISCUSSION
Wave spectra peaks have different sizes and directions in different seasons. The wave
spectra peaks normally change direction along the coastline of the east side of the southern
part of Thailand during the north-east monsoon, while the south-west monsoon affects the
coastline of the west side of the southern part of Thailand. The shoreline change was modelled
from ship observation data and found to vary seasonally. During the north-east monsoon,
erosion events occurred along the coastline of the east side of the southern part; the south-west
monsoon caused erosion along the upper Gulf, the eastern part and the west side of the
southern part. Wave energy is proportional to the square of the wave height and is higher
during the north-east monsoon, which means that the waves are more destructive during the
north-east monsoon. In addition, the probability distribution function shows no change in the
cycles of shoreline change, which means that the beaches are in an equilibrium state with
nature: changes caused by natural forces are usually temporary, and beaches normally recover
to their original state.


5. CONCLUSION
In conclusion, the ocean wave model output integrated well with the wave spectra
model to detect shoreline change. The statistical model showed that shoreline changes are in
an equilibrium state with nature: the changes are usually temporary and the shoreline can
normally recover to its original state. The study indicates that coastal erosion occurred
seasonally during the north-east monsoon period under wave action; in contrast, sedimentation
occurred during the south-west monsoon period and during the inter-monsoon period. The
work of this study is linked to the evaluation of coastal processes and to assessing the wider
uses of ocean environments, particularly through the use of data incorporated into coastal
management.
The work requires the development of validation procedures before wind field data
obtained from a numerical weather prediction model, such as a high-resolution limited-area
model, can be incorporated into the ocean wave model, together with further validation of the
wave data outputs. This validation is inherently difficult because of the limited sources of
good-quality, readily available observational data from the offshore zone. Together with
further validation, future work will include comparative studies with numerical wave models,
and the compilation of significant wave height and wave period climatologies, in order to
study the geographical and seasonal/annual variability and the wave spectrum of the ocean
wave field.

REFERENCES
1. Kanbua, W., Supharatid, S. and Tang, I., Ocean wave forecasting in the Gulf of Thailand
   during typhoon Linda 1997: hard and soft computing approaches, Journal of Atmospheric
   and Ocean Science, 2005, 10(3), 145-161.
2. Kanbua, W. and Chuai-Aree, S., Virtual Wave: an algorithm for visualization of ocean
   wave forecast in the Gulf of Thailand, KMITL Science Journal, 2005, 5(1), 140-150.
3. Kanbua, W., Tang, I.M. and Wiwatanapataphee, B., A study on ocean wave forecasting
   from typhoon LINDA using the WAM model, Proc. of the 5th Annual National Symposium
   on Computational Science and Engineering (ANSCSE5), Bangkok Convention Center,
   Central Plaza, Bangkok, Thailand, June 18-20, 2001.
4. Gunther, H., Hasselmann, K. and Janssen, P.A.E., Report No. 4, The WAM Model Cycle 4,
   edited by Modellberatungsgruppe, Hamburg, 1992.
5. Hasselmann, K., On the nonlinear energy transfer in a gravity-wave spectrum. Part 1:
   General theory, J. Fluid Mech., 1962, 12, 481-500.
6. Komar, P.D., Beach Processes and Sedimentation, Prentice-Hall, New Jersey, 1979.
7. Lukman, M.H., Rosman and S. Sand, Beach erosion variability during a north-east
   monsoon: the Kuala Setiu coastline, Terengganu, Malaysia, J. Pertanika, 1995, 3(2),
   337-348.
8. Maged, M.M., Ibrahim, M.M. and Ibrahim, Z., ERS-1 and wave refraction modeling in the
   South China Sea, paper presented at the International Marine Science Conference:
   Assessment and Monitoring of Marine Systems, August 25-27, Primula Park Royal Beach
   Resort, Kuala Terengganu, Malaysia, 1997.
9. Masrtura, S., Coastal Geomorphology of Desaru and its Implication for Coastal Zone
   Management, Monograph No. 13, Bangi: Universiti Kebangsaan Malaysia, 1987.
10. Mazlan, H.I., Aziz and A. Abdullah, Preliminary evaluation of photogrammetric-remote
    sensing approach in monitoring shoreline erosion, Proceedings of the Tenth Asian
    Conference on Remote Sensing, November 23-29, 1989, Kuala Lumpur, Malaysia.
11. Raj, J.K., Net direction and rates of present-day beach sediment transport by littoral drift
    along the east coast of Peninsular Malaysia, Geol. Soc. Malaysia Bull., 1982, 15, 57-82.
12. Stanley Consultants Inc., Malaysia National Coastal Erosion Study, Vol. II, Kuala
    Lumpur: UPEN, 1985.
13. Vachon, P.W., Haroba, K.E. and Scott, J., Airborne and spaceborne synthetic aperture
    radar observation of ocean waves, J. Atmos.-Ocean, 1994, 32(10), 83-112.
14. Wong, P.P., Beach change on a monsoon coast, Peninsular Malaysia, J. Geol. Soc.
    Malaysia, 1981, 14, 47-59.


ACKNOWLEDGMENTS
I would like to express my sincere gratitude and deep appreciation to the Director General of
the Thai Meteorological Department for guidance, invaluable advice, supervision and
encouragement throughout this research, which enabled me to complete it successfully. He
was never lacking in kindness and support. I am also grateful to GISTDA for providing
oceanographic data from moored buoys in the Gulf of Thailand and the Andaman Sea, which
were used in this research.


E00010
Characterisation of Non-linear Viscoelastic Properties via
Indentation Techniques

C. Gamonpilas1,C, M.N. Charalambides2 and J.G. Williams2

1 National Metal and Materials Technology Center (MTEC), 114 Thailand Science Park, Paholyothin
Road, Klong 1, Klong Luang, Pathumthani 12120, Thailand
2 Department of Mechanical Engineering, Imperial College London, London, SW7 2AZ, U.K.
C E-mail: chaiwutg@mtec.or.th; Fax: 02-5646446; Tel. 02-5646500 ext. 4447


ABSTRACT
A procedure for obtaining the stress-strain properties of non-linear viscoelastic
materials from their indentation loading data was evaluated. A bismaleimide adhesive
was used as the test material, and indentation tests were performed at three loading
speeds using two spherical indenters of different sizes. The corresponding indentation
loading curves were fitted with an analytical solution to obtain the time-dependent
constants and the instantaneous indentation loading response. The time-dependent
function was assumed to follow a Prony series. Subsequently, an inverse analysis
based on the Levenberg-Marquardt optimisation algorithm in conjunction with finite
element analysis was implemented to derive the strain-dependent constants from the
instantaneous loading response. The strain-dependent function was represented by the
Van der Waals hyperelastic material model. The predictions of the viscoelastic
stress-strain properties from the indentation tests were compared with independent
measurements from uniaxial compression tests. A reasonable agreement was obtained
when indentation data from one indenter were used, and the agreement was enhanced
when indentation data from two indenters were employed.

Keywords: Indentation, Viscoelasticity, Inverse Analysis, Finite element analysis.



1. INTRODUCTION
One of the means to obtain the mechanical properties of materials is by pushing a probe
into the material in the form of indentation tests. Such a technique offers many advantages, as
it is relatively simple and quick, and does not require preparation of specimens of any specific
size and shape. Furthermore, indentation curves are not significantly affected by friction
between sample and indenter, as normally found in compression experiments [1, 2].
Indentation tests can also be performed on a large scale using high-throughput screening
techniques, where large numbers of different formulations/recipes can be automatically tested
using high-speed robotic indenters on small samples moving on conveyor belts.
However, there is still a shortage of analytical solutions that can be easily applied to
convert the indentation response to stress-strain properties unless the indented materials
behave elastically [3]. As a consequence, there is a need to convert indentation load-depth
data, in particular for non-linear materials, into the fundamental material properties using an
inverse parameter-identification technique. The availability of such a method to determine the
mechanical properties through the indentation test would be very useful, since the mechanical
parameters can be used for quality control of the manufacturing process.
The aim of this work is to make use of inverse analysis in conjunction with indentation
test data to derive the stress-strain characteristics of non-linear viscoelastic materials. The
procedure proposed by Goh and co-workers [2] is implemented. The predictive capability is
analysed using indentation data of a bismaleimide (BMI) adhesive. Results from the inverse
predictions are validated with independent uniaxial compression data.

2. EXPERIMENTS
BMI adhesive samples were prepared in cylindrical form with a diameter of 9.20 mm and a
height of 10.40 mm. All experiments were performed at 21 °C and 50% relative humidity.
Indentation tests were performed on an Instron 5543 testing machine using two spherical
indenters of 4.0 and 5.9 mm diameter at three loading speeds of 0.5, 5 and 50 mm/min. A
1 kN load cell was used and a maximum loading depth of 2 mm was applied to these samples.
Independent mechanical tests were also performed under uniaxial compression loading in
order to validate the suggested inverse analysis methodology (described in a later section).
Tests were performed at three constant loading speeds of 0.5, 5 and 50 mm/min using an
Instron 5584 fitted with a 500 kN load cell. The interface between the sample and the
compression platens was smeared with Superlube (Loctite Corp.) in order to eliminate
friction [2].


3. VISCOELASTIC MATERIAL MODEL
The two characteristics of the stress-strain properties to be determined are the nonlinear
dependence of the stress on the strain, i.e. the strain-dependent behaviour, and the
time-dependent behaviour. For simplicity, the material is assumed to be incompressible.
During a step-strain relaxation test, the relaxation stress is assumed to be the product of
strain- and time-dependent functions:

\sigma(\varepsilon, t) = \sigma_0(\varepsilon)\, g(t)    (1)

where \sigma is the stress at true strain \varepsilon and time t. The strain-dependent function
has the dimension of stress and is assumed to follow the Van der Waals hyperelastic model [4].
In a uniaxial deformation state the stress is given by

\sigma_0 = \mu\left(\lambda^2 - \frac{1}{\lambda}\right)
\left[\frac{1}{1-\eta} - a\sqrt{\frac{\lambda^2 + 2/\lambda - 3}{2}}\right],
\qquad \eta = \sqrt{\frac{\lambda^2 + 2/\lambda - 3}{\lambda_m^2 - 3}}    (2)

where \mu is the shear modulus, \lambda_m and a are dimensionless material constants, and
\lambda is the stretch ratio in the direction of loading, related to the true strain by
\lambda = \exp(\varepsilon).
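For reference, the reconstructed Eq. (2) can be evaluated as below. The sketch assumes the
ABAQUS form of the Van der Waals model as recovered above; the parameter values are
placeholders, not the fitted BMI constants.

import math

# Uniaxial true stress for the Van der Waals model as reconstructed in Eq. (2);
# mu, lam_m and a below are placeholder values, not the fitted BMI constants.
def vdw_uniaxial_stress(strain, mu, lam_m, a):
    lam = math.exp(strain)                # stretch ratio, lambda = exp(eps)
    I1 = lam ** 2 + 2.0 / lam             # first invariant in a uniaxial state
    eta = math.sqrt((I1 - 3.0) / (lam_m ** 2 - 3.0))
    return mu * (lam ** 2 - 1.0 / lam) * (1.0 / (1.0 - eta)
                                          - a * math.sqrt((I1 - 3.0) / 2.0))

print(vdw_uniaxial_stress(0.2, mu=1.0, lam_m=5.0, a=0.1))   # MPa if mu is in MPa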
The time-dependent function g(t) is dimensionless and is represented by the Prony series

g(t) = g_{\infty} + \sum_{i=1}^{N} g_i \exp(-t/\tau_i)    (3)

where \tau_i are time constants, g_{\infty} and g_i are dimensionless constants, and
g_{\infty} + \sum_{i=1}^{N} g_i = 1. The Prony series has been used to characterise the
time-dependent behaviour of many materials including rubber [5] and foods [6].
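The constraint on the Prony constants is easy to mis-handle in code; the following sketch
evaluates g(t) of Eq. (3) for the four time constants quoted in Section 4, with placeholder g_i
values chosen only to satisfy the constraint.

import numpy as np

# Prony series g(t), Eq. (3), with the time constants of Section 4;
# the g_i weights are placeholders.
tau = np.array([0.1, 1.0, 10.0, 100.0])        # time constants (s)
g_i = np.array([0.20, 0.15, 0.10, 0.05])       # placeholder Prony weights
g_inf = 1.0 - g_i.sum()                        # enforces g_inf + sum(g_i) = 1

def g(t):
    return g_inf + np.sum(g_i * np.exp(-t / tau))

print(g(0.0), g(1.0e6))                        # 1.0 at t = 0; g_inf at long times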


4. INDENTATION FORWARD ANALYSIS
The aim of the indentation forward analysis was to predict the indentation load-displacement
response using the constitutive properties calibrated from compression tests. The calibration
of the constitutive model was performed using a discretised form of the convolution integral
described in [6, 7]. Four (N = 4) exponential terms in the Prony series were used,
corresponding to time constants (\tau) of 0.1, 1, 10 and 100 seconds. The constitutive
properties were then used in finite element simulations [4] of the indentation tests.
Four-noded axisymmetric elements were used to model the BMI sample, while the indenter
was modelled as a rigid surface. A frictionless condition was prescribed at the interface
between the sample and the indenter.

5. INDENTATION INVERSE ANALYSIS
Goh and co-workers [1] have shown that the indentation load-displacement response
under a constant rate of indentation, V, can be described by
P(h,t) = P_{hs}\left[ g_{\infty} + \frac{3}{2\,t^{3/2}} \sum_{i=1}^{N} g_i
\int_0^t s^{1/2} \exp\!\left(-\frac{t-s}{\tau_i}\right) ds \right]
= P_{hs}\,\phi(t)    (4)

where s is the time variable, P_{hs} is the instantaneous load at the displacement h, and
\phi(t) = g_{\infty} + \frac{3}{2\,t^{3/2}} \sum_{i=1}^{N} g_i
\int_0^t s^{1/2} \exp\!\left(-\frac{t-s}{\tau_i}\right) ds. By fitting the indentation load-depth data at
various loading speeds with Eq. (4), it is possible to obtain the time-dependent constants and
P_{hs}. The procedure to obtain the constitutive behaviour makes use of an inverse analysis,
summarised in Figure 1, which was performed in two stages. In the first stage, the
instantaneous indentation response P_{hs} and the time-dependent constants were calculated
by fitting the indentation load-depth data to Eq. (4). Ten loads at ten equal depth intervals up
to the maximum depth were chosen for the calculations. The indentation loads for all speeds,
i.e. 0.5, 5 and 50 mm/min, at each depth were then divided by the corresponding loads for
V = 0.5 mm/min at the same depth. This gave three values of load ratios as functions of
\phi(t) at each depth interval. The dimensionless terms g_i (i = 1-4) and g_{\infty} were then
computed using a least-squares error method by matching the ratios of \phi(t) simultaneously
to the experimental load ratios for all depths. For these calculations, the time constants were
assumed to be known a priori. Since the integral in \phi(t) has no closed-form solution, it was
computed using Simpson's rule, as in the sketch below.

Once the time-dependent parameters were obtained, the instantaneous loads at each depth
increment were calculated by matching the measured indentation loads with Eq. (4) for the
three speeds using the least-squares method. This was performed separately for each of the
chosen depths, resulting in ten instantaneous loads to define the relationship between P_{hs}
and h.
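The first stage thus reduces to evaluating the bracketed function in Eq. (4) numerically. A
minimal sketch, assuming the same Simpson-rule quadrature the authors describe (here via
scipy), with the placeholder Prony constants used earlier:

import numpy as np
from scipy.integrate import simpson

# phi(t), the bracketed function in Eq. (4), via Simpson's rule.
tau = np.array([0.1, 1.0, 10.0, 100.0])
g_i = np.array([0.20, 0.15, 0.10, 0.05])
g_inf = 1.0 - g_i.sum()

def phi(t, n=201):
    s = np.linspace(0.0, t, n)                 # integration variable of Eq. (4)
    out = g_inf
    for gi, ti in zip(g_i, tau):
        integrand = np.sqrt(s) * np.exp(-(t - s) / ti)
        out += 1.5 * gi * simpson(integrand, x=s) / t ** 1.5
    return out

# a measured load P(h, t) then yields the instantaneous load as P / phi(t),
# e.g. t = 2.4 s for h = 2 mm at V = 50 mm/min
print(phi(2.4))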
In the second stage, the inverse analysis based on the finite element technique was performed,
as there is no closed-form solution that relates the strain-dependent parameters \mu,
\lambda_m and a to the instantaneous load-depth response. The commercial FE software
ABAQUS [4] was used. Since \mu is a scaling factor, the inverse problem could initially be
simplified to determining only \lambda_m and a by dividing the indentation loads by a
reference load, P_{ref}, to obtain load-ratio values. The inverse analysis uses an optimisation
algorithm known as the Levenberg-Marquardt method. The method is described extensively in
the work of Schnur and Zabaras [8], and its use within the context of this work is illustrated
by the flow chart shown in Figure 1. In order to minimise the computational effort of the FE
calculations within each iteration of the inverse analysis, a reference database was created.
The database contained numerically generated indentation load-depth data up to a depth of
2 mm, in increments of 0.02 mm, for a number of combinations of \lambda_m and a. During
the inverse analysis the values of the force ratio for any combination of \lambda_m and a
were interpolated between the points in the database. The value of \mu was fixed at 1 MPa
during the generation of the database. Once \lambda_m and a were obtained, the finite
element forces for these parameters were computed for \mu = 1 MPa; the measured forces
were then divided by these numerically generated forces to determine the actual value of \mu.
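The second stage can be sketched with a standard Levenberg-Marquardt driver. In the sketch
below, model_ratio is a synthetic stand-in for interpolation of the precomputed ABAQUS
load-ratio database; its functional form and all numbers are assumptions used only to show
the fitting structure.

import numpy as np
from scipy.optimize import least_squares

# Structure of the second-stage fit: Levenberg-Marquardt on (lam_m, a).
depths = np.array([0.5, 1.0, 1.5, 2.0])            # indentation depths (mm)

def model_ratio(p, d):
    # synthetic stand-in for interpolating the FE load-ratio database
    lam_m, a = p
    return 1.0 + 0.05 * lam_m * d + 0.8 * a * d ** 2

measured = model_ratio((5.0, 0.3), depths)         # pretend measured ratios

fit = least_squares(lambda p: model_ratio(p, depths) - measured,
                    x0=[3.0, 0.1], method='lm')
print(fit.x)                                       # recovers lam_m = 5.0, a = 0.3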

Figure 1. Flow chart summarising the inverse procedure to obtain constitutive constants from
the indentation response of non-linear viscoelastic materials.
6. RESULTS AND DISCUSSION
Prior to applying the inverse analysis to predict the stress-strain properties of the BMI
adhesive from its indentation data, it is necessary to verify that the material parameters
calibrated from the compression data can be used to predict the indentation response at
various loading rates, i.e. the indentation forward analysis. The finite element predictions of
the indentation response in the forward analysis are compared to the experimental
measurements in Figures 2(a) and 2(b) for the 5.9 mm and 4.0 mm indenters, respectively.
The agreement is excellent for both diameters. This agreement in the forward prediction
supports the use of the nonlinear viscoelastic model based on the Van der Waals and Prony
series to characterise this adhesive and, hence, suggests that it should be possible to obtain
the stress-strain properties through the inverse analysis method.

(a) (b)
Figure 2. Experimental data and finite element forward predictions of indentation response
for (a) 5.9 mm and (b) 4.0 mm indenters.
Indentation inverse analysis was performed following two different scenarios. Firstly, only a
single set of indentation data, i.e. data obtained from a single indenter, was used. Using the
average indentation load-depth data corresponding to the 5.9 mm indenter, shown in Figure
2(a), the constitutive behaviour of BMI was predicted following the inverse procedure
outlined in Section 5. The prediction of the relaxation behaviour is plotted in comparison
with the data obtained from the compression test in Figure 3, showing a reasonable
agreement. Furthermore, the predictions of the strain-dependent parameters of BMI, hence
the stress-strain properties, were compared with the monotonic compression stress-strain data
for all three loading rates in Figure 4(a). It is clearly seen that the predictions from the
inverse analysis agree well with the uniaxial compression test data only up to a strain of 0.3.

Figure 3. Predictions of time-dependent behaviour from indentation tests using (a) one
indenter (d = 5.9 mm) and (b) two indenters (d = 4.0 and 5.9 mm).
In order to improve the accuracy of inverse predictions, indentation data corresponding to the
two indenters, e.g. 4 mm and 5.9 mm indenters, were used simultaneously to inversely
determine the constitutive properties. The prediction of the relaxation behaviour is shown in
Figure 3 in comparison with those predicted from the one-indenter and compression tests. It
is seen that the prediction from the two-indenter procedure gave a better fit to the
compression data than that from the one-indenter. Discontinuities in the prediction of g(t)
were observed because the values of g_i were not constrained to fit any particular smooth
function. Similarly, predictions of the stress-strain curves using two sets of indentation data
led to a better fit with the compression data than those obtained from the single indenter, as
shown in Figure 4(b).

(a) (b)
Figure 4. Predictions of stress-strain behaviour from indentation tests using (a) a single
indenter (d = 5.9 mm) and (b) two indenters (d = 4.0 and 5.9 mm).


7. CONCLUSION
It is shown in this work that non-linear viscoelastic properties can be obtained using
indentation combined with inverse analysis techniques. The predictive capability of these
techniques was demonstrated using indentation data of a bismaleimide adhesive. Although
reasonable agreement between the inverse predictions and independent compression test data
was found when indentation data from one indenter were used, the efficiency and accuracy of
the inverse predictions could be enhanced when two sets of indentation data from two
indenters were used. The accuracy of the technique depends on scatter and possible defects in
the test specimens.

REFERENCES
1. Goh, S. M., Charalambides, M. N., and Williams, J. G., Rheol. Acta, 2004, 44, 47-54.
2. Charalambides, M. N., Goh, S. M., Wanigasooriya, L., Williams, J. G., and Xiao, W., J.
Mater. Sci., 2005, 40, 3375-3381.
3. Johnson, K. L., Contact Mechanics, Cambridge University Press, Cambridge, UK, 1985.
4. ABAQUS User manual Version 6.4, Hibbitt Karlsson and Sorensen Inc, Providence, RI,
2004.
5. Quigley, C. J., Mead, J., and Johnson, A. R., Rubber Chem. Technol., 1995, 68, 230-247.
6. Goh, S. M., Charalambides, M. N., and Williams, J. G., Mech. Time-Depend. Mat., 2004,
8, 255-268.
7. Kaliske, M., and Rothert, H., Comput. Mech., 1997, 19, 228-239
8. Schnur, D. S., and Zabaras N., Int. J. Numer. Methods in Eng., 1992, 33, 2039-2057.


ACKNOWLEDGEMENTS
Support for this work has been provided by ICI plc under the SRF scheme. ABAQUS was
provided under academic license by HKS Inc., Providence, Rhode Island, USA.
E00013
Semi-Solid Die Casting Mold Development Utilizing
CAE Technique

PERAKIT Viriyarattanasak1,C, Prof. ANZAI Koichi2, Dr. ITAMURA Masayuki2 and
NAGASAWA Osamu3

1 National Metal and Materials Technology Center, National Science and Technology Development
Agency, 114 Thailand Science Park, Phahonyothin Rd., Klong 1, Klong Luang, Pathumthani, 12120,
Thailand
2 Graduate School of Engineering, Tohoku University, Sendai, 980-8579, Japan
3 Tokyo Rika Co., Ltd, Fukushima, 961-0835, Japan
C E-mail: perakitv@mtec.or.th; Fax: 02-5646370; Tel. 02-5646500



ABSTRACT
Semi-solid die casting is a special die casting method which has become a key player
in high-quality die casting, especially in the automobile industry [1], because the
internal quality of the product is sounder than in other die casting processes.
Semi-solid die casting production can be divided into two steps, semi-solid slurry
preparation and die casting, and the quality of the completed product, such as its
mechanical properties, is determined by the quality of those production processes. To
achieve high product quality, numerical methods are also applied to increase
reliability while decreasing the cost and time of development. In this paper we
present a model applying computer-aided engineering (CAE) to the development of
high-strength parts using semi-solid slurry produced by the cup method. The process
is followed from the preparation of the semi-solid slurry through semi-solid die
casting, as well as validation: the flow behaviour of the semi-solid slurry in the real
production process is compared with the results of a computer-simulation-based
procedure, in order to outline a route for developing high-quality semi-solid die
casting parts.

Keywords: Semi-Solid Die Casting, Cup-Method, Semi-Solid Slurry, CAE.



1. INTRODUCTION
In recent years there has been increasing demand for high-quality castings, and die casting
technology is progressing from conventional die casting to squeeze casting [2] and further to
semi-solid die casting. The casting quality basically depends on the flow characteristics and
the solidification phenomena in the cavity. For semi-solid casting, those characteristics and
phenomena differ significantly from the other processes because of the lower liquid-phase
content.
In conventional die casting, molten metal above the melting temperature is cooled rapidly
to below the freezing temperature. In both conventional die casting and squeeze casting, the
temperature passes through the dual-phase (liquid/solid) zone producing a dendritic structure,
whose primary crystals differ in dimension. In the semi-solid process [3-4], on the other hand,
the semi-solid slurry produced in the freezing zone is cast directly to form a product, which
makes a spheroidal microstructure possible without any stirring process (the cup method)
[5-7]. In this research, real semi-solid casting using a mold for high-strength parts is carried
out, and a flow analysis with the commercial software Adstefan is performed to compare the
results. The flow analysis shows that the method is applicable to practical purposes.



2. SEMI-SOLID SLURRY PREPARATION (CUP METHOD)
The semi-solid slurry preparation by the cup method can be carried out as easily as pouring
water into a cup. However, proper conditions are required as a preliminary step; from casting
experiments in the laboratory, the dimensions of the cup are an important factor. In this paper,
a steady state is assumed when considering the cup condition by equilibrium thermal analysis.
When molten aluminum is poured into a metallic cup, heat begins to transfer from the melt
to the cup. The temperature of the aluminum therefore decreases below its initial value T_c,
while the temperature of the metallic cup increases from its initial value T_m. Finally, the
temperatures of the cup and the molten aluminum balance and no further heat is transferred
between them. At this stage an equilibrium temperature (T_eq) can be written as in equation
(1), where T_c is the initial temperature of the molten aluminum, T_m is the initial
temperature of the metallic cup, f_s is the fraction of solid and H_f is the latent heat of
solidification divided by the specific heat. Furthermore, a dimensionless number is given by
equation (2), where \rho is the density of the molten aluminum, C_p is the specific heat, V_c
is the volume of the molten aluminum and V_m is the volume of the metallic cup.
From equations (1) and (2), the equilibrium temperature T_eq, or a specified fraction of
solid, can be defined by the dimensionless number (the heat content at the initial temperatures
of the metallic cup and the molten aluminum divided by the heat capacity of the metallic cup
and the molten aluminum). Furthermore, once the material of the metallic cup is fixed in
equation (2), the dimensionless number is defined by the volumes of the molten aluminum
and the cup only.
When T_eq is reached, the semi-solid state will be maintained. In the real process,
however, heat is lost from the cup surface and from the surface of the molten aluminum, so
the final temperature falls below the T_eq of equation (1). By insulating the cup surface, the
final temperature can be brought close to the T_eq of equation (1). Figure 1 shows a
semi-solid slurry that has a fraction of solid of about 50% and can stand by itself.



Figure 1. Appearance of the semi-solid alloy (AC4C)


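The two equations referred to above did not survive in the text. As a hedged illustration only,
the following sketch assumes the simplest lumped heat balance consistent with the
description: the melt's sensible heat plus the latent heat released up to the target fraction solid
is absorbed by the cup, with the dimensionless number taken as the melt's share of the total
heat capacity. All values are illustrative.

# Assumed lumped heat balance for the equilibrium temperature T_eq.
def equilibrium_temperature(Tc, Tm, fs, Hf, beta):
    # Tc, Tm: initial melt and cup temperatures; Hf: latent heat / specific
    # heat; beta (assumed): (rho*Cp*V)_melt / [(rho*Cp*V)_melt + (rho*Cp*V)_cup]
    return beta * (Tc + fs * Hf) + (1.0 - beta) * Tm

# e.g. an AC4C-like melt at 650 C poured into a cup at 30 C, targeting fs = 0.5
print(equilibrium_temperature(650.0, 30.0, 0.5, 404.0, 0.7))   # about 605 C

With these illustrative numbers, T_eq falls between the solidus and liquidus of Table 1, i.e. in
the semi-solid range, which is the operating point the cup method aims for.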
3. GATING SYSTEM OPTIMIZATION
Figure 2(a) shows a conventional die design for the high-strength parts, comprising four
small gates, characteristic of conventional die casting. Figure 2(b) shows the optimised die
design, comprising one thick gate, characteristic of squeeze casting and semi-solid die
casting.


Figure 2. Comparison of gating system

The commercial casting simulation software ADSTEFAN v9.0 was used to calculate flow
and solidification analyses of the two gate designs with three processes (conventional die
casting, squeeze casting and semi-solid die casting), using the material properties in Table 1,
the boundary conditions in Table 2 and the initial conditions in Table 3.

Table 1. Physical properties
Material | Density (g/cm3) | Specific Heat (cal/g·°C) | Thermal Conductivity (cal/cm·s·°C) | Latent Heat (cal/g) | Liquidus (°C) | Solidus (°C)
AC4C     | 2.70            | 0.23                     | 0.370                              | 4693                | 620           | 558
ADC12    | 2.80            | 0.23                     | 0.240                              | 94                  | 580           | 520
SKD61    | 7.80            | 0.10                     | 0.102                              | -                   | -             | -

Table 2. Boundary conditions
Material-Material | Heat Transfer Coefficient (cal/cm2·s·°C)
Casting-Mold      | 4000
Mold-Mold         | 400
Mold-Heater       | 8000

Table 3. Initial conditions
Initial Temperature of Casting (T_0-cast) | 589 °C / 750 °C
Initial Temperature of Mold (T_0-mold)    | 200 °C
Initial Velocity (V_inlet)                | 0.15 m/s / 2.0 m/s

Remarks:
(1) Slip condition
(2) No back-pressure consideration
(3) Minimum wall thickness: 1 mm
(4) Mesh size: 1 mm


Figure 3. Flow analysis of each casting process




Figure 4. Solidification analysis of each casting process

Figure 3 shows the flow analysis of each casting process. Conventional die casting gives a
very turbulent flow pattern accompanied by air entrapment. On the other hand, both squeeze
casting and semi-solid casting show melt flow with less air entrapment and a laminar flow
pattern.
In the solidification analysis shown in Figure 4, for conventional die casting the small gate
solidifies rapidly; consequently, the plunger pressure applied by the die casting machine
cannot work effectively to suppress shrinkage defects while the internal metal temperature is
still high enough to solidify with large pores. Squeeze casting, in contrast, can suppress the
shrinkage defects effectively because of its thick gate, although it still cannot prevent
shrinkage if a thin product portion lies between the last regions to solidify; squeeze casting
may therefore still have shrinkage defects, albeit smaller than in conventional die casting. The
semi-solid die casting shows improved solidification behaviour compared with conventional
die casting or squeeze casting, with a low temperature at the thick portion.


4. SEMI-SOLID DIE CASTING
The semi-solid die casting process (cup method) can be explained as follows:
(1) Melt pouring: a specified amount of molten aluminum is poured into a metallic cup.
(2) Nuclei generation: natural stirring provides uniform nuclei generation.
(3) Cooling: a semi-solid slurry with a uniformly distributed spheroidised structure is
produced after a set time. The fraction of solid (fs) can be easily controlled during this
process.
(4) Slurry transfer: the semi-solid slurry is transferred to the shot sleeve.
(5) Die casting: the semi-solid slurry is injected into the cavity to form a part.



Figure 5. Schematic of various casting processes


5. VALIDATION
A short-shot die casting experiment was carried out to investigate the flow behaviour of
the semi-solid alloy in the die cavity. As shown in Figure 6, the semi-solid slurry flows
through the gate with a strong straight flow pattern, in accordance with its high viscosity.
When the semi-solid slurry hits a thin portion of the die cavity, the back pressure increases,
causing the slurry to flow to the left and right sides and then meet again.
In the flow analysis with the complete model of the real die, shown in Figure 7, the flow
pattern of the semi-solid slurry is the same as in the short-shot experiment. The result can be
used for further optimisation of the die casting machine conditions, such as the timing of
intensification pressure application and the plunger velocity, for a high-quality cast product.








Figure 6. Short-shot die casting result







Figure 7. Flow analysis result (complete model)


6. CONCLUSION
Flow analyses for conventional die casting, squeeze casting and semi-solid die casting
were carried out with commercial casting CAE software. In this research, a casting trial with
the semi-solid die casting process was carried out and compared with casting simulation
using the commercial software ADSTEFAN. The findings are summarised as follows:
(1) A thick gate is preferred for obtaining laminar flow in semi-solid die casting.
(2) Joint portions in the mold cavity should be considered in semi-solid die design, owing
to the possibility of casting defects such as cold shuts that may lead to poor mechanical
properties.
(3) A multi-gating system is not suitable for semi-solid die casting because of the
tendency for air entrapment to produce cold shut defects.
(4) Commercial casting simulation software can be used for semi-solid die casting by
adjusting the physical properties of the alloy.


REFERENCES
1. M.C Flemings and R.Mehrabian, AFS Transaction, 1973, 81-89.
2. ITAMURA Masayuki, YAMAMOTO Naomichi, IMONO, 1996, Volume 68, 493-498.
3. ITAMURA Masayuki, ADACHI Mitsuru, MURAKAMI Kousei, HARADA Takashi,
TANAKA Motoki, SATO Satoru, MAEDA Takuma, Flow Analysis Application to Rheo-
Casting, MCSP5 (5th Pacific Rim International Conference on Modeling of Casting &
Solidification Processes), 2002.
4. ASM Handbook Committee, Metal Handbook Ninth Edition (American Society for
Metals), 1979, 142.
5. PERAKIT Viriyarattanasak, YAOKAWA Jun, ANZAI Koichi, ITAMURA Masayuki,
MAEDA Takuma, KIKUCHI Masao, NAGASAWA Osamu, NIYAMA Eisuke, Japan
Die Casting Conference, 2008, 209-214.
6. YAOKAWA Jun, FARSHID Pahlevani, ANZAI Koichi, PERAKIT Viriyarattanasak,
ITAMURA Masayuki, Journal of Japan Foundry Engineering Society, 2008, Volume 3,
156.
7. ANZAI Koichi, ITAMURA Masayuki, KIKUCHI Masao, NIYAMA Eisuke, Japan Die
Casting Congress, 2006, 253.


ACKNOWLEDGMENTS
Special thanks to Ibaraki Hitachi Information Service Co., Ltd. for supporting the ADSTEFAN
software. My special thanks also to Prof. NIYAMA Eisuke for his excellent academic supervision.

E00018
Numerical Simulation of the Fluid Flow
Past a Rotating Torus

P. Suwannasri1 and N.P. Moshkin1,C

1 School of Mathematics, Institute of Science, Suranaree University of Technology, 111 University
Avenue, Muang District, Nakhon Ratchasima 30000, Thailand
C E-mail: pairin_77@hotmail.com; Tel. 081-6513252



ABSTRACT
In the present work, the hydrodynamics of a torus rotating about its centerline is
investigated numerically. This problem is important for two reasons: firstly, the
swimming of micro-organisms can be modeled as the self-locomotion of a
doughnut-shaped swimmer powered by surface rotation, and secondly, the torus is the
simplest geometry that can describe a self-propelled organism (particle). Rotation of
the torus surface can be considered as a propulsion device for controlling the variation
in the drag coefficient for flow past rings oriented normal to the direction of flow. A
numerical model based on the projection method has been developed for the
governing equations in the toroidal coordinate system. The numerical algorithm has
been validated by comparing our numerical results with available data from
laboratory physical modeling and other numerical results. The drag coefficients and
flow patterns for the axisymmetric flow past a torus rotating about its centerline were
computed and are analyzed for moderate Reynolds numbers, various rotational speeds
and different aspect ratios (ratio of the ring diameter to the cross-section diameter).

Keywords: Toroidal Coordinate System, Rotating Torus, Self-Propelled.



REFERENCES
1. Johnson, R.E. and Wu, T.Y., J. Fluid Mech., 1979, 95(2), 263-277.
2. Leshansky, A.M. and Kenneth, O., Phys. Fluids, 2008, 20, 063104.
3. Majumdar, S.R. and O'Neill, M.E., J. Applied Mathematics and Physics, 1997, 28,
   541-550.
4. Sheard, G.J., Thompson, M.C. and Hourigan, K., J. Fluid Mech., 2003, 492, 147-180.
5. Sheard, G.J., Hourigan, K. and Thompson, M.C., J. Fluid Mech., 2005, 526, 257-275.
6. Surattana, S. and Nikolay, M., Suranaree J. Sci. Technol., 2006, 13(3), 219-233.
7. Surattana, S. and Nikolay, M., The 5th International Conference on Computational Fluid
   Dynamics (ICCFD5), 2009, 40, 771-119.
8. Thaokar, R.M., Schiessel, H. and Kulic, I.M., Eur. Phys. J. B, 2007, 60, 325-336.



E00021
VirtualFlood3D: Software for Simulation and Visualization
of Water Flooding

A. Busaman1,C, S. Chuai-Aree2, W. Kanbua3 and S. Siripant1

1 Advanced Virtual and Intelligent Computing (AVIC), Department of Mathematics, Faculty of Science,
Chulalongkorn University, Bangkok, 10330, Thailand
2 Department of Mathematics and Computer Science, Faculty of Science and Technology,
Prince of Songkla University, Pattani Campus, 181, Rusamilae, Muang, Pattani, 94000, Thailand
3 Thai Marine Meteorology, Bangkok, 10330, Thailand
C E-mail: csomporn@bunga.pn.psu.ac.th; Fax: 073-312179; Tel. 080-4261112



ABSTRACT
Floods occur in Thailand almost every year. To simulate possible floods in Thailand,
this paper proposes algorithms and software for simulating and visualizing water
flooding. The algorithms cover three cases of flooding, caused by sea water level,
continuous heavy rain, and dam collapse, respectively. We used the finite difference
method (FDM) to solve the wave equation for flooding from sea water level and the
diffusion equation for heavy rain and water from a collapsed dam. The numerical
method for solving the partial differential equations (PDEs) and image processing are
combined and implemented in our algorithms for visualizing the simulated data in 2D
and 3D space. The grid resolution of the earth topography (ETOPO) data is 92.5
meters by 92.5 meters. The results of the simulation can show the water propagation
from the sources of water to the at-risk regions. The algorithm can be applied to
simulate floods for any region. The software, called VirtualFlood3D, is useful for
preliminary resource management and disaster prevention for water flooding.

Keywords: VirtualFlood3D, Water Flooding, Simulation and Visualization.



REFERENCES
1. Benes, B. and Forsbach, R., Visual Simulation of Hydraulic Erosion, Journal of WSCG,
   2001, 79-86.
2. Fiedler, F.R. and Ramirez, J.A., A numerical method for simulating discontinuous
   shallow flow over an infiltrating surface, International Journal for Numerical Methods in
   Fluids, 2000, 219-240.
3. Busaman, A., Chuai-Aree, S., Saelim, R. and Kanbua, W., Modeling, Simulation and
   Visualization of Water Flooding, Proc. of the 14th Annual Meeting in Mathematics, 2009.
E00022
Kinematics and Dynamics of Coherent Structures within a
Turbulent Spot in Plane Channel Flow

V. Juttijudata1,C

1 Department of Aerospace Engineering, Faculty of Engineering, Kasetsart University, Jatujak,
Bangkok, 10900, Thailand
C E-mail: vejapong.j@ku.ac.th; Fax: 02-679-8570; Tel. 02-942-8555


ABSTRACT
The dynamics of turbulent spots play a vital role in the transition process and its
modeling. The objective of this study is to identify coherent structures within an
isolated turbulent spot in plane channel flow in a self-similar state, and to study their
dynamics by analyzing a database obtained from a direct numerical simulation (DNS).
The turbulent spot is artificially initiated by a vortex-pair initial condition. A large
computational domain allows the spot to grow to a self-similar state independent of
the initial disturbance. The coherent structures are extracted by means of proper
orthogonal decomposition (POD). Only the centre-line plane is analyzed in this study.
The coherent structures in the turbulent spot comprise streaky structures, resembling
their counterparts in fully-developed turbulent flow, and wave packets at the wingtips
of the spot. The results also suggest that turbulent breakdown is associated with the
bursting of streaky structures and a triggering mechanism due to the wave packets,
similarly to fully-developed turbulent flow. Future work includes the analysis of the
spot in three dimensions and of interactions between turbulent spots.

Keywords: Turbulent spots, Coherent structures, Proper orthogonal decomposition
(POD).



1. INTRODUCTION
The location of the onset and the extent of transition are of major importance in
turbomachinery design, where wall shear stress and heat transfer rate are of interest. The
larger goal of the whole project is to study the physics of the transition process and the
dynamics of turbulent spots, in the hope that this will shed some light on how to model and
predict transition. The objective of this study is to identify coherent structures within an
isolated turbulent spot in plane channel flow in a self-similar state, and to study their
dynamics by analyzing a database obtained from a direct numerical simulation (DNS). The
proper orthogonal decomposition (POD) was used to analyze the database. The problem is
simplified to the analysis of the velocity field on a two-dimensional centerline plane rather
than the full, complicated three-dimensional volume of the flow.



2. THEORY AND RELATED WORKS

2.1 TURBULENT SPOTS IN A PLANE CHANNEL FLOW
When a localized disturbance evolves in an unstable flow, the breakdown of the
disturbance results in a turbulent spot - a localized region of turbulence. This spot further
grows, spreads and merges with neighboring spots, resulting in a fully-developed turbulence
region. The growth and merging rates of the spots and the spot interactions control the extent
of transition in the boundary layer. Turbulent spots are found in boundary layers as well as in
plane channel flows. However, spots in plane channel flow have arrowheads pointing in the
upstream direction, in contrast to those found in boundary layers (see [5]). Furthermore, a
wave pattern seen around the spot in channel flow is not observed in other types of spots. The
propagation velocities and spreading angles of the spots in the boundary layer and the plane
channel are very similar; however, the shapes and wave structures of the spots are different.

Henningson and Alfredsson [4] studied the waves at the wingtips of the spot in channel
flow experimentally. They consist of the least stable Orr-Sommerfeld mode, and propagate
into the turbulent region as they break down. Henningson [3] showed that the growth of the
spot is the result of cross-flow instability and wave energy focusing. Henningson and Kim [6]
studied the characteristics of turbulence inside the turbulent zone of a turbulent spot; the
turbulence inside the spot strongly resembles fully-developed turbulence.


2.2 PROPER ORTHOGONAL DECOMPOSITION (POD)
The proper orthogonal decomposition (POD) is an objective technique to extract the most
energetic organized structures (also known as coherent structures) from highly irregular
turbulent flows. The basic principle of the method is the eigenvalue problem of the two-point
correlation matrix (see details in [8]). Sirovich [12] reformulated the POD to be more
computationally efficient; the reformulation is known as the method of snapshots and is
briefly outlined here. In the method of snapshots, the eigenfunctions are expressed as a linear
combination of fluctuating velocity fields, i.e.

\phi_i(x_j) = \sum_{k=1}^{M} b_k\, u_i'(x_j, t_k)    (1)

where the coefficients b_k remain to be determined from the eigenvalue problem, and
u_i'(x_j, t_k) is the k-th snapshot of the fluctuating velocity field (there are M snapshots in
the ensemble here), defined as

u_i' = u_i - \langle u_i \rangle    (2)

Here u_i represents the total velocity from the simulation and \langle \cdot \rangle
represents the ensemble average. With the help of (1), the eigenvalue problem of the POD
becomes

\frac{1}{M} \sum_{l=1}^{M} \left( u_i'(x_j, t_k),\, u_i'(x_j, t_l) \right) b_l = \lambda\, b_k,
\qquad k = 1, \ldots, M    (3)

where (\cdot,\cdot) is an inner product operator defined by

(u_i, v_i) \equiv \int_{\Omega} u_i v_i \, d\Omega    (4)

The eigenvalues of the problem represent the energy content of the corresponding coherent
structures defined by the orthogonal eigenfunctions (1). The eigenfunctions are normalized
according to

(\phi_i^n, \phi_i^p) = \int_{\Omega} \phi_i^n \phi_i^p \, d\Omega = \delta_{np}
\quad \text{(1 for } n = p \text{, 0 otherwise)}    (5)

The fluctuating velocity may be reconstructed from

u_i'(x_j, t_k) = \sum_{n=1}^{M} a_n(t_k)\, \phi_i^n(x_j)    (6)

where the modal coefficients a_n are determined from

a_n(t_k) = \left( u_i'(x_j, t_k),\, \phi_i^n \right)    (7)

The energy in each coherent structure at any time may be determined from

E^n(t) = a^n(t)\, a^n(t)    (8)

By averaging E^n over the ensemble, the average energy content of each coherent structure
is obtained, which is equal to the corresponding eigenvalue, i.e.

\lambda^n = \langle a^n(t)\, a^n(t) \rangle    (9)

POD has been used extensively in the analysis of turbulent channel flows, e.g. [1], [13].
Velocity reconstructions from the POD recreated streaks, streamwise vortices, propagating
structures and their interaction leading to bursting. POD was also used in the transitional
boundary layer problem [10-11], and recently to investigate coherent structures within a
turbulent spot of the boundary layer [7]. Nevertheless, the experimental data available in the
latter study were very limited; as a result, the authors could only extract streaky coherent
structures close to the rear of the turbulent spot, without any further information on coherent
structures in other parts or on the kinematics/dynamics of coherent structures within the spot.
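The method of snapshots in Eqs. (1)-(9) maps onto a few dense-linear-algebra calls. The
sketch below uses random arrays as stand-ins for the DNS centre-line snapshots and a
uniform-weight discrete inner product; it illustrates the procedure, not the author's actual
code.

import numpy as np

# Method of snapshots, Eqs. (1)-(9), on synthetic stand-in data.
M, npts = 121, 2000                        # snapshots, grid points per field
rng = np.random.default_rng(0)
U = rng.standard_normal((M, npts))         # total velocity u_i(x_j, t_k)
Uf = U - U.mean(axis=0)                    # fluctuations u' = u - <u>, Eq. (2)

C = Uf @ Uf.T / M                          # (1/M)(u'(t_k), u'(t_l)), Eq. (3)
lam, b = np.linalg.eigh(C)                 # eigenvalues and coefficients b_k
order = np.argsort(lam)[::-1]              # most energetic modes first
lam, b = lam[order], b[:, order]

phi = b.T @ Uf                             # eigenfunctions, Eq. (1)
phi /= np.linalg.norm(phi, axis=1, keepdims=True)   # normalisation, Eq. (5)
a = Uf @ phi.T                             # modal coefficients a_n(t_k), Eq. (7)
print(lam[:6] / lam.sum())                 # energy fraction of the first six modes

Solving the small M x M eigenvalue problem instead of the npts x npts correlation problem
is exactly the economy that makes Sirovich's reformulation practical for DNS data.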



3. COMPUTATIONAL DETAILS
The analysis of the kinematics and dynamics of a turbulent spot is based on a direct
numerical simulation (DNS) database of a turbulent spot in plane channel flow. The DNS
employs a spectral method for the solution of the time-dependent, three-dimensional,
incompressible Navier-Stokes equations, and uses a wall-normal velocity/vorticity
formulation similar to [9]. Variables are expanded in Fourier modes in both horizontal
directions (the streamwise, x, and spanwise, z, directions) and in Chebyshev polynomials in
the wall-normal, y, direction. Aliasing errors are removed by the 3/2-rule. Periodic boundary
conditions are applied in both horizontal directions, while the no-slip condition is applied in
the wall-normal direction. A low-storage, semi-implicit, third-order Runge-Kutta scheme is
employed for time-marching [14].

The conditions of the simulation in this study are set up according to the simulation in [6]
so that the results can be validated against theirs. The simulation is performed under a
constant mean pressure gradient at a Reynolds number of 1500 (based on the half-channel
width h and centerline velocity U). The size of the computational domain is 35h x 2h x 25h
in the x, y and z directions, respectively. Here 256 x 33 x 256 spectral modes are employed
in those directions, corresponding to grid spacings of 20, 0.28-6.3, and 15 wall units (based
on the viscous length and mean friction velocity) after dealiasing. Though a factor of two
coarser than the grid spacing used in [9], the energy spectra do not show any abnormality at
high wavenumbers. In contrast to the simulation in [6], the present simulation was performed
on the full domain without a spanwise symmetry assumption. All lengths and velocities are
normalized by the half-channel width, h, and the centerline velocity, U. The initial condition
for the simulation is defined as the superposition of an undisturbed laminar velocity profile
and an initial turbulent spot disturbance. The initial turbulent spot disturbance is the velocity
field of a vortex pair [6], defined analytically as

u = 0, \qquad v = \frac{\partial \psi}{\partial z}, \qquad w = -\frac{\partial \psi}{\partial y},
\qquad \psi = z\, y\, (1 - y^2)^2\, e^{-x^2/16 - z^2/4}    (11)
where u, v and w are velocity in streamwise, wall-normal and spanwise components. The
turbulent spot quickly breaks down and develops into an approximate self-similar turbulent
spot downstream. The time span of the simulation was from t=0 to 270 h/U with the time-step
of 0.03 h/U.

In order to study a turbulent spot in an approximately self-similar state, one hundred and twenty-one velocity fields between t = 222 and 258 h/U are interpolated onto the horizontal conical coordinates, ξ = x/t and ζ = z/t, using spline interpolation. Averaging and POD analysis are performed on this dataset. The number of samples in the ensemble is doubled because the spanwise reflection, u(x,z) = u(x,-z), v(x,z) = v(x,-z) and w(x,z) = -w(x,-z), is also used to ensure the spanwise symmetry of the data in terms of statistics.
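These two preprocessing steps could be sketched as follows; the function and variable names are illustrative (not the authors' code), and the velocity arrays are assumed to be indexed as [x, z].

```python
import numpy as np
from scipy.interpolate import RectBivariateSpline

def to_conical(u, x, z, t, xi, zeta):
    """Spline-interpolate one snapshot u(x, z) at time t onto a fixed
    conical grid (xi = x/t, zeta = z/t)."""
    return RectBivariateSpline(x, z, u)(xi * t, zeta * t)

def reflect(u, v, w):
    """Spanwise reflection used to double the ensemble: u and v are even
    in z, while w changes sign."""
    return u[:, ::-1], v[:, ::-1], -w[:, ::-1]
```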



4. RESULTS AND DISCUSSION

4.1 THE DEVELOPMENT OF A TURBULENT SPOT
The contours of the wall-normal velocity (±0.02 U) at the centerline of the channel are plotted in figure 1 at three different times to show the development of the spot. Points A, B and C denote the rear interface, wingtip and front interface of the turbulent spot. The shape and evolution of the spot are similar to those of the well-validated simulation in [5], whose set-up is the same as that in [6]. Figure 2 shows the positions of points A, B and C as functions of time. The propagation speeds of points A and C and the spreading rate of the wingtip at point B agree well with the values reported in [5]. Here the interface of the spot is defined by the contour line at the outer edge of the turbulent spot on the centerline plane with values higher than 0.02 U or lower than -0.02 U. These two figures validate the simulation result in this study.


Figure 1. Top view of surface contours (±0.02 U, solid/dashed lines) of the wall-normal velocity at the centerline of the channel at tU/h = 114, 174 and 258.


4.2 THE COHERENT STRUCTURES WITHIN THE TURBULENT SPOT
The POD analysis is done on the centerline plane using the velocity fields in the conical coordinates from t = 222 to 258 h/U together with the velocity fields resulting from the spanwise reflection. Figure 3 shows the average velocity in the streamwise and spanwise components. The average velocity components compare relatively well with those of [6], even though the exact averaging processes are different. Note that the contours of the average velocity components are not very smooth, indicating that the number of samples may not be sufficient. A statistical convergence test was therefore performed; it shows that the average velocity components for the same time span do not change much when the number of samples is halved. The streamwise velocity component shows that the front interface moves with high velocity compared to the rear interface. Furthermore, the undisturbed flow outside the spot has a higher velocity than the flow
inside the turbulent region of the spot. From the spanwise velocity, it can be seen that fluid is entrained into the spot at the wingtips.


Figure 2. Positions of the turbulent spot features as functions of time. Points A, B and C are the rear interface, wingtip and front interface of the turbulent spot (see Figure 1). The corresponding velocity of each feature is shown by the value at the top right of each line.


Figure 3. Contours of average velocities at the centerline plane. (Left) Streamwise
component; minimum and maximum values are 0.64 to 1.12 with spacing of 0.04. (Right)
Spanwise component; minimum and maximum values are -0.2 to 0.2 with spacing of 0.04.

Table 1 shows the eigenvalues from the POD. Only the first six eigenvalues are shown, as only these modes are converged in a statistical sense for the given time span. The energy and cumulative energy of each mode (as % of the total energy) are also shown in the table. The eigenvalues indicate the amount of energy contained in each mode. However, some modes have (almost) the same eigenvalues, i.e. modes 2 and 3. Such modes usually come in a pair and contain the same coherent structure; the only difference is a 90° phase shift between them (see figure 4). Hence, these two modes are treated as a single coherent structure, and their energy and cumulative energy are then the contribution of the sum of the eigenvalues of the two modes. The eigenvalues of modes 4 and 5 differ by a factor of two. However, a closer look at the eigenfunctions of these modes in figure 4 indicates that they should come in a pair as well, so it is logical to treat these two modes as another coherent structure. It is not clear why their eigenvalues differ by a factor of two; possibly the time span of this analysis covers only some fraction of the whole cycle, so that the eigenvalues of modes 4 and 5 have not fully converged to the same value. Mode 1 and mode 2&3 have about the same energy content and together account for more than 50% of the total energy. The contribution from mode 4&5 (13.5%) is still significant and is included in the analysis. The contribution from mode 6 is only 2.8%; only the first five modes are therefore kept in the subsequent analysis.
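The pairing of modes with (nearly) equal eigenvalues and a quarter-period phase shift is generic for travelling structures, as the following synthetic example (unrelated to the spot database) demonstrates.

```python
import numpy as np

# A structure travelling at constant speed decomposes into exactly two POD
# modes of equal energy whose spatial shapes are 90 degrees out of phase.
x = np.linspace(0.0, 2.0 * np.pi, 256, endpoint=False)
t = np.linspace(0.0, 2.0 * np.pi, 128, endpoint=False)
X = np.cos(x[None, :] - t[:, None])        # snapshots of a travelling wave

U, S, Vt = np.linalg.svd(X, full_matrices=False)
lam = S**2 / len(t)
print(lam[:3])                             # two equal eigenvalues, rest ~ 0
# Vt[0] and Vt[1] are cosine- and sine-like: same shape, 90-degree shift.
```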

Table 1. Eigenvalues, energy and cumulative energy of each mode.

Mode   Eigenvalue (×10^5)   Energy (% of Total)   Cumulative Energy (% of Total)
1      3.4014               24.6                  24.6
2      1.9369               27.2 (modes 2&3)      51.8
3      1.8166
4      1.2420               13.5 (modes 4&5)      65.3
5      0.6170
6      0.3860               2.8                   68.1

Figure 4 shows the x-, y- and z-components of the eigenfunctions in the conical coordinates (ξ, ζ) from mode 1 (top) to mode 5 (bottom). Mode 1 exhibits strong streaky fluctuating regions close to the wingtips of the spot, most likely indicating high turbulence activity there. The x-component of mode 1 also shows scattered spotty regions of high turbulence activity inside the turbulent region within the spot. This mode represents strong turbulent mixing, i.e. the entrainment process at the wingtips of the spot as well as turbulent activity within the spot. Modes 2 and 3 come in a pair, as they are very similar except for a 90° phase difference from each other; the nearly identical eigenvalues of these two modes confirm that they should come in a pair. The striking feature of these modes is the wave packets at the wingtips of the spot. These wave packets may be viewed as arrays of small oblique vortices from the wall-normal velocity point of view. Modes 4 and 5 show features similar to modes 2 and 3, even though the eigenvalue of mode 4 is twice as large as that of mode 5. Once again, the striking feature of these modes is the wingtip wave packets. The differences are that the wingtip wave packets of mode 4&5 have a slightly longer wavelength and are located slightly farther from the center of the spot and upstream (toward the rear interface of the spot) compared with those of mode 2&3.


4.3 THE DYNAMICS OF THE COHERENT STRUCTURES
Figure 5 shows the energy content in each structure, $a_n^2$, and the total energy as a function of
time. As mode 2&3 are treated as a single coherent structure, energy content in this structure
is then the summation of energy in mode 2 and 3. Mode 4&5 are handled in the same way.
From the total energy, it is clear that the simulation does not cover a complete cycle of the turbulent spot's life; any conclusion drawn must therefore be treated with great caution. The total energy peaks just after the beginning, decreases, and rises again toward the end of the time span. Mode 1, mode 2&3, and mode 4&5 tend to interact strongly with one another, as seen in the energy content of each coherent structure. Figure 6 shows the reconstruction of the velocity field from the first five modes (without the mean velocity, to underline the role of the coherent structures).
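Such a low-order reconstruction is simply the truncated POD expansion; a minimal sketch with assumed array layouts (coefficients `a` of shape snapshots × modes, eigenfunctions `modes` of shape modes × points):

```python
import numpy as np

def reconstruct(a, modes, n_modes=5):
    """Fluctuating-velocity reconstruction from the leading POD modes,
    with the mean field deliberately omitted (as in figure 6)."""
    return a[:, :n_modes] @ modes[:n_modes, :]
```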
During the period t ≈ 222 to 226 h/U, energy transfers from mode 4&5 to mode 1 while mode 2&3 is relatively dormant. The shapes of the eigenfunctions of mode 1 and mode 4&5 are evident in the velocity reconstruction at t = 222 h/U. At t = 226 h/U, where the energy of mode 1 reaches its maximum, the pattern of mode 1 is clearly seen in the reconstruction. During this period, the streaks of mode 1 at the wingtips of the spot are regenerated from the breakdown of the arrays of small oblique vortices of mode 4&5. This is the period in which the spot entrains undisturbed laminar flow; the spot is spreading out into the undisturbed laminar flow.






Figure 4. Eigenfunctions corresponding to the five most energetic modes (in table 1). Streamwise, wall-normal and spanwise components are shown in the left, middle and right columns, respectively. Mode 1 is in the top row; modes 2 to 5 are in the following rows. Minimum and maximum values are -20 to 20 with spacing of 4.




Figure 5. Total energy and energy in different structures as a function of time.


Right after t ≈ 226 h/U, energy transfers from mode 1 to mode 2&3 until t ≈ 235 h/U. At t = 232 h/U, where the energy is distributed quite evenly, the reconstruction seems to mix all the eigenfunction shapes and no single mode dominates. Here the streaky structures of mode 1 break down, and the wave packets, or arrays of small oblique vortices, at the wingtips of the spot from mode 2&3 are regenerated and become dominant. This phase is similar to the streak instability leading to breakdown in the self-sustaining process (SSP) of wall-bounded turbulent flow [2]. The result of the streak instability in the SSP is the regeneration of streamwise vortices in the case of wall-bounded turbulence. At t = 240 h/U, when mode 2&3 is dominant, the reconstruction clearly shows the structures of mode 2&3.
After t ≈ 245 h/U, energy transfers from mode 2&3 to mode 1. During this phase the arrays of small oblique vortices (mode 2&3) pump low (high)-momentum fluid toward (away from) the centerline plane, creating low (high)-speed streaks (mode 1). This is similar to the streak generation phase in the SSP. The streaks (mode 1) become very strong again at t = 253 h/U, whereas the arrays of small oblique vortices (mode 2&3) disappear. Again, the turbulent spot is entraining surrounding fluid here. The streaky pattern of mode 1 is clearly visible in the reconstruction.

After t = 253 h/U, energy transfers from mode 1 to mode 4&5. Most likely, the process of streak instability and the regeneration of oblique vortices is taking place again, but now regenerating the vortices of mode 4&5. It is speculated that these oblique vortices should dominate for some time and generate streaks (mode 1) before their energy decays again, and that after this segment of the phase completes, the cycle will repeat itself. Further study over an extended period of time has to be carried out to solidify this speculation.



5. CONCLUSION
The kinematics and dynamics of coherent structures within a turbulent spot in plane channel flow on the centerline plane are studied by means of POD analysis using a well-validated DNS database. The first five modes capture about 65% of the total energy; this study therefore focused on the kinematics and dynamics of these five modes. Coherent structures in the spot are dominated by streaky structures and wave packets (or arrays of small oblique vortices) at the wingtips of the spot. The dynamics at the wingtips of the spot are very similar to the SSP of fully-developed wall-bounded turbulence, i.e. streak generation, streak instability, vortex regeneration, and then repetition of the whole process. Entrainment into the spot occurs during the streak generation phase.

The present work is constrained by the DNS database in some ways. In particular, the time span in the present work is too short and fails to cover a complete cycle of the turbulent spot. Further study will cover a longer time span in order to analyze a complete cycle. Another area to be studied is a full three-dimensional POD analysis, which will give a more complete picture of the kinematics and dynamics of coherent structures within a turbulent spot, including the interaction between different layers of the spot.





Figure 6. Velocity reconstruction from the first five modes (without the average velocity) at different times. Streamwise, wall-normal and spanwise components are shown in the left, middle and right columns, respectively. The reconstruction at t = 222 h/U is in the first row; those at t = 226, 232, 240, 248 and 253 h/U are in the following rows. Minimum and maximum values are -0.2 to 0.2 with spacing of 0.05 for the streamwise and wall-normal components, and -0.1 to 0.1 with spacing of 0.025 for the spanwise component.



REFERENCES
1. Ball, K. S., Sirovich, L. and Keefe, L. R., Int. J. for Numerical Methods in Fluids, 1991,
12, 585-604.
2. Hamilton, J. M., Kim, J. and Waleffe, F., J. Fluid Mech., 1995, 287, 317-348.
3. Henningson, D. S., Phys. Fluids A, 1989, 1, 1876-1882.
4. Henningson, D. S. and Alfredsson, P. H., J. Fluid Mech., 1987, 178, 405-421.
5. Henningson, D. S., Spalart, P. R. and Kim J., Phys. Fluids, 1987, 30(10), 2914-2917.
6. Henningson, D. S. and Kim J., J. Fluid Mech., 1991, 228, 183-205.
7. Hidalgo, P., Lang, A. W. and Thacker, W. D., AIAA 2006-1099, 44th AIAA Aerospace Sciences Meeting and Exhibit, 9-12 January 2006, Reno, Nevada.
8. Holmes, P., Lumley, J. L. and Berkooz, G. Turbulence, Coherent Structures, Dynamical
Systems and Symmetry, Cambridge University Press, Cambridge, 1996.
9. Kim, J., Moin, P. and Moser, R. D., J. Fluid Mech., 1987, 177, 133-166.
10. Rempfer, D. and Fasel, H. J. Fluid Mech., 1994, 260, 351-375.
11. Rempfer, D. and Fasel, H. J. Fluid Mech., 1994, 275, 257-283.
12. Sirovich, L., Quart. Appl. Math., 1987, XLV(3), 561-590.
13. Sirovich, L., Ball, K. S. and Keefe, L. R., Phys Fluids A, 1990, 2, 2217-2226.
14. Spalart, P. R., Moser, R. D. and Rogers, M. M., J. Comp. Phys., 1991, 96, 297-324.



ACKNOWLEDGMENTS
The authors acknowledge Kasetsart University Research and Development Institute for
financial support, and National Electronics and Computer Technology Center, National
Science and Technology Development Agency for providing computing resources that have
contributed to the research results reported within this paper.

E00023
Towards an Extension of the SST k-ω Model for
Transitional Flow

K. Ngiamsoongnirn¹,ᶜ, P. Malan², and E. Juntasaro¹

¹Mechanical Engineering (Simulation and Design),
The Sirindhorn International Thai-German Graduate School of Engineering (TGGS),
King Mongkut's Institute of Technology North Bangkok,
1518 Pibulsongkram Road, Bangsue, Bangkok 10800, Thailand
²CD-adapco, Lebanon, NH, 03766
ᶜE-mail: kiattisakn@kmutnb.ac.th; Fax: 02-913-2500 ext. 2922; Tel. 02-913-2500 ext. 2919



ABSTRACT
The SST k-ω turbulence model of Menter (1994) is widely accepted for predicting turbulent flow in many engineering applications. Recently, Menter's group (Menter, Langtry, Volker, and Huang [2005]) modified it to predict transitional flow by using an intermittency concept to weight the source terms in the turbulent kinetic energy equation. The intermittency and its related transport equations are formulated from correlations of experimental data, so their applicability may not be general; moreover, two important parameters are not published, which makes the model difficult to use at the research or academic level. More recently, on the other hand, Walters' group (Walters and Leylek [2004], Walters and Leylek [2005], and Walters and Cokljat [2008]) proposed a transition model using the laminar kinetic energy arising from the laminar fluctuations in the pre-transitional flow regime. That model is based on the physics of the flow, and its details are completely documented. Consequently, the objective of this paper is to present a new model that incorporates the functions and constants proposed in Walters and Cokljat (2008) for the transitional parts into the SST k-ω turbulence model, changing the SST k-ω model as little as possible in accordance with the concept of Menter's group. Three transition models, Menter et al. (2005), Walters and Cokljat (2008), and the present model, are then evaluated in terms of accuracy for flow over a flat plate with zero pressure gradient.

Keywords: SST, transition, intermittency, laminar kinetic energy, Menter, Walters.



REFERENCES
1. Menter, F. R., AIAA Journal, 1994, 32(8), 1598-1605.
2. Menter, F.R., Langtry, R.B., Volker, S., Huang, P.G., ERCOFTAC International
Symposium on Engineering Turbulence Modeling and Measurements (ETMM6), 2005.
3. Walters, D.K. and Leylek, J.H., Journal of Turbomachinery, 126, 2004, 193-202.
4. Walters, D.K. and Leylek, J.H., Transactions of the ASME, 127, 2005, 52-63.
5. Walters, D.K. and Cokljat, D., Journal of Fluids Engineering, 130, 2008, 121401:1-
121401:14.
E00027
Inelastic Transient Dynamic Analysis by BEM Using
Domain Decomposition

Bupavech Phansri¹, Kyung-Ho Park¹,ᶜ and Pennung Warnitchai¹

¹School of Engineering and Technology, Asian Institute of Technology
ᶜKyung-Ho Park. E-mail: khpark@ait.ac.th; Fax: 662-5245509; Tel. 662-5245508



ABSTRACT
This study examines the domain decomposition method for multi-region inelastic transient dynamic BEM analysis. The particular integral BEM formulation for single-region inelastic transient dynamic analysis is obtained by eliminating the acceleration volume integral and treating the initial stress term by volume cells. The domain decomposition method is used for the multi-region analysis: the domain of the original problem is subdivided into subregions, each of which can be treated as a sub-problem and solved independently. A numerical example of stress wave propagation in a bi-material rod is given to demonstrate the validity of the present formulation.

Keywords: BEM, Multi-region, Domain decomposition method, Inelastic transient
dynamic



1. INTRODUCTION
The Boundary Element Method (BEM) using particular integrals has been developed for various engineering problems [5-9]. While previous efforts focused mainly on single-region problems, an extension to a multi-region form is needed to show the applicability of the method to large-scale practical problems.
Because of the acceleration and initial stress terms in the governing equation, the direct application of the BEM to inelastic transient dynamic problems generates domain integrals in addition to the usual surface integrals [2-3]. The particular integral formulation for inelastic transient dynamic analysis can be obtained by eliminating only the acceleration volume integral and treating the initial stress term by volume cells.
With the aid of the domain decomposition method (DDM) [10], applications of inelastic transient dynamic BEM analysis can be extended to problems involving piecewise inhomogeneities and multi-part assemblies. Multi-region techniques in the BEM have focused mainly on the direct coupling approach for problems involving zone inhomogeneities [2-4]. In such an approach, the subregions are assembled into one large equation system and solved by either a direct or an iterative method.
This study examines the domain decomposition method for multi-region inelastic transient dynamic BEM analysis. First, the particular integral formulation for inelastic transient dynamic analysis is presented. The Houbolt time integration scheme is used for the time-marching process. A standard predictor-corrector method, together with the Newton-Raphson algorithm for the plastic multiplier, is used to solve the system of equations. The initial stress term is treated by volume cells. Second, the DDM based on the alternating Schwarz algorithm is described and examined for a problem consisting of two subregions. Finally, the numerical results of an example problem are compared with those from ABAQUS [1] to demonstrate the validity of the present formulation.




2. BEM FORMULATION FOR INELASTIC TRANSIENT DYNAMIC
The governing differential equation for inelastic transient dynamic analysis of a
homogeneous, isotropic body in the absence of body force can be written in an incremental
form as [2,6]
$$(\lambda + \mu)\,\Delta u_{j,ji} + \mu\,\Delta u_{i,jj} = \rho\,\Delta\ddot{u}_i + \Delta\sigma^{o}_{ij,j} \tag{1}$$

where $u_i$ is the displacement, $\ddot{u}_i$ is the acceleration, $\sigma^{o}_{ij}$ is the initial stress, $\rho$ is the mass density, $\lambda$ and $\mu$ are Lamé's constants, $\Delta$ denotes an incremental quantity, commas represent differentiation with respect to spatial coordinates, and $i,j = 1,2(3)$ for two (three) dimensions.
The incremental initial stress rate is defined as

$$\Delta\sigma^{o}_{ij} = \Delta\sigma^{e}_{ij} - \Delta\sigma^{ep}_{ij} \tag{2}$$

where $\Delta\sigma^{e}_{ij} = D^{e}_{ijkl}\,\Delta\varepsilon_{kl}$, $\Delta\sigma^{ep}_{ij} = D^{ep}_{ijkl}\,\Delta\varepsilon_{kl}$, $\varepsilon_{kl}$ is the strain, and $D^{e}_{ijkl}$ and $D^{ep}_{ijkl}$ are the elastic and elastoplastic constitutive tensors, respectively.
The total solutions for the displacement, traction and stress rates can be expressed as

$$\Delta u_i = \Delta u^{c}_i + \Delta u^{p}_i \tag{3a}$$
$$\Delta t_i = \Delta t^{c}_i + \Delta t^{p}_i \tag{3b}$$
$$\Delta \sigma_{ij} = \Delta \sigma^{c}_{ij} + \Delta \sigma^{p}_{ij} \tag{3c}$$

where $(u^{c}_i, t^{c}_i, \sigma^{c}_{ij})$ and $(u^{p}_i, t^{p}_i, \sigma^{p}_{ij})$ are the complementary functions and the particular integrals for the displacement, traction and stress rates, respectively.
The boundary integral equation relating the complementary functions, $u^{c}_i$ and $t^{c}_i$, can be written as [2]

$$C_{ij}(\xi)\,\Delta u^{c}_j(\xi) = \int_S \left[\, G_{ij}(x,\xi)\,\Delta t^{c}_i(x) - F_{ij}(x,\xi)\,\Delta u^{c}_i(x) \,\right] dS(x) \tag{4}$$

The complementary function for the interior stress rate can be written, using the stress-strain relationship, as [2]

$$\Delta\sigma^{c}_{ij}(\xi) = \int_S \left[\, G^{\sigma}_{kij}(x,\xi)\,\Delta t^{c}_k(x) - F^{\sigma}_{kij}(x,\xi)\,\Delta u^{c}_k(x) \,\right] dS(x) \tag{5}$$
The corresponding particular integrals for the displacement, stress and traction rates can be obtained using the following relations

$$\Delta u^{p}_i(x) = \sum_{n=1}^{N} U_{ik}(x,\xi_n)\,\Delta\phi_k(\xi_n) \tag{6}$$
$$\Delta \sigma^{p}_{ij}(x) = \sum_{n=1}^{N} S_{ikj}(x,\xi_n)\,\Delta\phi_k(\xi_n) \tag{7}$$
$$\Delta t^{p}_i(x) = \sum_{n=1}^{N} T_{ik}(x,\xi_n)\,\Delta\phi_k(\xi_n) \tag{8}$$

After discretization, the above equations can be assembled into the following discrete systems of equations

$$\mathbf{G}\,t - \mathbf{F}\,u = \mathbf{M}\,\ddot{u} \tag{9}$$
$$\mathbf{G}^{\sigma}\,t - \mathbf{F}^{\sigma}\,u = \mathbf{M}^{\sigma}\,\ddot{u} \tag{10}$$

where

$$\mathbf{M} = \left(\mathbf{G}\,\mathbf{T} - \mathbf{F}\,\mathbf{U}\right)\mathbf{C}^{-1} \tag{11}$$
$$\mathbf{M}^{\sigma} = \left(\mathbf{G}^{\sigma}\,\mathbf{T} - \mathbf{F}^{\sigma}\,\mathbf{U} - \mathbf{S}\right)\mathbf{C}^{-1} \tag{12}$$
In this study, the Houbolt method is used such that
$$\ddot{u}^{\,t+\Delta t} = \frac{1}{\Delta t^{2}}\left( 2u^{t+\Delta t} - 5u^{t} + 4u^{t-\Delta t} - u^{t-2\Delta t} \right) \tag{13}$$
Considering the equilibrium equation (9) at time t+Δt, one can obtain
$$\left( \mathbf{F} + \frac{2}{\Delta t^{2}}\,\mathbf{M} \right) u^{t+\Delta t} = \mathbf{G}\,t^{t+\Delta t} + \frac{1}{\Delta t^{2}}\,\mathbf{M}\left( 5u^{t} - 4u^{t-\Delta t} + u^{t-2\Delta t} \right) \tag{14}$$
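A minimal sketch of one step of this Houbolt time-marching update, assuming dense matrices and a fully traction-prescribed boundary (in practice the boundary conditions mix displacements and tractions, and the system is nonlinear in the initial stress):

```python
import numpy as np

def houbolt_step(F, G, M, t_new, u_t, u_tm1, u_tm2, dt):
    """One update of G t - F u = M u'' using Eqs. (13)-(14).
    t_new: prescribed tractions at t + dt; u_t, u_tm1, u_tm2:
    displacement history at t, t - dt and t - 2 dt."""
    lhs = F + (2.0 / dt**2) * M
    rhs = G @ t_new + (1.0 / dt**2) * (M @ (5.0 * u_t - 4.0 * u_tm1 + u_tm2))
    return np.linalg.solve(lhs, rhs)       # displacement at t + dt
```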
By substituting the boundary conditions at time t+Δt, the final system of equations can be rewritten as
$$\mathbf{A}\,x^{t+\Delta t} = y^{t+\Delta t} + \mathbf{N}\,\sigma^{o,\,t+\Delta t} \tag{15}$$
The combined form of the stress at boundary nodes and interior points can be written as
$$\sigma = \mathbf{A}\,x + y + \mathbf{N}\,\sigma^{o} \tag{16}$$
where $x$ is the vector of unknown variables obtained by solving Eq. (15) and $y$ denotes the vector of known boundary values. Eqs. (15) and (16) form a nonlinear system due to the unknown initial stress vector $\sigma^{o}$. The details of the Newton-Raphson algorithm for the elastoplastic solution and the calculation sequence for the elastic predictor and plastic corrector can be found in the references [5-7].


3. DOMAIN DECOMPOSITION METHOD
The DDM can be used for solving a partial differential equation (PDE) over two or more subregions. In this study, an indirect coupling approach is performed via the so-called alternating Schwarz algorithm. To explain the algorithm, consider a problem consisting of two subregions with $\Omega = \Omega^{I} \cup \Omega^{II}$ (Fig. 1). The compatibility and equilibrium conditions are enforced along the common interface:

$$u^{I}_{F} = u^{II}_{F}, \qquad t^{I}_{F} + t^{II}_{F} = 0 \qquad \text{on } \Gamma_{F} \tag{17}$$

where the superscripts I and II denote the subregion number, and the subscript F indicates variables that belong to the interface boundary.
A sequential form of the Dirichlet-Neumann algorithm can be described as follows:

For $t = \Delta t, 2\Delta t, \dots$, do
  Assume an initial value $[u^{I}_{F}]^{0}_{t+\Delta t} = 0$
  For $n = 1, 2, \dots$, do
  (1) Solve subregion I from
  $$\mathbf{A}^{I}\,[x^{I}]^{n}_{t+\Delta t} = [y^{I}]^{n}_{t+\Delta t} + \mathbf{N}^{I}\,[\sigma^{oI}]^{n}_{t+\Delta t} \tag{18}$$
  and rearrange $x^{I}$ into $[u^{I}\;\; u^{I}_{F}]^{T}$ and $[t^{I}\;\; t^{I}_{F}]^{T}$
  (2) Apply $[t^{II}_{F}]^{n}_{t+\Delta t} = -[t^{I}_{F}]^{n}_{t+\Delta t}$ on $\Gamma_{F}$
  (3) Solve subregion II from
  $$\mathbf{A}^{II}\,[x^{II}]^{n}_{t+\Delta t} = [y^{II}]^{n}_{t+\Delta t} + \mathbf{N}^{II}\,[\sigma^{oII}]^{n}_{t+\Delta t} \tag{19}$$
  and rearrange $x^{II}$ into $[u^{II}\;\; u^{II}_{F}]^{T}$ and $[t^{II}\;\; t^{II}_{F}]^{T}$
  (4) Update $[u^{I}_{F}]^{n+1}_{t+\Delta t} = (1-\omega)\,[u^{I}_{F}]^{n}_{t+\Delta t} + \omega\,[u^{II}_{F}]^{n}_{t+\Delta t}$, where $\omega$ is an under-relaxation parameter used to assist convergence. Its value is problem dependent; as an initial guess, $\omega = 0.5$ may be used.
  (5) Check convergence of the solution from
  $$\frac{\left\|\,[u^{I}_{F}]^{n+1}_{t+\Delta t} - [u^{I}_{F}]^{n}_{t+\Delta t}\,\right\|}{\left\|\,[u^{I}_{F}]^{n+1}_{t+\Delta t}\,\right\|} < \varepsilon \quad (\text{given tolerance}) \tag{20}$$
  (6) Finish the current time step if the solution converges; otherwise repeat from step (1). A schematic sketch of this iteration is given below.
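The sketch below condenses steps (1)-(6) into a single iteration loop; `solve_region_I` and `solve_region_II` are hypothetical wrappers around the subregion BEM solves of Eqs. (18) and (19), not the authors' code.

```python
import numpy as np

def dirichlet_neumann(solve_region_I, solve_region_II, u_F,
                      omega=0.5, tol=1e-6, max_iter=100):
    """Alternating Schwarz (Dirichlet-Neumann) iteration for one time step.
    solve_region_I: interface displacement -> interface traction (step 1).
    solve_region_II: interface traction -> interface displacement (step 3)."""
    for _ in range(max_iter):
        t_F = solve_region_I(u_F)                  # Dirichlet solve, region I
        u_F_II = solve_region_II(-t_F)             # equilibrium (step 2), region II
        u_next = (1.0 - omega) * u_F + omega * u_F_II   # under-relaxation (step 4)
        if np.linalg.norm(u_next - u_F) < tol * np.linalg.norm(u_next):
            return u_next                          # converged (steps 5-6)
        u_F = u_next
    return u_F
```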

The standard predictor-corrector scheme with the Newton-Raphson method employing a plastic multiplier can be applied to obtain the solutions of Eqs. (18) and (19). The corresponding stresses for each subregion are obtained from Eq. (16).


4. NUMERICAL EXAMPLE
The example of stress wave propagation in a bi-material inelastic rod is solved. The rod is formed by a series of two materials producing different speeds of wave propagation. It is composed of two subregions, each modeled by 58 quadratic boundary elements with 100 quadratic volume cells (Fig. 2). The elastic properties used are E₁ = 100, E₂ = 400, ν₁ = ν₂ = 0. The mass density of both materials is taken as 1.0. The materials obey the von Mises criterion; the yield stress is σ_Y = 1.0, with an elastoplastic modulus E_T = 0.25E. The time step Δt is taken to be 5.0×10⁻³. To compare with the results of ABAQUS [1], the same modeling mesh is used.
The numerical results for the normal stresses σ_xx at three points, A(1,0), B(5,0) and C(8,0), are shown in Fig. 3, together with the results from ABAQUS. Good agreement can be seen. The stress wave amplitudes take longer to reach their steady values, which clearly implies a reduction of the stress wave speed when travelling in the inelastic region.



Figure 1. A problem domain decomposed into two subregions.


Figure 2. Modeling mesh of a bi-material rod (two subregions Ω₁ and Ω₂, each of length 5.0, with interface Γ_F and end load p(t) along x).
Figure 3. Time history of σ_xx at the three points A(1,0), B(5,0) and C(8,0), compared with ABAQUS.

5. CONCLUSION
The inelastic transient dynamic BEM analysis using the DDM has been presented. The analysis domain is subdivided into two subregions, each modeled by the particular integral BEM. Based on an iterative coupling approach, each subregion is treated as a sub-problem and solved independently.
The validity of the present formulation is evaluated by comparing the results for an example problem of stress wave propagation in a bi-material rod with ABAQUS. The results demonstrate the applicability of the DDM to multi-region inelastic transient dynamic BEM analysis.


REFERENCES
1. ABAQUS Inc., ABAQUS 6.5 documentation, ABAQUS Inc., Pawtucket, 2004.
2. Banerjee, P.K., The boundary element methods in engineering, McGraw-Hill, London,
1994.
3. Banerjee, P.K. and Butterfield, R., Boundary element methods in engineering science,
McGraw-Hill, London, 1981.
4. Kane, J.H., Kumar, B.L.K. and Saigal, S., AIAA J., 1990, 28, 1277-1284.
5. Owatsiriwong, A. and Park, K.H., Int J Solids Struct., 2008, 45, 2561-2582.
6. Owatsiriwong, A., Phansri, B. and Park, K.H., Comput Model Eng Sci., 2008, 31, 37-59.
7. Owatsiriwong, A., Phansri, B., Kong, J.S. and Park, K.H., Comput Mech., 2009, 44, 161-
172.
8. Park, K.H. and Banerjee, P.K., Int J Solids Struct., 2002, 39, 2871-2892.
9. Park, K.H. and Banerjee, P.K., Comput Methods Appl Mech Engrg., 2002, 191, 3233-
3255.
10. Smith, B., Bjorstad, P. and Gropp, W. Domain decomposition: parallel multilevel
methods for elliptic partial differential equations, Cambridge University Press,
Cambridge, 1996.




High Performance
Computing
and
Grid Computing

F00001
Multi-GPUs Voxelization of 3D Data

W. Khantuwanᶜ and N. Khiripet
Knowledge Elicitation and Archiving Laboratory,
National Electronics and Computer Technology Center, Pathumthani, 12120, Thailand
ᶜE-mail: wongnaret.khantuwan@nectec.or.th; Fax: 02-5646772; Tel. 02-5646900 ext. 2273


ABSTRACT
Volumetric data have recently been used in a wide range of computer graphics applications, including medical imaging, scientific visualization, CAD/CAM and virtual reality. Conceptually, many 3D algorithms require a volumetric representation of the 3D object; this stage is typically called voxelization. One popular voxelization method is the parity count algorithm: a 3D model is placed in a volumetric grid and approximated by storing information representing the geometry in each grid entry. The voxelization uses only a boolean value to indicate the presence of matter. In this paper we present an efficient parallel version of the ray tracing and parity count algorithm on multiple GPUs. The experimental results show that our approach is very efficient: for example, voxelizing the Stanford bunny data on dual Nvidia GTX-280 graphics cards requires only 5 seconds of running time at a 1000×1000×1000 grid dimension, which is faster than the CPU gold standard. With this result, GPU computing is a good choice of platform for solid voxelization in terms of performance, cost, and availability.

Keywords: Voxelization, GPGPU, CUDA.


1. INTRODUCTION
Volumetric data has been used in a wide range of applications in computer graphics including
medical imaging [1], scientific visualization [2], virtual reality [3]. These applications require the
volumetric representation of 3D model that consists of many volume elements (voxels) that
approximates the shape of 3D model as closely as possible. This stage is typically called
voxelization.
The earliest voxelization idea is a 3D scan-conversion algorithm [4]. But this idea does not
provide internal features of the 3D model. The voxelization of solid objects, (solid voxelization),
is more difficult and time-consuming because this process requires an inside test for each voxel.
Thus, there is a huge processing time when the grid dimension is large.
In recent years, graphics processing units (GPUs) are standard components of todays
desktops and with the development on programmability of GPUs, General-purpose computing on
graphics process unit (GPGPU) has been becoming a trend that addresses data intensive
computing on GPUs rather than to CPU [5]. The characteristic of GPGPU is its adoption of the
data parallel computational paradigm that is achieved by the stream processing over the CPU-
based approach.
To produce voxel of solid objects, the exterior/interior classification of a voxel must take into
account non-local aspects of polygon model. This procedure is called parity count method [6]. For
such models, the method classifies a voxel by counting the number of time that a ray with its
origin at the center of voxel intersects polygons of model. This parity count method can be
extended to determine whether a point is interior to a polygon in 2D-slice of 3D object.
Furthermore, this method is slice-independent, which is very suitable for parallelization. Moreover
we can take advantage of this method to speed up the voxel classification by casting many parallel
rays to the polygon slice, and each one of these rays classifies all of the voxels along the ray.
In this paper we propose the usage of graphic processing unit (GPU) to accelerate the
voxelization process by an efficient and fast voxelization algorithm that takes advantage of
parallelism on multi-GPUs. The structure of this paper is as follows. In section 2 nVidias CUDA
are introduced. Section 3, the details of the proposed method is explained. In section 4 the
experimental, computational details, results and discussions are described. Finally some
concluding remarks and directions for future work are presented in section 5.

2. THEORY AND RELATED WORKS
2.1 Compute Unified Device Architecture (CUDA)
The first generation of general-purpose computation on graphics processing units (GPGPU) [5] required that non-graphics applications be mapped through graphics APIs. Recently, nVidia released a new parallel programming model, named Compute Unified Device Architecture (CUDA) [7], that extends the C programming language. Another GPU vendor, AMD, announced the Close To Metal (CTM) programming model, which uses an assembly language for application development, and Intel provides Larrabee, a new multi-core GPU architecture for GPU computing.
In the CUDA programming model, the GPU is viewed as a co-processor executing a large number of parallel threads, and a source program consists of host code and kernel code. The host code runs on the CPU, and the kernel code is executed on the GPU, as shown in figure 1.


Figure 1 CUDA programming model.

Just as scientific computing can be done on clusters composed of a large number of CPU cores, in some cases a GPGPU problem can be decomposed and run in parallel on multiple GPUs within a single host machine. One of the drawbacks to the use of multi-core CPUs for scientific computing has been the limited memory bandwidth available to each CPU socket. Because GPUs contain their own on-board high-performance memory, the memory bandwidth available to computational kernels scales with the number of GPUs. This property can allow single-system multi-GPU codes to scale much better than their multi-core CPU-based counterparts. Highly data-parallel and memory-bandwidth-intensive problems are often excellent candidates for such multi-GPU performance scaling.

2.2 Traditional Voxelization
In this section we describe the traditional parity-count method. A voxel representation of a model is a regular grid of cells, in our case a rectilinear grid, in which each cell (voxel) contains a density value in the range of zero to one. In this paper, we use a voxel value of zero to represent unoccupied space and a value of one to represent a voxel that is an interior element of the model.
As described above, there are several published methods for voxelizing polygons [8]. For our purposes, the voxel representation should not be thin-shelled. A thin-shelled voxelization of polygons is one in which only voxels near the surface of the original model have a non-zero value. A thin-shelled representation of a sphere, for instance, would contain non-zero voxels only near the sphere's surface; the voxels inside its boundary would all incorrectly be zero-valued. Figure 2a shows a slice of a thin-shelled sphere model. In contrast, the voxel models that we use have voxel values of one in the interior of the object. Figure 2b shows a slice through such a voxel model of a sphere.









(a) (b)

Figure 2 (a) Slice through a thin-shelled volumetric representation of a sphere.
(b) Slice through a solid volumetric representation of a sphere.

2.3 Parity Count Method
To produce voxel models with true interiors, the exterior/interior classification of a voxel must take into account non-local aspects of the polygonal model, by counting the number of times that a ray with its origin at the center of the voxel intersects polygons of the model. An odd number of intersections means the voxel is interior, and an even number means it is exterior; this is a simple way to classify a point in a 2D slice (a sketch follows below). In the Nooruddin and Turk report [6], instead of using ray tracing, they speed up the parity count method by using the OpenGL orthographic projection function and polygon scan-conversion to create a deep z-buffer.
However, the Nooruddin and Turk technique is still computationally expensive and thus inappropriate for high resolution grids, so we can take advantage of GPGPU to accelerate the voxelization process. Furthermore, the method is slice-independent, which makes it very suitable for parallelization. In the next section, we describe how to accelerate the voxelization method by exploiting parallelism on multiple GPUs.
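As a concrete (CPU-side, serial) illustration of the parity test, the sketch below classifies all voxel centres along one ray at once from the ray-surface intersection coordinates; it is illustrative and is not the GPU kernel described later.

```python
import numpy as np

def classify_ray(voxel_centres, hit_coords):
    """Parity-count classification along one ray through a watertight mesh.
    voxel_centres: sorted coordinates of voxel centres along the ray.
    hit_coords: coordinates where the ray crosses the surface."""
    hits = np.sort(np.asarray(hit_coords))
    # Number of surface crossings beyond each voxel centre:
    n_beyond = len(hits) - np.searchsorted(hits, voxel_centres)
    # Odd number of crossings on one side => the centre is interior.
    return (n_beyond % 2).astype(np.uint8)
```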

3. PROPOSED METHOD
3.1 3D MODEL REPRESENTATIONS
In this paper face-vertex arrays [9] are used to represent the 3D model on the GPU, since the CUDA programming model treats memory as arrays and cannot support user-defined data structures efficiently. An example 3D model is represented using two arrays, F and V, with each vertex in V containing its coordinates in 3D space, as shown in the figure below.


Face list          Vertex list
f0: v0 v1 v2       v0: (x0, y0, z0)
f1: v0 v3 v1       v1: (x1, y1, z1)
f2: v3 v4 v1       v2: (x2, y2, z2)
f3: v3 v5 v4       v3: (x3, y3, z3)
f4: v3 v0 v2       v4: (x4, y4, z4)
f5: v3 v5 v2       v5: (x5, y5, z5)
f6: v2 v1 v5
f7: v1 v4 v5

Figure 2. A 3D model representation using face-vertex arrays.
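In array form (illustrative values for a small closed mesh, not the model of the figure), the two lists translate directly into flat numeric arrays of the kind that CUDA kernels index efficiently:

```python
import numpy as np

# Vertex list V: one xyz triple per vertex.
V = np.array([[0.0, 0.0, 0.0],   # v0
              [1.0, 0.0, 0.0],   # v1
              [0.0, 1.0, 0.0],   # v2
              [0.0, 0.0, 1.0]],  # v3
             dtype=np.float32)

# Face list F: three vertex indices per triangle.
F = np.array([[0, 1, 2],
              [0, 3, 1],
              [0, 2, 3],
              [1, 3, 2]], dtype=np.int32)

tri0 = V[F[0]]   # gather the three vertices of face f0 for an intersection test
```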
3.2 PARALLEL VOXELIZATION - SINGLE GPU IMPLEMENTATION
The ray-tracing step of the parity count algorithm is the most time-consuming, and we can speed it up more effectively than the OpenGL orthographic projection approach. In this section we present a parallel voxelization method that is simple to implement and based on the previous work: reading the input 3D model, ray tracing, parity counting and saving the output voxels. The system overview of the proposed method is shown in figure 3.















Figure 3 The flow-chart shows an overview of our parallel voxelization system.

PARALLEL RAY TRACING
We employ a single kernel to implement the parallel ray tracing algorithm. For each face of the 3D model a thread is created that executes the ray tracing kernel, calculating the face's position in the voxel grid using a ray-triangle intersection test, as shown in figure 4. The result of this process is the thin-shell voxels, which are the input data for the parallel parity count method.








Figure 4 The voxel position calculation using ray-triangle intersection test.

PARALLEL PARITY COUNT
Since the result from the parallel ray tracing method is thin-shell voxels, we extend the parity count method to fill the volume: a thread is created for each voxel in the grid, and each thread votes on the classification of its voxel (interior or exterior) along the three major axis directions (see Figure 5). For watertight models, all votes agree; for models that have a hole through which a ray may pass, the majority vote is the voxel's final classification.







Figure 5 Parity vote process in each thread using three major axis directions.
3.3 MULTI-GPUs IMPLEMENTATION
Since many desktop PCs can be equipped with more than one GPU, the programmer must decide how many GPUs will be used. To implement multi-GPU processing, the first thing to do is to determine a workload distribution. There are two schemes: one is to distribute the number of threads to be executed on each GPU, and the other is to divide the workload inside the kernel function. After that, the main function needs to merge the outputs from each GPU to obtain the final result.
In our experiment, we create the same number of CPU threads as GPUs to be utilized, each of which controls an individual GPU. Each CPU thread copies input data from the CPU to its GPU, executes the kernel, and copies the result back to the CPU. The host CPU waits for all CPU threads to complete and merges the results into one. This process is illustrated in figure 6.
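A host-side sketch of this scheme using one CPU thread per GPU; `voxelize_on_gpu` is a hypothetical stand-in for the copy-in / kernel launch / copy-out sequence, stubbed here so the sketch runs.

```python
import threading

def voxelize_on_gpu(gpu_id, slab):
    """Stand-in for the real work: copy slab to GPU gpu_id, launch the ray
    tracing and parity count kernels, copy the voxels back (hypothetical)."""
    return slab  # placeholder result

def worker(gpu_id, slab, results):
    results[gpu_id] = voxelize_on_gpu(gpu_id, slab)

def voxelize_multi_gpu(slabs):
    """slabs: per-GPU portions of the grid (the workload distribution)."""
    results = [None] * len(slabs)
    threads = [threading.Thread(target=worker, args=(i, s, results))
               for i, s in enumerate(slabs)]
    for th in threads:
        th.start()
    for th in threads:          # the host waits for all CPU threads ...
        th.join()
    return results              # ... and then merges the per-GPU results
```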











Figure 6 Multi-GPUs utilization.

4. RESULTS AND DISCUSSION
To compare both performance (see Figure 7) and the final voxels (see Figure 8), the experiments were run on a desktop PC equipped with an Intel Core2 Quad Q9550 at 2.83 GHz, 8 GB RAM, and two Nvidia GeForce GTX280 graphics cards (1 GB memory, 30 multiprocessors, 240 stream processors each). For the CPU gold standard comparison we use a voxelizer named Binvox [10], which implements the parity count algorithm, and the 3D data from the Stanford 3D Scanning Repository [11] as the data set. Note that for both the GPU and CPU implementations, the final voxels are exactly the same.












(a) (b)

Figure 7. (a) Execution time and (b) speedup versus grid dimension (100-1000) for Binvox, the proposed method on a single GPU, and the proposed method on double GPUs.

Figure 7 shows the execution time and the performance when the grid dimensions are in the range 100-1000. Using only a single CPU core, the gold standard version takes 5 seconds at a grid dimension of 100×100×100 and up to 3 minutes at 1000×1000×1000. The GPU version using GTX-280 graphics cards is faster than the gold standard: for example, the single GPU takes only 10 seconds at a grid dimension of 1000×1000×1000, and the double-GPU version is up to 1.8 times faster than the single GPU. However, the speedups shown in figure 7b diminish as size grows: at a grid dimension of 1000×1000×1000, the single-GPU version is 18× the performance of the gold standard and the double-GPU version is 34× faster than the gold standard. This is because the number of voxels per thread in the parallel ray tracing process increases, and it becomes possible for more than one thread to write data to the same voxel address concurrently.










(a) (b)

Figure 8. (a) The Stanford Bunny, (b) final voxels (grid dimension = 200×200×200)

5. CONCLUSION
In this paper we presented a multi-GPU voxelization algorithm for 3D data using Nvidia's CUDA programming model. Using dual nVidia GTX-280 GPUs, we reduced the execution time for voxelizing the Stanford bunny at a grid dimension of 1000×1000×1000 to 5 seconds, approximately 34 times faster than the gold standard. Since the proposed method loses performance as the grid dimension increases, further research is required into using fast shared memory to increase the parallelization.

REFERENCES
1. Studholme, C. Hill, D. L. G., Hawkes, D.J., Overlap invariant entropy measure of 3D medical
image alignment, Pattern Recognition. 1999, Vol. 32, no. 1, pp. 71-86.
2. Marschallinger, R. A voxel visualization and analysis system based on AutoCAD, Computers
& Geosciences, 1996, Vol. 22, Issue 4, pp. 379-383.
3. Kuhnapfel, U., Cakmak, H. K. and Maass, H., Endoscopic surgery training using virtual reality and deformable tissue simulation, Computers & Graphics, 2000, Vol. 24, Issue 5, pp. 671-682.
4. Kaufman, A. and Shimony, E., '3D Scan-Conversion Algorithms for Voxel-Based Graphics',
Proc. ACM Workshop on Interactive 3D Graphics, 1986, pp. 45-76.
5. What is GPGPU :: GPGPU.org http://gpgpu.org/about/.
6. Nooruddin, F. S. and Turk, G., Simplification and Repair of Polygonal Models Using
Volumetric Techniques, IEEE Transactions on Visualization and Computer Graphics, 2003,
Vol 9, pp. 191-205.
7. NVIDIA, NVIDIA CUDA Compute Unified Device Architecture Programming Guide,
Version 2.3, 2008.
8. He, L. Hong, L. Kaufman, A.E. Varshney, A. and Wang, S. Voxel-Based Object
Simplification, IEEE Visualization 95 Proc.,1995, pp. 296-303.
9. Tobler and Maierhofer, A Mesh Data Structure for Rendering and Subdivision. 2006.
10. Binvox 3D Mesh Voxelizer http://www.cs.princeton.edu/~min/binvox/
11. The Stanford 3D Scanning Repository http://www-graphics.stanford.edu/data/3Dscanrep/

ACKNOWLEDGMENTS
This study is fully supported by NECTEC, National Electronics and Computer Technology
Center, grant #KM5201.
F00002

Simulation Study of Channel Engineering Design for Sub-Micrometer
Buried Channel PMOS Devices

Anucha Ruangphanit¹, Nopphon Phongphanchanthra¹, Nipapan Klungien¹, Rangsan Muanglhua², Surasak Niemcharoen² and Sanya Khunhao³

¹Thai MicroElectronics Center (TMEC), National Electronics and Computer Technology Center, 51/4 Moo 1, Wang Takien District, Amphur Muang, Chachoengsao 24000
²Electronics Department, Faculty of Engineering, King Mongkut's Institute of Technology Ladkrabang, Lampratue District, Ladkrabang, Thailand
³Department of Electrical Engineering, Faculty of Engineering, Sripatum University, 61 Phahonyothin Road, Jatujak, Bangkhane, Bangkok 10900, Thailand
E-mail: anucha.ruangphanit@nectec.or.th; Tel: +66-38-857-100-9 ext 513


ABSTRACT
This paper investigates channel engineering design considerations for sub-micrometer buried channel PMOS devices using semiconductor process and device simulation software. The virtual wafer fabrication factory Synopsys TCAD Tools, namely Sentaurus Process and Sentaurus Device, is used for the fabrication and simulation of a test device. The channel design consists of the scaling down of physical parameters such as the gate oxide thickness (T_ox) and design gate length (L_g), ion implantation for the threshold voltage adjust (VTA), and conventional N-well formation with various doping concentrations set by the N-well implantation dose at an implantation energy of 140 keV. In this simulation study, we characterize the effect of channel modulation by engineering the doping concentration profile and determine how to enhance the performance of the PMOS. The threshold voltage (V_TH) in the linear and saturation regions, subthreshold slope (S), off-state drain leakage current (I_off), saturation drain current (I_Dsat) and the drain-induced barrier lowering (DIBL) are investigated. The simulated threshold voltage was -0.90 V, the simulated subthreshold swing was 103 mV/dec, and the saturation current was 264 µA/µm at a specific leakage current of 10×10⁻¹² A/µm. The simulation results show that it is possible to fabricate buried channel PMOS with a single threshold voltage adjust (VTA) and n-type polysilicon gate electrodes in the 0.5 µm regime.
Keywords: MOSFET, DIBL, CMOS, Channel engineering.


1. INTRODUCTION
As MOSFET scaling continues, not only ultra-shallow junctions but also channel profile optimization are essential for device performance improvement and short channel effect (SCE) control. The onset of short channel effects, such as drain-induced barrier lowering, punchthrough, and shifts in the threshold voltage (V_TH), severely affects MOS device performance. To control punchthrough, conventional MOSFETs must have progressively higher doping in the channel region as the gate length is decreased. High channel doping is difficult to control and is expected to result in reduced channel mobility [3]. In an attempt to overcome these limitations, channel-engineered structures have been proposed and fabricated for 0.8 µm CMOS twin-well technology. These devices have channel lengths as small as 0.5 µm and utilize a lightly doped region at the oxide interface that rises to a high doping level over a depth of 50 nm in the channel. Previous theoretical studies were based only on simulation. In this paper, we propose an extensive evaluation of the channel profiles implanted in 0.8 µm buried channel PMOS devices. The investigation is based upon the following criteria:
- Threshold voltage and subthreshold characteristics
- Drive currents
- Off state leakage currents
- Gate length and process sensitivity


The threshold voltage is the first significant parameter of a MOSFET for circuit applications. The threshold voltage of a large-dimension device on a uniformly doped substrate is given by

$$V_{TH0} = V_{FB} + \Phi_s + \gamma\sqrt{\Phi_s} \tag{1}$$
$$\gamma = \frac{\sqrt{2\,q\,\varepsilon_{si}\,N_{sub}}}{C_{ox}} \tag{2}$$
$$\Phi_s = 2\,v_T \ln\!\left(\frac{N_{sub}}{n_i}\right) \tag{3}$$
$$v_T = \frac{k_B\,T}{q} \tag{4}$$
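A quick numerical sketch of Eqs. (1)-(4) (NMOS-style sign convention, SI units; the flat-band voltage and doping below are illustrative assumptions, not values from this work):

```python
import numpy as np

q, k_B = 1.602e-19, 1.381e-23            # C, J/K
eps_si, eps_ox = 1.04e-10, 3.45e-11      # F/m (silicon, SiO2)
T, n_i = 300.0, 1.45e16                  # K, intrinsic carriers in m^-3
N_sub, t_ox, V_FB = 4e22, 15e-9, -0.3    # m^-3 (= 4e16 cm^-3), m, V (assumed)

v_T = k_B * T / q                        # Eq. (4): thermal voltage
Phi_s = 2.0 * v_T * np.log(N_sub / n_i)  # Eq. (3): surface potential
C_ox = eps_ox / t_ox                     # gate oxide capacitance per area
gamma = np.sqrt(2.0 * q * eps_si * N_sub) / C_ox   # Eq. (2): body effect
V_TH = V_FB + Phi_s + gamma * np.sqrt(Phi_s)       # Eq. (1)
print(f"V_TH = {V_TH:.2f} V")
```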
The subthreshold current is the current that flows between the drain and source of a MOS transistor when the gate voltage is below V_TH. Several factors impact the threshold voltage and subthreshold current, including the drain-induced barrier lowering (DIBL) effect, body effect, channel width, short channel effect and temperature. Mathematically, the subthreshold leakage current (BSIM model) can be modeled as

$$I_{sub} = I_o \exp\!\left(\frac{V_{GS} - V_{TH} - \gamma' V_{SB} + \eta V_{DS}}{n\,v_T}\right)\left(1 - \exp\!\left(\frac{-V_{DS}}{v_T}\right)\right) \tag{5}$$

$$I_o = \mu_o\,C_{ox}\,\frac{W_{eff}}{L_{eff}}\,v_T^{2}\,\exp(1.8) \tag{6}$$

where V_FB is the flat-band voltage, N_sub is the substrate doping concentration, Φ_s is the surface potential, n_i is the intrinsic carrier concentration, μ_0 is the zero-bias mobility, C_ox is the gate oxide capacitance per unit area, W_eff and L_eff are the transistor effective width and effective channel length, v_T is the thermal voltage, V_TH is the threshold voltage, γ' is the body effect coefficient, V_SB is the source-body bias voltage, V_DS is the drain supply voltage, η is the DIBL coefficient, q is the electron charge, and n is the transistor subthreshold swing coefficient (n = 1 + C_D/C_ox). The off-state leakage current (I_off) is defined as the subthreshold current at V_GS = 0 V and V_DS = 5 V. The term (1 - exp(-V_DS/v_T)) can then be neglected, and the off-state leakage current at zero substrate bias can be described as follows.

$$I_{off} = I_o \exp\!\left(\frac{-V_{TH} + \eta V_{DS}}{n\,v_T}\right) \tag{7}$$

It can be seen that a decrease in channel length will not only directly increase I_off, but will also decrease V_TH, which further increases I_off. The subthreshold slope of the transistor, $S = n\,v_T \ln 10$, is the inverse of the slope of I_DS versus V_GS, in millivolts per decade (mV/dec) of change in I_DS; the ideal value of S at room temperature is 60 mV/dec. The off-state leakage current can then be written in the functional form:

$$I_{off} = I_o\,10^{\left(\frac{-V_{TH} + \eta V_{DS}}{S}\right)} \tag{9}$$
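As an order-of-magnitude sketch of Eq. (9), with an illustrative prefactor I_o and DIBL coefficient (assumed, not extracted from the simulations) combined with the threshold voltage and subthreshold swing quoted in the abstract:

```python
# Off-state leakage from Eq. (9); values are per micron of width.
I_o = 1e-6          # A/um, assumed prefactor from Eq. (6)
V_TH = 0.90         # V, magnitude of the simulated threshold voltage
V_DS = 5.0          # V, drain bias used for I_off
eta = 0.02          # V/V, assumed DIBL coefficient
S = 0.103           # V/dec, simulated subthreshold swing (103 mV/dec)

I_off = I_o * 10.0 ** ((-V_TH + eta * V_DS) / S)
print(f"I_off ~ {I_off:.1e} A/um")      # ~1e-14 A/um for these numbers
```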
2. EXPERIMENTAL
The N-well was formed by phosphorus implantation into a p-type substrate of 20 ohm-cm. The N-well implant dose was varied over 3×10¹², 4×10¹², 5×10¹² and 6×10¹² cm⁻² at a fixed energy of 140 keV. The simulated N-well doping profiles for the various implant doses are shown in Fig. 1. The N-well junction is approximately 3.0 µm deep, and the surface concentration is of the order of 2×10¹⁶ to 4×10¹⁶ cm⁻³ for the N-well implant doses above. A self-aligned n+ polysilicon gate of 350 nm thickness was used with a 15 nm gate oxide. To investigate the effect of the gate oxide thickness, the oxide thickness was varied from 12 nm to 23 nm by fixing the oxidation temperature at 900 °C and varying the oxidation time over 30, 45, 60 and 90 minutes in a dry oxidation ambient. A BF₂⁺ ion implantation for the threshold voltage adjust (VTA) in the channel was implemented in order to match the threshold voltages of the PMOS and NMOS devices, as required in a modern CMOS technology process. The VTA implantation dose was varied over 7×10¹¹, 8×10¹¹, 9×10¹¹ and 1×10¹² cm⁻² at an energy of 70 keV, and the implantation energy was varied over 40, 50, 90 and 120 keV to determine the effect of the p-type junction depth in the channel of the PMOS. There is no anti-punchthrough process for the PMOS since the N-well concentration is high enough. An oxide spacer technology (spacer width of 200 nm) with heavily doped source/drain extensions is used. The source/drain extensions and deep source/drain junction depths are 200 nm and 500 nm, respectively. Simulations are performed for design gate lengths of L_g = 0.5, 0.6, 0.7, 0.8, 1.0, 1.2 and 3.0 µm. Figure 2 shows the simulated PMOS structure on the N-well with a design gate length of 0.5 micron. All device simulations are performed using Sentaurus Device in 2-D. The models activated in the simulation include carrier mobility degradation due to high doping concentration, velocity saturation within high-field regions, and mobility degradation due to surface roughness scattering.

Figure 1. N-well concentration profile versus depth in the silicon substrate for the various implant doses.

Figure 2. The simulated PMOS structure for L = 0.5 µm.



3. RESULTS AND DISCUSSION
To carry out this channel engineering, PMOSFETs with physical design gate lengths down to 0.5 µm were characterized. Fig. 3 shows the threshold voltage (V_TH) versus channel implant dose at a fixed energy of 70 keV for various design gate lengths (L_g). Fig. 4 shows the threshold voltage (V_TH) versus design gate length (L_g) for various channel implant doses. As can be seen, the threshold voltage depends linearly on the channel implant dose, decreasing as the channel implant dose is increased for all design gate lengths. At this VTA implant energy, the peak concentration lies at distances of 19, 20, 60 and 90 nm below the SiO₂/Si interface, with surface concentrations of 1.5×10¹⁷, 1.0×10¹⁷, 3.2×10¹⁶ and 1.7×10¹⁶ cm⁻³ at the SiO₂/Si interface of the PMOS device, respectively. Thus, for the PMOS threshold voltage, the VTA implant dose in the channel is an important parameter compared with the VTA implant energy. We also observe better control of the roll-off when the design gate length is not less than 0.6 micron, for all of the various channel doping concentrations.

Figure 3. V_TH versus channel implant dose with L_g as a parameter.

Figure 4. V_TH versus L_g with channel doping as a parameter.

Figure 5. V_TH versus N-well implant dose with channel implant dose as a parameter.

Figure 6. V_TH versus T_ox with VTA implant dose as a parameter.

Figure 7. DIBL versus L_g with various channel implant doses.

Figure 8. Subthreshold slope versus L_g with various channel implant doses.

Figure 9. I_off versus L_g with channel implant dose as a parameter.

Figure 10. I_off versus I_on with various L_g and channel implant doses, at fixed T_ox = 15 nm, V_DS = 5 V.


The DIBL and subthreshold slope dependence on gate length, plotted in Fig. 7 and Fig. 8, confirm these behaviors. The DIBL increases as the gate length scales down and changes rapidly when the gate length falls below 0.6 µm. The DIBL depends not only on the gate length but also on the channel implant doping. In Fig. 8, the subthreshold slope (S) depends on both the channel length and the channel implant doping; S changes rapidly as the channel is scaled down into the short-channel region. Fig. 9 shows I_off versus L_g measured at 5.0 V for transistors with various values of channel implant doping. For L_g < 0.6 µm, the off-state leakage current becomes a significant part. Fig. 10 shows the I_on versus I_off performance for various L_g and channel implant dopings. The minimum mask gate length should not be less than 0.6 µm for better roll-off control.

4. CONCLUSION
Channel profile engineering optimizations have been performed for a 0.8 um CMOS technology. Short channel effect control with a relatively thick gate oxide (15 nm) is one of the main issues of gate length scaling. We have therefore investigated the performance of p-channel MOSFETs with various channel doping profiles in a manufacturing 0.8 um gate length process. The simulations reveal that channel engineering improves the threshold voltage and the off leakage current but does not have a dominant effect on DIBL and subthreshold slope. In the design process flow, the minimum design gate length should not be less than 0.6 um due to
V_TH roll-off and the I_off-I_on characteristics. In conclusion, with this channel engineering architecture, the expected performance of a PMOS close to that of the NMOS at a design gate length of 0.8 um is an on current of 200 uA/um, a threshold voltage of 0.77 V, a leakage current in the range of pA/um, a DIBL of approximately less than 50 mV/V, and a subthreshold slope on the order of 100 mV/dec. The design process parameters should be an N-well implant dose of 6x10^12 cm^-2 and a threshold voltage adjust dose of 9x10^11 cm^-2.

REFERENCES
1. S. Song et al., IEDM Tech. Dig., 1999, 427.
2. M. Fujiwa et al., VLSI Symp. Tech., 1999, 122.
3. G. Guegan et al., Proc. ESSDERC, 2001, 171-174.
4. S. Song et al., 2000 Symposium on VLSI Technology Digest of Technical Papers, 2000, 190-191.
5. A. Wang, Sub-threshold Design for Ultra Low-Power Systems, Springer, New York, 2006, 30-32.
6. A. Ruangphanit et al., Proc. ANSCSE12, 2008, 346-349.

F00006
Parallel Program Development for Tsunami Simulation with
the Message Passing Interface

S. Thawornrattanawanit 1,C, K. Virochsiri 1, V. Muangsin 1, and A. Ruangrassamee 2

1 Department of Computer Engineering, Faculty of Engineering,
Chulalongkorn University, Bangkok, 10330, Thailand
2 Department of Civil Engineering, Faculty of Engineering,
Chulalongkorn University, Bangkok, 10330, Thailand
C E-mail: sittikorn.t@student.chula.ac.th; Fax: 02-2186955; Tel. 02-2186956



ABSTRACT
Tsunami simulation is an important tool for the forecasting, warning and risk assessment of tsunamis. The simulation program computes the wave propagation, run-up and inundation at coastal areas after an underwater earthquake or a submarine landslide occurs. The TUNAMI program is a tsunami simulation program developed at Tohoku University, Japan. It is based on a finite difference method (FDM) implementation of the linear shallow water wave theory in a spherical coordinate system and the nonlinear shallow water wave theory in a Cartesian coordinate system with multi-scale regions. Being a sequential and computing-intensive program, it is not appropriate for real-time tsunami warning. This paper presents the development of a parallel tsunami simulation program based on TUNAMI using the Message Passing Interface (MPI). The aim is to forecast tsunami wave propagation and inundation in near real-time for tsunami warning using a cluster computer. The multi-scale nature of the problem and the time constraint require an integration of several parallel programming techniques. The parallel program is evaluated on the TERA Cluster at the Thai National Grid Center and the TSUBAME Cluster at the GSIC Center of Tokyo Institute of Technology. The results suggest that adaptive task partitioning is the key to load balancing and reducing communication overheads, and thus to minimizing the computing time.

Keywords: Parallel program, Tsunami simulation, Message Passing Interface,
TUNAMI program.



1. INTRODUCTION
The Indian Ocean tsunami generated by the magnitude 9.0 earthquake on 26 December 2004 off the west coast of northern Sumatra, Indonesia, was the most devastating tsunami in the history of Thailand. The unexpectedness of the tsunami and the lack of a tsunami warning policy caused severe damage to property and the loss of many lives. Computer simulation of tsunamis for forecasting, warning and risk assessment is one way to reduce the damage and losses from future tsunamis. However, tsunami simulation for forecasting requires real-time calculation so that a tsunami warning can be announced as soon as possible.
A tsunami simulation program developed at Tohoku University, Japan, TUNAMI (Shuto et al., 1986; Imamura et al., 2006) [1], can compute the wave propagation, run-up and inundation at coastal areas with multi-scale regions. The seismic information, including bathymetry (region data) and fault parameters (deformation data), is used to initialize the first tsunami wave after an underwater earthquake or a submarine landslide occurs. The calculation is based on a finite difference method (FDM) implementation of the linear shallow water wave theory in a spherical coordinate system and the nonlinear shallow water wave theory in a Cartesian coordinate system with run-up computation. The shallow water
equations are derived from the equations of conservation of mass and momentum. Because of the several domain sizes in the multi-scale regions, tsunami simulation requires large amounts of computation time. Being a sequential and computing-intensive program, TUNAMI is not appropriate for real-time tsunami warning.
This paper presents the development of a parallel tsunami simulation program based on TUNAMI using the Message Passing Interface (MPI) [2]. The aim is to forecast tsunami wave propagation and inundation in near real-time for tsunami warning using a cluster computer. The multi-scale nature of the problem and the time constraint require an integration of several parallel programming techniques. The parallel tsunami simulation program is developed with the message passing interface (MPI) using both the functional decomposition and the domain decomposition techniques. The parallel program is evaluated on the TERA Cluster [3] and the TSUBAME Cluster [4]. In the next section, related work is reviewed. Then, the experiments are described in Section 3. Section 4 shows the performance results. Finally, the conclusion and future work are given in Section 5.



2. THEORY AND RELATED WORKS
TUNAMI (Tohoku University's Numerical Analysis Model for Investigation of tsunamis), the program of Dr. Fumihiko Imamura (Prof. of Tsunami Engineering, School of Civil Engineering, Asian Inst. Tech. and Disaster Control Research Center, Tohoku University, Japan), is a tsunami simulation program developed in the FORTRAN programming language with a sequential programming model. This program was later extended by Dr. Anat Ruangrassamee (Prof. of Department of Civil Engineering, Chulalongkorn University, Thailand) to simulate the Indian Ocean tsunami that struck South East Asia at the end of 2004. The simulation results are accurate and precise but take a long computation time depending on the domain size of each region. The multi-scale regions of the TUNAMI program consist of 4 levels of resolution, and data are exchanged across adjacent region boundaries to provide accurate calculations and continuity of the data at each level of resolution.












Figure 1. The multi-scale region for tsunami simulation

Each level of resolution is called a region, and the regions are denoted R1, R2, R3 and R4 as in Figure 1. The R4 regions, called zones, are where the inundation (run-up) that shows the tsunami damage at the coastal areas is computed. Each run of the TUNAMI program can calculate the wave propagation and inundation for only one zone.


Table 1. The coordinates of the interesting area of the tsunami simulation. Columns: region level; region no.; resolution (' "); latitude from and to (deg ' "); longitude from and to (deg ' "); number of grid points (X Y); and the coordinates of the grid on the overlapped region (X Y, X Y). A dash means the region has no overlapped region.
R1 1 2 0 -10 0 0 18 0 0 87 0 0 110 0 0 690 840 - - - -
R2
21 0 15 5 59 45 9 30 0 95 59 45 99 0 0 721 841 271 481 360 585
22 0 15 9 31 45 13 2 0 95 59 45 99 0 0 721 841 271 587 360 691
23 0 15 4 45 45 8 16 0 99 1 45 102 2 0 721 841 362 444 451 548
R3
211 0 5 7 45 25 8 12 30 97 59 55 98 20 0 241 325 482 424 561 531
212 0 5 8 32 25 9 11 30 97 26 55 98 20 0 637 469 350 612 561 767
213 0 5 9 11 55 9 29 30 97 58 55 98 32 0 397 211 478 770 609 839
214 0 5 8 12 55 8 32 0 97 59 55 98 20 0 241 229 482 534 561 609
215 0 5 7 31 55 8 30 0 98 20 25 98 59 30 469 697 564 370 719 601
221 0 5 9 32 55 10 13 30 97 47 0 98 40 15 639 487 430 6 642 167
231 0 5 7 7 25 7 46 15 99 3 25 99 58 0 655 466 8 568 225 722
232 0 5 6 27 55 7 6 45 99 8 55 100 2 0 637 466 30 410 241 564
R4
2111 0 1.667 7 50 58 8 0 0 98 5 28 98 19 0 487 325 68 68 229 175
2112 0 1.667 8 0 18 8 12 0 98 5 28 98 19 0 487 421 68 180 229 319
2121 0 1.667 8 36 58 8 50 0 98 0 8 98 18 0 643 469 400 56 613 211
2122 0 1.667 8 50 18 9 11 0 98 0 8 98 19 0 679 745 400 216 625 463
2131 0 1.667 9 12 18 9 29 0 98 11 28 98 28 0 595 601 152 6 349 205
2141 0 1.667 8 13 18 8 31 0 98 8 28 98 19 0 379 637 104 6 229 217
2151 0 1.667 7 47 18 8 10 0 98 21 28 98 29 0 271 817 14 186 103 457
2152 0 1.667 8 10 18 8 24 0 98 26 28 98 47 0 739 493 74 462 319 625
2153 0 1.667 7 55 18 8 10 0 98 42 28 98 57 0 523 529 266 282 439 457
2211 0 1.667 10 0 18 10 13 0 98 20 28 98 40 0 703 457 403 330 636 481
2212 0 1.667 9 46 18 10 0 0 98 20 28 98 40 0 703 493 403 162 636 325
2213 0 1.667 9 34 18 9 46 0 98 20 28 98 40 0 703 421 403 18 636 157
2311 0 1.667 7 33 18 7 46 0 99 4 28 99 23 0 667 457 14 312 235 463
2312 0 1.667 7 20 18 7 33 0 99 10 28 99 28 0 631 457 86 156 295 307
2313 0 1.667 7 8 18 7 20 0 99 25 28 99 45 0 703 421 266 12 499 151
2321 0 1.667 6 53 18 7 6 0 99 25 28 99 45 0 703 457 200 306 433 457
2322 0 1.667 6 43 18 6 53 0 99 35 28 99 53 0 631 349 320 186 529 301
2323 0 1.667 6 29 18 6 43 0 99 43 28 100 1 0 631 493 416 18 625 181



The data used to simulate the tsunami were collected from actual surveys and compiled into data files for the calculation by Dr. Anat Ruangrassamee and his students at the Department of Civil Engineering, Chulalongkorn University. Each region is identified by a number (Region No.) as in Table 1, and each region overlaps a number of larger regions; for example, Region 2111 lies over Region 211, Region 21, and Region 1, and Region 2211 lies over Region 221, Region 22, and Region 1, as in Figure 2.
Figure 2. The interesting area of tsunami simulation



Analysis of the behavior of the simulation program shows that its work flow can be divided into 5 sections: the initial data section, the computed data section, the exchanged data section, the collected data section, and the saved output section. The initial data section and the collected data section run only once. The computed data section runs for 36,000 time steps for R2, R3, and R4, but only 9,000 time steps for R1, as in Figures 3(a) and 3(b); R1 uses a 4 s time step while the finer regions use a 1 s step, so both cover the same simulated time. The exchanged data section also runs for 36,000 time steps. Finally, the saved output section runs 600 times.

The parallel tsunami simulation program is developed from a combination of the C++ and FORTRAN programming languages with the message passing interface (MPI), using both the functional decomposition and the domain decomposition techniques. The C++ code controls the work flow of the parallel program via the MPI library, while the FORTRAN code is still used for the subroutines that compute the wave propagation and inundation of the tsunami simulation. The message passing interface is used to exchange the data of each region and to synchronize the processors between the computation phases, as in Figure 4.
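
As an illustration of the exchanged data section, the following is a minimal C++/MPI sketch, not the authors' code, showing how neighboring processes of a row-partitioned region could swap a shared boundary row every time step; the row length and step count are placeholders taken from the region sizes above.

#include <mpi.h>
#include <vector>

// Minimal sketch: each rank holds one horizontal strip of a region and
// exchanges one boundary row with its neighbors every time step.
int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    const int ncols = 841;                    // row length (placeholder)
    std::vector<double> send_row(ncols, 0.0), recv_row(ncols, 0.0);
    int up   = (rank > 0)          ? rank - 1 : MPI_PROC_NULL;
    int down = (rank < nprocs - 1) ? rank + 1 : MPI_PROC_NULL;

    for (int step = 0; step < 36000; ++step) {
        // ... compute the local part of the wave field (omitted) ...
        // Send our last row down and receive the neighbor's row from above.
        MPI_Sendrecv(send_row.data(), ncols, MPI_DOUBLE, down, 0,
                     recv_row.data(), ncols, MPI_DOUBLE, up,   0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }
    MPI_Finalize();
    return 0;
}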
















Figure 3(a). The original flow chart of the TUNAMI program. Figure 3(b). The flow chart of the parallel program implementation. (Both charts step through R1: MASS, MOMENT, INTERQT and R1-R2: JNQ_S2C at a 4 s time step, and R2/R3/R4: NLMASS, R2-R3/R3-R4: JNZ, R2/R3/R4: NLMMT, R2-R3/R3-R4: JNQ at a 1 s time step; in 3(b) the regions run concurrently with data transfer between them.)





Figure 4. The exchanged data section

The functional decomposition technique is used to divide the whole area, by region level, among groups of processors as in Figure 5(a), and the domain decomposition technique is used to divide each region among the processors in its group with the row partitioning technique as in Figure 5(b); a sketch of this two-level decomposition follows.
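
The following C++/MPI sketch illustrates the two-level decomposition under the assumption of a simple round-robin assignment of ranks to regions; the group assignment and the grid height are placeholders, not the authors' scheme.

#include <cstdio>
#include <mpi.h>

// Ranks are first split into one communicator per region level
// (functional decomposition); each region communicator then partitions
// its grid by rows (domain decomposition).
int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int world_rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

    int region = world_rank % 4;              // 0..3 -> R1..R4 (placeholder)
    MPI_Comm region_comm;
    MPI_Comm_split(MPI_COMM_WORLD, region, world_rank, &region_comm);

    int rrank, rsize;
    MPI_Comm_rank(region_comm, &rrank);
    MPI_Comm_size(region_comm, &rsize);

    // Row partitioning: each process takes a contiguous block of rows.
    const int nrows = 840;                    // grid height (placeholder)
    int rows_per_proc = nrows / rsize;
    int row_begin = rrank * rows_per_proc;
    int row_end = (rrank == rsize - 1) ? nrows : row_begin + rows_per_proc;
    std::printf("region R%d: rank %d computes rows [%d, %d)\n",
                region + 1, rrank, row_begin, row_end);

    MPI_Comm_free(&region_comm);
    MPI_Finalize();
    return 0;
}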


Figure 5(a). Example of partitioning by the functional decomposition technique. Figure 5(b). Example of row partitioning by the domain decomposition technique.


3. EXPERIMENTAL DETAILS
The parallel tsunami simulation program is evaluated on the TERA Cluster (Intel EM64T Xeon 5050, 3.00 GHz, Gigabit interconnect) and the TSUBAME Cluster (AMD x86_64 Opteron Dual Core, 2.4 GHz, InfiniBand) with 4 different overlapped zones (Regions 2111, 2112, 2121, and 2211) and with the number of processors for each region multiplied by 1x, 2x, 3x, and 4x. The results of these experiments are presented in Section 4.


4. RESULTS
The performance results of the parallel tsunami simulation are shown in Table 2.

Table 2. Execution time comparison between the sequential and parallel programs (table omitted).

5. CONCLUSION
This paper has presented the development of a parallel tsunami simulation program based on TUNAMI using the Message Passing Interface (MPI). The aim is to forecast tsunami wave propagation and inundation in near real-time for tsunami warning using a cluster computer. The results suggest that adaptive task partitioning is the key to load balancing and reducing communication overheads, and thus to minimizing the computing time.


REFERENCES
1. Imamura, F., Yalciner, A. C., and Ozyurt, G., TSUNAMI MODELING MANUAL, Disaster
Control Research Center (DCRC), Tohoku University, Japan, 2006, 71, available at
http://www.tsunami.civil.tohoku.ac.jp/hokusai3/J/projects/manual-ver-3.1.pdf, last check
29 November 2009.
2. Barney, B., MPI Performance Topics, Lawrence Livermore National Laboratory,
available at https://computing.llnl.gov/tutorials/mpi_performance/, last check 29
November 2009.
3. User Manual: Tera Cluster, Thai National Grid Center (TNGC), available at
http://tera.thaigrid.or.th/drupal/en/user_manual, last check 29 November 2009.
4. TSUBAME User's Guide, Global Scientific Information and Computing Center (GSIC),
available at http://www.gsic.titech.ac.jp/~ccwww/tebiki/tebiki-eng.pdf, last check 29
November 2009.

ACKNOWLEDGMENTS
We thank the Thai National Grid Center, Thailand, for high performance computing support on the TERA Cluster, and the GSIC Center of Tokyo Institute of Technology, Japan, for support on the TSUBAME Cluster.
F00007

Optimization of Geometry of LOCOS Isolation in Sub-micrometer CMOS by TCAD Tools

Nopphon Phongphanchantra 1, Weera Pengchan 2,C, Narin Atiwongsangthong 2 and Somsak Ckeersirikul 2

1 Thai Microelectronics Center (TMEC), NECTEC, NSTDA, Pathumthani, Thailand
2 Department of Electronics Engineering, Faculty of Engineering,
King Mongkut's Institute of Technology Ladkrabang, Bangkok, 10520, Thailand
C E-mail: kpweera@kmitl.ac.th, Tel: 081-7000058



ABSTRACT
This paper presents the optimization of the geometry design of the LOCOS (LOCal Oxidation of Silicon) technique in a sub-micrometer CMOS process by TCAD tools. The LOCOS structures are formed using thermal oxidation. In order to accurately simulate the bird's beak effect, the Stress-Dependent Oxidation (SDO) model must be used. The optimization of the LOCOS design conditions has been achieved by minimizing the lateral encroachment, also called the bird's beak length under the active mask (L_bb), the bird's beak height (H_bb), and the ratios of L_bb and H_bb to the field oxide thickness. Besides the optimization procedure, different conditions for the ratio of nitride to pad oxide thickness are examined.
Keywords: LOCOS, TCAD, SDO



1. INTRODUCTION
LOCal Oxidation of Silicon (LOCOS) is the oxidation of selective areas on a wafer; it uses SiO2 to electrically isolate adjacent devices on the silicon surface. In the first step, a thin (100-200 nm) silicon nitride (Si3N4) layer is deposited on a pad oxide. In the next step, the nitride is etched where the active area will be. Then the field implantation, or channel stop ion implantation, is done; the silicon nitride acts as an implantation mask over the active areas. Next, the field oxide is grown at around 1000 deg C for 2-4 hours, producing a field oxide (Fox) 0.6-1.0 um thick. The field-oxide growth actually consumes some of the silicon wafer (about 45%), so the silicon is recessed. This is a good thing, since a smoother topology improves subsequent processing steps. Because of the oxidant diffusion path under the nitride mask, lateral oxidation can take place. As shown in Figure 1, this lateral encroachment, called the bird's beak, can be characterized by a length (L_bb) and a height (H_bb) in LOCOS structures. After the thermal oxidation, the nitride and any underlying barrier oxide are removed down to bare silicon. As the oxygen diffuses through the grown oxide, it moves in all directions: some of the oxygen moves down into the silicon, and other oxygen atoms move sideways. This means there is slight lateral growth of the oxide under the nitride mask. Since the oxide is thicker than the silicon consumed, growth under the nitride mask serves to push up the nitride edges. This action is referred to as the bird's beak effect, an undesirable by-product of LOCOS-type oxidation processes.


Figure 1. Geometrical parameters of a LOCOS isolation

2. PROCESS SIMULATION
Sentaurus Process [1] is an advanced 1D, 2D, and 3D process simulator suitable for silicon and compound semiconductors. It features a modern software architecture and state-of-the-art models to address current and future process technologies. Sentaurus Process simulates all standard process simulation steps: diffusion, implantation, Monte Carlo (MC) implantation (Taurus MC or Crystal-TRIM), oxidation, etching, deposition, silicidation, and kinetic Monte Carlo (KMC). Capabilities in 3D include meshing of 3D boundary files through the MGOALS library, implantation through the Imp3D module from FhG Erlangen, mechanics (stress and strain), diffusion, a limited capability for 3D oxidation, and an interface to Sentaurus Structure Editor, the 3D geometry editing tool based on the ACIS solid modeling library. Sentaurus Process can simulate the thermal oxidation of silicon. Because the conversion ratio from Si to SiO2 is greater than one, new volume is generated, which in turn leads to the motion of materials and to mechanical stress in the structure. The oxidation process has three steps (see the relation after this list):
- Diffusion of oxidants (H2O, O2) from the gas-oxide interface through the existing oxide to the silicon-oxide interface.
- Reaction of the oxidant with silicon to form new oxide.
- Motion of materials due to the volume expansion caused by the reaction between silicon and oxide.
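
For reference, the diffusion and reaction steps above are summarized by the classical Deal-Grove relation [4] between the grown oxide thickness $x_o$ and the oxidation time $t$:

\[ x_o^2 + A\,x_o = B\,(t + \tau), \]

where $B$ is the parabolic (diffusion-limited) rate constant, $B/A$ is the linear (reaction-limited) rate constant, and $\tau$ accounts for any oxide present at the start. Stress-dependent oxidation models such as the SDO model used here locally modify these rate constants under stress, which is why the SDO model is needed to reproduce the bird's beak shape accurately.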



Figure 2. physical geometry of LOCOS Figure 3. physical geometry of LOCOS
Oxidation of Silicon in micro scale Oxidation of Silicon in micro scale
(SiO2=10 nm, Si3N4=100 nm ) (SiO2=10 nm, Si3N4=250 nm )
Fox


Figure 4. Physical geometry of LOCOS oxidation of silicon in micro scale (SiO2 = 20 nm, Si3N4 = 150 nm).
Figure 5. Physical geometry of LOCOS oxidation of silicon in micro scale (SiO2 = 20 nm, Si3N4 = 250 nm).

In this paper, the LOCOS isolation samples have been processed by growing a pad oxide followed by a nitride layer deposition. The pad oxide thickness was varied over 10, 15 and 20 nm by fixing the oxidation temperature at 900 °C and varying the oxidation time over 20, 45 and 80 minutes in a dry oxidation ambient. The nitride layer was formed by the CVD process, with its thickness varied over 100, 120, 150, 200 and 250 nm. After etching the nitride, the n-field implantation was done by a boron implant through the pad oxide; after that, a wet field oxidation was performed at a temperature of 1000 °C for 210 min in an O2/H2 ambient to obtain a field oxide thickness of approximately 620 nm. The oxidation was simulated using the extracted stress-dependent oxidation parameters. Figure 2 shows the simulated LOCOS structure with a design active width of 1.0 micron. All device simulations are performed using Sentaurus Process 2-D; the models activated in the simulation include the stress-dependent oxidation parameters.



3. RESULTS AND DISCUSSION
To carry out the optimization of the LOCOS geometry, the LOCOS physical design active width was varied from 1.0 um to 2.4 um, and the variations of the pad oxide thickness and nitride thickness were characterized. Fig. 6 shows the bird's beak height versus nitride thickness; the field oxide was grown at 1000 °C for 210 min in an O2/H2 ambient to a thickness of approximately 610 nm. As can be seen, the bird's beak height decreases as the nitride thickness increases. Fig. 7 shows the bird's beak length versus nitride thickness for the same field oxidation. As can be seen, the bird's beak length depends not only on the nitride thickness but also on the pad oxide thickness.

Figure 6. Bird's beak height H_bb (nm) versus nitride thickness (nm) (plot omitted). The field oxide was grown at 1000 °C for 210 min in an O2/H2 ambient to a thickness of approximately 610 nm.

Figure 7. Bird's beak length L_bb (nm) versus nitride thickness (nm) for pad oxides of 10, 15 and 20 nm (plot omitted). The field oxide was grown at 1000 °C for 210 min in an O2/H2 ambient to a thickness of approximately 610 nm.



4. CONCLUSION
The physical geometry of the LOCOS isolation depends on the pad oxide thickness and the nitride thickness. The bird's beak height decreases as the nitride thickness increases. The bird's beak length depends not only on the nitride thickness but also on the pad oxide thickness, and it too decreases as the nitride thickness increases. For 15 nm of pad oxide and 150 nm of nitride, the physical bird's beak length is approximately 0.55 um per side. The bird's beak length is the significant parameter limiting this isolation technology in VLSI.



ACKNOWLEDGMENTS
The authors would like to thank Mr. Anucha Ruangphanit for consultancy. We are grateful for the process simulation software for the LOCOS structures provided by TMEC.




REFERENCES
1. TCAD Sentaurus Manual, 2007.
2. D. A. Antoniadis and R. W. Dutton, Models for Computer Simulation of Complete IC Fabrication Process, IEEE J. Solid State Circuits SC-14, No. 2, 412-422 (1979).
3. D. Chin, S. Y. Oh, S. M. Hu, R. W. Dutton, and J. L. Moll, Two-Dimensional Oxidation Modeling, IEEE Trans. Electron Devices ED-30, No. 7, 744-749 (1983).
4. B. E. Deal and A. S. Grove, General Relationship for the Thermal Oxidation of Silicon, J. Appl. Phys. 36, 3770-3778 (1965).
5. S. M. Hu, New Oxide Growth Law and the Thermal Oxidation of Silicon, Appl. Phys. Lett. 42, No. 10, 872-874 (1983).


F00009
Solving a Nanocomputing Problem via the Music-Inspired Harmony Search Algorithm

K. Sujaree 1,C and S. Wacharanad 2

1,2 Graduate School, Nanoscience and Technology Program, Chulalongkorn University,
Phayathai Road, Pathumwan, Bangkok, 10330, Thailand
C E-mail: kanon.su@gmail.com; Tel. 086-9322033

ABSTRACT
Designing conducting polymers is a part of nanocomputing that presents complex and large optimization problems. A conducting polymer can exhibit varying electronic properties depending on the concentration and the doping agent. The harmony search algorithm is a recent soft-computing method, or metaheuristic, similar to the genetic algorithm and ant colony optimization. The algorithm is inspired by the observation that the aim of music is to search for a perfect state of harmony; harmony in music is analogous to finding the optimum in an optimization process. In the present work, we apply the harmony search algorithm to design conducting polymers; the algorithm helps to find the optimal conditions for the conducting polymer.

Keywords: Designing conducting polymers, Harmony search algorithm, Genetic algorithm, Ant colony optimization, Softcomputing, Metaheuristic
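
To make the musical analogy concrete, the following is a minimal C++ sketch of the textbook harmony search loop (cf. [2]) for a generic minimization problem; it is not the authors' implementation, and the objective function and parameter values are placeholders.

#include <algorithm>
#include <cstdlib>
#include <vector>

// Placeholder objective: minimize the sum of squares over [-1, 1]^dim.
double f(const std::vector<double>& x) {
    double s = 0.0;
    for (double xi : x) s += xi * xi;
    return s;
}

double urand(double a, double b) {
    return a + (b - a) * std::rand() / double(RAND_MAX);
}

int main() {
    const int dim = 5, HMS = 10, iters = 10000;   // HMS = harmony memory size
    const double HMCR = 0.9, PAR = 0.3, bw = 0.05, lo = -1.0, hi = 1.0;

    // Initialize the harmony memory with random solutions.
    std::vector<std::vector<double>> hm(HMS, std::vector<double>(dim));
    for (auto& h : hm)
        for (auto& xi : h) xi = urand(lo, hi);

    for (int it = 0; it < iters; ++it) {
        std::vector<double> x(dim);
        for (int d = 0; d < dim; ++d) {
            if (urand(0, 1) < HMCR) {             // take a note from memory
                x[d] = hm[std::rand() % HMS][d];
                if (urand(0, 1) < PAR)            // pitch adjustment
                    x[d] = std::min(hi, std::max(lo, x[d] + urand(-bw, bw)));
            } else {
                x[d] = urand(lo, hi);             // random improvisation
            }
        }
        // Replace the worst harmony if the new one is better.
        auto worst = std::max_element(hm.begin(), hm.end(),
            [](const std::vector<double>& a, const std::vector<double>& b) {
                return f(a) < f(b); });
        if (f(x) < f(*worst)) *worst = x;
    }
    return 0;
}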


REFERENCES
1. Sahni, V., Goswami, D.,Nanocomputing, 1, Tata McGraw, Agra, 2009, 13-15.
2. Yang, X. S., Nature-Inspired Metaheuristic Algorithms, 1, Luniver Press, Cambridge,
2009, 71-78.
3. Martins, B. V. C., Brunetto, G., and Sato, F., Unicamp., 2008.
F00010
Effective Workload Management Strategies for a Cloud of Virtual Machines

Nopparat Noppakuat 1,C, Jatesadakarn Seangrat 1, and Putchong Uthayopas 1

1 High Performance Computing and Networking Center, Faculty of Engineering,
Kasetsart University, Bangkok, Thailand
C E-mail: nopparat.no@gmail.com; Tel. 086-534-8024



ABSTRACT
Cloud computing is an emerging computing infrastructure concept that emphasizes the creation of a massive, scalable pool of computing systems. This infrastructure is then used to provide services to users over the Internet. One of the important classes of cloud systems is the IaaS (Infrastructure as a Service) cloud, which provides users with virtual machines (VMs) they can use to host their applications. For efficient operation of such an infrastructure, there is a need to manage the placement of virtual machines on the cloud so that each virtual machine can effectively utilize the cloud hardware infrastructure to the maximum benefit.
This paper proposes an approach that models cloud VM assignment as an optimization problem. Several heuristic algorithms are then proposed for the management of the VM cloud, and a comparison study is conducted using a simulation to reveal the effectiveness of each proposed algorithm. The contribution of this work is a better understanding of cloud management algorithms, which will be useful for building a massive cloud management system.
Keywords: Cloud Computing, Virtualization, Virtual Machine, Placement.


1. INTRODUCTION
Virtualization technology provides an abstraction of massive computation resources that are shared among users in a multi-tenant environment. A cloud computing system is normally composed of many virtualization servers connected by a high speed network, with software installed to combine these computing resources into a large pool of abstracted resources. An IaaS cloud provides a virtual infrastructure by creating virtual machines (VMs) for users, which they can use as if they owned real physical machines on the network. VMs can be dynamically moved and hosted on any server according to the cloud management policy.
As the number of VMs hosted by a cloud server increases, workload congestion may arise, which can dramatically decrease the performance of the cloud server. To avoid server congestion, there needs to be a mechanism to manage and distribute VMs to the servers efficiently. The cloud controller must attempt to relocate VMs among cloud servers to increase the system performance. Many issues arise, such as: Which VMs should be relocated? Which servers should the VMs be placed on? Does the target server hold sufficient resources for the new VMs? Since the cloud servers may be located on different networks, moving VMs across a low bandwidth network may be inefficient.
In this paper, we present a model to find a good VM placement for a cloud of virtual machines that distributes VM workloads evenly with the lowest relocation cost, using the following algorithms: (1) Maxload Round Robin Placing (MRR), (2) Greedy Maxload Placing (GM), and (3) Relocate Maxload (RM). We evaluated these algorithms by comparing the computation time used and the difference from the optimized cost. The results from GM and RM are nearly optimal in this model, and the running time is dramatically decreased.

2. THEORY AND RELATED WORKS
The VM placement optimization problem is an instance of the combinatorial optimization problem. The number of placement combinations increases non-polynomially with the number of servers and VMs, so the true optimal solution can be very computationally intensive or impossible to compute within a finite amount of time. Many heuristic algorithms have been applied to solve this problem. Resource provisioning in cloud computing and virtualization has been researched for some time [1][2][3][4]. In [5], the GreedyMax algorithm and its variants are proposed to solve the placement problem of lowering the power cost. However, the heterogeneity of the network has not yet been considered there, which may result in a long transfer time for VM relocation. In this paper, the cloud model is improved by adding a cost function to decide whether to move the VMs.

3. SYSTEM MODEL
I. MODEL COMPONENTS
The cloud of virtual machines model consists of a set of virtual machine servers that host the VMs, denoted by $S = \{S_1, S_2, \ldots, S_{|S|}\}$. The set of virtual machines (VMs) in the system is denoted by $V = \{V_1, V_2, \ldots, V_{|V|}\}$. Each VM carries a workload as it serves its users; the workloads form the set $W = \{w_1, w_2, \ldots, w_{|W|}\}$.
We assume that the virtual machine servers are connected in a heterogeneous network, e.g. located in different LANs or at different ISPs. We therefore measure the interconnect bandwidth between each pair of servers and store it in the bandwidth matrix $B = \{b_{i,j} : i, j \in S\}$.
Let $p_{i,j}$ denote the placement of virtual machine $V_i$ on the server $S_j$. The configuration of the VM placement in the system can then be denoted by $P = \{p_{1,s}, p_{2,s}, \ldots, p_{|V|,s}\}$ for $s \in S$. We can see that the size of the placement combination space is $|P| = |S|^{|V|}$, which grows exponentially as we increase the number of VMs.

II. SYSTEM CONSTRAINTS
The computing resources of a virtual machine server are shared among the VMs hosted on it. Since the resources of a server are limited, it can host only as many VMs as will not use more resources than the server can provide. In this model, we consider the computing resources CPU utilization, allocated memory, and hard disk capacity.
We denote the CPUs provided by the servers by $CPU^S = \{cpu^S_1, cpu^S_2, \ldots, cpu^S_{|S|}\}$, which are the numbers of processor cores on each server. The VMs also require processors, $CPU^V = \{cpu^V_1, cpu^V_2, \ldots, cpu^V_{|V|}\}$. The number of processors required by all VMs on a server must be less than or equal to the CPUs the host provides:
\[ \sum_{p_{v,s} \in P} cpu_v \le cpu_s \quad \text{for } s \in S. \]
The memory and hard disk provisioning are handled like the processor allocation; we denote the memory and hard disk space provided by the servers by $MEM^S = \{mem^S_1, \ldots, mem^S_{|S|}\}$ and $HDD^S = \{hdd^S_1, \ldots, hdd^S_{|S|}\}$, respectively, and the memory and hard disk space required by the VMs by $MEM^V = \{mem^V_1, \ldots, mem^V_{|V|}\}$ and $HDD^V = \{hdd^V_1, \ldots, hdd^V_{|V|}\}$. The aggregate amount of resources required by the VMs must not exceed the resources their server provides:
\[ \sum_{p_{v,s} \in P} mem_v \le mem_s \quad \text{for } s \in S, \qquad \sum_{p_{v,s} \in P} hdd_v \le hdd_s \quad \text{for } s \in S. \]

III. COSTING MODEL
In this model, we distribute the workload of the system by moving VMs to other servers so as to decrease the workload imbalance on the physical machines. To decide which VM should be moved to which server, we use a cost function that represents the efficiency of the placement. The cost function consists of two parts: a load balancing cost and a relocation cost.
For the load balancing cost, we use the load average of the VMs, $W_{avg}$, as the baseline to measure the efficiency of the load balancing. We consider the summation of the differences between the load average of the system and the amount of load on each server: the smaller the total distance from the average workload, the more efficiently the load is distributed. We use the average workload to normalize the load balancing cost:
\[ W_{avg} = \frac{1}{|V|} \sum_{i=1}^{|V|} w_i, \qquad Cost_{load}(P) = \sum_{s \in S} \frac{\left| W_{avg} - \sum_{p_{v,s} \in P} w_v \right|}{W_{avg}}. \]
The effort of moving a VM from one server to another is another crucial aspect: moving VMs across networks may take a very long time to finish the relocation. The cost of relocating VMs can be measured by the total time used in moving each VM to its destination placement, which can be calculated as the size of the VM image divided by the bandwidth between the host server and the destination.
We normalize the relocation time by $T_{avg}$, the average time of moving all VMs over the average bandwidth:
\[ T_{avg} = \frac{\sum_{v \in V} hdd_v}{\sum_{b_{i,j} \in B} b_{i,j} \,/\, |B|}, \qquad Cost_{relocate}(P_0, P) = \frac{1}{T_{avg}} \sum_{v \,:\, p_{v,i} \in P_0,\; p_{v,j} \in P,\; i \ne j} \frac{hdd_v}{b_{i,j}}. \]
IV. PROBLEM FORMULATION
In this simulation, we evaluate algorithms that find the optimal placements providing the minimum cost described above. The problem is formulated as follows.

VM PLACEMENT PROBLEM: find the placement $P$ that minimizes
\[ Cost_{load}(P) + Cost_{relocate}(P_0, P) \]
subject to
\[ \sum_{p_{v,s} \in P} cpu_v \le cpu_s, \quad \sum_{p_{v,s} \in P} mem_v \le mem_s, \quad \sum_{p_{v,s} \in P} hdd_v \le hdd_s \quad \text{for } s \in S. \]

We use a brute force algorithm to calculate the baseline cost of the model. The algorithm creates every possible placement combination in the model, so it always finds the optimal placement with the minimum cost. However, as the problem size grows, the number of combinations grows exponentially and the calculation takes a very long time to complete. To reduce the calculation time and obtain a placement more efficiently, we propose some heuristic algorithms and evaluate their effectiveness on this model.
1) Max Round Robin Placing Algorithm (MRR): The aim of MRR is to distribute the workload evenly over all servers by sorting the VM workloads and placing the VMs on the servers in a round robin manner. Beginning with empty host servers, we pick the VM with the highest workload and place it on a server, repeating the process until all VMs are placed while respecting the constraints.
2) Greedy Max Placing Algorithm (GM): This algorithm is similar to MRR. Starting with empty servers, we sort the VM workloads and pick the VM with the highest workload to place on a server; but instead of placing it in a round robin manner, we calculate the cost function for each target server and place the VM on the server that yields the lowest cost at that time, repeating until all VMs are placed (see the sketch after this list).
3) Relocate Max Algorithm (RM): This algorithm is quite different from the two above. Starting from the original placement configuration, we pick the VM with the highest workload from the server with the highest workload, relocate it to the server with the least workload, and then recalculate the cost function. The process is repeated until no move of a VM decreases the cost function.
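
As a sketch of GM (referenced in item 2 above), the following C++ fragment, written for this summary rather than taken from the authors' code, places VMs in descending workload order and keeps the feasible server with the lowest cost; for brevity the cost here is only the load-balancing term of Section 3, with the relocation term omitted.

#include <algorithm>
#include <cmath>
#include <limits>
#include <vector>

struct VM     { double load, cpu, mem, hdd; int host = -1; };
struct Server { double cpu, mem, hdd; };

// Capacity check for placing v on server sid (the constraints of Section 3).
static bool fits(const VM& v, const std::vector<VM>& vms,
                 const std::vector<Server>& srv, int sid) {
    double c = v.cpu, m = v.mem, h = v.hdd;
    for (const VM& u : vms)
        if (u.host == sid) { c += u.cpu; m += u.mem; h += u.hdd; }
    return c <= srv[sid].cpu && m <= srv[sid].mem && h <= srv[sid].hdd;
}

// Simplified cost: sum over servers of |W_avg - load(s)| / W_avg.
static double loadCost(const std::vector<VM>& vms, int nServers) {
    double wavg = 0.0;
    for (const VM& v : vms) wavg += v.load;
    wavg /= vms.size();
    double cost = 0.0;
    for (int s = 0; s < nServers; ++s) {
        double w = 0.0;
        for (const VM& v : vms) if (v.host == s) w += v.load;
        cost += std::fabs(wavg - w) / wavg;
    }
    return cost;
}

// Greedy Max Placing (GM): place VMs from highest to lowest workload,
// each on the feasible server that currently yields the lowest cost.
void greedyMaxPlacing(std::vector<VM>& vms, const std::vector<Server>& srv) {
    std::vector<int> order(vms.size());
    for (size_t i = 0; i < order.size(); ++i) order[i] = int(i);
    std::sort(order.begin(), order.end(),
              [&](int a, int b) { return vms[a].load > vms[b].load; });
    for (int i : order) {
        double best = std::numeric_limits<double>::max();
        int bestServer = -1;
        for (int s = 0; s < int(srv.size()); ++s) {
            if (!fits(vms[i], vms, srv, s)) continue;
            vms[i].host = s;                        // tentative placement
            double c = loadCost(vms, int(srv.size()));
            if (c < best) { best = c; bestServer = s; }
            vms[i].host = -1;                       // undo
        }
        vms[i].host = bestServer;                   // commit lowest-cost server
    }
}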
We evaluate the algorithms by comparing their cost function values to the optimal value from the baseline method, and their scalability by measuring the time used in finding placements as the problem size increases. We set up the simulation by creating a set of servers connected in a network and a set of VMs with various properties, placing them in random placements that respect the constraints, and running the simulation with each algorithm.

4. RESULTS AND DISCUSSION
We set up the simulation with 4 servers running 8 VMs and ran the experiment 10 times with various configurations. We calculated the cost from each algorithm and compared it to the baseline from the brute force method; the results are shown in Figures 1 and 2. To evaluate the accuracy of the algorithms, we computed the average standard deviation between the cost from each algorithm and the baseline value in each experiment. The results in Table 1 show that MRR does not provide a cost close to the optimum, while GM and RM lose precision as the problem size increases.



Figure 1. Placement cost (8 VMs on 4 servers). Figure 2. Placement cost (12 VMs on 4 servers).

Table 1. Average standard deviation by algorithm.

Algorithm   VM=8    VM=10   VM=12
MRR         6.92    13.17   9.62
GM          0.18    0.51    1.7
RM          0.48    0.84    1.5

To evaluate algorithm scalability, we compare the time each algorithm uses to find the placements as we increase the number of placement combinations in the simulation. Note that we ignore the brute force algorithm here because it takes too much time compared with the other algorithms. The results are shown in Figures 3a and 3b. GM suffers from the increasing combination space but still attains minimal cost compared with the other two. MRR and RM increase their running time very little, but they do not provide a good cost function value when the combination space increases.






Figure 3a. Algorithm execution time. Figure 3b. Placement cost.









5. CONCLUSION
We presented a computational model for a cloud of virtual machines that allows us to evaluate mechanisms such as MRR, GM and RM for coping with the VM placement problem in workload distribution. The model also raises awareness of the effect of VM relocation. The results of evaluating the algorithms can be useful in implementing VM migration scheduling on a cloud infrastructure.



REFERENCES
1. A. Karve, T. Kimbrel, G. Pacifici, M. Spretizer, M. Steinder, M. Sviridenko, and A.
Tantawi, Dynamic Placement for Cluster Web Application, in Proc. the 15th international
conference on World Wide Web (WWW'06), 2006.
2. Ko, B. and Rubenstein, D., Distributed self-stabilizing placement of replicated resources
in emerging networks, IEEE/ACM Trans. Netw., 2005, 13(3), 476-487.
3. Abawajy, J.H., Placement of File Replicas in Data Grid Environments, In: International
Conference on Computational Science, 2004, 66-73.
4. Wang, C., Hsu, C., Liu, P., Chen, H., and Wu, J., Optimizing server placement in
hierarchical grid environments, Supercomputing J, 2007, 42(3), 267-282.
5. Cardosa, M., Korupolu, M. R., and Singh, A., Shares and Utilities based Power
Consolidation in Virtualized Server Environments, in Proceedings of IFIP/IEEE Integrated
Network Management (IM), 2009.

F00011
Two-Level Scheduling Technique for Mixed Best-Effort and
QoS Job Arrays on Cluster Systems

Ekasit Kijsipongse, Suriya U-ruekolan, Sornthep Vannarat
Large Scale Simulation Research Laboratory
National Electronics and Computer Technology Center



ABSTRACT
We consider the problem of dynamically scheduling mixed best-effort and QoS job arrays on cluster computers. Each job array consists of many independent jobs that arrive at the cluster over time. The information about a job array, i.e. the number of jobs and the processing time of each job, is not known to the scheduler until it arrives. A user submits a job array as either the best-effort or the QoS type. For the best-effort type, the scheduler accepts the job array and allocates approximately equal amounts of computing resources to all users under a fair sharing policy. For a QoS job array, in which all jobs must be finished by a deadline, the scheduler performs online admission control to immediately notify the user whether the new QoS job array is accepted or rejected. The scheduler must reorganize the existing schedule accordingly to ensure that the fairness of the best-effort arrays and the deadlines of all accepted QoS job arrays are always guaranteed. We propose an online two-level scheduling technique to schedule best-effort and QoS job arrays on shared cluster computing resources so as to meet diverse requirements from users.

Keywords: Job Scheduling, Cluster Computing, Admission Control.


1. INTRODUCTION
The continuous expansion of the user groups in High Performance Computing (HPC) has had a significant impact on the design of the job scheduling and management module, an essential part of an HPC system. The job scheduler must be able to respond to the diversity of the user groups and their applications by finding an efficient resource allocation, so that users can gain the most profit from the HPC resources while remaining fair to other users. At present, there exists a type of application called the parameter sweep application, which is composed of a number of independent sequential jobs. Each job performs the same task on different parameters or subsets of data. Since the jobs are independent, there is no communication or dependency among them, which makes this type of application suitable for execution on a loosely-coupled distributed HPC system such as a cluster. These applications can often be found in data mining, simulation, computational biology and animation rendering, each of which may consist of thousands of independent jobs. In the context of the job scheduler, the related jobs of a parameter sweep application are submitted as a group, a job array. The new job scheduler should therefore support these parameter sweep applications in addition to the sequential and parallel jobs found in typical HPC systems.
The diversity of the user groups also imposes a problem on the job scheduling and management of HPC systems, since their requirements are often different. In general, two main user groups can be identified at any HPC center. The first group consists of users who need their jobs to be executed with a QoS guarantee, a primary requirement of industrial users. In most cases, a QoS job is submitted with a given deadline; the scheduler must find a way to allocate HPC resources so as to complete the QoS jobs by their deadlines, and this must be transparent to the users. Immediate notification is another required feature of the job scheduler: the sooner the users know whether a job will or will not be completed by its deadline, the better they can make further decisions. The second group consists of users who use HPC for academic or testing purposes. They are served with a best-effort policy, such that their jobs are executed whenever the computing resources are not occupied by QoS jobs. The scheduler should prevent a particular best-effort user from dominating the HPC resources and should allocate resources equally to all users in this group. As both QoS and best-effort jobs wait for resources, the job scheduler must select an appropriate policy to allocate the computing resources so as to satisfy the requirements of all users.
This paper presents the design and implementation of a two-level scheduler with admission control to support resource allocation for the aforementioned QoS and best-effort job arrays. It is a dynamic scheduler that works in the online mode, in which the arrival of jobs is not known in advance. The rest of the paper is organized as follows. Section 2 gives an overview of relevant work in this area. Section 3 presents the design and implementation of the two-level scheduler with admission control. Section 4 draws conclusions and discusses future work. In this paper, the terms job and job array are used interchangeably unless they need to be distinguished.

2. RELATED WORKS
Online deadline scheduling on a single machine was introduced in [1], with an emphasis on the immediate notification ability of job admission control. Similarly, scheduling jobs with equal processing times on multiple identical machines is discussed in [2, 3, 4]. Kim and Chwa [5] considered online deadline scheduling on multiple resources but without admission control, so that a job can be rejected at any time before its deadline.
The study of job array scheduling also appears in the context of scheduling Bag-of-Tasks (BoT) applications [6, 7, 8]. However, those works address BoT scheduling for a single objective, such as minimizing the makespan or the maximum completion time, rather than meeting the distinct requirements of different user groups as in our work. Anderson et al. [9] studied offline scheduling of job arrays in animation rendering to maximize the utility value while meeting their deadlines. The problem of job scheduling with a mixed workload containing a steady incoming stream of best-effort and real-time jobs on clusters is presented in [10].
Libra [11] is a job scheduler for cluster systems that can provide immediate notification to users, but it requires that at least one node be available to execute the job immediately. SGE [12], Condor [13], and Maui [14] are open source schedulers that support running job arrays on clusters or networks of workstations; however, admission control with immediate notification has not been implemented in them. The commercial Moab scheduler [15] provides limited support for deadline scheduling.

3. DESIGN AND IMPLEMENTATION OF THE TWO-LEVEL
SCHEDULER
Cluster computing systems typically consist of one frontend node and a number of compute nodes. Users submit jobs to the frontend node, where a job scheduler interacts with the cluster resource manager to distribute jobs to the compute nodes for execution. We use Torque [16] as the resource manager to provide the primitive job manipulation commands such as starting, holding, canceling, and monitoring jobs. Torque is an open source variant of the PBS batch system and has been widely used in many HPC centers for years. The decision about which job is started on which compute node is made by an external scheduler (Torque is bundled with a built-in but simple scheduler, pbs_sched). Torque consists of two main components, pbs_server and pbs_mom, as illustrated in Figure 1. The pbs_server is a central server running on the frontend node to receive new jobs into job queues. It communicates with the pbs_mom processes to collect the status of all compute nodes and jobs. A pbs_mom runs on each compute node to monitor the status of the node and report it to the pbs_server. When the pbs_server needs job scheduling, it sends a trigger message to the external scheduler. Trigger messages are sent on several events, such as new jobs arriving, executing jobs terminating, or the scheduling time being reached. After receiving a trigger message, the external scheduler module interacts with the pbs_server to perform the job scheduling.
We implement our two-level scheduler on the PluS scheduler [17], a Java-based scheduler that can communicate with Torque. The two-level scheduler acts as a matchmaker between jobs and computing resources. When computing resources become available, the scheduler selects the jobs to be executed on those resources so as to meet our objective: the QoS jobs are always guaranteed to be completed within their deadlines, while the best-effort jobs equally share the computing resources. To decide whether a QoS job is accepted or rejected, we rely on the admission control module. For a QoS job, the modified qsub command takes two additional arguments,
-l deadline=[[[[CC]YY]MM]DD]hhmm[.ss]
and
-l walltime=hh:mm:ss
for the deadline and the processing time of the submitted job, respectively. The qsub command first contacts the admission control for the immediate notification of acceptance. The job is rejected if the resources are not enough for the job and all previously accepted QoS jobs to finish by their deadlines; otherwise the job is accepted and waits in the job queue for later scheduling. For a best-effort job, the qsub command works as usual, i.e. the job is put into the job queue. We assume in this work that the system is a homogeneous cluster, in other words, that all compute nodes are identical.

Figure 1. Two-level scheduler with the Torque resource manager (diagram omitted: users submit jobs via qsub to the pbs_server on the frontend node, which consults the admission control and the two-level scheduler and dispatches jobs to the pbs_mom on each compute node).


3.1 ONLINE ADMISSION CONTROL
The online admission control needs to ensure that a new QoS job can be completed by its deadline without violating the deadlines of previously accepted jobs. The feasibility testing algorithm used to accept a new job is given in Listing 1. The available time f(m) of a machine m is the earliest time at which the machine is free to be used; when a job is dispatched to execute on a machine, the machine's available time is updated accordingly. The p(j) and d(j) are the processing time and the deadline of job j given by the user, respectively. To test an incoming job array, every job in it and in all previously accepted job arrays has to be checked to see whether it can be completed by its deadline. We use the Earliest Deadline First (EDF) algorithm to select the next job to be tested. If all QoS jobs pass the test, the incoming job array is admitted into the job queue of Torque.

Listing 1. Feasibility testing algorithm.
Algorithm IsFeasible
Input: the incoming job array
Output: True or False

let E be the list of the previously accepted and the incoming QoS job arrays sorted by their deadlines
for each job array A ∈ E do
    for each job j ∈ job array A do
        let m be the machine with the minimum available time f(m)
        if f(m) + p(j) > d(j) then
            return false
        else
            f(m) = f(m) + p(j)
        end if
    end for
end for
return true
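
A compact C++ rendering of Listing 1 is given below, assuming jobs carry their processing time p and deadline d, and that f holds the available time of each of the identical machines; this is a sketch, not the PluS implementation.

#include <algorithm>
#include <vector>

struct Job      { double p, d; };                  // processing time, deadline
struct JobArray { double deadline; std::vector<Job> jobs; };

// EDF feasibility test over the accepted QoS job arrays plus the incoming
// one. Both parameters are taken by value so the test does not mutate the
// real scheduler state.
bool isFeasible(std::vector<JobArray> arrays,      // accepted + incoming
                std::vector<double> f) {           // machine available times
    std::sort(arrays.begin(), arrays.end(),        // Earliest Deadline First
              [](const JobArray& a, const JobArray& b) {
                  return a.deadline < b.deadline; });
    for (const JobArray& A : arrays) {
        for (const Job& j : A.jobs) {
            // Pick the machine with the minimum available time.
            auto m = std::min_element(f.begin(), f.end());
            if (*m + j.p > j.d) return false;      // job would miss its deadline
            *m += j.p;                             // schedule job j there
        }
    }
    return true;
}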

3.2 FIRST LEVEL: DYNAMIC QoS SCHEDULING
The jobs in the job queue are a mix of QoS and best-effort jobs. As QoS jobs take higher priority than best-effort jobs, the cluster resources are always allocated to the QoS jobs first whenever possible. Thus, when a compute node becomes available, we apply the EDF algorithm to select the next job to be dispatched, so that the job with the earliest deadline always starts first. Although using the EDF algorithm meets our requirement, it remains a research challenge to see whether other algorithms can be more efficient. If no QoS job exists in the job queue, fairshare scheduling is invoked for the best-effort jobs, as illustrated in Figure 2.

Figure 2. QoS and fairshare schedulers (diagram omitted: the next job is taken from the QoS scheduler when QoS jobs are present in the job queue, and from the fairshare scheduler of best-effort jobs otherwise).

3.3 SECOND LEVEL: DYNAMIC FAIRSHARE SCHEDULING
Fairshare scheduling allows users to use cluster resources equally. If user A has done twice as much work as user B over a period of time, then user B should thereafter be able to do twice as much work as user A. The fairshare scheduler records the CPU usage of each user for the best-effort jobs executed so far and selects the next job belonging to the user who has the smallest CPU usage. Time in fairshare scheduling is discretized into a number of time frames. At the end of each time frame, the CPU usage of each individual job in a job array is calculated from the resources_used.cput field of Torque's job status, which represents the CPU time used by the job since it started. To limit the impact of usage data that are too old, the CPU usages are exponentially decayed over time so that higher weight is put on more recent data when calculating the accumulated CPU usages over multiple time frames. The overall algorithm of the fairshare scheduling is shown in Listing 2. The CPU usage history is denoted by cpuUsage, the set of jobs running at the previous time frame by R_{t-1}, and the set of jobs running at the current time frame by R_t. The used CPU times of each job are stored in cput_t and cput_{t-1} for the current and previous time frames, respectively. The parameter FSDECAY is the exponential decay factor, and FSINTERVAL is the duration of each time frame. The values of these parameters can be set to suit each specific need. For more sophisticated fairshare scheduling, such as group- or project-based fairshare, readers may refer to [18].

Listing 2. Fairshare scheduling algorithm.
Algorithm FairshareScheduling
Input: a set of best-effort job arrays J
Output: the selected job k

for each user u do
    cpuUsage{u} = FSDECAY * cpuUsage{u}
end for

for each running job j ∈ Rt-1 such that j is a best-effort job do
    if j ∉ Rt then continue
    let u be the owner of job j
    cpuUsage{u} = cpuUsage{u} + ( cputt(j) - cputt-1(j) )
end for

let A be the best-effort job array (∈ J) of the user with the smallest cpuUsage
let k be an arbitrary job in job array A
return k
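
A small C++ sketch of the decayed-usage bookkeeping in Listing 2 follows; the structure name and the FSDECAY value are placeholders. With FSDECAY = 0.9, usage recorded k frames ago contributes with weight 0.9^k to a user's accumulated usage.

#include <limits>
#include <map>
#include <string>

// Per-user decayed CPU usage, updated once per FSINTERVAL time frame.
struct FairshareUsage {
    double decay = 0.9;                        // FSDECAY (placeholder)
    std::map<std::string, double> cpuUsage;    // accumulated, decayed usage

    // Age the history, then add this frame's CPU-time deltas
    // (cput_t(j) - cput_{t-1}(j)) summed per user over running jobs.
    void endOfFrame(const std::map<std::string, double>& frameDelta) {
        for (auto& entry : cpuUsage) entry.second *= decay;
        for (const auto& entry : frameDelta) cpuUsage[entry.first] += entry.second;
    }

    // The next best-effort job is taken from the user with smallest usage.
    std::string pickUser() const {
        std::string best;
        double min = std::numeric_limits<double>::max();
        for (const auto& entry : cpuUsage)
            if (entry.second < min) { min = entry.second; best = entry.first; }
        return best;
    }
};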

4. CONCLUSIONS
We have proposed two-level scheduling with admission control to support resource allocation in cluster computing systems for mixed QoS and best-effort job arrays. The implementation is based on a combination of two well-known job scheduling algorithms. Job admission control and the dynamic scheduling of QoS job arrays are accomplished by the EDF algorithm to ensure completion by their deadlines. The fairshare algorithm is used for scheduling best-effort job arrays so that the computing resources are occupied equally by all users. The proposed scheduler can enrich HPC systems to support more diverse user requirements. To the users, the system is more responsive and versatile; to the resource owners, the new scheduler allows the resources to be used in a more efficient manner. Furthermore, the scheduler can interface with many resource managers such as Torque, SGE, or Condor. In future work, we plan to extend our scheduler with cost-based admission control and to add a global resource sharing policy between the entire QoS and best-effort user groups.

REFERENCES
1. M. H. Goldwasser and B. Kerbikov, Admission control with immediate notification, J. of
Scheduling, vol. 6, no. 3, pp. 269-285, 2003.
2. J. Ding and G. Zhang, Online scheduling with hard deadlines on parallel machines, in
Algorithmic Aspects in Information and Management, Second International Conference,
Proceedings, Lecture Notes in Computer Science, S.-W. Cheng and C. K. Poon, Eds., vol.
4041. Springer, pp. 32-42, 2006.
3. J. Ding, T. Ebenlendr, J. Sgall, and G. Zhang, Online scheduling of equal-length jobs on
parallel machines, in Algorithms - ESA 2007, 15th Annual European Symposium,
Proceedings, Lecture Notes in Computer Science, vol. 4698. Springer, pp. 427-438, 2007.
4. M. H. Goldwasser and M. Pedigo, Online nonpreemptive scheduling of equal-length jobs
on two identical machines, ACM Trans. Algorithms, vol. 5, no. 1, pp. 1-18, 2008.
5. J.-H. Kim and K.-Y. Chwa, On-line deadline scheduling on multiple resources, in
COCOON '01: Proceedings of the 7th Annual International Conference on Computing
and Combinatorics. Springer-Verlag, pp. 443-452, 2001.
6. W. Cirne, D. Paranhos, L. Costa, E. Santos-Neto, F. Brasileiro, J. Sauve, F. A. B. Silva,
C. O. Barros, C. Silveira, and C. Silveira, Running bag-of-tasks applications on
computational grids: the mygrid approach, in Parallel Processing, 2003. Proceedings.
2003 International Conference on, pp. 407-416, 2003.
7. Y. C. Lee and A. Y. Zomaya, Practical scheduling of bag-of-tasks applications on grids
with dynamic resilience, IEEE Trans. Comput., vol. 56, no. 6, pp. 815-825, 2007.
8. A. Benoit, L. Marchal, J.-F. Pineau, Y. Robert, and F. Vivien, Offline and online master-
worker scheduling of concurrent bags-of-tasks on heterogeneous platforms, in 22nd IEEE
International Symposium on Parallel and Distributed Processing, IPDPS 2008, pp. 1-8,
2008.
9. E. Anderson, D. Beyer, K. Chaudhuri, T. Kelly, N. Salazar, C. Santos, R. Swaminathan,
R. Tarjan, J. Wiener, and Y. Zhou, Value-maximizing deadline scheduling and its
application to animation rendering, in SPAA '05: Proceedings of the seventeenth annual
ACM symposium on Parallelism in algorithms and architectures. ACM, pp. 299-308,
2005.
10. Y. Zhang and A. Sivasubramaniam, Scheduling best-effort and real-time pipelined
applications on time-shared clusters, in SPAA '01: Proceedings of the thirteenth annual
ACM symposium on Parallel algorithms and architectures. ACM, pp. 209-219, 2001.
11. J. Sherwani, N. Ali, N. Lotia, Z. Hayat, and R. Buyya, Libra: a computational economy-
based job scheduling system for clusters, Softw. Pract. Exper., vol. 34, no. 6, pp. 573-590,
2004.
12. Sun Grid Engine, http://gridengine.sunsource.net/, 2010.
13. Condor Project, http://www.cs.wisc.edu/condor/, 2010.
14. Maui Scheduler, http://www.clusterresources.com, 2010.
15. Moab Workload Manager, http://www.clusterresources.com, 2010.
16. Torque Resource Manager, http://www.clusterresources.com, 2010.
17. H. Nakada, A. Takefusa, K. Ookubo, M. Kishimoto, T. Kudoh, Y. Tanaka, and S.
Sekiguchi, Design and implementation of a local scheduling system with advance
reservation for co-allocation on the grid, in CIT '06: Proceedings of the Sixth IEEE
International Conference on Computer and Information Technology. p. 65, 2006.
18. D. B. Jackson, Q. Snell, and M. J. Clement, Core algorithms of the maui scheduler, in
JSSPP '01: Revised Papers from the 7th International Workshop on Job Scheduling
Strategies for Parallel Processing. Springer-Verlag, pp. 87-102, 2001.


F00012
Solving Magnetic Sounding Integral Equations from
Multilayer Earth Using Message Passing Interface

B. Dolwithayakul^{1,C}, C. Chantrapornchai^1 and S. Yooyeunyong^2

^1 Department of Computing, Faculty of Science, Silpakorn University, Nakhon-Pathom 73000, Thailand
^2 Department of Mathematics, Faculty of Science, Silpakorn University, Nakhon-Pathom 73000, Thailand
^C E-mail: banpot@su.ac.th; Fax: 034-272923; Tel. 084-0038688



ABSTRACT
In this research, we are developing a visualization of an earth model, particularly of
cracks in the model. One part of the problem is the extensive computation required by
the integral equation. This paper presents a parallel numerical approach for solving the
integral equation obtained from magnetic sounding of multilayer earth. Generally, these
integral equations are highly oscillatory, which makes the calculation time very long
when a precise solution is required. We use the Message Passing Interface (MPI) library
to parallelize this part of the computation. The computation is tested on a cluster to
show the effectiveness of the approach.

Keywords: multilayer earth model, magnetic sounding, integral equation, parallel
processing, MPI, cluster computing.


1. INTRODUCTION
This paper describes a new parallel application in the field of geophysics. Geologists
measure the earth's magnetic field by sounding, which leads to a highly oscillatory integral
equation. This high-oscillation integral takes a very long time to evaluate to a precise
solution. This type of integral cannot be solved with existing mathematical tools such as
Parallel MATLAB, because of the complexity of the functions and the need for a very small
error tolerance (less than 10^-10). Thus, developing an effective integral solver for this
application is necessary.
Our goal is to create a parallel visualization of cracks in earth models. In this preliminary
research, we study the computational models used in the application. From the study, we
found that the extensive computations come from the magnetic sounding numerical
integration. Thus, in this paper, we parallelize this computation using the Message Passing
Interface (MPI). We then measure the performance against the original serial version.
Following convention, we study the effect of the precision on the execution time, as well as
the speedup. The correctness of the results is verified, and the results show consistent
speedup as more nodes are added.
This paper is organized as follows. The next section presents the mathematical models
for magnetic sounding. Section 3 presents our parallel version. Section 4 presents the
experimental results, and Section 5 concludes the paper and discusses future work.








2. THEORY AND RELATED WORKS
In our mathematical model we use Maxwell's equations [7], as in Equation (1):


(1)

where H is the magnetic field vector and σ is an arbitrary constant that specifies the
conductivity of the medium. In earth modeling, the specified conductivity of the medium in
each earth layer tells us which kind of material occupies that layer. The basic mathematical
model for magnetic sounding is a partial differential equation, written in cylindrical
coordinates as in Equation (2):



(2)

In this equation, H is the magnetic field function, z is the depth from the surface to a layer,
and r is the distance between the electromagnetic field generator and the measuring device.
This differential equation is difficult to solve with few known values, so we use the Hankel
transformation to simplify it.
The Hankel transform of order ν of a function f(r) is given by



(3)

We transform Equation (2) using Equation (3). The transformed function is shown in
Equation (4) and can be transformed back as in Equation (5).



(4)



(5)

In this equation, J_n(x) is the Bessel function of order n, which can be estimated by the
integral function in Equation (6).



(6)

The solution of Equation (4) for the k-th layer, where 1 ≤ k ≤ n and n ≥ 2, is [1,3,9].



(7)

Substituting Equation (7) back into (5) gives the final Equation (8) [3].



(8)

In the equation, A and B are arbitrary constants. The resulting function is highly oscillatory.
We implement the parallel integration based on the Gaussian quadrature method using
MPI [5,10].


3. PARALLEL IMPLEMENTATION
Since the computation involved is the integration itself, we deal with one-dimensional
array data. We describe the issues as follows.
3.1 Finding the Integration Range
Since Equation (8) has an infinite integration range, we need to find where the range
effectively ends. The magnetic field function always converges to zero.
We split the integration range into chunks and use a rectangle method for quick, rough
integration to find the end of the range. We integrate over 50 values in each round until the
integration result becomes zero. This is written in the pseudocode below.

FUNCTION FINDMAXRANGE
BEGIN
  Let CHUNKSIZE := 50, RESULT := NULL, MAXRANGE := 0
  WHILE (RESULT is NULL OR RESULT ≠ 0) DO
    START := MAXRANGE * CHUNKSIZE
    END := (MAXRANGE + 1) * CHUNKSIZE
    RESULT := result of integration from START to END
    MAXRANGE := MAXRANGE + 1
  END WHILE
  RETURN MAXRANGE
END FUNCTION


3.2 Job Partitioning
Since the cost of integrating this type of function over a subrange cannot be computed
ahead of time, we simply divide the integration range evenly into equal chunks and distribute
them with communication. Figure 1 shows our job partitioning of a magnetic wave
integration for a 7-node computer.



Figure 1. Job partitioning for numerical integration

3.3 Parallel Integration
We use the Bessel estimation functions from the GNU Scientific Library (GSL) [4] to
estimate the Bessel function in Equation (8).
The node with rank 0 partitions the whole job and sends each chunk's range data to the
other nodes using blocking communication (MPI_Send). Each node then starts its
computation. When every node finishes, the collective operation MPI_Allreduce with the
MPI_SUM operator combines the results from all nodes. This communication pattern for
8 nodes is displayed in Figure 2, and a short sketch is given after the figure.



Figure 2. Data communication for the parallel computation.
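
The following is a minimal mpi4py sketch of this communication pattern, not the paper's implementation: the integrand f and the trapezoid rule are placeholders for the actual highly oscillatory integrand, the GSL Bessel routines, and the Gaussian quadrature used in the real code.

    # Rank 0 partitions the range, sends chunks with blocking sends, and all
    # ranks combine their partial integrals with an allreduce (MPI_SUM).
    from mpi4py import MPI
    import math

    def f(x):                       # placeholder for the oscillatory integrand
        return math.sin(x) * math.exp(-0.01 * x)

    def trapezoid(a, b, n=100000):  # simple stand-in for Gaussian quadrature
        h = (b - a) / n
        s = 0.5 * (f(a) + f(b)) + sum(f(a + i * h) for i in range(1, n))
        return s * h

    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()

    if rank == 0:
        end = 350.0                 # stand-in for the result of FINDMAXRANGE
        width = end / size
        chunks = [(i * width, (i + 1) * width) for i in range(size)]
        a, b = chunks[0]
        for dest in range(1, size):  # blocking communication, as in the paper
            comm.send(chunks[dest], dest=dest)
    else:
        a, b = comm.recv(source=0)

    total = comm.allreduce(trapezoid(a, b), op=MPI.SUM)
    if rank == 0:
        print("integral over [0, end] =", total)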


4. RESULTS AND DISCUSSION
We perform our experiments on a 32-node/64-core Linux cluster with a Gigabit Ethernet
interconnect at Louisiana Tech University, USA. Each core in the cluster is an Intel Xeon
2.8 GHz with 512 MB of RAM, and the cluster runs LAM-MPI version 7.1.
The result of the integration with r = 0.02 and the given λ is 6.75 (fixed relative
error: 10^-14).
The average computational time (t_n), measured in seconds for n nodes, is shown in Table 1.

Table 1. Execution time for different numbers of nodes (t_n), in seconds

Number of Nodes Time (Seconds)
1 128,774
2 84,407
4 49,196
8 25,450
16 12,402
32 6,231

The average execution time for the serial computation is 112,335 seconds. Figure 3
compares the execution time of the serial run against parallel execution with an increasing
number of nodes.




Figure 3. Comparing computation time measured in seconds (Y-Axis) with
respect to number of nodes (X-Axis)

The speedup of the parallel simulation is defined as S_p = t_1 / t_p, where t_1 is the average
sequential simulation time and t_p is the average parallel simulation time on p processes. The
parallel efficiency is computed as E_p = S_p / p [8]. The comparison between the ideal
speedup and our experimental speedup is shown in Figure 4.




Figure 4. Comparison of parallel speedup in percent (Y-Axis) with respect to
the number of nodes (X-Axis)
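
For instance, applying these definitions to the measurements in Table 1, with t_1 taken as the one-node time:

    # Speedup S_p = t_1 / t_p and efficiency E_p = S_p / p from Table 1.
    times = {1: 128774, 2: 84407, 4: 49196, 8: 25450, 16: 12402, 32: 6231}
    t1 = times[1]
    for p, tp in sorted(times.items()):
        s = t1 / tp
        print(f"p = {p:2d}   S_p = {s:6.2f}   E_p = {s / p:.2f}")
    # At p = 32 this gives S_p of about 20.67, i.e., an efficiency of about 0.65.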

We found that the more nodes we add, the more the parallel speedup rate decreases. On
investigation, we found that the overhead comes from communication time and waiting time
on each node. These overheads can be decreased by using nonblocking communication and
by designing a new job scheduler.


5. CONCLUSION
In this paper, we implement the magnetic sounding integration using MPI. This integration
is the part of crack-in-earth modeling that consumes most of the computation. Our
experiments were run on a 32-node/64-core cluster. Based on the preliminary results, we can
save at most 94.45% of the time compared to serial execution. This research and
implementation could help geologists model the earth layers in an observed area much more
quickly.
Furthermore, we will explore the trade-off between accuracy and computation time. The
integration range may be made dynamic to save execution time while maintaining accuracy.
We will compare the developed approach to existing integration approaches in both time and
precision, as well as suitability to our application. Further, we will use a hybrid of MPI and
OpenMP and design an appropriate job scheduler to make sure all CPU cores are kept
occupied with jobs efficiently. The visualization of the earth crack will also be developed.


REFERENCES
1. Banerjee, B., Sengupta, B. J. and Pal, B. P., Geophysical Prospecting, 1980, 28, 435-452
2. Bebendorf, M. and Kriemann, R., Computing and Visualization in Science, 2005, 8(3),
121-135
3. Chave, A. D., Geophysics, 1983, 48, 1671-1686
4. GNU Organization, Full documentation for GSL, 2009, www.gnu.org
5. Jeffrey, A., A Handbook of Mathematical Formulas and Integrals, Academic Press,
London, 1995
6. Kim, H. S. and Lee, K., Geophysics, 1996, 61, 180-191
7. Maxwell, J. C., A Dynamical Theory of the Electromagnetic Field, 1865, Philosophical
Transactions of Royal Society of London, 155, 459-512
8. Quinn, M. J., Parallel Programming in C with MPI and OpenMP, 2004, McGraw-Hill,
318-335
9. Raghuwanshi, S. S. and Singh, B., Geophysical Prospecting, 1986, 34, 409-423
10. Rjasanow, S. and Steinbach, O., The Fast Solution of Boundary Integral Equations,
2007, Springer, 135-160
11. Yootuanyong, S. and Chumchob, Proceedings of the Third Asian Mathematical
Conference, 2000, 590-603
12. William H., Brian P., et al., Numerical Recipes in C (2nd ed.), 1988, Cambridge
University Press

ACKNOWLEDGMENTS
We would like to thank Dr. Chokchai Leangsuksun for providing the 32-node/64-core
cluster at Louisiana Tech University, USA, for this experiment.
F00013
Parameters Self-Tuning Technique for Large Scale
Scheduler

Sugree Phatanapherom and Putchong Uthayopas
Department of Computer Engineering, Faculty of Engineering,
Kasetsart University, Bangkok, Thailand.
Email: g4685038@ku.ac.th, pu@ku.ac.th Tel. +6629428555 Ext. 1423

ABSTRACT
To study complex phenomena, large-scale computing resources are necessary. These
computing resources are usually managed by a batch scheduler system. By default, most
schedulers come with a preset configuration for generic usage, so that jobs and resources
are matched without optimization against any criterion. As a result, jobs get done in a
fair-share fashion, which is not efficient in most cases. Moreover, scheduling algorithms
usually have many adjustable parameters, and finding the right parameter set is a very
challenging task, since the scheduling parameters usually depend on the incoming job
pattern, job length, and job arrival rate for any given period of time. This paper proposes
an automatic self-tuning architecture for generic batch schedulers that uses a multivariate
optimization technique combined with feedback and distributed computing to find a
near-optimum solution in reasonable time. With the proposed method, the scheduler
automatically adjusts itself to the changing workload. Experiments done by simulation
show that the proposed method can increase the overall performance of a scheduler for a
large-scale HPC system.

Keywords: Load Scheduler, Self-Tuning Scheduler, Grid computing, Cluster computing
1. INTRODUCTION
Presently, scientific computing is becoming more important to the study of many natural
phenomena such as earthquakes, storms, tsunamis, and global warming. To study a complex
phenomenon, large-scale computing resources are needed. This large pool of computing
resources is usually managed by a batch scheduler. One of the most challenging
configurations is the scheduling algorithm and its parameters. By default, most schedulers
come with a preset configuration for generic use, so jobs and resources are matched without
proper optimization. As a result, jobs are usually scheduled to execute in a fair-share fashion,
which is not very efficient in most cases.
In practice, a scheduling algorithm may consist of many adjustable parameters.
Therefore, tuning the scheduling parameters is time-consuming and complicated work. In
addition, the effect of the scheduling parameters usually depends on the incoming job pattern,
job length, and job arrival rate. Thus, static tuning of the scheduling parameters may not be
adequate to maximize the system performance.
To solve the problem, this paper proposes an automatic self-tuning architecture for
generic batch schedulers using multivariate optimization technique combined with feedback
and distributed computing to find a near optimum solution in reasonable time.
The rest of this paper is organized as follows. Related work on self-tuning and scheduling
algorithms is described in Section 2, followed by background knowledge on scheduling
models in Section 3. In Section 4, several multivariate optimization algorithms are reviewed
as candidates to be employed in the proposed architecture, which is described in Section 5.
Sections 6 and 7 describe the experimental results and the conclusion and future work,
respectively.



2. THEORY AND RELATED WORKS
Since the job scheduling problem is one of the classic problems in this field, much
research has been done to improve results, including this article. Many of these papers
address the case where tasks are independent. For example, [3] proposed both on-line mode
and batch mode scheduling heuristics: MinMin, MaxMin, and Sufferage. An analysis using a
stochastic model, and a load balancing scheme using second-order moments as well as
first-order moments, is proposed in [2]. This scheme is proved to improve system
performance for both static and dynamic scheduling.
In [1], the XSufferage scheduling heuristic in batch mode was proposed and compared to
the original heuristics. XSufferage is a modification of Sufferage that reuses input files
already existing on some resources, assuming that each computing system has its own shared
file system. However, this heuristic does not consider the economic model and only improves
performance for the class of applications where input files can be reused by subsequent
tasks.

3. EXPERIMENTAL or COMPUTATIONAL DETAILS
To study the behavior of a scheduling algorithm and its parameters, it is necessary to define
a mathematical model for scheduling algorithms. The model used in this article is based on
the model proposed in [1], extended to cover the cost and deadline parameters presented in
[2]. This model consists of three subcomponents: the system model, the resource costing
model, and the application model.
In this paper, a computing system is a set of computing resources represented by their
execution rates and other attributes, e.g., interconnection speed and latency. The degree of
complexity of the system depends on its attributes. The model used in this article is a
simplified version of the model presented in [2]. Instead of defining grids and clusters, the
system model S is a one-dimensional vector of the execution rate e_i of each computing
resource i. In addition, the interconnection between resource i and resource j is denoted by
B_ij. In our previous work, the resource costing model was defined as a generic Resource
Vector. The simplified version of that model used in this article has only one element, which
represents a rental cost, so that the rental cost of computing resource i is denoted by C_i.
Since there is only one element in the resource vector, representing the rental cost for a
period of time, the Resource Vector R_t also has only one element R_ti, indicating the period
of time for which the computing resource has been consumed. Thus, the total cost C can be
defined as in Eq. 1:

C = Σ_i C_i R_ti
(1)
The last component of this model is the application. Let W_i denote the amount of workload
of a job. In this paper, only the workload is addressed, to keep the model simple.
Following the unified priority index p_ij proposed in [3], a simplified version based on the
above model is defined in Eq. 2:

p_ij = α e_ij + β r_j + γ C_ij + δ c_j + ε d
(2)

where e_ij represents the execution time, r_j the ready time, C_ij the cost, c_j the
accumulative cost, and d the deadline, while α, β, γ, δ, and ε represent their respective
weighting factors. Since the quality of the priority index proposed in the earlier research
depends on these five factors, the problem becomes one of finding a point in a 5-dimensional
state space of real numbers.

This paper proposes an interval-based state-space optimization algorithm to determine
appropriate values for all five factors in a reasonable time. Each dimension contains infinitely
many valid real values, while one of the requirements in this case is a time restriction. Instead
of trying all valid values to find the optimum in each dimension, the algorithm picks n
random values and tries them one by one. Since random values may not represent a
dimension accurately when its range is wide, each dimension is further divided into m slices
to ensure coverage of the randomization.
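
The following is a minimal Python sketch of this interval-based random search; simulate() is a hypothetical stand-in for one run of the scheduling simulator, and rotating the slice index is just one possible way to spread the n points over the m slices of each dimension.

    import random

    def simulate(factors):
        # Hypothetical objective: would run the simulator with this factor set
        # and return the metric to minimize (e.g., makespan).
        return sum((x - 0.3) ** 2 for x in factors)

    def interval_random_search(dims=5, m=4, n=100, lo=0.0, hi=1.0):
        # Divide each dimension into m slices and draw n random points,
        # rotating the slice index so every interval gets covered.
        best, best_score = None, float("inf")
        width = (hi - lo) / m
        for i in range(n):
            point = [lo + ((i + d) % m + random.random()) * width
                     for d in range(dims)]
            score = simulate(point)
            if score < best_score:
                best, best_score = point, score
        return best, best_score

    print(interval_random_search())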


4. RESULTS AND DISCUSSION
The experiments in this paper are conducted using the simulation tool in [6] to ensure 100%
repeatability under a controlled environment. The workload history and job characteristics
are taken from the Thai National Grid Center over a period of 6 months in 2009. The cluster
configuration used in this paper is a subset of the TERA cluster in ThaiGrid [7], which
belongs to the Thai National Grid Center.


Figure 1. 100 random values and varying number of slices

The metrics used in this article consist of makespan, cost, and deadline penalty. The first
experiment performs random-based multivariate optimization on the 5 factors by varying the
number of points, with each dimension containing one interval. Figure 1 clearly shows that an
increasing number of slices leads to better attainable performance. The second experiment
finds the relation between the number of intervals and the quality of the optimization by
varying both the number of random values and the number of intervals.


Figure 2. Varying number of slices and iterations

Generally, varying the number of iterations significantly outperforms the latter. There is
only a glitch at 50 iterations because of an insufficient amount of input data; for test cases of
100 iterations or more, stable results are obtained. The third experiment investigates the run
time of the proposed algorithm when varying the number of random values as well as the
number of intervals. Each 100 iterations consistently took on the order of 1 minute.
Accordingly, the second experiment (400 iterations, 5 slices, 100 times) took 4 hours to run,
which may be too long for simulating a daily workload.

5. CONCLUSION
Most scheduling models and algorithms introduce adjustable factors to fit job
characteristics. Since these factors are not linear, the use of multivariate optimization can
potentially give much better results. The interval-based state-space optimization proposed in
this article significantly increases the quality of the optimization at higher speed. In addition,
the algorithm is highly distributable, so each piece of the calculation can be run on separate
compute nodes using MPI_Scatter and MPI_Gather to obtain a faster and more accurate
estimation. In the future, more complicated estimations will be investigated to increase the
effectiveness of the scheduling algorithm. This will enable users to make better use of the
computing resources invested.

REFERENCES
1. Casanova, H., A. Legrand, D. Zagorodnov and F. Berman. 2000. In Proceeding of the 9th
Heterogeneous Computing Workshop. 349--363. Cancun, Mexico.
2. Lee, S. and C. Cho. 2000. Load Balancing for Minimizing Execution Time of a Target
Job on a Network of Heterogeneous Workstations. In Proceeding of the 6th International
Workshop on Job Scheduling Strategies for Parallel Processing. Cancun, Mexico.
3. Maheswaran, M., S. Ali, H.J. Siegel, D. Hensgen and R.F. Freund. 1999. Dynamic
Matching and Scheduling of a Class of Independent Tasks onto Heterogeneous
Computing Systems. In Proceeding of the 8th Heterogeneous Computing Workshop. San
Juan, Puerto Rico.
4. Phatanapherom, S. 2003. Model and Implementation of Efficient Grid Resource
Scheduler. Master Thesis, Department of Computer Engineering, Faculty of Engineering,
Kasetsart University, Bangkok, Thailand.
5. Phatanapherom, S., P. Uthayopas, and V. Kachitvichyanukul. 2003. Fast Simulation
Model for Grid Scheduling using HyperSim. In Proceedings of the Winter Simulation
Conference 2003. New Orleans.
6. Varavithya, V., P. Uthayopas. 2001. ThaiGrid: Architecture and Overview, In Proceeding
of South East Asia High Performance Computing 2001. Kasetsart University, Bangkok,
Thailand.

ACKNOWLEDGMENTS
The authors would like to acknowledge the contribution of the Thai National Grid Center
for information and facility support.


F00014
Performance Evaluation of Cache Replacement Policies for
High-Energy Physics Data Grid

Jedsada Phengsuwan^{1,2,C} and Natawut Nupairoj^1

^1 Department of Computer Engineering, Chulalongkorn University, Bangkok, 10330, Thailand
^2 Large Scale Simulation Research Laboratory, National Electronics and Computer Technology Center, Bangkok, 12120, Thailand
^C E-mail: jedsada.phengsuwan@nectec.or.th; Fax: 02-5646776; Tel. 02-5646900

ABSTRACT
A variety of high-energy physics applications involve processing multiple input files in
a data grid environment. These data-intensive applications demand high-throughput
data processing systems. In addition, since these data files are quite large and located at
several geographically distributed institutions, accessing a large data set can become
very time consuming due to bandwidth limitations in the wide area network. Data grid
caching is a technique that can be used to improve the performance of data-intensive
applications in a data grid environment; it reduces the network bandwidth requirement
and minimizes access latency. Recently, Block-based Data Grid Caching has been
proposed to provide a more efficient mechanism for managing large data sets in data
grid environments. In general, the performance of caching depends heavily on the cache
replacement policy. Many replacement policies aimed at improving the performance of
web caching have been proposed in the literature. However, existing research does not
consider the impact of a diversity of cache replacement policies, especially for
data-intensive applications like high-energy physics. In this paper, we evaluate the
performance of Block-based data grid caching using popular cache replacement
policies. We conducted our experiments with a real workload produced by the
SAM-Grid, a distributed data handling system supporting the D0 Grid project, one of
the largest currently running high-energy physics experiments. Our experimental results
reveal the different behaviors of the replacement policies on the access pattern of a
high-energy physics data grid. In addition, the results provide guidelines for designing
an efficient replacement policy for Block-based data grid caching.

Keywords: Data Grid, Cache, Block-based Data Grid Caching, Replacement Policy


1. INTRODUCTION
Grid computing technology enables access to heterogeneous and geographically
distributed resources in a virtual organization. The huge resources available on the grid can
be used by scientists and researchers to run large-scale scientific simulations and experiments
that solve complicated scientific problems. This benefit encourages organizations to share
computing resources among research communities, making research in computational
science grow dramatically. In addition, the terabytes of data per day produced by these
scientific simulations and experiments may be accessed and analyzed by other research
communities. Consequently, massive amounts of data are transferred among the communities
over the wide area network. Thus, the data grid [1] becomes increasingly important,
especially for data-intensive applications, in order to store and manage the massive data. A
variety of high-energy physics applications involve processing multiple input files in a data
grid environment, and these data-intensive applications demand high-throughput data
processing systems. Since the data files are quite large and located at several geographically
distributed institutions, accessing a large data set can become very time consuming due to
bandwidth limitations in the wide area network. Data grid caching is a technique that can be
used to improve the performance of data-intensive applications in a grid environment. It
reduces the network bandwidth requirement and minimizes access
latency by storing frequently accessed data in a local repository close to the users. The users
can access the data in the cache instead of fetching the original data from file servers. In
general, the performance of caching depends heavily on the cache replacement policy. When
the cache storage space is full, the replacement policy decides which data in the cache should
be evicted to make space available for newly fetched data. Many replacement policies aimed
at improving the performance of web caching have been proposed in the literature. Since a
web caching mechanism stores an entire file in the cache as a single object, it is inefficient to
use web caching to manage very large data sets in a data grid environment. Recently,
Block-based Data Grid Caching [2] has been proposed to provide a more efficient mechanism
for managing large data sets in data grid environments. The Block-based Data Grid Caching
mechanism divides the cache space into fixed-size cache blocks for storing data. The results
in [2] indicate that block-based data grid caching performs well compared to conventional
web caching. However, that research does not consider the impact of a diversity of cache
replacement policies, especially for data-intensive applications like high-energy physics.
In this paper, we evaluate the performance of Block-based data grid caching using
popular cache replacement policies. These representative replacement policies are selected
from the categories of replacement policies presented in [3], which were classified to cover
most characteristics of currently active web cache replacement policies. We conducted our
experiments with a real workload produced by the SAM-Grid [4], a distributed data handling
system supporting the D0 Grid project [5]. The D0 Grid project is one of the largest currently
running high-energy physics experiments. Our experimental results indicate that a
recency-based policy performs very well compared to the other classes of cache replacement
policies.
The rest of this paper is organized as follows: Section 2 provides background on
Block-based data grid caching and cache replacement policies, as well as the performance
metrics for evaluating replacement policies. We describe related works and the workload
characteristics in Sections 3 and 4, respectively. Our experiments and discussion are given in
Sections 5 and 6. Finally, we conclude the paper and outline future work in Section 7.

2. BACKGROUND
2.1 Block-based Data Grid Caching
Conventional data grid caching is usually based on the web caching mechanism, which
stores a whole data file in the cache as a single object. Since data files in a grid environment
are quite large, this has some drawbacks. Figure 1 (a) demonstrates an example of the storage
space allocation of web caching in a data grid: because the data files are large, web caching
has to reserve a large storage space to store each file, so the storage space becomes limited
and cannot accommodate many files. In addition, many requests have to wait a long while
before the cache finishes fetching the original file from the file server.
Block-based data grid caching has been proposed to alleviate these problems. The
concept is inspired by the caching mechanism in a conventional file system. It divides the
cache storage space into multiple fixed-size blocks, as shown in Figure 1 (b), and each file is
likewise divided into multiple pieces to be stored in these fixed-size cache blocks instead of
being stored as a single whole object. Under this scheme, once a client requests a data file
from the cache server, the server checks whether the requested blocks of the file are available
in its repository. If so, the cache server returns those blocks to the client; otherwise, it
allocates cache storage space in fixed-size blocks and fetches the requested data from the file
server. This approach enhances cache storage utilization, decreases the number of waiting
requests, and improves access latency.
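
A minimal sketch of this block mapping and lookup, assuming the cache is simply a set of block keys and leaving the eviction decision to the replacement policy of Section 2.2:

    import math

    BLOCK_SIZE = 100 * 2**20        # fixed cache-block size, e.g. 100 MB

    def blocks_of(file_id, file_size):
        # Map a file onto its sequence of fixed-size cache-block keys.
        n = math.ceil(file_size / BLOCK_SIZE)
        return [(file_id, i) for i in range(n)]

    def request(cache, file_id, file_size):
        # Return the block keys found in the cache and those to be fetched.
        hits, misses = [], []
        for key in blocks_of(file_id, file_size):
            (hits if key in cache else misses).append(key)
        cache.update(misses)        # fetched blocks are stored in the cache
        return hits, misses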



Figure 1. Comparing object-based caching and block-based caching

2.2 Cache Replacement Policies
Cache replacement policy is a significant mechanism for managing the data in a cache.
Since cache storage space is limited, the policy decides which unnecessary data to evict from
the cache to make room for newly fetched data. Currently, many cache replacement policies
have been proposed in the literature [3-4] to improve the performance of web caching. To
understand the characteristics of the diversity of cache replacement policies, the literature
classifies them into five categories as follows: (1) Recency-based policies use the recency of
data accesses as the primary factor; the cache preserves the most recently accessed data and
evicts the least recently accessed data. The representative policy in this category is LRU.
(2) Frequency-based policies use the access frequency of cached data as the primary factor;
the cache preserves frequently accessed data and evicts infrequently accessed data. The
representative policy in this category is LFU. (3) Size-based policies use the size of cached
data as the primary factor, evicting larger data to make large cache storage space available
for newly fetched data. The representative policy in this category is SIZE. (4) Function-based
policies use utility functions that may combine many factors, such as recency, frequency,
time, and latency, with appropriate weighting. The representative policy in this category is
GDS. (5) Random-based policies use random functions to evict data from the cache. The
representative policy in this category is RANDOM.
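
As an illustration of the recency-based category, the following is a minimal LRU cache over fixed-size blocks; this is a sketch, not the simulator's code.

    from collections import OrderedDict

    class LRUBlockCache:
        # Recency-based (LRU) replacement over fixed-size cache blocks.
        def __init__(self, capacity_blocks):
            self.capacity = capacity_blocks
            self.blocks = OrderedDict()          # key -> True, oldest first

        def access(self, key):
            # Return True on a hit; on a miss, insert and evict the LRU block.
            if key in self.blocks:
                self.blocks.move_to_end(key)     # mark as most recently used
                return True
            if len(self.blocks) >= self.capacity:
                self.blocks.popitem(last=False)  # evict least recently used
            self.blocks[key] = True
            return False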

2.3 Performance Metrics
The performance of cache replacement policies is commonly evaluated by hit rate and
byte hit rate, which characterize bandwidth consumption. In addition, we measure the
average latency to investigate the behavior of replacement policies when accessing large data
files in a grid environment.
1) Hit rate: the ratio of the number of requests satisfied by data available in the cache to
the number of all requests.
2) Byte hit rate: the ratio of the total size of data that satisfies the requests to the total
size of all requested data.
3) Average latency: the delay for a client to receive data from the cache server. The
latency comprises propagation delay and transmission delay. The propagation delay
is the time required for a packet to travel from sender to receiver, while the
transmission delay, which depends on file size and bandwidth, is the time required
for transferring a file.
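
The two rate metrics can be computed as in this minimal sketch, assuming one (hit, bytes) record per block request:

    def cache_metrics(events):
        # events: iterable of (hit, nbytes) pairs, one per block request.
        reqs = hits = total_bytes = hit_bytes = 0
        for hit, nbytes in events:
            reqs += 1
            total_bytes += nbytes
            if hit:
                hits += 1
                hit_bytes += nbytes
        return hits / reqs, hit_bytes / total_bytes  # hit rate, byte hit rate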


3. RELATED WORKS
Cache replacement policy is a significant mechanism for enhancing cache performance.
Many replacement policies, such as LRU, LFU, and GDS, have been proposed in the
literature [3-4], aiming to improve the performance of web caches as well as memory cache
management. However, little research has investigated the performance of diverse cache
replacement policies in grid environments in a way that covers all the characteristics of
currently active policies. Such research usually proposes a novel cache replacement policy
and compares its performance with a few policies that do not cover all the characteristics of
existing replacement policies.
Recently, the SIZE-K replacement policy was proposed in [9] to improve the performance
of data grid caching. The policy uses the size of the data as its primary factor, evicting data
of the same size as the newly fetched data. Although the policy performs well for large data
management, its cache performance was compared only with LRU, which represents just one
characteristic of the existing policies. Moreover, [10] proposed a technique to improve the
performance of storage hierarchies in grid computing systems; it combines an admission
policy with a cache replacement policy to support job submission management, achieving
better average response time and waiting time for the jobs. However, the impact of different
cache replacement policies was not studied there either.

4. WORKLOAD CHARACTERISTIC
The workload we use in this paper is a real workload produced by the SAM-Grid [5][11],
a distributed data handling system supporting the D0 Grid project. The D0 Grid project is one
of the largest currently running high-energy physics experiments and can serve as a
representative of a high-energy physics data grid. Figure 2.a shows the distribution of file
sizes, ranging from 234 B to 2.1 GB, with a mean of 255 MB. The workload consists of 4,933
job requests recorded from January 2003 to March 2003. There are 1,048,911 requests, which
generate 287 TB of file transfers over the network. The requests cover 160,408 unique files,
41 TB in total file size. Figure 2.b shows the popularity distribution of all requested files; the
numbers of file references range from 2 to 159, with a mean of 6.


(a) File size distribution (b) File popularity distribution

Figure 2. Workload characteristic

5. EXPERIMENTAL
To conduct our experiments, we developed a trace-based simulation to evaluate the
performance of block-based data grid caching using conventional web cache replacement
policies. In all experiments, we use the real workload described in Section 4. The
representative replacement policies mentioned in Section 2.2, namely LRU, LFU, SIZE,
GDS, and RANDOM, were selected for the evaluation to cover all characteristics of currently
used replacement policies. In addition, we investigated the behavior of the policies for
different cache storage sizes, ranging from 1% to 6% of the total size of all unique requested
blocks; since even the 1% cache size, containing 5,123 blocks or 512 GB of capacity, is quite
large, this range of cache sizes is practical for a general cache server. The configuration of
our experiment is described in Figure 4.a. The 100% cache size is calculated by the equation
depicted in Figure 3, where S_f is the size of a requested file, S_b is the block size of the
block-based cache, and F_req is the set of requested files. The block size in our experiment
was set to 100 MB.



Figure 3. Cache size calculation
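
Assuming the equation in Figure 3 is the ceiling sum implied by the surrounding text, the 100% cache size (the total number of unique requested blocks) can be computed as:

    import math

    S_b = 100 * 2**20   # block size S_b = 100 MB, as configured in our experiment

    def total_blocks(unique_file_sizes):
        # Sum over the requested files F_req of ceil(S_f / S_b).
        return sum(math.ceil(s / S_b) for s in unique_file_sizes)

With this block size, 1% of the total for our workload corresponds to the 5,123 blocks (512 GB) quoted above.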



(a) Experiment configuration (b) The topology of the simulation

Figure 4. The topology of simulation and experiment configurations

We designed the topology for our simulation as shown in Figure 4.b. All clients in our
simulation are connected directly to the intermediate cache server, located in the same
network domain. The bandwidth between the clients and the cache server is set to 1 GB/s
with 10 ms of propagation delay. Furthermore, the cache server is connected to the file
servers through a WAN with 100 MB/s of bandwidth and 300 ms of propagation delay. The
propagation delay between the cache server and the file server is derived from the actual
propagation delay of the link from the Faculty of Engineering, Chulalongkorn University, to
AIST, Japan.

6. RESULTS AND DISCUSSION
We evaluated the performance of the representative cache replacement policies using the
performance metrics described in Section 2.3. Figure 5 depicts the simulation results,
including hit rate, byte hit rate, and average latency. The results for hit rate and byte hit rate,
shown in Figures 5.a and 5.b respectively, exhibit similar trends. When the cache size
increases, the hit rates and byte hit rates of LRU, RAND, and GDS grow rapidly, while those
of LFU and SIZE grow gradually. LRU outperforms the other replacement policies and
reaches a maximum rate of 86%; moreover, LRU already provides good efficiency when the
cache size increases by only 1%. On the other hand, SIZE does not perform well, since it
tends to evict blocks that are parts of popular files, and the popular files are quite large. LFU
also does not perform well, since blocks that are accessed frequently at the beginning remain
stored in the cache but are not accessed again for a long while.
The average latency results, shown in Figure 5.c, indicate the average time a job spends
transferring data over the network for different cache sizes. When the cache size increases,
the average latency of LRU, RAND, and GDS decreases rapidly, while that of LFU and SIZE
decreases gradually. Obviously LRU, which provides a high hit rate, outperforms the other
policies. Although GDS performs very well at large cache sizes, it performs only fairly at
small percentages of cache size.




(a) Hit rate (b) Byte hit rate



(c) Average latency

Figure 5. Simulation results

7. CONCLUSION
In this paper, we evaluated the performance of Block-based data grid caching for
high-energy physics applications using popular cache replacement policies. Our experiments
were conducted with a real workload produced by the SAM-Grid, which can be used as a
representative of a high-energy physics data grid. Since the total size of the requested files is
quite large, we studied the impact of diverse cache replacement policies with a small
percentage of cache size, ranging from 1% to 6%. The representative cache replacement
policies in our experiments cover all characteristics of existing proposed policies. The results
indicate that a recency-based replacement policy, such as LRU, outperforms the others on all
performance metrics, including hit rate, byte hit rate, and average latency. This result
provides a guideline for our future work on designing more efficient replacement policies
based on recency. In addition, the performance of the recency-based policy in block-based
data grid caching could be improved by exploiting the locality of reference [12].





REFERENCES
1. Ann Chervenak, Ian Foster, Carl Kesselman, Charles Salisbury and Steven Tuecke, The
data grid: Towards an architecture for the distributed management and analysis of large
scientific datasets, 2000, 23, 187-200
2. T. Hiruntaraporn, and N. Nupairoj, Block-Based Grid Caching for Grid Data Farm, 2006,
3, 2204-2207
3. K. Wong, Web Cache Replacement Policies: A Pragmatic Approach, 2006, 20, 28-34,
2006.
4. S. Podlipnig and L. Böszörményi, A survey of Web cache replacement strategies, 2003,
35, 374-398
5. G. Garzoglio, A Globally Distributed System for Job, Data and Information Handling for
High-Energy Physics, Ph.D. Dissertation, DePaul University, 2005
6. B. Abbott, A. Baranovski, M. Diesburg, G. Garzoglio, T. Kurca, P. Mhashilkar, DZero
Data-Intensive Computing on the Open Science Grid, 2008
7. Doraimani, Shyamala and Iamnitchi, Adriana, Workload characterization in a high-
energy data grid and impact on resource management, 2009, 12, 153-73
8. S. Podlipnig and L. Böszörményi, A survey of Web cache replacement strategies, 2003,
35(4), 374-398
9. HongJin Park, ChangHoon Lee, Sized-Based Replacement-k Replacement Policy in Data
Grid Environments, 2006, 353-361
10. Ekow Otoo, Doron Rotem and Arie Shoshani, Impact of Admission and Cache
Replacement Policies on Response Times of Jobs on Data Grids, 2003, 113-123
11. T. Kurca, Grid Computing at the D0 Experiment, 2007, 1094-1099
12. Peter J. Denning, The locality principle, 2005, 19-24

ACKNOWLEDGMENTS
We would like to thank Dr. Gabriele Garzoglio for permission to use the high-energy
physics trace logs. The trace logs used in this paper were produced by the SAM-Grid, a
distributed data handling system supporting the D0 Grid project.


F00015
Modeling and Simulation of Large-scale Virtualization
based on the CloudSim Toolkit

Anupong Banjongkan^1, Supakit Prueksaaroon^{1,2}, Vara Varavithya^{1,C} and Sornthep Vannarat^{2,C}

^1 Department of Electrical Engineering, King Mongkut's University of Technology North Bangkok, 1518 Piboonsongkram Road, Bangsue, Bangkok, 10800, Thailand
^2 Large-Scale Simulation Research Laboratory, National Electronics and Computer Technology Center, 112 Thailand Science Park, Phahonyothin Road, Klong 1, Klong Luang, Pathumthani, 12120, Thailand
^C E-mail: vara@kmutnb.ac.th, sornthep@nectec.or.th



ABSTRACT
Virtualization technologies have recently gained popularity in enterprise information
systems as a tool for enhancing computational resource sharing. Virtualization
technology not only increases the utilization of computing resources but also reduces
configuration workload, administrative cost, and application porting effort, and saves
energy. Multiple commodity operating systems are allowed to share conventional
hardware in a safe environment. One application of virtualization technology is Cloud
computing, which aims to power the next generation of data centers by providing a
network of virtual services. Large-scale virtualization has different compositions,
configurations, and requirements, and quantifying the performance of resource
scheduling and the overhead is a challenging problem. In our previous work, we
presented a set of experiments measuring the performance of Xen virtualization,
demonstrating the overhead of Xen and other virtualization tools in a small-scale
system. In this work, we extend the modeling and creation of virtual machines to a
large-scale data center as well as multiple data centers. CloudSim, a new generalized
and extensible simulation framework, is used as the simulator, and an extension
demonstrating the effect of virtualization overhead was developed in this work. Results
for large-scale animation rendering are shown, based on the Amazon EC2 model. We
observe that different service instances have a significant impact on performance. This
work can be used as a basis for studying federation and migration of VMs for reliability
and scalability. The selected simulator enables seamless modeling, simulation, and
experimentation of emerging Cloud computing infrastructures and management services
in a repeatable and controllable environment.

Keywords: Virtualization Technology, Cloud Computing, CloudSim.



1. INTRODUCTION
Cloud computing aims to power the next generation of data centers by providing a network
of virtual services that includes hardware, software, applications, and user interfaces. A user
is able to access and deploy applications from anywhere in the world on demand. The main
advantages of cloud computing are cost competitiveness and flexibility in resource
provisioning, and the user can subscribe to different QoS requirements. Cloud technology
research is focused on delivering applications as services. The term Everything-as-a-Service
(EaaS) was introduced; services in Cloud computing are separated into three classes:
Software-as-a-Service (SaaS), which features a complete application offered as a service on
demand; Platform-as-a-Service (PaaS), which encapsulates layers of software and provides
them as a service that can be used to build higher-level services; and
Infrastructure-as-a-Service (IaaS), which delivers basic storage and compute capabilities as
standardized services over the network [1].

Considering the service boundary, a Cloud computing implementation can be classified as a
Public Cloud or a Private Cloud, differentiated by the scope of services. Public Clouds are
run by third parties, and applications from different customers are likely to be mixed together
on the cloud's servers, storage systems, and networks. Public cloud resources can be hosted
in suitable geographic locations where data communication and energy costs are low. These
resources can help reduce customers' IT investment risk and provide the possibility of
temporary enterprise expansion. In contrast, a Private Cloud is built for intra-enterprise use.
The enterprise has the utmost control over data, security, and quality of service; since the
infrastructure and applications are owned by a single enterprise, the management has full
control over how applications use cloud resources. The term Hybrid Cloud combines the
concepts of the public and private cloud models: a hybrid cloud serves not only the
organization itself but also external entities, and can receive services from outside entities as
well.
In cloud computing, flexibility in rapid resource provisioning is the main advantage. A
user can request additional computational resources on demand; conversely, a user can
relinquish resources to reduce the pay-per-use cost in an efficient manner. The resource
allocation system must provide capabilities that significantly reduce deployment time. Key
enabling technologies of cloud computing include virtualization technology, on-demand
deployment, Internet delivery of services, and open source software.
Physical hardware is decoupled from operating systems and applications using
virtualization technology. The concept of virtualization has long been established. With the
continuously improving performance of computer systems, virtualization technology has
gained popularity in enterprise IT as a tool for enhancing computational resource sharing.
Virtualization adds a layer of abstraction that securely isolates applications at the operating
system level. Several classes of applications deployed in large organizations can gain more
flexibility in maintaining efficient resource sharing using virtualization technologies.
Recent advances in processor technology [2] further drive demand for OS-level
virtualization. In the current hardware generation, multi-core processors significantly benefit
IT organizations by dramatically lowering cost: compared to traditional single-core systems,
systems using multi-core processors are less expensive, since only half the number of sockets
is required for the same number of CPUs. By significantly lowering the cost of
multi-processor systems, multi-core technology will accelerate data center consolidation,
standardization, and virtual IT infrastructure.
We anticipate large-scale deployment of virtualization images over multi-processor
systems; high-end commodity multi-processor systems on the market can easily scale up to
96 processors. In our previous work [5], we focused on evaluating the performance of
virtualization systems based on the Xen [6] virtualization platform. A group of experiments
was performed to evaluate the overhead of the VMM. The experiments were set up such that
multiple instances of domain-U execute concurrently on a high-end server. We presented a
set of results showing the virtualization overhead as reflected in standard benchmarks of both
CPU and I/O performance. The performance degradation from virtualization on the VMM
and the virtual machine instances was demonstrated.
There are several major contributors to virtualization overhead, including the context
switching overhead of the CPU(s), disk access, the network subsystem, and virtualization
processing. Normal operation without virtualization was used as the base case, the so-called
native mode.
In our experiments, we used a group of benchmarks based on realistic application
workloads. Industry-standard benchmarks and standard UNIX utilities were selected,
including HPL [7], SPEC CPUint2000 [8], NetPerf [9], the dd command, SPECWeb99 [10],
and the Apache Benchmark [11]. These benchmarks cover a large portion of the requirements
in an enterprise environment. From the experimental results, the effects of virtualization
overhead were studied in three dimensions: CPU switching overhead, disk access overhead,
and network communication overhead.
The results indicate that the Xen virtualization overhead is very low: the CPU context
switch overhead accounts for about 0.66% times the number of virtual machines. However,
Xen virtualization does not perform very well under I/O-intensive workloads, for both disk
access and network communication; the I/O performance degrades significantly as the
number of virtualization domains increases. The network payload overhead varies between
4.0% and 10.0% relative to the native case, for every number of virtual machines.
In practice, it is extremely difficult to evaluate real cloud computing systems, as several
limitations are imposed. Quite a few grid and cloud computing simulators are available from
many research groups. CloudSim is a simulation tool that evolved from GridSim and
provides flexibility in simulating the behavior and performance of large-scale computing
systems. However, CloudSim does not include the virtualization overhead. In order to
correctly estimate the cost of usage, we need adequate information on how the virtualization
overhead affects the overall execution time, which is directly proportional to the usage fee.
We therefore used the measurements from our previous work and extended CloudSim to
accommodate these factors. The modifications to the CloudSim framework include CPU
context switching and network payload overhead. We tackle the problem of the hidden cost
of virtualization overhead in cloud computing systems.
To evaluate the effects of virtualization overhead, we simulate a computing environment
close to the Amazon EC2 (Elastic Compute Cloud) [3] parameters. Amazon EC2 is an
emerging cloud computing system that demonstrates the realization of a virtual data center.
A set of simulations based on the service instances of EC2 is given, with animation rendering
selected as the workload. We found that the instance of service provided by the Amazon EC2
model has a significant impact on the total cost: in the worst case, the cost can increase by a
factor of five due to the overhead of virtualization at a low service-level subscription. The
main contribution of this work is the demonstration of the effect of virtualization overhead
on different service level agreements; service subscribers have to take this into account when
selecting services. The results can be used to predict the behavior of large-scale adoption of
virtualization and cloud computing.
In Section 2, the extensions to CloudSim are presented. The design concept and
experimental details are given in Section 3. Experimental results are shown in Section 4.
Concluding remarks are drawn in Section 5.

2. CloudSim Extension and Amazon EC2 Model
2.1 CloudSim
CloudSim [4] is a simulation tool for large-scale computing environments. A new layer of simulation software was added on top of GridSim to model cloud computing. The additional aspects of cloud computing it covers include support for large-scale Cloud infrastructure, data centers, service brokers, scheduling and allocation policies, the management of virtualization images, and the allocation of processor cores.
The CloudSim architecture is shown in Figure 1. SimJava is used for discrete event simulation. The well-developed grid resource simulator (GridSim) is implemented in the second layer. The CloudSim core models the cloud computing system in a virtualization environment and provides virtualization management for virtual machines, memory, storage, and bandwidth. In the simulator, we can manage the instantiation and execution of the core entities (VMs, hosts, data centers, and applications), and the system can scale up to thousands of entities.
We create a new class that wraps the original CloudSim to include the overhead of virtualization and network delay. The additional overhead is translated into a decrement of the execution speed of the particular resource, simulating the delay caused by virtualization.
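To make this translation concrete, here is a minimal sketch of the idea (our illustration only; the class and parameter names are hypothetical and not part of the CloudSim API):

# Sketch: virtualization overhead modeled as an execution-speed decrement.
class OverheadHost:
    def __init__(self, mips, ctx_switch_overhead, n_vms):
        # Per the measurements above, CPU overhead grows as
        # overhead_fraction * number_of_VMs (e.g. 0.0066 per VM).
        self.effective_mips = mips * (1.0 - ctx_switch_overhead * n_vms)

    def execution_time(self, mi):
        # Seconds to run a task of `mi` million instructions.
        return mi / self.effective_mips

# A 7,200 MIPS host running 16 VMs; one frame is 2.8 tera-instructions.
host = OverheadHost(mips=7200, ctx_switch_overhead=0.0066, n_vms=16)
print(host.execution_time(2.8e6))   # about 435 s versus about 389 s natively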



Figure 1. CloudSim Architecture

2.2 Amazon EC2
Amazon Elastic Compute Cloud (Amazon EC2) offers novel services for the business of computing as a utility. EC2 is based on web services that provide the flexibility to resize compute capacity in the cloud, making scalable computing resource provisioning easier for developers. Resources are provided such that the user has complete control of the hardware within Amazon's predefined, proven computing environment. Amazon EC2 reduces the time required to obtain and instantiate a new server to a matter of minutes, allowing a user to quickly scale computing capacity, both up and down, on an on-demand basis. It changes the economics of computing by allowing users to pay as they go, and it gives developers the tools to build failure-resilient applications and isolate themselves from common failure scenarios. Amazon EC2 provides several types of instances, as follows (summarized in the sketch after the list):
- Standard instances: this family suits most applications. It is further classified into three categories: small (1 VCPU, 1.7 GB of memory, 160 GB of storage, 32-bit platform), medium (4 VCPU, 7.5 GB of memory, 850 GB of storage, 64-bit platform), and extra large (8 VCPU, 15 GB of memory, 1690 GB of storage, 64-bit platform).
- High-CPU instances: this family serves applications that require more CPU relative to memory and is well suited to compute-intensive applications. It includes two categories: high-CPU medium (5 VCPU, 1.7 GB of memory, 350 GB of storage, 32-bit platform) and high-CPU extra large (20 VCPU, 7 GB of memory, 1690 GB of storage, 64-bit platform).
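For reference, the instance parameters listed above can be captured in a small lookup table; a minimal sketch (the variable name and structure are ours):

# EC2 instance families as described above (VCPUs, memory in GB,
# storage in GB, platform width in bits); the structure is illustrative.
EC2_INSTANCES = {
    "small":                {"vcpu": 1,  "mem_gb": 1.7, "disk_gb": 160,  "bits": 32},
    "medium":               {"vcpu": 4,  "mem_gb": 7.5, "disk_gb": 850,  "bits": 64},
    "extra_large":          {"vcpu": 8,  "mem_gb": 15,  "disk_gb": 1690, "bits": 64},
    "high_cpu_medium":      {"vcpu": 5,  "mem_gb": 1.7, "disk_gb": 350,  "bits": 32},
    "high_cpu_extra_large": {"vcpu": 20, "mem_gb": 7,   "disk_gb": 1690, "bits": 64},
}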
Amazon EC2 allows customers to pay for computing capacity by the hour with no long-term commitments. This flexibility frees customers from the costs and complexities of planning, purchasing, and maintaining hardware, and transforms what are commonly large fixed costs into much smaller variable costs. The cost structure is based on the type of instance, resource region, operating system class, storage capacity, network bandwidth usage, and extra service usage.


3. EXPERIMENTAL DETAILS
In the CloudSim architecture, the Virtual Machine class models an instance of a VM. The Host component manages a VM during its life cycle. A Host can simultaneously instantiate multiple VMs and allocates processor cores to them based on predefined sharing policies, which can be either space-shared or time-shared allocation schemes. Every VM component has access to a component that exposes the parameters of the VM, including memory, processor, storage, and the VM's internal scheduling policy. The VMScheduling component sits at the higher layer.

Cloud-based application services are modeled as Cloudlets. Examples of these classes of services include content delivery, social networking, and business workflow. Another component, DatacenterBroker, models a broker responsible for negotiating between users and service providers; the negotiated contract depends on the user's QoS requirements and the deployed services.
We extend CloudSim by deploying a new service class that supports virtualization overhead. CPU context switching and network overhead are the two additional items that extend the capabilities of CloudSim. The cloud market is modeled with Amazon EC2 as the detailed implementation, and all experiments are performed using the Amazon EC2 instance model. We use the Parallel Rendering Project of the Large-Scale Simulation Research Laboratory (LSR), National Electronics and Computer Technology Center (NECTEC) as a workload. Big Buck Bunny [12], rendered with Blender, is used in the simulation. The rendering of an animation of 10 minutes of video at HD resolution (1080p) is simulated. The experimental details and results are given in the next section.


4. RESULTS AND DISCUSSION
We first simulated the CPU context-switching overhead of Amazon EC2 for both standard and high-CPU instances. The rendering workload is set at 18,000 frames with 2.8 tera-instructions per frame. The standard instance's CPU rating is set at 1.8 GHz (7,200 MIPS) and the high-CPU instance's at 3.2 GHz (12,800 MIPS). The average input file size is 50 kB and the average output file size, in JPEG format, is 5 MB. We fixed the number of instances at 16 for evaluating the total rendering time, instance run-time cost, and network usage. The experimental results are compared with test cases executed in the original CloudSim, where the overhead of virtualization is not taken into account. The first experimental result is shown in Figure 2.


Figure 2. (a) Average rendering time at several percentages of CPU context-switch overhead and (b) average data transfer time.

In Figure 2(a), the percentage of CPU context switching is varied over 0% (no overhead), 0.66%, 2.0%, and 5.0%. The results show that the rendering time increases as the percentage of context-switch overhead increases: the total computing time increases by 112.0%, 147.0%, and 500.0% when the context-switch overhead equals 0.66%, 2.0%, and 5.0%, respectively. For the small standard instance, with low computing resources, performance under virtualization degrades severely. The performance degradation from virtualization overhead is not linear in the percentage of overhead; exponential performance loss is observed in several cases.
In the next experiment, we simulated the network payload overhead. A random load between 4.0% and 10.0% was injected into the system, and the network bandwidth was varied over 100 Mbps, 1 Gbps, and 20 Gbps. The results are shown in Figure 2(b). The network overhead has only a small impact on a high-speed network: the data transfer time of the system with overhead differs only slightly from the native system when connected by a high-speed network. In the third experiment, we mixed the context-switch and network overheads by setting
the CPU context-switch overhead to 0.66% and the interconnection bandwidth to 1 Gbps. The result is shown in Figure 3(a); a similar trend is observed. The total overhead is consistent with the first experiment but with a slightly longer execution time, by less than 0.1%. Context switching has a higher impact on performance than the overhead from network delay.


Figure 3. (a) Average rendering time of the original and extended CloudSim, and (b) total cost of rendering on Amazon EC2.

From our simulation results, we calculated the cost of rendering time to show the difference in cost when subscribing to cloud computing. In this work, we used a pricing structure based on that of Amazon EC2, as follows: small instances $0.085/hr, medium instances $0.34/hr, extra large $0.68/hr, high-CPU medium $0.17/hr, and high-CPU extra large $0.68/hr. All instances are based on the Linux operating system. The bandwidth usage is based on data transfer in and out of Amazon EC2 at a rate of $0.17 per GB for the first 10 TB per month.
Figure 3(b) shows the cost that customers must pay for the performance lost to virtualization overhead. This hidden cost of virtualization tracks the execution time, as in the first experiment: customers pay an extra 112.0%, 147.0%, and 500.0% at the respective overhead levels.
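The arithmetic behind such a figure can be sketched as follows (a minimal illustration using the hourly prices listed above; the function, the run-time multiplier, and the example numbers are ours):

# Hourly prices ($/hr) for Linux instances, as listed above.
PRICE_PER_HOUR = {"small": 0.085, "medium": 0.34, "extra_large": 0.68,
                  "high_cpu_medium": 0.17, "high_cpu_extra_large": 0.68}

def rendering_cost(instance, ideal_hours, n_instances,
                   runtime_factor, gb_transferred, price_per_gb=0.17):
    # runtime_factor multiplies the ideal run time to account for
    # virtualization overhead (1.0 = none; roughly 5x in the worst case above).
    compute = PRICE_PER_HOUR[instance] * ideal_hours * runtime_factor * n_instances
    network = price_per_gb * gb_transferred   # first-10-TB/month rate
    return compute + network

# 16 small instances, a hypothetical 100 ideal hours each, worst-case overhead:
print(rendering_cost("small", 100, 16, 5.0, gb_transferred=90))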


5. CONCLUSION
A large amount of research effort has been invested in offering computational resources as cloud services. Cloud technologies focus on defining novel methods, policies, and mechanisms for efficiently managing Cloud infrastructure. In this work, we studied the effects of virtualization overhead on cloud computing along two dimensions: CPU context-switching overhead and network delay. CloudSim is used as the simulator and is extended in this work to demonstrate the effect of virtualization overhead. Results for large-scale animation rendering are given based on the Amazon EC2 model, presenting the overhead of virtualization on rendering performance under cloud service models. The results show that the context-switch effect has more impact on performance than the network overhead. The virtualization overhead can be translated into an actual cost that the user has to pay; this hidden virtualization cost needs to be taken into consideration when selecting among the service instance models offered by cloud providers. The extension of CloudSim helps predict the provisioning of computing resources, which can significantly reduce the total cost of each service request in a large-scale cloud environment. In future work, the effect of I/O subsystems will be included for classes of applications with heavy I/O activity, such as databases and web applications.






REFERENCES
1. R. Buyya, C. S. Yeo, and S. Venugopal, Market-oriented cloud computing: Vision, hype,
and reality for delivering IT services as computing utilities. International Conference on
High Performance Computing and Communication, 2008.
2. Geoff Koch, Discovering multi-core: Extending the benefits of Moore's law, Technology@Intel Magazine, 2005.
3. Amazon Elastic Compute Cloud (Amazon EC2), http://aws.amazon.com/ec2.
4. Rodrigo N. Calheiros, Rajiv Ranjan, Cesar A. F. De Rose, and Rajkumar Buyya,
CloudSim: A Novel Framework for Modeling and Simulation of Cloud Computing
Infrastructures and Services, GRIDS-TR-2009-1, 2009.
5. S. Prueksaaroon, W. Konghaped, V. Varavithya, and S. Vannarat, Virtualization Overhead of CPU and I/O Processing in the Xen Virtual Machines, In Proc. of the 11th Annual National Symposium on Computational Science and Engineering, 2007.
6. B. Dragovic, K. Fraser, S. Hand, T. Harris, A. Ho, I. Pratt, A. Warfield, P. Barham, and R. Neugebauer, Xen and the Art of Virtualization, Proc. of the 19th ACM SOSP, pages 164-177, 2003.
7. A. Petitet, R. C. Whaley, J. Dongarra, and A. Cleary, HPL - A Portable Implementation of the High-Performance Linpack Benchmark for Distributed-Memory Computers, http://www.netlib.org/benchmark/hpl.
8. SPEC CPUint2000 benchmark, http://www.spec.org/cpu/CINT2000.
9. Netperf benchmark, http://www.netperf.org/svn/netperf2/tags/netperf-2.4.3/doc/netperf.html.
10. SPECweb99 benchmark, http://www.specbench.org/osg/web99/workload.html.
11. Apache server benchmark, http://httpd.apache.org/docs/2.0/programs/ab.html.
12. Big Buck Bunny Rendering, http://www.bigbuckbunny.org.


ACKNOWLEDGMENTS
The authors acknowledge National Electronics and Computer Technology Center (NECTEC)
for providing computing resources that have contributed to the research results reported
within this paper.

F00016
Impact of Workloads on Fair Share Policies

Sangsuree Vasupongayya
Department of Computer Engineering, Faculty of Engineering, Prince of Songkla University
E-mail: vsangsur@coe.psu.ac.th; Fax: 074-212895; Tel. 074-287360



ABSTRACT
Achieving fair share is one of many objectives in scheduling parallel computer jobs. It has been shown recently that the simple fair share model supported by many widely used production parallel computer job schedulers can achieve the fair share objective only up to a certain level. That is, heavy-demand users are prevented by this simple fair share model (RelShare) from overtaking the system resources. However, users with a mixture of jobs who do not spread their jobs out suffer poor performance due to the priority mechanism used by the RelShare model. To solve this problem, a modified goal-oriented parallel computer job scheduling policy (i.e., Tradeoff-fs(Tw:avgX)) was recently proposed. The experimental results show that this newly proposed policy achieves good scheduling and fair share performance, and that the users who suffer under RelShare no longer suffer under Tradeoff-fs(Tw:avgX). However, Tradeoff-fs(Tw:avgX) has only been evaluated on a single workload, and it is widely accepted that the workload can affect scheduling performance. Therefore, in this work, the Tradeoff-fs(Tw:avgX) and RelShare policies are further evaluated on two workloads with different characteristics, which allow longer runtimes and larger processor counts. The results in this work show that even with different workloads, the Tradeoff-fs(Tw:avgX) policy still outperforms RelShare and other priority backfill policies on both scheduling and fair share performance. Furthermore, the experimental results show that users with a mixture of jobs suffer a larger undershare under the RelShare policy when jobs are allowed longer runtimes.

Keywords: Fair share, Goal-oriented parallel job scheduling policies, backfilling.


1. INTRODUCTION
A parallel computer job scheduler's task is to select a set of waiting jobs to be executed on idle resources according to a set of objectives. Most parallel computer systems are non-preemptive, meaning a job runs to completion without interruption once it starts. Users usually submit their jobs along with job requirements such as runtime estimates, number of processors, and amount of memory; some scheduling policies use this information to make their scheduling decisions. Typical scheduling objectives include minimizing wait time, maximizing throughput, maximizing system utilization, and fair share. The fair share objective focuses on correctly assigning the system resources to users according to a set of rules; for example, an equal fair share objective means that each user is assigned equal resources.
It has been shown recently that the simple fair share model (namely the relative share (RelShare) model) currently supported by many production parallel computer job schedulers (e.g., PBS [1],[2], LSF [3], IBM LoadLeveler [4], Maui/Moab [5],[6]) can only achieve the fair share objective up to a certain level [7]. That is, heavy-demand users are prevented from overtaking the system resources, but the implementation of the model leads to a performance problem. More specifically, users with a mixture of jobs who do not spread their jobs out have been shown to suffer poor performance under such fair share policies. To solve this poor performance problem, a modified goal-oriented parallel computer job scheduling policy (namely Tradeoff-fs(Tw:avgX)) was proposed in [8]. The experimental results presented in [8] showed that Tradeoff-fs(Tw:avgX) achieved good scheduling performance and good fair
share performance, and that users who suffered under RelShare no longer suffered under the Tradeoff-fs(Tw:avgX) policy. The experiments in [7,8] were conducted using the same workload (i.e., IA-64). However, it is widely accepted that the workload can have a significant impact on scheduling performance [9,10]. Therefore, in this work, the Tradeoff-fs(Tw:avgX) and RelShare policies are further evaluated using two workloads available at [11]. These two workloads have different characteristics and pose difficult challenges to the scheduler: longer runtimes (i.e., 60 hours for KTH and 168 hours for SDSC-BLUE) and larger processor counts (i.e., 100 single-processor nodes for KTH and 144 8-processor nodes for SDSC-BLUE) than those of the IA-64 workload.
The remainder of this paper is organized as follows. Section 2 provides detailed information on each policy studied in this work. Section 3 gives information on the workloads, performance measures, and experimental settings. The results and discussion are given in Section 4. Finally, the conclusion is given in Section 5.

2. THEORY AND RELATED WORKS
Three types of parallel computer job scheduling policies are studied in this work, described next. The first set is the priority backfill policies, which are the simplest and most common. The second set is the RelShare policies, the current implementation of fair share supported by many production schedulers. The last is the Tradeoff-fs(Tw:avgX) policy, a variant of the goal-oriented parallel computer job scheduling policies proposed specifically for handling fair share objectives.
The most common scheduling policies supported by all production parallel computer job schedulers are priority backfill policies. All waiting jobs are prioritized by some job measure, such as wait time or number of processors, or by a combination of job measures, and the scheduler considers waiting jobs for execution according to their priorities. If the idle resources are not enough for the highest priority job to start, no job can start; the resources are left idle even when small, low-priority jobs could use them, resulting in low overall system utilization. To improve the overall utilization, the backfilling technique was proposed [12]. Under the backfilling scheme, if the highest priority job cannot start due to insufficient resources, the job is given a reservation, i.e., the earliest time at which enough resources become available. Then, instead of stopping, the scheduler continues to consider waiting jobs that are small enough to use the idle resources; if executing such a job does not affect the reservation time of the highest priority job, the job is given the idle resources (a sketch of this rule follows).
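The following is a minimal sketch of this backfilling scheme (our simplified illustration: one resource type, known runtimes, and a single reservation for the head job; production schedulers differ in the details):

# Jobs are dicts with "nodes" and "runtime"; `running` holds (finish_time, nodes).
def backfill_step(queue, free_nodes, running, now):
    if not queue:
        return []
    head = queue[0]
    if head["nodes"] <= free_nodes:
        return [head]                          # head job starts immediately
    # Reservation: earliest time enough nodes free up for the head job.
    avail, reservation = free_nodes, None
    for finish, nodes in sorted(running):
        avail += nodes
        if avail >= head["nodes"]:
            reservation = finish
            break
    started, shadow_free = [], free_nodes
    for job in queue[1:]:                      # try to backfill smaller jobs
        fits = job["nodes"] <= shadow_free
        # A backfilled job must not delay the head job's reservation.
        harmless = reservation is None or now + job["runtime"] <= reservation
        if fits and harmless:
            started.append(job)
            shadow_free -= job["nodes"]
    return started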
The next set of policies is the RelShare policies, supported by many widely used production parallel computer job schedulers. Under a relative fair share policy, jobs are prioritized according to their owners' fair share priority values, and the scheduler considers jobs for execution according to this priority; the backfilling technique is also supported to improve system utilization. The fair share priority of each user defines the user's priority relative to other users in the system. It is dynamically computed from the user's entitled share and cumulated usage. The cumulated usage of a user is the amount of resources the user actually used within the current fair share window, a period of time over which usage is accumulated and which is a parameter of the relative fair share model. The entitled share of each user is calculated according to the current fair share objective and the demand of all active users during the fair share window. A typical fair share window is one day; in this work, a relative fair share policy with a one-day fair share window is denoted RelShare(1d).
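As an illustration, a priority of this kind can be computed roughly as follows (a minimal sketch under the equal-share objective of this paper; the exact priority formula varies between schedulers, so this particular form is our assumption):

# Sketch: fair share priority = entitled share minus cumulated usage,
# both expressed as fractions of the resources used in the current window.
def fair_share_priority(user, usage_in_window, active_users):
    total = sum(usage_in_window.values()) or 1.0
    entitled = 1.0 / len(active_users)          # equal-share objective
    used = usage_in_window.get(user, 0.0) / total
    return entitled - used   # higher = more under-served = higher priority

usage = {"alice": 30.0, "bob": 5.0, "carol": 0.0}   # node-hours this window
for u in usage:
    print(u, round(fair_share_priority(u, usage, usage), 3))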
Since the experimental results in [7] showed that the RelShare(1d) policy leads to poor scheduling and fair share performance for some users, Tradeoff-fs(Tw:avgX), a modified goal-oriented parallel computer job scheduling policy, was introduced in [8]. The original goal-oriented parallel computer job scheduling policy (Tradeoff(Tw:avgX)) was designed to handle a set of scheduling objectives that conflict with each other [13]. Tradeoff(Tw:avgX) finds a good schedule under a conflicting set of objectives using a search technique together with an objective model. When a scheduling decision is
needed, Tradeoff(Tw:avgX) searches the space of all possible schedules for a 'good' schedule according to the objectives within a time limit. At each scheduling decision point the space of all possible schedules can be very large, so the search ends when the time limit is reached. Nevertheless, the Tradeoff(Tw:avgX) policy has been shown to find a 'good' schedule even when limited time is given, because the search algorithm is engineered to consider many different schedules: instead of spending a lot of time on similar schedules, the search engine skips similar schedules and explores different ones to find a better schedule. To do so, the search engine employs a discrepancy-based search technique [14]. The original goal-oriented policy was modified in [8] to handle the fair share objective as well as scheduling performance; the modification biases the search process toward schedules that favor fair share performance, so that schedules with good fair share performance are considered first. The modified policy is denoted Tradeoff-fs(Tw:avgX).

3. EXPERIMENTAL SETTINGS
The workloads, performance measures, and experimental settings used in this study are described in this section. Two workloads available at [11] are studied. First, a ten-month KTH workload allows a 60-hour runtime limit and 100 computation nodes. Second, a twelve-month SDSC workload allows a 168-hour runtime limit and 1152 computation nodes. These two workloads differ in character from the original workload used in [7,8], which only allows a 24-hour runtime limit and provides a total of 128 computation nodes.
Both scheduling and fair share performance are evaluated in this work. The scheduling performance measures include wait time and slowdown, the ratio of a job's turnaround time to its runtime; both maximum and average measures are presented. To measure fair share performance, the per-user measure dev is used: the per-user dev measures the difference between a user's entitled share and the user's actual usage [7]. Only equal shares among users are considered in this work; that is, the scheduler should assign the system resources equally among all active users. Non-equal shares are left for future work.
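A minimal sketch of this per-user measure (our formulation of the description above; the normalization used in [7] may differ):

# Per-user dev: entitled share minus actual usage, under an equal-share objective.
def per_user_dev(usage, active_users):
    total = sum(usage.values()) or 1.0
    entitled = total / len(active_users)          # equal-share entitlement
    return {u: entitled - usage.get(u, 0.0) for u in active_users}

devs = per_user_dev({"alice": 30.0, "bob": 5.0, "carol": 0.0},
                    ["alice", "bob", "carol"])
avg_abs_dev = sum(abs(d) for d in devs.values()) / len(devs)   # as in Fig. 3(a)
sum_abs_dev = sum(abs(d) for d in devs.values())               # as in Fig. 3(b)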
All policies are evaluated using an event-driven simulator. The monthly performance of each policy is computed for the jobs submitted during that month. To be realistic, each simulation requires a warm-up and a cool-down period: the simulator is preloaded with the jobs from the week prior to the month being measured (the warm-up period), while jobs from the following month continue to arrive until all jobs of the measured month have started (the cool-down period).
The focus of this work is a non-preemptive parallel computer system, in which a job runs to completion without interruption once it starts. The platform is a homogeneous, single-partition parallel computer system; under this architecture a job can be as large as the entire system and all computation nodes are identical. The scheduler considers jobs only according to their computation-node requirements, so the memory requirement is not considered. To study the full potential of each policy, perfect runtime information for each job is assumed: the scheduler knows exactly how long each job will take and how many computation nodes it requires when the job arrives. However, the scheduler does not know when jobs will enter the system (an on-line environment).

4. RESULTS AND DISCUSSIONS
This section is organized as follows. First, overall scheduling performances of each policy
are presented. Next, the fair share performances of each policy are given. A detailed analysis
and discussions are given after each set of results.
Figure 1 shows the overall scheduling performances of FCFS-backfill, LXF-backfill,
RelShare(1d) and Tradeoff-fs(Tw:avgX) policies under KTH workload. Figure 1(a)-(c) show
the maximum wait, average wait, and average bounded slowdown, respectively. Figure 2 shows the same set of results under the SDSC workload. As expected, the Tradeoff-fs(Tw:avgX) policy provides the best, or close to the best, scheduling performance on all measures of both workloads studied. While the FCFS-backfill policy provides the best maximum wait performance, it suffers poor average wait and bounded slowdown performance because small jobs can be blocked by older large jobs. The LXF-backfill policy provides good average performance measures but poor maximum wait performance. Many may consider a poor maximum wait performance an outlier or insignificant; however, the jobs with poor maximum wait are usually large jobs, long jobs, or large-and-long jobs. Allowing these jobs to suffer poor performance may create an unfortunate impression among parallel computer users: that even though the center provides a large number of computation nodes, users should only submit small to medium jobs, because otherwise their jobs will wait a long time. RelShare(1d) provides very poor maximum wait performance under both workloads, similar to the RelShare(1d) results reported in [7,8].


[Figure residue: three monthly panels (Oct. 1996 to Jul. 1997) comparing FCFS-backfill, LXF-backfill, RelShare(1d), and Tradeoff-fs(Tw:avgX).]

(a) Maximum wait (hr)  (b) Average wait (hr)  (c) Average bounded slowdown

Figure 1. Overall scheduling performances of each policy under KTH workload.

[Figure residue: three monthly panels (Jun. 2000 to May 2001) comparing FCFS-backfill, LXF-backfill, RelShare(1d), and Tradeoff-fs(Tw:avgX).]

(a) Maximum wait (hr)  (b) Average wait (hr)  (c) Average bounded slowdown

Figure 2. Overall scheduling performances of each policy under SDSC workload.

This poor performance of the RelShare(1d) policy is caused by the poor performance of users with a mixture of jobs. Typically, such users submit jobs of various sizes within a short period of time (possibly within the same fair share window). Large jobs are usually difficult to execute unless they are given reservations, and to get a reservation a job must have the highest priority among all waiting jobs. Under the RelShare(1d) policy, however, job priority is defined by the owner's fair share priority value. Since the user has a mixture of jobs, whenever any small or medium job of the user is executed, the user's fair share priority is reduced, which in turn reduces the priority of the user's remaining jobs. The user's large jobs then have low priority, making it difficult for them to get a reservation. As
a result, these large jobs have to wait until the end of the fair share window, when their owner's fair share priority increases again. Since a user's usage in the previous fair share window is not carried over to the next window, the fair share of an active user is adjusted at the end of each window. These users suffer a large undershare. The problem worsens when longer runtimes are allowed (i.e., 168-hour jobs under the SDSC workload), as seen in Figure 2(a).
Next, the fair share performance of all four policies is presented. Figures 3 and 4 show the fair share performance under the two workloads. Figures 3(a) and 4(a) show the average absolute per-user dev of all policies under the KTH and SDSC workloads, respectively, while Figures 3(b) and 4(b) show the total absolute per-user dev.

[Figure residue: two monthly panels (Oct. 1996 to Jul. 1997) comparing FCFS-backfill, LXF-backfill, RelShare(1d), and Tradeoff-fs(Tw:avgX).]

(a) Average absolute dev  (b) Total absolute dev (x 10^5)

Figure 3. Fair share performances of each policy under KTH workload.

[Figure residue: two monthly panels (Jun. 2000 to May 2001) comparing FCFS-backfill, LXF-backfill, RelShare(1d), and Tradeoff-fs(Tw:avgX).]

(a) Average absolute dev  (b) Total absolute dev (x 10^6)

Figure 4. Fair share performances of each policy under SDSC workload.

Similar to the results observed in [8], the Tradeoff-fs(Tw:avgX) policy produces the lowest total and average absolute per-user dev under both workloads studied. Smaller differences indicate better fair share performance, because each user receives his or her share of resources fairly. From the results in the previous section and this section, it can be concluded that the Tradeoff-fs(Tw:avgX) policy not only provides good scheduling performance but also reduces the difference between each user's entitled share and actual usage. An interesting result is the performance of the LXF-backfill policy, which is very close to that of Tradeoff-fs(Tw:avgX); however, as mentioned above, LXF-backfill causes poor maximum wait performance while Tradeoff-fs(Tw:avgX) does not.

5. CONCLUSION
The results in this work show that even with different workloads, the Tradeoff-fs(Tw:avgX) policy still outperforms RelShare and other priority backfill policies on both scheduling performance and fair share performance. Furthermore, the experimental results show that users with a mixture of jobs suffer a larger undershare under the RelShare policy when jobs are allowed longer runtimes. Future research directions include evaluating the policies under inaccurate runtime information and under non-equal share objectives.
Real production schedulers do not have perfect job runtime information, yet most scheduling policies require runtime information to make their decisions. In real production schedulers, users must provide the job runtime information at job submission, and this information is known to be inaccurate [15]. Thus, future work should study the performance of each policy when inaccurate user-provided runtime estimates are used. Furthermore, the Tradeoff-fs(Tw:avgX) policy studied in this work and in [7,8] focuses only on equal shares. In real situations, some users may have higher privileges than others; for example, a professor may have a higher share of resources than a student. Thus, Tradeoff-fs(Tw:avgX) and the other policies should also be evaluated under non-equal share objectives.

REFERENCES
1. OpenPBS, http://www.nas.nasa.gov/Software/PBS/
2. PBS pro, http://www.pbspro.com
3. LSF, http://www.platform.com/product/lsffamily.
4. S. Kannan, M. Roberts, P. Mayes, D. Brelsford & J. Skovira. "Workload management
with LoadLeveler". Technical Report, IBM Redbook, 2001.
5. Maui scheduler, http://www.supercluster.org/maui
6. Moab scheduler,
http://www.clusterresources.com/products/mwm/docs/moabadminguide450.pdf
7. S. Vasupongayya, Impact of Fair Share and its Configurations on Parallel Job Scheduling
Algorithms, in Proc. of WASET Int'l Conf. on HPC, Venice, Italy, Oct. 2009, 560-568.
8. S. Vasupongayya, Achieving fair share objectives via goal-oriented parallel computer job
scheduling policies, in Proc. of WASET Int'l Conf. on CSE, Bangkok, Thailand, Dec.
2009.
9. S.-H. Chiang & M. Vernon, Characteristics of a large shared memory production
workload, in Proc. of Workshop on JSSPP, 2001.
10. D. Feitelson, A survey of scheduling in multiprogrammed parallel systems, Technical
Report Research Report RC 19790 (87657), IBM T.J. Watson Research Center, 1997.
11. Parallel workload archive, available at http://www.cs.huji.ac.il/labs/parallel/workload/.
12. D. Lifka, The ANL/IBM SP scheduling system. in Proc. of Workshop on JSSPP, 1995.
13. S.-H. Chiang & S. Vasupongayya, Design and potential performance of goal-oriented job
scheduling policies for parallel computer workloads. IEEE Trans on Parallel and
Distributed Systems, 19(12):1642-1656, 2008.
14. T. Walsh, Depth-bounded discrepancy search, in proc. of IJCAI, 1997.
15. D. Tsafrir, Y. Etsion & D. Feitelson, Backfilling using system-generated predictions rather than user runtime estimates, IEEE Trans on Parallel and Distributed Systems, 18(6):789-803, 2007.

F00019
Automatic Predictive URL-Categories Classification to UM
Model using Decision Tree Model

K. Silachan
Department of Computer, Faculty of Science, Silpakorn University, Nakornpratom, 73000, Thailand
E-mail: klao_99@yahoo.com; Fax: 034-272923; Tel. 081-8564346



ABSTRACT
Discovering navigation patterns is important for web usage mining, because web usage mining aims to discover how web data is accessed and to indicate users' behavior in their web browsing. Such discovery needs a technique to categorize the accesses and to study user behavior. This paper presents a new approach that predicts URL categories accurately and also reveals users' behavior in web browsing. A method of category management, called the predictive web-URL-category tree, is also proposed in this paper. The UM model is a navigation-pattern model constructed with a statistical model, applying a decision tree model for automatic predictive web category classification.


Keywords: Web Usage Mining, Web-URL-category, Predictive Web-Categories
Classification, UM Modeling, Decision Tree Model.


1. INTRODUCTION
Web usage mining is a subset of web mining that discovers usage patterns from web log data in order to understand the needs of web users. It focuses on techniques that predict users' behavior while they interact with the web, not in isolation, and concentrates mainly on techniques to discover users' usage patterns [13] when they access a website. The analysis can help determine the lifetime value of users or customers; moreover, the outcome of the study can be useful to businesses for marketing strategy and for making business promotion campaigns effective [4,7,11]. In this study, however, the benefit is to education, for improving university web pages. Classification techniques for web usage are often used to associate groups of categories and to measure prediction accuracy [4], because classification maps data into groups; it requires feature extraction and selection to best describe the properties of each class or category [5,9].
In this paper, the main purpose of the study is to classify the URL categories of websites, to predict URL categories, and to measure the accuracy of the data presented to the user model. The classification and prediction technique uses web usage data and decision tree induction. The study uses decision tree induction to automate website classification by URL category, and uses a performance matrix to measure the prediction accuracy on the data; the result is called the predictive categories-URL tree [6].
The study is divided into five sections. The first is this introduction, which describes the study overall. The second explains related theories and other work linked to the study. The third discusses the methodology used in the study, the fourth discusses the results, and the last is the conclusion.




2. THEORIES AND RELATED WORKS
A decision tree can be generated from data collected from websites. After putting the data into a tree algorithm, it is transformed into the predictive URL-categories classification approach [3,5]. In general, many algorithms can be used to build a decision tree, including J48 (based on the original C4.5 algorithm), Random Tree, Random Forest, and REP Tree [1,7]. User navigation patterns offer a technique for learning information that gives a better view of how to improve the effectiveness of websites [12,13]. Many researchers have studied predictive classification and user navigation patterns (or navigational user behavior) in categories, such as [10].
Lindemann and Littig (2007) propose a novel method for the classification of web sites that exploits both the structure and the content of websites in order to discern their function [8].
Chekuri, Goldwasser, Raghavan, and Upfal (1999) concentrate on a new architecture for web search using automatic classification, combining context-free syntactic search with context-sensitive searching guided by classification; the classification process is based on statistics and term-frequency analysis [2].
Chen, Lee, and Chang (2009) focus on web page classifiers, which are urgently needed to meet the growing qualitative and quantitative demands for information from the WWW, and raise the problem of feature selection in web page classification and text categorization [3].


3. EXPERIMENTAL
3.1 FEATURES
Web usage takes the form of access log files, which are used to classify URL categories. The pattern of the access log used in this study is displayed in Table 1.

Table 1. Access logs
Session-ID   IP Address    Date & Time           URL accessed
970          10.1.50.29    2009-12-08 03:15:53   www.virginradiothailand.com
1008         10.1.50.52    2010-01-08 15:17:45   www.hi5.com
1018         10.1.52.78    2010-01-14 09:20:31   222.135.144.145

In this process, all data used for the URL categories must first pass through preprocessing. This consists of data cleaning, which examines and rejects undetermined and unwanted data. The process also normalizes each URL feature by cutting off the path token (/) and the "http" scheme; for example, http://www.yahoo.com is cut down to yahoo.com.
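A minimal sketch of this normalization step (our illustration; the paper does not give code):

from urllib.parse import urlparse

def normalize_url(raw):
    """Strip the scheme, path, and a leading 'www.' from a logged URL."""
    # urlparse needs a scheme to find the host, so add one if missing.
    host = urlparse(raw if "://" in raw else "http://" + raw).netloc
    return host[4:] if host.startswith("www.") else host

print(normalize_url("http://www.yahoo.com"))   # -> yahoo.com
print(normalize_url("222.135.144.145"))        # IP addresses pass through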

3.2 PROPOSED APPROACH
The web usage mining process for predictive URL-categories classification is based on the CRISP-DM methodology (Cross-Industry Standard Process for Data Mining).
Step 1: Construct the reference category database: collect URLs into a database table by classification, which normally has 25 categories, including Organization, Art, Business, Education, Electronics, Entertainment, Finance, Family, Fashion, Health, News, Science, Search engine, Technology, Recreation, and Reference, and identify a dotcom class database. We take the categories from the websites www.sanook.com and www.dmoz.org. This study
added 3 more categories: international websites, IP categories, and another group related to the .com type.
Step 2: Preprocessing: a standard step of web usage mining. The approach maps the usage data on the web server into a relational database, then applies data cleaning to remove information that is not useful, along with unwanted entries, from the log file.
Step 3: Construct the training set and evaluate its accuracy: the training set consists of one or more predictive attributes together with the class label. The training set is built from the outcomes of Steps 1 and 2, giving a new, appropriate training set.
Step 4: Build the URL-categories classification model: this step uses a decision tree algorithm for the automatic classifier.
Step 5: Evaluate and predict the performance accuracy.

3.3 Proposed Model
The process model for predictive URL-categories classification is as follows (a sketch in a common classifier library is given after the listing):

Categories-URL classification:
  Classify the URL-categories access log into a binary tree
  Build a decision tree for classification
  For classification states that are not clear:
    Discover a URL-categories tree model for each state

Categories-URL prediction:
  For each incoming testing set from the proxy server:
    Find its closest classification
    Use the decision tree model to make a prediction
    If the prediction is made by states that do not belong to a majority class:
      Use the decision tree model to make a revised prediction
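As a concrete (if simplified) illustration of the classification step, the following sketch trains a decision tree on hand-crafted URL features and predicts a category. It is our example using scikit-learn, not the authors' implementation, and the hosts and labels are hypothetical:

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.tree import DecisionTreeClassifier

# Tiny hypothetical training set: normalized hosts and their categories.
hosts = ["sanook.com", "hi5.com", "npru.ac.th", "yahoo.com",
         "virginradiothailand.com", "dmoz.org"]
labels = ["Entertainment", "Entertainment", "Education",
          "Search engine", "Entertainment", "Reference"]

# Character n-grams of the host string serve as simple URL features.
vec = CountVectorizer(analyzer="char", ngram_range=(2, 4))
X = vec.fit_transform(hosts)

clf = DecisionTreeClassifier(random_state=0).fit(X, labels)
print(clf.predict(vec.transform(["sanook.com"])))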


3.4 Web URL-categories navigation pattern to UM
The user model (UM) for the access log helps reveal user behavior over URL categories. In practice, almost as many user models are defined as there are systems exploiting information about users, but some trends and general ideas can serve as a basis for building more complex and accurate user models. The user model contains information about a user or group that the system uses for communication between the users and the system, and the outcomes can be used to improve what we seek. This study presents and searches for user behavior in web navigation patterns using statistical and data analysis methods, focusing largely on the frequency of URL categories.
Frequency measures the occurrence count, i.e., how much each category is used:

    x̄ = (Σ X) / n    (1)

where X is a record of the session-ids in the same group, n is the number of category groups (Group_1, Group_2, ..., Group_n), and x̄ is the frequency of the categories.
Proposed algorithm
The frequency count of user session-IDs for the categories navigation pattern is computed as follows (a runnable sketch is given after the listing):

(1) Function UM-categories()
(2)   Initialize X as the record of session-ids in a group
(3)   Assign n, where n ∈ {1, ..., n}
(4)   For Categories_1 to Categories_n
(5)     Compute equation (1) on X
(6) End for
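A minimal runnable version of this counting step (our sketch; the session data is hypothetical, and the normalization follows our reading of equation (1)):

from collections import Counter

# Hypothetical (session_id, category) pairs derived from the access log.
sessions = [(970, "Entertainment"), (1008, "Entertainment"),
            (1018, "IP-categories"), (970, "Education"),
            (1008, "Entertainment")]

counts = Counter(cat for _, cat in sessions)   # occurrences per category
n = len(counts)                                # number of category groups
frequency = {cat: c / n for cat, c in counts.items()}
print(frequency)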

3.5 Performance Measure
System performance evaluation is used to assess the experimental results of the predictive classification. The evaluation of the predictive categories-URL classification concerns the accuracy of predicting URL categories. The performance matrix measures are defined as follows:

Table 2. Performance matrix

Measure                     Definition
Precision                   TP / (TP + FP)
Recall                      TP / (TP + FN)
True positive rate (Acc+)   TP / (TP + FN)
Weighted accuracy           a x Acc+ + (1 - a) x Acc-
F-measure                   (2 x Precision x Recall) / (Precision + Recall)
Error rate                  Number of wrong predictions / Total number of predictions



4. RESULTS AND DISCUSSION
The system is divided into two parts: the data set and the performance measures. The performance part is divided into two subparts: the training set for classification and the test set for prediction.

4.1 Data Set
The data set used in this study comes from the Nakornpratom Rajaphat University web server (http://www.npru.ac.th). We extracted access entries from October 2009 to January 2010 and randomly sampled data for the training set. After preparation, the set contains 400 instances, and we use the 25 categories from http://www.sanook.com and http://www.dmoz.org as reference categories.

4.2 Performance Measure
The performance of the predictive classification trees, using the measures defined in Table 2, is shown below. Several tree algorithms are applied to each dataset and then evaluated for accuracy using a 10-fold cross-validation (10-CV) strategy. 10-CV is a standard way of estimating the error rate. To perform 10-
CV, a dataset is separated into ten approximately equal portions, each of which is used in turn for testing, with the other nine used for training (meaning that ten iterations are performed in total) [17].
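A sketch of this evaluation protocol (ours, using scikit-learn's cross-validation utility on toy data; the real study uses the 400-instance NPRU training set):

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Toy data repeated so every class has at least ten members (10 folds).
hosts = ["sanook.com", "hi5.com", "npru.ac.th", "yahoo.com"] * 10
labels = ["Entertainment", "Entertainment", "Education", "Search engine"] * 10

X = CountVectorizer(analyzer="char", ngram_range=(2, 4)).fit_transform(hosts)
clf = DecisionTreeClassifier(random_state=0)

# cv=10: ten folds, each used once for testing, nine for training.
scores = cross_val_score(clf, X, labels, cv=10)
print(scores.mean())   # mean accuracy over the ten iterations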


Training set:

Algorithm       Precision   Acc+ (Recall)   Accuracy (%)
C4.5            .401        .616            61.6048
REPTree         .461        .599            59.8654
Random Forest   .471        .607            60.745

Test set (prediction):

Algorithm       Acc+    Precision   Recall   F-measure   Wt-Accuracy (%)
C4.5            .625    .422        .483     .494        61.6406
REPTree         .602    .362        .602     .452        60.1719
Random Forest   .991    .989        .991     .990        99.1404

On the training set, the C4.5 tree algorithm performs better than Random Forest and REPTree, but for predictive classification of URL categories the most accurate is Random Forest, with a prediction accuracy of 99.1404%.
The URL categories in the users' navigation patterns from the experimental data comprise 12 categories:
- Organization  - IP-categories  - International web
- Entertainment - Education      - Dotcom
- Search Engine - Internet       - Business
- Shopping      - Computer       - News

5. CONCLUSION
In this paper, the study presents a new, appropriate classification of URL categories and predicts category accuracy, with a focus on URL-category patterns. The study found that data mining by category, combined with a predictive technique, can search for navigation patterns and point to user behavior in categories, producing the UM-categories algorithm. The study collected data from the Nakornpratom Rajaphat University website from the end of December 2009 to the beginning of January 2010, using randomized data. The purpose is to classify categories by users' behavior and compare them with the reference category database (from Step 1). The researcher also built a new, appropriate training set using tree algorithms. Future work should focus on a larger data set from the university web server, and should cover different academic terms to point out differences in web searching over time, which will be useful for improving the university's web page information.




REFERENCES

1. Breiman, L., Friedman, J., Olshen, R., and Stone, C., Classification and Regression Trees, Boca Raton, FL: CRC Press, 1984.
2. Chekuri, C., Goldwasser, M., Raghavan, P., and Upfal, E., Web Search Using Automatic Classification, in Sixth World Wide Web Conference, 1999.
3. Chen, C., Lee, H., and Chang, Y., Two novel feature selection approaches for web page classification, Expert Systems with Applications, Volume 36, 2009, pages 260-272.
4. Facca, F., and Lanzi, P., Recent developments in web usage mining research, DaWaK 2003, LNCS 2737, pp. 140-150, 2003.
5. Gang, F., Sheng Ma, G., and Jing, H., Web Navigation Patterns Mining Based on Clustering of Paths and Pages Content, International Workshop on Web-Based Internet Computing for Science and Engineering (ICSE), 2006.
6. Jovanovic, N., Milutinovic, V., and Obradovic, Z., Foundations of predictive data mining, Neural Network Applications in Electrical Engineering, NEUREL '02, 6th Seminar on, pages 53-58, 2002.
7. Kohavi, R., and Quinlan, R., Decision Tree Discovery, Handbook of Data Mining and Knowledge Discovery, pages 267-276, 2002.
8. Lindemann, C., and Littig, L., Classify Web Sites, in Proc. of the 16th International Conference on World Wide Web, pages 1143-1144, 2007.
9. Qi, X., and Davison, B. D., Web Page Classification: Features and Algorithms, ACM Computing Surveys, Vol. 41, No. 2, Article 12, 2009.
10. Ramadhan, H., Hatem, M., Al-Khanjri, Z., and Kutti, S., A Classification of Techniques for Web Usage Analysis, Journal of Computer Science, 413-415, 2005.
11. Srivastava, J., Cooley, R., Deshpande, M., and Tan, P.-N., Web usage mining: discovery and application of usage patterns from web data, SIGKDD Explorations, 1(2):12-23, 2000.
12. Tahir Hassan, M., Nazir Junejo, K., and Karim, A., Learning and Predicting Key Web Navigation Patterns Using Bayesian Models, in Proc. of the International Conference on Computational Science and Its Applications, pages 877-887, 2009.
13. Wang, Y., Web mining and knowledge discovery of usage patterns, ACM SIGKDD, Jan. 2000.




Computational Science and Engineering
Use of Genetic Algorithm in Computing the
Capacity of a Discrete Memoryless Channel
M. Sabri and A. Anpalagan
Department of Electrical and Computer Engineering
Ryerson University, Toronto, Canada
Abstract: Finding the channel capacity and the corresponding input probability distribution for an arbitrary digital channel is a challenging optimization problem. Classical optimization methods are not suitable due to the inherent complexity of finding the derivative of the average mutual information with respect to the input symbol probabilities. In this paper, the Genetic Algorithm (GA) is proposed as a search technique to find (i) the capacity of a discrete memoryless channel and (ii) the corresponding input probability distribution. Numerical examples are given to demonstrate the effectiveness of the GA-based approach.
Keywords: Shannon capacity, information theory, discrete memoryless channel, genetic algorithm
I. Introduction
The Shannon channel coding theorem [1] formulates the maximum reliable transmission rate over a discrete memoryless channel. The formulation is based on maximizing the average mutual information between source and destination symbols. In cases where the channel is symmetric and the inputs are equiprobable, the channel capacity can be calculated relatively easily. As is well known among coding theorists, finding the channel capacity and the corresponding input probability distribution is a challenging optimization problem for an arbitrary channel. Classical optimization methods such as Newton-Raphson or Steepest Descent are not suitable for this problem due to the inherent complexity of finding the derivative of the average mutual information with respect to the input symbol probabilities. Therefore, intelligent search methods are a proper solution for this kind of problem.
In this paper, the Genetic Algorithm (GA) is proposed as a search technique to find the capacity of a discrete memoryless channel and the corresponding input probability distribution. The genetic algorithm starts by generating a random set of solutions, called the population. According to the fitness of each individual solution, a new set of solutions is generated through crossover and mutation, as explained later in the paper. New generations are created until the desired result is achieved. This method is relatively easy to use, and two examples are provided to show the effectiveness of the proposed GA-based approach.
We describe the problem of computing the Shannon capacity of a discrete memoryless channel in Section II. In Section III, the genetic algorithm is described. Numerical results are given in Section IV and, finally, the paper concludes in Section V.
II. Discrete memoryless channel
The general block diagram of a Discrete Memoryless Channel (DMC) is shown in Fig. 1. For simplicity, the modulator (and demodulator) units are omitted. Since we are concerned with the information theory concept of the channel, the omission of the modulator-demodulator units does not reduce the generality of our discussion and conclusions. Noisy discrete channels are described mathematically in terms of three parameters: the finite channel input alphabet X, the finite channel output alphabet Y, and the channel transition probability function p(y_1, ..., y_n | x_1, ..., x_n). The channel generates the output string y_1, ..., y_n in response to the input string x_1, ..., x_n, where x_k ∈ X and y_k ∈ Y.
A noisy channel is memoryless if the channel probability function is equal to the product of the individual conditional probabilities:

    p(y_1, ..., y_n | x_1, ..., x_n) = Π_{i=1}^{n} p(y_i | x_i).    (1)
[Figure residue: source bits b1 ... bm enter the encoder, code words x1 ... xn pass through the noisy channel, and received code words y1 ... yn emerge.]
Fig. 1. Block Diagram of Discrete Memoryless Channel.

The random characteristics of the channel can be
represented by the transition probability matrix shown next:

    [ p_11  p_12  ...  p_1n ]
    [ p_21  p_22  ...  p_2n ]
    [  ...   ...  ...   ... ]
    [ p_n1  p_n2  ...  p_nn ]    ;    p_ij = p(y_j | x_i)
This matrix has the following properties:
1. The sum of each matrix column is equal to 1, i.e., Σ_{i=1}^{n} p(y_j | x_i) = 1.
2. The weighted sum of each row is equal to the output symbol probability, i.e., Σ_{i=1}^{n} p(x_i) p(y_j | x_i) = p(y_j).
Shannon entropy provides a measure of the input symbol information. The conditional entropy H(X|Y) measures the amount of uncertainty about X after receiving Y. Therefore, the mutual information I(X,Y) is defined in terms of the source entropy and the conditional entropy as:

    I(X,Y) = H(X) - H(X|Y)

The mutual information is interpreted as the information carried by the received symbol Y about the transmitted symbol X. The transmitted symbol entropy and the conditional entropy can be expressed in terms of the input symbol probabilities and the transition (posterior) probabilities, respectively:

    H(X) = - Σ_{i=1}^{n} p(x_i) log_2 p(x_i)    (2)
The average mutual information can be expressed as:

    I(X,Y) = Σ_{i=1}^{n} Σ_{j=1}^{n} p(y_i) p(x_j | y_i) log_2 [ p(x_j | y_i) / p(x_j) ]    (3)

The channel capacity of a discrete memoryless channel is defined as:

    C = max_{p(X)} I(X,Y)    (4)
The Shannon noisy channel coding theorem [2] states that there exists a code of a certain block length with which it is possible to transmit through a memoryless channel of capacity C an amount of information H(X) with arbitrarily small probability of error if R < C, where R is the symbol transmission rate. The theorem simply states that there exists a coding method through which error-free transmission is possible, but it does not indicate how to design codes with the desired properties. The input symbol probability constellation is one major step in the design, and it is the main focus of this paper. In the case of the binary symmetric channel (BSC) shown in Fig. 2, the computation is easy and straightforward. For a BSC with
[Figure residue: BSC transition diagram, in which inputs 0 and 1 pass through unchanged with probability 1-p and flip with probability p.]
Fig. 2. Transition Probability Matrix of BIBO Channel.
P(X = 0) = P(X = 1) = 0.5, using (3) it can be seen that C = 1 - H(p), where p is the error probability indicated in the figure (for example, p = 0.1 gives C = 1 - H(0.1), about 0.53 bits per channel use).
The major obstacle for classical methods in solving this optimization problem is the need for the derivative of I(X,Y) given in (3); in general it is very difficult to obtain this derivative. Therefore, an evolutionary search method is required. It is worth pointing out that some specific algorithms have already been proposed by different researchers; however, most are complex and have drawbacks in the high-dimensional case [3],[4]. We believe the genetic algorithm (GA) is among the best evolutionary search methods for finding the capacity of a discrete memoryless channel. In this paper we use a GA, which is discussed in the next section.
    H(X|Y) = Σ_{i=1}^{n} p(y_i) H(X|y_i) = - Σ_{i=1}^{n} Σ_{j=1}^{n} p(y_i) p(x_j | y_i) log_2 p(x_j | y_i)
III. Genetic Algorithm
Genetic algorithms are a part of evolutionary computing, a rapidly growing area of artificial intelligence with diverse applications. As one can guess, genetic algorithms are inspired by Darwin's theory of evolution: in simple words, the solution to a problem solved by genetic algorithms is evolved. A GA-based algorithm starts with a set of solutions (represented by chromosomes) called the population. Solutions from one population are taken and used to form a new population, motivated by the hope that the new population will be better than the old one. The solutions selected to form new solutions (offspring) are chosen according to their fitness: the more suitable they are, the more chances they have to reproduce. The steps of the genetic algorithm can be outlined as follows:
1. Start: generate a random population of n chromosomes (suitable solutions for the problem).
2. Fitness: evaluate the fitness f() of each chromosome in the population, then:
   (a) New population - create a new population by repeating the following steps until the new population is complete.
   (b) Selection - select two parent chromosomes from the population according to their fitness (the better the fitness, the bigger the chance of being selected).
   (c) Crossover - with a crossover probability, cross over the parents to form new offspring (children); if no crossover is performed, the offspring is an exact copy of the parents.
   (d) Mutation - with a mutation probability, mutate the new offspring at each locus (position in the chromosome).
3. Replace: use the newly generated population for a further run of the algorithm.
4. Test: if the end condition is satisfied, stop and return the best solution in the current population.
5. Loop: go to Step 2.
This algorithm requires design choices as discussed
next.
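Before turning to those choices, the loop above can be made concrete. The following is a minimal illustrative sketch, not the authors' implementation: it evolves input distributions p(X) for a given channel matrix, uses I(X,Y) from (3) as the fitness, and for brevity applies arithmetic crossover on real-valued chromosomes rather than the Gray-coded double-point crossover adopted later in the paper.

    import numpy as np

    def mutual_information(p_x, P):
        # I(X;Y) in bits; P[j, i] = p(y_j | x_i) is a column-stochastic matrix.
        p_y = P @ p_x
        with np.errstate(divide="ignore", invalid="ignore"):
            terms = P * p_x * np.log2(P / p_y[:, None])
        return float(np.nansum(terms))

    def ga_capacity(P, pop=60, gens=300, mut=0.05, seed=0):
        rng = np.random.default_rng(seed)
        n = P.shape[1]
        X = rng.random((pop, n))
        X /= X.sum(axis=1, keepdims=True)                  # rows are distributions p(X)
        for _ in range(gens):
            fit = np.array([mutual_information(x, P) for x in X])
            elite = X[np.argsort(fit)[::-1][: pop // 2]]   # fitness-based selection
            a = elite[rng.integers(0, len(elite), pop)]
            b = elite[rng.integers(0, len(elite), pop)]
            w = rng.random((pop, 1))
            X = w * a + (1 - w) * b                        # arithmetic crossover
            X += mut * rng.random((pop, n))                # mutation keeps entries positive
            X /= X.sum(axis=1, keepdims=True)              # re-normalize to the simplex
        fit = np.array([mutual_information(x, P) for x in X])
        return X[np.argmax(fit)], float(fit.max())

    # BSC with p = 0.1: theory gives C = 1 - H(0.1), about 0.531 bit per use
    p_best, C = ga_capacity(np.array([[0.9, 0.1], [0.1, 0.9]]))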
Encoding (Representation)
The answer set, or population, can be represented in several different ways, and the choice of a proper encoding depends on the specific problem. The most commonly used method is binary encoding, which can be used for both integer and floating-point values. In this method, each answer (chromosome) is represented as follows:
Chromosome A 101100101100101011100101
Chromosome B 111111100000110000011111
Other representation methods are permutation encoding, value encoding and tree encoding. In this paper, binary coding is used and improved by Gray coding: instead of the normal binary representation of floating-point values, the Gray code representation is used. It has been reported that a Gray-coded representation improves the algorithm [5].
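As an illustration (a sketch, not the authors' code), the reflected binary (Gray) code and its inverse can be computed as follows; consecutive integers differ in exactly one bit, which is what makes bit-flip mutations correspond to small value changes.

    def binary_to_gray(n: int) -> int:
        # Adjacent integers map to codewords differing in a single bit.
        return n ^ (n >> 1)

    def gray_to_binary(g: int) -> int:
        # Invert by XOR-folding the shifted codeword.
        n = 0
        while g:
            n ^= g
            g >>= 1
        return n

    assert all(bin(binary_to_gray(i) ^ binary_to_gray(i + 1)).count("1") == 1
               for i in range(255))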
Selection
As discussed earlier, chromosomes are selected from the population to be parents for crossover. The problem is how to select these chromosomes. According to Darwin's theory of evolution, the best ones should survive and create new offspring. There are many methods for selecting the best chromosomes, for instance roulette wheel selection, Boltzmann selection, tournament selection, rank selection and steady-state selection. In the roulette wheel method, parents are selected according to their fitness: the better the chromosomes are, the more chances they have to be selected. The problem with the roulette wheel is that it can cause premature convergence [6]. Generally, we want to restrict the number of repetitions of a good answer in the next generation. Here, we apply logarithmic scaling to the fitness values prior to using the roulette wheel (shown below).
(Figure: population fitness distribution before and after logarithmic scaling.)
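A minimal sketch of this scaled selection, under the assumption that the scaling is a simple logarithmic compression of shifted fitness values (the paper does not give the exact formula):

    import numpy as np

    def roulette_select(fitness, rng, k=2):
        # Shift fitness to be non-negative, compress with a logarithm, then
        # draw parents with probability proportional to the scaled fitness.
        f = np.asarray(fitness, dtype=float)
        scaled = np.log1p(f - f.min())
        if scaled.sum() == 0:                       # all fitness values equal
            p = np.full(len(f), 1.0 / len(f))
        else:
            p = scaled / scaled.sum()
        return rng.choice(len(f), size=k, replace=False, p=p)

    rng = np.random.default_rng(1)
    parents = roulette_select([0.20, 0.90, 0.50, 0.51], rng)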
Fig. 3. Channel transition probability matrices, input probabilities and capacities.

Channel matrix (3x3):
    0.60 0.70 0.50
    0.30 0.10 0.05
    0.10 0.20 0.45
Input probabilities: (0.48, 0.02, 0.51); channel capacity: 0.16.

Channel matrix (8x8):
    0.07 0.11 0.10 0.08 0.13 0.07 0.24 0.19
    0.18 0.15 0.16 0.04 0.13 0.07 0.09 0.18
    0.17 0.09 0.18 0.08 0.16 0.03 0.04 0.19
    0.06 0.18 0.17 0.23 0.02 0.16 0.18 0.02
    0.17 0.02 0.05 0.24 0.16 0.14 0.10 0.10
    0.09 0.10 0.11 0.08 0.09 0.21 0.03 0.12
    0.05 0.23 0.10 0.14 0.13 0.14 0.07 0.03
    0.18 0.06 0.09 0.07 0.13 0.14 0.21 0.14
Input probabilities: (0.02, 0.12, 0.01, 0.20, 0.05, 0.13, 0.12, 0.31); channel capacity: 0.195.
Crossover
After selection, crossover (mating) is performed over the selected chromosomes. There are different crossover methods such as single-point, double-point and arithmetic crossover (shown below). In our approach, the double-point method is used. The crossover rate should generally be high, about 80-95%.
(Figure: single-point, double-point and arithmetic crossover of Parent A and Parent B into offspring.)
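A sketch of the double-point variant on bit-string chromosomes like those shown in the encoding section (illustrative, not the authors' code):

    import numpy as np

    def double_point_crossover(a, b, rng):
        # Pick two distinct cut points and swap the middle segment of the parents.
        i, j = sorted(rng.choice(np.arange(1, len(a)), size=2, replace=False))
        return (np.concatenate([a[:i], b[i:j], a[j:]]),
                np.concatenate([b[:i], a[i:j], b[j:]]))

    rng = np.random.default_rng(0)
    a = rng.integers(0, 2, 24)        # 24-bit chromosomes as in the examples above
    b = rng.integers(0, 2, 24)
    child1, child2 = double_point_crossover(a, b, rng)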
IV. Numerical Results
We have applied the proposed GA-based approach to two channel matrices, 3x3 and 8x8 (shown in Fig. 3), and compared the results with those obtained by Arimoto [4] for the 3x3 channel matrix. The channel matrices, the obtained input probabilities and the resulting channel capacities are tabulated there. Our approach and Arimoto's approach give the same results for the 3x3 matrix. The importance of our approach is its scalability: for the 8x8 channel matrix we were also able to obtain the results shown in the table.
V. Conclusion
Shannon's coding theory specifies that there exists a block code with which data can be transferred with an arbitrarily low error rate at any symbol rate less than the channel capacity. Computing the channel capacity requires finding the correct input symbol probability distribution. In this paper, we have proposed a new approach to this problem by means of genetic algorithm optimization, which is fast and applicable to channel matrices of arbitrary size.
References
1. C. Shannon, "A mathematical theory of communication," Bell System Technical Journal, vol. 27, 1948, pp. 397-423.
2. C. E. Shannon, R. G. Gallager, and E. R. Berlekamp, "Lower bounds to error probabilities for coding on discrete memoryless channels," Inform. Contr., vol. 10, Parts I-II, pp. 65-103, 522-552, 1967.
3. S. Morgan, "On the capacity of a discrete channel," Journal of Physics, Japan, vol. 8, 1953, pp. 484-494.
4. S. Arimoto, "An algorithm for computing the capacity of arbitrary discrete memoryless channels," IEEE Transactions on Information Theory, January 1972, pp. 14-20.
5. R. A. Caruana and J. D. Schaffer, "Representation and hidden bias: Gray vs. binary coding," Proc. 6th International Conference on Machine Learning, pp. 153-161, 1988.
6. J. E. Baker, "Reducing bias and inefficiency in the selection algorithm," Proc. of the 2nd Int. Conference on Genetic Algorithms and their Applications, pp. 14-21, 1987.
Image Processing for Rice Diseases Analysis

Areerat Distsatien(1), Wanvisa Wilaisil(2), and Nunnapad Toadithep(3)
(1,2,3) Department of Computing, Faculty of Science, Silpakorn University,
Nakhon Pathom 73000, Thailand
E-mail: (3) nisachol@cp.su.ac.th


ABSTRACT
This paper develops a system that analyzes four categories of rice diseases from digital photographs of rice leaves. The diseases the system can diagnose are: 1. Bacterial leaf blight, 2. Yellow orange leaf, 3. Rice blast, and 4. Brown spot. The image model is RGB. Two main factors are tested: first, detecting the bruises on the rice leaf; second, determining the color disorder on the rice leaf. Within the scope are the background of the photograph, the lightness, the clearness, the size of the bruise relative to the photograph (70%), and the shooting distance. In the experiments the system achieves an accuracy of 78% on 100 images of diseased rice leaves; Yellow orange leaf obtains the maximal accuracy, 90%, and Bacterial leaf blight the minimal accuracy, 66.67%.

Keywords: Image processing, RGB Model, Disease Analysis, Rice leaf



1. INTRODUCTION
Rice is life and life is rice; life needs rice. Global rice production is about 550 million tons, and rice is the staple food of more than half of the world's people. Rice is planted in at least 113 countries. Asia produces most of the world's rice; China and India are the largest producers, together growing no less than half of the world's rice, although most of it is consumed domestically. Thailand is the country that exports the most rice, which means Thailand has a large volume of rice production. Rice is not only the primary food of Thailand but also the country's most profitable economic crop. More than 60% of Thai people are farmers who plant rice as their main crop, making the land a golden land, the Tung Ruangthong field [6].
Thai people consume rice as their main food and plant rice as their main occupation; farmers cultivate rice both for domestic consumption and for export. However, farmers still face many cultivation problems such as climate, pests, drought, unstable prices, and rice diseases. Rice diseases have many causes, and some are serious epidemics affecting farmers. These are the reasons a system for rice disease analysis by image processing has been developed, to analyze rice photographs and give a preliminary examination of the diseases. Some diseases have features similar to others yet require different treatments; therefore, only specialists can reliably analyze the diseases, because they cannot be identified at face value by the eye. Image processing in this case indicates the disease details and remedy methods, which helps to restrain the disease and prevent recurrence in the future. The system is able to analyze four rice diseases: Bacterial leaf blight, Yellow orange leaf, Rice blast, and Brown spot.






2. RELATED WORK
2.1 Rice
There are more than 100,000 varieties of rice around the world. The two main groups are African rice and Asian rice, which have been planted in many countries in three species. Confirmed evidence of paddy and rice cultivation found in Thailand shows that Thai people's lives have been bound to rice for no less than 5,500 years.
Most rice diseases are similar in each area of the country. Most are caused by fungi, viruses, nematodes and insects; for example, the green leafhopper is a virus carrier that causes Yellow leaf disease, and the fungus Bipolaris oryzae causes Brown spot disease. Fig. 1 shows rice diseases found in the midland region.

2.2 Research using image processing
The research in [1] assesses and evaluates the wholesale packaging of sweet tamarind pods of the Golden and Sri Chomphu types, separating the pods by size using a sketch-and-trace procedure connected to an image processing machine, which yields a faster tamarind pod sorting system [1].
The highlight of that article is measuring the curvature of a sweet tamarind pod by finding the midpoint of the pod. The image is converted to a two-level (binary) image, and a graph is made of the distance between the head of the pod and its end. If the distance is large, the pod is straight; if the distance is small, the tamarind is assumed to be bent (shown in Fig. 2).



(Figure 1. Rice diseases found in the midland region [7]: sheath rot disease, dirty panicle disease, ragged stunt disease, and narrow brown spot disease.)



3. EXPERIMENTAL DETAILS
3.1 Analysis System

The rice disease analysis system comprises three processing stages: a preprocessing stage, a phase-1 analysis stage, and a phase-2 analysis stage, as shown in Fig. 3.1.


(Figure 2. Principle of measuring the curvature of a sweet tamarind pod [1].)



(Figure 3.1. Structure of the rice disease analysis system: rice leaf images pass through preprocessing (edge detection, background cutting, threshold segmentation), then phase-1 analysis (Bacterial leaf blight or Yellow orange leaf) and phase-2 analysis (Rice blast or Brown spot).)










(Figure 3.2. Rice diseases used in this research [4]: Bacterial leaf blight, Yellow orange leaf, Rice blast, and Brown spot.)

3.2 Steps in the process of rice disease analysis
This system uses 100 photographs of diseased rice leaves as the input of the analysis system, takes them through the preprocessing stage, and finally to the analysis stage, which is divided into two phases. The final result is the name of the disease, the picture of the leaf, and a useful message on the treatments or methods to cure the particular rice disease. Fig. 3.2 shows the rice disease photographs used in this research.

The preprocessing stage prepares the image for the subsequent analysis steps; steps 1-4 are the early steps of the preprocessing stage.

1. Convert to a 256-level gray image (gray scale). The input image is red, green and blue (an RGB image), so we adjust the color levels to gray levels, which are easier to work with in the subsequent processing.
2. Detect the edges of the rice leaf by the Sobel method (result shown in Fig. 3.2), to easily find the position of the leaf's edge and to be able to cut off the background and any unnecessary parts.
3. Cut off the image background, leaving only the rice leaf part so it is easy to analyze, by using the matrix of two-level pixel color values (white = 1 and black = 0), as shown in Fig. 3.3.



(Figure 3.2. Edge detection of a diseased rice leaf.)
(Figure 3.3. Background cutting, leaving only the image of the rice leaf.)
(Figure 3.4. Sample segmentation results for each case: Bacterial leaf blight (left), Yellow orange leaf (middle), and the remaining cases of Rice blast, Brown spot and no disease (right).)

4. Segment the bruises caused by disease; this segmentation step makes it much easier to analyze the disease characteristics. The resulting image is converted into two color levels, black and white. In this system the threshold segmentation method is used with a threshold value of 0.50, which was chosen by trial as the most appropriate value for extracting the bruise area. The scars on rice leaves after image segmentation fall into three cases: the leftmost picture in Fig. 3.4 is Bacterial leaf blight disease, the middle one is Yellow orange leaf disease, and the right one represents the other three cases of Rice blast, Brown spot and no disease.
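Steps 1-4 can be sketched as follows, assuming scikit-image as a stand-in for the MATLAB-style functions implied by the paper; the 0.50 threshold follows the text, while the background-cutting rule here is a simplified placeholder:

    from skimage import color, filters

    def preprocess(rgb_image):
        gray = color.rgb2gray(rgb_image)     # step 1: RGB -> 256-level gray image
        edges = filters.sobel(gray)          # step 2: Sobel edge detection
        leaf = gray < gray.mean()            # step 3: crude background cut (assumed rule)
        bruise = (gray < 0.50) & leaf        # step 4: threshold segmentation at 0.50
        return edges, leaf, bruise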

5. Processing step for diagnosing Bacterial leaf blight and Yellow orange leaf. The figures used for analysis are cropped pictures. The color values are examined: if white dots dominate black dots on one side of the picture and black dots dominate white dots on the other side, the leaf is concluded to have Bacterial leaf blight disease, while Yellow orange leaf disease has white dots on both sides of the leaf, as shown in Fig. 3.5.



(Figure 3.5. Bacterial leaf blight disease (left) and Yellow orange leaf disease (right).)
(Figure 3.6. Rice blast (left) and Brown spot disease (right).)

6. Processing step for diagnosing Rice blast and Brown spot disease. The system retrieves the input picture, and the program cuts off the picture background. It then segments the image data at a threshold value of 0.20, producing a sample image at this threshold. Next, the sample image is converted into a two-level image, and the wound edges are filled using the imdilate and strel commands with a disk structure; we choose the disk shape for filling because the wounds of Rice blast and Brown spot disease are roughly elliptical.

After the segmentation process finds a wound in the image, the system takes the cut-out image of one scar, finds the wound midpoint (centroid value), crops it out (as in Fig. 3.6), and then analyzes it for the result. The reason for choosing only one wound is that analyzing all the cropped wounds would reduce the accuracy of the system. The program test results are given in Table 1.
Rice blast disease: the middle of the wound has more white area than black area.
Brown spot disease: the middle of the wound has more black area than white area.
The analysis procedure is shown step by step in the flowchart in Fig. 3.7.
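In Python, the imdilate/strel step could look like the following sketch, assuming scikit-image; here `bruise` is a hypothetical binary image thresholded at 0.20, and the disk radius is an assumption:

    from skimage import morphology

    # Fill the elliptic wound edges by dilating with a disk structuring element,
    # mirroring MATLAB's imdilate(bruise, strel('disk', r)); radius 3 is assumed.
    filled = morphology.dilation(bruise, morphology.disk(3))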
(Figure 3.7. Flowchart of rice disease analysis: start; edge detection with the Sobel method; finding the begin and end positions of the edge from the previous step; cutting off the image background, keeping only the leaf; then deciding in turn whether the leaf shows Bacterial leaf blight or Yellow orange leaf disease, whether it shows Rice blast or Brown spot disease, or whether the disease cannot be found; stop.)


Table 1. Results from rice disease analysis

Rice disease leaves    #images  Correct  Not correct  Cannot analyze  Accuracy
Brown spot             30       21       7            2               70%
Rice blast             30       24       6            0               80%
Yellow orange          20       18       1            1               90%
Bacterial leaf blight  15       10       5            -               66.67%
No disease             5        5        -            -               100%
Total                  100      78       19           3               78%


4. EXPERIMENTAL RESULTS
The testing results from 100 images provide the correct classification of rice diseases, as shown in Table 1.
It can be seen from Table 1 that, in the analysis of rice diseases from digital photographs, 78% of all 100 sample images were classified correctly. The system classifies Yellow orange leaf disease best, with 18 images correct, for a maximum accuracy of 90%. Nineteen images were classified incorrectly and three could not be analyzed.


5. CONCLUSION
This research should be of great advantage to Thai farmers, because the disease analysis results are dependable and useful for preliminary diagnosis and treatment.
In further study, we will develop the system to analyze more diseases and to increase the accuracy. We will emphasize improving the analysis process with the minimal accuracy, that for Bacterial leaf blight.


REFERENCES
1. Bandit Jarimopas and Nitipong Jaisin, An experimental machine vision system for sorting sweet tamarind, Journal of Food Engineering, issue no. 80, December 2008, pp. 291-296.
2. Kamolajit Popripat, Krisakorn Lakraei and Vutipong Areekul, 2009, The measurement of the population of Mealy bug, https://pindex.ku.ac.th/fileresearch/InsectPaper.pdf
3. Nattawat and Sinchai Chinvararat, 2009, The Analysis and Examining Research of the defects of dyed fabrics, http://me.psu.ac.th/tsme/ME_NETT20/article/pdf/amm/AMM045.pdf
4. Site of Nakhonsawan Rice Seeds Center, 2009, The importance of rice disease, http://nswrsc.ricethailand.go.th/LibraRiceSeeds/Disease/LibraryDisease-00.html
5. Montree Kanchanadecha, Image Edge Detection, http://fivedots.coe.psu.ac.th/~montri/Teaching/240-373/Chapter8.pdf, 2009.
6. The importance of rice disease, http://seedcenter14.doae.go.th/knowledge/37.html
7. Ministry of Agriculture, Department of Rice, Rice Knowledge Bank, http://www.brrd.in.th/rkb/data_005/rice_xx2-05_index..html


ACKNOWLEDGMENTS
We would like to thank all of the committees: the Steering, Technical, and Local Organizing committees. We are very glad to attend the 14th Annual Symposium on Computational Science and Engineering (ANSCSE 14), 23-26 March 2010, Mae Fah Luang University, Chiang Rai, Thailand.

Decision Tree Era Classification of Ancient Thai Inscriptions

J. Khiripet(1,C) and N. Khiripet(2)
Knowledge Elicitation and Archiving (KEA) Laboratory, NECTEC,
112 Phahon Yothin Rd., Klong 1, Klong Luang, Pathumthani 12120, Thailand
(C) E-mail: jutarat.khiripet@nectec.or.th; Fax: 062-5646772; Tel. 062-5646900



ABSTRACT
The classification of ancient handwritten inscriptions plays a significant role in the investigation of ancient languages and dialects. Inscriptions can be classified in several different ways, for instance according to the material surface, age, language, or geographical region. However, this classification is mostly carried out by human archaeological experts, which is very time consuming. Recently, details of a collection of ancient inscriptions in Thailand were published on the Princess Maha Chakri Sirindhorn Anthropology Centre's website. These details are the results of careful studies by archaeologists with very little help from modern information technology. We therefore propose a computational tool to bridge this gap. In this research, we apply a decision tree technique from machine learning to predict the era of inscriptions found in Thailand, using metadata extracted from the database of the objects. This automatic technique gives era predictions much faster, together with statistical confidence values, and achieves an accuracy rate comparable to that of a human archaeologist.

Keywords: Decision Tree, Machine Learning, Inscription, Archaeology




1. INTRODUCTION
Inscriptions are a cultural heritage that records, in written form, the cultures that have appeared in Thailand, so the inscriptions found in every region of Thailand are evidence of the written language culture of the people who lived in those regions. For example, the "Khu Ni" inscription is the oldest stone inscription found in Thailand for which dating evidence appears. It was found at Khu Ni, Araypraets district, Prachinburi province, and was made in 1180 of the Buddhist Era. This stone inscription is an ancient document indicating the use of a written culture in Thailand; it carries the first letters to appear on Thai soil, coinciding with the start of the early history of Thailand. That timeline is consistent with the evidence of Thailand's first contemporary art, the Buddhist art of the Dvaravati era [1].
In archaeological studies, the inscriptions found at ancient sites are checked against art objects to determine their age and the evolutionary sequence of the art [2]. Archaeological work therefore requires analyzing data from inscriptions. The most important information about these inscriptions is their age, which opens new issues in the study of historical evidence. For this reason, techniques and methods for analyzing such data are increasingly important.
The study of Thai inscriptions has been compiled into a searchable database with details of each inscription. These studies have focused on the work of archaeologists and linguists who read and record the information. The inscriptions record the activities of specific individuals who have relationships with the
inscription and, for the majority, with religion [3]. In addition, some studies try to combine the era information of stone inscriptions with the history of art, analyzing the data where documents are lacking. Saising analyzed data from archaeological excavations together with stone inscriptions and the contemporary art history of Sukhothai to find new research issues. The findings from analyzing the evidence of archaeological digs, stone inscriptions and art can indicate the origins of the people of Sukhothai, and the consistency of this evidence makes the picture of Sukhothai's art history clearer [2].
Decision trees have proved to be valuable tools for the description, classification and generalization of data. Work on constructing decision trees from data exists in multiple disciplines such as statistics, pattern recognition, decision theory, signal processing, machine learning and artificial neural networks. One work applies decision tree analysis to archaeological evidence, namely detecting era-specific painting in China. That research proposes a scheme to detect Traditional Chinese painting (TCP) among general images and to categorize the paintings into the Gongbi and Xieyi schools. Low-level features such as the color histogram, color coherence vectors, autocorrelation texture features and a newly proposed edge-size histogram are used to achieve the high-level classification. A decision tree classifier (C4.5) is used as a comparison technique, while a support vector machine (SVM) is applied as the main classifier to obtain satisfactory classification results; the experiments show the effectiveness of the method [4]. Another example is coin classification: Davidsson [5] applied a novel method for learning characteristic decision trees, the ID3-SD algorithm, to the problem of learning the decision mechanism of coin-sorting machines. The main reason for the success of this algorithm in that application was its ability to control the degree of generalization, and the method gave the best results.
In Thailand, this kind of archaeological work from surveys and studies is not yet available. We therefore present a technique for building decision tree models that forecast the era of inscriptions found in the country, using Weka [6].
The rest of this paper covers the background on decision trees and the J48 algorithm, the experiments and data used, the test results, and conclusions with guidelines for future development.

2. DECISION TREES AND J48 ALGORITHM
Decision trees are often used in classification and prediction. They are a simple yet powerful form of knowledge representation. The models produced by decision trees are represented as tree structures, where a leaf node indicates the class of the examples. Instances are classified by sorting them down the tree from the root node to some leaf node [7].

(Figure 1. A decision tree: a root node branches over sets of possible answers down to leaf nodes.)
J48 [8] implements Quinlan's C4.5 algorithm [9] for generating a pruned or unpruned C4.5 decision tree; C4.5 is an extension of Quinlan's earlier ID3 algorithm. The decision trees generated by J48 can be used for classification. J48 builds decision trees from a set of labeled training data using the concept of information entropy, exploiting the fact that each attribute of
the data can be used to make a decision by splitting the data into smaller subsets. J48 examines the normalized information gain (difference in entropy) that results from choosing an attribute to split the data, and the attribute with the highest normalized information gain is used to make the decision. The algorithm then recurs on the smaller subsets. The splitting procedure stops when all instances in a subset belong to the same class, and a leaf node telling to choose that class is created in the decision tree. It can also happen that none of the features gives any information gain; in this case J48 creates a decision node higher up in the tree using the expected value of the class. J48 can handle both continuous and discrete attributes, training data with missing attribute values, and attributes with differing costs. Further, it provides an option for pruning trees after creation. In this paper, we use the J48 algorithm provided within the open-source Weka data mining software.

3. EXPERIMENTS AND DATA
The inscriptions-in-Thailand database [3] contains data on more than 1,500 inscriptions. The database, hosted on the Princess Maha Chakri Sirindhorn Anthropology Centre's website, can be searched by name, script, language, province and keyword, and the inscription information and picture are retrieved from the database after the user submits a search. We therefore implemented PERL programs for collecting, cleaning and categorizing the data from the website. Finally, we selected the attributes from the categorized data and transformed them into input for the classification model in Weka.
Weka is a collection of machine learning algorithms for data mining tasks. Weka's native storage method is the ARFF format, so a conversion was performed to make the examination data available for analysis through Weka. The most important part of the entire data mining process is preparing the input for the data mining investigation. The classification of inscriptions can be done by defining attributes for the age, language, script, and province. In this research, the tasks, represented in Figure 2, are as follows: collecting the inscription data, cleaning the data, categorizing the data, transforming the data, and building and testing the classification model.
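As a rough sketch of the final modelling step (illustrative only: the paper uses Weka's J48/C4.5, for which scikit-learn's CART-based tree is merely an analogue, and the file and column names below are hypothetical):

    import pandas as pd
    from sklearn.model_selection import cross_val_score
    from sklearn.preprocessing import OrdinalEncoder
    from sklearn.tree import DecisionTreeClassifier

    # Hypothetical CSV holding the categorized attributes described above.
    df = pd.read_csv("inscriptions.csv")   # id, name, script, language, province, age
    X = OrdinalEncoder().fit_transform(df[["script", "language", "province"]])
    y = df["age"]                          # the era label (39 classes in the paper)
    scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=10)
    print(scores.mean())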
(Figure 2. Tasks of era classification of ancient Thai inscriptions: (1) collecting HTML files of inscriptions from the Princess Maha Chakri Sirindhorn Anthropology Centre's website; (2) cleaning the data into plain text files; (3) categorizing the data by ID, name, script, language, province and age; (4) transforming the data into an ARFF file; and (5) building and testing the classification model in Weka, where the J48 learning algorithm induces a decision tree from a training set and the model is applied to a test set.)
4. RESULTS
A PERL program and Weka 3.4 were the tools used in this implementation, run on a notebook computer equipped with a 1.4 GHz Intel Pentium and 512 MB of RAM. We used the PERL program to collect the 1,756 inscription records in HTML file format. In the cleaning step, these files were stripped of HTML tags and saved as plain text files by the PERL script. In the next step, we used the PERL script to retrieve the inscription data, composed of age, language, script, and province, and categorized these data in a table whose attributes were id, name, script, language, province and age. Finally, the table had 1,655 records with no missing values. The data in the table were transformed into an input file in ARFF format, which was used to build a decision tree model. The evaluation of the model showed 89.0634% correctly classified instances, with the data classified into 39 era classes.

5. CONCLUSIONS
The experiments were conducted on the 1,655-record Thai inscription dataset, with the PERL language and Weka's graphical visualizations as the implementation tools. The J48 decision tree algorithm was selected to build the era classification model for ancient Thai inscriptions. The experiments involved 1,655 examples with 4 attributes (script, language, province and age). In the resulting tree, the script attribute was the root. The correctly classified instance rate of the model was 89.0634%.

REFERENCES

1. National Library of Thailand, http://www.nlt.go.th/Data/KM/manuscript.pdf
2. Saising, S., The Analysis of Results from Sukhothai's Archaeological Excavations, and from Former Researches on Epigraphy and Art History, to Stimulate New Methodological Approaches, http://123.242.145.121/Library/abstract/Thesis0587.pdf
3. Princess Maha Chakri Sirindhorn Anthropology Centre, http://www4.sac.or.th/jaruk2008/
4. Jiang, S., et al., An effective method to detect and categorize digitized traditional Chinese paintings, ScienceDirect, 2006, 27(7), 734-746.
5. Davidsson, P., Coin classification using a novel technique for learning characteristic decision trees by controlling the degree of generalization, in: Proc. of 9th Int. Conference on Industrial & Engineering Applications of Artificial Intelligence & Expert Systems (IEA/AIE-96), 1996, 403-412.
6. Weka: Data Mining Software in Java, http://www.cs.waikato.ac.nz/~ml/weka/
7. Witten, I. H. and Frank, E., Data Mining: Practical Machine Learning Tools and Techniques, 2nd ed., Morgan Kaufmann Publishers, CA, 2005, 525.
8. Quinlan, J. R., C4.5: Programs for Machine Learning, Morgan Kaufmann Publishers, San Mateo, CA, 1993.
9. Quinlan, J. R., Learning with Continuous Classes, in: 5th Australian Joint Conference on Artificial Intelligence, Singapore, 1992, 343-348.

ACKNOWLEDGMENT
The authors would like to thank the Princess Maha Chakri Sirindhorn Anthropology Centre for the public data on its website. This study is fully supported by a NECTEC grant.

Natural Scene Matching using Inexact Maximum Common Subgraph

R. Boonsin(C) and N. Khiripet
Knowledge Elicitation and Archiving (KEA) Laboratory, NECTEC,
112 Phahon Yothin Rd., Klong 1, Klong Luang, Pathumthani 12120, Thailand
(C) E-mail: rapeepun.boonsin@nectec.or.th; Fax: 662-5646772; Tel. 662-5646900



ABSTRACT
Many applications, such as image processing and pattern recognition, need to compare the similarity of objects represented as graphs. In the image matching literature, the similarity measurement of graph objects can be cast as the maximum common subgraph (MCS) problem of finding the common subgraph of two graphs, which is NP-hard; no algorithm of polynomial-time complexity is known for the general case. The most widely used solution to the MCS problem is to first construct the association graph of the two graphs and then detect the maximum clique using a Bron-Kerbosch-like algorithm. In this paper we present our experiment in image matching for natural scenes using our MCS algorithm. First, we transform the scene images into graphs by a watershed image segmentation approach based on colors and compositions. Then, we use an inexact maximum common subgraph algorithm to find the most similar image in the database. The results show high accuracy. Furthermore, the execution time of the subgraph search with our inexact approach is significantly faster than with the traditional one.

Keywords: Maximum Common Subgraph, Graph Algorithm, Image Matching, Pattern Recognition.


1. INTRODUCTION
Image matching is the process of finding the image whose features are most similar to a given image. The matching procedure is essential in a variety of applications such as content-based image retrieval, biological or chemical structure analysis [1, 2], face recognition [3], fingerprint recognition [4] and scene recognition [5]. Within scene recognition, natural scene matching is important to stock photo companies, photographers, and travel agencies whenever natural scene images must be found or collected. In these applications, images are usually represented as graph structures [5, 6, 7, 8] instead of raw image data because a graph-based representation stores much less data. The vertices of the graph usually represent objects, while the edges represent relationships between objects. For instance, the composition of a given image may consist of the sky at the top of the image, a mountain under the sky, a lake on the left under the mountain, and a field on the right. In this case the vertices of the graph represent the sky, the mountain, the lake and the field, while the edges represent relations between two objects, such as the connection of the sky and the mountain. This kind of natural scene representation is typically called an Attributed Relational Graph (ARG) [7]. ARGs are a powerful way to describe data: they carry attributes on their nodes and edges. The fundamental problem with ARGs is computing the similarity of two ARGs, known as the graph matching problem. When two given images represented as graphs must be matched, a graph similarity measure is used. Many algorithms measure graph similarity, such as subgraph isomorphism [9, 10] and maximum common subgraph [2]. However, both methods traditionally deal with the problem of finding an exact match: the algorithms attempt to find subgraphs with the same vertex and edge attributes, which means all subgraphs
in the model graph and the data graph need to be compared. Subgraph isomorphism and maximum common subgraph are known to be NP-complete and NP-hard [11], respectively. Furthermore, exact matching is impractical in a real image mining environment: for natural scene images, when the input image is ambiguous, the segmentation process can produce over-segmented or under-segmented regions, which means some vertices in the input graph have no meaning. Hence, exact algorithms may give results of low accuracy.
In this case, an inexact method should be more effective for the matching problem. There are several ways to approach it, such as decision trees [6, 9, 10]. H. Jiang and C. W. Ngo [7] proposed an inexact maximal common subgraph algorithm with backtracking. The algorithm finds the maximal common subgraph by mapping the nodes of several graphs to each other. All nodes in the graphs carry weights, the RGB vector of the mean RGB values of the corresponding image segment, and search branches whose weight is less than a specified threshold are skipped in a branch-and-bound traversal. However, it was only tested on a few traffic sign images, which does not assure the best accuracy, and it takes exponential time in the worst case, as is common for NP-hard problems. K. Sambhoos et al. [12] proposed an inexact graph matching method called Truncated Search Tree (TruST), extended from the branch-and-bound method. TruST searches for the optimal match graphs by similarity values and can control the state space via a truncation parameter. The algorithm is only good for directed graphs; if it is used for an undirected graph, as an image is, the accuracy of the results can be reduced.
This paper presents our experiment in image matching for natural scenes using an inexact maximum common subgraph. Our algorithm is practical for image matching and gives accurate results. The procedure is laid out as follows. First, we transform the scene images into graphs by a watershed image segmentation approach [13], using Attributed Relational Graphs (ARGs) to represent the watershed images: the vertices of the graph represent objects, while the edges represent relationships between objects. Then we find the inexact maximum common subgraph of two graphs by employing a backtracking search algorithm. The result is a group of similar images, in which the image with the highest similarity value is the best match for the input image.
This paper is organized as follows. Section 2 describes the inexact maximum common subgraph problem. Section 3 presents our algorithm. Section 4 analyzes the experimental results, and Section 5 concludes the paper.

2. THEORY AND RELATED WORKS
2.1 Attributed Relational Graph
An Attributed Relational Graph is a graph that represents an image. Its nodes and edges usually carry attributes, and an edge represents the relationship of its nodes. For example, in natural images, the nodes represent objects such as the sea, a mountain, a road or a lake; the node attributes can be the RGB vector, the texture and the composition of the objects. The edges represent relationships between nodes, such as the connection of two nodes.
2.2 Maximum Common Subgraph
The Maximum Common Subgraph (MCS) problem is the problem of finding the largest induced subgraph of G1 that is isomorphic to a subgraph of G2. Many solutions to the MCS problem have been proposed, and they fall into two categories: exact methods and inexact methods. The exact methods provide the best result, but most of them take exponential time, while the inexact methods give a sub-optimal result within acceptable time. An example of an exact method is the Durand-Pasari algorithm [14], which solves the MCS problem by transforming the search for the maximum common subgraph of two graphs into the problem of finding a maximal clique. However, the construction of the association graph takes n x m space to store the nodes and attributes, which is impractical for image matching where over-segmented nodes may be present. In our inexact maximum common subgraph, such nodes are eliminated during the backtracking process.
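The association-graph route taken by Durand-Pasari-style exact methods can be sketched as follows (illustrative, using networkx; node attributes are omitted for brevity):

    import itertools
    import networkx as nx

    def association_graph(G1, G2):
        # Vertices are candidate node pairs; an edge joins two pairs whose
        # adjacency relations agree in both graphs (induced compatibility).
        A = nx.Graph()
        A.add_nodes_from(itertools.product(G1.nodes, G2.nodes))
        for (u1, u2), (v1, v2) in itertools.combinations(A.nodes, 2):
            if u1 != v1 and u2 != v2 and \
               G1.has_edge(u1, v1) == G2.has_edge(u2, v2):
                A.add_edge((u1, u2), (v1, v2))
        return A

    G1, G2 = nx.path_graph(4), nx.cycle_graph(4)
    pairs = max(nx.find_cliques(association_graph(G1, G2)), key=len)
    # each element (i, j) of the clique maps node i of G1 onto node j of G2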
3. COMPUTATIONAL DETAILS
In our proposed method, when a test image is given as input, the image is segmented into regions by the watershed image segmentation algorithm. From the watershed image, an Attributed Relational Graph is created by representing each region as a vertex of the graph. The mode RGB value and the composition of each region are the node attributes used to calculate distances. To find the similarity of the input graph to a model graph, the node-to-node distance value is calculated using the Euclidean distance. A new distance value is then recalculated to combine the distances of the two attributes. In the first experiment, only the RGB vector was used as the node attribute. Then we employed a coordinate feature vector to enhance performance, giving a weight to each feature vector and recalculating the combined value by the following equation:

d = \alpha \, d_{RGB} + (1 - \alpha) \, d_{coordinate}    (1)

where \alpha is a weight. After this procedure, the distance values were obtained as shown in Table 1.
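A small sketch of this combined node distance, with alpha = 0.8 as in the experiments (the RGB and coordinate values below are made up):

    import numpy as np

    def node_distance(rgb_a, rgb_b, xy_a, xy_b, alpha=0.8):
        # Eq. (1): weighted sum of Euclidean RGB and coordinate distances.
        d_rgb = np.linalg.norm(np.asarray(rgb_a, float) - np.asarray(rgb_b, float))
        d_xy = np.linalg.norm(np.asarray(xy_a, float) - np.asarray(xy_b, float))
        return alpha * d_rgb + (1 - alpha) * d_xy

    print(node_distance((200, 82, 13), (136, 29, 21), (0.1, 0.5), (0.4, 0.2)))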

(Figure 1. Data graph D and model graph M.)

Table 1. Distance Matrix
Vertices M1 M2 M3 M4
D1 200.82 136.29 215.75 289.90
D2 46.01 115.96 52.71 99.68
D3 64.63 93.34 68.68 136.68
D4 107.75 186.72 77.01 87.71
D5 149.25 222.97 102.63 116.91

Then we employ the backtracking algorithm to find the inexact maximum common subgraph between the unknown graph and each prototype graph in the database. In the first step of the backtracking algorithm, we find the root nodes whose similarity value is less than a specified threshold. At each step of the depth-first search, the algorithm finds a pair of nodes of the data graph and the model graph that have the same attribute and are connected to at least one node of the parent node; this candidate pair is then checked for its similarity value. If the value is less than the specified threshold, the pair is inserted into the branch and the similarity of the group is recalculated. Finally, the group with the lowest distance value is the result. The backtracking algorithm described above can be written as follows:

Backtracking Procedure
Input: element c of the candidate set
Output: IMCS
    if c is empty then
        Candidate = findrootnode()
    else
        Candidate = getCandidate(c)
    if Candidate is not empty then
        if size of IMCS < size of Candidate then
            IMCS = Candidate
        else if size of IMCS == size of Candidate then
            add Candidate to IMCS
    for all elements c in Candidate
        Backtracking(c)

getCandidate Procedure
Input: list C (the candidates of the previous state)
Output: Candidate
    for all elements c in the pair-node list C
        split c into list_node x and list_node y
        X = all nodes that connect to x
        Y = all nodes that connect to y
        for all nodes X_i in X
            for all nodes Y_j in Y
                valid = true
                for k = 0 to size of x - 1
                    a = (X_i connects to x_k)
                    b = (Y_j connects to y_k)
                    if a != b then
                        valid = false; break
                sum = sum + distance(X_i, Y_j)
                distance = sum / k
                if valid and distance < threshold then
                    add the pair (X_i, Y_j) to Candidate

For the example shown in Figure 1 and Table 1: at first the root is empty, so we choose the node pairs whose distance value is less than the threshold, namely D2:M1, D2:M3, D2:M4, D3:M1, D3:M2, D3:M3, D4:M3 and D4:M4. For each branch, we employ DFS to find groups of node pairs that connect to at least one node of the parent node and have the same attribute.












Figure 2. Search tree

In Figure 2, at parent node D3:M2 the algorithm looks for the nodes connected to D3, namely D1, D2, D4 and D5, and the nodes connected to M2, namely M1, M3 and M4. The algorithm builds the candidate pairs and checks that each pair has the same attribute relation to the parent node. For example, D2 is connected to D3 and M3 is connected to M2, so the distance value of D2 to M3 is checked; since it is lower than the specified threshold, the pair is added under the parent node. Continuing the DFS, the algorithm looks for a pair of nodes at the next level with the same attribute relation, meaning the pair must be connected to D2 or D3 in the same way as to M3 and M2, and must also have a distance value below the specified threshold. In this example there is no such pair connected to the upper nodes, so the algorithm backtracks to the upper node; the same happens with the D2:M3 pair. For the D4:M3 pair, D4 is connected to D3 and M3 is connected to M2, and the distance value of D4 to M3 is lower than the specified threshold, so this pair is added as a leaf node.

4. RESULTS AND DISCUSSION
In our experiment, a test set of natural scene images was built and categorized into 6 classes: beach, field, island, mountain, sunset and cat. The distance threshold was set to 100. First, we performed the experiment using only the RGB feature vector; the results are shown in Table 2. Then the performance was enhanced by adding the coordinate feature vector as a node attribute together with the RGB feature vector. We scaled both attribute values, giving 80% priority to the RGB vector, then recalculated the distance value and chose the model graph with the lowest distance as the answer. For all experiments, the results are given as F-measure scores below.

Table 2. Classification results by F-measure (%)

Class      RGB    RGB+Coordinate (alpha = 0.8)
Mountain   66.67  66.67
Field      77.78  77.78
Beach      63.16  60.00
Island     63.16  73.68
Sunset     77.78  82.35
Cat        52.63  52.63

Table 2 shows the F-measure score for each class. With only the RGB feature vector, the highest scores are for the field and sunset classes; the lowest is the cat class, followed by the beach and island classes. Considering the beach and island classes, island images were misclassified into other classes such as mountain and beach, which both have parts similar to an island: the RGB vector of some nodes, such as the node representing the island itself, is green like a mountain, and the sea water of a beach and the sea water around an island are similar objects, which makes the algorithm misclassify. With the two-attribute distance, RGB feature vector plus coordinates with the RGB weight set to 0.8, the results show that performance increases for the island and sunset classes. In both experiments, the cat class takes the lowest score, which means the algorithm does not handle this kind of data well. The algorithm is best suited to colorful images because of the priority given to the RGB attribute.

5. CONCLUSION
This paper introduced a method of finding the inexact maximum common subgraph for matching natural scene images. First, a watershed image is built and changed into a graph. Then this graph is given as input to our algorithm, which finds the inexact maximum common subgraph by comparison with the model graphs in the database. The algorithm employs a backtracking search to find pairs of nodes in the data graph and model graph that contain the same attribute values; when a pair has the same attribute values and a distance value lower than the specified threshold, it is added to the tree. The experimental results show a best F-measure score of 82.35%.
For future work, we will use the area of each node, in percentile, as an additional node attribute for calculating the distance value, together with the existing attributes, the RGB vector and the composition of the node.




REFERENCES
1. John W. Raymond, Maximum Common Subgraph Isomorphism Algorithm for the Matching of Chemical Structures, Journal of Computer-Aided Molecular Design, 2002, 16(7), 521-533.
2. Yiqun Cao, Tao Jiang, and Thomas Girke, A maximum common substructure-based algorithm for searching and predicting drug-like compounds, Bioinformatics, 2008, 24(13), 366-374.
3. Laurenz Wiskott, Jean-Marc Fellous, Norbert Krüger, and Christoph von der Malsburg, Face Recognition by Elastic Bunch Graph Matching, IEEE Transactions on Pattern Analysis and Machine Intelligence, 1997, 19(7), 775-779.
4. Mana Tarjoman and Shaghayegh Zarei, Automatic Fingerprint Classification using Graph Theory, World Academy of Science, Engineering and Technology 47, 2008.
5. P. Mulhem, W. K. Leow, and Y. K. Lee, Fuzzy Conceptual Graphs for Matching Images of Natural Scenes, 7th International Joint Conference on Artificial Intelligence, Seattle, Washington, USA, 2001, 1397-1404.
6. K. Shearer, H. Bunke, and S. Venkatesh, Video indexing and similarity retrieval by largest common subgraph detection using decision trees, Pattern Recognition, 2001, 34(5), 1075-1091.
7. H. Jiang and C. W. Ngo, Image mining using inexact maximal common subgraph of multiple ARGs, Int. Conf. on Visual Information Systems, 2003.
8. Praveen Dasigi and C. V. Jawahar, Efficient Graph-based Image Matching for Recognition and Retrieval, Proceedings of the National Conference on Computer Vision, Pattern Recognition, Image Processing and Graphics, 2008.
9. B. T. Messmer and H. Bunke, Error-correcting graph isomorphism using decision trees, International Journal of Pattern Recognition and Artificial Intelligence, 1998, 12(6), 721-742.
10. B. T. Messmer and H. Bunke, A decision tree approach to graph and subgraph isomorphism detection, Pattern Recognition, 1999, 32(12), 1979-1998.
11. M. R. Garey and D. S. Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness, Freeman, New York, 1979.
12. K. Sambhoos, R. Nagi, M. Sudit and A. Stotz, Hierarchical Higher Level Data Fusion using Fuzzy Hamming and Hypercube Clustering, Journal of Advances in Information Fusion, 2008, 3(2).
13. Serge Beucher and Fernand Meyer, The morphological approach to segmentation: the watershed transformation, in Mathematical Morphology in Image Processing (Ed. E. R. Dougherty), 1993, 433-481.
14. P. J. Durand, R. Pasari, J. W. Baker, and C. Tsai, An efficient algorithm for similarity analysis of molecules, Internet Journal of Chemistry, 1999.
Wind Circle 3D Visualization of Direction Weather Data

A. Siripatana, K. Jaroensutasinee, and M. Jaroensutasinee
Center of Excellence for Ecoinformatics and Computational Science Undergraduate Program,
School of Science, Walailak University, 222, Thasala District, Nakhonsithammarat 80161, Thailand
E-mail: bigdil1@hotmail.com, krisanadej@gmail.com, jmullica@gmail.com;
Fax: 66 0 7567 2004; Tel. 083-6365567



ABSTRACT
This study developed a 3D visualization model for wind direction data called the Wind Circle 3D histogram. The visualization contains three data types: (1) windrun (distance per time), (2) wind direction (angle) and (3) time interval. Windrun and wind direction data were obtained from a Davis Wireless Vantage Pro II weather station installed at Muang District, Nakhon Si Thammarat, Thailand (latitude 8.418611 N and longitude 99.963611 E) from 10 June 2006 to 20 November 2009. Windrun data were obtained at a one-minute interval and summed into monthly windrun. Wind vectors were categorized into 16 directions: N, E, S, W, NE, NW, SE, SW, NEN, NEE, NWN, NWW, SES, SEE, SWS, and SWW. These 16 directions were displayed as 16 sections of a circle, 22.5 degrees each. Using interpolation, we assigned the maximum windrun an (R, G, B) of (0.0, 0.1, 0.8) and the minimum windrun an (R, G, B) of (0.8, 0.1, 0.0); in the 3D visualization, blue represents high windrun and red represents low windrun, with a gradient of color between them. The 3D visualization clearly shows that the maximum windrun (dark blue) occurred between September and October every year, from the south. On the other hand, the least windrun (dark red) came from the northwest. The maximum windrun occurred in the year 2006 and has constantly declined since. After we used a color gradient with 2nd-order polynomial interpolation to generate the missing data between adjacent directions, the 3D visualization of windrun showed a smooth spiral pattern, indicating constant changes of the maximum wind direction throughout the year.

Keywords: Windrun, Visualization, Color Gradient, Interpolation, Weather Data.



1. INTRODUCTION
Data visualization plays a major role in the scientific interpretation of large-scale collected data; its purpose is simply to transform numerical data into a single image or graph for scientific tasks. This project aims to develop a new model for visualizing wind data that contains several parameters that must be observed together. The model must be able to show the pattern of windrun in all 360 degrees of direction over a time interval, and the visualization must allow comparison between subsequent years as clearly as possible. Hence, this visualization model presents the amount of windrun, the wind direction, the time and the frequency of the time interval in 3D. The model requires three types of collected data: (1) windrun (distance), (2) wind direction (N, S, E, W, etc.) and (3) time. These data sequences were processed by a program written in Mathematica. The resulting visualization is a spring-shaped tube whose surface heights and colors vary with the data generated by the program.

2. THEORY AND RELATED WORKS
Data site: Sample data were obtained from a Davis Wireless Vantage Pro II weather station installed at Muang District, Nakhon Si Thammarat, Thailand (Figure 1) (latitude 8.418611 N and longitude 99.963611 E) from 10 June 2006 to 20 November 2009. We used three types of collected data: (1) windrun (miles), (2) wind direction (N, S, E, W, etc.) and (3) time.

(Figure 1. Data site (latitude 8.418611 N and longitude 99.963611 E).)
Step 1 - To gain more sense of direction from the data: Normally, windrun and wind direction data obtained from a weather station are plotted over time in two separate graphs, as in Figure 2.

(Figure 2. (a) Windrun over time; (b) high wind direction over time.)
These are the simplest way to show the data; however, they are not designed for pattern observation. The first idea is to integrate the windrun and wind direction data into a single visualization. We separated the total windrun of each month into 16 directions: N, E, S, W, NE, NW, SE, SW, NEN, NEE, NWN, NWW, SES, SEE, SWS, SWW. Each direction is represented by a fragment of a circle resembling a pie chart, called a "wind circle". The color of each fragment represents the monthly total amount of windrun in that direction. The wind circle visualization helps us understand the relationship between windrun and wind direction much better (Figure 3).

(Figure 3. Wind circle visualization; lighter areas represent greater amounts of windrun.)
We then ordered the wind circles by month, obtaining a polygon that represents the monthly windrun in each direction over time (Figure 4a-b).



(Figure 4. The visualization including the time coordinate: (a) the wind circle polygon; (b) the pattern of monthly windrun during the years 2006-2009.)
Step 2 - To interpolate the missing data: From step 1 we can already see the monthly windrun pattern. However, we want a visualization that shows the change of windrun continuously in every direction. We therefore transformed the collected data using a 2D interpolation function to generate the missing data, producing a continuous windrun visualization that we call the "wind tube" model. This model gives more detail of the windrun change over time and shows the pattern better (Figure 5a-c).




(Figure 5. Wind tube visualization: (a) contour plot of windrun over time (555-day interval) against angle (0 to 2 Pi); (b) 3D plot of the interpolated function with a 21-day interval, where differences in height represent differences in windrun; (c) the generated function on a tube surface, using hue colors to represent differences in windrun at any point.)
Step 3 - To develop the wind spring model: In this step we improved the wind tube model. By folding the tube model into a spring shape using a parametric plot, we can compare the windrun in any direction between subsequent years in a straight-line manner (Figure 6). This spring model gives a better sense of seasonality.

(Figure 6. Wind spring model showing high windrun in blue and low windrun in red; the corresponding times of the year and angles are aligned in a straight-line manner.)


3. COMPUTATIONAL DETAILS
The windrun data were obtained from a Davis weather station. We processed the data through a visualization program written in the Mathematica language. The 16 windrun directions were converted to angles in radians, starting with E at 0 radians and rotating anti-clockwise through 2π back to E. The wind circle was divided into 16 segments of π/8, each containing the total windrun for that direction. We used first-order 2D interpolation to generate the missing data, and then plotted the data with the ContourPlot, Plot3D and ParametricPlot functions in Mathematica to visualize the function generated from the collected data. Spires and color gradients were generated according to the amount of windrun.
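The following Python sketch (our re-creation with matplotlib and SciPy rather than the paper's Mathematica code; the windrun values are synthetic placeholders) illustrates the same pipeline: first-order 2D interpolation over (time, angle), an unrolled wind-tube contour, and a wind-spring helix in which the same calendar month of successive years lines up vertically.

import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import RegularGridInterpolator

rng = np.random.default_rng(0)
months = np.arange(48.0)                          # monthly data for 2006-2009
angles = np.linspace(0.0, 2 * np.pi, 17)          # 16 sectors of pi/8, E at 0 rad
windrun = rng.gamma(2.0, 50.0, size=(48, 17))     # synthetic monthly windrun
windrun[:, -1] = windrun[:, 0]                    # make the data periodic in angle

f = RegularGridInterpolator((months, angles), windrun)   # order-1 (linear) interpolation

# Unrolled "wind tube": contour of windrun over time and direction.
T, A = np.meshgrid(np.linspace(0, 47, 300), np.linspace(0, 2 * np.pi, 300),
                   indexing="ij")
Z = f(np.stack([T.ravel(), A.ravel()], axis=1)).reshape(T.shape)
fig, ax = plt.subplots()
ax.contourf(T, A, Z, levels=20)
ax.set(xlabel="time (months)", ylabel="direction (radians)")

# "Wind spring": one coil per year, color encoding total monthly windrun.
s = np.linspace(0, 47, 2000)
phi = 2 * np.pi * (s % 12) / 12                   # same month aligns vertically
total = np.interp(s, months, windrun[:, :-1].sum(axis=1))
ax3 = plt.figure().add_subplot(projection="3d")
sc = ax3.scatter(np.cos(phi), np.sin(phi), s / 12.0, c=total, cmap="coolwarm", s=4)
plt.colorbar(sc, label="total monthly windrun")
plt.show()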


4. RESULTS AND DISCUSSION
The Wind spring visualization model gives information about (1) the continuous change of windrun over the time interval, (2) the directions of windrun, and (3) the windrun cycle over the year (Figure 7).


[Figure 7 panels labelled Year 2006 and Year 2009.]

5. CONCLUSION
This project developed a new technique for visualizing directional wind data, called the Wind spring model. The visualization gives a whole picture of the wind data over a long period of time and provides a better-quality visualization for wind pattern observation.


REFERENCES
1. Williams, M., Cook, E., van der Kaars, S., Barrows, T., Shulmeister, J., and Kershaw,
P., Quat. Sci. Rev., 2009, 28 (23-24), 2398-2419
2. Gonzalez, R. C., and Woods, R. E., Digital Image Processing, 2nd edition, Tom
Robbins, New Jersey, 2002, 282-330.
3. Gray, J., Mastering Mathematica: Programming Methods and Applications, 2nd edition, Academic Press, Illinois, 1998, 339-365.



G00025
Adaptive Window Size for Spatial Image Segmentation

S. Uttama
School of Information Technology, Mae Fah Luang University, 333, Moo 1, Ta-Sud, Muang, Chiang
Rai, 57100, Thailand
E-mail: usurapong@gmail.com; Fax: 053-916743; Tel. 053-916758



ABSTRACT
Window size is a crucial factor for feature extraction and spatial image segmentation:
too large or too small a window results in under- or over-segmentation. In addition,
the window size should vary according to the characteristics of image regions.
This paper proposes a novel concept for determining an adaptive, optimal window
size for spatial image segmentation. The method is based on a combination of
quadtree decomposition, the autocorrelation function and Wold-like decomposition. The
proposed technique was evaluated by performing segmentation on sets of non-
textured and textured images using classic co-occurrence features. The results reveal
that the novel method outperforms the existing fixed-window-size approach in terms of
segmentation accuracy, especially for textured images.

Keywords: Spatial Image Segmentation, Adaptive Window Size, Feature Extraction.



REFERENCES
1. Garcia, M. and Puig, D., Supervised texture classification by integration of multiple texture methods and evaluation windows, Image and Vision Computing, 2007, 25, 1091-1106.
2. Park, M., Jin, J. and Wilson, L., Texture Classification using Multi-scale Scheme,
Proceedings of the Pan-Sydney area workshop on Visual information processing, 2004,
67-70.
3. Qaiser, N. and Hussain, M., Optimum Window-Size Computation for Moment Based
Texture Segmentation, Multi Topic Conference INMIC 2003, 2003.
4. Sheng, W., Xu, C. and Liu, J., Adaptive window-size selection approach for feature
extraction in texture analysis, Society of Photo-Optical Instrumentation Engineers (SPIE)
Conference Series, 2001, 4550, 228-233.
5. Jan S. and Hsueh, Y., Window-Size Determination for Granulometrical Structural Texture
Classification, Pattern Recognition Letter, 1998, 19, 439-446.
6. Amelung, J. and Vogel, K., Automated Window Size Determination for Texture Defect
Detection, The British Machine Vision Association and Society for Pattern Recognition,
1994,105-114.

G00026
Developing Dashboard Decision Support System For
Subdistrict Administration Organization Network
P. Jinpon^C, M. Jaroensutasinee, and K. Jaroensutasinee
Center of Excellence for Ecoinformatics, and Computational Science Graduate Program,
School of Science, Walailak University, 222 Thaiburi, Thasala, Nakhon Si Thammarat, 80161,
Thailand
^C E-mail: jpuangrat@gmail.com, jmullica@gmail.com, krisanadej@gmail.com;
Fax: 075-538034; Tel. 083-1720696


ABSTRACT
We propose a novel knowledge discovery approach that adopts the dashboard concept
and incorporates elements of data clustering, visualization and knowledge codification.
The dashboard was designed to help the executives of the Sub-district Administration
Organization Network (SAON) explore community health performance by analyzing
significant medical data and facility availability in the community, in order to improve
the decision-making process. The system was developed using the system development
life cycle (SDLC) methodology and coded in a web-based, open-source environment
(i.e. MySQL and PHP). The system obtains data collected from the Family and
Community Assessment Program (FAP) in five sub-districts of Nakhon Si Thammarat.
The dashboard architecture and software are presented in this paper. The prototype
dashboard was designed jointly by informatics personnel and users, operates fully in
real time online, and has proved to be a useful tool for improving the decision-making
system.

Keywords: Business Intelligence, Dashboard, Health Information System, Clinical Data

1. INTRODUCTION
Current and future technological systems are increasingly complex and large scale in terms
of system size, functionality breadth, component maturity, and supplier heterogeneity.
Effective management planning, decision making, and learning processes rely on a spectrum
of data, information, and knowledge to be successful [1]. In management information
systems, a dashboard is an executive information system user interface that (similar to an
automobile's dashboard) is designed to be easy to read [2].
A major application of knowledge discovery identified for the executives of a Sub-district
Administration Organization (SAO) is to explore community health performance, which is
critical for improving the decision-making process. At present, SAO executives rely on
ordinary reports with minimal knowledge exploration to analyze information such as
community health performance. The objective of this project is to design and develop a
Dashboard Decision Support System as a useful tool for improving decision making by
combining various techniques: data warehouse design, graph-based visualization,
well-being indicator performance, network collaboration, OLAP and the dashboard concept.
This new approach to knowledge exploration should help the executives of the Sub-district
Administration Organization Network (SAON) improve their decision-making process.

2. THEORY AND RELATED WORKS
2.1 Dashboard
Dashboards are set up to provide managers with the information they need in the correct
format. Ideally, each manager can use the display on the dashboard to focus on what is
important in his or her job. The success of digital dashboard projects often depends on the
metrics that are chosen for monitoring [3]. Key Performance Indicators (KPIs) are among the content appropriate for a dashboard. A software dashboard provides decision makers
with the input necessary to "drive" the business [4]. Thus, a graphical user interface may be
designed to display summaries, graphics (e.g., bar charts, pie charts, bullet graphs,
"sparklines," and etc.), and gauges (with colors similar to traffic lights) in a portal-like
framework to highlight important information [5].

2.2 Family and Community Assessment Program (FAP)
The FAP is software developed by the health care team to provide complete information
for planning appropriate health services for the population in a catchment area, and to
encourage community participation in managing a happy community [6]. The program was
developed in Microsoft Access 2000 and links its data to reports via Google Earth. FAP is
linked to the Health Center Information System (HCIS), from which it retrieves individual
person data.

3. COMPUTATIONAL DETAILS



Figure 1. Architecture of data flow for Dashboard Decision Support System.

The system has been developed using the System Development Life Cycle (SDLC)
methodology and coded in a web-based, open-source environment (i.e. MySQL and PHP)
[7]. The dashboard is implemented as a set of web-based charts that gives a visual, real-time
representation of important aggregated data related to several strategic areas and the KPI
well-being status of the SAO. The dashboard report leverages the information stored in the
center's online-accessible MySQL database. This relational database is the back-end
repository for user data, significant medical data, facility availability in the community,
KPIs, strategy and other relevant data, some of which were transferred from the FAP in five
sub-districts of Nakhon Si Thammarat and some of which were posted by users.
There were five groups of users: administrators, SAO staff, SAO managers, SAO clinicians,
and general users. The SAO groups (i.e. SAO staff, managers and clinicians) were given
user names and passwords to log in and perform web data entry [8]. General users had
access to some parts of the Dashboard Decision Support System (Dashboard DSS) but could
not edit, input or change data in it. General users who want to input their data on the
Dashboard DSS have to register on the website.
The data from FAP database were collected annually and the data from HCIS database or
relevant data were collected when the SAO staff wanted to update their data (Figure 1). In this
study, the clinician or administrator at the Health Center (HC) in SAO performed the data
cleaning and sent the cleaned data to SAO head site via FTP (File Transfer Protocol) and
manually sent zip files to the SAO head site if the internet was not functional. The system
administrator at the SAO Head site stored the input data from HC and analyzed them (Figure
1).


4. RESULTS AND DISCUSSION
Through searches and discussion groups between informatics personnel and users, we
divided the KPIs for community health performance into 10 groups (primary KPIs), each
with minor KPIs (Figure 2). The prototype home page is shown in Figure 3. The model used
software that rescaled all the indicators to the same range and represented them
mathematically or graphically. The software rated each indicator on a scale ranging from 0
points (the worst case of all contexts being compared) to 10 points (the best case); all
intermediate cases were calculated by linear interpolation between these two bounds. The
prototype used the following formula to assign a numerical score to each indicator:

Primary Rate = Σ(PKPI_i × W_i) / Σ W_i

where PKPI_i is the i-th primary KPI for community health performance and W_i is the
weight of that primary KPI. Thus, all values range from 0 to 10 as noted. Reports for a
given indicator are shown in Figures 4-5.
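As an illustration of this scoring scheme, here is a short Python sketch (rather than the system's PHP, and with hypothetical indicator names, bounds and weights): each indicator is first rescaled linearly between the worst and best observed cases, and the primary rate is then the weighted average.

def rescale(value, worst, best):
    # Linear interpolation onto the 0 (worst case) .. 10 (best case) scale.
    if best == worst:
        return 10.0
    return 10.0 * (value - worst) / (best - worst)

def primary_rate(pkpi, weights):
    # Primary Rate = sum_i(PKPI_i * W_i) / sum_i(W_i)
    total_w = sum(weights.values())
    return sum(pkpi[k] * weights[k] for k in pkpi) / total_w

# Hypothetical primary KPIs, rescaled to the 0-10 range:
pkpi = {"maternal_care": rescale(82, worst=40, best=95),
        "immunisation": rescale(91, worst=60, best=100),
        "sanitation":   rescale(55, worst=30, best=90)}
weights = {"maternal_care": 3, "immunisation": 2, "sanitation": 1}
print(round(primary_rate(pkpi, weights), 2))   # weighted 0-10 community score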
This paper describes an innovative paradigm for using data from the FAP and related
sources to explore community health performance. Although there has been much discussion
of identifying KPIs or benchmarks for community health performance, the focus has been on
identifying a minimal data set for regular static reports rather than on a dynamic approach of
interrogating a data warehouse for timely support.
The following features must be considered when using a Dashboard DSS to support the SAON process [9]:
- The effectiveness of the analysis conducted with the Dashboard DSS and of the relative evaluations is closely connected to the relevance, pertinence and completeness of the chosen indicators. The case of Pakpoon shows that the Dashboard DSS can be adapted to a local urban context if built on indicators chosen to be significant to the examined context.
- Results obtained using the Dashboard DSS are strongly influenced by the weights assigned to the different aspects evaluated; the definition of weights is a key issue when the Dashboard DSS is used.
- Adoption of medium- to long-term actions is not necessarily encouraged by the use of the Dashboard DSS; solutions depend on the indicators chosen to build it. The Dashboard DSS tool supports the SAON process as it provides a detailed analysis of individual and collective aspects of sustainability, and can also be used to describe the results achieved over time.

Figure 2. KPI for the Dashboard Decision Support System.





Figure 3. Dashboard Decision Support System Home Page.

Figure 4. The Well-being community status in a gauge report.


Figure 5. The comparison of Well-being community status report.

5. CONCLUSION
We propose a holistic design of the Dashboard DSS that should improve the decision-making
process using the data collected from the FAP in five sub-districts of Nakhon Si Thammarat,
coded as a web-based system built around the dashboard concept, OLAP and various
embedded data retrieval techniques. The prototype was designed by informatics personnel
and users together as a fully operational real-time online system, and it has proved to be a
useful tool for improving the decision-making system. The dashboard concept is, therefore,
very close to that of performance indicators and benchmarks. Future enhancements at the
SAON site include reviewing the performance indicators and benchmarks with users.



REFERENCES
1. Selby, R. W., IEEE Software, 2009, 41-9.
2. Martin, E., and Bernardo, V. D., IEEE, 2008, 24-7.
3. Luciano, M. P., Katherine, P. A., Hanson, R., Kelly, P. and Khorasani, R., J. Digital Imaging, 2008, doi:10.1007/s10278-008-9167-3.
4. Grant, A., Moshyk, A., Diab, H., Caron, P., de Lorenzi, F., Bisson, G., Menard, L., Lefebvre, R., Gauthier, P., Grondin, R., and Desautels, M., Int. J. Med. Infor., 2006, 75, 232-9.
5. Sloane, E. B., Rosow, E., Adam, J., and Shine, D., Proc. the 28th IEEE EMBS Annu. Int.
Conf. New York City, USA, Aug 30-Sept 3, 2006, 28, 5440-3.
6. Jaraprapal, U., Family and Community Assessment Program (FAP): Tool for managing a
good health from community, 2009, 13.
7. Wan Mohd, W. M. B., Embong, A., and Mohd Zain, J., LNAI, 2008, 5212, 684-9.
8. Tsiknakisa, M., Katehakisa, D., and Orphanoudakis, S. C., Int. Congress Series, 2004,
1268, 289 94.
9. Scipioni, A., Mazzi, A., Mason, M., and Manzardo, A., J. Ecol. Indicators, 2009, 9(2),
364-80.



ACKNOWLEDGMENTS
We would like to thank Mrs. Urai Jaraeprapal for her invaluable assistance in the field. This
work was supported in part by the Center of Excellence for Ecoinformatics, the Institute of
Research and Development, Walailak University and NECTEC.
G00027
Noise Reduction of Ancient Document Images

S. Watcharabutsarakham^C, S. Marukatat and S. Sinthupinyo
National Electronics and Computer Technology Center, 112 Thailand Science Park,
Phahon Yothin Road, Klong 1, Klong Luang, Phatumthani, 12120, Thailand
^C E-mail: sarin.watcharabutsarakham@nectec.or.th; Fax: 02-5646873; Tel. 02-5646900



ABSTRACT
This research cooperates with a rare book division in a national library. A national
library has collected various ancient books and transformed to a digital format by using
scanners. Almost, the document images have a lot of noise by bookworms and color
paper changing. In addition, the images not only see the front page but also see another
side of page. So, a proposed process is used eliminate background noise from the
images. The result images are more clean and more legible. In this paper, we propose a
process to improve color paper chaning of background image by using a global
thresholding and an exponential converse function and a process for reduce noise by
window marks. Experimental results show that our proposed processing is capacity for
removing background and improvement degraded document images. Besides, an
objective of this method is applied for OCR (Optical Character Recognition).

Keywords: color image, gray scale image, converse, ancient document



1. INTRODUCTION
This research cooperates with a rare book division in a national library. A national library
has collected various ancient books and transformed to a digital format by using scanners or
cameras. The document images are collected in a color digital image type for reserving the
original color pictures in pages of books. Almost, the document images have a lot of noise by
bookworms and color paper changing. The national library want to get content in the books
because many books have the useful knowledge in them. So, our research is the basic step for
clear up this document images. After, all images can use OCR to collect the content. There
are 2 performance algorithms such as a threshold technique and an adaptive technique for
improving images. The threshold technique is take the entire image histogram and normalize
for getting the threshold value. This value is classify the pixel in foreground or background of
the image. The adaptive technique is divide the original image into many sub-images and
calculate on the sub-images. This technique use 2 basic values: mean value and standard
deviation value for indicate in foreground group or background group.
The document images are scanned in a color image format, normally represented in the
RGB color model, which has three primary colors: red, green, and blue. Another model is
the HSI model, composed of hue, saturation and intensity. It is used for medical images
because it corresponds to human vision better than other models, and it is also useful for
non-uniformly illuminated images such as satellite images.


2. PROPOSED TECHNIQUE
There are two steps in this processing. First, an image conversion process converts the
color image to a gray-scale image using global thresholding and an exponential converse
function to separate the foreground group from the background group.
Second, a noise reduction process uses a 9×9 window mask to remove noise from the old
documents.

2.1 IMAGE CONVERSION PROCESS
2.1.1 RGB to HSI
This process converts the color image to a gray-scale image. Normally the
images are represented in the RGB color model, and the conversion to gray scale is not
unique; a common strategy is to weight the red, green and blue channels so that the
gray-scale image matches the luminance of the color image. Our proposed technique
instead uses the HSI model, composed of hue, saturation and intensity values. In
particular, the intensity value I is given by equations (1)-(4).

r = R / (R + G + B)    (1)

b = B / (R + G + B)    (2)

g = G / (R + G + B)    (3)

I = ((R + G + B) / (3 × 255)) × 256    (4)

where I is the intensity value and R, G and B are the values of the red, green and blue channels.

2.1.2 A GLOBAL THRESHOLDING
This process finds a threshold value to classify pixels into the foreground and
background groups. Our technique builds the histogram of the I values (0-255) and
then finds the difference of intensity using equation (5):

Df_i = His_(i+2) − His_(i−2)    (5)

where Df_i is the difference of intensity at position i, and His_(i+2) and His_(i−2) are
the histogram values of intensity at positions i+2 and i−2. Next, the difference of the
difference of intensity is found by using the result of equation (5) as the input of
equation (6):

Sp_i = Df_(i+1) − Df_(i−1)    (6)

where Sp_i is the difference of the difference of intensity at position i, and Df_(i+1)
and Df_(i−1) are the differences of intensity at positions i+1 and i−1. Finally, all Sp
values are sorted in ascending order, and we select a histogram index that has two
nearby histogram indices whose Sp values are positive. This histogram index is the
threshold value used in the next process.

2.1.3 EXPONENTIAL CONVERSE FUNCTION
This process applies the threshold value in the mapping function of equation (7).
The threshold value is the break point: color values above the threshold are
recalculated by equation (7), while color values below the threshold are set to zero.
color_i = (log(value_i) / log(256)) × 256    (7)

where value_i is the histogram value at index i and color_i is the new color at position i.
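A compact Python sketch of this conversion pipeline (our reading of Sections 2.1.1-2.1.3, with the assumptions noted in the comments) could look as follows:

import numpy as np

def intensity(rgb):
    # Equation (4): mean of the R, G, B channels rescaled to 0..255.
    return rgb.astype(np.float64).sum(axis=2) / (3.0 * 255.0) * 256.0

def global_threshold(gray):
    hist, _ = np.histogram(gray, bins=256, range=(0, 256))
    df = np.zeros(256)
    df[2:-2] = hist[4:] - hist[:-4]      # equation (5): Df_i = His_(i+2) - His_(i-2)
    sp = np.zeros(256)
    sp[1:-1] = df[2:] - df[:-2]          # equation (6): Sp_i = Df_(i+1) - Df_(i-1)
    # One plausible reading of the selection rule: an index whose Sp value
    # and both nearby Sp values are positive.
    for i in range(1, 255):
        if sp[i - 1] > 0 and sp[i] > 0 and sp[i + 1] > 0:
            return i
    return 128                           # fallback if no such index exists

def converse(gray, thr):
    # Equation (7): logarithmic remapping above the threshold, zero below.
    out = np.zeros_like(gray, dtype=np.float64)
    mask = gray > thr
    out[mask] = np.log(gray[mask]) / np.log(256.0) * 256.0
    return out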

2.2 NOISE REDUCTION
2.2.1 EDGE of TEXT
This process finds the text area in the image by projection in the row and
column directions. The threshold value from the previous step is used to classify text
and background. The text area is usually at the centre of the image and the outside is
background; we remove noise only in the background area.

2.2.2 NOISE REDUCE
The noise is characteristically salt-and-pepper noise appearing around the
image. We remove noise only in the background area, because a mark in the text area
cannot be reliably classified as noise or as part of a character. A mask of 9×9 pixels is
used to clear the noise: if a noise blob is smaller than 9×9 pixels, it is deleted.
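A sketch of this rule in Python with scipy (our illustration; the foreground and background masks are assumed to come from the earlier thresholding and text-area steps) is:

import numpy as np
from scipy import ndimage

def remove_small_blobs(foreground, background_region):
    # Label connected components of the foreground lying in the background region.
    labels, n = ndimage.label(foreground & background_region)
    out = foreground.copy()
    for idx, sl in enumerate(ndimage.find_objects(labels), start=1):
        h = sl[0].stop - sl[0].start
        w = sl[1].stop - sl[1].start
        if h < 9 and w < 9:                      # blob fits inside the 9x9 mask -> noise
            out[sl][labels[sl] == idx] = False   # erase only this component
    return out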



3. EXPERIMENT
The national library has collected various ancient books, so there are a great many
document images. We selected thirty images for testing: twenty ancient book images from
the national library and ten old images from other sources. All are color images containing
only text; the results of the experiment are gray-scale images.



4. RESULTS AND DISCUSSION
Experimental results show that the proposed technique is capable of improving the
legibility of degraded document images: the old document images are cleaned up and gain
contrast between foreground and background, so the output images approach the quality of
new books. In future work, the technique will be applied in the preprocessing stage of an
optical character recognition project to improve accuracy, because commercial Thai OCR
products are tested only with clean documents.



5. CONCLUSION
Our technique does not apply to document images containing pictures or lines, because
these shift the threshold away from its true value; and it does not remove noise in the text
area, because such marks may be real characters in the image.





(a) (b)

Figure 1. Ancient books from the National Library: (a) original image; (b) result image.
























(a) (b)

Figure 2. Ancient books from the National Library: (a) original image; (b) result image.


(a) (b)

Figure 3. Ancient books in the laboratory: (a) original image; (b) result image.



REFERENCES
1. Shafait, F., Keysers, D. and Breuel, T. M., Efficient Implementation of Local Adaptive
Thresholding Techniques Using Integral Images, Proceedings of the SPIE, 2008,
Volume 6815, pp. 681510-681516.
2. Savakis, A. E., Adaptive Document Image Thresholding using Foreground and
Background Clustering, Proceedings of the International Conference on Image
Processing ICIP, 1998, Volume 3, pp. 785-789.
3. Kavallieratou, E. and Stamatatos, E., Improving the Quality of Degraded Document
Images, Proceedings of the Second International Conference on Document Image
Analysis for Libraries DIAL, 2006, pp. 340-349.
4. Kasar, T., Kumar, J. and Ramakrishnan, A. G., Font and Background Color
Independent Text Binarization, Second International Workshop on Camera-Based
Document Analysis and Recognition CBDAR, 2007.
5. Otsu, N., A Threshold Selection Method from Gray-Level Histograms, IEEE
Transactions on Systems, Man and Cybernetics, 1979, Volume 9, Issue 1, pp. 62-66.



ACKNOWLEDGMENTS
The preparation of these document images would not have been possible without the
support of the authorities at the national library, to whom we are particularly grateful.
G00034
Determining Appropriate Parameter Setting of Firefly
Algorithm Using Experimental Design and Analysis

Niyada Rukwong^1, Primpika Pansuwan^1 and Pupong Pongcharoen^2,C
^1 Faculty of Science, Naresuan University, Pitsanulok, 65000, Thailand.
^2 Faculty of Engineering, Naresuan University, Pitsanulok, 65000, Thailand.
^C E-mail: pupongp@yahoo.com and pupongp@nu.ac.th; Fax: +66 55 964003; Tel. +66 55 964201



ABSTRACT
Research works related to the computational intelligence methodologies, performance
improvement and their applications have been extensively reported in the last few
decades. However, the statistical investigation on parameters setting is relatively rare
especially for the Firefly Algorithm (FA). The algorithm parameters have been mostly
set in an ad hoc fashion and may therefore not be performed at its best configuration
and/or appropriate condition. This paper demonstrates the use of statistical tools for
investigating on the appropriate parameters setting of the FA, being applied to solve
scheduling problem in multi-product, multi-stage, multi-machine environments. The
algorithms take into account the Just-in-Time philosophy by aiming to minimise the
combination of earliness and tardiness penalty costs. The statistical analysis on the
experimental results indicated that the performance of the proposed algorithm can be
improved magnificently after adopting the appropriate parameters setting identified by
the statistical tools.

Keywords: Artificial Intelligence, Computational Intelligence, Firefly Algorithm,
Experimental Design and Analysis, Production Scheduling.



1. INTRODUCTION
Nature has always been a source of inspiration, and there has been increasing interest in the
development of computational models or methods that iteratively conduct a stochastic search
process inspired by natural intelligence. Computational intelligence can be categorised into
three groups [1, 2]: physically-based inspiration, for instance Simulated Annealing [3];
socially-based inspiration, such as Tabu Search [4]; and biologically-based inspiration, e.g.
Neural Networks [5], Genetic Algorithms [6], Shuffled Frog Leaping [7], Swarm Intelligence
[8], Ant Systems [9], Artificial Immune Systems [10], Artificial Bee Colony [11] and the
Firefly Algorithm [12].

Research works related to the computational intelligence methodologies, performance
improvement and their applications have been extensively reported in the last few decades
[13-15]. For example, a survey conducted by Chaudhry and Luo [15] stated that 67.98% of
178 research articles related to Genetic Algorithm (GA) have used GA to solve scheduling or
facility layout problems. Aytug et al. [13] reviewed more than 110 GA articles published
between 1996-2002. They found that GA parameters and operators have mostly been selected
in an ad hoc fashion, rather than using a systematic design of experiments approach.

The firefly-inspired algorithm is a relatively new metaheuristic compared with other
biologically inspired optimisation methods. Although the Firefly Algorithm (FA) seems
promising for dealing with optimisation problems, very few research works related to the FA
have been reported in the literature. So far the algorithm has been applied only to continuous
mathematical functions [12, 16], and there is no report in international scientific
databases on the investigation of FA parameter settings or on its application to combinatorial
optimisation problems.

The objectives of this paper were to: i) demonstrate the use of statistical tools called
experimental design and analysis for investigating the appropriate parameters setting of the
Firefly Algorithm applied for solving multiple-stage multiple-machine multiple-product
scheduling (MMMS) problems using data obtained from a collaborating company, which
manufactures complex capital goods with deep and complex product structures; and ii)
compare the algorithm performance with and without using the optimised parameter setting.

The remaining sections first present a brief introduction to the scheduling problem in a
multiple-stage multiple-machine multiple-product environment, followed by the process of
the Firefly Algorithm and its pseudo code for scheduling the manufacture and assembly of
complex products, described in section 3. Section 4 presents the experimental design and
provides a statistical analysis of the experimental results. These are followed by the
conclusions in section 5.


2. PRODUCTION SCHEDULING PROBLEM
Scheduling is defined as the allocation of resources over time to perform a collection of
tasks [17]. A schedule specifies sequence and timing, normally expressed as a set of start and
due times [18]. Scheduling is a combinatorial optimisation problem that is classified as an NP
hard problem [19], which means that the amount of computation required to find solutions
increases exponentially with problem size. Scheduling is important because companies seek
to minimise lead-times and simultaneously achieve high resource utilisation.

Various assumptions have been made in order to simplify, formulate and solve scheduling
problems. The most common assumptions can be summarised as follows [20]: a successor
operation is performed immediately after its predecessor has finished, providing that the
machine is available; each machine can handle only one operation at a time; each operation
can only be performed on one machine at a time; there is no interruption of operations; there
is no rework; setup and transfer times are of zero or uniform duration; and tasks are
independent. Classical job shop and flow shop scheduling problems generally consist of a set
of independent tasks. This is known as single stage scheduling, which means that there are no
precedence constraints arising from assembly requirements.

Production scheduling in the capital goods industry is difficult for several reasons. Firstly,
demand is highly variable and uncertain. The products (e.g. steam turbine generators and
power station boilers) are complex and are produced from components that require a large
number of operations on machines with high capital and operating cost. Different control
approaches should be applied to the resources with high and low utilisation [18, 21]. A final
product requires a number of assemblies, subassemblies, parts and components, in each of
which a sequence of operations to be performed on multiple machines is specified. There are
many operation and assembly dependency relationships. There are also multiple finite
capacity resource constraints and the performance objectives may vary for different product
families. Finally, the nature of production leads to large variations in product mix. Feasible
schedules must correctly sequence operations and satisfy precedence constraints, and
assembly relationships.

There has been limited research relating to the multi-stage multi-machine multi-product
scheduling (MMMS) problem. A Genetic Algorithm based scheduling tool has been
developed for solving the MMMS problem [22], and the schedules obtained from the GA
were reported to outperform those used by a collaborating company. Pongcharoen et al. [23]
investigated the significance of the proposed repair process and found the best values for GA
parameters using regression analysis. However, the computational experiments conducted in
these papers ignored the genetic operations. The repair process used consisted of four
steps: i) operation precedence adjustment; ii) part precedence adjustment; iii) timing
assignment; and iv) deadlock adjustment. The part precedence adjustment has been designed
to satisfy product structure constraints i.e. an assembly requires all its subassemblies and
components to be complete before it can be produced. A deadlock situation may arise when
there is a conflict between operation and part precedence constraints for different items that
both require a common resource.


3. FIREFLY ALGORITHM FOR SCHEDULING
The Firefly Algorithm (FA) was recently introduced by Yang [12], who was inspired by
firefly behaviour. Fireflies are social insects that produce short, rhythmic flashes for
communicating with mating partners and attracting potential prey. Since the light intensity
decreases as the distance increases, the flashing light is visible to others nearby only within
a limited distance. The development of the firefly-inspired algorithm was based on three
idealised rules: i) artificial fireflies are unisex, so that sex is not an issue for attraction; ii)
attractiveness is proportional to flashing brightness, so that the most attractive firefly is the
brightest one, towards which its neighbours move; when there is no brighter firefly, it moves
freely in any direction; and iii) the brightness of the flashing light can be considered as the
objective function to be optimised.
The main steps of the Firefly Algorithm start from initialising a swarm of fireflies, each of
which is assigned a flashing light intensity (usually based on the objective function of the
problem considered). During the loop of pairwise comparisons of light intensity, a firefly
with lower light intensity moves toward a brighter one, the moving distance depending on
the attractiveness. After moving, the new firefly is evaluated and its light intensity updated,
and the best-so-far solution is recorded. The pairwise comparison process is repeated until
the termination criteria are satisfied; finally, the best-so-far solution is visualised. The
pseudo code of the Firefly Algorithm (FA) applied to
solve the production scheduling problem is shown in Figure 1. The FA based scheduling
program was coded in modular style using a general purpose programming language called
TCL/TK. The scheduling program developed can be categorised into three phases: i) input
phase, in which products information and its manufacturing data including machining time,
part code, product structure and resource were uploaded into the program; ii) scheduling
phase, where the proposed algorithms were used to generate and evaluate schedules
constrained by precedence relationships and finite resource capacity; and iii) output phase
including information on the best production schedule found and its penalty cost. Graphic
user interface was considered during the development of the program to allow users to
manipulate data, set parameters and choose outputs from the program.



Figure 1. Pseudo code of the Firefly Algorithm, adapted from [12].

Objective function f(x), x = (x1, ..., xd)^T
Generate initial population of fireflies xi (i = 1, 2, ..., n)
Light intensity Ii at xi is determined by f(xi)
Define light absorption coefficient γ
While t < MaxGeneration (G)
  For i = 1 : n (all n fireflies)
    For j = 1 : i (all n fireflies)
      If (Ij > Ii), move firefly i towards j in d dimensions; End if
      Attractiveness varies with distance r via exp[−γr]
      Evaluate new solutions and update light intensity
    End for j
  End for i
  Rank the fireflies and find the current best
End while
Postprocess results and visualisation
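To make the pseudo code concrete, below is a minimal Python sketch of the FA on a continuous test function (our illustration; the paper's TCL/TK scheduling encoding and repair process are not reproduced). Parameter names follow Table 1, with attractiveness β = β0·exp(−γr²) as in [12, 16]; the objective function and search bounds are placeholders.

import numpy as np

def firefly(f, dim=2, n=25, G=100, gamma=1.0, alpha=0.5, beta0=1.0, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.uniform(-5.0, 5.0, size=(n, dim))    # initial swarm of fireflies
    light = np.array([-f(xi) for xi in x])       # light intensity = -cost (brighter = better)
    for _ in range(G):                           # while t < MaxGeneration
        for i in range(n):
            for j in range(n):
                if light[j] > light[i]:          # move firefly i towards brighter j
                    r = np.linalg.norm(x[i] - x[j])
                    beta = beta0 * np.exp(-gamma * r ** 2)   # attractiveness
                    x[i] += beta * (x[j] - x[i]) + alpha * (rng.random(dim) - 0.5)
                    light[i] = -f(x[i])          # evaluate and update light intensity
    best = int(np.argmax(light))                 # rank fireflies, return current best
    return x[best], -light[best]

# n = 25, G = 100 gives the n*G = 2,500 combination of Table 1.
x_best, cost = firefly(lambda v: float(np.sum(v ** 2)))   # minimise sum of squares
print(x_best, cost)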
In this work, scheduling data including resource profiles, product information, manufacturing
and operational data as well as customer due dates, obtained from a collaborating company
engaged in the make/engineer-to-order capital goods industry, were used as a case study. The
problem considered consisted of two products, each with four levels of product structure.
Manufacturing both products involves a combined requirement of 118 machining operations
and 17 assembly operations performed on 17 non-identical machines (resources).


4. EXPERIMENTAL DESIGN AND ANALYSIS
This section presents the design and analysis of the computational experiments conducted
using the Firefly Algorithm (FA) to solve a production scheduling problem obtained from a
collaborating company engaged in the capital goods industry. The experiment aimed to
systematically investigate the appropriate setting of the FA parameters. With several FA
parameters, each considered at three levels, an experiment without a proper design would
require a large number of runs, which is time consuming and demands large computing
resources. To overcome these difficulties, a one-ninth Fractional Factorial Experimental
Design (3^(k−2)) [24] was adopted in this work. The proposed design reduced the number of
computational runs by 88.88% compared with the Full Factorial Experimental Design.

Table 1. Firefly Algorithm parameters and the levels considered.

Factor                               Levels   Low (−1)   Medium (0)   High (1)
Combination of n×G                   3        25×100     50×50        100×25
Light absorption coefficient (γ)     3        0.1        5            10
Randomisation parameter (α)          3        0          0.5          1
Maximum attractiveness value (β0)    3        0          0.5          1

Table 1 shows the FA parameters and the levels considered in this work. The first parameter
is the combination of the number of fireflies (n) and the maximum number of generations
(G). This combination governs the amount of search in the solution space conducted by the
FA. Higher values of both parameters increase the probability of finding the best solutions
but require longer computational time; if there were no timing constraints or limits on
computing resources, both values would be set as high as possible. In this work the
experiment was based on a timing-constraint scenario, and the amount of search (the
combination n×G) for this problem instance was predefined at 2,500. The second factor is
the light absorption coefficient (γ), typically varied from 0 to 10 [16]. The last two
parameters are the randomisation parameter (α) and the maximum attractiveness value (β0)
[12], both uniformly distributed between zero and one.
Table 2. Analysis of variance (ANOVA) on the FA parameters.

Source   DF   Sum of Square   Mean Square   F       P
n×G      2    1083677778      352550000     26.36   0.000
γ        2    82977778        40913636      3.06    0.059
β0       2    352553472       23513542      1.76    0.187
α        2    339702083       169851042     12.70   0.000
Error    36   481500000       13375000
Total    44   2340411111

The proposed design was computationally experimented with five replications, each
conducted using an identical random seed number. The computational results obtained from
the 45 (3^(4−2) × 5) runs were analysed using a general linear model form of analysis of variance
(ANOVA). Table 2 shows the ANOVA table consisting of Source of Variation, Degrees of
Freedom (DF), Sum of Squares (SS), Mean Square (MS), F and P values. A factor with a
value of P ≤ 0.05 was considered statistically significant at the 95% confidence level.
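For readers who wish to reproduce this kind of analysis, the following is an illustrative Python sketch (not the authors' tooling, which is unspecified) of a general linear model ANOVA over the four factors of Table 1 using statsmodels; the factor levels and response values below are randomly generated placeholders, not the paper's data.

import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

rng = np.random.default_rng(1)
n = 45                                             # 3^(4-2) design x 5 replications
df = pd.DataFrame({
    "nG":    rng.choice(["25x100", "50x50", "100x25"], n),
    "gamma": rng.choice([0.1, 5, 10], n),
    "alpha": rng.choice([0.0, 0.5, 1.0], n),
    "beta0": rng.choice([0.0, 0.5, 1.0], n),
})
df["cost"] = 185000 + rng.normal(0, 3000, n)       # placeholder penalty costs

# General linear model with each parameter treated as a categorical factor.
model = ols("cost ~ C(nG) + C(gamma) + C(alpha) + C(beta0)", data=df).fit()
print(sm.stats.anova_lm(model))                    # DF, SS, MS, F and P per factor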

It can be seen that the combination n×G and the randomisation parameter (α) were
statistically significant, whilst the light absorption coefficient (γ) was almost significant.
These factors affect the performance of the proposed algorithm. Only one parameter, the
maximum attractiveness value (β0), was found insignificant over the range considered;
however, it is not sensible to discard a parameter with a P value below 0.2 in screening
experiments with a low power of test. The main effect plots of the FA parameter levels
against the computational results obtained from the FA-based scheduling program are
illustrated in Figure 2. The combination of the number of fireflies and the maximum number
of generations (n×G) has a large impact on the total penalty cost of the resulting schedule.
With the limited search budget of 2,500 candidate solutions, the best result was achieved
when the n×G combination was set to 100×25, indicating that a higher number of fireflies is
preferable to a higher number of generations. The other significant factor, the randomisation
parameter (α), performed best at a value of 1. The best setting of the light absorption
coefficient (γ) was found at a value of 5. Finally, the maximum attractiveness value (β0) is
suggested to be set to one.
[Main effects plot (data means) for the response: mean total penalty cost (approximately 186,000-198,000) against the coded levels (−1, 0, 1) of nG, gamma, beta and alpha.]

Figure 2. Main effect plots on the FA parameters.


To compare the performance of the best-so-far schedules produced by the FA-based
scheduling program with and without the appropriate parameter setting described above, the
average penalty costs obtained in both cases were analysed. The average total penalty cost of
the schedules obtained with the optimised FA setting was 183,900 Baht, whilst the average
without the optimised setting was 192,063 Baht: an improvement of 4.25%. This emphasises
that the FA performance can be improved significantly by adopting the optimised parameter
setting. Moreover, the standard deviation of the schedules obtained with the optimised
setting was also lower than that of the non-optimised setting.



5. CONCLUSION
Performance improvement of nature-inspired optimisation algorithms has mostly been
pursued through modification and hybridisation. Another route is to set appropriate values
for each parameter so that the algorithm performs at its best configuration. This paper
demonstrated the use of statistical tools for investigating the appropriate parameter setting of
the Firefly Algorithm, applied to solve a scheduling problem in a multi-product, multi-stage,
multi-machine environment. The proposed algorithm takes into account the Just-in-Time
philosophy by aiming to minimise the combined earliness and tardiness penalty costs. The
design adopted reduced the number of computational runs by 88.88% compared with the
common full factorial design. The statistical analysis of the experimental results indicated
that the quality of the solutions obtained from the proposed algorithm can be improved
substantially by adopting the appropriate parameter setting identified by the statistical tools.
It was also found that the variance of the results with the optimised parameter setting was
lower than that of the non-optimised setting.


ACKNOWLEDGMENTS
The corresponding author would like to acknowledge the Naresuan University as the results
reported in this work was part of the research project supported by the Naresuan University
Research Fund under the grant number EN-AR-053/2552.


REFERENCES
1. Engin, O. and Doyen, A., G. U. J. of Sci., 2004, 17(1), 71-84.
2. Pongcharoen, P., Chainate, W., and Pongcharoen, S., LNCS, 2008, 5132, 220-31.
3. Kirkpatrick, S., Gelatt, C.D., and Vecchi, M.P., Science, 1983, 220(4598), 671-9.
4. Glover, F., INFORMS Journal on Computing, 1989, 1(3), 190-206.
5. Hopfield, J.J. and Tank, D.W., Biological Cybernetics, 1985, 52(3), 141-52.
6. Holland, J., Adaptation in Natural and Artificial Systems, Michigan Press, Ann Arbor,
1975.
7. Eusuff, M.M. and Lansey, K.E., J. of Water Res. Plan. and Manag., 2003, 129(3), 210-25.
8. Kennedy, J. and Eberhart, R.C., Swarm Intelligence, Kaufmann Publishers, CA, 2001.
9. Dorigo, M. and Stützle, T., Ant Colony Optimization, MIT Press, Massachusetts, 2004.
10. Dasgupta, D., Artificial Immune Systems and Their Applications, Springer, Heidelberg,
1998.
11. Karaboga, D. and Basturk, B., Foundations of Fuzzy Logic and Soft Comp., 2007, 789-98.
12. Yang, X.-S., Stochastic Algorithms: Foundations and Applications. 2009. p. 169-78.
13. Aytug, H., Knouja, M., and Vergara, F.E., Int. J. of Prod. Res., 2003, 41(17), 3955-4009.
14. Dorigo, M. and Blum, C., Theoretical Computer Science, 2005, 344(2-3), 243-78.
15. Chaudhry, S.S. and Luo, W., Int. J. of Prod. Res., 2005, 43(19), 4083-101.
16. Łukasik, S. and Żak, S., Semantic Web, Social Net. and Multiagent Systems, 2009, 97-106.
17. Baker, K.R., Introduction to Sequencing and Scheduling, Wiley and Sons, New York,
1974.
18. Hicks, C. and Pongcharoen, P., Int. J. of Prod. Econ., 2006, 104(1), 154-63.
19. King, J.R. and Spackis, A.S., Int.J. of Phys. Distr. and Mat. Manag., 1980, 10, 105-32.
20. Pongcharoen, P., Khadwilard, A., and Hicks, C., Ind. Eng. & Manag.Sys., 2008, 7(3),
204-13.
21. Hicks, C. and Pongcharoen, P., Int. J. of Tech. Manag., 2009, 48(2), 202-18.
22. Pongcharoen, P., Hicks, C., and Braiden, P.M., Eur. J. of Oper. Res., 2004, 152(1), 215-
25.
23. Pongcharoen, P., Hicks, C., and Braiden, P.M., Int. J. of Prod. Econ., 2002, 78(3), 311-22.
24. Montgomery, D.C., Design and Analysis of Experiments, John Wiley & Sons, NY, 2001.
G00038
Decision Support System for Prediction Air Temperature in
Northern Part of Thailand by using Neural Network

Wattana Kanbua^1,C and Charn Khetchaturat^2
^1 Marine Meteorological Center, Thai Meteorological Department, Bangkok 10260, Thailand
^2 Department of Mathematics, Faculty of Science, Kasetsart University, Bangkok, Thailand
^C E-mail: wattkan@gmail.com; Fax: 023669375; Tel. 023994561



ABSTRACT
This paper presents an alternative approach that uses an artificial neural network to
simulate the critical-level dynamics of air temperature. The algorithm was developed in
a decision support system environment in order to enable users to process the data. The
decision support system is found to be useful due to its interactive nature, flexibility of
approach and evolving graphical features, and can be adopted for any similar situation
where a critical level must be predicted. The main data processing includes
meteorological satellite image data, numerical weather prediction products such as
relative vorticity at 500 hPa, automatic weather station selection, input generation,
lead-time selection/generation, and length of prediction. The program enables users to
process the critical level, to train/test the model using various inputs and to visualize
results. The running results indicate that the decision support system applied to
critical-level air temperature warning has reached encouraging results for the risk area
under examination. The comparison of the model predictions with the observed data
was satisfactory: the model is able to forecast the critical level up to 24 hours in
advance with reasonable air temperature prediction accuracy, which can be improved
by using quantitative air temperature data. The benefits depend on the effective use of
the forecast information for air temperature monitoring, the operation of protective
measures and the warning of people and livestock. This requires appropriate decision
information delivered in a timely manner to those who need it, where they need it, in a
form that is easy to understand.

Keywords: Decision Support System; Neural Network; Critical Level; Automatic
Weather Station; Air Temperature.



1. INTRODUCTION
Cold weather continues to affect residents in the northern and north-eastern parts of
Thailand, and this situation is expected to continue through the coming year. People
should heed cold weather warnings from the Thai Meteorological Department in order
to keep their bodies warm during the continued low temperatures affecting the area.
The winter season runs from December to February. A rather active northeast monsoon
prevailed over mainland China and Thailand: a cold air mass, or high-pressure system,
moved southward to lie across the northern and north-eastern parts of the country,
causing substantial drops in air temperature. The minimum 24-hour air temperature on
record was 10.0 degrees Celsius. The cold air mass caused damage; illness was reported
in several areas, especially in the north-east and north and particularly in mountain-top
areas. Cold spells triggered by days of falling air temperature have made people sick
and left them stranded in north-eastern and northern Thailand, and cold weather in
Thailand has caused illness and road accidents. Measures to reduce the impact of such
cold weather include engineering structures and an air temperature warning system,
which can immediately inform people living in the
area to take precautions before the cold air mass reaches their villages. With this system,
people can judge when the cold air mass will arrive and how much time they have to prepare
clothes and blankets to keep warm in time. With new automatic weather station technology
and modern communications such as GPRS, decision support systems for air temperature
warning are becoming more common and more reliable.



2. THEORY AND RELATED WORKS
In this paper we use an Artificial Neural Network (ANN), a parallel and dynamic system
of highly interconnected, interacting parts based on neurobiological models. In the nervous
system, individual but highly interconnected nerve cells called neurons typically receive
information or stimuli from the external environment. Similar to its biological counterpart,
an ANN is designed to emulate the human pattern-recognition function through parallel
processing of multiple inputs; that is, ANNs can scan data for patterns and can be used to
construct non-linear models. Multi-Layer Perceptron ANNs have become widespread in
recent years. Three-layer networks with a sufficient number of hidden nodes are usually
applied, owing to the continuity of the relevant function. Every network contains input and
output nodes equal in number to the input and output variables, plus an assumed number of
hidden nodes; there is no effective rule for estimating the number of hidden nodes.


[Diagram: input layer, hidden layers and output layer of a fully connected multi-layer perceptron.]

Figure 1. Multi-Layer Perceptron Artificial Neural Network scheme

A multi-layer feed-forward neural network can explain complex data, and more layers can
be used as the problem becomes more complex. The training/testing process determines the
combination of link weights; one popular method for this is backpropagation.



[Diagram: backpropagation network with inputs x1..xn, hidden-layer outputs Y1..Yh, outputs Z1..Zo, weights w_ij and w_jk, and error signals e propagated back through the layers.]


Figure 2. Backpropagation Artificial Neural Network scheme

The backpropagation algorithm works much as its name suggests: after propagating an
input through the network, the error is calculated, and the error is then propagated back
through the network while the weights are adjusted to make the error smaller. We explain
the algorithm here only for fully connected ANNs, but the theory is the same for sparsely
connected ANNs. Although we want to minimize the mean square error over all the training
data, the most efficient way of doing this with the backpropagation algorithm is to train on
the data sequentially, one input at a time, instead of training on the combined data. This
means that the order in which the data are presented matters, but it also provides a very
efficient way of avoiding getting stuck in a local minimum. We now explain the
backpropagation algorithm in sufficient detail to allow an implementation from this
explanation.

First the input is propagated through the ANN to the output. After this the error e_k on a
single output neuron k can be calculated as:

e_k = d_k − y_k    (1)

where y_k is the calculated output and d_k is the desired output of neuron k. This error value
is used to calculate a δ_k value, which is again used for adjusting the weights. The δ_k value
is calculated by:

δ_k = e_k g′(y_k)    (2)

where g′ is the derived activation function. The need for calculating the derived activation
function is why we expressed the need for a differentiable activation function.

When the δ_k value is calculated, we can calculate the δ_j values for the preceding layers.
The δ_j values of the previous layer are calculated from the δ_k values of this layer by the
following equation:

δ_j = η g′(y_j) Σ_{k=0}^{K} δ_k w_jk    (3)

where K is the number of neurons in this layer and η is the learning rate parameter, which
determines how much the weight should be adjusted. The more advanced gradient descent
algorithms do not use a fixed learning rate, but a set of more advanced parameters that make
a more qualified guess at how much the weight should be adjusted.

Using these values, the Δw values that the weights should be adjusted by can be
calculated by:

Δw_jk = δ_j y_k    (4)

The Δw_jk value is used to adjust the weight w_jk by w_jk = w_jk + Δw_jk, and the
backpropagation algorithm moves on to the next input and adjusts the weights according to
the output. This process goes on until a certain stop criteria is reached. The stop criteria is
typically determined by measuring the mean square error of the training data while training
with the data, when this mean square error reaches a certain limit, the training is stopped.
More advanced stopping criteria involving both training and testing data are also used.
If the ANN is fully connected, the running time of algorithms on the ANN is
dominated by the operations executed for each connection. Backpropagation is dominated
by the calculation of δ_j and the adjustment of w_jk, since these are the only calculations
executed for each connection. Calculating δ_j requires one multiplication and one addition
per connection, and adjusting w_jk likewise requires one multiplication and one addition.
The total running time is therefore dominated by two multiplications and two additions per
connection (three of each if the addition and multiplication used in the forward propagation
are also counted). This is only a small amount of work for each connection, which shows
how important it is for the data needed in these operations to be easily accessible.
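As a concrete illustration of equations (1)-(4), the following is a minimal Python sketch (ours, not the authors' code) of one backpropagation training step for a single hidden layer with sigmoid activation, for which g′(y) = y(1 − y). For simplicity the learning rate η is applied in both weight updates, a common textbook arrangement; the data and layer sizes are placeholders.

import numpy as np

def g(a):
    # sigmoid activation; its derivative is g'(y) = y * (1 - y)
    return 1.0 / (1.0 + np.exp(-a))

rng = np.random.default_rng(0)
W1 = rng.normal(0.0, 0.5, (3, 4))   # input -> hidden weights (w_ij)
W2 = rng.normal(0.0, 0.5, (4, 1))   # hidden -> output weights (w_jk)
eta = 0.1                           # learning rate

def train_step(x, d):
    global W1, W2
    y_j = g(x @ W1)                          # forward pass: hidden outputs
    y_k = g(y_j @ W2)                        # forward pass: network outputs
    e_k = d - y_k                            # (1) error on each output neuron
    delta_k = e_k * y_k * (1 - y_k)          # (2) delta_k = e_k g'(y_k)
    delta_j = y_j * (1 - y_j) * (W2 @ delta_k)   # (3) deltas propagated back
    W2 += eta * np.outer(y_j, delta_k)       # (4) adjust the w_jk weights
    W1 += eta * np.outer(x, delta_j)         #     adjust w_ij the same way
    return float(np.mean(e_k ** 2))          # mean square error, the stop criterion

x, d = np.array([0.2, 0.7, 0.1]), np.array([0.5])
for _ in range(1000):
    mse = train_step(x, d)
print(mse)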


3. EXPERIMENTAL
We studied areas where Automatic Weather Stations (AWS) were installed. These AWS
belong to the Friends In Need (Of Pa) Volunteers Foundation and cover Uttaradit, Sukhothai,
Phrae, Chiang Mai and Petchaboon. The following procedure was applied: 1) select study
areas where cold weather frequently occurs; 2) design the system prototype for data
transmission and computer modeling of air temperature, with the system able to send and
receive data from the automatic weather stations every 5 minutes; 3) develop the early
warning network between the web server and local users via the internet; and 4) develop a
decision support system for air temperature warning with public participation. The new
AWS network in the warning alert system was installed by the Friends In Need (Of Pa)
Volunteers Foundation for the purpose of monitoring air temperature. Some of these stations
are equipped with meteorological sensors for air temperature, relative humidity, wind speed
and direction, and solar radiation; this information is useful for producing the meteorological
forecasts that form part of the material employed in the warning alert system. The network
works fully automatically or with observer input. Monitoring and administration of the
stations, data communication, data storage, alarm handling and processing of the
measurements are also discussed; as the system is part of the national warning alert system,
a brief explanation of the inter-institutional system is given.


Figure 3. Artificial Neural Network Structure.

The ANN used was a three-layer network. The input nodes contain month, date, time, air
temperature, humidity, cloud amount, wind direction, wind speed, rainfall, solar radiation and
relative vorticity at 500 hPa. The learning process uses the backpropagation method, in which
the error value is propagated back to adjust the weights, and uses the sigmoid function as the
activation function.
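
A minimal sketch of the forward pass of the described network (our illustration; the 11 inputs follow the paper, while the hidden-layer size of 8 and the random weights are assumptions):

import math, random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(x, w_hidden, w_out):
    # hidden-layer and output activations both use the sigmoid function
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]
    return sigmoid(sum(w * h for w, h in zip(w_out, hidden)))

x = [random.random() for _ in range(11)]  # month, date, time, ..., vorticity at 500 hPa
w_hidden = [[random.uniform(-1, 1) for _ in range(11)] for _ in range(8)]
w_out = [random.uniform(-1, 1) for _ in range(8)]
print(forward(x, w_hidden, w_out))        # predicted (scaled) air temperature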

4. RESULTS AND DISCUSSION
The air temperature warning system, designed and developed based on the Global System
for Mobile communications (GSM)/General Packet Radio Service (GPRS), was installed at
Uttaradit, Sukhothai, Phrae, Chiang Mai and Petchaboon. GSM provides voice services,
Short Message Service (SMS), circuit-switched data (CSD) and High Speed Circuit Switched
Data (HSCSD). GPRS creates a packet-switched overlay for the GSM network providing IP
connectivity to the Internet and intranets. The result is a cellular technology capable of
supporting a very broad range of services. The Automatic Weather Stations were installed at
Uttaradit, Sukhothai, Phrae, Chiangmai and Petchaboon. Several elementary meteorological
sensors, such as air temperature, relative humidity, wind speed and direction, rain gauge and
solar radiation, were installed at the AWSs. The data were designed to be transmitted to the
main computer server in Bangkok via the GPRS system every 5 minutes.



Figure 4. Air temperature prediction by using ANN at Chiang Mai station.


Figure 5. A comparison of computed and observed air temperature at Chiang Mai station.





Figure 6. Air temperature prediction by using ANN at Petchaboon station.




Figure 7. A comparison of computed and observed air temperature at Petchaboon.


Figure 8. Air temperature prediction by using ANN at Uttaradit station.


Figure 9. A comparison of computed and observed air temperature at Uttaradit station.

At Chiang Mai, training took 15,000 epochs with a learning rate of 0.01. Statistical
parameters, namely the root mean square error (RMSE) and the efficiency index (EI), were
used to check the performance and accuracy of the model; the RMSE is 0.015877 and the EI
is 0.984123. At Petchaboon, training took 60,000 epochs with a learning rate of 0.0095; the
RMSE is 0.00277 and the EI is 0.99723. At Uttaradit, training took 25,000 epochs with a
learning rate of 0.01; the RMSE is 0.00381 and the EI is 0.99619.
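
For reference, the RMSE is conventionally defined as below; since each reported RMSE-EI pair sums exactly to one, the EI here appears to be computed as 1 - RMSE on the normalized temperatures (our reading; the paper does not state the formula explicitly):

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\big(\hat{T}_i - T_i\big)^2}, \qquad \mathrm{EI} = 1 - \mathrm{RMSE}$$

where $\hat{T}_i$ and $T_i$ are the predicted and observed (normalized) air temperatures and N is the number of samples.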
Results of the simulations obtained from the ANN are presented in Figures 4-9. The
verification statistics, RMSE and EI, show that the air temperature was simulated very
satisfactorily where short-term predictions of less than 24 h are concerned. In addition, the
predictions with a lead time of 24 h still exhibit fairly good results. The RMSEs are less than
or equal to 0.01158, and the EIs of these forecasts are higher than or equal to 0.98.


5. CONCLUSION
The introduction of this system gives the at-risk areas more information, in type and
detail, to be an effective tool for providing advance notice of potential cold weather, so that
people can be warned in an orderly manner to prepare clothes or keep their bodies warm when
a cold air mass or high-pressure system takes place prior to the onset of cold weather;
sustaining this will require a strong effort to assure the long-term sustainability of the system.
The research presented in this paper considered the effects of air temperature changes on the
ANN method used to predict air temperature over Uttaradit, Chiangmai and Petchaboon as
representatives of the northern region, including larger training set sizes, seasonal input terms,
increased lag lengths, and varying the size of the network. Increasing the size of the training
set did not reduce forecast errors. Similar improvements resulted from extending the duration
of historical data in the input vector from 1 hour to 24 hours. Future work may compare the
accuracy of such models to season-specific models such as those created in this research.
Model performance decreased for lag lengths greater than 24 hours. Feature extraction
methods may be able to reduce the size of the input vector, reducing network degrees of
freedom and improving performance. The results confirm that ANNs have the potential for
successful application to the problem of surface air temperature estimation. The purpose of
this study was to study the influence of climate change on air temperature in the northern part
of Thailand. We have used a soft computing approach, namely the ANN, and the
experimental results are very good. We can use the ANN technique to predict air temperature
24 hours ahead, and we will attempt to predict air temperature 1 year ahead.


REFERENCES
1. B.B. Nasution, A.I. Khan, A Hierarchical Graph Neuron Scheme for Real-Time Pattern
Recognition, IEEE Transactions on Neural Networks, vol 19(2), 212-229, Feb. 2008
2. Bryson, A. E., & Ho, Y. C. (1969). Applied optimal control. Blaisdell.
3. Derome J., G. Brunet, A. Plante, N. Gagnon, G. J. Boer, F. W. Zwiers, S. J. Lambert, J.
Sheng, and H. Ritchie, 2001: Seasonal Predictions Based on Two Dynamical
Models. Atmos. Ocean., 39, 485-501.
4. Hassoun, M. H. (1995), Fundamentals of Artificial Neural Networks. The MIT Press.
5. Hertz, J., Krogh, A., and Palmer, R. G. (1991), Introduction to The Theory of Neural
Computing. Addison-Wesley Publishing Company.
6. M. Bell, Dr. A. Giannini, E. Grover-Kopec, Dr. B. Lyon, C. Ropelewski, Dr. A. Seth, IRI
Climate Digest January 2005, Climate Impacts December, Contributions to this page
were made by IRI researchers.
7. Minutes: GPRS Infrastructure IP Addressing; Working Party Meeting #1 (held on 19th
April 2000) (http://www.ripe.net/ripe/wg/lir/gprs/007_0047_Minutes_GPRS_infra_addressing.html)
8. Mittra, S.S., Decision support systems: Tools and techniques. John Wiley & Sons, New
York, USA, 1986
9. Siegelmann, H.T.; Sontag, E.D. (1991). "Turing computability with neural nets". Appl.
Math. Lett. 4 (6): 77-80.


ACKNOWLEDGMENTS
I would like to express my sincere gratitude and deep appreciation to Chair Professor Pichit
Suvanprakorn for his guidance, invaluable advice, supervision and encouragement throughout
this research which enabled me to complete this research successfully. He was never lacking
in kindness and support. I am particularly indebted to the Friends In Need (Of Pa)
Volunteers Foundation, Thai Red Cross Society for the financial support which has enabled
me to undertake my research.

G00039
Probabilistic Knowledge Discovery from Medical Databases

Nittaya Kerdprasop and Kittisak Kerdprasop
Data Engineering and Knowledge Discovery (DEKD) Research Unit,
School of Computer Engineering, Suranaree University of Technology,
111 University Avenue, Nakhon Ratchasima 30000 Thailand
E-mail: nittaya@sut.ac.th; Fax: 044-224602; Tel. 044-224432


ABSTRACT
Medical knowledge discovery is an emerging area within the data-mining field that
attracts many new researchers from diverse disciplines. During the past decade
numerous learning techniques have been employed to discover useful knowledge from
health examination and clinical data. Unlike past efforts that simply concentrated on the
deployment of well-known learning techniques on medical data sets, our new approach
expands the learning algorithm to deal with uncertain knowledge. We devise an
algorithm to generate probabilistic knowledge from the induced decision tree. The
implementation of the proposed algorithm is demonstrated via second-order predicates
and a meta-programming approach. Experimental results on several medical domains
emphasize the simple form of knowledge representation and the potential of
incorporating learning results as background knowledge in the knowledge-base system.

Keywords: Medical knowledge mining, Probabilistic knowledge induction.



1. INTRODUCTION
The automated learning of models from patient data and biomedical records has become
more and more essential with the extensive computerization of the healthcare industry and the
significant advancement in genomic and proteomic technologies during the last decade.
Medical and clinical databases have been created and are constantly growing at an exponential
rate. The development of automatic and intelligent data analysis tools is an obvious solution
to the data-flooding problem in medical domains [4], [12], [13].

Knowledge extraction from huge amounts of health data is expected to ease the
medical decision-making process. The ultimate goal of knowledge extraction is to generate
the most accurate and useful knowledge and represent it in an understandable format. Such a
goal is, however, difficult to accomplish due to the learning complexity of knowledge
induction methods and the nonconformity of the database contents. Most of the time,
knowledge discovery from medical databases results in a report of a large amount of irrelevant
knowledge [6], [9]. We thus focus our study on this issue and devise a technique to extract a
limited amount of knowledge that is most likely relevant to the specific domain.

In medical knowledge mining, interpretability of results is an important feature of the data
analysis tool. Medical practitioners need a system that can produce accurate results in an
understandable form. Therefore, knowledge represented as rules has been widely used for
knowledge discovery in medical applications. Classification rule induction is an approach
commonly used for building diagnosis models [2], [5], [7], [14]. Association rule mining is an
induction method applied for exploring patterns that frequently occur in medical data [3],
[10]. Classification and association mining methods are two major techniques for rule
generation that work successfully in many domains. Nevertheless, in medical applications
these learning techniques tend to generate a lot of rules. Too many rules, some redundant
and uninteresting, cause problems for medical practitioners because a truly relevant one
can easily be overlooked.
We thus propose a rule induction method based on the decision-tree structure that adopts
the probability concept to select the most probably applicable rules. The outline of this paper
is as follows. After the introductory section, we present in Section 2 the general framework
and the algorithms of our proposed method. The implementation and experimental results are
illustrated in Section 3. Section 4 concludes this paper with a discussion of further research
directions.


2. FRAMEWORK AND METHOD FOR PROBABILISTIC
KNOWLEDGE INDUCTION
Our knowledge induction system (Figure 1) is based on the decision-tree induction method
[11]. Decision tree induction is a popular method for mining knowledge from data and
representing the result as a classifier tree. Its popularity is due to the fact that a mining result
in the form of a decision tree is interpretable, which is of more concern to casual users than a
sophisticated method that lacks understandability. A decision tree is a hierarchical structure
in which each node contains a decision attribute, with node branches corresponding to the
different attribute values of the decision node. The goal of building a decision tree is to
partition data with mixed classes down the tree until each leaf node contains data of a pure
class.
Figure 1. A general framework for the tree-based probabilistic knowledge induction.

In our system framework, we increase the interpretability of the knowledge mining results
by transforming the decision tree structure into a small set of decision rules. After a complete
decision tree has been created, we calculate the probability of case occurrence associated
with each leaf node. In the decision rule generation phase, these probability values are
sorted, and rules within the top-ranking part are displayed to assist the medical practitioner in
making decisions. In the designed framework, the probabilistic knowledge induction system is
composed of four main components: data integration, tree induction, probabilistic-rule
generation, and the knowledge inferring and answering engines. The data integration
component is responsible for collecting data from different sources, cleaning them, and
transforming their format. These data are then used by the tree induction component.

Given the induced tree, the probabilistic-rule generation component traverses each tree
branch to calculate the likelihood of path occurrence. This likelihood is interpreted as the
probability of the event and is associated with the rule generated from the path traversal. The
generated probabilistic rules are then sorted. Rules at the top ranking (specified by the given
minimum probability) are stored in the knowledge base as the probabilistic knowledge and
can be used for recommendations or for answering queries from the medical practitioner.
Algorithms for knowledge induction based on the tree structure (Algorithm 1), probabilistic-rule
generation from the decision tree (Algorithm 2), and probabilistic knowledge inferring to
decide the most probable class of a new case (Algorithm 3) are given in the following.

Algorithm 1 Knowledge induction
Input: a data set formatted as Prolog clauses
Output: a decision tree with node and edge structures
(1) Initialization
(1.1) Clear temporary knowledge base (KB) by removing all information regarding the
predicates node, edge and current_node
(1.2) Set node counter = 0
(1.3) Scan data set to get information about data attributes, positive instances, negative
instances, total data instances
(2) Building tree
(2.1) Increment node counter
(2.2) Repeat steps 2.2.1-2.2.4 until there is no more attributes left for creating decision
attributes
(2.2.1) Compute Info value of each candidate attribute
(2.2.2) Choose the attribute that yields minimum Info to be decision node
(2.2.3) Assert edge and node information into KB
(2.2.4) Split data instances along node branches
(2.3) Repeat steps 2.1 and 2.2 until the lists of positive and negative instances are
empty
(2.4) Output tree structure containing node and edge predicates
Algorithm 2 Probabilistic knowledge generation
Input: a decision tree with node and edge structures, and a probability threshold
Output: a set of probabilistic rules ranking from the highest probability
(1) Traverse tree from a root node to each leaf node
(1.1) Collect edge information and count number of data instances
(1.2) Compute probability as a proportion
(number of instances at leaf node) / (total data instances in a data set)
(1.3) Assert a rule containing a triplet (attribute-value pair, class, probability value) into
KB
(2) Sort rules in the KB in descending order according to the rules' probability
(3) Remove rules that have probability less than the specified threshold
(4) Assert selected rules into the KB and return KB as an output
Algorithm 3 Probabilistic knowledge inferring
Input: a KB containing probabilistic knowledge, and a new case with unknown class
value
Output: a decision on most likely class of the new case
(1) Read all attribute-value pairs appeared in the given case
(2) Compare the pairs with each relevant rule in the KB to get the decision class value
(3) Compute the decision confidence as
(number of matched attribute-value pairs) × (probability of the decision rule)
(4) Output a final decision based on the voting scheme
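
To make the flow of Algorithm 2 concrete, the following Python sketch restates it (our illustration; the paper's actual implementation is the Prolog code of Section 3, and the leaf counts in the example are made up):

def probabilistic_rules(leaves, total_instances, min_prob):
    """leaves: one (conditions, class_label, n_instances) triple per leaf node.
    Returns (probability, conditions, class_label) triples ranked from the
    highest probability, with rules below min_prob removed."""
    rules = [(n / total_instances, cond, cls) for cond, cls, n in leaves]  # step (1.2)
    rules = [r for r in rules if r[0] >= min_prob]                         # step (3)
    return sorted(rules, key=lambda r: r[0], reverse=True)                 # step (2)

# Example with the paper's data set size (86 cases); the two leaves are invented:
leaves = [([("comfort", 10), ("bloodPressure", "high")], "home", 8),
          ([("comfort", 15), ("bpStability", "mod_stable")], "ward", 2)]
print(probabilistic_rules(leaves, 86, 0.02))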

3. IMPLEMENTATION AND EXPERIMENTAL RESULTS
In this section, we present the implementation technique of our proposed tree-based
probabilistic-knowledge induction framework using the logic programming paradigm. The
Prolog code is based on the syntax of SWI-Prolog (www.swi-prolog.org).
Data format. In logic programming, program and data take the same format, i.e. all are in
Prolog clausal form. For the purpose of demonstration, we use the health examination data of
86 post-operative patients. General conditions such as blood pressure and temperature are
observed to determine whether the patient is in good condition and should be sent home
shortly (class=home), or the condition is moderate and the patient should stay at the hospital
ward for further follow-up (class=ward). Even though binary classification is a typical task in
medical domains, the code presented in this section can easily be modified to classify data
with more than two classes. The post-operative data set (downloadable from the UCI
repository [1]) in Prolog clausal form is shown in part below.
attribute(internalTemp, [mid,high,low]).
attribute(surfaceTemp, [mid,high,low]).
attribute(oxygenSaturation,[excellent,good,fair,poor]).
attribute(bloodPressure, [high,mid,low]).
attribute(tempStability, [stable,mod-stable,unstable]).
attribute(coreTempStability,[stable,mod-stable,unstable]).
attribute(bpStability, [stable,mod-stable,unstable]).
attribute(comfort, [5,7,10,15]).
attribute(class, [home,ward]).
instance(1,class=ward,[internalTemp=mid,surfaceTemp=low,oxygenSaturation=excellent,
bloodPressure=mid,tempStability=stable,coreTempStability=stable,
bpStability=stable,comfort=15]).

Main module. The three algorithms explained in the previous section are called by the
main module, which is the top level of our program implementation. The Prolog coding of
the main module is as follows:
main:-init(AllAttr,EdgeList), getnode(N),
create_edge_onelevel(N,AllAttr,EdgeList),
addKnowledge,
write(chooseMinProb),read(Min),
selectRule(Min,Res), maplist(writeln,Res).

The built-in predicate maplist is a second-order predicate [8] provided in the library of
SWI-Prolog; its implementation is declared recursively. The predicates init and getnode in
the main module invoke the initialization process. The predicates assert and retractall
are also second-order predicates, responsible for asserting and removing information in the
knowledge base, respectively. Another second-order predicate, apply, repeatedly asserts
clauses into the knowledge base. The built-in second-order predicate findall searches for
all solutions that satisfy the constrained predicates.
Probabilistic-rule generation. In the main module, the predicates addKnowledge and
selectRule(Min,Res) are invoked to compute the probability along each tree branch,
generate the probabilistic rules, and then select only the rules that could occur at a probability
level higher than the specified threshold. The Prolog coding of this module is as follows:
addKnowledge:- findall([A],pathFromRootToLeaf(A,_),Res),
retractall(_>>_>>_), maplist(apply(assert),Res).
selectRule(V,Res):- findall(N>>X>>Class,(X>>Class>>N,N>=V),Res1),
sort(Res1,Res2), reverse(Res2,Res).
pathFromRootToLeaf(V>>Class>>Num,C):- path(1,V,C),
node(C,Value1-Value2),(Value1=[]; Value2=[]),
(Value1=[]->length(Value2,Numb);
length(Value1,Numb)),
total(Total), Num is Numb/Total,
(Value1=[]->Class=home; Class=ward).

Running results on probabilistic-rule induction. For demonstration purposes, we
show the final result of probabilistic rule induction with a specified minimum threshold of 0.02.
Each result has been formatted as probability >> rule conditions (shown as attribute-value
pairs) >> decision on class value (either home or ward).
0.0930233>>[comfort=10, bloodPressure=high, surfaceTemp=low]>>home
0.0581395>>[comfort=10, bloodPressure=mid, coreTempStability=stable,
internalTemp=mid, bpStability=stable, surfaceTemp=mid,
tempStability=unstable]>>home
0.0465116>>[comfort=10, bloodPressure=high, surfaceTemp=mid,
bpStability=mod_stable]>>home
0.0348837>>[comfort=15, bpStability=unstable, surfaceTemp=mid]>>home
0.0348837>>[comfort=15, bpStability=stable, internalTemp=mid,
tempStability=stable]>>home
0.0348837>>[comfort=10, bloodPressure=mid, coreTempStability=stable,
internalTemp=mid, bpStability=unstable, surfaceTemp=mid,
tempStability=stable]>>home
0.0348837>>[comfort=10, bloodPressure=low]>>home
0.0348837>>[comfort=10, bloodPressure=high, surfaceTemp=mid,
bpStability=stable, oxygenSaturation=excellent]>>home
0.0232558>>[comfort=15, bpStability=unstable, surfaceTemp=high]>>home
0.0232558>>[comfort=15, bpStability=mod_stable]>>home
0.0232558>>[comfort=10, bloodPressure=mid,
coreTempStability=unstable, tempStability=unstable,
bpStability=stable]>>ward
0.0232558>>[comfort=10, bloodPressure=mid, coreTempStability=stable,
internalTemp=mid, bpStability=stable,
surfaceTemp=low]>>home
0.0232558>>[comfort=10, bloodPressure=mid, coreTempStability=stable,
internalTemp=mid, bpStability=mod_stable,
surfaceTemp=high]>>home
0.0232558>>[comfort=10, bloodPressure=mid, coreTempStability=stable,
internalTemp=low, surfaceTemp=low,
tempStability=stable]>>home
0.0232558>>[comfort=10, bloodPressure=mid, coreTempStability=stable,
internalTemp=high]>>home


4. CONCLUSION
Modern healthcare organizations generate huge amounts of electronic data stored in
heterogeneous databases. Data collected by hospitals and clinics are not yet turned into useful
knowledge due to the lack of efficient analysis tools. We thus propose a rapid prototype of
an automatic knowledge-mining tool to induce probabilistic knowledge from medical data.
The induced knowledge is to be integrated into the knowledge base of a medical decision
support system. Thus, in our design the knowledge base will be composed of precise
knowledge as well as a minimal set of induced probabilistic knowledge. Discovered
knowledge can also facilitate the reuse of knowledge bases among decision-support
applications within organizations that own heterogeneous clinical and health databases. A
direct application of a medical probabilistic knowledge base is medical decision-making.
Another, indirect but obvious, application of such knowledge is to pre-process other data sets
by grouping them into focused subsets containing only relevant data instances.
The main contribution of this work is our implementation of knowledge mining engines
based on the concept of higher-order Horn clauses using the Prolog language. Higher-order
programming originally appeared in functional languages, in which functions can be
passed as arguments to other functions and can also be returned from other functions. This
style of programming soon became ubiquitous in several modern programming languages
such as Perl, PHP, and JavaScript. The higher-order style of programming has shown
outstanding benefits in code reuse and a high level of abstraction. This paper illustrates
higher-order programming techniques in SWI-Prolog. The powerful feature of meta-level
programming in Prolog facilitates the reuse of mining results represented as rules, to be
flexibly applied as conditional clauses in other applications.
Plausible extensions of our current work are to add constraints into the knowledge
mining method in order to limit the search space and therefore yield the most relevant and
timely knowledge; and, due to the uniform representation of Prolog's statements in clausal
form, mining from previously mined knowledge should be implementable naturally. The
probabilistic knowledge induction and inferring techniques presented in this paper can be
applied to the development of probabilistic databases. We also plan to extend our system to
work with stream data, which normally occur in modern medical organizations.



REFERENCES
1. Asuncion, A., Newman, D.J.: UCI Machine Learning Repository. University of California, Irvine,
CA (2007) http://archive.ics.uci.edu/ml/
2. Bojarczuk, C.C. et al.: A Constrained-Syntax Genetic Programming System for Discovering
Classification Rules: Application to Medical Data Sets. Artificial Intelligence in Medicine 30, 27--
48 (2004)
3. He, Y., Tang, Y., Zhang, Y., Sunderraman, R.: Adaptive Fuzzy Association Rule Mining for
Effective Decision Support in Biomedical Applications. Int. J. Data Mining and Bioinformatics 1,
1, 3--18 (2006)
4. Kononenko, I.: Machine Learning for Medical Diagnosis: History, State of the Art and Perspective.
Artificial Intelligence in Medicine 1, 89--109 (2001)
5. Kretschmann, E. et al.: Automatic Rule Generation for Protein Annotation with the C4.5 Data
Mining Algorithm Applied on SWISS-PROT. Bioinformatics 17, 10, 920--926 (2001)
6. Li, J., Fu, A., Fahey, P.: Efficient Discovery of Risk Patterns in Medical Data. Artificial Intelligence
in Medicine (2008) doi:10.1016/j.artmed.2008.07.008
7. Mugambi, E. et al.: Polynomial-Fuzzy Decision Tree Structures for Classifying Medical Data.
Knowledge-Based System 17, 2-4, 81--87 (2004)
8. Nadathur, G., Miller, D.: Higher-Order Horn Clauses. J. ACM 37, 777--814 (1990)
9. Ordonez, C.: Comparing Association Rules and Decision Trees for Disease Prediction. In: Proc. Int.
Workshop on Healthcare Information and Knowledge Management, pp.17--24 (2006)
10. Ohsaki, M., Sato, Y., Yokoi, H., Yamaguchi, T.: A Rule Discovery Support System for Sequential
Medical Data in the Case Study of a Chronic Hepatitis Dataset. In: Proc. ECML/PKDD-2003
Discovery Challenge Workshop, http://www.lisp.vse.cz/challenge/ecml pkdd2003
11. Quinlan, J.R.: Induction of Decision Trees. Machine Learning 1, 81--106 (1986)
12. Roddick, J.F. et al.: Exploratory Medical Knowledge Discovery: Experiences and Issues. ACM
SIGKDD Explorations Newsletter 5, 1, 94--99 (2003)
13. Shillabeer, A., Roddick, J.F.: Establishing a Lineage for Medical Knowledge Discovery. In: Proc.
6th Australasian Conf. on Data Mining and Analytics, pp.29--37 (2007)
14. Zhou, Z., Jiang, Y.: Medical Diagnosis with C4.5 Rule Preceded by Artificial Neural Network
Ensemble. IEEE Transactions on Information Technology in Biomedicine 1, 37--42 (2003)


ACKNOWLEDGMENTS
This research has been funded by grants from the National Research Council of Thailand
(NRCT) and the second author is supported by the Thailand Research Fund (TRF, grant
number RMU5080026). DEKD has been fully supported by Suranaree University of
Technology.
G00041
Classify Freshwater Fish Using Morphometric Analysis
and Image Processing Technique

P. Khamchuay, K. Jaroensutasinee, and M. Jaroensutasinee
Center of Excellence for Ecoinformatics and Computational Science Undergraduate Program,
School of Science, Walailak University, 222, Thasala District, Nakhonsithammarat 80161, Thailand
E-mail: p.khamchuay@gmail.com, krisanadej@gmail.com, jmullica@gmail.com;
Fax: 66 0 7567 2004; Tel. 084-9922918



ABSTRACT
This study aims at applying the morphometric method and image processing techniques
to identify and group freshwater fish species. In this study, we collected freshwater fish
from eight waterfalls at Khao Nan and Khao Luang National Parks, Nakhon Si
Thammarat. The eight waterfalls were Thep Chana, Tha Leek, Suan Khan, Tha Phae,
Sunanta, Karom, Phrom Lok and Nan Chong Fa. Seven freshwater fish species were
collected: (1) Crossocheilus reticulatus, (2) Danio regina, (3) Garra rufa, (4)
Pentaprion longimanus, (5) Puntius binotatus, (6) Puntius lateristriga and (7) Tor
tombroides. All fish were photographed and their morphometric characteristics
measured using image processing in Mathematica. All fish body measurements (i.e.
body length, body width, head length, dorsal fin length, and standard body length) were
analyzed using a Principal Component Analysis (PCA). Our PCA results showed that
fish from the same species were grouped tightly together. However, two fish species
(Tor tambroides and Puntius binotatus) mostly overlapped. This is because these two
species have very similar morphology in terms of their body length and form. Based on
the PCA results, there were two distinct fish groups: (1) long slender shape and (2) wide
shape. In short, the morphometric method and the image processing technique can be
used to classify freshwater fish species.

Keywords: Morphometric Measurement, Image Processing Technique, Principal
Component Analysis (PCA).



1. INTRODUCTION
The morphometric analysis of body shape is one of the most widely used approaches to
identify or investigate differences between fish species [1-2], describing their spatial
distribution and possible temporal changes in the aggregation of fish from different areas
[3-6]. Consistent morphometric differences between locations may indicate a population
separation [2-3], i.e. the existence of different stock units, even if those differences are caused
by environmental influence. Extensive mixing of individuals from different locations would
make those differences undetectable [2]. Traditional multivariate morphometrics are now
more effective for stock identification through better data acquisition and more effective
description of shape [1], and they are a powerful technique to investigate the geographical
variation of stocks [4]. This study aims at applying the morphometric method and image
processing techniques to identify and group freshwater fish species from eight waterfalls at
Khao Nan and Khao Luang National Parks, Nakhon Si Thammarat.
THEORY AND RELATED WORKS
Study site
Freshwater fish samples were collected 20-24 April 2009 from eight waterfalls at Khao
Nan and Khao Luang National Parks, Nakhon Si Thammarat, Thailand (Table 1, Figure 1).
Eight waterfalls were Thep Chana, Tha Leek, Suan Khan, Tha Phae, Sunanta, Karom, Phrom
Lok and Nan Chong Fa.
Table 1. The locations of all eight study site waterfalls.
Waterfall Latitude (N) Longitude (E)
Wang Muang 08.92341 99.66289
Tha Leak 08.91316 99.73191
Nan Chong Fa 08.51328 99.43258
Thep Chana 08.78330 99.78764
Sunanta 08.76735 99.80175
Phrom Lok 08.31377 99.46591
Karom 08.37415 99.73617
Tha Pae 08.36014 99.68970
Figure 1. Locations of study sites marked on Google Earth

Data collection
Seven freshwater fish species were collected: (1) Crossocheilus reticulatus, (2) Danio
regina, (3) Garra rufa, (4) Pentaprion longimanus, (5) Puntius binotatus, (6) Puntius
lateristriga, and (7) Tor tombroides. All fish were photographed and their morphometric
characteristics measured using image processing in Mathematica. All fish body
measurements (i.e. body length, body width, head length, dorsal fin length, and standard
body length) were analyzed using a Principal Component Analysis (PCA). Cluster Analysis
was used to analyze the clustering of the data. We captured freshwater fish at each waterfall,
placed them in a small aquarium tank (15 x 30 x 30 cm) with a ruler glued on the side of the
aquarium, and photographed them. Once all fish were photographed, they were released back
into the waterfall.

Table 2. Species and number of freshwater fish collected from study sites.

Species                     Number of fish individuals
Crossocheilus reticulatus   19
Danio regina                106
Garra rufa                  24
Pentaprion longimanus       26
Puntius binotatus           33
Puntius lateristriga        10
Tor tombroides              99
Total                       312
Traditional measurements
For the past 50 years, morphometric investigations have been based on a set of traditional
measurements [7]. These measurements have recently been criticized because they are
concentrated along the body axis, with only limited sampling of body depth and body breadth,
and most measurements are on the head. Furthermore, individual measurements run all along
the body length. Some morphological landmarks, such as the tip of the snout and the posterior
end of the vertebral column, are used repeatedly as a central point for most of the
measurements.

Figure 2. Traditional morphometric measurements. Seven traditional morphometric variables:
Total length (tl), Fork length (fl), Standard length (sl), Head length (hl), Eye diameter (ed),
Snout length (snl), and Body depth (bd)
Seven traditional morphometric variables were measured for each individual fish: Total
length (tl), Fork length (fl), Standard length (sl), Head length (hl), Eye diameter (ed), Snout
length (snl), and Body depth (bd) (Figure 2).

2. COMPUTATIONAL DETAILS
The morphometric measurements of the fish were obtained using the Mathematica program
with image processing. The program analyzed photographs of fish taken at the study sites and
collected the coordinates of the morphometric measurements. The program then exported all
morphometric measurements to .xls files.
Principal Component Analysis (PCA) was used: (i) to analyze the separation through the
matrix variability; (ii) to identify collinear variables; and (iii) to choose the right subset of
variables (influential variables) to be used in the SDA. Cluster Analysis was used to cluster
the PCA results into groups of fish with similar morphometric measurements.
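
As a hedged sketch of this pipeline in Python (the paper used Mathematica; scikit-learn and k-means stand in here for the unspecified PCA and Cluster Analysis routines, and the data file name is hypothetical):

import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

# rows = individual fish; columns = tl, fl, sl, hl, ed, snl, bd (Figure 2)
X = np.loadtxt("morphometrics.csv", delimiter=",", skiprows=1)

scores = PCA(n_components=2).fit_transform(X)                  # PCA1, PCA2 per fish
labels = KMeans(n_clusters=2, n_init=10).fit_predict(scores)   # two shape groups
print(labels)                                                  # cluster membership per fish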

3. RESULTS AND DISCUSSION
The PCA scores of traditional morphometric measurements were grouped by using
Cluster Analysis (Figure 3a-f).



Figure 3. PCA scores and clustering results, PCA1 and PCA2: (a,b) total length and
standard length, (c,d) total length and head length, (e,f) total length and snout length and (g,h)
fork length and eye diameter. Colors represent Crossocheilus reticulatus, Danio regina,
Garra rufa, Pentaprion longimanus, Puntius binotatus, Puntius lateristriga, and Tor
tombroides.
The PCA scores of total length and standard length (Figure 3a,b), and total length and
head length (Figure 3c,d) grouped freshwater fish into two groups: (1) Crossocheilus
reticulates, Puntius binotatus, Puntius lateristriga, and Tor tombroides and (2) Danio
regina, Garra rufa, and Pentaprion longimanus.
The PCA scores of total length and eye diameter grouped freshwater fish into three
groups: (1) Crossocheilus reticulates, Puntius binotatus, and Tor tombroides, (2)
Danio regina, Garra rufa, and Puntius lateristriga and (3) Pentaprion longimanus and
Danio regina (Figure 3e-f).
The PCA scores of fin length and eye diameter grouped freshwater fish into two groups:
(1) Pentaprion longimanus, Puntius binotatus, Puntius lateristriga, and Tor
Tombroides, and (2) Crossocheilus reticulates, Danio regina, and Garra rufa (Figure
3g-h).
Our results show that freshwater fish can be grouped based on the seven traditional
morphometric measurements. The standard length and the head length gave the same
classification. However, the PCA scores of total length and eye diameter grouped Danio
regina into two cluster groups. This suggests that different morphometric measurements can
give different cluster groups.
Our results also show that the PCA scores of fin length and eye diameter show a different
pattern than the other morphometric measurements (Figure 3a-h): the PCA scores of fin
length and eye diameter group Crossocheilus reticulatus in the same group as Danio
regina and Garra rufa.
Morphometric studies have been able to identify differences between fish populations,
emphasizing morphometrics as a helpful tool for the discrimination of fish populations [2, 4, 8].
Morphometrics combined with image analysis is a step ahead in producing a better
understanding of the stock structure of fish species [4]. Within this context, our study
highlights that morphology can be successfully used to discriminate [9] fish populations at a
fine spatial scale, i.e. a waterfall stretch. The use of fish scale morphology is an
easy-to-implement method, relatively rapid and inexpensive, and does not require fish
sacrifice [9]. Since the identification of populations and their connectivity with each other is a
major point for the conservation and management of vulnerable species, the use of scale
morphology for this purpose appears particularly promising [9].

4. CONCLUSION
Freshwater fish can be classified using morphometric measurements based on the
traditional measurement method. Principal Component Analysis and Cluster Analysis can be
used to analyze the data.

REFERENCES
1. Cardin, S. H. & Friedland, K. D., Fisheries Res., 1999, 43, 129-39.
2. Murta, A. G., J. Mar. Sci., 2000, 57, 1240-8.
3. Saborido-Rey, F., and Nedreaas, K. J., J. Mar. Sci., 2000, 57, 965-75.
4. Palma, J., and Andrade, J. P., Fisheries Res., 2002, 57, 1-8.
5. Silva, A., J. Mar. Sci., 2003, 60, 1352-60.
6. Salani, J. P., Milton, D. A., Rahman M. J., and Hussian, M. G., Fisheries Res., 2004,
66, 53-69.
7. Hubbs, C. L., and Lagler, K. F., Cranbook Inst. Sci. Bull., 1947, 26, 1-186.
8. Bowering, W. R., Can. J. Fish. Aqua. Sci., 1998, 45, 580-5.
9. Poulet, N., Reyjol, Y., Collier, H., and Lek, S., Aqua. Sci., 2005, 67, 122-7.

ACKNOWLEDGMENTS
This work was supported by Center of Excellence for Ecoinformatics, the Institute of
Research and Development, NECTEC/Walailak University.
G00059
Misalignment Compensation of Sheet Metal
Forming Tool by Loop-shaping Controller

S. Poodchakarn (1,A), D. Sriprapai (2), D. Budcharoentong (3), S. Saimek (4) and C. Thanadngarn (5)

(1,2,3) Dept. of Tool and Materials Engineering, Faculty of Engineering,
King Mongkut's University of Technology Thonburi, 10140, Thailand
(4) Dept. of Mechanical Engineering, Faculty of Engineering,
King Mongkut's University of Technology Thonburi, 10140, Thailand
(5) Dept. of Production Engineering, Faculty of Engineering,
King Mongkut's University of Technology North Bangkok, 10800, Thailand
(A) E-mail: srpkarn@yahoo.com; Tel: 089-8866451; Fax: 02-5870029

ABSTRACT
The dynamic modeling of a flexible pillar with the Rotational Eccentric Bushing (REB)
of a sheet metal forming tool is developed using the extended Hamilton's Principle and
the Euler-Bernoulli beam equation. Low-degree power functions in the space variables
of each substructure (pillar or beam) are adopted as shape functions for the purpose of
discretization. Boundary conditions between substructures are then considered in the
form of linear constraints on the generalized (modal) coordinates and are enforced a
posteriori, yielding the assembled dynamic system. This modified approach allows a
systematic formulation that is independent of the problem characteristics and the
analyst's initiative, and allows a simpler reduced-order model with fewer degrees of
freedom than those obtained by other discretization schemes, such as the finite element
method. An optimal loop-shaping controller is then applied to such a model for the
prismatic sliding joint at the root of the flexible pillar, which carries a payload, namely
the upper die, at the end of the pillars. A robust controller is designed for the uncertain
parameters and uncertain dynamics, as well as trajectory tracking, to avoid misalignment
of the forming tool. MSC.ADAMS/View and Matlab/Simulink CAE software are used
as tools for the dynamic and controller simulation, whose output signals consist of
measurements of the acceleration, angular velocity and position of the upper die; the
applied torque to the REB can also be measured. The simulation results confirm the
simplicity and validity of the proposed method: the upper die misalignment decreased
and was close to the zero position.

Keywords: Misalignment Compensation, Sheet Metal Forming Tool, Flexible Pillar,
Loop-shaping Controller.

1. INTRODUCTION
Sheet metal forming is one of the most important manufacturing processes. Since
international competition in the car-making industries is extremely severe, all companies try
to reduce costs on the one hand and increase productivity, technological properties and quality
on the other. Increased manufacturing accuracy, i.e. mainly dimensional accuracy, is
achieved by: (i) smaller machining tolerances of the tools; (ii) increased stiffness of tools and
presses; (iii) CNC monitoring, control and adjustment systems of presses and tools; (iv)
visualization of the forming process; and (v) recording of forming parameters [1-5]. In the same
way as the presses, the tools used for sheet forming influence the dimensional accuracy of the
product, the final sheet metal component. Hence, every measure must be taken to improve the
performance of the drawing and stamping tools used in the press shops. Indeed, even under
center loading of presses, mismatch of the upper and lower die will be found. Tilting and
rotation (about the z-axis) of the press slide and the upper tool part take place in relation to the
press table. The problems of tilting and offset are even more important with regard to the
accuracy of the components produced and the tool life.
Conventional pillar die guidance systems have been widely used in industrial production, as
shown in Figure 1. Those pillar elements act as elastic bodies. If the diameter of the pillars is
increased for higher stiffness, the weight-to-payload ratio of the die set must be high, the
operating pressing speed is normally quite low, and the manufacturing cost is high. This paper
presents a new idea to obtain higher accuracy and more precise alignment along the pressing
stroke of these metal forming tools, namely using an external force or torque to actuate the
pillars so as to compensate, by deflection, the misalignment of the upper die tool, and thereby
improve manufacturing tolerances and output rating in metal forming by automation. One of
the main prerequisites is computerized automatic feedback and control of the process via
parameter variation [1-5].
Figure 1. Standard tools and guiding systems in sheet metal working (LVWU, GhK) [1].

Figure 2. Schematic of a conventional 4-pillar die set test rig model with suitable mechanical joints.
2. THEORY AND RELATED WORKS
2.1 Mechanical design
The generally conventional standard 4-pillar tool, such as shown in Figure 1, can be
simulated using the MSC.ADAMS/View CAE software [6]. The virtual die set test rig
model shown in Figure 2 consists of a hydraulic seal, ram rod, 4 bushings, upper die, 4 pillars
and lower die. All of these components are linked together by suitable mechanical joints [12].

2.2 Flexible pillar integrated with suitable actuator
Under off-center loading in the metal forming process, the pillar of the forming tool acts
as a flexible body and can be assumed to be a spring beam or flexible beam, because one end
is fixed at the lower die and the other end is inserted in a guide bushing fixed at the upper die.
Motivated by the background problem stated above, this research presents the idea of
the flexible pillar integrated with a robust controller system. The actuator system is an applied
force or torque device, which has enough energy to supply the joint of the forming
tool. The die set that will be used for this research has four pillars located in a
symmetrical square pattern. For easier study, we concentrate on a single flexible pillar.
There are two approaches for the actuator that applies force or torque at one end of a pillar:
(i) apply force at a translational joint that connects the base of the pillar with the lower die, as
shown in Figure 3; (ii) apply torque at a revolute joint that connects the rotational eccentric
bushing (REB) with the upper die, as shown in Figure 4. For this research the second
approach was selected, to avoid additional clearance occurring at both the base and tip of the
pillar. The second reason is that the pillar can be machined into a pin-in-slot mechanism for a
simpler experiment; finally, the REB mechanism can be controlled to achieve precise
movement [12].


Figure 3. The ram rod is connected to a planar joint, which can translate only in the X-Y
plane, and to a cylindrical joint connected with the planar joint, which can slide along the
Z axis and also rotate about the Z axis.


Figure 4. Equivalent free body diagram of a half-REB test rig model on the Y-Z plane.


Following the motivation of this research, the main mechanical design is to modify the
conventional concentric bushing into an REB. The boring hole of the bushing has an
eccentricity of a few millimeters, e.g. 2.828 mm. The 4 REBs must be arranged as shown in
Figure 6; the differences from the concentric bushing in Figure 5 are that point A is the
rotational center and point B is the boring-hole center of each REB. The length e is the
eccentricity between points A and B on the X-Y plane.
When REBs 2 and 4 are fixed and a rotational motion function is applied to REBs 1 and 3,
and vice versa, the upper die displacements are as shown in Figure 7, and it can be inferred
that there is little interferential motion between them.
The actuators for applying force/torque to the REBs system may be designed as shown in
Figure 8. Hydraulic actuators with 2-rod hydraulic cylinders and 4/3 servo valves are used to
apply force to the REB swing arms. The rotational angle of each REB is -45 to +45 degrees,
by the limitation of the mechanism. When REBs 1 and 3 are rotated synchronously under
control, the upper die translates in the Y direction; when REBs 2 and 4 are rotated
synchronously, the upper die translates in the X direction.
Following the translation behavior of the REBs test rig model above, and for simple analysis
of the symmetrical structure, the upper die model can be divided in half as shown in Figure 9.
The equivalent flexible pillar is combined from each half of pillars 2 and 4, and the equivalent
bushing is combined from each half of REBs 2 and 4. For simple analysis as a single flexible
link system, REB 1 can be changed to a translational joint at the base of flexible pillar 1, as
shown in Figure 3. This equivalent flexible pillar can be changed to a massless equivalent
spring (Keq), as shown in Figure 10. Because the displacement of the upper die is small
compared with the displacement of the translational joint after the applied force, the mass
inertia effect of the equivalent flexible pillar can be ignored.

Figure 5. Top view on the X-Y plane. Each bushing and pillar has a concentric point.

Figure 6. Top view of Figure 2, modified to REBs with eccentricity length e = 2.828 mm.


Figure 7. The upper die displacements in the +X, -X direction (solid line) and +Y, -Y
direction (dashed line) on the X-Y plane.



Figure 8. Top view of the die set and hydraulic circuit with 2-rod hydraulic cylinders used to
apply torque to the REBs. Ports A and B connect to each cylinder; ports P and T connect to
the pump and tank, respectively.


Figure 9. A half of the REBs test rig model on the X-Y plane.

Figure 10. The relationship between the flexible beam with fixed base and free end and the
equivalent spring (Keq).

The schematic in Figure 11 is used for the analysis of the model-based controller, but for
implementation the base beam was fixed to the lower die and the REBs system is used instead
of the translational joint. Therefore a relationship between the applied force at the base beam
and the applied torque at the REB-tip-end beam must be derived, as follows. Referring to
Figure 6 and focusing on REB 2, as shown in Figure 12, assume that the applied force at the
base beam is Fbase. For implementation with applied torque at the REB, Fbase is assumed to
act at the REB-tip beam. When REBs 2 and 4 rotate synchronously, the tip beam is bent out
by them, so each tip beam has a displacement of $e - e\cos\theta$ mm, where e is the eccentric
length and $\theta$ is the angular rotation of the REB. Fkeq is the equivalent spring force, which
occurs when the tip beam is bent by $e - e\cos\theta$. Ft is the tangential force, from which the
applied Torque_reb is calculated. A is the center of REB rotation; B is the center of the REB
boring hole.



Figure 11. Equivalent free-body diagram referenced from Figure 3, which consists of a base
mass (Mb), flexible beam, tip mass (Mt) and equivalent spring (Keq).

Figure 12. Top view, free-body diagram of REB 2.

The relationship between the applied force at the base beam and the applied torque at the
REB-tip beam is as follows.
$$F_t \cos\theta = F_{base} \qquad (1)$$

$$F_t = \frac{F_{base}}{\cos\theta} \qquad (2)$$

$$F_{keq} = \frac{3EI}{L^{3}}\,(e - e\cos\theta) \qquad (3)$$

$$\Delta F_t = F_{keq}\sin\theta \quad (\text{force against } F_t) \qquad (4)$$

$$F_t \leftarrow F_t - F_{keq}\sin\theta \qquad (5)$$

$$Torque_{reb} = F_t\,e = \left(\frac{F_{base}}{\cos\theta} - \frac{3EI}{L^{3}}(e - e\cos\theta)\sin\theta\right)e \qquad (6)$$
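
A small numeric sketch of equation (3) as reconstructed above; L, d and e are the paper's pillar dimensions, while the steel Young's modulus and the solid circular cross-section are our assumptions:

import math

E = 200e3                  # N/mm^2, assumed Young's modulus of steel
d = 50.0                   # mm, pillar diameter (from the paper)
I = math.pi * d**4 / 64    # mm^4, second moment of area of a solid circular section
L = 300.0                  # mm, pillar length (from the paper)
e = 2.828                  # mm, REB eccentricity (from the paper)

k_eq = 3 * E * I / L**3    # N/mm, tip stiffness of a cantilever beam
for theta_deg in (0, 15, 30, 45):
    theta = math.radians(theta_deg)
    deflection = e - e * math.cos(theta)            # mm, tip-beam displacement
    print(theta_deg, round(k_eq * deflection, 1))   # N, spring force F_keq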

This research was done by simulation; experiments will follow on the shop floor with a press
machine without a ram slide: maximum ram force 500,000 N, stroke length 500 mm, ram rod
diameter 100 mm, press table dimensions 500 x 800 mm, as shown in Figure 13. The
misalignment compensation may use the pin-in-slot mechanism instead of the servo hydraulic
system. The die set test rig model was built in ADAMS/View. The dimensions of the lower
die are 500x500x100 mm and of the upper die 500x500x50 mm with mass 85.25 kg. The 4
pillars form a square of 300x300 mm; the length of each pillar is 300 mm, diameter 50 mm,
with mass 4.6 kg. The length of the ram rod is 450 mm, diameter 100 mm, with mass 27.57
kg. For the massless off-center-loading spring with stiffness coefficient 5,000 N/mm and
damping coefficient 10 N.s/mm, the installation position is (-100, 50), as shown in Figure 14.





Figure 13. Hydraulic press machine with die set test rig model installation (KMUTT).

Figure 14. Point (-100, 50) for installation of the off-center loading spring on the X-Y plane.
2.3 Flexible pillar die set manipulator behavior
The objective of this research is to design a controller such that the misalignment of the upper
die is kept as close to zero as possible along the whole pressing stroke. For high speed
positioning applications, manipulators with lightweight flexible links possess many
advantages over conventional rigid-link ones. The lightweight and highly flexible nature of
these pillar die sets, however, leads to a challenging problem in end-point trajectory control.
The difficulty arises due to the flexibility distributed along the pillars, where an improved
control technique is required to simultaneously track the desired trajectory and suppress the
residual vibrations in the upper die. The test rig model shown in Figure 2 oscillates when
excited by a pulse, as shown in Figure 15.



Figure 15. Acceleration signal of the upper die in the Y direction at stroke level Z = 0 mm
after applying a unit step function to REBs 1 and 3 simultaneously.

2.4 Mathematical Modeling
As shown in Figure 11, the flexible pillar die set manipulator is described in the X-Y frame,
which is a fixed inertial and local reference frame. The position of the base is denoted by d(t)
and f(t) is the control force applied to the base. Other system parameters are: L is the length
of the link, EI is the uniform flexural rigidity, Mb is the mass of the translational base, Mt is
the concentrated tip mass payload, y(x,t) is the elastic deflection measured from the
undeformed link, and p(x,t) := d(t) + y(x,t) is the position of the flexible link [8-11].
The standard assumption in flexible pillar control is that the deflection y(x,t) is small
compared with the length of the link. The total kinetic energy Ek of the system is given by
$$E_k = \frac{1}{2} M_b \dot{d}^2(t) + \frac{1}{2}\int_0^L \rho\,\dot{p}^2(x,t)\,dx + \frac{1}{2} M_t\,\dot{p}^2(L,t) \qquad (7)$$

and the total potential energy Ep is

$$E_p = \frac{1}{2}\int_0^L EI\,[y''(x,t)]^2\,dx + \frac{K_{eq}}{2}\,p^2(L,t) \qquad (8)$$

where the dots and primes denote the derivatives with respect to time t and the space variable
x, respectively. Substituting (7) and (8) into the extended Hamilton's Principle,

$$\int_{t_0}^{t_f} \big[\,\delta(E_k - E_p) + f(t)\,\delta d(t)\,\big]\,dt = 0 \qquad (9)$$

and noting that the shear force at the base of the flexible link can be calculated by

$$EIy'''(0,t) = \int_0^L \rho\,[\ddot{d}(t) + \ddot{y}(x,t)]\,dx + M_t[\ddot{d}(t) + \ddot{y}(L,t)] + K_{eq}[d(t) + y(L,t)] \qquad (10)$$

we arrive at the following dynamic equations of the system by Newton's 2nd Law and the
Euler-Bernoulli beam equation:

$$f(t) + EIy'''(0,t) = M_b\,\ddot{d}(t) \qquad (11)$$

$$EIy''''(x,t) = -\rho\,[\ddot{d}(t) + \ddot{y}(x,t)] \qquad (12)$$

Equation (11) is the motion equation of the base, and (12) describes the vibration of the
flexible link. The corresponding boundary conditions are given by

$$y'(0,t) = 0, \qquad EIy'''(0,t) - M_b\,\ddot{d}(t) = 0 \qquad (13)$$

$$y''(L,t) = 0, \qquad EIy'''(L,t) + M_t[\ddot{d}(t) + \ddot{y}(L,t)] + K_{eq}[d(t) + y(L,t)] = 0 \qquad (14)$$

Using the method of separation of variables [11] to solve (11)-(12) under conditions (13)-(14),
the solution is assumed to be of the form $y(x,t) = \phi(x)\,Q(t)$, which gives

$$\frac{\ddot{Q}}{Q} = -\frac{EI}{\rho}\,\frac{\phi''''}{\phi} \qquad (15)$$

From the corresponding boundary value problem, which admits nontrivial solutions, we
arrive at

$$\phi''''(x) = \left(\frac{\beta}{L}\right)^4 \phi(x) \qquad (16)$$

where

$$\beta^4 = \frac{\rho\,\omega^2 L^4}{EI} \qquad (17)$$

The general solution to (16) is of the form

$$\phi(x) = C_1\cos\frac{\beta x}{L} + C_2\cosh\frac{\beta x}{L} + C_3\sin\frac{\beta x}{L} + C_4\sinh\frac{\beta x}{L} \qquad (18)$$

We obtain an infinite number of solutions to the boundary value problem [13]:

$$\phi_i(x) = A_i\left[\cos\frac{\beta_i x}{L} - \frac{\cos\beta_i}{\cosh\beta_i}\cosh\frac{\beta_i x}{L}\right] \qquad (19)$$

where the $\beta_i$ are the positive solutions of (19) and the $A_i$ are nonzero constants given in [13].
The time-dependent function Q(t) is now governed by

$$\ddot{Q}(t) + \omega^2 Q(t) = 0 \qquad (20)$$

which indicates that Q(t) is harmonic with frequency $\omega$. For the infinite number of $\beta_i$ we
have, from (17), an infinite number of corresponding frequencies

$$\omega_i = \left(\frac{\beta_i}{L}\right)^2 \sqrt{\frac{EI}{\rho}} \qquad (21)$$

This research defines the pillar (beam) length as 300 mm; the length at the final down stroke
is 125 mm, measured from the fixed base of the pillar, which is the position that may have the
most off-center loading and misalignment. The ram rod mass is 13.8 kg, the upper die mass
42.63 kg, the REB mass 8.28 kg, and the pillar mass 4.6 kg. The first three modes of natural
frequency are 36.599, 325.260 and 2902.686 rad/sec, respectively. The symmetric stiffness
and mass matrix coefficients have the forms [11]

$$k_{ij} = k_{ji} = \int_0^L EI(x)\,\frac{d^2\phi_i(x)}{dx^2}\,\frac{d^2\phi_j(x)}{dx^2}\,dx + K_{eq}\,\phi_i(L)\,\phi_j(L), \qquad i,j = 1,2,\ldots,n \qquad (22)$$

$$m_{ij} = m_{ji} = \int_0^L m(x)\,\phi_i(x)\,\phi_j(x)\,dx, \qquad i,j = 1,2,\ldots,n \qquad (23)$$

The state space equation can then be generated in the following form [11]:

$$\dot{x} = Ax + Bu, \qquad y = Cx + Du \qquad (24)$$

where

$$A = \begin{bmatrix} 0 & I \\ -M^{-1}K & 0 \end{bmatrix}, \quad B = \begin{bmatrix} 0 \\ M^{-1} \end{bmatrix}, \quad C = \begin{bmatrix} \phi_i(L) \cdots \phi_n(L) & 0 \cdots 0 \\ \phi_i(0) \cdots \phi_n(0) & 0 \cdots 0 \end{bmatrix}, \quad D = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$$
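
As an illustrative Python sketch (ours; the paper builds its model in MSC.ADAMS and MATLAB) of assembling the state-space form (24) from the modal matrices of equations (22)-(23); the single-input influence vector is an assumption:

import numpy as np

def modal_state_space(M, K, phi_L, phi_0):
    """M, K: n-by-n modal mass and stiffness matrices from eqs. (22)-(23).
    phi_L, phi_0: length-n arrays of mode shapes at the tip (x=L) and base (x=0).
    Returns (A, B, C, D) with state x = [q; q_dot] and a single force input."""
    n = M.shape[0]
    Minv = np.linalg.inv(M)
    A = np.block([[np.zeros((n, n)), np.eye(n)],
                  [-Minv @ K,        np.zeros((n, n))]])
    B = np.vstack([np.zeros((n, 1)), Minv @ np.ones((n, 1))])     # assumed input influence
    C = np.hstack([np.vstack([phi_L, phi_0]), np.zeros((2, n))])  # tip and base outputs
    D = np.zeros((2, 1))
    return A, B, C, D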
2.5 Loop-shaping controller design [13]
Loop-shaping controller synthesis ($H_\infty$) is an optimal synthesis method. It computes a
stabilizing $H_\infty$ controller K for a plant G that shapes the sigma plot of the loop transfer
function GK toward a desired loop shape $G_d$, with accuracy $\gamma$ = GAM, in the sense that if
$\omega_0$ is the 0 dB crossover frequency of the sigma plot of $G_d(j\omega)$, then, roughly,

$$\sigma\big(G(j\omega)K(j\omega)\big) \ge \frac{1}{\gamma}\,\sigma\big(G_d(j\omega)\big) \ \text{for all } \omega < \omega_0, \quad \text{and} \quad \sigma\big(G(j\omega)K(j\omega)\big) \le \gamma\,\sigma\big(G_d(j\omega)\big) \ \text{for all } \omega > \omega_0 \qquad (25)$$

This paper uses the Matlab toolbox call [K,CL,GAM] = loopsyn(G,Gd). The input
arguments are the LTI plant G and the desired loop shape Gd (an LTI model). The output
arguments are the LTI controller K, the LTI closed-loop system CL = G*K/(I+GK), and the
loop-shaping accuracy GAM (GAM >= 1, with GAM = 1 being a perfect fit). One drawback of
this algorithm is that it typically yields high-order controllers. This drawback can be partially
obviated by reducing the order of the controller with GRED = reduce(G,order) prior to
implementation.

3. COMPUTATIONAL DETAILS [6,13]
In this section, the simulation of the feedback-controlled system is done with
Matlab/Simulink controller block diagrams interfaced with MSC.ADAMS/Control via
Real-Time Workshop (RTW), as shown in Figure 16. The applied torques that rotate the
REBs system are computed by this control system, based on the error between the
misaligned positions and the desired upper die positions. The applied force signals
for the base pillar pass through a suitably tuned gain, and the compensation function
of eq. (6) is then used to calculate the applied torque at the REBs system. After that,
ADAMS/View can show the motion of the press stroke along the Z axis and the
misalignment of the upper die along the X and Y axes.

Figure 16. Matlab/Simulink controller block diagram.

The typical reduced eighth-order transfer-function controller and the Matlab function [13] that converts the applied force at the base of the pillar into the applied torque at the REBs system follow from (6) and are shown as the Embedded MATLAB Function in Figure 16.

4. RESULTS AND DISCUSSION
The virtual simulation in MSC.ADAMS with the loop-shaping controller system yields several important results. The press stroke of the upper die along the Z axis runs from 0 to -100 mm in 5 seconds, driven by a step function in the MSC.ADAMS/Function builder [6]. During the press stroke with an off-center loading spring and without the controller system, the misalignment of the upper die is 0.25 mm in the X axis and -0.7 mm in the Y axis, as shown in Figure 17. With the misalignment compensation controller, the mismatch of the upper die decreases dramatically, as shown in Figure 18: the maximum misalignment is only about 5.0e-4 mm in the X axis and about -1.55e-3 mm in the Y axis. This holds even though the controller was designed from an LTI model while the system parameters actually change along the press stroke. The remaining results for this case study, the applied torques and the angular measurements of the REBs system, are shown in Figures 19 and 20, respectively. These satisfactory results may motivate real experiments in future research.


Figure 17. Misalignment of the upper die without controller.



Figure 18. Misalignment of the upper die with Loop-shaping controller.



Figure 19. Applied torques to REBs of X and Y axis.



Figure 20. Angular measurement of the REBs system both X and Y axes.
G00059
ANSCSE14 Mae Fah Luang University, Chiang Rai, Thailand
March 23-26, 2010
564
5. CONCLUSION
This paper presents a new mechanism design for misalignment compensation of a 4-pillar, square-corner sheet metal forming tool. A conventional pillar guidance system with 4 concentric bushings was modified into 4 Rotational Eccentric Bushings (REB), so that the upper die can be controlled and translated in the X and Y axes separately. The 4 pillars act as flexible cylindrical cantilever beams. To analyze this system, a dynamic mathematical model was established based on the extended Hamilton's principle and the Euler-Bernoulli beam equation. An optimal loop-shaping controller and a compensation function were then used to convert the prismatic sliding joint at the root of the flexible pillar into the rotational joint at the REB, which carries a payload (the upper die) at the end of the pillars. This robust controller is designed for uncertain parameters and uncertain dynamics, as well as for trajectory tracking to avoid misalignment of the forming tool. MSC.ADAMS/View and Matlab/Simulink were used as tools for the dynamics and controller simulation. The simulation results confirm the simplicity and validity of the proposed method: the upper-die misalignment decreases dramatically and stays close to zero during the press stroke.

REFERENCES
1. Hans-Wilfried Wagener, New developments in sheet metal forming: sheet materials, tools and machinery, Journal of Materials Processing Technology, 72 (1997), pp. 342-357.
2. H.W. Wagener and A. Wendenburg, Analysis system: prerequisite for automation in metal-forming technology, Journal of Materials Processing Technology, 116 (2001), pp. 55-61.
3. H.W. Wagener and C. Schlott, Influence of Die Guidance Systems on the Angular Deflection of Press Slide and Die Under Eccentric Loading, Journal of Mechanical Working Technology, 21 (1989), pp. 463-475.
4. Santosha Kumar Dwivedy and Peter Eberhard, Dynamic analysis of flexible manipulators, a literature review, Mechanism and Machine Theory, 41 (2006), pp. 749-777.
5. R. Neugebauer, B. Denkena and K. Wegener, Mechatronic Systems for Machine Tools, Annals of the CIRP, Vol. 56/2/2007, pp. 657-686.
6. MSC.ADAMS 2007 software, MSC.Software Corporation.
7. K. Lange (Ed.), Handbook of Metal Forming, McGraw-Hill, New York, 1985.
8. S.S. Ge, Energy-based Robust Control of Flexible Link Robots, in Advanced Studies in Flexible Robotic Manipulators: Modeling, Design, Control and Applications, World Scientific Publishing, 2003.
9. S.S. Ge, T.H. Lee and G. Zhu, Energy-based Robust Controller Design for Multi-link Flexible Robots, Mechatronics, Vol. 6, No. 7, pp. 779-798, 1996.
10. T.H. Lee, S.S. Ge and Z.P. Wang, Adaptive Robust Controller Design for Multilink Flexible Robots, Mechatronics, Vol. 11, No. 8, pp. 951-967, 2001.
11. L. Meirovitch, Elements of Vibration Analysis, McGraw-Hill, Inc., 1975.
12. Poodchakarn S., Experimental Study on Sheet Metal Forming Tool for Misalignment Compensation Using Pin-in-slot Mechanism, Sustainable Development to Save the Earth Proceedings (SDSE2008), pp. 342-350, 2009.
13. Matlab/Simulink R2006a, The Mathworks, Inc.

ACKNOWLEDGMENTS
The author wishes to thank Prof. Dr. P. Venugopal, Professor, Materials Forming Laboratory, Metallurgical and Materials Engineering, IIT Madras, India. Dilok Sriprapai, Dech Budcharoentong, Saroj Saimek and Charn Thanadngarn are acknowledged for numerous valuable discussions. The Production Engineering Department, KMUTNB is gratefully thanked for supporting a licensed copy of MSC.ADAMS.
G00062
Model Based Motor Vehicle Segmentation and Type
Classification Using Shape Based Background Subtraction

J. Chiverton^1 and S. Uttama^{1,C}

^1 School of Information Technology, Mae Fah Luang University, 333 Moo 1, Ta-Sud, Muang, Chiang Rai, 57100, Thailand
^C E-mail: johnc@mfu.ac.th



ABSTRACT
Motor vehicle segmentation and type classification are potentially important
components in automatic computer vision based traffic safety systems. Applications of
motor vehicle segmentation and type classification could include use in a larger
automated system that is aimed at improving traffic safety specifically for a particular
vehicle type, such as two wheeled vehicles or perhaps large vehicles such as lorries.
Alternatively vehicle type classification could be used solely in conjunction with
automatic traffic controls for particular types of vehicles, such as lane control for large
vehicles or even identifying illegal entry of restricted vehicle types on high speed
traffic networks.
Automatic computer vision based segmentation is used here to simultaneously
automatically isolate vehicles in video images and identify vehicle types. A prior
classification of vehicle types is used to identify the type of vehicle that has been
segmented using shape information. Background subtraction is also used for
segmentation in combination with the classification stage.
The background subtraction technique described here overcomes potential problems
with reflections and changing environmental conditions using shape information in the
form of signed distance functions. The background subtraction technique builds on-line
a model of the background and this is subtracted from the current frame. This is similar
to existing static camera based segmentation and tracking techniques.
Simultaneous to the background subtraction, the system also classifies vehicles into
vehicle types using the (on-going) segmented vehicle shape. This information is then
also simultaneously fed back into the background subtraction in the form of priors
learnt over the different categories of vehicle shapes. The combination of simultaneous
segmentation and type classification produces not only good type classification results
but also good object tracking results. Results are presented on video data obtained in a
closed traffic environment.

Keywords: Computer Vision, Traffic Safety, Background Subtraction, Shape Model,
Signed Distances.

1. INTRODUCTION
Traffic management and safety are very important topics in society. The study and application
of these topics can potentially help to improve the quality of life of individuals and
communities. Furthermore advanced technological solutions to problems in these fields of
study can improve existing knowledge and understanding and assist with difficult labour
intensive monitoring of traffic. This paper therefore describes a technique to provide
segmentation, tracking and identification of vehicle types, e.g. cars, motorbikes and trucks.
This information can help to improve knowledge about movements of types of vehicles.
Closed traffic environments such as educational establishments could also benefit from the
provision of accurate information about timing and quantity of vehicles. This information can
also be used to identify busy times which can then be further combined with location
information to manage traffic flow and appropriate parking controls. Therefore technological
innovations in the field of traffic management and safety could potentially help traffic safety
experts to manage appropriate traffic flow to particular areas of closed traffic environments.
2. THEORY AND RELATED WORKS
In common with existing works such as [4] a static camera is assumed, so background
subtraction can be used. The background subtraction as described in [1] is used to identify
potential moving targets that might be valid traffic entities e.g. motorcycles and cars.
However the output from the background subtraction stage is typically noisy, with many false
positives and significant regions missing from the moving vehicles, see e.g. figure 1. The
source of such problems is often movement of trees, reflections, shadows and other
photometric effects that are difficult to model without prior knowledge.

Figure 1. Results from the background subtraction stage.
Frames 180 and 200 from the original sequence with corresponding frames from
the background subtraction stage.

Nevertheless, when substantial movement is detected, many of the false detections disappear. Furthermore, the false detections are usually much smaller than the objects of interest, do not possess a coherent motion, and their shape is usually quite different from anything of interest, such as the shape of a car or motorbike. Therefore we include prior models of the shapes of the vehicles of interest. This helps to identify valid targets and to reduce the likelihood of attempting to track noise. Before shapes can be compared it is necessary to identify isolated regions. This can be done using connected component labelling, a commonly used technique that can be computed very quickly using approaches such as [5]. Connected component labelling also identifies the number of isolated regions. Perhaps fortuitously, for the data sets tested and the implementation of the background subtraction technique used here, this information can be used to identify when a dominant moving object is detected. As already discussed, techniques that identify noise by size and other factors could also be used, but this has so far been found to be unnecessary.
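As a small illustration of this step (a sketch of ours, not the authors' C++ implementation; the mask below is a synthetic stand-in for the thresholded background-subtraction output), OpenCV's connected component labelling can be applied directly:

    import cv2
    import numpy as np

    # Toy stand-in for a thresholded background-subtraction mask (assumption).
    mask = np.zeros((120, 160), np.uint8)
    mask[40:80, 30:90] = 255     # a dominant moving blob
    mask[10:14, 140:150] = 255   # a small noise region

    n_labels, labels = cv2.connectedComponents(mask)
    # Label 0 is the background; each remaining label is one isolated region.
    regions = [(labels == k) for k in range(1, n_labels)]
    print(n_labels - 1, "isolated regions found")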
Next, the shapes of the isolated detected regions are compared with a set of prior shape templates. Exemplar shape templates are illustrated in figure 2.

Figure 2. Prior shape template models.

These shape templates were derived manually from frames in a video sequence using an art drawing program. Their profile remains valid throughout the traffic sequences tested, as the camera is located with a right-angle side view of the traffic scene. More advanced configurations of templates would be necessary if a different perspective were selected, see e.g. [4].
The matching between the shape models and the data was then performed using two different
shape similarity techniques: one relying on the Dice coefficient that measures the amount of
similarity between shapes; and another based on the signed distance function of the shape
template, calculated via the sample expectation of the pixel error of the signed distances.
Example signed distances can be seen in figure 3.

Figure 3. Signed distances of the prior shape template models shown in figure 2.

The signed distance is particularly useful as it provides distance information and this can be
used to align the shapes, see e.g. [1]. Alignment is necessary to calculate the similarity of two
shapes. However this alignment process using signed distance functions can be
computationally intensive and for the purposes of the present work unnecessary. Alignment
was instead performed simply by aligning the centers of masses of the models and the isolated
regions.
Once the similarities between the template shapes and the isolated regions in the image have been calculated, they are used to determine which shape model $M_i$ and which isolated region $D_j$ are most likely to correspond. For the Dice coefficient, the model $M_i$ with the highest Dice coefficient is assigned to correspond with isolated data region $D_j$. For the signed distance function error, the model $M_i$ with the lowest error is identified to correspond with data region $D_j$. A model can be associated with each isolated data region; however, data regions with overall low probabilities are excluded from further processing. This helps to exclude sources of noise, such as movement of trees. It was implemented here by selecting the model and data with the best match, which assumes that there is only one moving object of interest.
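A minimal sketch of the two similarity measures (our illustration, not the authors' implementation), assuming boolean masks of identical size that have already been aligned by their centres of mass:

    import numpy as np
    from scipy.ndimage import distance_transform_edt

    def dice(a, b):
        # Dice overlap between two aligned binary masks.
        return 2.0 * np.logical_and(a, b).sum() / (a.sum() + b.sum())

    def signed_distance(mask):
        # Positive outside the shape, negative inside (sign convention assumed).
        return distance_transform_edt(~mask) - distance_transform_edt(mask)

    def sdf_mse(model, region):
        # Sample expectation of the squared pixel error of the signed distances.
        d = signed_distance(model) - signed_distance(region)
        return float(np.mean(d ** 2))

The model M_i with the highest Dice coefficient (or the lowest signed distance error) against region D_j is then taken as the corresponding vehicle type.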
Once a suitable model template has been identified it can be used to augment the isolated region in the background subtraction result. The center of gravity is again calculated for each isolated data region that has been assigned a model template, and the model templates are then aligned with the isolated data regions using the centers of gravity. This provides the system with an improved result, where noise in the background subtraction is removed and the effect of other distortions, such as reflections in and around the moving object, is reduced. This can be seen in figure 4.

Figure 4. Background subtraction result (left) and background subtraction result
augmented with matching model template (right).
3. COMPUTATIONAL DETAILS
The system was implemented in C++ on an Intel dual core platform with 3GB of
memory. OpenCV was used for the background subtraction and development of the system
described here.

4. RESULTS AND DISCUSSION
A comparison of the ranking of the log likelihood of the mean square errors on the
signed distance functions between isolated data regions for exemplar frames and some model
templates can be seen in the following table 1. Table 2 shows a similar ranking for the Dice
coefficient comparisons. These tables show ranking values where 1 is the best match and 8 the lowest. Zero entries indicate that the ground truth (i.e. model templates) does not correspond with those data regions. There were two isolated data regions for each model template, so each row, corresponding to a different model template, has two ranking values.
The results shown in tables 1 and 2 indicate that the signed distance function mean square
error is overall slightly more accurate at identifying matching shape templates in comparison
to isolated data regions. The signed distance function errors would likely be improved further
if a more accurate alignment technique were used that took advantage of the signed distances
rather than just using alignment of the centers of gravities.

Table 1. Ranking of isolated data regions in comparison to model templates
using negative log likelihood ranking of signed distance function errors.
Isolated Data Region
Model 1 2 3 4 5 6 7 8
1 6 1 0 0 0 0 0 0
2 0 0 3 2 0 0 0 0
3 0 0 0 0 1 4 0 0
4 0 0 0 0 0 0 2 1


Table 2. Ranking of isolated data regions in comparison to model templates
using negative log likelihood ranking of Dice coefficient overlap
measures.
Isolated Data Region
Model 1 2 3 4 5 6 7 8
1 5 1 0 0 0 0 0 0
2 0 0 2 1 0 0 0 0
3 0 0 0 0 4 5 0 0
4 0 0 0 0 0 0 2 1

The computation of the mean square error over the signed distance functions is, however, computationally intensive, due to the need to calculate the signed distance for the model templates and the isolated data regions multiple times for each frame. Preliminary results for a video sequence using the Dice coefficient matching are therefore shown in figures 5 to 7.
Figure 5. Results obtained for a truck sequence of frames.

Figure 6. Results obtained for a motorbike vehicle sequence.

Figure 7. Result obtained for another motorbike vehicle sequence.

The results shown in figures 5 to 7 indicate overall good performance. However there are
problems associated with the correspondence between the model templates and isolated data
regions. This is particularly true when a vehicle is entering the scene as the shape matching
does not currently take this into account. Furthermore the results in figure 7 indicate some
erratic estimation of the location of the motorbike. This will be corrected in future with the
use of a more advanced tracking approach, such as a Kalman filter and least squares matching
of models to data regions. This will further enable tracking of more than one object per frame.

5. CONCLUSION
A technique that augments moving vehicles with type and shape classification has been presented; in future it will be developed further to enable accurate estimation of vehicle counts passing a camera.
motorcycles and larger vehicles. More accurate tracking and estimation techniques will be
developed in future for incorporation into this system.

REFERENCES
1. J. Chiverton, M. Mirmehdi and X. Xie, In Proc. 20th British Machine Vision Conference, BMVA Press, London, 2009.
2. P. KaewTraKulPong and R. Bowden, In Proc. IAPR 2nd European Workshop on Advanced Video Based Surveillance Systems (AVBS01), Kluwer Academic Publishers, 2001.
3. C-C. Chiu, M-Y. Ku and H-T. Chen, In 8th Int. Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS'07), IEEE, 2007.
4. X. Song and R. Nevatia, In Proc. 10th Int. Conf. Computer Vision (ICCV'05), IEEE, 2005.
5. N. Ranganathan, R. Mehrotra and S. Subramanian, IEEE Transactions on Systems, Man and Cybernetics, 1995, 25(3), 415-423.


ACKNOWLEDGMENTS
The authors gratefully acknowledge Mitsui Sumitomo Insurance Welfare Foundation
(MSIWF) for providing a grant on traffic safety.
G00063
Histogram Specification for Variable Illumination
Correction of Face Images

H. Chiverton^1 and J. Chiverton^{2,C}

^1 School of Science, Mae Fah Luang University, 333 Moo 1, Thasud, Muang, Chiang Rai, 57100, Thailand
^2 School of Information Technology, Mae Fah Luang University, 333 Moo 1, Thasud, Muang, Chiang Rai, 57100, Thailand
^C E-mail: hpmeow160@gmail.com


ABSTRACT
Photographs of the human face usually have variable illumination due to the rich three
dimensional structure of the human face. This paper therefore describes a method to
improve the quality of these types of images.
The algorithm divides an image into regions to calculate histograms that are then
compared with corresponding regional histograms of a well lit ground truth image. This
comparison then provides parameters to transform the test image to a new image with
relatively good lighting. The algorithm assumes that the face location has been
identified and segmented a priori. This enables the algorithm to concentrate on varying
the lighting transformations of different regions of the face in the transformed image.
The result of the algorithm is to retain the natural relative brightness expected of the
human face. This is important for a number of reasons: (a) to prevent overly
brightening some regions of the face; and (b) thus reducing the effect of distortions that
might otherwise result in transforming a region that is likely to have very little natural
intensity information under normal lighting conditions. Potential applications include
face recognition and tracking systems.
The algorithm uses histogram specification which is a technique that is able to
transform the histogram for a set of pixels (a face region) to a specified shape. For a
dark region of a test image, the histogram specification obtained from the ground truth
image will often be relatively brighter and therefore transform the test image pixel
values to a higher range of pixel intensities, thus increasing the relative brightness for
that region of the test image. An affine transform is used to ensure the regions of the
face of the test image match the regions of the face in the ground truth image.

Keywords: Histogram Specification, Variable Illumination, Face Images, Image
Processing



1. INTRODUCTION
Still Images from camera and images from CCD camera have become very common and
give lots of benefits in many areas, such as for identifying and investigating human face in
criminal cases. However, if the camera or CCD camera is set up in the dark area, then images
have poor lighting so that it becomes a difficult task to recognize the person in that image.
Varying illumination also occurs because of the rich three dimensional structure of the human
face. To solve this problem, the histogram specification for correcting variable illumination of
face images is offered.

2. THEORY AND RELATED WORKS
Histogram Specification, adapted from Histogram Equalization, is the application of image
processing using a specified histogram [1]. This application is usually used for global images.
The concept is to transform the histogram of an original image to a specified new histogram
which can provide a new image with improved or corrected properties. For example, the information of a face without lighting is usually lost, and it is difficult to recover the detail within the face area even using histogram equalization. The Automatic Pixel Boosting Method [2] is able to improve the quality of a dark image automatically, instead of using the 'curves' tool in Photoshop.

In contrast to the work in [2] the work here describes a supervised approach to illumination
correction of face images. A model of the face is used, obtained by smoothing a well lit
ground truth image somewhat similar to the work in [3]. The information provided by the
model face provides appropriate details about the location and amount of correction to apply.
The technique is fully automatic and generalizes relatively well as will be seen by the results
in the later sections.

The principle of the technique is histogram specification. Each pixel has a histogram specified
from a small local window obtained from a smoothed ground truth image of a human face.
First the topic of histogram specification is described.

Let $p_r(r)$ be the continuous probability density function of the input image and $p_z(z)$ the specified continuous probability density function of the output image. The Cumulative Distribution Function (CDF) of the original dark image, which can be used immediately for histogram equalization, is given by

$s = T(r) = \displaystyle\int_0^r p_r(w)\,dw$,  (1)

where $s$ is a random variable. The CDF and inverse CDF for the specified continuous probability density function $p_z(z)$ are

$G(z) = T(r) \quad \text{and} \quad z = G^{-1}(T(r)) = G^{-1}(s)$,  (2)
where $z$ is a random variable. The CDF of the dark image can be approximated by the normalized histogram,

$s_k = T(r_k) = \displaystyle\sum_{j=0}^{k} p_r(r_j) = \sum_{j=0}^{k} \dfrac{n_j}{n}, \quad k = 0,1,2,\ldots,L-1.$  (3)
Histogram specification then finds the value of a new corrected image intensity by calculating the CDF value for a given pixel intensity, then transforming the pixel value to a new pixel value via the inverse CDF for the specified histogram. These steps are given by

$v_k = G(z_k) = \displaystyle\sum_{i=0}^{k} p_z(z_i) = s_k, \quad k = 0,1,\ldots,L-1,$
$z_k = G^{-1}(T(r_k)) = G^{-1}(s_k), \quad k = 0,1,\ldots,L-1.$  (4)
The new intensity $z_k$ then gives the intensity of the output image. By applying histogram specification to small regions of the dark face area, the quality of the poorly lit face image can be improved by adjusting the brightness of the face. More adjustment is usually required for regions with poor lighting, while regions of the face that do not usually have high intensities are also corrected, but only by small amounts. The face image is therefore divided into small windows, and each window is corrected using histogram specification with specified histograms calculated from a ground truth face image that has been smoothed to remove high detail that should not be projected onto the corrected face image.

3. EXPERIMENTAL or COMPUTATIONAL DETAILS
Assume that each small region in dark image has Gaussian distribution, this will help the
speed of this algorithm faster. The normal pdf, also called the Gaussian function, is given:
$g(z\,|\,\mu,\sigma) = \dfrac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(z-\mu)^2}{2\sigma^2}}$  (5)
The integral of this function is the CDF of a Gaussian. Unfortunately there is no closed-form solution to this integral, but the integral of a Gaussian is commonly expressed through the error function, erf(z), so that

$G(z) = \operatorname{erf}(z) = \dfrac{2}{\sqrt{\pi}} \displaystyle\int_0^z e^{-t^2}\,dt$  (6)
This function is available in many mathematical computing libraries, such as those of Matlab or C++, and can therefore be used as part of the histogram specification process. The inverse of the CDF, as described earlier, is also necessary for histogram specification. The inverse Gaussian error function erf^{-1}(s_k) is likewise a commonly required function with widely available implementations, so that

$z_k = G^{-1}(s_k) = \operatorname{erf}^{-1}(s_k)$  (7)

This is later used for finding the new intensity $z_k$ as described in (4).

The ground-truth image of one person, shown leftmost in Figure 1, is used for prior learning. The eight original images with different illumination, shown next to it, are used as input images.



Figure 1. Ground truth image (left most) and original dark images.

Procedure

1) Smooth the ground truth so that noise and detailed person-specific information are removed.
2) Starting from the top left corner of the ground truth, take a small region with a 7x7 window. This small region is assumed to have a Gaussian distribution.
3) Find the mean and standard deviation of the original dark image in the 7x7 window.
4) Calculate the CDF for the dark image, s = T(r), using the pixel value r at the center of the window, assuming a Gaussian distribution with that mean and standard deviation.
5) Find the mean and standard deviation of the ground truth in the window.
6) Average the dark-image and ground-truth means and standard deviations.
7) Calculate the inverse CDF for the specified histogram, z = G^{-1}(s), assuming a Gaussian distribution.
8) Set the new output intensity using the value of z.
9) Move the small windows of the ground truth, dark image and new image to the right and repeat steps 2-9.
10) Sharpen the resulting image with an unsharp filter. (A code sketch of steps 2-8 is given after this list.)
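The per-window computation in steps 2-8 can be sketched as follows (an illustrative Python/numpy version under the stated Gaussian assumption, not the authors' code; note that for Gaussian input and target distributions the composition z = G^{-1}(T(r)) collapses to a linear map, so erf/erf^{-1} need not be evaluated per pixel):

    import numpy as np
    from scipy.ndimage import uniform_filter

    def local_stats(img, size=7):
        # Local mean and standard deviation over a size x size sliding window.
        mean = uniform_filter(img, size)
        sq_mean = uniform_filter(img * img, size)
        std = np.sqrt(np.maximum(sq_mean - mean ** 2, 1e-12))
        return mean, std

    def gaussian_hist_spec(dark, truth, size=7):
        # Per-pixel Gaussian histogram specification (steps 2-8 above).
        dark = dark.astype(np.float64)
        truth = truth.astype(np.float64)  # already smoothed ground truth (step 1)
        mu_d, sd_d = local_stats(dark, size)
        mu_g, sd_g = local_stats(truth, size)
        # Step 6: average the dark-image and ground-truth statistics.
        mu_s, sd_s = 0.5 * (mu_d + mu_g), 0.5 * (sd_d + sd_g)
        # Linear form of z = G^{-1}(T(r)) for Gaussian r and z.
        z = mu_s + sd_s * (dark - mu_d) / sd_d
        return np.clip(z, 0, 255).astype(np.uint8)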

4. RESULTS AND DISCUSSION
The results after using the proposed procedure above are shown below.


Figure 2. Results after using Histogram Specification.

The chosen dark images have extremely poor lighting, which makes the task even more challenging. Three of the dark images are almost totally dark and have lost much of the facial information, while some images are dark on only one side of the face. The results show that histogram specification has improved the quality of the lighting throughout the whole face. Because each small window region across the face image is assumed to have its own Gaussian distribution, the procedure does not need to evaluate the inverse cumulative distribution function numerically many times; it therefore reduces the cost of calculation and is also simple.


5. CONCLUSION
The advantage of this application is that a single ground truth of one person can be used for any person in a dark image. Furthermore, the procedure is fast, robust, and simple.



REFERENCES
1. R.C. Gonzalez and R.E. Woods, Digital Image Processing, 2nd Edition, Prentice Hall, 2002.
2. H. Poncharoensil and S. Pansang, In 23rd International Technical Conference on Circuits/Systems, Computers and Communications (ITC-CSCC 2008), IEICE, Yamaguchi, 2008, 33-36.
3. B. Boom, L. Spreeuwers and R. Veldhuis, In Computer Analysis of Images and Patterns, LNCS 5702, Springer, Berlin, 2009, 33-40.


ACKNOWLEDGMENTS
This work is funded by Mae Fah Luang University. Yales Database B is used for testing this
proposed application for this paper.
G00066
Data Hiding and Security for Printed Documents

Pratarn Chotipanbandit
1, 2, C
, Sartid Vongpradhip
1, 2, C

1
System Engineering Laboratory at Chulalongkorn University, Bangkok, Thailand 10330
2
Department of Computer Engineering Faculty of Engineering, Chulalongkorn University, Bangkok,
Thailand 10330
C
E-mail: pratarn_k@hotmail.com, vsartid@chula.ac.th Tel. 02-2186985



ABSTRACT
Nowadays, piracy of identity card information drives an increasing need for security; pirated identity card information can be misused in, for example, business transactions and telephone scams. This paper proposes a method for data hiding that applies the Data Glyphs technique. Data Glyphs are a robust and unobtrusive method for embedding computer-readable data on surfaces such as paper, labels, plastic, glass, or metal; they are low-cost, robust, and easy to apply. The Data Glyphs procedure is separated into two parts: encoding data and decoding data. Encoding proceeds as follows. First, the message (the desired data) is encoded into ASCII codes. Next, the encoded message is transformed into Data Glyphs represented as a 2D barcode. Finally, the Data Glyphs image is prepared and printed on the document. Decoding proceeds as follows. First, the Data Glyphs image is read with retrieving equipment such as a scanner. The image file is transformed into a binary file consisting of 0s and 1s. The binary image is then sent to a preprocessing step to improve its quality; this step uses a thresholding method to separate the foreground from the background. Finally, the image is passed through noise reduction and the patterns are checked. Experimental results show that the proposed method performs well in security: private information embedded in a document with the Data Glyphs technique is protected from pirate access, is difficult to read by eye, is safe against information piracy, is easy to manage, and is retrieved with high accuracy. This implies that the Data Glyphs technique is a good choice for protecting private information.

Keywords: Data Hiding, Data Glyphs Technique, Information Security.



1. INTRODUCTION
Nowadays, the rapid growth of information technology and communication makes accessing information much easier than before, and at the same time more information needs to be protected from piracy. Piracy of identity card information in particular drives an increasing need for security; pirated identity card information can be misused in, for example, business transactions and telephone scams. Because of the seriousness of data theft, more security and accompanying data are needed. Several approaches have been proposed, such as using the outer boundary of a character to embed data [5], tiny, barely visible dots used both to recover synchronism and to carry information [2], security systems for accessing important data, data encoding, and tone color to represent hidden data [7], as well as inspection of adjusted media. Computers manage these well, but some ways of increasing security and accompanying data require more investment, such as smart cards or Radio Frequency Identification (RFID).
This motivates the present work: to increase security and accompanying data, a data-hiding technique on the card is proposed in which Data Glyphs are transformed into a barcode, protecting authentication for data access. The hidden data are hard to read by eye, safe against information piracy, easy to manage, and retrieved with high accuracy.



2. THEORY AND RELATED WORKS
Image Processing
Image processing is computer imaging where the application involves a human being in the visual loop; in other words, the images are to be examined and acted upon by people. These types of applications require some understanding of how the human visual system operates. The major topics within the field of image processing include image restoration, image enhancement, and image compression. As previously mentioned, image analysis is used in the development of image processing algorithms. In order to restore, enhance, or compress a digital image in a meaningful way, one needs to examine the images and understand how the raw image data relates to human visual perception. Image restoration is the process of taking an image with some known, or estimated, degradation and restoring it to its original appearance. Image enhancement is improving an image visually. Image compression is reducing the amount of data needed to represent an image. Image analysis is the examination of image data to solve a computer imaging problem. Image segmentation is used to find higher-level objects in raw image data. Feature extraction acquires higher-level information, such as the shape or color of objects. Image transforms may be used in feature extraction to find spatial frequency information. Pattern classification is used for identifying objects in an image.

Preprocessing
Preprocessing algorithms, techniques and operators perform initial processing that makes the primary data reduction and analysis task easier. They include operations related to extracting regions of interest, performing basic mathematical operations on images, simple enhancement of specific image features, and data reduction in both resolution and brightness. Preprocessing is a stage where the requirements are typically obvious and simple, such as the removal of artifacts from images, or the elimination of image information that is not required for the application. For example, one application needed to eliminate borders from images that resulted from taking the pictures through a window, while another had to mask out rulers that were present in skin tumor slides. Another example of a preprocessing step involves a robotic gripper that needs to pick and place an object; for this, a gray-level image is reduced to a binary (two-valued) image, which contains all the information necessary to discern the object's outline.

Data Glyphs
Data Glyphs, a type of 2-D matrix barcode, were developed at Xerox's Palo Alto Research Center [4]. They can encode more data than a 1-D barcode and have an extremely high data density for a 2-D barcode. They provide supporting technology for reliable exchange and transportation of data and controls, and represent a new method for people, documents, and machines to communicate: binary data are represented on an image. By printing at higher resolutions, the grid, known as a glyph block, encodes messages into thousands of tiny elements, each glyph or symbol comprising part of a Data Glyphs code. The symbol set used in this paper is shown in Table 1.

Table 1. Data Glyphs and decimal number system map.

Symbol: (glyph marks; images not reproduced in this text)
Decimal number system: 0 1 2 3 4 5 6 7 8 9
G00066
ANSCSE14 Mae Fah Luang University, Chiang Rai, Thailand
March 23-26, 2010
577
3. COMPUTATIONAL DETAILS
Encoding
This part presents the encoding scheme, which transforms Data Glyphs into a 2D barcode. The scheme is divided into two steps; the watermarking process starts as follows.
Step 1: Encode the information into ASCII codes, for example as shown in Table 2 (a code sketch follows the table):

Table 2. Encode original information into ASCII code.

Original
information

1234567890123^Mr.^Pratarn^Chotipanbandit^26092524^1/92^Soi.Ladprao
138^Klong-chan^Bangkapi^Bangkok^03102551^25092558^^

ASCII code

049 050 051 052 053 054 055 056 057 048 049 050 051 094 185 210 194
094 187 195 208 183 210 185 094 226 170 181 212 190 209 185 184 216
236 186 209 179 177 212 181 194 236 094 050 054 048 057 050 053 050
052 094 049 047 057 050 094 171 046 197 210 180 190 195 233 210 199
049 051 056 094 164 197 205 167 168 209 232 185 094 186 210 167 161
208 187 212 094 161 195 216 167 224 183 190 193 203 210 185 164 195
094 048 051 049 048 050 053 053 049 094 050 053 048 057 050 053 053
056 094 094

ASCII code
concatenate

0490500510520530540550560570480490500510941852101940941871952
0818321018509422617018121219020918518421623618620917917721218
1194236094050054048057050053050052094049047057050094171046197
2101801901952332101990490510560941641972051671682092321850941
8621016716120818721209416119521616722418319019320321018516419
5094048051049048050053053049094050053048057050053053056094094


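A small sketch of Step 1 (our illustration; the byte values in Table 2 are consistent with a Thai single-byte code page such as TIS-620/cp874, which is assumed here):

    def encode_message(message):
        # Each byte becomes a zero-padded 3-digit code, and the codes are
        # concatenated, as in Table 2 ('1' -> '049', '2' -> '050', ...).
        return "".join("%03d" % b for b in message.encode("cp874"))

    def decode_message(digits):
        # Split the digit string back into 3-digit codes and decode the bytes.
        codes = bytes(int(digits[i:i + 3]) for i in range(0, len(digits), 3))
        return codes.decode("cp874")

    assert decode_message(encode_message("1234567890123^Mr.")) == "1234567890123^Mr."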

Step 2: Transform the ASCII code into Data Glyphs.
The embedded data character pattern is rendered as a JPEG. Data Glyphs with dimensions 828 x 37 are then generated; in these Data Glyphs, each symbol is 7 x 7 pixels. A binary logo of the Data Glyphs image is then generated. Finally, the Data Glyphs image is prepared and printed on the document in two parts, one at the top and one at the bottom, for high-accuracy data retrieval, as shown in Figure 1.



Figure 1. Print Data Glyphs image on document

Decoding
Decoding proceeds as follows. First, the Data Glyphs image is read with retrieving equipment, here a scanner. The image file is transformed into a binary file consisting of 0s and 1s. After that, detection involves an analysis to find the relevant positions and symbols, and it can be divided into 4 steps: preliminary processing of the Data Glyphs image, figuring out the symbolic data, analyzing the symbol patterns, and decoding the Data Glyphs image.
Step 1: Preliminary processing of the Data Glyphs image. The Data Glyphs image obtained from the previous process may contain geometrical distortions and noise, which have to be removed. A Gaussian filter algorithm [3] is used in this work.
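An illustrative version of this preliminary step (file name, kernel size and sigma are our assumptions, not from the paper):

    import cv2

    scan = cv2.imread("scan.png", cv2.IMREAD_GRAYSCALE)  # hypothetical file
    blurred = cv2.GaussianBlur(scan, (5, 5), 1.0)        # Gaussian noise filter
    # Otsu thresholding separates glyph foreground from background.
    _, binary = cv2.threshold(blurred, 0, 255,
                              cv2.THRESH_BINARY + cv2.THRESH_OTSU)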
Step 2: Figuring out the blocks of symbolic data. In order to cut out the blocks of the Data Glyphs, horizontal and vertical projection techniques are used, mainly by detecting the appearance of the Data Glyphs image against the background color [3], as shown in Figure 2.
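Continuing the sketch above, the projections can be computed as simple row and column ink counts (the 10% threshold below is our choice, not from the paper):

    import numpy as np

    def block_spans(binary, frac=0.1):
        # Locate glyph-block rows/columns from horizontal/vertical projections.
        h_proj = (binary > 0).sum(axis=1)   # per-row ink count
        v_proj = (binary > 0).sum(axis=0)   # per-column ink count
        rows = np.flatnonzero(h_proj > frac * h_proj.max())
        cols = np.flatnonzero(v_proj > frac * v_proj.max())
        return rows, cols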



Figure 2. Detecting Data Glyphs image on document


Step 3: Analyzing the patterns of the symbols. Symbol pattern detection and analysis can be done by comparing the results of horizontal and vertical projections against the block units. Pattern analysis is based on vector theory; the pixels are separated by the Euclidean distance. The unknown symbol pattern has to be compared with the group of symbol patterns. The unknown pattern is assigned label $k$ when the weighted Euclidean distance between the pattern and pattern group $k$ is least:

$WED(k) = \displaystyle\sum_{i=1}^{N} \dfrac{(f_i - f_i^k)^2}{(\sigma_i^k)^2}$

where $f_i$ represents the unknown symbol pattern at the $i$-th position, and $f_i^k$ and $\sigma_i^k$ represent the symbol value and the standard mismatch of symbol $k$ at the $i$-th position. $N$ is the number of symbol pattern units. When the block units have been detected, this yields the right representative value, which must be distinctive over the entire block units. The value closest to the right value is then chosen, because it must be the right candidate for each part of the document. We thereby obtain the right candidate, matched with the pattern block unit.
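As an illustration of this matching rule (a sketch under our own naming, not the authors' code), with templates and sigmas as K x N arrays of the per-position symbol values f_i^k and standard mismatches sigma_i^k:

    import numpy as np

    def classify_symbol(f, templates, sigmas):
        # f: length-N flattened unknown symbol patch.
        # WED(k) = sum_i (f_i - f_i^k)^2 / (sigma_i^k)^2 for each template k.
        wed = np.sum((f[None, :] - templates) ** 2 / sigmas ** 2, axis=1)
        return int(np.argmin(wed))  # label k with the least WED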

Step 4: Decoding information. The information numbers are obtained by analyzing the patterns of the symbols. The process of decoding information is as follows:
Table 3. Decode ASCII Code into original information.

ASCII code

0490500510520530540550560570480490500510941852101940941871952
0818321018509422617018121219020918518421623618620917917721218
1194236094050054048057050053050052094049047057050094171046197
2101801901952332101990490510560941641972051671682092321850941
8621016716120818721209416119521616722418319019320321018516419
5094048051049048050053053049094050053048057050053053056094094

ASCII code
split

049 050 051 052 053 054 055 056 057 048 049 050 051 094 185 210 194
094 187 195 208 183 210 185 094 226 170 181 212 190 209 185 184 216
236 186 209 179 177 212 181 194 236 094 050 054 048 057 050 053 050
052 094 049 047 057 050 094 171 046 197 210 180 190 195 233 210 199
049 051 056 094 164 197 205 167 168 209 232 185 094 186 210 167 161
208 187 212 094 161 195 216 167 224 183 190 193 203 210 185 164 195
094 048 051 049 048 050 053 053 049 094 050 053 048 057 050 053 053
056 094 094

Original
information

1234567890123^Mr.^Pratarn^Chotipanbandit^26092524^1/92^Soi.Ladprao
138^Klong-chan^Bangkapi^Bangkok^03102551^25092558^^


- Prepare the information numbers and the analyzed symbol patterns to be decoded.
- Decrypt the information numbers via the ASCII code; the result of the decryption is the information data.
The output, the original information, is thereby obtained.



4. RESULTS
The encoding results show that private information can be embedded in a document to protect against pirate access; it is difficult to read by eye and safe against information piracy. The decoding results show that the Data Glyphs were detected correctly for Data Glyphs of dimension 828 x 37 with each symbol 7 x 7 pixels.

5. CONCLUSION
This paper has proposed a new method for data hiding that applies the Data Glyphs technique. The results show that the proposed method performs well in security: private information embedded in a document is protected from pirate access, difficult to read by eye, safe against information piracy, easy to manage, and retrieved with high accuracy. This implies that the Data Glyphs technique is a good choice for protecting private information.

REFERENCES
1. W. Bender, D. Gruhl and N. Morimoto, Techniques for data hiding, Proceedings of SPIE, 2420 (February 1995), 40-48.
2. Hae Yong Kim and Joceli Mayer, Data Hiding for Binary Documents Robust to Print-Scan, Photocopy and Geometric Distortions, Proceedings of the XX Brazilian Symposium on Computer Graphics and Image Processing, 2007, 105-112.
3. L.G. Shapiro and G.C. Stockman, Computer Vision, Prentice-Hall, Inc., U.S.A., 2001.
4. Palo Alto Research Center, DataGlyphs: Embedding Digital Data. Available source: http://www.dataglyphs.com.
5. Q. Mei, E.K. Wong and N. Memon, Data hiding in binary text documents, in Proc. SPIE Security and Watermarking of Multimedia Contents III, San Jose, CA, January 2001.
6. R.C. Gonzalez and R.E. Woods, Digital Image Processing, Prentice-Hall, Inc., U.S.A., 2002.
7. R. Villán, S. Voloshynovskiy, O. Koval, J.E. Vila-Forcén, E. Topak, F. Deguillaume, Y. Rytsar and T. Pun, Text Data-Hiding for Digital and Printed Documents: Theoretical and Practical Considerations, in Proceedings of SPIE-IS&T Electronic Imaging 2006, Security, Steganography and Watermarking of Multimedia Contents VIII, San Jose, USA, January 15-19, 2006.
8. Scott E. Umbaugh, Computer Imaging: Digital Image Analysis and Processing, CRC Press, 2005, 659 pages, ISBN 0849329191.
G00068
The Geometry and Electronic Structures of Functionalized
Single-Walled Carbon Nanotubes by Carboxyl Groups on
Perfects and Defect Tubes

Shongpun Lokavee^1, Anurak Udomvech^2, and Teerakiat Kerdcharoen^{1,C}

^1 Material Science and Engineering Programme, Faculty of Science, Mahidol University, Bangkok 10400, Thailand
^2 Department of Physics, Faculty of Science, Thaksin University, Songkhla 90000, Thailand
* NANOTEC Center of Excellence at Mahidol University, National Nanotechnology Center, Thailand
^C E-mail: teerakiat@yahoo.com; Fax: 02-2015843; Tel. 02-2015842


ABSTRACT
Adaptive geometries and electronic structures of functionalized single-walled carbon
nanotubes (SWNTs) have been investigated by first principles density functional theory
(DFT) methods and molecular dynamic simulations. In this study, we have investigated
electronic and structural properties of (9, 0) SWNTs having Stone-Wales (SW) defects
that included two covalently functionalized carboxylic (2-COOH) groups on different
sidewall positions, i.e., at the middle and at the end of nanotubes. The effects of
functionalization on various locations were compared with the pristine tubes. Both
covalent bonds attached to the sidewall of perfect and defective nanotubes were studied by the DFT method. Properties of all functionalized SWNTs were investigated at the B3LYP/3-21G* level of theory. It was predicted that the energy gap decreases upon introducing Stone-Wales defects, whereas functionalization at a Stone-Wales defect increases the energy gap. For the molecular dynamics (MD) simulations, the interactions between carboxyl groups and water molecules are discussed in detail.


Keywords: Functionalized carbon nanotube, Electronic structure, Molecular dynamics,
Stone-Wales (SW) defect.


1. INTRODUCTION
Single-walled carbon nanotubes (SWNTs) have drawn much interest from both scientists and engineers since carbon nanotubes were first discovered in 1991 by Iijima [1]. SWNTs can be either metallic or semiconducting depending on their chiral vector. Because of their remarkable properties, they are prospective candidates for future nanoelectronic devices, circuits and computers. However, the practical applications of these materials have been limited by difficulties such as poor solubility and weak adhesion to other materials; for instance, the pristine carbon nanotube is unable to dissolve in water [2].
In reality, it is very difficult to synthesize perfect carbon nanotubes without defects. Several functionalized SWNTs have been demonstrated by attaching chemical groups onto the nanotube through covalent bonding (see Fig. 1) [3-5]. Defects also alter the electrical conductance of a nanotube: rotating a C-C bond by π/2 between two neighboring sites forms two heptagons and two pentagons, a configuration called a Stone-Wales defect [6]. It is a crystallographic defect that rotates four adjacent hexagons into two pentagons and two heptagons (a 5-7-7-5 defect), as shown in Fig. 2.
Figure 1. Structural configurations of the functionalized sidewall of the pristine nanotube: (a) SWNT/2-COOH-I; (b) SWNT/2-COOH-II; (c) SWNT/2-COOH-III; and (d) SWNT/2-COOH-IV.

Figure 2. The structural patterns of a Stone-Wales defect and functionalization aspects of the COOH group in different positions: (a) SW-defect (9, 0)-I; (b) SW-defect (9, 0)-II; (c) SW-defect/2-COOH (9, 0)-I; and (d) SW-defect/2-COOH (9, 0)-II.

In the present work, we have investigated covalent bonds attached to the sidewall of pristine and defective (9, 0) SWCNTs. The chemical bonding results in sp^3 hybridization on the surface of the tubes. The electronic and structural properties of pristine tubes and SW-defect CNTs functionalized by two carboxylic groups (2-COOH) on the sidewall have been studied using first-principles calculations [7, 8], and the behaviour of the functionalized CNT in water was investigated using molecular dynamics (MD) simulations. As a result, we show how sidewall functionalization with the carboxylic group affects the properties of pristine and SW-defective carbon nanotubes.


2. COMPUTATIONAL DETAILS
In this study, three unit cells of a (9, 0) open-ended single-walled carbon nanotube were terminated with hydrogen atoms, giving a tube length and width of about 11.38 Å and 7.00 Å, respectively. We first investigated the effects of a single functional group, i.e., two carboxylic groups (2-COOH), at different positions on pristine tubes and SW-defect nanotubes, which were fully optimized with density functional theory (B3LYP) using the 3-
21G* basis set (denoted by B3LYP/3-21G*). The electronic structures such as the highest
occupied molecular orbital (HOMO), the lowest unoccupied molecular orbital (LOMO) and
the energy gap were calculated by using density functional theory within the GAUSSIAN03
program.
For the subsequent calculations, the geometry of the pristine tube was optimized with density functional theory at the B3LYP/3-21G* and then B3LYP/6-31G* levels, while the SW-defect structures were optimized at B3LYP/3-21G*. This procedure probes the effect of 2-COOH at various locations on the sidewall of pristine (Fig. 1) and SW-defect nanotubes (Fig. 2). On the sidewall of the pristine tube, Fig. 1 defines the sites SWNT/2-COOH-I, SWNT/2-COOH-II, SWNT/2-COOH-III, and SWNT/2-COOH-IV; the functionalized locations of the SW-defective tube follow Fig. 2(a)-(d).
The effects of the two functionalized groups (2-COOH) at the different positions on the sidewall of pristine and SW-defective SWNTs were calculated using single-point B3LYP/6-31G* calculations, followed by the classical MD method for the dynamics simulations. Models of the functionalized SWNT/2-COOH and SW/2-COOH-defect tubes were simulated with 1680 water molecules at room temperature (300 K). Water was described by the SPC model, and the molecular structures were represented by flexible models based on the OPLS force field. All systems were simulated in a cubic periodic box of 59.319 nm^3.

3. RESULTS AND DISCUSSION
In this section, we compare the electronic properties of pristine SWNTs, SWNT/2-COOH (I-III), SW-defect (9, 0) (I-II) and SW-defect/2-COOH (9, 0) (I-II). Geometry optimization leads to estimates of the HOMO-LUMO energy gap; the values obtained with density functional theory at the B3LYP/3-21G* level are shown in Table 1.

Table 1. Comparison of some electronic properties of carboxylic groups in different positions of the perfect and defective tubes, investigated by the B3LYP/3-21G*(Opt) method.

Structure | HOMO (eV) | LUMO (eV) | E-gap (eV)
Pristine SWNT (9,0) | -3.774 | -3.516 | 0.258
SWNT/2-COOH I | -3.904 | -3.691 | 0.213
SWNT/2-COOH II | -4.007 | -3.784 | 0.223
SWNT/2-COOH III | -4.033 | -3.850 | 0.183
SW-defect (9,0) I | -3.759 | -3.526 | 0.233
SW-defect (9,0) II | -3.771 | -3.572 | 0.200
SW-defect/2-COOH (9,0) I | -4.188 | -3.930 | 0.258
SW-defect/2-COOH (9,0) II | -3.862 | -3.625 | 0.237
Figure 3. The snapshots of the simulations of the (a) pristine SWNT, (b) SWNT/2-COOH-I,
(c) SWNT/2-COOH-III and the structural patterns for SW-defect of (d) SW-defect (9, 0)-I,
(e) SW-defect (9, 0)-II and (f) SW-defect/2-COOH (9, 0)-I

It was predicted at this level of theory that the energy gap decreases upon functionalization of perfect nanotubes: after sidewall functionalization with 2-COOH in different positions, the gap falls from 0.258 eV for the pristine SWNT to 0.213 and 0.223 eV for SWNT/2-COOH I and II, respectively (Table 1), implying increased electrical conductivity. Functionalization on the sidewall of the nanotube thus increases the electrical conductivity of the nanotube structures. According to Table 1, the Stone-Wales defects also show low energy gap values, so both SW-defect tubes (SW-defect (9, 0)-I and SW-defect (9, 0)-II) are predicted to have increased electrical conductivity. Finally, the SW-defects functionalized by carboxylic groups (SW-defect/2-COOH (9, 0)-I and SW-defect/2-COOH (9, 0)-II, see Fig. 2(c) and (d)) show increased energy gaps in both positions, which leads to decreased electrical conductivity.

Table 2. Total energy of perfect and defective tubes with functionalized carboxylic systems, studied using density functional theory at the B3LYP/3-21G*(Opt) method.

Perfect nanotubes | SWNT (pristine) | SWNT/2-COOH-I | SWNT/2-COOH-II | SWNT/2-COOH-III
Average total energy (kcal/mol) | -94631.97 | -103303.56 | -103307.14 | -103308.66

Defective nanotubes | SW-defect (9,0)-I | SW-defect (9,0)-II | SW-defect/2-COOH (9,0)-I | SW-defect/2-COOH (9,0)-II
Average total energy (kcal/mol) | -94629.71 | -94630.18 | -103294.70 | -103305.00

The total energies of all models were obtained from the geometry optimizations with density functional theory at the B3LYP/3-21G* level. The total energy of the functionalized tubes is higher in magnitude than that of the pristine tube and the SW-defect tube, while the average total energies within each group are quite similar, as shown in Table 2.
In the MD simulations, the hydration shell structure of perfect and defective tubes can be extracted in terms of the radial distribution function (RDF). The RDF from the center of mass (CoM) of water to the CoM of each model tube, shown in Fig. 4, does not exhibit any sharp peak; there is only one hydration shell around all of the CNT models. The interaction between the tubes and the water molecules, reflected in the hydration structure, is thus rather weak. Fig. 4 also indicates that more water molecules surround the functionalized tube (SWNT/2-COOH-I) than the pristine tube.
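For reference, an RDF of this kind can be computed from an MD trajectory roughly as follows (a generic numpy sketch, not the simulation package actually used), given the CoM-to-CoM or CoM-to-atom distances collected over all frames:

    import numpy as np

    def rdf(distances, n_frames, number_density, r_max, n_bins=200):
        # Histogram the sampled distances and normalise each bin by the
        # ideal-gas expectation n_frames * rho * (shell volume).
        edges = np.linspace(0.0, r_max, n_bins + 1)
        counts, _ = np.histogram(distances, bins=edges)
        r = 0.5 * (edges[:-1] + edges[1:])
        shell_vol = (4.0 / 3.0) * np.pi * (edges[1:] ** 3 - edges[:-1] ** 3)
        return r, counts / (n_frames * number_density * shell_vol)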

Figure 4. Radial distribution functions (RDF) obtained from the center of mass
(CoM) of pristine SWNT to CoM of Hydrogen (HW) and Oxygen (OW)
of water molecules and from CoM of SWNT/2-COOH-I tube to CoM of
water molecules.

Figs. 5(a) and 5(b) show the RDF from the CoM of the pristine SWNT and SW-defective tubes to the oxygen (OW) and hydrogen (HW) atoms of the water molecules, respectively. No clear orientation of water molecules around the sidewall of the pristine tube and SW-defective tubes can be observed, because the probability of finding H atoms and O atoms of water is almost the same. For the functionalized tube with a SW-defect section interacting with water, the hydration structure of the defective system is likewise almost unchanged, as shown in Figs. 6(a) and 6(b).

Figure 5. Radial distribution functions obtained from the (a) CoM of perfect and defective
tubes to oxygen of water molecules and (b) CoM of perfect and defective tubes to hydrogen
of water molecules.

Figure 6. Radial distribution functions obtained from the center of mass of (a) O atoms
(b) H atoms of the water to the CoM of tubes

4. CONCLUSION
In summary, we have investigated the effects of the functionalized group (-COOH), covalently attached to the sidewall of perfect and defective nanotubes, using both first-principles calculations and molecular dynamics simulations. The results show that the functionalized (-COOH) groups significantly adjust the nearby C-C bond lengths and modify the energy gap of the tubes. The MD simulations indicate that the hydration structure of all systems is quite similar; Stone-Wales defects and sidewall functionalization with two carboxylic groups exert only a weak effect on the hydration structures.

REFERENCES
1. S. Iijima, Nature, 1991, 354, 56-58.
2. Wongchoosuk, C., Krongsuk, S., Kerdcharoen, T. Int. J. Nanoparticles, 2008, 136-156.
3. Wongchoosuk, C., Udomvech, A., Kerdcharoen, T. Curr. Appl. Phys., 2008, 351-358.
4. Veloso, V., Filho, A.G., Filho, J.M., Fagan, S.B., Mota, R. Chem. Phys. Lett., 2006, 430, 71-74.
5. Kar, T., Akdim, B., Duan, X., Pachter, R. Chem. Phys. Lett., 2006, 423, 126-130.
6. Gayathri, V., Geetha, R. Int. J. Phys., 2006, 34, 824-828.
7. Wang, C., Zhou, G., Liu, H., Wu, J., Gu, B.-L., Duan, W. J. Phys. Chem., 2006, 110, 10266-10271.
8. Fangping, O. J. Phys. Chem., 2008, 112, 12003-12007.

ACKNOWLEDGMENTS
This work was supported by the Materials Science and Engineering Programme, Faculty of Science, and the NANOTEC Center of Excellence at Mahidol University, National Nanotechnology Center, Thailand.


An Improvement of Rainfall Estimation in Thailand
Using FY-2C Numerical Data

I. Sa-nguandee1,C, M. Raksapatcharawong2, and W. Veerakachen2

1 TAIST Tokyo Tech, ICTES Program, Department of Electrical Engineering, Kasetsart University, Bangkok 10900, Thailand
2 Department of Electrical Engineering, Kasetsart University, Bangkok 10900, Thailand
E-mail: sd.itthi@gmail.com; Fax: 02-9428842; Tel. 084-0709218



ABSTRACT
Remote sensing technology provides good spatial resolution for rainfall estimation, which is very important for the monitoring, prediction, and mitigation of rain disasters. Today, data from the infrared channels of meteorological satellites can be used to estimate rainfall, with rain gauges and radar utilized for calibration. The infrared channel estimates rainfall through the relationship between the Brightness Temperature (BT) from satellite data and surface rainfall from rain gauges. However, it is widely accepted that a satellite rainfall technique developed for one climatic region is not necessarily applicable to another, because dynamic rain processes differ from region to region, and the relationship between infrared measurements of cloud-top temperature and surface rainfall from rain gauges is non-linear; an algorithm must therefore be studied and developed specifically for Thailand. This paper proposes a technique that masks out non-raining information by using numerical data from the FengYun-2C (FY-2C) infrared channels to identify cloud type, thereby improving rainfall estimation accuracy in Thailand.

Keywords: Remote Sensing, Data Processing, Rainfall Estimation, FY-2C



1. INTRODUCTION
Today, data from the infrared channels of meteorological satellites are very useful in the fields of meteorology, water conservancy, and agriculture, generating huge economic and social benefits through weather forecasting, climate prediction, disaster monitoring, and environmental remote sensing. The present operational cloud classification product of FengYun-2C (FY-2C) is based on a clustering method, which uses cloud-top Brightness Temperatures of the Long Wave Infrared channel (IR1) to detect clouds and then uses Brightness Temperatures of the Water Vapor channel (IR3) to classify deep convective clouds [1]. Another method uses Split Window channel (IR2) data to distinguish low-level cloud and thin cirrus cloud from the underlying surface, which can improve cloud detection [2]. Among the various satellite remote sensing techniques for rainfall estimation, infrared images are typically used to estimate rain in Thailand. The relationship between the Brightness Temperature (BT) of infrared images and rain-gauge measurements must be identified. This method is very sensitive to the image description, such as the satellite radiometer, raw-data slicing level, grayscale level, color palette, etc. It is also difficult to adjust the algorithm for another climatic region, and obtaining a suitable model for the whole country may take a very long time, not to mention changes of the model due to climate change. Therefore, in order to improve rainfall estimation in Thailand, this paper studies statistical classification algorithms on two satellite data channels (IR1: 10.3-11.3 µm and IR3: 6.3-7.6 µm) from China's first operational geostationary meteorological satellite, FengYun-2C (FY-2C), and finds suitable techniques for cloud classification and a rainfall estimation model using FY-2C numerical data (16-bit PNG) instead of satellite images. The results of this study will also help in choosing automated cloud classification algorithms for the upcoming launch of the FY-4 series [3].
2. THEORY AND RELATED WORKS
FY-2C Visible Infrared Spin Scan Radiometer (VISSR) data, received through the Digital Video Broadcasting System (DVB-S), have been analyzed so that the data can be processed and fully utilized. The transmission covers daytime data of the infrared and visible bands, and night-time data of the infrared bands only. The infrared data sectors are extracted from the frame format; each sector includes 2,291 pixels as one line of image data. The infrared channels are quantized using 10 bits, and this 10-bit information is mapped to Brightness Temperature (BT) using a conversion table, shown as a graph in Figure 1. The major characteristics of each instrument, including spectral wavelengths and spatial resolutions, are shown in Table 1.
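A minimal sketch of this mapping is shown below. The calibration values are placeholders only, since the actual FY-2C conversion table is distributed with the satellite data; the linear ramp here merely stands in for the curve of Figure 1.

    import numpy as np

    # Placeholder 1024-entry table standing in for the real FY-2C calibration curve.
    calibration_table = np.linspace(330.0, 180.0, 1024)

    def counts_to_bt(counts):
        # Map 10-bit pixel counts (0-1023) to brightness temperature (K) by lookup.
        return calibration_table[np.asarray(counts, dtype=np.int64)]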


Figure 1. Conversion of Infrared Brightness Temperature.

Table 1. Major Instrument Characteristics.

Channel Name                  IR1: Long Wave Infrared; IR2: Split Window; IR3: Water Vapor;
                              IR4: Medium Wave Infrared; VIS: Visible
Wavelength (µm)               IR1: 10.3-11.3; IR2: 11.5-12.5; IR3: 6.3-7.6; IR4: 3.5-4.0;
                              VIS: 0.55-0.90
Resolution (km)               IR: 5; VIS: 1.25
Noise performance             IR1/IR2: NEDT = 0.4-0.2 K (300 K); IR3: NEDT = 0.5-0.3 K (300 K);
                              IR4: NEDT = 0.6-0.5 K (300 K); VIS: S/N = 1.5 (Albedo = 0.5%),
                              S/N = 50 (Albedo = 95%)
Quantization precision (bit)  IR: 10; VIS: 6

According to the remote sensing characteristics, the Long Wave Infrared channel (IR1) and the Split Window channel (IR2) of FY-2C can discriminate cloud areas from the underlying surface. The Water Vapor channel (IR3) can indicate deep convective clouds. The Visible channel (VIS) has greater spatial resolution and is useful for the detection of low clouds, but its temporal availability is limited to daylight hours. The Medium Wave Infrared channel (IR4) is sensitive to objects with higher temperatures and is usually used for the estimation of underlying surface temperature and the detection of fog and low-level clouds. However, great effort is needed to eliminate the influence of visible light on the brightness temperatures of the IR4 channel [4]. For this reason, this study focuses on numerical data from two infrared channels: IR1, 10.3-11.3 µm, and IR3, 6.3-7.6 µm.



3. EXPERIMENTAL or COMPUTATIONAL DETAILS
A. Rain Cloud Identification
Rain cloud identification is accomplished by discriminating high-level cloud characteristics associated with the rain and no-rain classes. Raining pixels are identified by thresholding the IR1 (Long Wave Infrared) channel, combined with a filtering process based on the Brightness Temperature Difference (BTD) between the IR1 and IR3 (Water Vapor) channels. The Infrared Threshold Rainfall (ITR) technique assumes all clouds colder than 253 K to be active convection. However, cirrus contamination in high-level cloud leads to overestimation of convective rainfall; BTD filtering is therefore applied to refine rain cloud identification by discriminating cirrus clouds. Using BTD, Kurino (1996) found that areas with a Water Vapor minus Infrared temperature difference less than or equal to 0 K corresponded to deep convection. For this study, a BTD threshold of -3 K was chosen as suggested in [5], to be adapted later according to spectral statistical analysis. The algorithms used in this paper are summarized in Table 2.

Table 2. Summary of Techniques Used for Rain Cloud Identification.

Techniques Rain Cloud Identification FY-2C data channel
Thresholding IR < 253 K IR1
Filtering BTD < Threshold IR1, IR3

It is worth noting that the ITR technique also tends to miss warm coastal and orographic rain, which is a fundamental consequence of the infrared threshold. This may result in underestimation of the rain rate from warm clouds. In this case, the threshold has to be suitably adapted for the region of Thailand to gain more accuracy for rainfall estimation in the future.
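The two-step identification of Table 2 can be sketched as follows. The array names are hypothetical, and taking BTD as IR3 - IR1 (with the filter keeping pixels below the -3 K threshold, following Table 2) is our reading of the text rather than the authors' code.

    import numpy as np

    IR_THRESHOLD = 253.0    # ITR threshold for active convection (K)
    BTD_THRESHOLD = -3.0    # BTD threshold suggested in [5] (K)

    def rain_cloud_mask(ir1, ir3):
        # Step 1: keep cloud-top pixels colder than the infrared threshold.
        cold_cloud = ir1 < IR_THRESHOLD
        # Step 2: BTD filtering; BTD = IR3 - IR1 as in Table 4 (assumed convention).
        btd = ir3 - ir1
        convective = btd < BTD_THRESHOLD
        return cold_cloud & convective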

B. Rainfall Estimation
A regression relation converts infrared brightness-temperature numerical data to rainfall rates on the pixels identified as raining. The Infrared Power Law Rain Rate (IPR) technique is applied to the pixels remaining after cloud filtering to estimate the hourly rain rate. The power-law relationship was derived from a statistical nonlinear regression of co-located surface rain-gauge data and FY-2C infrared temperatures (K) obtained from satellite raw data. Although this technique has led to promising and acceptable rainfall estimates in some regions of Thailand [6], a polynomial relationship has also provided good results in many cases. Therefore, both models were used in our work for comparison and analytical purposes.
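A minimal sketch of fitting such a power-law relation is given below, assuming arrays of brightness temperatures for raining pixels and co-located gauge totals; the starting values passed to the optimizer are arbitrary placeholders.

    import numpy as np
    from scipy.optimize import curve_fit

    def power_law(bt, a, b):
        # IPR-style regression: rain rate as a power of brightness temperature (K).
        return a * np.power(bt, b)

    def fit_ipr(bt_rainy, gauge_mm):
        # Nonlinear least-squares fit of the power-law coefficients (a, b).
        params, _ = curve_fit(power_law, bt_rainy, gauge_mm,
                              p0=(1.0e6, -2.0), maxfev=10000)
        return params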



Figure 2. Locations of TMD 110 Rain Gauge Stations.

Thailand Meteorological Department (TMD) has provided rain gauge data obtained from
110 stations over the northern part of Thailand during the dry season (May 2008). Figure 2
shows the locations of TMD ground-based rain gauge stations mentioned above.



4. RESULTS AND DISCUSSION
Data were analyzed at three levels: station, province, and region. Three stations (Khaokuo-Petchaboon, Klonglan-Kampangpetch, and Pobpra-Tak) were chosen to study the tendency of rain-estimate improvement using the thresholding and filtering method. The results show that thresholding IR1 at 253 K does improve the correlation between rain-gauge data and infrared numerical values. However, it was not obvious that BTD filtering could improve the estimation at the station level, so the data were further analyzed at the province level. Three provinces (Petchaboon, Kampangpetch, and Tak) were chosen; the results are similar to those at the station level, which might be an effect of the small number of rainy pixels. However, when all data from the 110 stations were analyzed at the regional level, the result was more positive: filtering with BTD improves on the ITR method alone, increasing the R² values for both the power-law and polynomial models, as shown in Table 3.

Table 3. R² values for the rain estimation models.

Cloud Identification       Polynomial   Power Law
Thresholding               0.3783       0.481
Thresholding & Filtering   0.6618       0.658

Since the R² values are fairly low, the statistics of the infrared channels for the rain and no-rain cases, computed from 80,410 sample pixels (3,163 of them rainy) at the 110 rain gauge stations, are shown in Table 4. These statistics could be used to adjust the thresholds for both steps (IR1 and BTD) of rain cloud identification in the future.
Table 4. Spectral Statistics for Rain/No-Rain Cloud.

Case      Band      Min       Max       Mean       S.D.
Rain      IR1       180       296.161   254.0991   25.4252
          IR3       199.852   255.413   239.1151   11.60858
          IR3-IR1   -47.223   23.287    -14.9789   14.62227
No-Rain   IR1       180       302.374   270.6871   18.28181
          IR3       199.852   256.362   244.4057   7.465542
          IR3-IR1   -53.204   22.753    -26.2812   11.94687

One reason for the low R² values might be that the TMD rain-gauge data used in this research were not validated. Nevertheless, the results indicate that the methodology used in this work is appropriate. Our algorithm for rainfall estimation, using infrared numerical data together with thresholding and filtering techniques, has room for improvement, especially by validating the rain gauge measurements and finding more suitable thresholds.



5. CONCLUSION
The results of this study imply an improvement in the methodology for rainfall estimation using infrared numerical data extracted from FY-2C satellite raw data. However, rain gauge data must be validated before use as input for the regression analysis, and the IR1 and BTD thresholds should be suitably adapted for better accuracy. The split-window technique, using the Brightness Temperature Difference between the IR1 and IR2 channels, may be considered for discriminating cirrus cloud, and microwave observations may be used to improve the estimation of warm-cloud rain. Precise rainfall estimation can make a great contribution to an agricultural country such as Thailand.
This methodology can be applied to other sets of satellite data, such as the FY-4 series, and should be easily adjusted for the other climatic regions of the country. The experience gained from using satellite numerical data instead of satellite images could also be applied to derive other products, such as sea surface temperature, outgoing long-wave radiation, and solar radiation, which could in turn be used in other research areas, e.g., energy and power balance monitoring, fire detection, flash flood warning, drought warning, and crop yield forecasting.



REFERENCES
1. Yu Fan, Liu Liang-ming and Wen Xiong-fei, The Research on Application of FY-2C Data in Drought Monitoring, ISPRS Congress Beijing 2008, Beijing, 2008.
2. R. W. Saunders and K. T. Kriebel, Int. J. Remote Sens., 1998, 9(1), 123-150.
3. W. J. Zhang, J. M. Xu and C. H. Dong, Earth Science Remote Sensing, TsingHua University Press, Beijing, 2007, 1, 392-413.
4. D. R. Li, X. Y. Dong, L. M. Liu and D. X. Xiang, International Workshop on Knowledge Discovery and Data Mining (WKDD), 2008, 289-292.
5. G. G. S. Pegram, I. T. H. Deyzel, S. Sinclair, et al., 2nd IPWG Workshop, Naval Research Laboratory, Monterey, CA, USA, 2004.
6. P. Kosa and K. Pongput, The Rainfall Estimation using Remote Sensing in Thailand, unpublished.
7. P. Gegkhuntod and T. Oki, Comparison of Gauge and Satellite Rain Estimates for Thailand, unpublished.
8. Z. Ma Fang and N. Guo, Atmospheric Science, 2007, 31(1), 119-128.
ACKNOWLEDGMENTS
This research is financially supported by the Thailand Advanced Institute of Science and Technology - Tokyo Institute of Technology (TAIST-Tokyo Tech), the National Science and Technology Development Agency (NSTDA), the Tokyo Institute of Technology (Tokyo Tech), and Kasetsart University (KU). We also thank the Ministry of Information and Communication Technology for supporting this research through a consultancy project entitled "A Study and Applications of Meteorological Satellite Data via Digital Video Broadcasting System (DVB-S)", grant no. 143/2551.

Survey of Metaheuristic Methodology for Solving Container
Loading Problem

Ramm Khamkaew1, Samerkae Somhom2

1,2 Department of Computer Science, Faculty of Science, Chiang Mai University, Chiang Mai 50202
E-mail: g500531140@cmu.ac.th; Fax: 0-5394-3433; Tel. 0-5394-3409



ABSTRACT
This paper surveys the container loading problem and methods for solving it: the genetic algorithm, the tabu search algorithm, and the simulated annealing algorithm. We compare the strengths and weaknesses of each method and discuss its effectiveness. The test data consist of rectangular boxes of many sizes and a standard rectangular container.

Keywords: Container Loading Problem, Metaheuristic, Simulated Annealing, Tabu
Search, Genetic Algorithm.



1. INTRODUCTION
The container loading problem is one of the cutting and packing problems: that of orthogonally packing a subset of given rectangular boxes into a rectangular container of fixed dimensions. The problem has numerous applications in the cutting and packing industry, e.g., when cutting wood or foam rubber into smaller pieces, loading pallets with goods, or filling containers with cargo. An optimal filling of a container reduces shipping costs as well as increasing the stability and support of the load. The problem has been studied since the seminal work of Gilmore and Gomory in the early sixties, and numerous papers and algorithms have been presented for its solution. There are, however, several variants of the container loading problem, depending on the objective function and the side constraints present. Dowsland and Dowsland (1992) classified packing problems by packing model:

1. Two-dimensional rectangular packing: In several industrial applications one is required to allocate a set of rectangular items to larger standardized rectangular stock units while minimizing waste. The stock units are rectangles, and a common objective is to pack all the requested items into the minimum number of units; the resulting optimization problems are known in the literature as two-dimensional bin packing problems.
2. Pallet loading problem: The maximum number of identical rectangular boxes has to be packed onto a rectangular pallet. The problem arises at factories when large quantities of one product must be shipped on pallets, and it is sometimes called the manufacturer's loading problem (MLP), to distinguish it from the distributor's loading problem (DLP), in which boxes of several sizes are packed together on one pallet.
3. Three-dimensional packing problems consist of determining the minimum number of three-dimensional rectangular containers (bins) into which a given set of three-dimensional rectangular items (boxes) can be orthogonally packed without overlapping. All bins are of identical known dimensions.
4. Non-rectangular packing problems determine how non-rectangular items can be placed into the container without overlapping. For non-rectangular packing problems, e.g., the polygon packing problem, bounds and worst-case and average-case analyses of the algorithms are more difficult than for rectangular packing problems.
G00075
ANSCSE14 Mae Fah Luang University, Chiang Rai, Thailand
March 23-26, 2010
594
In this kind of model the only parameters are the dimensions of the cargo and the container involved and, at least formally, some measure of the value of each item. It is perhaps necessary to emphasize that no claim is made that the factors described below are of importance in every case, or indeed that there are many practical situations where all of them matter. The claim, however, is that there are many cases where at least some of the factors listed below play an important role (Bischoff and Ratcliff, 1995).

1. Orientation constraints: The familiar 'this way up' instruction on cardboard boxes is a simple example of this kind of restriction. It may, however, not only be the vertical orientation which is fixed; if, for instance, a two-way entry pallet is loaded by forklift truck, even the orientation in the horizontal plane may have to be regarded as effectively pre-determined.
2. Load bearing strength of items: 'Stack no more than x items high' is another instruction seen on many boxes in everyday situations. How far this translates into a straightforward figure for the maximum weight per unit of area which a box can support depends on its construction and also its contents. Often the load bearing strength of a cardboard box is provided primarily by its side walls, so that it might be acceptable to stack an identical box directly on top, whereas placing an item of half the size and weight in the centre of the top face causes damage. The load bearing ability of an item may, of course, also depend on its vertical orientation.
3. Handling constraints: The size or weight of an item and the loading equipment used may to
some extent dictate the positioning within a container. It might be necessary, for instance, to
put large items on the container floor or to restrict heavy ones to positions below a certain
height. It may also be desirable from the viewpoint of easy/safe materials handling to place
certain items near the door of the container.
4. Load stability: To ensure that the load cannot move significantly during transport is an
obvious requirement if the cargo is easily damaged. Also, an unstable load can have important
safety implications for loading and (especially) unloading operations. Straps, airbags and
other devices can be used to restrict or prevent cargo movement, but the costs, especially in
terms of time and effort spent, can be considerable.
5. Grouping of items: Checking of a load might be facilitated if items belonging to the same group (defined, for example, by a common recipient or the item type) are positioned in close proximity. This may also have advantages in terms of the efficiency of loading operations.
6. Multi-drop situations: If a container is to carry consignments for a number of different destinations, it is desirable not only to place items within the same consignment close together, but also to order the consignments within the container so as to avoid, as far as possible, having to unload and re-load a large part of the cargo several times.
7. Separation of items within a container: In situations where the cargo comprises items which may adversely affect some of the other goods (e.g., if it includes both foodstuffs and perfumery articles, or different chemicals which must not come into contact), it is necessary to ensure that the loading arrangement takes account of this.
8. Complete shipment of certain item groups: Sub-sets of the cargo may constitute functional entities (e.g., components for assembly into a piece of machinery) or may need to be treated as a single entity for administrative reasons. It is often necessary in these cases to ensure that if any part of such a sub-set is packed, then all the other items belonging to it are also included in the shipment.
9. Shipment priorities: The shipment of some items may be more important than that of others. More specifically, each item might, at least conceptually, have a certain priority rating, deriving from, for example, delivery deadlines or the shelf life of the product concerned. Depending on the practical context, this rating may represent an absolute priority (in the sense that no item in a lower priority class should be shipped if this causes items with higher ratings to be left behind), or it may have a relative character, reflecting merely the value placed on inclusion in the shipment without debarring trade-offs between priority classes.
10. Complexity of the loading arrangement: More complex packing patterns generally result in a greater materials handling effort. The additional effort required will be most significant if the complexity of a pattern necessitates a change to more labor-intensive handling methods, such as from using clamp or forklift trucks to purely manual loading. Conversely, if the handling technology cannot be changed, the pattern must conform to the limitations of this technology.
11. Container weight limit: If the cargo to be loaded is fairly heavy, the weight limit of a
container may represent a more stringent constraint than the loading space available.
12. Weight distribution within a container: From the viewpoint of transporting and handling the loaded container, such as lifting it onto a ship, it is desirable that its centre of gravity be close to the geometrical mid-point of the container floor. If the weight is distributed very unevenly, certain handling operations may be impossible to carry out.



2. METAHEURISTICS
A metaheuristic is a set of algorithmic concepts that can be used to define heuristic methods applicable to a wide set of different problems. In other words, a metaheuristic can be seen as a general-purpose heuristic method designed to guide an underlying problem-specific heuristic, e.g., a local search algorithm or a construction heuristic. A metaheuristic is therefore a general algorithmic framework which can be applied to different optimization problems with relatively few modifications. For the container loading problem, the metaheuristics most often used are the simulated annealing algorithm, the tabu search algorithm, and the genetic algorithm.

1. Simulated annealing (SA) is a generic probabilistic metaheuristic for global optimization, namely locating a good approximation to the global minimum of a given function in a large search space; it is often used when the search space is discrete. Peng et al. (2009) present a hybrid simulated annealing algorithm for the container loading problem with boxes of different sizes and a single container for loading. A basic heuristic algorithm generates feasible solutions from a special structure called a packing sequence; the hybrid algorithm uses this heuristic to encode feasible packing solutions as packing sequences and searches the encoding space for an approximately optimal solution (a sketch of this scheme appears after this list).
2. Tabu search (TS) pursues local search whenever it encounters a local optimum by allowing non-improving moves; cycling back to previously visited solutions is prevented by the use of memories, called tabu lists, which record the recent history of the search. Bortfeldt and Gehring (1998) present a parallel tabu search algorithm for the container loading problem with a single container to be loaded, with emphasis on the case of a weakly heterogeneous load. The distributed-parallel approach is based on the concept of multi-search threads: several search paths are investigated concurrently by differently configured instances of a tabu search algorithm, which cooperate by exchanging (best) solutions at the end of defined search phases. The parallel search processes are executed on a corresponding number of LAN workstations.
3. Genetic algorithms (GA) are implemented as a computer simulation in which a population of abstract representations (called chromosomes) of candidate solutions to an optimization problem evolves toward better solutions. The evolution usually starts from a population of randomly generated individuals and proceeds in generations. In each generation the fitness of every individual in the population is evaluated, multiple individuals are stochastically selected from the current population and modified to form a new population, and the new population is used in the next iteration of the algorithm. Commonly, the algorithm terminates when either a maximum number of generations has been produced or a satisfactory fitness level has been reached. A GA was used by Gehring and Bortfeldt (1997) for the container loading problem. The main ideas of the approach are first to generate a set of disjunctive box towers and second to arrange the box towers on the floor of the container according to a given optimization criterion. The loading problem may include different
practical constraints. The performance of the GA is demonstrated by a numerical test
comparing the GA and several other procedures for the container loading problem. The
developed GA seems to be suitable for container loading problems where simple stability
requirements are sufficient. The method promises high container utilization for problems with
both weakly heterogeneous and strongly heterogeneous assortments of boxes. Designed for
the latter case, however, the GA performs particularly well for higher numbers of box types.
The procedure meets some practical constraints and the required computing times appear to
be acceptable with respect to practical requirements.
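The packing-sequence scheme referenced above under simulated annealing can be sketched as follows. Here volume_used is a stand-in for the basic decoding heuristic of Peng et al., and the cooling schedule and step count are illustrative assumptions, not values from the surveyed papers.

    import math
    import random

    def anneal_packing(sequence, volume_used, t0=1.0, cooling=0.999, steps=20000):
        # Search the space of packing sequences; volume_used(seq) decodes a
        # sequence with the basic heuristic and returns the volume utilization.
        current = list(sequence)
        cur_val = volume_used(current)
        best, best_val = list(current), cur_val
        t = t0
        for _ in range(steps):
            i, j = random.sample(range(len(current)), 2)
            current[i], current[j] = current[j], current[i]      # swap two boxes
            val = volume_used(current)
            delta = val - cur_val
            # Accept improvements always; worsenings with probability exp(delta/t).
            if delta >= 0 or random.random() < math.exp(delta / t):
                cur_val = val
                if val > best_val:
                    best, best_val = list(current), val
            else:
                current[i], current[j] = current[j], current[i]  # undo the swap
            t *= cooling                                         # geometric cooling
        return best, best_val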



3. TEST PROBLEMS and COMPARISON
The tests were performed on the 700 problems generated by Bischoff and Ratcliff (1995). The 700 instances are organized into 7 classes of 100 instances each, with the number of box types increasing from 3 in BR1 to 20 in BR7. The set therefore covers a wide range of situations, from weakly heterogeneous to strongly heterogeneous problems. The number of boxes of each type decreases from an average of 50.2 boxes per type in BR1 to only 1.30 in BR7. The total volume of the boxes is on average 99.46% of the capacity of the container, but as the boxes' dimensions were generated independently of the container's dimensions, there is no guarantee that all the boxes of one instance can actually fit into the container.


Figure 1. The format of test problem.

The complete computational results on the whole set of 700 instances appear in Table 1, which includes a direct comparison with the results of the best algorithms proposed in the literature that have been benchmarked against these test problems. We therefore compare the following 7 approaches:
TS: a tabu search algorithm by Bortfeldt and Gehring;
SA: a simulated annealing algorithm by Mack, Bortfeldt and Gehring;
GA: a genetic algorithm by Gehring and Bortfeldt;
HYB: a hybrid algorithm by Mack, Bortfeldt and Gehring;
PTS: a parallel tabu search algorithm by Bortfeldt, Gehring and Mack;
PSA: a parallel simulated annealing algorithm by Mack, Bortfeldt and Gehring;
PHYB: a parallel hybrid algorithm by Mack, Bortfeldt and Gehring.


60 2508405          problem number p, seed number
587 233 220         container length, width, height
10                  number of box types n
1  78 1  72 1  58 1  14
2 107 1  57 1  57 1  11    (one line for each box type)
3 ...
(etc. for n lines)

The line for each box type contains 8 numbers: box type i; box length, 0/1 indicator; box width, 0/1 indicator; box height, 0/1 indicator; number of boxes of type i. After each box dimension, the 0/1 indicator shows whether placement in the vertical orientation is permissible (=1) or not (=0).
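A sketch of a reader for this format, under our reading of the layout above, might look like this (the function name and dictionary fields are our own):

    def read_instance(lines):
        # Parse one instance: header, container dims, then n box-type lines.
        it = iter(lines)
        problem, seed = map(int, next(it).split())
        length, width, height = map(int, next(it).split())
        n = int(next(it))
        boxes = []
        for _ in range(n):
            f = list(map(int, next(it).split()))
            # f = [type, len, flag, wid, flag, hgt, flag, count]; flag = 1 means
            # that dimension may be placed in the vertical orientation.
            boxes.append({"type": f[0], "dims": (f[1], f[3], f[5]),
                          "vertical_ok": (f[2], f[4], f[6]), "count": f[7]})
        return (length, width, height), boxes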
Table 1. Test results for the Bischoff and Ratcliff problems.

Test     Number of   Average volume utilization as a percentage of the container volume
case     box types   Serial methods                    Parallel methods
                     TS     SA     GA     HYB          PTS    PSA    PHYB
BR1      3           93.23  93.04  94.32  93.26        93.52  93.24  93.41
BR2      5           93.27  93.38  95.22  93.56        93.77  93.61  93.82
BR3      8           92.86  93.42  92.86  93.71        93.58  93.78  94.02
BR4      10          92.40  92.98  91.59  93.30        93.05  93.40  93.68
BR5      12          91.61  92.43  92.55  92.78        92.34  92.86  93.18
BR6      15          90.86  91.76  92.47  92.20        91.72  92.27  92.64
BR7      20          89.65  90.67  90.74  91.20        90.55  91.22  91.68
Average              92.00  92.53  92.82  92.86        92.70  92.91  93.20



4. CONCLUSION
This paper has compared the efficiency of 7 metaheuristic-based methods for solving the container loading problem. The test problems cover 7 numbers of box types (3, 5, 8, 10, 12, 15, 20), with 100 cases for each, and the objective is to maximize the volume utilization of the container. The methods compared are based on the genetic algorithm, the tabu search algorithm, and the simulated annealing algorithm. The parallel methods give the best results for this problem. Nevertheless, more complex metaheuristics, based on the same ideas but adding more powerful improvement schemes, could improve the results further.



REFERENCES
1. Bischoff, E., Janetz, F. and Ratcliff, M. 1995. "Loading pallets with non-identical items". European Journal of Operational Research 84: 681-692.
2. Bischoff, E. and Ratcliff, M. 1995. "Issues in the development of approaches to container loading". Omega 23(4): 377-390.
3. Bortfeldt, A. and Gehring, H. 1998. "The tabu search algorithm for the container loading problem". Operations Research 62: 237-250.
4. Bortfeldt, A., Gehring, H. and Mack, D. 2003. "A parallel tabu search algorithm for solving the container loading problem". Parallel Computing 29: 641-662.
5. Dowsland, K. and Dowsland, W. 1992. "Packing problems". European Journal of Operational Research 56: 2-14.
6. Gehring, H. and Bortfeldt, A. 1997. "A genetic algorithm for solving the container loading problem". International Transactions in Operational Research 4: 401-418.
7. Peng, Y., Zhang, D. and Chin, F. 2009. "A hybrid simulated annealing algorithm for the container loading problem". The First ACM/SIGEVO Summit on Genetic and Evolutionary Computation: 919-922.
Variation Analysis of Neural Network Based Approximation
Function

S. Pongjanla1,C, P. Anussornnitisarn2

1,2 Department of Industrial Engineering, Faculty of Engineering, Kasetsart University, 50 Phaholyothin Rd., Jatujak, Bangkok 10900, Thailand
C E-mail: g5065209@ku.ac.th; Fax: 02-5796804; Tel. 083-9299341



ABSTRACT

Discrete event simulation is a popular choice for evaluating complex systems. However, identifying the set of system parameters that gives the best system performance is a time-consuming process, owing to the stochastic behavior of the simulated system. This research demonstrates an alternative to discrete event simulation as a system model, with which system performance can be evaluated within a short time compared to discrete event simulation. A typical neural-network-based approximation function, however, needs to be re-trained whenever the setting environment changes. This research intends to address both weaknesses: the simulation time and the rigid learning structure of the neural network.
The objective of this research is to develop and study a scalable artificial neural network system for approximating the performance function, for a case study of buffer optimization in a pull-control system with multiple workstations. The network system separates the workstations into 3 types of single workstation trained with data from simulation (initial, intermediate, and final workstations), and the approximation is obtained by interconnecting the 3 types of network. We also study the factors that affect the efficiency of the performance function approximation. The performance measures are the average number of units meeting demand (sale), the average number failing to meet demand (lost sale), the average work-in-process, and the average cycle time.
The study shows that the neural network system can approximate the performance measures of the case-study system. The approximation function does not only estimate the mean of the stochastic system performance but also its variation (the length of the mean confidence interval). Therefore, this neural-network-based approximation function can be used to find the appropriate buffer size much more quickly than simulation optimization.

Keywords: Artificial Neural Network, Function Approximation, Stochastic Search,
Variation analysis.



1. INTRODUCTION
Stochastic process is a conveniently mathematical model used for study but there are
some difficult limitations if the system is complicated or has numerous detailed elements.
The limitations, however, could be solved by simulation model used for the study of complex
process. It could adapt to be concordant with real production system also. Finding an optimal
solution for the model of the system required all possible answers which were impossible in a
practical way, especially the system having several related factors due to tremendous
situations that were possible in the future. Nevertheless, a resolution of this problem was
using artificial neural network, a technique for evaluation, to analyze only partial data
gathered from the model. This method did not depend on data distribution and was able to be
applied on incomplete data as well to produce the evaluation with high accuracy (Piromya,
2006)[6]. Notwithstanding, the process study based on uncertainty had to acquire flood of
data for the artificial neural network. If the system has a large number of input parameters, it is hard to simulate enough situations to collect data for the network. Decomposing the process into subunits greatly reduces the quantity of data required (Chambers, 2000)[2], especially for a queuing system such as a pull control system with buffers, called a Kanban system. Thus, decomposing the process into subunits before synthesizing the expanded system clearly decreases the time spent studying a large process with abundant input parameters (Manusak, 2007)[4]. A stochastic system, however, is characterized by both a mean and a variation, so a model that is to represent the real system must approximate both the mean and the variation.


2. THEORY AND RELATED WORKS
A special characteristic of the pull production control system is that production information flows in the reverse direction to the flow of material (Monden, 1994)[5]. A downstream production process takes the necessary material when needed, using a Kanban card to withdraw material from the upstream process. That process then produces only the amount needed to replace the withdrawn materials, following the Kanban card. In other words, the downstream process controls the production rate of the upstream process, as shown in Figure 1.







Figure 1. Pattern of the pull control system (workstations, buffers, product flow, and demand flow).

Albino (1995)[1] applied a queuing-system model based on a Markov process to evaluate the behavior of the pull production process. The method aimed to improve the estimation of throughput time and work-in-process for Kanban system analysis and design, to suit a real production line. A significant characteristic of this approach was a simple decomposition of the production line into queuing systems such as M/M/1/L. The results showed that the behavior of the pull production process, and of larger production lines, could be well estimated by this method.

The artificial neural network is a processing technique developed from the basic principles of biological neural networks. Like a brain, the artificial neural network collects knowledge through a learning process and stores it as weights, which are adjusted when new information is received. The general aspects of such a network are the weights of the connection links between nodes and the signals. The weight values are regarded as the knowledge gathered for a specific problem. Within each node there is a function that determines the output signal, called the transfer function. The characteristics of a node in the artificial neural network are presented in Figure 2.



Figure 2 Neural Network Model

Chambers (2000)[2] investigated the buffer size of a queuing system by studying the possibility of building single-workstation models to forecast the behavior of the whole system. Single workstations were connected following a queuing system with two products, and a back-propagation artificial neural network was trained. According to the simulations, the connected workstations could predict the performance of the system to a good approximation of the simulation results.

Piromya (2006)[6] used a back-propagation artificial neural network to assess the performance of pull production processes consisting of 3 and 4 workstations. The data for training the neural network were collected from simulations covering all possible situations of the system. The estimates of the artificial neural network were at a very good level, and when more data were fed to the network, the efficiency of the performance estimation also increased.

Fulya (2006)[3] applied a back-propagation artificial neural network to identify buffer quantities in asynchronous assembly systems, using data from simulation and comparing with a prediction method based on regression metamodels. The analysis showed that the back-propagation artificial neural network could serve as a suitable model to identify the buffer quantity with low sensitivity.

Manusak (2007)[4] developed and studied a scalable artificial neural network system for performance function approximation, for a case study of buffer optimization in a pull control system with multiple workstations. The network system separates the workstations into 3 types of single workstation trained with simulation data (initial, intermediate, and final workstation networks) and approximates the whole system by interconnecting the 3 types of network. The study found that the scalable artificial neural network system can approximate the performance measures of the case study and, among the factors affecting approximation efficiency, that increasing the number of workstations in the system decreases the approximation efficiency, while increasing the sample size used to train the network increases it.


3. EXPERIMENTAL or COMPUTATIONAL DETAILS

Because a stochastic system is characterized by both a mean and a variation, the approximation function should not consider only the mean but should approximate the variation too. This research uses the concept of the Scalable Artificial Neural Network System (SANNS) to create a pull-system model, whose simulations provide training data for the neural networks. The simulation of the pull
production control system was divided into three types of single workstation: the initial workstation, the intermediate workstation, and the final workstation. The initial workstation is the first workstation; the intermediate workstations are the second workstation through workstation n-1, where n is the total number of workstations in the system; and the final workstation is the last (n-th) workstation.

After that, two simulation models were created, with the same model used for all three types of single workstation. The data collected from the simulation of each workstation were used to train the networks. The pull production control system with several workstations can then be built by SANNS, linking each workstation and evaluating 8 performance values: the average and variation of sale, the average and variation of lost sale, the average and variation of cycle time (CT), and the average and variation of work-in-process (WIP).

The artificial neural network had one hidden layer with 23 nodes, and two levels of training data were used: 30% and 50% of all data. The processing time was normally distributed, with standard deviation at 5% of the processing time (PCT) of each workstation. The PCT value ranged from 0.8 to 1.2, in steps of 0.1 time units. The buffer size (B) ranged from 2 to 5. The demand rate (DR) was exponentially distributed, with mean inter-arrival time at 0.8, 1.0, or 1.2 time units. The reliability (R) was 0.75, 0.875, or 0.99, and the down time (DT), which occurs during machine repair and blocks work, was exponentially distributed with mean 2, 4, 6, or 8 time units. The results from SANNS were then compared with those of the computer simulation.

The network connection method starts by checking whether any workstation in the pull production control system has the highest processing time; such a workstation is defined as the bottleneck workstation. The average demand rate of each workstation upstream of the bottleneck is then set to the reciprocal of that processing time and submitted to the appropriate network type. The output, the average production interval, is taken as the input rate for the next workstation, and this process runs through all the workstations, as demonstrated in Figure 3.
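The chaining just described can be sketched as follows; the network objects, their predict interface, and the field names are hypothetical stand-ins for the trained initial/intermediate/final workstation networks.

    def approximate_line(networks, stations, demand_rate):
        # networks: trained initial/intermediate/final workstation models
        # (hypothetical objects); stations: per-workstation PCT, DT, R and B.
        rate = demand_rate
        outputs = None
        for net, station in zip(networks, stations):
            outputs = net.predict(pct=station.pct, dr=rate, dt=station.dt,
                                  r=station.r, b=station.b)
            # outputs holds [mean, variation] of sale, lost sale, WIP and CT;
            # the mean production interval sets the input rate downstream.
            rate = 1.0 / outputs["ct_mean"]
        return outputs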



Figure 3. Connection of the workstation networks: each network takes PCT, DR, DT, R, and the buffer sizes B as inputs and outputs the mean and variation of Sale(n), Lost Sale(n), WIP(n), and CT(n), which feed the next workstation.





4. RESULTS AND DISCUSSION

R-square was used to measure the efficiency of the result as illustrated in Table 1.

Table 1. R-square values of the results

3 workstations   R² of Mean              R² of Variation
                 Train 30%   Train 50%   Train 30%   Train 50%
Sale             0.80        0.80        0.77        0.77
Lost Sale        0.81        0.82        0.76        0.76
Cycle Time       0.72        0.72        0.54        0.54
WIP              0.87        0.88        0.58        0.58

5 workstations   R² of Mean              R² of Variation
                 Train 30%   Train 50%   Train 30%   Train 50%
Sale             0.55        0.53        0.51        0.53
Lost Sale        0.68        0.71        0.51        0.49
Cycle Time       0.26        0.43        0.03        0.05
WIP              0.32        0.36        0.09        0.07

Considering the efficiency values from the test, the R² values for the variation of cycle time and work-in-process are the lowest, while those for the variation of lost sale and sale are the highest, at 0.76 and 0.77 respectively. For the means, the R² of average work-in-process is the highest at 0.88 and that of average cycle time the lowest at 0.72.


5. CONCLUSION

The results indicate that the scalable artificial neural network approximates the mean more accurately than the variation. As the number of workstations in the pull production control system increases, the efficiency values evaluated by SANNS decrease, but accuracy increases as the amount of training data increases. Even though extending the number of workstations lowers the R-square values, their tendency follows that of the 3-workstation case.



REFERENCES
1. Albino, V., Dassisti, M., and Okogbaa, G., Int. J. Production Economics, 1995, 40, 197-207.
2. Chambers, M., and Mount-Campbell, C.A., Int. J. Production Economics, 2000, 79, 93-100.
3. Fulya, A., Berna, D., and Akif, B., Applied Soft Computing, 2007, 7, 946-956.
4. Manusak, P., Development of Performance Function Approximation for Pull Control System Using Scalable Artificial Neural Network System, Thesis, Kasetsart University, 2007.
5. Monden, Y., Toyota Production System: An Integrated Approach to Just-In-Time, 3rd ed., Tokyo, 1998.
6. Piromya, P., Development of Artificial Neural Network for Manufacturing System Performance Prediction: Case Study of Pull Control System, Thesis, Kasetsart University, 2006.

Artificial Neural Network and Kriging Model
Approximations for the Deterministic Output Response

J. Rungrattanaubol1,C, P. Nakjai1, and A. Na-udom2

1 Department of Computer Science and Information Technology, Naresuan University, 99 Moo 9, Tahpo, Muang, Phitsanulok 65000, Thailand
2 Department of Mathematics, Faculty of Science, Naresuan University, Phitsanulok 65000, Thailand
C E-mail: jaratsrir@nu.ac.th; Fax: 055-963263; Tel. 084-0486173



ABSTRACT
Computer simulated experiments (CSE) are now used extensively to investigate the relationship between input variables and an output response. CSE are by nature time-consuming and computationally expensive to run, so much effort has focused on developing inexpensive and reliable surrogate models to replace them. Kriging models, along with OA-based Latin hypercube designs, have been widely used for developing accurate surrogate models in the context of CSE. The performance of a Kriging model relies on the estimation of its unknown parameters, most commonly by the maximum likelihood estimation (MLE) method. The MLE method is normally time-consuming and can fail to obtain the best set of parameters because of numerical instability and ill-conditioning of the model structure. Given the popularity of artificial neural networks (ANN) for modeling high-dimensional and complex problems, this paper presents an application of ANN as an alternative to the Kriging model in the context of CSE. The results indicate that ANN performs well in terms of prediction accuracy and can replace the Kriging model in some cases.

Keywords: Computer simulated experiments, Artificial neural network, Kriging model,
OA-Latin hypercube design.



1. INTRODUCTION
Nowadays many complex phenomena are investigated through complex computer codes. Examples of such models include reservoir simulators for predicting the ultimate recovery of oil, finite element codes for predicting the behaviour of metal structures under stress, and bio-mechanical models for predicting protein in sheep wool. These codes comprise a system of complex differential equations which, for a given setting of the input variables (X), can be solved numerically to obtain the value of the output response (Y). For example, a reservoir simulator can be run, for given values of the field characteristics (such as gross rock volume, porosity, gas cap, etc.), to quantify the ultimate recovery of oil (Y) from the field.
Running computer codes at various settings of the input variables to study the output response is referred to as a computer simulated experiment (CSE). The settings of the input variables at which the code is operated are referred to as a design (X), with each setting being a run. CSE is deterministic in nature, so identical settings of the input variables always produce an identical output response, and typically the process underlying a CSE is not known a priori. Therefore, space-filling designs, which aim to spread design points over the region of interest, are very useful. Often, computer simulated experiments are computationally expensive to run and may involve a large number of input factors [9]. Bates et al. [2] stated that system decomposition, experimental design, and surrogate-model development need to be combined in order to decrease the complexity of the problem and make the approximation model effective. In view of the complexities of the system, it is often more desirable
to create cheaper surrogate models for the computer codes [7] that are capable of predicting the output with accuracy. The challenge for statisticians is to develop accurate surrogate models based on a handful of runs (typically fewer than 30). Such methodology is well known for physical experiments [3], where the emphasis is on controlling experimental errors. A distinguishing feature of computer simulated experiments is that the response is observed without experimental error, i.e., it is deterministic; hence an altogether different approach is required for modeling such experiments.
Much evidence in the literature shows that statistical approximation in the context of CSE divides into two main streams: developing surrogate models, and selecting choices of experimental designs. In this study we concentrate on the former, particularly modeling methods that do not rely on the assumptions of statistical approximation models.



2. THEORY AND RELATED WORKS
2.1 KRIGING MODEL
The first approach to developing a surrogate model for computer simulated experiments, called the Kriging model, was proposed by Sacks et al. [7]. It is based on the idea that the response y can be modeled as a polynomial function of the input variables, with whatever is left regarded as a realization of a stochastic process, Z(x), with mean zero and some form of correlation function. Typically y is written as

    y = \sum_{j=1}^{k} \beta_j f_j(x) + Z(x)    (1)
To make the model simpler, in most practical problems the polynomial part in (1) is taken as a constant (Welch et al. [9], Sacks et al. [7]),

    y = \beta + Z(x)    (2)
Moreover, researchers believe that this causes no loss of prediction capability. The second part on the right of equation (1), Z(x), is taken to have a Gaussian correlation function (Morris and Mitchell [5], Welch et al. [9], Sacks et al. [7]), whose most frequently used form can be written as

    R(X_i, X_l) = \prod_{j=1}^{d} \exp(-\theta_j |X_{ij} - X_{lj}|^{p_j})    (3)

where 0 \le p_j \le 2 and \theta_j > 0.
Normally the Kriging model is fitted using the idea of generalized least squares, and the problem of estimating all unknown parameters reduces to estimating the parameters of the correlation function, which can be done by maximum likelihood estimation (MLE) (Welch et al. [9], Sacks et al. [7]). The maximum likelihood estimators are obtained by maximizing the log-likelihood function

    l(\beta, \sigma^2, \theta, p) = -\frac{1}{2}\,[\, n \ln \sigma^2 + \ln|R| + (y - 1\beta)^T R^{-1} (y - 1\beta)/\sigma^2 \,]    (4)
Given the correlation parameters \theta and p in (3), the generalized least squares estimate of \beta is \hat{\beta} = (1^T R^{-1} 1)^{-1} 1^T R^{-1} y, and the MLE of \sigma^2 is

    \hat{\sigma}^2 = \frac{1}{n}\,(y - 1\hat{\beta})^T R^{-1} (y - 1\hat{\beta})    (5)
Substituting \hat{\beta} and \hat{\sigma}^2 into the likelihood function in equation (4), the problem becomes that of numerically maximizing

    -\frac{1}{2}\,( n \ln \hat{\sigma}^2 + \ln|R| )    (6)
, which is a function of only the correlation parameters and the data from the design used in the data collection step.
After all unknown parameters are obtained, the next step is to build a predictor \hat{y}(x) of y(x) to act as a cheap surrogate for the complex computer simulation code. The best linear unbiased predictor (BLUP) at an untried input x is

    \hat{y}(x) = \hat{\beta} + r(x)^T R^{-1} (y - 1\hat{\beta})    (7)

where r(x) is the vector of correlations between the error Z(x) at the n design runs and the untried input x.
The Kriging model has received wide attention in many applications of computer simulated experiments owing to its interpolation property [5, 7, 8, 9, 10]. Simpson et al. [8] reported that Kriging is very flexible because of the wide range of correlation functions that can be chosen. As stated above, finding the best set of correlation parameters in Kriging amounts to optimizing equation (6) over a given range of \theta values; the emphasis of this study is therefore on the optimization algorithm used to estimate the correlation parameters of the Kriging model.
Welch et al. [9] proposed a stepwise algorithm to estimate the correlation parameters and screen the important factors, especially when the number of inputs is large. The idea of their algorithm is that the j-th factor is important and influences the response pattern if the corresponding \theta_j obtains a value of its own, while the other factors share a common value of the correlation parameter; the j-th factor is inactive if \theta_j becomes 0.
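Equations (3)-(7) can be prototyped compactly. The sketch below, written under the constant-mean model of equation (2) and with a small jitter added for numerical stability (an assumption not in the original formulation), profiles out beta and sigma^2 so that only the correlation parameters remain to be optimized.

    import numpy as np

    def gaussian_corr(X1, X2, theta, p=2.0):
        # Equation (3) with a common exponent p for every dimension.
        diff = np.abs(X1[:, None, :] - X2[None, :, :]) ** p
        return np.exp(-np.tensordot(diff, theta, axes=([2], [0])))

    def neg_profile_loglik(theta, X, y):
        # Equation (6), negated: substitute the GLS beta and the MLE sigma^2.
        n = len(y)
        R = gaussian_corr(X, X, theta) + 1e-10 * np.eye(n)  # jitter for conditioning
        Ri = np.linalg.inv(R)
        one = np.ones(n)
        beta = (one @ Ri @ y) / (one @ Ri @ one)            # GLS estimate of beta
        resid = y - beta
        sigma2 = (resid @ Ri @ resid) / n                   # MLE of sigma^2, eq. (5)
        return 0.5 * (n * np.log(sigma2) + np.log(np.linalg.det(R)))

    def blup(x_new, X, y, theta):
        # Equation (7): BLUP prediction at untried inputs x_new.
        R = gaussian_corr(X, X, theta) + 1e-10 * np.eye(len(y))
        Ri = np.linalg.inv(R)
        one = np.ones(len(y))
        beta = (one @ Ri @ y) / (one @ Ri @ one)
        r = gaussian_corr(x_new, X, theta)
        return beta + r @ Ri @ (y - beta * one)

In practice, theta would be obtained by minimizing neg_profile_loglik (e.g., with scipy.optimize.minimize) before calling blup.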

2.2 ARTIFICIAL NEURAL NETWORK
The artificial neural network (ANN) is commonly used for sophisticated and complex problems [4]. Unlike the usual statistical approximation models, ANN does not require any assumptions about the model, which makes it simple and easy to use in many applications in science, engineering, and health science. The inspiration for neural networks was the recognition that complex learning systems in animal brains consist of closely interconnected sets of neurons. A particular neuron may be relatively simple in structure, but dense networks of interconnected neurons can perform complex learning tasks such as pattern recognition and function approximation. An ANN node takes an input (p), which is combined through a combination function (such as a weighted sum) and then passed into an activation function (f) to produce an output response (y), with b as a bias, as shown in Figure 1.


Figure 1. A simple layout of ANN


The ANN process can be summarized as

    y = f(wp + b)    (8)
Typically an ANN is formed from multiple nodes, as shown in Figure 2. The activation functions can be the same or different, e.g., linear, sigmoid, or symmetrical hard limit.
Figure 2. ANN with multiple nodes

The entire process can be rewritten as

y₁ = f(w₁p₁ + b₁)
y₂ = f(w₂p₂ + b₂)
⋮
y_N = f(w_N p_N + b_N)   (9)

where N is the number of nodes.
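The sketch below is a minimal NumPy rendering of equations (8)-(9): one layer of N nodes, each computing yᵢ = f(wᵢp + bᵢ). The sigmoid activation and the random weights are illustrative assumptions, not the paper's trained network.

    import numpy as np

    def sigmoid(n):
        return 1.0 / (1.0 + np.exp(-n))

    def layer_forward(p, W, b, f=sigmoid):
        # p: inputs (R,); W: weights (N, R); b: biases (N,) -> outputs (N,)
        return f(W @ p + b)

    rng = np.random.default_rng(0)
    p = rng.random(3)                                   # R = 3 inputs
    W, b = rng.normal(size=(4, 3)), rng.normal(size=4)  # N = 4 nodes
    print(layer_forward(p, W, b))                       # node outputs y_1..y_4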


3. RESEARCH METHODS
In this study we compare the Kriging model and the ANN in terms of prediction accuracy at untried input variables. The comparison is carried out on various test problems selected from the literature [1]. In this section we first introduce the test problems used in the study and then present the criterion used to measure the prediction accuracy.

Table 1. Problems with main and interaction effects

Problem            Function and range
RM2                f(x₁, x₂) = 0.5(x₁x₂ − x₁⁻¹ − 5x₂⁻¹),   1 ≤ x₁, x₂ ≤ 100
Branin function    f(x₁, x₂) = (x₂ − (5.1/(4π²))x₁² + (5/π)x₁ − 6)² + 10(1 − 1/(8π))cos(x₁) + 10,   −5 ≤ x₁ ≤ 10, 0 ≤ x₂ ≤ 15

3.1 TEST PROBLEMS
The following test problems are used to conduct the comparative study. The first group of test problems consists of two-dimensional nonlinear functions. The second group is a seven-dimensional problem, a complex function that has been widely used in the CSE context.



3.1.1 TWO VARIABLE PROBLEMS
The test problems in this group involve main effects and interaction effects. These problems are the RM2 and Branin functions; their ranges are given in Table 1.
For these two-variable problems, a design with 9 runs is used to fit the Kriging and ANN models: an optimal Latin hypercube design (OLHD) generated by a simulated annealing (SA) algorithm using the φ_p optimality criterion [5]. The performance of the surrogate model developed for each of the test problems is evaluated by calculating the RMSE on an 11 × 11 grid of additional test points.

3.1.2 SEVEN VARIABLE PROBLEM
In this study we consider the problem called the Cyclone model [10]. This model comprises 7 input variables, and its mathematical form is

y = 174.42 · x₃{1 − 2.62[1 − 0.36(x₄/x₂)^(−0.56)]^(3/2)(x₄/x₂)^1.16} / (x₅x₂x₁x₆x₇)^0.85   (10)
The ranges of all input variables are given in Table 2. This model relates the diameter of a cyclone (y) to seven input variables in chemical engineering and has been used as a test problem [10].

Table 2. Ranges of the input variables for the Cyclone model

Input variable   Lower limit   Upper limit
x₁               0.09          0.11
x₂               0.27          0.33
x₃               0.09          0.11
x₄               0.09          0.11
x₅               1.35          1.65
x₆               14.4          17.6
x₇               0.675         0.825
From all the test problems stated above, we fit the Kriging and ANN models using the DACE and ANN toolboxes in MATLAB. After the Kriging and ANN models are fitted for all cases, the prediction accuracy is assessed using the root mean squared error (RMSE) [7, 9], computed as

RMSE = √[ Σᵢ₌₁ᵏ (yᵢ − ŷᵢ)² / k ]   (11)

where k is the number of random test points, yᵢ is the actual response at the i-th test point and ŷᵢ is the predicted response from the Kriging or ANN model at the i-th test point.


4. RESULTS AND DISCUSSION
All of the test problems were implemented using an optimal orthogonal Latin hypercube design (OLHD) [4]. In this empirical study, 9 runs of an OLHD are used for the 2-variable problems and 30 runs for the 7-variable problem; the OLHD is generated by a simulated annealing (SA) algorithm [5]. After all cases are run, the RMSE values are recorded and presented in Table 3.
As can be seen from Table 3, the RMSE values from Kriging are considerably larger than those from ANN for the RM2 and Cyclone test problems. This indicates that, for those problems, using ANN as the approximation model leads to higher prediction accuracy. Table 3 also provides a scaled measure of error, the percentage improvement over ANN (PI); this scaled measure has the benefit of ignoring the differences in error magnitude across different test problems. The PI values confirm that ANN is superior to Kriging on these problems, with approximately 52.0% improvement for the RM2 test problem and 162.5% for the Cyclone model. In contrast, Kriging performs much better than ANN for the Branin function (PI = 66.0%), which indicates that ANN fails to capture the sharp changes in some areas of that problem. A more complex ANN structure, such as a multilayer perceptron, could therefore be used to construct the surrogate model.


Table 3. Comparison of RMSE from all test problems

Test problem       RMSE (Kriging)   RMSE (ANN)   PI
RM2                7.812            5.138        −0.520
Branin function    29.390           86.412       0.660
Cyclone            0.042            0.016        −1.625

Note: percentage improvement over ANN, PI = [RMSE(ANN) − RMSE(Kriging)] / RMSE(ANN).
5. CONCLUSION
As presented in the results section, ANN performs well in terms of prediction accuracy and can replace Kriging in some cases. The advantage of ANN is that it is free of assumptions, and hence model adequacy checking is not required. In the cases where ANN does not perform well, a more complex ANN architecture is needed to construct the approximation model. Our empirical studies are limited to specific test problems; hence, larger-dimensional problems could be investigated in future studies to draw additional conclusions. Furthermore, it should be noted that the success of both Kriging and ANN modeling methods normally depends on the underlying design used to develop the surrogate model. This empirical study indicates that an optimal orthogonal LHD is an appropriate design choice for both modeling methods; hence, selecting an optimal LHD critically affects the performance of the fitted model.



REFERENCES
1. Allen, T.T., Bemshteyn, M.A., Kabiri-Bamoradian, K., Journal of Quality Technology, 2003, 35, 264-274.
2. Bates, R.A., Buck, R.J., Riccomagno, E., Wynn, H.P., Journal of the Royal Statistical Society, Series B, 1996, 58, 77-94.
3. Box, G.E.P., Hunter, W.G., Hunter, J.S., Statistics for Experimenters, 2005, John Wiley & Sons, New York.
4. Bozdogan, H., Statistical Data Mining and Knowledge Discovery, 2003, Chapman & Hall/CRC, New York.
5. Morris, M.D. and Mitchell, T.J., Journal of Statistical Planning and Inference, 1995, 43, 381-402.
6. Ripley, B.D., Statistical aspects of neural networks. In: Barndorff-Nielsen, O.E., Jensen, J.L., Kendall, W.S., editors. Networks and Chaos - Statistical and Probabilistic Aspects, Chapman & Hall, New York, 1993, 40-123.
7. Sacks, J., Welch, W.J., Mitchell, T.J., Wynn, H.P., Statistical Science, 1989, 4(4), 409-435.
8. Simpson, T.W., Peplinski, J.D., Koch, P.N., Allen, J.K., Engineering with Computers, 2001, 17(2), 129-150.
9. Welch, W.J., Buck, R.J., Sacks, J., Wynn, H.P., Mitchell, T.J., Morris, M.D., Technometrics, 1992, 34, 15-25.
10. Ye, K.Q., Li, W., Sudjianto, A., Journal of Statistical Planning and Inference, 2000, 90, 145-159.

Analysis of Centers Initialization on K-means Performance
in Clustering Problem

R. Sukhasem^(1,C) and P. Anussornnitisarn^2

^(1,2) Department of Industrial Engineering, Faculty of Engineering, Kasetsart University, 50, Phaholyothin Rd., Jatujak, Bangkok, 10900, Thailand

^C E-mail: g4985025@ku.ac.th; Fax: 02-5793971; Tel. 086-6447089



ABSTRACT
Clustering is one of the critical problems for analysis-based organizations, which often have complicated information systems. These organizations are under pressure to detect or identify key factors from the large piles of data in their information systems. By clustering the data, influential relationships are often identified and can be exploited in order to gain benefit from a particular relationship. K-means is a famous data-mining technique for clustering; due to its simple logic and efficient search structure, k-means is often used as a base model for more complex modifications or integrated with other techniques such as genetic algorithms. This study aims to test the efficiency of k-means in searching for cluster centers among various data. We constructed the experiments by generating many groups of data from specific values (generated centers) with different standard deviations, and then applied the k-means algorithm to search for the cluster centers and group the data. In this way we can see how well k-means clusters when the data have more variation. Different initial cluster centers and numbers of clusters were tested and compared. The results show that k-means's weakness is its sensitivity to the selection of the initial cluster centers: unsuitable initial cluster centers will not give global optimization. The results of this study can be extended to other techniques that have been developed based on the k-means search structure.

Keywords: Data mining, Clustering technique, K-means algorithm, Weakness.



1. INTRODUCTION
Information and communication technology is a term that has evolved over the years. With today's fast-changing technology, electronic documents are an alternative way of collecting data. We can keep as much data as we need, and it is very easy to retrieve the data to be analyzed. With new information technologies, companies have spent much money to accumulate numerous data in database systems. Much data is collected and stored long-term in data warehouses, but organizations often do not take advantage of the valuable and actionable information hidden deep within these data repositories. By analyzing these data, we can see interesting relations among them. For example, sales become higher or lower over time; we may see that some products are seasonal or that some products are highly used by certain groups of customers. If manufacturers want to sell more or introduce a new product, knowing the characteristics or attributes of the target customers helps them make a strategy. That means a company needs to find relationships among its data. Hidden information or meaningful trends are relationships among data that can give benefits to the company, such as helping managers with decision making, for example, promoting a product to the right customers.
Clustering is the unsupervised classification of patterns (observations, data items, or feature vectors) into groups or clusters [1]. In general, clustering is one technique of the Data Mining (DM) process for finding useful information or patterns in large databases. The k-means algorithm is a partitional, non-hierarchical data clustering method suitable for classifying large amounts of data into corresponding patterns. It is a simple yet effective statistical clustering technique [7]. R.J. Kuo et al. [8] mentioned that k-means is commonly applied in a large variety of applications, for example, image segmentation, object and character recognition, and document retrieval. The k-means method is still very popular, and it has been applied in a wide variety of areas ranging from computational biology to computer graphics [6]. G.P. Papamichail and D.P. Papamichail [4] employed the k-means clustering algorithm to classify the subset of products that meet consumers' needs into disjoint clusters, helping shoppers define their preferences and customize the purchase information within an electronic shopping environment according to their individual needs. Other implementations of k-means were presented by T. Saegusa and T. Maruyama [9], J.S. Ahn and S.Y. Sohn [6] and D. Irfan et al. [3]. The researchers are interested in applying the k-means algorithm to the manufacturing area, for example, clustering products or components into groups of Make-to-stock (MTS), Assembly-to-order (ATO), Make-to-order (MTO) or Engineering-to-order (ETO). These manufacturing strategies may give different benefits to a company; thus, the experiment was conducted to test the efficiency of the k-means algorithm.
In this paper, the theoretical background of the studied problem is introduced first. Then the design of the experiment and the results are presented. The last section gives our conclusions, discusses the paper's contribution, and outlines future research opportunities for applying the k-means algorithm in manufacturing areas.


2. K-MEANS ALGORITHM
By the term clustering we mean the unsupervised process through which a large number of data items are classified into disjoint and homogeneous groups (clusters) based on similarity. Although promising in many application areas such as pattern classification, data mining and decision making, clustering poses several restrictions on the decision maker when little information is known a priori about the nature of the data. Therefore, the choice of an appropriate method, taking these restrictions into account, is crucial to the effective exploration of the interrelationships among the data items and to making a meaningful assessment. A simple and commonly used algorithm for producing clusters by optimizing a criterion function, defined either globally (over all patterns) or locally (on a subset of the patterns), is the k-means algorithm [5]. It starts with a random initial partition and keeps reassigning the patterns to clusters based on the similarity between each pattern and the cluster centers until a convergence criterion is met: for example, there is no reassignment of any pattern from one cluster to another, or the sum of squared errors (SSE) ceases to decrease significantly after some number of iterations.

To partition a set of data into disjoint clusters with the k-means algorithm, follow these steps:
1. Choose a value for K, the total number of clusters to be determined.
2. Choose K instances (data points) within the dataset at random. These are the initial cluster centers.
3. Use the simple Euclidean distance to assign the remaining instances to their closest cluster center.
4. Use the instances in each cluster to calculate a new mean for each cluster.
5. If the new mean values are identical to the mean values of the previous iteration, the process terminates. Otherwise, use the new means as cluster centers and repeat steps 3-5.

The first step of the algorithm requires an initial decision about how many clusters we believe to be present in the data. Next, the algorithm randomly selects K data points as the initial cluster centers. Each instance is then placed in the cluster to which it is most similar. Similarity can be defined in many ways; however, the similarity measure most often used is the simple Euclidean distance.
Once all instances have been placed in their appropriate cluster, the cluster centers are updated by computing the mean of each new cluster. The process of instance classification and cluster-center computation continues until an iteration of the algorithm shows no change in the cluster centers. That is, the algorithm terminates after j iterations if, for each cluster Cᵢ, all instances found in Cᵢ after iteration j−1 remain in cluster Cᵢ upon the completion of iteration j.

The Euclidean distance: the Euclidean distance between point A with coordinates (x₁, y₁) and point B with coordinates (x₂, y₂) is given by function (1):

Distance(A, B) = √[(x₂ − x₁)² + (y₂ − y₁)²]   (1)

More formally, the k-means algorithm tries to minimize the squared error, as in function (2):

SSE = Σⱼ₌₁ᵏ Σᵢ₌₁^(nⱼ) (xᵢ⁽ʲ⁾ − yⱼ)²   (2)

where xᵢ⁽ʲ⁾ is the i-th pattern belonging to the j-th cluster and yⱼ is the center of the j-th cluster. A minimal code sketch of the whole procedure is given below.
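The sketch below strings steps 1-5 and equations (1)-(2) together. It is a minimal illustration assuming NumPy arrays; it is not the VBA program used in the experiments, and the names (kmeans, centers) are ours.

    import numpy as np

    def kmeans(data, centers, max_iter=100):
        # 'centers' are the initial cluster centers whose choice is studied here
        centers = centers.astype(float).copy()
        for _ in range(max_iter):
            # step 3: assign each point to its closest center (Euclidean distance)
            d = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2)
            labels = d.argmin(axis=1)
            # step 4: recompute each center as the mean of its cluster
            new = np.array([data[labels == j].mean(axis=0) if np.any(labels == j)
                            else centers[j] for j in range(len(centers))])
            if np.allclose(new, centers):   # step 5: stop when centers are unchanged
                break
            centers = new
        sse = ((data - centers[labels]) ** 2).sum()   # equation (2)
        return centers, labels, sse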



3. DESIGN OF EXPERIMENT
For this study, we developed a program in VBA to run the k-means algorithm, and the results were compared with those of the SPSS program. We generated sets of data from normal distributions with varied means and standard deviations; these means became the generated cluster centers. From one generated center we assigned many values of the standard deviation, so we obtained many groups of data with different variations. We then applied the k-means algorithm to cluster these mixed data, trying many initial cluster centers and comparing the results obtained from the different initial cluster centers.
The parameters in the experiments are:
X: dimension of the data; [1, 10]
K: number of clusters/groups we want; [1, 10]
N: number of data points in each cluster/group; [100, 300]
S: standard deviation of each data set; [1, 7]
The parameter values were chosen at random. An example of the data ranges with different standard deviations is shown in Figure 1.


Figure 1. Example data ranges for the experiments. Each group is generated from a normal distribution with mean u; the columns correspond to stdev = 1, 2, 3, 4, 5 and 7 (experiments E1-E6).

                      stdev=1  stdev=2  stdev=3  stdev=4  stdev=5  stdev=7
Group 1 (u=10)  Max     13.43    16.85    20.28    23.71    27.14    33.99
                Min      6.98     3.95     0.93    -2.09    -5.12   -11.16
                Range    6.45    12.90    19.35    25.80    32.25    45.15
Group 2 (u=20)  Max     23.43    26.85    30.28    33.71    37.14    43.99
                Min     16.98    13.95    10.93     7.91     4.88    -1.16
                Range    6.45    12.90    19.35    25.80    32.25    45.15
Group 3 (u=30)  Max     33.43    36.85    40.28    43.71    47.14    53.99
                Min     26.98    23.95    20.93    17.91    14.88     8.84
                Range    6.45    12.90    19.35    25.80    32.25    45.15
Group 4 (u=40)  Max     41.83    43.67    45.50    47.33    49.17    52.83
                Min     36.98    33.95    30.93    27.91    24.88    18.84
                Range    4.86     9.71    14.57    19.42    24.28    33.99
Group 5 (u=50)  Max     53.43    56.85    60.28    63.71    67.14    69.85
                Min     46.98    43.95    40.93    37.91    34.88    30.61
                Range    6.45    12.90    19.35    25.80    32.25    39.23

The assumptions of the experiment are: 1) the initial cluster centers affect the results; 2) the number of clusters into which we want to group the data affects the results; and 3) good initial cluster centers and a good number of clusters give a small SSE. From Figure 1, we can see that when the data have more variation, the clustering results may have more error and may take more iterations until the algorithm shows no change in the final cluster centers or the SSE.


4. RESULTS
In order to evaluate the performance of k-means, artificial data are generated and clustered by trying different values of the initial cluster centers and of the number of clusters into which we want to group the dataset. Figure 2 presents an example of running k-means on a 2-dimensional dataset (x = 2) with different initial centers and 300 points (n = 300). Figure 2a shows the original data set with small variation (s = 1); the 5 groups (k = 5) are clearly separated.



G00082
ANSCSE14 Mae Fah Luang University, Chiang Rai, Thailand
March 23-26, 2010
614
(a) Data: K5 X2 S1 (original data)
(b) Data: K5 X2 S1 Cspss (SSE = 615.595)
(c) Data: K5 X2 S1 C4 (SSE = 30,615.595)
(d) Data: K5 X2 S1 C5 (SSE = 615.595)
(e) Data: K5 X2 S1 C5_1 (SSE = 6,615.595)
[scatter plots of X2 versus X1 for each case]

Figure 2. An example of running k-means on a 2-dimensional dataset with different initial centers and 300 points (s = 1, k = 5, n = 300, x = 2). The positions of the initial and final cluster centers are marked in each plot: (a) original data set with small variation (s = 1); (b) result of good initial cluster centers; (c) result of bad initial cluster centers; (d) result of other good initial cluster centers; (e) result of not-good initial cluster centers.

In the simple case, Figure 2b, we apply good initial cluster centers with k = 5 groups. K-means divides the data into 5 groups and the final cluster centers lie within each group, with SSE = 615.595; this is a correct result. In the unlucky case, we input bad initial cluster center values, as shown in Figure 2c: with a good number of clusters (k = 5) but bad initial centers, k-means divides the data into 2 groups with SSE = 30,615.595, an extremely bad result. This may be caused by a weakness of k-means clustering, namely its sensitivity to outliers. In Figures 2d and 3a we used other good initial cluster centers and obtained the same result, with SSE = 615.595.
The initial cluster centers in Figure 2e are quite similar to those in Figure 2d; only one of the initial centers is far away from the first data set. K-means divides the data into 4 groups even though we asked for 5 clusters, and the SSE rises to 6,615.595. Likewise, Figures 3a and 3b show that a few changes in the initial cluster centers can bring a different result.

(a) Data: K5 X2 S1 C9_1 (SSE = 615.595)
(b) Data: K5 X2 S1 C10 (SSE = 6,568.468)

Figure 3. An example of running k-means on the same dataset as Figure 2: (a) good initial cluster centers; (b) not-good initial cluster centers.


From this data set (Figures 2 and 3), the experiment shows that the k-means result is sensitive to the initial centers we input to the program. The errors can occur even when the data set does not have much variation.

(a) Data: K5 X2 S3 Cspss (SSE = 5,540.353)
(b) Data: K5 X2 S3 C4 (SSE = 10,653.486)

Figure 4. The experimental results when the data sets have more variation (s = 3): (a) good initial cluster centers; (b) bad initial cluster centers.

(a) Data: K5 X2 S7 C1 (SSE = 20,146.985)
(b) Data: K5 X2 S7 C4 (SSE = 20,143.562)

Figure 5. The experimental results when the data sets have much variation (s = 7): (a) good initial cluster centers; (b) look-bad initial cluster centers.
Figures 4 and 5 present the k-means results when we apply more variation to the data set. For the data set with s = 3, shown in Figures 4a and 4b, bad initial cluster centers give a higher SSE. In contrast, Figures 5a and 5b show that when the data set has much variation (s = 7), it is difficult to find good initial cluster centers: the look-bad initial cluster centers in Figure 5b (SSE = 20,143.562) give a lower SSE than those in Figure 5a (SSE = 20,146.985).

Table 1. Other results of the experiment

K S Initial cluster center SSE Iterations
5 1 C1 615.6 2
5 1 C4 30,615.6 3
5 1 spss 615.6 2
5 3 C1 5,540.4 2
5 3 C4 10,653.5 16
5 3 spss 5,540.4 6
5 5 C1 12,691.2 5
5 5 C4 14,782.3 23
5 5 spss 12,979.5 21
5 7 C1 20,147.0 4
5 7 C4 20,143.6 30
5 7 spss 20,143.6 7



K S Initial cluster center SSE Iterations
3 1 C1 12,615.6 4
3 1 C4 30,615.6 4
3 1 spss 12,615.6 2
3 3 C1 16,646.3 6
3 3 C4 16,646.3 10
3 3 spss 16,646.3 5
3 5 C1 22,597.3 12
3 5 C4 22,597.3 15
3 5 spss 22,597.3 11
3 7 C1 31,484.3 26
3 7 C4 31,572.8 10
3 7 spss 31,484.3 7



According to Table 1, which shows further experimental results, C1 is a set of good initial cluster centers and C4 is a set of bad initial cluster centers; SPSS is the set of initial cluster centers that the SPSS program used in its run. For the high-variation data set (s = 7), the SSE from the bad initial cluster centers is quite similar to the SSE from the good and SPSS initial cluster centers.

When the number of clusters k changes, the results change and give a different SSE. An unsuitable k gives bad results and a higher SSE: from Table 1, when k = 3 is used, we get a higher SSE. This result shows that three cluster centers are not sufficient to represent the data set.

The experiments show that the initial cluster centers affect the results. Bad initial centers can take the calculation the wrong way and cannot give globally optimal results. The characteristic of good initial centers is that the center points lie among the raw data and the individual initial centers are not too close together; a bad set of initial cluster centers has one or more points far away from the raw data. The final cluster centers will be close to the generated cluster centers when the experiment has good initial centers; it depends on how good the initial centers are for the raw data. Even if the raw data have a larger standard deviation, when the initial cluster centers are good the final cluster centers will be close to the generated cluster centers and the running time tends to be lower. The study found that k-means produces better clusters from good initial cluster centers.





G00082
ANSCSE14 Mae Fah Luang University, Chiang Rai, Thailand
March 23-26, 2010
617
















[Figure 6 diagram: Products / Components → Clustering → Customization strategy: MTS, MTO, ATO, ETO]
Figure 6. Idea of applying the k-means algorithm to customization.


5. CONCLUSION AND DISCUSSION
The results show that when the data have variation, we need to input good initial centers and a good number of clusters to obtain good results. This confirms that the initial cluster centers and the number of clusters affect the k-means results: sensitivity to the input initial centers and the number of clusters is the weakness of the k-means algorithm. We rarely know these suitable values for a huge data set; thus, when using k-means, the user needs to run more trials and compare many results to pick the better one. The question is how to make sure we get the best result. K-means may need to be combined with other algorithms to improve this weakness; heuristic procedures seem to be interesting in this area.

The k-means algorithm might be an excellent tool for the manufacturing area. Customers always need good quality, fast delivery and low prices, so manufacturers have to aim for optimization over the whole process. Manufacturing strategy has changed from mass production to mass customization. Under mass production, a company produced products by forecasting, stocked finished goods and waited for orders, and so encountered many problems, e.g. overstock, expired products and increasing inventory cost. Mass customization is a manufacturing strategy that aims at best satisfying individual customer needs with near mass-production efficiency; it accommodates differentiated product attributes at low product cost and is concerned with quick responsiveness. Products or components are classified into groups: Make-to-stock (MTS), Assembly-to-order (ATO), Make-to-order (MTO) or Engineering-to-order (ETO). Manufacturers need to decide which manufacturing strategy is suitable for which products. The researchers are interested in applying the k-means algorithm to these research areas; the idea of applying k-means to customization is shown in Figure 6. Some interesting data to be analyzed might be production lead time, demand lead time, capacity, cycle time, product sales, part type, product characteristics, selling period, customer data, requirements, etc.

In a future study, the scaling of variables (data) is an important consideration. If the variables are measured on different scales (for example, one variable expressed in dollars and another in years), the results may be misleading. In such cases, we should consider standardizing the variables before performing the k-means cluster analysis, as sketched below. The researchers aim to present this idea in a further article and hope that manufacturers can benefit from the study, for example, through quicker decision making, decreased inventory holding cost and quicker responsiveness.
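A minimal sketch of the standardization step mentioned above, assuming the data sit in a NumPy array with one variable per column (the z-score form is our choice of standardization):

    import numpy as np

    def standardize(data):
        # rescale each column to zero mean and unit variance so that, e.g.,
        # a column in dollars and a column in years contribute comparably
        mu = data.mean(axis=0)
        sd = data.std(axis=0)
        sd[sd == 0] = 1.0   # leave constant columns unchanged
        return (data - mu) / sd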


REFERENCES
1. Jain, A.K., Murty, M.N. and Flynn, P.J., ACM Computing Surveys (Data Clustering: A Review), 1999, 31(3), 264-323.
2. Arthur, D. and Vassilvitskii, S., SCG'06 (How Slow is the k-means Method?), 2006, 144-153.
3. Irfan, D., Xiaofei, X., Shengchun, D. and Khan, I.A., IEEE (Clustering Framework for Supply Chain Management System), 2007, 422-426.
4. Papamichail, G.P. and Papamichail, D.P., European Journal of Operational Research (The k-means Range Algorithm for Personalized Data Clustering in E-commerce), 2007, 177, 1400-1408.
5. MacQueen, J., Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability (Some Methods for Classification and Analysis of Multivariate Observations), 1967, 281-297.
6. Ahn, J.S. and Sohn, S.Y., Expert Systems with Applications (Customer Pattern Search for After-Sales Service in Manufacturing), 2008, 36(3), 5371-5375.
7. Roiger, R.J. and Geatz, M.W., Data Mining: A Tutorial-Based Primer, 1st ed., Addison-Wesley, Boston, 2003, 33-84.
8. Kuo, R.J., Wang, H.S., Hu, T.-L. and Chou, S.H., Computers & Mathematics with Applications (Application of Ant K-Means on Clustering Analysis), 2005, 50, 1709-1724.
9. Saegusa, T. and Maruyama, T., International Conference on Field Programmable Logic and Applications, 28-30 Aug. (An FPGA Implementation of k-means Clustering for Color Images Based on KD-tree), 2006, 16.

Structural Model of Blood Vessels in Heart Using
Lindenmayer Systems

S. Ritraksa^(1,C), S. Chuai-Aree^1, R. Saelim^1

^1 Department of Mathematics and Computer Science, Faculty of Science and Technology, Prince of Songkla University, Pattani Campus, 181, Rusamilae, Muang, Pattani, 94000, Thailand

^C E-mail: cram_pha@hotmail.com; Tel. 081-4786391



ABSTRACT
Nowadays, coronary artery disease is an increasing cause of mortality in Thailand. There are many methods to treat the disease; coronary artery bypass grafting is one of them. Doctors have to know the structure of the blood vessels in a patient's heart before grafting. This paper proposes a method to develop a computer program for reconstructing and interpreting the structure of blood vessels in the heart, so that doctors can prepare for coronary artery bypass grafting more efficiently. The software can also help novice doctors and medical students learn the structure of blood vessels in the heart. It is implemented in the Delphi programming language with OpenGL (Open Graphics Library), based on Lindenmayer systems (L-systems) and image processing. The results from the software can be used to explain the structure of blood vessels in the heart, and the output images can be shown in two- and three-dimensional space. The L-systems code produced by the algorithm gives detailed properties of the size, thickness and angle of the blood vessels. Our method can also be applied to describe other tree-like structures, such as the structure of the tubes in the lung and of blood vessels in other organs [5], brains and eyes [2], including applications to agricultural and biological branching structures [3].
Keywords: Blood vessels, L-systems, Image Processing, Tree-like structures.



1. INTRODUCTION

Nowadays, coronary artery disease is an increasing cause of mortality in Thailand. The causes of this disease are smoking, diabetes, hypertension, heredity and obesity. There are three treatment methods for this problem: coronary artery bypass grafting, percutaneous coronary intervention and percutaneous transluminal coronary angioplasty. Coronary artery bypass grafting is one method to treat this problem, and doctors have to know the structure of the blood vessels in a patient's heart before grafting for accuracy.
Lindenmayer systems, or L-systems for short, were conceived as a mathematical theory of plant development [4]. They can describe tree-like structures such as the structure of a tree, the tubes in the lung, and the blood vessels in organs such as the brain and the eye, including the blood vessels in the heart. Therefore, L-systems can be used to model the structure of blood vessels in the heart.
The objective of this paper is to develop computer software to explain the structure of blood vessels in the heart and to show the picture in two and three dimensions. The software provides two processes: first, it can generate a 2D or 3D object from L-systems code; second, an input image containing blood vessels can be reconstructed as L-systems code.
The paper is organized as follows: theory, experimental and computational details, results and discussion, and finally the conclusion.


2. THEORY AND RELATED WORKS
L-systems were introduced by the biologist Aristid Lindenmayer [4] as a mathematical theory of plant development. The symbols of the L-systems used in this paper are defined as follows:

Symbol         Meaning
F(d, r1, r2)   Move forward a step of length d and construct a line (2D) or cylinder (3D) with beginning radius r1 and ending radius r2.
[              Push the current state of the turtle onto a pushdown stack.
]              Pop a state from the stack and make it the current state of the turtle.
+(a)           Turn left by angle a.
-(a)           Turn right by angle a.
&(a)           Pitch down by angle a.
^(a)           Pitch up by angle a.
\(a)           Roll left by angle a.
/(a)           Roll right by angle a.
|              Turn around (by 180 degrees).



3. EXPERIMENTAL AND COMPUTATIONAL DETAILS
This section describes algorithm and procedure of computing process. First process is to
interpret L-systems code to 2D/3D object. Second phase proposes the inverse problem of the
first process. It gets input image which is converted to L-systems code.

3.1. Input as L-systems code
L-systems code from user is transformed to 2D or 3D object using rotation matrix
which is commanded by OpenGL (Open Graphic Library) script for each character in
L-systems string. Figure 1(a) shows a procedure to interpret L-systems code to 2D or
3D object using the algorithm in Figure 1(b).

















(a) Procedure: Input L-systems Code → Compile L-systems Code → Symbol Interpretation → Object Representation → 2D/3D Output Object.

(b) Algorithm:
1. Read the L-systems code.
2. Interpret each character of the L-systems code with the following meaning:
   F: read the string between ( and ); the first number (before the first comma) is the length of the blood vessel, the second number is the radius at the beginning point, and the last number is the radius at the end point.
   (: read the string up to ); it is an angle.
   [: push the current state of the turtle onto a pushdown stack.
   ]: pop a state from the stack and make it the current state of the turtle.
   +: turn left by the given angle.    -: turn right by the given angle.
   &: pitch down by the given angle.   ^: pitch up by the given angle.
   \: roll left by the given angle.    /: roll right by the given angle.
   |: turn around.
3. Draw the picture.

Figure 1. (a) Procedure for transforming L-systems code into a 2D/3D object. (b) Algorithm for interpreting L-systems code as a graphical object.
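As an illustration of the 2D case of this algorithm, the sketch below interprets the subset F(d,r1,r2), +(a), -(a), [ and ] with a turtle; the regex-based tokenizer and the function names are our assumptions, not the Delphi/OpenGL implementation described here.

    import math, re

    def interpret(code):
        x, y, heading = 0.0, 0.0, 90.0        # start at the origin, pointing up
        stack, segments = [], []
        for sym, args in re.findall(r'([F+\-\[\]])(?:\(([^)]*)\))?', code):
            if sym == 'F':                    # draw a segment of length d
                d, r1, r2 = (float(v) for v in args.split(','))
                nx = x + d * math.cos(math.radians(heading))
                ny = y + d * math.sin(math.radians(heading))
                segments.append(((x, y), (nx, ny), r1, r2))
                x, y = nx, ny
            elif sym == '+':
                heading += float(args)        # turn left by the given angle
            elif sym == '-':
                heading -= float(args)        # turn right by the given angle
            elif sym == '[':
                stack.append((x, y, heading)) # push turtle state
            elif sym == ']':
                x, y, heading = stack.pop()   # pop turtle state
        return segments

    print(len(interpret("-(90)F(35,3,3)[-(90)F(20,3,3)]F(10,3,3)")))  # 3 segments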
3.2. Input as image
The second process extracts the branching structure from an input image and reconstructs that structure in L-systems code format. Figure 2 illustrates the flow diagram of the reconstruction of the 2D structure from the input image, converting it into a 2D/3D object and L-systems code. Each step is described in detail below.


















[Figure 2 flow: Input Image → Convert to gray-scale image → Vessel Extraction (Region Growing) → Skeletonization (center line) → Network Reconstruction → Resolution Reduction → 2D/3D Output Object and L-systems string code]
Figure 2. Flow diagram from input image to results: L-systems code and 2D/3D output object.

Convert color image to gray-scale image

The pixel p_{i,j} consists of four elements: red (R_{i,j}), green (G_{i,j}) and blue (B_{i,j}) for the color image, and gray (Y_{i,j}). The color input image is converted to a gray-scale image using the following equation:

Y_{i,j} = Round(0.299·R_{i,j} + 0.587·G_{i,j} + 0.114·B_{i,j}),   Y_{i,j}, R_{i,j}, G_{i,j}, B_{i,j} ∈ {0, 1, 2, …, 255}

where Y_{i,j}, R_{i,j}, G_{i,j} and B_{i,j} are the gray intensity of the gray image and the red, green and blue channels of the color image at position (i,j), respectively. Y_{i,j} is an integer value from 0 (black) to 255 (white); the Round function returns the integer value for the Y_{i,j} intensity. The color intensities R_{i,j}, G_{i,j} and B_{i,j} are also integer values from the color image [1].
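The conversion as array code, a minimal sketch assuming the RGB image is held as a NumPy uint8 array of shape (height, width, 3):

    import numpy as np

    def to_gray(rgb):
        # Y = Round(0.299 R + 0.587 G + 0.114 B), element-wise over the image
        weights = np.array([0.299, 0.587, 0.114])
        return np.rint(rgb @ weights).astype(np.uint8)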

Vessel Extraction Using Region Growing Method

The region-growing process for each pixel p_{i,j} in image P proceeds by considering the 8 neighboring pixels. We use the following update to perform the growing [1]:

M^{new}_{i,j} = Round(0.99 · (M^{old}_{i-1,j-1} + M^{old}_{i,j-1} + M^{old}_{i+1,j-1} + M^{old}_{i-1,j} + 8·M^{old}_{i,j} + M^{old}_{i+1,j} + M^{old}_{i-1,j+1} + M^{old}_{i,j+1} + M^{old}_{i+1,j+1}) / 16)

Skeletonization Algorithm
Skeletonization is applied using Hilditch's algorithm [1]. Two definitions in Hilditch's algorithm are used to calculate the center line of the branching structure after the region-growing step.
Definition 1
Hilditch's algorithm uses two functions, F_a(p1) and F_b(p1); Figure 3 illustrates them. They are defined as follows:
1. Function F_a(p1) returns the number of (0,1) patterns in the sequence (p2,p3), (p3,p4), (p4,p5), (p5,p6), (p6,p7), (p7,p8), (p8,p9), (p9,p2), and
2. Function F_b(p1) gives the number of non-zero neighbors of pixel p1.


(a) (b) (c)
Figure 3. (a) Cycle of the pattern from p2 to p9; (b) F_a(p1) = 1 (the arrow represents a (0,1) pattern), F_b(p1) = 2; (c) F_a(p1) = 2 (two arrows for two (0,1) patterns), F_b(p1) = 2.

Definition 2
The pixel p1 is removed from the image if it satisfies the following four conditions:
1. 2 ≤ F_b(p1) ≤ 6,
2. F_a(p1) = 1,
3. p2 · p4 · p8 = 0 or F_a(p2) ≠ 1, and
4. p2 · p4 · p6 = 0 or F_a(p4) ≠ 1.
The thinning stops when nothing changes (no more pixels can be removed).
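Definition 1 (and conditions 1-2 of Definition 2) translate into a few lines of code. A minimal sketch, assuming the neighbor ring [p2..p9] is given as a list of 0/1 values (the function names are ours):

    def Fa(nb):
        # number of (0,1) patterns around the ring p2..p9 (wrapping p9 -> p2)
        return sum(nb[i] == 0 and nb[(i + 1) % 8] == 1 for i in range(8))

    def Fb(nb):
        # number of non-zero neighbors of the center pixel
        return sum(1 for v in nb if v != 0)

    def passes_conditions_1_2(nb):
        # conditions 3 and 4 also need the neighbors' own rings
        return 2 <= Fb(nb) <= 6 and Fa(nb) == 1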
After the skeletonization process finishes, the center line has to be reconstructed by connecting all pixels along the line. A pixel along the line has only one neighbor if it is the starting or ending pixel of the network, two neighbors if it lies along the same line, and more than two neighbors if it is a junction or branching pixel. To reduce the resolution of the network, a resolution-reduction step is needed.

Network Reconstruction
After the skeletonization step, the skeleton of the branching structure is stored in an array. Since the skeleton points are not yet connected to each other, the line connections between them have to be rebuilt.

Resolution Reduction
For every 3 nodes where each node has only one child, with A the parent of B and B the parent of C: if nodes A, B and C are on (nearly) the same line, calculate the angle between BA and BC; if the angle between BA and BC is ≥ (180° − θ), then remove node B from the tree T, where θ is the resolution angle for removal. A sketch of this test is given after Figure 4.










(a) (b) (c)
Figure 4. Resolution reduction process, (a) structure of considered point B for removing,
(b) network before reduction, and (c) network after reduction.

4. RESULTS AND DISCUSSION
This section shows some results from our algorithms for both kinds of input: L-systems code and an image.

4.1. Input as L-systems code
When L-systems code is the input, the algorithm from Figure 1 is applied. The resulting image, shown in Figure 6, is produced from the L-systems code in Figure 5.








-(90)F(35,3,3)[-(90)F(20,3,3)[+(45)F(30,3,3)-(30)F(25,3,3)[-(80)F(20,3,3)+(40)F(100,3,3)
+(30)F(30,3,3)]F(80,3,3)[-(50)F(50,3,3)][+(45)F(15,3,3)[-(80)F(20,3,3)+(80)F(25,3,3)]
F(15,3,3)-(15)F(30,3,3)][+(10)F(25,3,3)-(50)F(35,3,3)+(90)F(20,3,3)-(90)F(25,3,3)
+(95)F(25,3,3)-(75)F(25,3,3)]][-(15)F(40,3,3)[-(60)F(30,3,3)]F(20,3,3)[-(60)F(20,3,3)
[-(35)F(15,3,3)]F(20,3,3)[-(15)F(20,3,3)][+(25)F(25,3,3)[-(50)F(20,3,3)][-(15)F(20,3,3)
[-(45)F(20,3,3)][+(45)F(20,3,3)]]]][-(20)F(30,3,3)-(45)F(25,3,3)+(50)F(20,3,3)-(15)F(70,3,3)]]

Figure 5. Example input of L-systems code.









Figure 6. Result image from L-systems code (Figure 5)

4.2. Input as image
When an image is the input, the algorithm in Figure 2 is called. The L-systems code shown in Figure 5 is the result obtained from the input image in Figure 7.

Figure 7. Example of the original input image and rough structure.
When the user adds an input image, the program converts the input image to a gray-scale image. The resulting gray-scale image is given in Figure 8.









Figure 8. The result after converting color image to gray scale image.
Then the user double-clicks on the gray-scale image, and the region-growing method is applied to the input image. The result of the region-growing method, with the grown region filled in red, is given in Figure 9.











Figure 9. The result image after running region growing method.

After calling the region-growing algorithm, the skeletonization algorithm is executed. The resulting center-line image is shown in Figure 11; Figure 10 shows the region-growing image together with its skeleton.









Figure 10. The result after region growing method and then skeletonization method.











Figure 11. The result from skeletonization algorithm.



5. CONCLUSION
This paper has proposed a method to develop a computer program for reconstructing and interpreting the structure of blood vessels in the heart, so that doctors can prepare for coronary artery bypass grafting more efficiently. The results from the software can be used to explain the structure of blood vessels in the heart. When the input is L-systems code, the output images can be shown in two- and three-dimensional space; when the input is a two-dimensional image, the L-systems code produced by the algorithm gives the properties of the size, thickness and angle of the blood vessels.
Our method can be applied to describe other tree-like structures, such as the tubes in the lung and the blood vessels in other organs such as the brain and eyes, including applications to agricultural and biological branching structures.




REFERENCES
[1] Chuai-Aree, S., Modeling, Simulation and Visualization of Plant Growth, PhD Dissertation, University of Heidelberg, Germany, 2009.
[2] Kokai, G., Toth, Z. and Vanyi, R., Modelling Blood Vessels of the Eye with Parametric L-Systems Using Evolutionary Algorithms, Springer-Verlag Berlin Heidelberg, 1999, 433-442.
[3] Leitner, D. and Schnepf, A., Root Growth Simulation Using L-systems, Proceedings of Algoritmy, 2009, 313-320.
[4] Prusinkiewicz, P. and Lindenmayer, A., The Algorithmic Beauty of Plants, Springer-Verlag Berlin Heidelberg, 1990, 1-35.
[5] Sadeghian, S., Sadeghian, S. and Molaei, R., Modeling and Reconstruction of Blood Vessels Based on CT & MR Images, Proceedings of the World Congress on Engineering, Vol. II, 2008.


Multilayer Neural Networks for the Contacting Load Model of Distributive Tactile Sensing
P. Nakjai^1 and J. Rungrattanaubol^(2,C)

Department of Computer Science and Information Technology, Faculty of Science, Naresuan University, Phitsanulok 65000, Thailand

^C E-mail: jaratsrir@nu.ac.th; Fax: 055-963263; Tel. 084-0486173

ABSTRACT
The distributive approach to tactile sensing is a novel approach. The method relies on the distributed deformation of a surface in response to an applied load, measured at a few sensing points within the surface area. The contacting load is typically formed into an approximation model using neural networks or fuzzy rules, and the performance of the tactile sensor is sensitive to the locations of the sensing points used. This paper investigates the use of multilayer neural networks to construct the contacting-load model of a distributive tactile sensor based on two sensory data sets, with and without noise. The neural networks presented in this paper cover three configurations: a single hidden layer without bias, a single hidden layer with bias, and two hidden layers. The results indicate that the networks without bias provide more accurate predictions than the networks with bias for both data sets (with and without noise), while a multilayer neural network with a filter can be an alternative approach to cope with a noisy data set.

Keywords: Tactile Sensing, Multi-layer Neural Networks, Approximation model


1. INTRODUCTION
A distributive sensor consists of multiple sensing elements arranged in a linear or two-dimensional array. The derivation of the contact relies on the relative information received at each sensory location, which creates a pattern that differs from one contact type to another. In other words, a distributive sensor relies on the coupling effect between sensing elements; this unique characteristic distinguishes it from most array-type sensors, which are constructed so that each single sensing site is isolated from the rest of the array. The resolution of a distributive device depends on the interpretation algorithm rather than on the number of sensing elements. It should be noted that the regions between two sensing elements of an array sensor have no sensitivity, and are referred to as dead areas [1], while this is not the case in distributive tactile sensors: the dead areas of distributive sensors do not depend on the number of sensing elements, but rather on the interpretation algorithm. The Artificial Neural Network (ANN) is one of the interpretation algorithms extensively used in this area. This paper focuses on the role of the ANN as the interpretation algorithm used to construct the contacting-load model of a distributive tactile sensor. The paper discusses how an ANN can be applied effectively to two sensory data sets, with and without noise, and also investigates the essential parameters and performance of multilayer neural networks. In the next section, the characteristics of a distributive tactile sensor are presented, together with a description of the beam theory used to simulate the measurements of the sensors, varied by the locations of the sensing points and the load positions, and the characteristics and structure of the ANN. Section 3 describes how the experiment and measurement are set up, which leads to the discussion of the results and the conclusion at the end.

2. THEORY AND RELATED WORKS
2.1 A distributive tactile sensor
A simple experimental tactile sensor was constructed from a one-dimensional surface arranged as a simply supported beam. The distributive surface of the experimental rig was a mild steel beam sized 400 × 5 × 1.2 mm. The supporting structures and sensing elements were mounted on a solid steel base that provided rigid support. A schematic of the proximity sensing unit is shown in Figure 1; the beam deflection under an applied load was detected at 2-8 positions, from which parameters describing the contact load could be deduced. Figure 1 displays the distributive tactile sensor with 8 sensors.

Figure 1. Schematic diagram of the distributive tactile sensor with 8 sensors

2.2 Beam theory
The bending theory is reported in most structural mechanics texts, for example [2][3]. The deflection y at position x in response to an applied load W at position a on a simply supported thin beam of length l is given by

y(x) = W(l − a)x(2la − a² − x²) / (6EIl),   x ≤ a   (1)

where E is Young's modulus and I is the second moment of inertia. For (1) the following assumptions apply: (a) the beam is straight, (b) the beam is constructed from a homogeneous material of constant elasticity, (c) the cross-sectional area remains planar and is uniform, (d) the applied load does not cause permanent deformation, and (e) deflections are small with respect to the length. In this study, the load was maintained constant at 3 N. The deflection based on the beam theory can be visualized as in Figure 2.
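Equation (1) simulates the sensor readings directly. The sketch below is a minimal illustration: the 3 N load and the 400 mm beam mirror the rig above, while the E·I value, the load position and the equally spaced sensor positions are our placeholder assumptions; the symmetry trick for x > a is the standard extension of (1).

    def deflection(x, a, W=3.0, l=0.4, EI=1.0):
        # deflection at x for a load W at position a on a simply supported beam, eq. (1)
        if x > a:               # by symmetry, evaluate from the other end
            x, a = l - x, l - a
        return W * (l - a) * x * (2 * l * a - a**2 - x**2) / (6.0 * EI * l)

    l = 0.4
    sensors = [l * (i + 1) / 9 for i in range(8)]      # 8 equally spaced sensors
    readings = [deflection(x, a=0.123, l=l) for x in sensors]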




Figure 2. The simulated beam deflections for 8 sensors
2.3 Multilayer Artificial Neural Network
The determination of the load position is modeled using an artificial neural network (ANN). ANNs were inspired by attempts to simulate biological neural systems. Analogous to the human brain structure, an ANN is composed of an interconnected assembly of nodes and directed links, as shown in Figure 3. Figure 3 displays an ANN with one hidden layer, s nodes, r inputs and 1 output, ending with a perceptron that combines all the outputs from the hidden nodes [4].

Figure 3. A structure of one-hidden layer ANN
where pᵢ stands for an input, wᵢ is a weight, bᵢ is a bias, a sigma next to the inputs stands for a neuron or node, and f is an activation function, which is normally a simple function such as sigmoid, linear or hard limit. An ANN can be extended into a multilayer network, as shown in Figure 4, where 2 hidden layers are applied.












Figure 4. A structure of two-hidden layer ANN
In this study, an applied load position was deduced from the deflection of the beam at specified positions using an ANN. In order to investigate the efficiency of the ANN, we carried out experiments on a one-hidden-layer ANN with and without bias and on a two-hidden-layer ANN without bias.

3. EXPERIMENTAL SETUP
In this study, we obtain the dataset from the simulation based on the beam theory, using 2-8 sensors with load positions of 1-399 mm and 4 different patterns for positioning the sensors. Therefore, in total there are 28 dataset files. Each file contains 400 records, with the number of attributes or columns depending on the number of sensors used. The 4 positioning patterns used in this study are:
pattern 1 (p_1): the sensors are equally spaced along the beam;
pattern 2 (p_2): the sensors are equally spaced along the beam, starting and ending at the ends of the beam;
pattern 3 (p_3): the sensors are randomly located on the beam;
pattern 4 (p_4): the sensors are placed based on Principal Component Analysis (PCA), as described in [5].
Since this study focuses on the performance of multilayer neural networks, we investigate three different structures of neural networks: a single hidden layer without bias, a single hidden layer with bias, and two hidden layers, each layer containing 10 nodes. Each technique is applied to the 8-sensor dataset without noise and with 10% noise [6].
The dataset without noise is simulated directly from the beam theory. This means it is highly accurate with respect to the formula, but in practice physical experiments always come with errors, either from a defective sensor or from an inappropriate environment. In this study, we assume there is at most a 10 percent error from the actual measurement. In training, 10 load positions at an equal pitch of 48 mm along the beam length, starting at 1 mm and ending at 399 mm, were used as training data. The learning rate is set to 0.9 and the activation function used is the sigmoid. The networks were trained for 3,000,000 iterations. The root mean squared error (RMSE) is used to measure the prediction accuracy. The dataset is normalized to the interval [0,1] before passing through the neural training process.

4. RESULTS AND DISCUSSION
The results in Figure 5 indicate that the one-layer neural network performs best, followed by the two-layer neural network, while the one-layer network with bias performs worst. This shows that the neural network is sensitive to bias. Figure 6 presents the prediction accuracy of the three types of neural network on the dataset with 10% noise. As can be seen from Figure 6, the two-layer neural network turns out to perform best in this case, instead of the one-layer network; the one-layer network performs markedly worse on pattern 3.

Figure 5. A comparison of RMSE values for the different dataset patterns without noise


Figure 6. A comparison of RMSE values for the different dataset patterns with 10% noise

Figures 5 and 6 both indicate that patterns 1 and 4 seem to be the best patterns for positioning the tactile sensors.

5. CONCLUSION
In this study, we applied three different types of artificial neural network to the modeling of the contacting load of a distributive tactile sensor. The one-layer ANN predicts well on the dataset without noise, while the two-layer ANN performs best on the dataset with noise, and the ANN with bias fails to construct an accurate approximation model. The best positioning patterns appear to be patterns 1 and 4.

REFERENCES
1. Dargahi, J., Parameswaran, M., and Payandeh, S., Journal of Microelectromechanical Systems, 2000, 9(3), 329-335.
2. Young, W.C., Roark's Formulas for Stress and Strain (International ed.), McGraw-Hill Inc, Singapore, 1989.
3. Gomm, J.B., Weerasinghe, M., and Williams, D., Proceedings of the Institution of Mechanical Engineers, Part E: Journal of Process Mechanical Engineering, 2000, 214(2), 131-143.
4. Zanchettin, C., Ludermir, T.B., Applied Soft Computing, 2007, 7, 246-256.
5. Rungrattanaubol, J., and Tongpadungrod, P., International Joint Conference on Computer Science and Software Engineering (JCSSE), 2008, 1, 47-52.
6. Svozil, D., Kvasnicka, V., and Pospichal, J., Chemometrics and Intelligent Laboratory Systems, 1997, 39, 43-62.

Public Transport Route Design for Minimal Energy
Consumption

N. Charoenroop^C, R. Nilthong, and A. Eungwanichayapant

School of Science, Mae Fah Luang University, 333, Moo 1, Ta-Sud, Muang, Chiang Rai, 57100, Thailand

^C E-mail: nitisak@rmutl.ac.th; Tel. (+66) 5372-9600 ext. 4500; Fax. (+66) 5372-9606-7



ABSTRACT
Chiang Rai province has a high percentage of private vehicle usage, about 90% across all vehicle types. As a result, many places, such as educational or business areas, experience problems such as traffic congestion, environmental impact and inefficient use of energy. Public transport is an alternative way to solve these problems. This study applied the well-known Vehicle Routing Problem (VRP) in an integer linear programming format to design public transport routes under the objective of minimizing fuel consumption. The model was applied to the routing of 2 existing bus routes, Passenger Bus Terminal-Rajabhat University and Passenger Bus Terminal-RongKhun Temple; processing by the Branch and Cut method found that the best route used 3,943.58875 milliliters of fuel, with a travel time of 49.1321 minutes and a service distance of 25,970.5739 meters. The model, interfaced with GIS (Geographic Information System) modules and GLPK (GNU Linear Programming Kit), has the potential for further development to the case of the Multi-Depot Vehicle Routing Problem (MDVRP).

Keywords: Bus Routing, Fuel Consumption, Integer Linear Programming Formulations, Vehicle Routing Problem (VRP).



1. INTRODUCTION
This study refines the well-known Vehicle Routing Problem (VRP) by including additional fuel-consumption constraints for the case of public transportation and logistics. Transportation is the movement of people or goods from one location to another; it is performed by various modes, such as road, rail and air, and the field can be divided into infrastructure, vehicles and operations. Satayopas B. et al. [9] suggested that the public transport system in Chiang Rai should be promoted and developed to be more effective and sustainable. It is important that this new public transport system support most areas of Chiang Rai's city: if people use public transport rather than private cars, the traffic problems will be relieved.
A survey of the public transport in Tumbon Rob-Vieng, shown in Figure 1, covers three bus routes: Rob-Vieng (line), San-Sai district (bold line) and Rajabhat University (bus-symbol line). From this figure, some areas are serviced by more than one bus route while other areas are not serviced at all. For example, the yellow points are schools that require service by public transportation; the lack of a public transport system in those areas leads to an increase in private car use.
This study introduces a public transportation model in order to analyze appropriate bus routes under the objective of minimizing fuel consumption while taking velocity into account, because in a countryside city the variation of traffic affects the fuel consumption of public transportation.

*Corresponding author: Rajamangala University of Technology Lanna, Tumbon Sai-Khao, Amphoe Phan, Chiang Rai, Thailand, 57120. Tel: (+66) 5372-9600-5; Fax: (+66) 5372-9606-7; E-mail: nitisak@rmutl.ac.th


Figure 1. Existing public transport routes (Rob-Vieng zone).

The remainder of this paper is structured as follows. Section 2 reviews relevant prior literature on public transport route design for minimal energy consumption. The research method, based on Integer Linear Programming Formulations (ILPFs), is discussed in Section 3. An illustrative example of public transportation, using Tumbon Rob-Vieng, Chiang Rai Province as a case study, is reported and discussed in Section 4 (see Figure 2); Figure 2 identifies the appropriate bus route passing selected places such as schools, markets, government offices and tourist attractions. Section 5 concludes the study.



Figure 2. Master plan map of Chiang Rai.







2. THEORY AND RELATED WORKS
The Vehicle Routing Problem (VRP) is a combinatorial optimization and integer programming problem that seeks to service a number of customers with a fleet of vehicles. The VRP is an important problem in the fields of transportation, distribution and logistics. Often the context is that of delivering goods located at a central depot to customers, with the goal of minimizing the cost of distributing the goods.
Characteristics of the VRP: there are r routes that must satisfy the desired conditions simultaneously. Each route starts and ends at the depot; each bus stop is served by at least one route, while any other node (not a bus stop) may or may not be served; and the total distance or travel time of each route does not exceed the distance limit or travel time limit. The objective is to find a set of r routes of minimum total cost.
Solution techniques for the VRP.
In general, two kinds of methods are used to solve the Vehicle Routing Problem: exact methods and heuristic methods. Exact methods, also called optimization methods, include the Branch and Bound method (up to about 100 nodes) (Fisher, 1994) and the Branch and Cut method. These methods are based on a decision-tree structure (as shown in Figure 3) and guarantee the best answer (the optimal solution), since in principle every possible solution is examined to reach the best one, however long that takes.
Heuristic methods, in contrast, perform a relatively limited exploration of the search space and typically produce good-quality solutions within modest computing times; however, they do not guarantee the optimal solution.
Researchers continue to study both kinds of methods, seeking algorithms that reach the optimal solution in the shortest time, and heuristics whose solution quality is nearly that of the optimal one.
Branch and Cut is a method of combinatorial optimization for solving integer linear programs; it describes a broad class of algorithms for MILPs. The method is a hybrid of Branch and Bound and a cutting-plane algorithm. Branch and Bound is a divide-and-conquer approach that reduces the original problem to a series of smaller subproblems and then recursively solves each subproblem with static node-selection methods (shown in Figure 3). A cutting-plane algorithm is used to find further linear constraints that are satisfied by all feasible integer points but violated by the current fractional solution.
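
As a concrete illustration of the Branch and Bound component, the sketch below solves a tiny integer linear program by recursively solving LP relaxations and branching on a fractional variable. This is a minimal Python sketch using SciPy with made-up problem data; the study itself solved its models with GLPK's Branch and Cut, not with this code.

import math
import numpy as np
from scipy.optimize import linprog

def branch_and_bound(c, A_ub, b_ub, bounds):
    # Minimize c.x subject to A_ub.x <= b_ub with x integer within bounds.
    best = {"obj": math.inf, "x": None}

    def solve_node(bnds):
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bnds, method="highs")
        if not res.success or res.fun >= best["obj"]:
            return                      # infeasible node, or pruned by bound
        frac = [i for i, v in enumerate(res.x) if abs(v - round(v)) > 1e-6]
        if not frac:                    # integral solution: update incumbent
            best["obj"], best["x"] = res.fun, np.round(res.x)
            return
        i, v = frac[0], res.x[frac[0]]  # branch on the first fractional variable
        lo, hi = bnds[i]
        solve_node(bnds[:i] + [(lo, math.floor(v))] + bnds[i + 1:])
        solve_node(bnds[:i] + [(math.ceil(v), hi)] + bnds[i + 1:])

    solve_node(list(bounds))
    return best

# Example: minimize -x0 - x1 s.t. 2*x0 + x1 <= 4 and x0 + 2*x1 <= 5
print(branch_and_bound([-1, -1], [[2, 1], [1, 2]], [4, 5], [(0, 10), (0, 10)]))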




Figure 3. The Branch and Bound method [2].

The VRP was first defined by Dantzig and Ramser in 1959; in that study, the authors used distance as a surrogate for the cost function. Imdat Kara and Tolga Bektas, in "Minimal Load Constrained Vehicle Routing Problems" [3], extended the VRP to the case where each vehicle is restricted by an additional minimal starting or returning load constraint. Two years later, Imdat Kara, Bahar Y. Kara and M. Kadri Yetis [4] introduced a new cost function for the VRP based on distance and the load of the vehicle: a heavily loaded truck will use more fuel than a lightly loaded truck, so the authors used fuel, as a function of load and distance traveled, as a surrogate for the cost function. The present study introduces a public transportation model in order to analyze appropriate bus routes under the objective of minimizing fuel consumption as a function of velocity.


3. COMPUTATIONAL DETAILS
In this section, we describe the mathematical model in detail. We begin by explaining the notation and then present the mathematical formulation, whose components are given as: parameters, decision variables, objective function and constraints.
The problem is formally defined on a directed graph G = (V, A), where V = {0, 1, 2, ..., n} is the set of nodes (vertices); as is classical, node 0 denotes the depot (Bus Terminal) and the remaining nodes are intersections and bus stops. The set A is the link set. With each link (i, j) is associated a travel cost: from the distance d_{i,j} and the velocity v_{i,j} one can calculate the travel time t_{i,j} and the fuel consumption rate FCR_{i,j}. The problem is posed on a weighted directed graph: a weight is the cost of traveling from node i to node j, and the direction is the direction of bus travel from node i to j. Bus directions in Chiang Rai are defined in two formats. The first is a one-way path enforced at all times (see the dense-line arrows in Figure 4). The other is a one-way path only during rush hours, between 06:30-08:30 am and 03:30-06:00 pm, shown as clear-line arrows (see Figure 4).
This study introduces integer programming formulations (shown in Equations 1 and 2) applied to solve the Vehicle Routing Problem. An exact solution algorithm (the Branch and Cut method) was used to solve the problem.

Minimize (or Maximize)  c^{T} x        (1)

subject to  A x \le b,  x \ge 0        (2)


Figure 4. Directions of bus travel.


Notation:
We summarize all notation used below to facilitate the discussion of the model in the next subsection.

Sets
N        Set of nodes (218 nodes), N = {0, 1, 2, ..., n}
ROUTES   Set of routes, ROUTES = {0, 1, 2, ..., route}
LINKS    Set of links between nodes i and j (594 links), LINKS = {(i, j) | i, j \in V, i \ne j}
FDepot   Set of first depots
LDepot   Set of last depots (multi-depot)
BS       Set of bus stop service nodes, BS = {0, 1, 2, ..., s}

Parameters
i        Index for traveling from node i
j, k     Indices for traveling to node j, k
r        Index for traveling on route r
v_{i,j}  Average velocity from node i to j (unit: km/hr)
MaxDists Maximum distance allowed for each route (unit: meter)
MaxTravT Maximum travel time allowed for each route (unit: minute)

Coefficients
d_{i,j}  Distance (unit: meter) from node i to j
F_{i,j}  Fuel consumption from node i to j
t_{i,j}  Travel time from node i to j

Decision Variables
x_{i,j,r}  Binary variable indicating whether route r travels from node i to j (see (16))
u_{i,r}    Sequence in which city i in route r is visited (see (17))

Function Calculations
t_{i,j} is the travel time from node i to j. It is a function of distance and velocity and is used to calculate the travel time of each link of a route; the travel time must not exceed its limit. In general, distance, velocity and time are related by

d_{i,j} = v_{i,j} \cdot t_{i,j}        (units: meters and minutes)        (3)

so the travel time can be written as

t_{i,j} = d_{i,j} / v_{i,j}        (with v_{i,j} converted to meters per minute)        (4)

The mathematical formulation of the desired objective function and constraints is as follows.


Objective Function:
The objective function (5) of this study is to minimize the total fuel consumption over all routes:

Minimize  \sum_{r \in ROUTES} \sum_{(i,j) \in LINKS} F_{i,j} \, x_{i,j,r}        (5)

F_{i,j}, the fuel consumption from node i to j, is calculated as the fuel consumption rate (FCR) multiplied by the distance from node i to j:

F_{i,j} = FCR(v_{i,j}) \cdot d_{i,j}        (6)
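
For concreteness, the link-level relations (4) and (6) can be computed as below. This is a minimal Python sketch; the fuel-consumption-rate lookup fcr is a stand-in for the interpolated rate of Table 1 and is not from the paper.

def travel_time_min(distance_m, velocity_kmh):
    # Travel time in minutes for one link (eq. 4): convert km/hr to
    # meters per minute before dividing.
    v_m_per_min = velocity_kmh * 1000.0 / 60.0
    return distance_m / v_m_per_min

def fuel_consumption(distance_m, velocity_kmh, fcr):
    # Fuel used on one link (eq. 6): rate at this velocity times distance.
    # fcr maps velocity (km/hr) to a consumption rate per unit distance.
    return fcr(velocity_kmh) * distance_m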

Table 1. Average fuel consumption rate (FCR) of a minibus

Average velocity v_{i,j} (km/hr)    Average fuel consumption (liter/km)
 4.9                                3.433
 9.3                                4.452
14.8                                5.118
22.8                                5.487
35.9                                6.420
60.1                                8.464
78.1                                6.699

Figure 5. Fuel consumption rate (5-80 km/hr).

Figure 5 shows the fuel consumption data together with the interpolation function (cubic spline interpolation) plotted for velocities between 5 and 80 km/hr. In the graph, the dots show the average fuel consumption rates of Table 1 and the line is the interpolation function between 5 and 80 km/hr. From the graph, the highest average fuel consumption rate, 8.464, occurs at an average velocity of 60.1 km/hr.
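
The cubic spline interpolation of the FCR table can be reproduced as below; a minimal Python sketch using SciPy, with the velocities and rates taken from Table 1.

import numpy as np
from scipy.interpolate import CubicSpline

velocity = np.array([4.9, 9.3, 14.8, 22.8, 35.9, 60.1, 78.1])   # km/hr
fcr_data = np.array([3.433, 4.452, 5.118, 5.487, 6.420, 8.464, 6.699])

fcr = CubicSpline(velocity, fcr_data)   # interpolating spline
v = np.linspace(5, 80, 151)             # evaluation grid, 5-80 km/hr
curve = fcr(v)                          # the line plotted in Figure 5
print(float(fcr(35.0)))                 # interpolated FCR at 35 km/hr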

Constraint Descriptions: (7)-(15)
Constraint (7) Route constraints: each route starts at the depot, the number of routes r does not exceed the route limit (7.a), and each route can start at the depot only once (7.b).

(7.a)

(7.b)

Constraint (8) Route constraint: each route stops at the depot, and the total over all routes r does not exceed the route limit.

(8)


Constraint (9) Bus stop node constraint: every bus stop must be visited at least once.

(9)

Constraint (10) Non-bus-stop node constraint: every non-bus-stop node may be visited or not (0).

(10)

Constraint (11) Node balance constraint: flow conservation constraints; in this case we consider all nodes (bus stop and non-bus-stop nodes) and all routes.

(11)

Constraint (12) Link balance constraint: flow conservation constraints checking that the total links of each route balance.

(12)

Constraint (13) Route balance constraint: flow conservation constraints checking route balance.

(13)

Constraint (14) Capacity constraints: the distance or travel time of any bus route r does not exceed its limits.

(14.a)

(14.b)

Constraint (15) Sub-tour elimination constraint: ensures the solution contains no illegal sub-tours. This study declares sub-tour elimination in the MTZ format (with an added route index).

(15)


Constraint (16) Decision variable: x_{i,j,r} is a binary variable indicating whether route r travels from node i to j.

(16)

Constraint (17) Decision variable: u_{i,r} is the sequence in which city i is visited.

(17)
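
The skeleton of this arc-flow model can be written with an off-the-shelf modeling layer. Below is a minimal Python sketch using the PuLP library with made-up data: it declares the binary variables of (16), the fuel-minimizing objective (5), and, as one representative example, the depot-start constraint (7.b). It is not the paper's GMPL model, and PuLP's default solver is CBC rather than GLPK.

from pulp import LpProblem, LpMinimize, LpVariable, lpSum, LpBinary

nodes = [0, 1, 2, 3]                        # node 0 is the depot
links = [(i, j) for i in nodes for j in nodes if i != j]
routes = [0, 1]
F = {(i, j): 1.0 for (i, j) in links}       # placeholder fuel cost per link

prob = LpProblem("bus_routing", LpMinimize)
x = LpVariable.dicts("x", [(i, j, r) for (i, j) in links for r in routes],
                     cat=LpBinary)

# Objective (5): total fuel consumption over all routes and links.
prob += lpSum(F[i, j] * x[i, j, r] for (i, j) in links for r in routes)

# Constraint (7.b): each route leaves the depot at most once.
for r in routes:
    prob += lpSum(x[0, j, r] for j in nodes if j != 0) <= 1

prob.solve()                                # branch-and-cut via default CBC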


4. RESULTS AND DISCUSSION
We declared the equations (ILPFs) of Section 3 in GMPL [1] (GNU Mathematical Programming Language) format and then solved them with GLPK [2] (GNU Linear Programming Kit) version 4.42, in cooperation with Microsoft Visual C++ 2010 Express Edition, on an Intel Core 2 Duo running at 2.0 GHz with 4,096 MB of RAM.
The test model declares 218 nodes and 594 links and defines 12 bus stops. Solving by the Branch and Cut method, the best routes used a travel time of 49.1321 minutes, a service distance of 25,970.5739 meters and a fuel consumption of 3,943.58875 milliliters. We also tested the model with the SYMPHONY program and found that SYMPHONY's result equals GLPK's (shown in Tables 2-3 and Figures 6.a-b).



Figure 6.a Output solution. Figure 6.b Output solution (Tumbon Rob-Vieng area).



Table 2. Summary of fuel consumption

Route No.    Travel time (minute)    Distance (meter)    Fuel consumption (ml)
1            23.8436                 12,674.4913         1,862.73822
2            25.2895                 13,296.0826         2,080.85052
Total        49.1321                 25,970.5739         3,943.58875
G00091
ANSCSE14 Mae Fah Luang University, Chiang Rai, Thailand
March 23-26, 2010
640
Table 3. Bus stops served by each route

Route 1 [Bus Terminal-RongKhun Temple], passes 5 bus stops:
  Chiang Rai Hospital, Panich School, Night Brasa Market, Market Real, Chiang Rai Highway District.
Route 2 [Bus Terminal-Rajabhat University], passes 9 bus stops:
  OverBlook Hospital, Wood Office, Sanpasarnmit, Santi School, Night Brasa Market, Market Real, TAT, King Mang Rai Monument, Prakav Temple.



5. CONCLUSION
This study presents an integer linear programming formulation for a case of the Vehicle Routing Problem, extended with fuel consumption. The classical VRP is applied to analyze appropriate bus routes under the objective of minimizing fuel consumption as a function of velocity. We consider multi-route cases and adopt Integer Linear Programming Formulations (ILPFs). The model was also developed with interfaces to GIS (Geographic Information System) modules and GLPK (GNU Linear Programming Kit). It was found that the best routes used 3,943.58875 milliliters of fuel, a travel time of 49.1321 minutes and a service distance of 25,970.5739 meters. The model has potential for further development toward the Multi-Depot Vehicle Routing Problem (MDVRP).



REFERENCES
1. Andrew Makhorin, Modeling Language GNU MathProg, Draft Edition for GLPK Version 4.42, Department for Applied Informatics, Moscow Aviation Institute, Moscow, Russia, 2010, 53.
2. Andrew Makhorin, GNU Linear Programming Kit Reference Manual, Version 4.42, Department for Applied Informatics, Moscow Aviation Institute, Moscow, Russia, 2010, 53.
3. Imdat Kara and Tolga Bektas, Minimal Load Constrained Vehicle Routing Problems, Springer-Verlag Berlin Heidelberg, 2005, 188-195.
4. Imdat Kara, Bahar Y. Kara and M. Kadri Yetis, Energy Minimizing Vehicle Routing Problem, Springer-Verlag Berlin Heidelberg, 2007, 62-71.
5. Miller, C.E., Tucker, A.W. and Zemlin, R.A., Integer Programming Formulation of Traveling Salesman Problems, Journal of the ACM, 1960, Volume 7, 326-329.
6. Minea Filipec, Davor Skrlec, and Slavko Krajcar, An Efficient Implementation of Genetic Algorithms for Constrained Vehicle Routing Problems, IEEE International Conference, 1998, Volume 3, 2231-2236.
7. Ministry of Land, Infrastructure and Transport, Japan, Emission Factors of In-Use Vehicles in Bangkok, 2004, 39.
8. Paolo Toth and Daniele Vigo, The Vehicle Routing Problem, SIAM Monographs on Discrete Mathematics and Applications, United States of America, 2002, 365.
9. Satayopas, B. et al., The Master Plan of Transport and Traffic Policy: Case Study Chiang Rai, Department of Civil Engineering, Chiang Mai University, Thailand, 2004.

G00093
Automatic Vessels Edge Detection for Low-Contrast Baby's Retinal Images

P. Suapang^{1,C}, M. Chuwhite^{2}, and W. Nghauylha^{3}
1,2,3 Biomedical Instrumentation Program, Department of Physics, Faculty of Science, Rangsit University, 52/347, Lakhok, Muang, Pathumthani, 12000, Thailand
C Biomedical Instrumentation Program, Department of Physics, Faculty of Science, Rangsit University, 52/347, Lakhok, Muang, Pathumthani, 12000, Thailand
E-mail: piyamas@rsu.ac.th; Fax: 02-9972200-22 # 1408; Tel. 02-9972200-22 # 1008, 1428



ABSTRACT
The number of premature babies affected by ROP disease (Retinopathy of
Prematurity), a foremost cause of childhood blindness, is continually increasing. ROP
can be effectively and successfully cured if diagnosis is given at an early stage. The
collaboration between physicians and ophthalmologists is therefore important to reduce
and prevent this eye loss phenomenon. The RetCam is a recently developed instrument,
which can be used to view the retina of a preterm baby. Nevertheless, pictures taken by
RetCam are still not clear and vessels cannot be clearly seen, which causes a delay in
diagnosis and treatment. This research is aimed at improving the medical image from
RetCam camera using image processing techniques. Local contrast enhancement and
edge detection techniques are tested and further developed to detect the edge of eye
vessels automatically. The resulting improved images will help doctors to instantly
diagnose the patient and then provide proper treatment in due course.

Keywords: Medical Image, Edge Detection, Local Contrast Enhancement.



1. INTRODUCTION
Retinopathy disease is a blinding eye condition of premature babies. In Thailand, 16 out of 120 premature babies lose their vision, compared with 300 out of 1,000,000 in the United States of America. Digital camera technology is adopted for photographing the eye, but a problem is that the image is not clear enough to be used in diagnosing the disease. Researchers have recognized this issue and set up research on methods of improving the image for specialist physicians to adopt in diagnosing and treating the disease in time.



2. THEORY AND RELATED WORKS
To find blood vessels automatically, image processing techniques are applied. The researchers developed a testing procedure: first, the image is converted to a grayscale image and then pre-processed using a Local Contrast Enhancement procedure (as in equation (1)).

g(i, j) = 255 \cdot \frac{\Psi_w\big(f(i, j)\big) - \Psi_w(f_{\min})}{\Psi_w(f_{\max}) - \Psi_w(f_{\min})}        (1)

where the sigmoid function is

\Psi_w(f) = \left[ 1 + \exp\!\left( \frac{\bar{f}_w - f}{\sigma_w} \right) \right]^{-1}

while f_{\max} and f_{\min} are the maximum and minimum values of intensity within the whole image, with the local window mean and variance

\bar{f}_w = \frac{1}{M^2} \sum_{(k, l) \in w(i, j)} f(k, l)
\qquad \text{and} \qquad
\sigma_w^2 = \frac{1}{M^2} \sum_{(k, l) \in w(i, j)} \big( f(k, l) - \bar{f}_w \big)^2
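
Equation (1) can be rendered directly in code. The following is a minimal Python sketch using NumPy and SciPy; the window size M and the variance floor are illustrative assumptions, not values from the paper.

import numpy as np
from scipy import ndimage

def local_contrast_enhance(f, M=49):
    # Local mean and variance over an M x M window around each pixel.
    f = f.astype(float)
    mean = ndimage.uniform_filter(f, size=M)
    var = ndimage.uniform_filter(f ** 2, size=M) - mean ** 2
    sigma = np.sqrt(np.maximum(var, 1e-6))   # floor avoids division by zero

    def psi(x):
        # The sigmoid of eq. (1), built from the local window statistics.
        return 1.0 / (1.0 + np.exp((mean - x) / sigma))

    lo, hi = psi(f.min()), psi(f.max())
    return 255.0 * (psi(f) - lo) / (hi - lo)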

With local contrast enhancement, intensity differences are adjusted to be more distinct while the colour remains original, producing a clearer image. The image is then put into the edge model using image gradients, in which the edge of an eye vessel can be detected from the gradient derived from the mask values of the Roberts cross operator, the Prewitt operator, and the Sobel operator. The gradient images are derived by convolution along the x and y axes (multiplication between the mask values in the x and y axes and the image), and the gradient magnitude is then calculated from equation (2):

Gradient  |G| = \sqrt{G_x^2 + G_y^2} = \sqrt{ \left( \frac{\partial f}{\partial x} \right)^2 + \left( \frac{\partial f}{\partial y} \right)^2 }        (2)
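
The gradient-magnitude edge map of equation (2) with the Sobel operator can be computed as below; a minimal Python sketch using SciPy, where image is assumed to be a 2-D grayscale array.

import numpy as np
from scipy import ndimage

def sobel_gradient_magnitude(image):
    gx = ndimage.sobel(image.astype(float), axis=1)   # df/dx
    gy = ndimage.sobel(image.astype(float), axis=0)   # df/dy
    return np.hypot(gx, gy)                           # |G| = sqrt(Gx^2 + Gy^2)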

In the second-order differential method of edge detection, the theory of the Laplacian of Gaussian (LoG) is applied. The procedure applies Gaussian smoothing first, then a Laplacian process, and finally zero-crossing detection. The mask value can be derived from the following formula (3):

LoG(x, y) = -\frac{1}{\pi \sigma^4} \left[ 1 - \frac{x^2 + y^2}{2\sigma^2} \right] e^{-\frac{x^2 + y^2}{2\sigma^2}}        (3)
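
The LoG step of equation (3) followed by zero-crossing detection can be sketched as below, a minimal Python example using SciPy; the value of sigma is an illustrative assumption.

import numpy as np
from scipy import ndimage

def log_zero_crossings(image, sigma=2.0):
    # Gaussian smoothing combined with the Laplacian (eq. 3).
    log = ndimage.gaussian_laplace(image.astype(float), sigma=sigma)
    # A pixel is marked as an edge where the LoG response changes sign
    # against its vertical or horizontal neighbour.
    vert = (np.sign(log[:-1, :]) != np.sign(log[1:, :]))[:, :-1]
    horz = (np.sign(log[:, :-1]) != np.sign(log[:, 1:]))[:-1, :]
    return vert | horz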

Canny edge detection has also been experimented with, using mask sizes of 11x3, 15x3, and 15x5 calculated from the Pascal triangle and Gaussian values. A 5x5 mask derived from Gaussian values was adopted as suitable for our set of images. A mask value taken from the first-order differential edge methods is again applied in this edge detection process.



3. EXPERIMENTAL or COMPUTATIONAL DETAILS
The Automatic Vessels Edge Detection for Low-Contrast Baby's Retinal Images program was implemented in MATLAB 2007 to perform local contrast enhancement and edge detection on pictures taken by the RetCam; its functional diagram is shown in Figure 1.






Figure 1. Functional diagram of the Automatic Vessels Edge Detection for Low-Contrast Baby's Retinal Images program.

To find blood vessels automatically, image processing techniques are applied following the testing procedure described above. First, the image is pre-processed using the Local Contrast Enhancement procedure of equation (1). The results after application are shown in Figure 2.


Figure 2. (A) Original retinal vessels in infant images. (B) The image after pre-processing with Local Contrast Enhancement.

With local contrast enhancement, intensity differences are adjusted to be more distinct while the colour remains original, producing a clearer image. The image is then put into the edge model using image gradients, in which the edge of an eye vessel is detected from the gradient derived from the mask values of the Roberts cross operator, Prewitt operator, Sobel operator and Canny operator. The results after application are shown in Figure 3.

G00093
ANSCSE14 Mae Fah Luang University, Chiang Rai, Thailand
March 23-26, 2010
644

Figure 3. The results after edge detection.

4. RESULTS AND DISCUSSION
The modified algorithms are applied to the original images shown in figures A and B. The
results after application are shown in figures 5 and 6.


A B
Figure 4. Original retinal vessels in infant images.


Figure 5. Resulting edge images of A and B using the Sobel operator.


Figure 6. Resulting edge images of A and B using the Canny operator with a 15x3 mask.
G00093
ANSCSE14 Mae Fah Luang University, Chiang Rai, Thailand
March 23-26, 2010
645

After obtaining edge-detection results from several methods, the researchers distributed the images to specialist physicians for evaluation. The method or theory most practical for detecting the eye-image edges of ROP premature babies was identified by the clinicians. The result from the specialist physicians is as follows.

Figure 7. Average value evaluated by specialist physicians.

After the evaluation by specialist physicians, the Sobel operator approach using image gradient theory was found most suitable, at 84.75% correctness.


5. CONCLUSION
This work improves the medical image from the RetCam camera using image processing techniques. Local contrast enhancement and edge detection techniques were tested and further developed to detect the edges of eye vessels automatically. The resulting improved images will help doctors to diagnose the patient promptly and then provide proper treatment in due course.


REFERENCES
1. Alaknanda, R.S. Anand, Pradeep Kumar, NDT&E International, 15, 2005.
2. Gang Wang, T. Warren Liao, NDT&E International, 35, 519-528, 2002.
3. Gonzales, R.C., Woods, R.E., Digital Image Processing, 2000.
4. H.I. Shafeeka, E.S. Gadelmawla, NDT&E International, 37, 291-299, 2004.
5. H.I. Shafeeka, E.S. Gadelmawla, NDT&E International, 37, 301-307, 2004.
6. Randy Crane, Simplified Approach to Image Processing, 1997.
7. The MathWorks Inc., Image Processing Toolbox for Use with MATLAB, The MathWorks Inc., 1999.


ACKNOWLEDGMENTS
We would like to express our appreciation to all of the instructors and friends at the Department of Industrial Physics and Medical Instrumentation, Faculty of Applied Science, King Mongkut's University of Technology North Bangkok, Thailand.
G00094
Image Acquisition and Image Processing Program for
Dermatology Camera

P. Suapang^{1,C}, D. Mueanpong^{2}, S. Sanglub^{3} and B. Haomao^{4}
1,2,3,4 Biomedical Instrumentation Program, Department of Physics, Faculty of Science, Rangsit University, 52/347, Lakhok, Muang, Pathumthani, 12000, Thailand
C Biomedical Instrumentation Program, Department of Physics, Faculty of Science, Rangsit University, 52/347, Lakhok, Muang, Pathumthani, 12000, Thailand
E-mail: piyamas@rsu.ac.th; Fax: 02-9972200-22 # 1408; Tel. 02-9972200-22 # 1008, 1428



ABSTRACT
The purpose of this research is to study the design and construction of an image archiving program for a Micro USB camera, written in MATLAB 7.2.0.0232 and based on the principles of image acquisition and image processing. The constructed image archiving program operates with 2 functions: 1) an image acquisition function, which connects to the image signal from the Micro USB camera via a capture card and archives both single frames (.BMP file format) and continuous frames (.AVI file format); and 2) an image processing function, which displays 2 channels and is based on adaptive interpolation techniques. Functional testing found that the program can connect to the image signal from the Micro USB camera via the capture card, archive both single frames (.BMP file format) and continuous frames (.AVI file format) without distortion of the image, display 2 channels and process images with adaptive interpolation techniques.

Keywords: Image Acquisition, Image Processing, Dermatology, Micro USB Camera.



1. INTRODUCTION
The inherent visual nature of dermatology makes it suitable for telemedicine. Several
teledermatology projects have recently been initiated in developing countries, and the number
is gradually increasing. Preliminary results underline a number of potential benefits to
patients, remote health care workers and health care systems of host countries. These benefits
include easy extension of specialized dermatological services to geographically remote areas
with few dermatologists, reduction of patients' waiting time for appointments, faster
screening for skin diseases, promotion and coordination of scientific health projects, and
education of health workers and lay people. Local physicians benefit from the mentoring and
educational aspects of the consultations, as well as the access to improved research facilities
and professional interactions. Consulting experts also get special opportunities to review rare
or unusual dermatological cases.
As in other telemedicine systems, teledermatology employs both store-and-forward
methods (asynchronous) and real-time approaches (synchronous). Both modalities have
previously been shown to be quite reliable and accurate when compared with traditional face-
to-face consultation. Store-and-forward systems are more widely used, owing to their lesser
technological requirements and affordability. Images are submitted by email or presented on a
web-based system. Although the real-time approach represents a reasonable substitute for in-person consultation and has the advantage of enhancing patient-doctor interaction, it is more time-consuming and expensive.
Teledermatology may involve providing assistance, follow-up or teaching. Tele-assistance
models aim at teleconsultation, telescreening and/or second opinion. The majority of
teledermatology projects in developing countries deal with dermatology consultations.
Telescreening projects have been used to manage waiting lists for treatment of dermatoses
G00094
ANSCSE14 Mae Fah Luang University, Chiang Rai, Thailand
March 23-26, 2010
647
with different healing times or to support prevention programmes such as those surveying
skin tumours. Telefollow-up systems deal with transmission of medical information regarding
follow-up and treatment progression of patients from remote centres (e.g. to follow up
patients treated for certain chronic skin conditions such as leg ulcers and leprosy) and for
postoperative evaluation. Tele-education is proving to be a versatile model, helpful in staff
development such as by tutoring and assessing medical and paramedical workers. Most
teledermatology collaborative projects also involve some degree of tele-education in addition
to tele-assistance. Thus, in addition to long-distance consultation, they also provide
continuing medical education (CME) for physicians who submit cases. Applications for tele-
education mainly integrate text and images (static or dynamic) and/or virtual reality models to
achieve health education.
The use of web applications for discussion forums represents another application of
teledermatology. The main objective of such applications is to create a quick and easy method
for teleconsultation from a pool of expert consultants. The philosophy behind these
DermOnline communities is open access teleconsultation in dermatology, which means that
these platforms are free to all users and that the users themselves generate the content by
sending and answering the teleconsultations. These communities have moderators who check
both the subscribers and the content of the requests in order to guarantee friendly and orderly
virtual interaction.


Figure 1. Teledermatology project data flow

2. THEORY AND RELATED WORKS
This research collects the imaging data following the basic steps required to create an image acquisition application, by implementing a simple motion detection application. The application detects movement in a scene by performing a pixel-to-pixel comparison in pairs of incoming image frames. If nothing moves in the scene, pixel values remain the same in each frame. When something moves in the image, the application displays the pixels that have changed values. To use the Image Acquisition Toolbox to acquire image data, the following basic steps are performed.
Step 1: Install your image acquisition device. Generic Windows image acquisition devices, such as webcams and digital video camcorders, typically do not require the installation of a frame grabber board; you connect these devices directly to your computer via a USB or FireWire port. After installing and configuring your image acquisition hardware, start MATLAB; you do not need to perform any special configuration of MATLAB for image acquisition.
Step 2: Retrieve hardware information. In this step, you get several pieces of information that the toolbox needs to uniquely identify the image acquisition device you want to access. You use this information when you create an image acquisition object, described in Step 3.
Step 3: Create a video input object. In this step you create the video input object that the toolbox uses to represent the connection between MATLAB and the image acquisition device. Using the properties of a video input object, you can control many aspects of the image acquisition process (for more information, see Connecting to Hardware).
Step 4: Preview the video stream (optional). After you create the video input object, MATLAB is able to access the image acquisition device and is ready to acquire data. Before you begin, you might want to preview the video stream to make sure the image is satisfactory; for example, you might want to change the position of the camera, change the lighting, correct the focus, or make some other change to the setup.
Step 5: Configure object properties (optional). After creating the video input object and previewing the video stream, you might want to modify characteristics of the image or other aspects of the acquisition process by setting the values of image acquisition object properties.
Step 6: Acquire image data. After you create the video input object and configure its properties, you can acquire data; this is typically the core of any image acquisition application.
Step 7: Clean up. When you finish using your image acquisition objects, you can remove them from memory and clear the MATLAB workspace of the variables associated with these objects.
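
The same acquire-and-compare loop can be sketched outside MATLAB. Below is a minimal Python analogue using OpenCV (cv2); the device index 0 and the difference threshold of 25 are illustrative assumptions.

import cv2

cap = cv2.VideoCapture(0)                    # open the first camera device
ok, prev = cap.read()                        # grab an initial frame
prev = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Pixel-to-pixel comparison of consecutive frames: pixels whose
    # values changed indicate motion in the scene.
    diff = cv2.absdiff(gray, prev)
    _, motion = cv2.threshold(diff, 25, 255, cv2.THRESH_BINARY)
    cv2.imshow("motion", motion)
    prev = gray
    if cv2.waitKey(1) & 0xFF == ord("q"):    # press q to quit
        break

cap.release()
cv2.destroyAllWindows()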
Image interpolation works in two directions and tries to achieve the best approximation of a pixel's color and intensity based on the values at surrounding pixels. The following example illustrates how resizing/enlargement works:












Figure 2. Example illustrating how resizing/enlargement works.

Adaptive interpolators may or may not produce the above artifacts; however, they can also induce non-image textures or strange pixels at small scales:


(A) (B)
Figure 3. (A) Original Image with Small-Scale Textures and (B) Crop Enlarged 220%

On the other hand, some of these "artifacts" from adaptive interpolators may also be seen
as benefits. Since the eye expects to see detail down to the smallest scales in fine-textured
areas such as foliage, these patterns have been argued to trick the eye from a distance (for
some subject matter).


3. EXPERIMENTAL or COMPUTATIONAL DETAILS
The Image Acquisition and Image Processing Program for Dermatology Camera was implemented in MATLAB 7.2.0.0232 to perform image acquisition and image processing on pictures taken from the Micro USB camera; its functional diagram is shown in Figure 4.


Figure 4. The functional for Image Acquisition and Image Processing Program for
Dermatology Camera.


The steps of the Image Acquisition and Image Processing Program for Dermatology Camera are shown in Figure 5.



Figure 5. The steps of Image Acquisition and Image Processing Program for Dermatology
Camera.







4. RESULTS AND DISCUSSION
The constructed image archiving program for the Micro USB camera operates with 2 functions: 1) an image acquisition function, which connects to the image signal from the Micro USB camera via a capture card and archives both single frames (.BMP file format) and continuous frames (.AVI file format); and 2) an image processing function, which displays 2 channels and is based on adaptive interpolation techniques.






Figure 6. The image acquisition function connected to the image signal from the Micro USB camera.





Figure 7. Resulting continuous frame (.AVI file format) from the Micro USB camera.



5. CONCLUSION
Functional testing found that the program can connect to the image signal from the Micro USB camera via the capture card, archive both single frames (.BMP file format) and continuous frames (.AVI file format) without distortion of the image, display 2 channels and process images with adaptive interpolation techniques.



REFERENCES
1. Somrit Unai, Pisan Khantang, Udon Junthorn, Waipot Ngamsaad, Narin Nattavut, Wanapong Triampo, Chartchai Krittanai, Single Particle Tracking: Application to Study MinD Protein Oscillation in Live Escherichia coli, 33rd Congress on Science and Technology of Thailand, p. 333, 2007.
2. Richard Wootton, Nivritti G. Patil, Richard E. Scott, Kendall Ho, Telehealth in the Developing World, Royal Society of Medicine Press Ltd., 1 Wimpole Street, London W1G 0AE, UK, 2009.
3. The MathWorks, Inc., Image Acquisition Toolbox for Use with MATLAB, The MathWorks Inc., 1999.


ACKNOWLEDGMENTS
We would like to express our appreciation to all of the instructors and friends at the Department of Industrial Physics and Medical Instrumentation, Faculty of Applied Science, King Mongkut's University of Technology North Bangkok, Thailand.

G00095
Image Acquisition and Image Processing Program for Firearms and Toolmarks Comparison in Forensic Science

P. Suapang^{1,C}, C. Prasitsathapron^{2} and S. Janpuk^{3}
1,2,3 Biomedical Instrumentation Program, Department of Physics, Faculty of Science, Rangsit University, 52/347, Lakhok, Muang, Pathumthani, 12000, Thailand
C Biomedical Instrumentation Program, Department of Physics, Faculty of Science, Rangsit University, 52/347, Lakhok, Muang, Pathumthani, 12000, Thailand
E-mail: piyamas@rsu.ac.th; Fax: 02-9972200-22 # 1408; Tel. 02-9972200-22 # 1008, 1428



ABSTRACT
Applying image processing to firearms and toolmarks comparison in forensic science is desirable because it decreases the time consumed, decreases the cost of the inspection process and improves the inspection quality. This paper presents a study on the design and construction of an archiving program for a Comparison Macroscope, written in Borland C++ and based on the principles of image acquisition and image processing. The constructed image archiving program operates with 2 functions: 1) an image acquisition function, which connects to the 2-channel image signal from the Comparison Macroscope via a capture card and archives both single frames (.BMP file format) and continuous frames (.AVI file format); and 2) an image processing function, which displays 2 channels and is based on interpolation, rotation, shift, mirror and flip techniques. Functional testing found that the program can connect to the image signal from the Comparison Macroscope via the capture card, archive 2 channels of both single frames (.BMP file format) and continuous frames (.AVI file format) without distortion of the image, display 2 channels and process images with interpolation, rotation, shift, mirror and flip techniques.

Keywords: Comparison Macroscope, firearms and toolmarks comparison, image acquisition and image processing.



1. INTRODUCTION
The Firearms-Toolmarks Section in forensic science receives and examines evidence related to firearms, ammunition, tools and toolmarks involved in the commission of a crime. Frequently received items include rifles, pistols, cartridges, bullets, crowbars, knives and doorknobs. This section performs the following types of examinations: firearms function/test fire, microscopic comparisons, serial number restorations, distance determinations and the National Integrated Ballistic Information Network (NIBIN). Applying image processing to firearms and toolmarks comparison in forensic science is desirable because it decreases the time consumed, decreases the cost of the inspection process and improves the inspection quality.



2. THEORY AND RELATED WORKS
This research collects the imaging data by selecting data from the video signals sent to the monitor console, using a video capture card (model: Life View, Chip BT 8.7) processed through the AVICAP Windows class. These functions serve as the interface between the application and the device driver, controlling the image receiver to collect the image signal as single Windows bitmap files (.BMP) and continuous frames (.AVI file format).



Figure 1. Collecting the imaging data by using video capture card.

The images are then processed with interpolation, rotation, shift, mirror and flip techniques. Image interpolation works in two directions and tries to achieve the best approximation of a pixel's color and intensity based on the values at surrounding pixels. The following example illustrates how resizing/enlargement works:












Figure 2. Example illustrating how resizing/enlargement works.
The rotation operator performs a geometric transform which maps the position (x_1, y_1) of a picture element in an input image onto a position (x_2, y_2) in an output image by rotating it through a user-specified angle \theta about an origin O. In most implementations, output locations (x_2, y_2) which are outside the boundary of the image are ignored. Rotation is most commonly used to improve the visual appearance of an image, although it can be useful as a preprocessor in applications where directional operators are involved. Rotation is a special case of affine transformation. The rotation operator performs a transformation of the form:

x_2 = \cos(\theta) \cdot (x_1 - x_0) - \sin(\theta) \cdot (y_1 - y_0) + x_0
y_2 = \sin(\theta) \cdot (x_1 - x_0) + \cos(\theta) \cdot (y_1 - y_0) + y_0        (1)

where (x_0, y_0) are the coordinates of the center of rotation (in the input image) and \theta is the angle of rotation, with clockwise rotations having positive angles. (Note here that we are working in image coordinates, so the y axis goes downward; a similar rotation formula can be defined for when the y axis goes upward.) Even more than the translate operator, the rotation operation produces output locations (x_2, y_2) which do not fit within the boundaries of the image (as defined by the dimensions of the original input image). In such cases, destination elements which have been mapped outside the image are ignored by most implementations. Pixel locations out of which an image has been rotated are usually filled in with black pixels.
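
Rotation about the image center, with black fill for the exposed pixels as described above, can be sketched in Python with SciPy (a minimal example; the test image and angle are made up).

import numpy as np
from scipy import ndimage

image = np.zeros((100, 100))
image[40:60, 20:80] = 255                  # a bright bar to rotate

rotated = ndimage.rotate(image, angle=30,  # rotate by 30 degrees
                         reshape=False,    # keep the original dimensions
                         cval=0)           # fill exposed pixels with black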
The mirror and flip techniques are based on the transpose: the object obtained by replacing all elements a_{ij} with a_{ji}. For a second-rank tensor a_{ij}, the tensor transpose is simply a_{ji}. The matrix transpose, most commonly written A^T, is the matrix obtained by exchanging A's rows and columns, and it satisfies the identity

(A^T)^{-1} = (A^{-1})^T        (2)
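
In code, the mirror, flip and transpose operations reduce to simple array reversals; below is a minimal NumPy sketch of the techniques described above.

import numpy as np

image = np.arange(12).reshape(3, 4)   # a small test array

mirrored   = np.fliplr(image)         # mirror: reverse columns (left-right)
flipped    = np.flipud(image)         # flip: reverse rows (up-down)
transposed = image.T                  # transpose: swap a_ij with a_ji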





3. EXPERIMENTAL or COMPUTATIONAL DETAILS
The Image Acquisition and Image Processing Program for Firearms and Toolmarks Comparison in Forensic Science was implemented in Borland C++ to perform image acquisition and image processing on pictures taken by the Comparison Macroscope; its functional diagram is shown in Figure 3.


Figure 3. Functional diagram of the Image Acquisition and Image Processing Program for Firearms and Toolmarks Comparison in Forensic Science.

The steps of the Image Acquisition and Image Processing Program for Firearms and Toolmarks Comparison in Forensic Science are shown in Figure 4.
G00095
ANSCSE14 Mae Fah Luang University, Chiang Rai, Thailand
March 23-26, 2010
655



Figure 4. The steps of the Image Acquisition and Image Processing Program for Firearms and Toolmarks Comparison in Forensic Science (each channel passes through image processing operations: zoom, rotate, shift).


4. RESULTS AND DISCUSSION
The constructed image archiving program for the Comparison Macroscope operates with 2 functions: 1) an image acquisition function, which connects to the 2-channel image signal from the Comparison Macroscope via a capture card (as shown in Figure 5) and archives both single frames (.BMP file format) and continuous frames (.AVI file format) (as shown in Figure 6); and 2) an image processing function, which displays 2 channels and is based on interpolation, rotation, shift, mirror and flip techniques. The modified algorithms are applied to the original images shown in Figures 7, 8 and 9.


Figure 5. The image acquisition function connected to the 2-channel image signal from the Comparison Macroscope.



Figure 6. Resulting single-frame archive in .BMP file format from the Comparison Macroscope.



Figure 7. The image processing function displaying 2 channels from the Comparison Macroscope, based on interpolation techniques.




Figure 8. The image processing function displaying 2 channels from the Comparison Macroscope, based on rotation techniques.


(A) (B) (C)
Figure 9. The image processing function displaying 2 channels from the Comparison Macroscope, based on shift techniques.
(A) Breech face and firing pin marks on a cartridge
(B) Bullet identification
(C) Toolmark identification


5. CONCLUSION
Functional testing found that the program can connect to the image signal from the Comparison Macroscope via the capture card, archive 2 channels of both single frames (.BMP file format) and continuous frames (.AVI file format) without distortion of the image, display 2 channels and process images with interpolation, rotation, shift, mirror and flip techniques.



REFERENCES
1. Alaknanda, R.S. Anand, Pradeep Kumar, NDT&E International, 15, 2005.
2. Gang Wang, T. Warren Liao, NDT&E International, 35, 519-528, 2002.
3. Gonzales, R.C., Woods, R.E., Digital Image Processing, 2000.
4. H.I. Shafeeka, E.S. Gadelmawla, NDT&E International, 37, 291-299, 2004.
5. H.I. Shafeeka, E.S. Gadelmawla, NDT&E International, 37, 301-307, 2004.
6. Randy Crane, Simplified Approach to Image Processing, 1997.
7. Piyamas Suapang, Surapun Yimmum, Kobchai Dejhan, Medical Image Compression and DICOM-Format Image Archive, ICROS-SICE International Joint Conference 2009 (ICCAS-SICE 2009), 2009.



ACKNOWLEDGMENTS
We would like to express our appreciation to all of the instructors and friends at the Department of Industrial Physics and Medical Instrumentation, Faculty of Applied Science, King Mongkut's University of Technology North Bangkok, Thailand.

G00110
Subgraph Isomorphism Search for Network Motif Mining

J. Khiripet, W. Khantuwan and N. Khiripet^{C}
Knowledge Elicitation and Archiving Laboratory, National Electronics and Computer Technology Center, 112 Paholyothin Rd., Klong 1, Klong Luang, Patumthani, 12120, Thailand
C E-mail: noppadon.khiripet@nectec.or.th; Fax: 02-5646772; Tel. 02-5646900



ABSTRACT
Network motifs are understandable patterns of connections that serve as building blocks of a network; they appear much more frequently in the given network than expected by chance alone. However, enumerating all subnetworks within a large graph in order to find specific patterns is practically infeasible. The set of interesting motifs in biomolecular networks, such as autoregulation and feed-forward loops, is much smaller than the set of frequent subgraphs, thus providing a large opportunity for pruning. We present a fast, content-specific backtracking subgraph isomorphism approach for reducing the number of candidate patterns considered in the search space. Experimental results using the E. coli transcriptional network validate the efficiency and utility of the proposed technique.

Keywords: Subgraph Isomorphism, Network Motif, Graph Mining



1. INTRODUCTION
Biology is astoundingly complex. Cells contain networks of thousands of biochemical interactions: protein-protein interactions (PPI), metabolic reactions and signal transduction. For a long time, researchers have attempted to discover generalized principles governing these complexities. Despite the efforts, the search is still at an early stage and far from complete [1]. Advances in experimental technology provide insight into biological circuitry; this has led to a new research area called systems biology. Evolution may work slowly, but it certainly defines a set of circuit elements that obey general design principles.
The circuit elements are patterns that recur within a network much more often than expected by chance, also known as network motifs. Network motifs can be represented as subgraphs in the network and are also studied in ecology and other fields besides biology. They are the building blocks from which the network is composed and are beneficial to the organism. The dynamics of network motifs in living cells indicate that they have characteristic functions. The functions associated with common network motifs have recently been explored both theoretically and experimentally.
The increasing amount of available data creates the need for automated analysis methods to better understand these interaction mechanisms [2]. Comparative approaches are often employed to help improve accuracy, discover complex pathways and understand the underlying network mechanisms. Network alignments provide the means to study conserved network topology such as common pathways and network motifs [3]. However, the comparison of networks is computationally intensive and can easily lead to NP-hard problems.
The remainder of this paper is organized as follows. Section 2 describes background and related work on network motifs and our approach to their computation, especially the maximum common subgraph. Section 3 demonstrates the utility on a selected prokaryotic metabolic network. Finally, Section 4 concludes our findings along with future research directions.



2. THEORY AND RELATED WORKS
Many cellular networks, including protein interaction networks, have been shown to have a modular organization [4]. The function of a network is closely related to its topology and topological patterns. Decomposition of a large network into relatively independent subnetworks is a major approach to dealing with the complexity of large cellular networks. In cellular networks, a module is a group of biomolecules connected together to achieve a desirable function; revealing the structure of a module is therefore helpful for understanding biochemical processes and signal pathways.
To investigate the modularity of interaction networks, many computational methods have been proposed and developed [5]. The objective of these methods is to detect functional modules in molecular interaction networks based on topological features. The challenge is that the module is not a well-defined structure; moreover, modules often share nodes, links and even functions with other modules [4].
Motifs are subunits of a complex network and are considered basic building blocks of many real networks; a motif is often much smaller than a module. Motifs are patterns that occur more often than in randomized networks. Therefore, to detect network motifs, one can compare the network to an Erdos-Renyi (ER) model [6] with the same number of nodes and edges. The most common network motifs of interest are auto-regulation (AR) and feed-forward loops (FFL), as shown in Figure 1.





Figure 1. (a) The auto-regulation motif. (b) The feed-forward loop motif.

The auto-regulation (AR) motif is the simplest and most abundant network motif. For example, in the E. coli transcription network, a transcription factor (TF) represses its own transcription. The function of AR is to speed up the response to a signal or to increase the stability of the gene product concentration against stochastic noise [7].
The feed-forward loop (FFL) motif consists of three genes and three regulatory interactions. The three interactions can be positive or negative, so there are eight possible types of FFL motifs [8]. In most cases, the FFL is either an AND gate (both X and Y are required for Z to activate) or an OR gate (either X or Y is sufficient for Z to activate).
Basic steps of motif analysis usually involve enumerating all subnetworks of a given size in a network [9]. For large networks, this approach is practically infeasible. Detecting motifs can also be done by means of subgraph isomorphism [10], but this faces the same problem: subgraph isomorphism is a generalization of both maximum clique detection on the modular product graph and testing whether a graph contains a Hamiltonian cycle, and is therefore NP-complete. Moreover, the modular product graph of a large network requires huge storage space.


3. COMPUTATIONAL DETAILS
While detecting the AR motif is straightforward, detecting other network motifs in general, where the pattern of the motif is complex, poses a challenge in both the design and the implementation of the algorithm. The algorithm must not hard-code the motif pattern, because unforeseen patterns may need to be investigated in the future; network motifs should be represented as a library of templates, which can be modified at will without touching the code. We propose a subgraph isomorphism-based approach here that eliminates both the modular product graph construction and the maximum clique detection step. Therefore, the space requirement is dramatically reduced and the result is guaranteed within acceptable time. This approach is illustrated in Figure 2.



t = {}, m = {}
Motif_match(G, N, m, t):
    if upperBound(G, N, m) < sizeof(N):
        return
    while true:
        v1 <- next vertex of G - t
        t <- t + {v1}
        if v1 == none:
            updateCandidate(m)
            return
        for v2 in N - m:
            if compatible(v1, v2):
                m <- m + {v1 : v2}
                Motif_match(G, N, m, t)

Figure 2. Pseudo code for network motif detection



Initially, the list of vertices t and the partial solution m are empty. The inputs are the network G, the motif to be detected N, the list t and the partial solution m. The list t excludes vertices already considered in the search step. upperBound estimates whether the search space can still yield a result based on the current partial candidate; updateCandidate updates the global solution; compatible tests whether v1 from G and v2 from N match each other. This recursive algorithm yields all subgraphs in the network G that match the motif N once the entire search space has been explored.
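
A runnable rendering of this backtracking search is sketched below in Python, for directed graphs given as adjacency sets. It is a simplified version of Figure 2: the upper-bound pruning is omitted, and the function and variable names are illustrative, not the authors' Perl code.

def find_motifs(G, N):
    # Yield all injective mappings {motif node -> network node} such that
    # every edge of the motif N maps onto an edge of the network G.
    motif_nodes = list(N)
    results = []

    def compatible(m, u, v):
        # v (in G) can play the role of u (in N) if every edge between u
        # and an already-mapped motif node is realized by an edge of G.
        for u2, v2 in m.items():
            if u2 in N[u] and v2 not in G[v]:
                return False
            if u in N[u2] and v not in G[v2]:
                return False
        return True

    def backtrack(m):
        if len(m) == len(motif_nodes):      # complete match found
            results.append(dict(m))
            return
        u = motif_nodes[len(m)]             # next motif vertex to map
        for v in G:
            if v not in m.values() and compatible(m, u, v):
                m[u] = v
                backtrack(m)
                del m[u]                    # undo and try the next vertex

    backtrack({})
    return results

# Example: feed-forward loops (X->Y, X->Z, Y->Z) in a tiny network.
G = {"a": {"b", "c"}, "b": {"c"}, "c": set()}
FFL = {"X": {"Y", "Z"}, "Y": {"Z"}, "Z": set()}
print(find_motifs(G, FFL))    # [{'X': 'a', 'Y': 'b', 'Z': 'c'}]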


4. RESULTS AND DISCUSSION
We used the E. coli transcription interaction database from Mangan et al., 2006 [11] and implemented the proposed algorithm in Perl. The result of the program is shown in Figure 3.









Figure 3. Auto-regulation and feed-forward loop motifs found in the E. coli transcription network.

The running time for detecting each type of motif is acceptable for normal usage. We used Perl because of its native support for advanced data structures, such as hashes, and other features such as regular expressions. Implementing the algorithm in C could speed up the running time dramatically, but we would then need to deal with lower-level operations and the program would be harder to write.


5. CONCLUSION
This paper presents our approach to finding network motifs using subgraph isomorphism rather than enumeration. One might ask how the query can be started if we do not know what types of motifs are in the network. We argue that the traditional method of motif finding, enumeration, is suitable for first-hand discovery; however, when specific motifs must be found in other networks, the subgraph isomorphism approach is more appropriate and has better performance. This method can be extended to work with larger network motifs or modules in the future.


REFERENCES
1. Alon, U., An Introduction to Systems Biology: Design Principles of Biological Circuits, Chapman & Hall/CRC, Boca Raton, 2006.
2. Ideker, S.R., Nat Biotechnol, 2006, 24(4), 427-433.
3. Klau, G.W., BMC Bioinformatics, 2009, 10(S59), 1-9.
4. Bateman, A., Coin, L., et al., Nucl. Acids Res., 2004, 32, D138-D141.
5. Albert, R., and Barabasi, A.L., Rev Mod Phys, 2002, 74, 47-97.
6. Erdos, P. and Renyi, A., Publ. Math., 1959, 6, 290.
7. Becskei, A. and Serrano, L., Nature, 2000, 405(6786), 590-3.
8. Mangan, S. and Alon, U., Proc Natl Acad Sci, 2003, 100(21), 11980-5.
9. Aittokallio, T., and Schwikowski, B., Brief. Bioinform., 2006, 7, 243-255.
10. Cook, S.A., Proc. 3rd ACM Symposium on Theory of Computing, 1971, 151-158.
11. Mangan, S., Itzkovitz, S., et al., J. Mol. Biol., 2006, 356, 1073-1081.
G00114
On Applying Simple Data Compression to
Wireless Sensor Networks

Phayong Sornsiriaphilux^{1,C}, Dusit Thanapatay^{2,C}, Kamol Kaemarungsi^{3} and Kiyomichi Araki^{4}
1 TAIST Tokyo Tech, ICTES Program, Department of Electrical Engineering, Kasetsart University, Thailand
2 Department of Electrical Engineering, Kasetsart University, Thailand
3 National Electronics and Computer Technology Center (NECTEC), Thailand
4 Department of Electrical and Electronic Engineering, Tokyo Institute of Technology, Japan
C Department of Electrical Engineering, Kasetsart University, Thailand; Fax: 02-9428555 Ext. 1550; Tel. 081-0056300
E-mail: 1) phayong.s@gmail.com, 2) fengdus@ku.ac.th, 3) kamol.kaemarungsi@nectec.or.th, 4) araki.k.aa@m.titech.ac.jp



ABSTRACT
Resource constrains such as memory and energy are important issues when
implementing wireless sensor networks. Especially, efficient memory utilization is
required when we utilize wireless sensor networks with environmental monitoring
systems which have various kinds of sensors. In this work, we demonstrate that
applying simple data compression in wireless sensor networks can alleviate both
memory and energy constraints. The performance of data compression is determined in
term of data compression ratio. The results from real world implementation of simple
data compression in wireless sensor networks and its implication to memory and
energy concerns are reported in this work.

Keywords: Data compression, Wireless sensor networks.



1. INTRODUCTION
As in other sensor networks, the data acquired in a wireless sensor network (WSN) may require a large virtual database [1]. For instance, environmental monitoring systems that utilize wireless sensor networks usually consist of several sensors, often of various kinds, and might need a large memory space for collecting all the data acquired from the equipped sensors. Moreover, the transmission of data over the wireless channel at the node level consumes more energy than data processing within the node [2]. Therefore, minimizing such transmissions by applying a data compression algorithm in wireless sensor nodes before sending data over the air is one of the important strategies for energy-efficient WSNs. For example, in [3] the authors identified and evaluated simple compression techniques that reduce the amount of data prior to transmission in order to save energy. In addition, the memory requirement at the base station node can be reduced when data compression is utilized in a wireless sensor network. However, the chosen algorithm should be simple, of low complexity, and applicable to wireless sensor networks, which usually have many resource constraints.
Recently, the authors in [2] surveyed a number of data compression schemes for WSNs,
such as coding by ordering [4], pipelined in-network compression [5], low-complexity video
compression [6] and distributed compression [7]. Each of these schemes was designed for a
particular WSN application, so they may not be applicable to all applications that utilize
wireless sensor networks. Nevertheless, the authors in [8] proposed a simple data compression
algorithm for WSNs in which energy, memory and computational resources are very limited.
Their algorithm considers only the differences between newly acquired data and previously
acquired data; however, it does not work well for multiple types of data with different
characteristics. We observe that, in practice, different sensors often report data with different
standard deviations, depending on the idiosyncrasies of the particular sensor. In our previous
work, we modified the algorithm in [8] to support multiple types of data by changing the
Huffman variable-length code to a Fixed Index code, as described in [9]; the simulation results
of that work support our observation. In this work, we validate those simulation results by
applying the algorithm in [9] to actual hardware.
In this work, we implemented the modified algorithm of [9] on a wireless sensor node
prototype to study the effect of data compression on memory and energy consumption. For the
base station node, we also implemented a simple recompression technique that further reduces
the size of all received compressed data, and thereby the memory storage space required at the
base station node.
This article is organized as follows. Section 2 explains the background and the details of
the simple data compression algorithm for wireless sensor networks that we implemented in
this work. Section 3 describes the experimental setup of our WSN hardware, used to study the
behavior and performance of the simple data compression; it is separated into two parts
covering the algorithms for the wireless sensor node and the base station node. Experimental
results and the performance analysis are reported in Section 4, where the implications of
simple data compression for memory and energy consumption are discussed. Finally,
Section 5 concludes the paper.



2. SIMPLE DATA COMPRESSION ALGORITHM IN WIRELESS
SENSOR NETWORKS
This section divides the explanation into two parts. The first is the simple data compression
algorithm for the wireless sensor node, which compresses acquired data at the node and
minimizes the number of bits required for data transmission over the air. The second is a
simple recompression technique for the base station node, used to further reduce the memory
storage space required for all data acquired from the network.
The simple data compression algorithm used at the wireless sensor nodes is the algorithm
described in [9]. Note that it is modified from the original algorithm in [8]: we changed part of
the compression scheme from a Huffman variable-length code to a Fixed Index, as shown in
the first column of Table 1, in order to support sensory data that may have a large standard
deviation. We implemented this algorithm in our wireless sensor node prototype. The
algorithm considers only the differences between newly acquired data and previously
acquired data. The compression concept is to convert the difference between acquired data
into a set of low-order bits of its 1's complement and concatenate those low-order bits to an
appropriate index to generate the compressed data, as described in [9]; a minimal sketch
follows.
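A minimal Python sketch of this rule (see Table 1 below), assuming the convention of
[8,9] that negative differences are encoded by the low-order bits of their 1's complement
and positive differences by their plain binary form:

    def compress_sample(new, prev):
        """Encode the difference new - prev as a 4-bit index plus low-order bits."""
        d = new - prev
        if d == 0:
            return "0000"                            # index 0000 carries no extra bits
        n = abs(d).bit_length()                      # number of low-order bits needed
        index = format(n, "04b")                     # the 4-bit Fixed Index
        if d < 0:
            bits = format(d + (1 << n) - 1, "0%db" % n)  # 1's-complement low-order bits
        else:
            bits = format(d, "0%db" % n)
        return index + bits

For example, compress_sample(10, 12) gives "001001" (index 0010, suffix 01 for a
difference of -2), matching the rule in Table 1.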
For the base station node, we implemented a simple recompression technique to reduce the
memory storage space required for collected data. In this work, four wireless sensor nodes
acquire data from their equipped sensors and forward them to a base station node. The base
station node receives a data item from each sensor node as a tuple of two values, the node ID
and the compressed data, denoted <NodeID, Measurement>. A compressed group of items is
also a tuple, but with a common NodeID followed by the measurement data of each sensor
node. In the common NodeID, we use only one bit per sensor node to indicate that the base
station received its data, so the size of the common NodeID depends on the number of sensor
nodes within the base station node's communication range. A tuple for a compressed group of
items is thus <n-bit common NodeID, Measurement_1, Measurement_2, ..., Measurement_n-1,
Measurement_n>. This simple technique is applied to the base station node prototype in our
experiment for recompressing received compressed data; a sketch follows.
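A minimal sketch of this recompression step, assuming a bitmap common NodeID with
one bit per child node (an assumed encoding consistent with the description above):

    def recompress(items, num_nodes=4):
        """items: list of (node_id, payload_bits) tuples, node_id in 0..num_nodes-1."""
        bitmap = 0
        payloads = []
        for node_id, bits in sorted(items):
            bitmap |= 1 << node_id                   # mark each node that reported data
            payloads.append(bits)
        header = format(bitmap, "0%db" % num_nodes)  # the n-bit common NodeID
        return header + "".join(payloads)            # <NodeID bits, Measurement_1, ...>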

Table 1. Compression rule of the wireless sensor node's simple data compression.

Index  Difference in acquired data              Low-order bits of the 1's complement
0000   0                                        (none)
0001   -1, +1                                   0, 1
0010   -3, -2, +2, +3                           00, 01, 10, 11
0011   -7, ..., -4, +4, ..., +7                 000, ..., 011, 100, ..., 111
0100   -15, ..., -8, +8, ..., +15               :
0101   -31, ..., -16, +16, ..., +31             :
0110   -63, ..., -32, +32, ..., +63             :
0111   -127, ..., -64, +64, ..., +127           :
1000   -255, ..., -128, +128, ..., +255         :
1001   -511, ..., -256, +256, ..., +511         :
1010   -1023, ..., -512, +512, ..., +1023       :
1011   -2047, ..., -1024, +1024, ..., +2047     :
1100   -4095, ..., -2048, +2048, ..., +4095     000000000000, ..., 011111111111, 100000000000, ..., 111111111111



3. EXPERIMENTAL SETUP
3.1 Hardware description
In our real-world implementation, we applied simple data compression to relative humidity
and temperature data acquired from the Sensirion SHT15 [13], a device that measures both
relative humidity and temperature in one package. The sensor was attached to a prototype
microcontroller board with a 16-bit microcontroller and a single-chip UHF (ultra-high-
frequency) RF (radio-frequency) transceiver; this platform is used as the wireless sensor node.
An MSP430F169 [14] is the microcontroller on the board, controlling the RF transceiver and
managing the sensor node process. A CC1100 [15] single-chip UHF RF transceiver provides
the wireless communication between the wireless sensor nodes and the base station node. The
wireless communication frequency in this experiment was set to 315 MHz with 0 dBm output
power.
At the base station node, a CC1100 transceiver module is attached to another prototype
microcontroller board with a 32-bit microcontroller and a built-in SD-card memory storage
device. An STM32F103 ARM Cortex-M3 [16] microcontroller controls the base station node
process. At this node we collected both the compressed temperature and the compressed
relative humidity data from the wireless sensor nodes. The data stored on the SD card are the
data to which we applied the simple recompression technique described in Section 2. The
acquired data were recorded on the SD card using the FAT16 file system [17] to manage the
data storage space. We used the real-time clock (RTC) provided by the ARM Cortex-M3 as a
timer to manage the sampling interval.



Figure 1. Hardware measurement experiment setup: four wireless sensor nodes
(each with an SHT15 sensor) and a base station node in the middle.
In this experiment, we deployed one base station node and four wireless sensor nodes to set
up a wireless sensor network, interconnected in a star topology. Each wireless sensor node was
installed approximately two meters from the base station node, as shown in Figure 1. Using
these hardware prototypes, we collected data from the sensors every minute for one day
(24 x 60 = 1,440 samples) to study the effect of data compression on memory and energy
consumption.
3.2 Sensor node process
On each wireless sensor node, after the microcontroller is powered on, we initialize the
CC1100 RF transceiver and set up starting values of the temperature and humidity data in
memory. These two values are used to calculate the data difference at the first measurement,
so the sensor node can send only a few bytes of data to the base station node from its very first
measurement. Note that the acquired sensor data in this work are the raw integer readings
from the SHT15. After initialization, we put the CC1100 into receiver mode to wait for
commands from the base station node, then enable the interrupt system and enter low-power
mode to minimize power consumption. On an interrupt from the CC1100, the microcontroller
wakes up and checks the received data from the CC1100 via SPI (Serial Peripheral Interface)
communication. If the received data is a command from the base station node, the sensor node
reads a measurement from the SHT15 and waits for its available time slot on the wireless
channel to send the compressed data back to the base station node, using the simple data
compression described in Section 2. We avoid packet collisions with a TDMA (time-division
multiple access) technique: the available time slot on the wireless channel is managed by
assigning a different fixed delay time to each wireless sensor node. The data sent back to the
base station node are inserted into the packet format shown in Table 2, where each field is one
byte; note that only one command is used in this experiment. The compressed data consist of
the compressed temperature data concatenated with the compressed humidity data. If the
sensor node receives other data from the base station or its neighbors, it puts the CC1100 back
into receiver mode and enters low-power mode again. A sketch of this node loop follows.
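A minimal Python sketch of the node loop, assuming the fixed per-node TDMA delay
described above; radio and read_sensor are illustrative callables (not this platform's
API), SLOT_SECONDS is a hypothetical slot width, and compress_sample is the sketch
from Section 2.

    import time

    SLOT_SECONDS = 0.05                         # hypothetical per-node slot width

    def node_loop(node_id, radio, read_sensor, prev):
        while True:
            cmd = radio.receive()               # low-power wait for a base-station command
            if cmd != "REQUEST_DATA":           # only one command is used in this experiment
                continue
            temp, hum = read_sensor()           # raw integer SHT15 readings
            payload = (compress_sample(temp, prev["t"])
                       + compress_sample(hum, prev["h"]))
            prev["t"], prev["h"] = temp, hum
            time.sleep(node_id * SLOT_SECONDS)  # a fixed delay gives each node its own slot
            radio.send(payload)                 # compressed temperature + humidity bits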

Table 2. Packet and data formats (each field is one byte).

Packet format

With data compression:
  Slave node Tx data:        Packet Length | Source Address or Node ID | Data_A0 | Data_A1 | Data_B0 | Data_B1
  Base station node Tx data: Packet Length | Source Address or Node ID | Command

Without data compression:
  Slave node Tx data:        Packet Length (variable) | Source Address or Node ID | Data_C0 | Data_C1 (optional) | Data_C2 (optional) | Data_C3 (optional)
  Base station node Tx data: Packet Length | Source Address or Node ID | Command

Data format at the master node

Without data compression:
  Node ID 1 | Data A0 | Data A1 | Data B0 | Data B1 | Node ID 2 | Data A0 | Data A1 | Data B0 | Data B1 | Node ID 3 | Data A0 | Data A1 | Data B0 | Data B1 | ...

With data compression:
  List of node IDs | Node 1 data (variable length: maximum 4 bytes, bytes 0-3) | Node 2 data (variable length: maximum 4 bytes, bytes 0-3) | Node 3 data (variable length: maximum 4 bytes, bytes 0-3) | ...

3.3 Base station node process
The base station node process starts by setting up the RTC, which manages the sampling
interval of data acquisition. We then initialize the CC1100 RF transceiver, using a simple
broadcasting technique. Whenever a minute has passed, we broadcast a request-data command
to all sensor nodes, set the CC1100 to receiver mode and enter low-power mode to wait for the
requested data. After the base station node has received data from its children, we apply the
simple recompression technique described in Section 2 to the received data to minimize the
memory storage space. Finally, the recompressed data are saved to the SD card using the
FAT16 file system to manage the data storage space. Table 2 shows the packet and data
formats used in this experiment. We also designed packet and data formats for uncompressed
data, which were used for comparisons to show the performance of our simple data
compression.



4. EXPERIMENTAL RESULTS AND DISCUSSION
In this section we point out the advantage of applying simple data compression in wireless
sensor networks, namely that it can alleviate both memory and energy constraints. Applying
simple data compression at the wireless sensor nodes reduces the total amount of temperature
and relative humidity data acquired in our experiment. This in turn reduces the memory
requirement at the base station node, which, as the collecting node, needs less memory storage
space than without data compression. Table 3 shows the total temperature and relative
humidity data collected every minute for one day in our experiment. By applying simple data
compression at each wireless sensor node, the total acquired data (both temperature and
relative humidity) were reduced from 5,760 bytes uncompressed to 2,865.75 bytes on average
over the four wireless sensor nodes, a compression ratio (space saving) of 50.25%. Note that
we considered only the measurement data recorded on the SD card, excluding the Node ID or
List of Node IDs in the data format.
At the base station node, after applying the simple recompression technique, all data
acquired from its children used only 12,903 bytes, a compression ratio of 55.19%; without
data compression, the acquired data required 28,800 bytes of memory storage space. The
simple recompression technique thus improves the compression ratio by 4.94% on top of the
simple data compression applied at each wireless sensor node. This indicates that when the
total temperature and relative humidity data at the sensor nodes are reduced, the corresponding
data at the base station node are also reduced. Without data compression, we used four bytes
to buffer the acquired data from the SHT15 at each sensor node: two bytes for temperature and
two bytes for relative humidity. In addition, because the compression algorithm reduces the
total number of bits that a sensor node must transmit, the overall transmission cost is lower
than that of transmitting uncompressed data [3], which should yield an energy saving for the
wireless sensor node.

Table 3. Total amount of temperature and relative humidity data, sampled every
minute for one day.

                      With data compression   Without data compression   Compression ratio
At each sensor node   2,865.75 bytes          5,760 bytes                50.25%
At master node        12,903 bytes            28,800 bytes               55.19%
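As a quick arithmetic check of Table 3, where the compression ratio is the fraction of
storage saved relative to the uncompressed size:

    for compressed, raw in [(2865.75, 5760.0), (12903.0, 28800.0)]:
        print("%.2f%%" % (100.0 * (1.0 - compressed / raw)))
    # prints 50.25% and 55.20% (the latter is reported as 55.19% in Table 3)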

5. CONCLUSION
Our experimental results demonstrate that applying simple data compression in wireless
sensor networks can alleviate both memory and energy constraints. Applying our simple data
compression at the wireless sensor nodes reduces the data transmitted over the wireless
channel; because the total number of bits required for transmission at each sensor node is
reduced, the overall transmission cost of the wireless sensor network is also lower than when
transmitting uncompressed data. When the data from the wireless sensor nodes are reduced,
the data received at the base station node are reduced as well, and a simple recompression
technique can further improve the compression ratio of the received compressed data at the
base station node. These reductions lower the memory storage space requirement when
collecting data from several sensors. We believe the simple data compression in this work is a
good alternative for real-world wireless sensor networks, since it alleviates both memory and
energy constraints. In future work, we plan to quantify the energy-consumption reduction from
applying this simple data compression in real-world wireless sensor networks.
REFERENCES
1. Feng Zhao and Leonidas J. Guibas, Wireless Sensor Networks: An Information
Processing Approach, 1st ed., Elsevier/Morgan-Kaufmann, 2004.
2. Naoto Kimura and Shahram Latifi, A Survey on Data Compression in Wireless Sensor
Networks, Information Technology: Coding and Computing (ITCC'05), 2005, Vol. 2, 8-13.
3. Andrew van der Byl, Robert Neilson and Richardt H. Wilkinson, An Evaluation of
Compression Techniques for Wireless Sensor Networks, IEEE AFRICON, 2009.
4. D. Petrovic, R. C. Shah, K. Ramchandran and J. Rabaey, Data Funneling: Routing with
Aggregation and Compression for Wireless Sensor Networks, in Proceedings of the First
IEEE International Workshop on Sensor Network Protocols and Applications, May 2003.
5. T. Arici, B. Gedik, Y. Altunbasak and L. Liu, PINCO: A Pipelined In-Network
Compression Scheme for Data Collection in Wireless Sensor Networks, Computer
Communications and Networks (ICCN), Oct. 2003, 539-544.
6. E. Magli, M. Mancin and L. Merello, Low-Complexity Video Compression for Wireless
Sensor Networks, in Proceedings of the 2003 International Conference on Multimedia and
Expo, July 2003.
7. S. S. Pradhan, J. Kusuma and K. Ramchandran, Distributed Compression in a Dense
Microsensor Network, IEEE Signal Processing Magazine, March 2002, Vol. 19, Issue 2,
51-60.
8. Francesco Marcelloni and Massimo Vecchio, A Simple Algorithm for Data Compression
in Wireless Sensor Networks, IEEE Communications Letters, 2008, Vol. 12, No. 6,
411-413.
9. P. Sornsiriaphilux, D. Thanapatay, K. Kaemarungsi and K. Araki, Performance
Comparison of Data Compression Algorithms Based on Characteristics of Sensory Data
in Wireless Sensor Networks, IC-ICTES, Jan. 2010.
10. David Salomon, Data Compression: The Complete Reference, 4th ed., Springer-Verlag
London Limited, London, 2007.
11. S. Puthenpurayil, Ruirui Gu and S. S. Bhattacharyya, Energy-Aware Data Compression
for Wireless Sensor Networks, Acoustics, Speech and Signal Processing, 2007, Vol. 2,
II-45-II-48.
12. Michael J. Neely, Dynamic Data Compression for Wireless Transmission over a Fading
Channel, Information Sciences and Systems, 2008, 1210-1215.
13. SHT15 - Digital Humidity Sensor (RH&T) [Online]. Available:
http://www.sensirion.com/en/01_humidity_sensors/03_humidity_sensor_sht15.htm
14. MSP430F169 - Texas Instruments MSP430 family of ultra-low-power microcontrollers
[Online]. Available: http://focus.ti.com/docs/prod/folders/print/msp430f169.html
15. CC1100 - Single Chip Low Cost Low Power RF Transceiver [Online]. Available:
http://focus.ti.com/docs/prod/folders/print/cc1100.html
16. STM32F (Cortex-M3) - 32-bit Microcontrollers [Online]. Available:
http://www.st.com/mcu/inchtml-pages-stm32.html
17. File Systems [Online]. Available:
http://technet.microsoft.com/en-us/library/cc938947.aspx

ACKNOWLEDGMENTS
This research is financially supported by Thailand Advanced Institute of Science and
Technology - Tokyo Institute of Technology (TAIST-Tokyo Tech), National Science and
Technology Development Agency (NSTDA), Tokyo Institute of Technology (Tokyo Tech),
National Research Council of Thailand (NRCT) and Kasetsart University (KU).
G00115
Facial Reconstruction from Skull

A. Namvong^C and R. Nilthong
School of Science, Mae Fah Luang University,
333, Moo 1, Tasud, Muang, Chiang Rai, 57100, Thailand
^C E-mail: ariyanamvong@gmail.com; Fax: 053-916776; Tel. 053-916775


ABSTRACT
The purpose of facial reconstruction is to estimate the facial appearance from skeletal
remains and thereby aid human identification. The reconstruction is obtained by
deforming the craniometric landmarks of a known skull onto an unknown skull.
Forcing the soft tissue of the known skull onto the unknown skull with the same
deformation gives the desired shape of the soft tissue for the unknown skull. This
research uses 39 craniometric landmarks combined from five references [1,2,3,4,5].
For the deformation process, Free-form deformation (FFD) [6,7,8] is applied. The 3D
head models are acquired by computed tomography (CT) with 1 mm resolution. This
experiment attempts to deform the skull and skin of a 48-year-old woman into those of
a 42-year-old woman. The preliminary visual result shows that it is possible to use this
scheme for forensic facial reconstruction. Future development of this research will
collect more reference head models and use the average skin deformation from
various head models.

Keywords: Facial Reconstruction, Free-form Deformation.



1. INTRODUCTION
When the usual methods cannot identify skeletal remains, the possibility of facial
reconstruction from the skull is considered. It is true that there are many ways in which soft
tissue may cover the same skull, leading to different facial appearances, so the purpose of
facial reconstruction is not to produce an accurate likeness of the person during life; the task is
successful if it contributes positively to human identification from skeletal remains. With the
assumption that the underlying skeleton directly affects the overall aspect of the facial
appearance, we consider facial reconstruction to be possible.
The success of manual clay sculpting depends on a combination of the artist's ability and
anatomical and anthropological knowledge, while the success of computer-aided
reconstruction depends on the size of the head database and also on the user's skill in
localizing craniometric landmarks. The facial reconstruction is obtained by deforming the
craniometric landmarks of a known skull onto the unknown (target) skull. Forcing the soft
tissue of the known skull onto the unknown skull with the same deformation gives the desired
shape of the soft tissue for the unknown skull.
In this paper we try to develop a novel method for facial reconstruction through the use of
volume deformation. Current volume-deformation computer-based facial reconstruction
methods differ mainly in the selection of landmark points on the skull (craniometric
landmarks) and in the methods used to register and deform the model toward a given target
skull.
Our procedure can be summarized as follows. For each head model we manually locate 39
craniometric landmarks edited from five references [1,2,3,4,5], then perform rough
registration using the Frankfort horizontal plane, followed by fine registration using the
Iterative Closest Point (ICP) algorithm; finally, we deform the craniometric landmarks of the
known skull onto the target skull using Free-Form Deformation (FFD).
The remainder of this paper is organized as follows. In Section 2 we review the theory of
Free-Form Deformation. In Section 3 we describe our facial reconstruction method. Section 4
shows the results of our experiment. The paper concludes with directions for future research
in Section 5.



2. FREE-FORM DEFORMATION (FFD)
Free-form deformation was introduced by Sederberg and Parry [6,7,8] and is known to be a
powerful shape-modification method that has been applied to geometric modeling. The
technique deforms an object by embedding it within a solid defined by a control lattice; a
change of the lattice deforms the solid, and hence the object, as seen in Figure 1. FFD is
defined for 1D, 2D and also 3D data. We can compute the new location P' of an old location P
after moving the control points P_{ijk} to P'_{ijk} as follows:

1D FFD:
\[ P' = \sum_{i=0}^{l} B_i^l(t)\, P'_i \tag{1} \]

2D FFD:
\[ P' = \sum_{i=0}^{l} \sum_{j=0}^{m} B_i^l(s)\, B_j^m(t)\, P'_{ij} \tag{2} \]

3D FFD:
\[ P' = \sum_{i=0}^{l} \sum_{j=0}^{m} \sum_{k=0}^{n} B_i^l(s)\, B_j^m(t)\, B_k^n(u)\, P'_{ijk} \tag{3} \]

Bernstein polynomials:
\[ B_i^n(t) = \frac{n!}{(n-i)!\, i!}\, t^i (1-t)^{n-i} \tag{4} \]

where P' is the new location, at parameter coordinates (s,t,u), of an old point P after the
control points P_{ijk} are moved to P'_{ijk}, and l, m, n are the numbers of control points
minus one along the x, y and z axes (0 <= s, t, u <= 1).




Figure 1. Example of FFD [9]
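As a concrete illustration of Eqs. (3) and (4), the following minimal Python sketch
evaluates a trivariate FFD for one point; the nested-list control lattice is an assumed data
layout for this sketch, not code from this paper.

    from math import comb

    def bernstein(n, i, t):
        # Bernstein polynomial of Eq. (4): n!/((n-i)! i!) t^i (1-t)^(n-i)
        return comb(n, i) * t**i * (1 - t)**(n - i)

    def ffd(control, s, t, u):
        """Evaluate Eq. (3). control is an (l+1) x (m+1) x (n+1) nested list of
        deformed control points P'_ijk, each an (x, y, z) tuple; s, t, u in [0, 1]."""
        l, m, n = len(control) - 1, len(control[0]) - 1, len(control[0][0]) - 1
        point = [0.0, 0.0, 0.0]
        for i in range(l + 1):
            for j in range(m + 1):
                for k in range(n + 1):
                    w = bernstein(l, i, s) * bernstein(m, j, t) * bernstein(n, k, u)
                    for a in range(3):
                        point[a] += w * control[i][j][k][a]
        return tuple(point)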



3. METHODOLOGY
Figure 2 shows the main steps of our procedure. In the first step, we manually locate
landmarks on both the unknown (target) skull and the known (reference) skull. In the second
step, we align the two skulls into a common position; this stage contains two processes, rough
alignment and fine alignment. Rough alignment brings the two skulls into the same
orientation, the Frankfort horizontal plane, which supports the performance of the next
process, fine alignment using the ICP algorithm. Finally, we deform the craniometric
landmarks of the known skull onto the target skull using Free-Form Deformation.





Figure 2. Methodology.




3.1 Craniometric Landmarks
Figure 3 and Table 1 show the 39 craniometric landmarks used in this paper. There are
two types of landmarks: central landmarks, which lie on the midline of the skull, and lateral
landmarks, which lie on the left and right sides of the skull.





Figure 3. Craniometric landmarks.




Table 1. Craniometric Landmarks

# Central Landmarks # Lateral Landmarks
1 Glabella 12 Supraorbital
2 Nasal 13 Inner orbital
3 End of nasal 14 Outer orbital
4 Nose tip estimation 15 Suborbital
5 Mid-philtrum 16 Zygoma
6 Upper lip margin 17 Inferior malar
7 Incisor 18 Outer nasal
8 Lower lip margin 19 Lower nasal
9 Chin-lip fold 20 Supracanine
10 Mental eminence 21 Subcanine
11 Beneath chin 22 Outer Zygoma
23 Mid-mandible
24 Gonion
25 Occlusion line

3.2 Skull Alignment
This stage contains two processes, rough alignment and fine alignment. The first process
brings the two skulls into the same orientation, the Frankfort horizontal plane. A skull in the
Frankfort horizontal plane is positioned as if the person were looking straight ahead;
technically, the skull is positioned so that the lowest point on the lower margin of the orbit
aligns horizontally with the top edge of the external auditory meatus (the ear hole) [1]. See
Figure 4 for an illustration of this position.


Figure 4. Frankfort horizontal plane.

The second process, fine alignment, uses the result of the first process as an initial
alignment. The method used in this process is the Iterative Closest Point (ICP) algorithm, a
straightforward method for aligning two free-form surfaces [10,11]. The steps of ICP for
aligning a surface P to a surface X are as follows (a numerical sketch of one iteration is given
below):

The Iterative Closest Point Algorithm
- Initial transformation
- Iterative procedure converging to a local minimum:
  1. For each p in P, find the closest point x in X.
  2. Transform P_k into P_{k+1} to minimize the distances between each p and x.
  3. Terminate when the change in the error falls below a preset threshold.
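A minimal numpy sketch of one ICP iteration (steps 1 and 2), using the common
SVD-based least-squares rigid transform; this is an illustrative formulation, not the exact
implementation used in this paper.

    import numpy as np

    def icp_step(P, X):
        """P: (n, 3) and X: (m, 3) point arrays. Returns P after one ICP iteration."""
        # Step 1: for each p in P, find the closest point x in X.
        dists = np.linalg.norm(P[:, None, :] - X[None, :, :], axis=2)
        closest = X[np.argmin(dists, axis=1)]
        # Step 2: rigid transform (rotation + translation) minimizing pair distances.
        mp, mx = P.mean(axis=0), closest.mean(axis=0)
        U, _, Vt = np.linalg.svd((P - mp).T @ (closest - mx))
        R = (U @ Vt).T
        if np.linalg.det(R) < 0:          # guard against a reflection
            Vt[-1] *= -1
            R = (U @ Vt).T
        # Step 3 (outside this sketch): iterate until the change in alignment
        # error falls below a preset threshold.
        return (P - mp) @ R.T + mx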

3.3 Free-Form Deformation
Section 2 presented FFD as a global deformation; in this research we use FFD as a local
deformation. Figure 5 demonstrates the use of local FFD: the middle column shows the
original skull and face, the left column shows the result of pushing the skull inward at the
incisor, which also deforms the shape of the face, and the right column shows the result of
pushing the incisor outward.
At this step, the two skulls have the same orientation and the distance between the
reference skull and the target skull has been minimized. It is now possible to deform all
craniometric landmarks, point by point, from the reference skull to the target skull to
reconstruct the face of the target skull. The reference skull, reference face, target skull and the
result of our facial reconstruction are shown in Section 4.



Figure 5. Demonstration of local FFD.



4. EXPERIMENTAL RESULTS
In this experiment, the 3D head models of the reference head and the target head were
acquired by computed tomography (CT) with 1 mm resolution. We attempt to deform the
skull and face of a 48-year-old woman into those of a 42-year-old woman. Figure 6 shows the
reference skull, reference face, target skull and the target face derived from our procedure.
The reference face and reference skull are in the left column; the results of the facial
reconstruction are in the middle column, and the real face of the target is in the right column.
We can see that the result is still biased toward the reference face: it looks more like the
reference face than the target face it is supposed to resemble. Nevertheless, the result suggests
that it is possible to use this procedure to reconstruct a face from skeletal remains.






Figure 6. Experimental results, reference head (left column), result from this
approach (middle column) and target head (right column).


5. CONCLUSION
We cannot claim that this research is fully successful, because the results cannot be
considered the same as the target face. However, the differences between the reference head
and the target head reduce the success of facial reconstruction, and in this experiment we were
limited by the size of the head database, having only a few head models, so we could not
choose a reference skull more similar to the target skull. On the other hand, the preliminary
visual result shows that it is possible to use this scheme for forensic facial reconstruction.
Future development of this research will collect more reference head models, so that
reconstruction can start from a more similar reference head, and will use the average skin
deformation from various reference head models to reduce the reference bias.



REFERENCES
1. L. Gibson, Forensic Art Essentials: A Manual for Law Enforcement Artists, 1st edition,
Academic Press, London, 2008, 266-269, 303-305.
2. C. Wilkinson, Forensic Facial Reconstruction, 1st edition, Cambridge University Press,
Cambridge, 2004, 71-73.
3. K.T. Taylor, Forensic Art and Illustration, 1st edition, CRC Press LLC, Washington D.C.,
2001, 348-359.
4. P. Claes, D. Vandermeulen, S.D. Greef, G. Willems and P. Suetens, Statistically
Deformable Face Models for Cranio-Facial Reconstruction, Journal of Computing and
Information Technology CIT 14, 2006(1), 21-30.
5. A.F. Abate, M. Nappi, S. Ricciardi and G. Tortora, FACES: 3D Facial reConstruction
from anciEnt Skulls using content based image retrieval, Journal of Visual Languages &
Computing, 2004(15), 373-389.
6. T.W. Sederberg, Computer Aided Geometric Design Course Notes, Department of
Computer Science, Brigham Young University, Utah, 2007, 133-135.
7. T.W. Sederberg and S.R. Parry, Free-form Deformation of Solid Geometric Models,
Computer Graphics, 1986, 20(4), 151-160.
8. W. Song and X. Yang, Free-Form Deformation with weighted T-spline, The Visual
Computer, 2005, 21(3), 139-151.
9. R. Barzel, Computer Graphics Animation Course Notes, Ecole Polytechnique, France,
2003.
10. P.J. Besl and N.D. McKay, A Method for Registration of 3-D Shapes, IEEE Transactions
on Pattern Analysis and Machine Intelligence, 1992, 14(2), 239-255.
11. K. Bae, Automated Registration of Unorganised Point Clouds from Terrestrial Laser
Scanners, Ph.D. Dissertation, Department of Spatial Sciences, Curtin University of
Technology, Bentley, W.A., Australia, 2006, 1-9.



ACKNOWLEDGMENTS
We would like to thank Mr. Sorawee Thanawong, chief of the X-Ray Department, Overbrook
Hospital, Chiang Rai, Thailand, for his precious help in the head computed-tomography data
acquisition phase, and also the volunteers who made this research possible.

G00116
Online Object Detection Program Using Fast Image
Processing

N. Charawae^1,C, S. Chuai-Aree^1 and S. Wikaisuksakul^1
^1 Department of Mathematics and Computer Science, Faculty of Science and Technology,
Prince of Songkla University, Pattani Campus, 181, Rusamilae, Muang, Pattani, 94000, Thailand
^C E-mail: ninasree_j@hotmail.com; Tel. 089-2989414



ABSTRACT
Object detection plays an important role in video processing because it can be used in
robotic technology, monitoring systems, product management in industry, etc. This
paper presents a method to detect a preferred object in online webcam video. The
robot consists of a camera controlled by a computer program that captures the pictures
used to calculate the motion vector of the object. The project is implemented using
Delphi 7 and the DSPack library. A block-matching algorithm is applied to find
matches between macro blocks of the current frame and a reference frame. In addition,
fast image processing is implemented using the scanline technique for real-time
processing. The program displays the current webcam frame with the boundary of the
detected object. The implemented program can be applied in the fields of robotic
technology, object detection and security systems.

Keywords: Object Detection, Parallel Port, Image Processing, Step Motor



G00119
Exploration of Parallelism in Developing Fuzzy Applications

C. Chantrapornchai (Phongpensri)^1,C and J. Pipatpaisan^1
^1 Department of Computing, Faculty of Science, Silpakorn University, Nakorn Pathom, 73000 Thailand
^C E-mail: ctana@su.ac.th; Fax: 66-34-272-923; Tel. 66-34-272-923



ABSTRACT
Developing fuzzy applications involves many steps. Each part may require many
computation cycles, depending on the application and target platform. In this work, we
explore the possibility of developing parallel fuzzy applications. In particular, we
develop a parallel fuzzy library based on OpenMP and explore the parameters of
fuzzy applications that affect parallel performance. The developed parallel library is
intended for use on embedded platforms. The experiments show the speedup of the
parallel fuzzy library for various tested parameters, as well as the overhead of the
parallel version.
Keywords: Parallel Computing, OpenMP, Fuzzy Applications.

G00120
Determination of Sequence-Similarity Kernel Function
for Support Vector Machines in Classification of Influential
Endophytic Fungi in Rice on Bakanae Disease

P. Mekha^1,C and J. Chaijaruwanich^1
^1 Department of Computer Science, Faculty of Science, Chiang Mai University, Chiang Mai, 50202
Thailand
^C E-mail: g510531108@cmu.ac.th; Fax: 053-943433; Tel. 081-8818124



ABSTRACT
Bakanae disease of rice is widely distributed in Asia; it was first recognized in Japan
in 1828 and in California in 1999. The word bakanae is Japanese for "foolish
seedling" and describes the excessive elongation often seen in infected plants. The
disease has become widespread throughout rice-growing areas, and some infested
fields suffered significant yield losses this past season. As classification of this
disease involves costly chemicals and equipment, computational (data mining)
approaches are important for decreasing both cost and labor. We propose determining
a sequence-similarity kernel for a support vector machine (SVM) approach to the
DNA classification problem. The kernel allows non-linear classification of test
sequences and adopts codon bias as feature inputs; furthermore, it is conceptually
simple and efficient to compute. Our experiments show that string-based kernels with
SVMs can offer a viable and computationally efficient alternative to other methods
for DNA classification and homology detection. We also compared the approach with
state-of-the-art methods for homology detection.

Keywords: Support Vector Machines (SVMs), Sequence-similarity, DNA, Codon
usage bias, Bakanae disease, Data Mining.





G00122
Digital Watermarking with 2D Barcode and General
Watermark using DCT for JPEG Image

B. Somkror^1,C and E. Boonchieng^2
^1 College of Integrated Science and Technology, Rajamangala University of Technology Lanna,
Chiang Mai, 50300 Thailand
^2 Department of Computer Science, Faculty of Science, Chiang Mai University,
Chiang Mai, 50202, Thailand
^C E-mail: somkror@gmail.com; Tel. 081-9923824



ABSTRACT
The Discrete Cosine Transform (DCT) underlies a compression and steganography
method by which secret data can be embedded into digital content, which may be a
still image, video or audio file. The objective of steganography is to hide information
without detection by human vision; digital watermarking, in contrast, is a copyright-
protection technique that can protect digital media from unauthorized copying. In this
paper, we are interested in the extracted images. The results show that all extracted
images are close to the original images after passing through our embedding method.
We use a 2D barcode as a watermark for embedding: the 2D barcode and a general
watermark are embedded in separate channels of the red, green and blue (RGB) color
space of a color JPEG image with DCT compression. The 2D barcode watermark and
the general watermark are then extracted, and the original and extracted images are
compared to evaluate the performance of the embedding and extraction methods.

Keywords: Steganography, Digital Watermarking, Discrete Cosine Transform, 2D
barcode.



1. INTRODUCTION
On the Internet, a customer receives a watermarked image from a seller together with a
key and the watermark needed to extract the watermark out. The customer wants the original
image, but for copyright protection the seller sells only the watermarked image, the watermark
and the key. Many uses, such as printing, need image quality as close as possible to the
original. We therefore need a scheme that protects the seller while letting the customer obtain
maximum image quality using only the watermark and the key.
The Internet is an excellent distribution system for digital media because it is inexpensive,
eliminates warehousing and stock, and makes delivery easy. A major current problem for
photographic images is digital piracy; indeed, every kind of digital media faces this problem.
One way to address it is digital watermarking, and in this paper we focus only on digital
images. The Discrete Cosine Transform (DCT) became an international standard transform
coding system [7] and is adequate for most image compression. The JPEG standard uses a
lossy baseline coding system based on the DCT, which provides a good compromise between
information-packing ability and computational complexity. A two-dimensional barcode
encodes data as an image that can hold up to about 4,000 characters; it is interesting for its
robustness to damage, and it is easily detectable since it has only two levels of value [1]. In
this paper we are interested in comparing the extracted image, after the watermark image and
Data Matrix have been extracted from it, with the original image.


2. INFORMATION HIDING
Two information-hiding techniques are of interest in this paper: steganography and digital
watermarking. Steganography is the art of information hiding; it is used for communicating in
a way that hides a secret message within the main information from a third party listening to
the communication. Digital watermarking is the process of embedding data, called a
watermark, tag or label, into a multimedia object such that the watermark can be detected or
extracted later to make an assertion about the object. The object may be an image, audio,
video or even text. Steganography and digital watermarking differ primarily in intent of use: a
watermark can be perceived as an attribute of the cover and may contain information such as
copyright, license and authentication, whereas in steganography the embedded message may
have nothing to do with the cover. In steganography the main concern is bandwidth for the
hidden message, whereas robustness is the greater concern in watermarking [9].


3. DIGITAL WATERMARKING
Digital watermarking is intended by its developers as the solution to the need to provide
value-added protection on top of data encryption and scrambling for content protection.
For the encoding process, let us denote an image by I, a signature by S and the
watermarked image by I'. An encoder function E takes an image I and a signature S and
generates a new image, called the watermarked image I'; mathematically,

\[ E(I, S) = I' \]

For the decoding process, a decoder function D takes an image J (J can be a watermarked
or un-watermarked image, possibly corrupted) whose ownership is to be determined and
recovers a signature S' from the image. In this process an additional image I can also be
included, which is often the original, un-watermarked version of J. This is because some
encoding schemes may use the original image in the watermarking process to provide extra
robustness against intentional and unintentional corruption of pixels. Mathematically [9],

\[ D(J, I) = S' \]

Digital watermarks can be divided, by human perception, into three types: first, the visible
watermark, which appears on the image; second, the invisible robust watermark, which is
embedded so that pixel values change as little as possible; and finally, the invisible fragile
watermark, which is embedded so that the watermark is destroyed by any change to the
embedded image.


4. DISCRETE COSINE TRANSFORM
For each color component, the JPEG image format uses a discrete cosine transform
(DCT) to transform successive 8 x 8 pixel blocks of the image into 64 DCT coefficients each.
The DCT coefficients F(u,v) of an 8 x 8 block of image pixels f(x,y) are given by

\[ F(u,v) = \frac{1}{4}\, C(u)\, C(v) \sum_{x=0}^{7} \sum_{y=0}^{7} f(x,y) \cos\frac{(2x+1)u\pi}{16} \cos\frac{(2y+1)v\pi}{16} \]

where C(k) = 1/\sqrt{2} when k equals 0 and C(k) = 1 otherwise. The following operation
quantizes the coefficients:

\[ F^{Q}(u,v) = \operatorname{round}\!\left(\frac{F(u,v)}{Q(u,v)}\right) \]

where Q(u,v) is a 64-element quantization table. We can use the least-significant bit of the
quantized DCT coefficients as a redundant bit in which to embed the hidden message; the
modification of this bit is what carries the message.
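A minimal numpy sketch of this embedding idea follows: compute the 8 x 8 DCT of a
block, quantize it, and replace the least-significant bit of one quantized coefficient with a
message bit. The flat quantization step Q = 16 and the coefficient position (u, v) = (4, 3)
are illustrative assumptions, not this paper's exact choices.

    import numpy as np

    def dct8x8(block):
        """Type-II DCT of one 8x8 block, as in the equation above."""
        c = lambda k: 1.0 / np.sqrt(2.0) if k == 0 else 1.0
        F = np.zeros((8, 8))
        for u in range(8):
            for v in range(8):
                s = sum(block[x, y]
                        * np.cos((2 * x + 1) * u * np.pi / 16.0)
                        * np.cos((2 * y + 1) * v * np.pi / 16.0)
                        for x in range(8) for y in range(8))
                F[u, v] = 0.25 * c(u) * c(v) * s
        return F

    def embed_bit(block, bit, Q=16.0, u=4, v=3):
        """Quantize the block's DCT and put one message bit in a coefficient's LSB."""
        Fq = np.rint(dct8x8(block) / Q).astype(int)
        Fq[u, v] = (Fq[u, v] & ~1) | bit    # least-significant bit carries the message
        return Fq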


5. 2-DIMENSIONS BARCODE
Barcodes are widely used in product stock to keep data such as name, code and price, but a
one-dimensional (1D) barcode cannot store enough data, so the two-dimensional (2D) barcode
was developed. A 1D barcode can hold an estimated 20 characters, as seen on many product
seals. The 2D barcode, developed from the 1D barcode, was designed to store data both
vertically and horizontally, so it can hold up to 4,000 characters, more than a 1D barcode in
the same or a smaller area. We are interested in the Data Matrix, developed by RVSI Acuity
CiMatrix (U.S.A.) and based on the ISO/IEC 16022 Data Matrix specification. A square Data
Matrix has between 10x10 and 144x144 data modules and can store up to 1,556 bytes of
binary data. The finder pattern of the Data Matrix is defined on the left and bottom borders. It
is mostly used where space is limited and a small barcode area is required. Among the several
2D barcodes, Data Matrix is an efficient symbol that uses a unique square module perimeter
pattern to help the barcode scanner determine the cell locations. It can encode any kind of data
including letters, numbers, text and binary data. The Data Matrix symbol is square and can
range from 0.001 inch per side up to 14 inches per side. It also supports advanced encoding
and error checking with Reed-Solomon error-correction algorithms, named ECC200. This
mechanism allows the recognition of barcodes that are up to 60% damaged. For this reason, a
watermark using a Data Matrix can survive better than ordinary cipher images in highly noisy
environments [6].



Figure 1. Data Matrix two-dimensional barcode.

6. PROPOSED METHOD
Combining watermarking with a Data Matrix can provide more robustness. This paper
offers a method to embed two watermarks in a color image in the RGB color space.
Most digital images are color images, and an RGB color image can be watermarked in its
three different color layers [10]. The first watermark, a general grey-scale watermark image,
is embedded in the blue layer, and the second watermark, a Data Matrix barcode, is embedded
in the red layer. The green layer is kept as the original so as not to degrade the quality of the
image.
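A minimal sketch of this channel assignment, assuming an embed_channel callable
(such as the DCT least-significant-bit embedding sketched in Section 4); the function
names and array layout are illustrative only.

    import numpy as np

    def embed_rgb(image, datamatrix_bits, watermark_bits, embed_channel):
        """image: (H, W, 3) uint8 RGB array; embed_channel(layer, bits) -> new layer."""
        r, g, b = image[..., 0], image[..., 1], image[..., 2]
        r2 = embed_channel(r, datamatrix_bits)   # Data Matrix barcode in the red layer
        b2 = embed_channel(b, watermark_bits)    # grey-scale watermark in the blue layer
        return np.stack([r2, g, b2], axis=-1)    # green layer kept as the original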




















Figure 2. Embedding process.




















Figure 3. Extract process.

The extraction process is the reverse of the embedding process.

7. CONCLUSION
The embedding method proposed here offers a scheme for sellers and customers who
exchange images on the Internet. The method is based on the standard JPEG image format,
DCT compression, a general watermark and a Data Matrix barcode, but focuses on the quality
of the extracted image rather than on comparing the watermarked image with the original,
because in most cases the original image cannot be obtained from its owner. These embedding
and extraction methods are therefore of practical interest. Experiments on this method are in
progress, and results will be reported in future work.



REFERENCES
1. Premaratne, P. and Safaei, F., 2D Barcodes as Watermarks in Image Authentication, in
6th IEEE/ACIS International Conference on Computer and Information Science, 2007,
432-437.
2. M. Kharrazi, H. T. Sencar and N. Memon, Image Steganography: Concepts and Practice,
Lecture Note Series, Institute for Mathematical Sciences, National University of
Singapore, 2004.
3. H. Noda, M. Niimi and E. Kawaguchi, High Performance JPEG Steganography Using
Quantization Index Modulation in DCT Domain, Pattern Recognition Letters, 2006,
455-461.
4. Ting Li, Yao Zhao, Rongrong Ni and Lifang Yu, A High Capacity Steganographic
Algorithm in Color Images, IWDW 2008, LNCS 5450, 218-228.
5. Provos, N. and Honeyman, P., Hide and Seek: An Introduction to Steganography, IEEE
Security & Privacy, 2003, Volume 1, Issue 3, 32-44.
6. Jau-Ji Shen and Po-Wei Hsu, A Fragile Associative Watermarking on 2D Barcode for
Data Authentication, International Journal of Network Security, 2008, Volume 7, No. 3,
301-309.
7. Rafael C. Gonzalez and Richard E. Woods, Digital Image Processing, Second Edition,
Prentice-Hall, Upper Saddle River, New Jersey, 2002, 476.
8. N. F. Johnson and S. Jajodia, Exploring Steganography: Seeing the Unseen, IEEE
Computer Magazine, 1998, vol. 31, 26.
9. Saraju P. Mohanty, Digital Watermarking: A Tutorial Review,
URL: citeseer.ist.psu.edu/mohanty99digital.html
10. Hanane Mirza, Hien Thai and Zensho Nakao, Color Image Watermarking and Self-
recovery Based on Independent Component Analysis, Artificial Intelligence and Soft
Computing (ICAISC 2008), Volume 5097, 839-849.


G00123
One-Dimensional Hydrodynamic Calibration
Study of Mae Lao River Flow

S. Sairattanain^1,C, R. Nilthong^1, A. Eungwanichayapant^1 and S. Saenton^2

^1 Computational Science Program, School of Science, Mae Fah Luang University,
Chiang Rai, 57100, Thailand
^2 Department of Geological Sciences, Faculty of Science, Chiang Mai University,
Chiang Mai, 50200, Thailand
^C E-mail: ssairattanain@yahoo.com; Fax: 0-5391-6776; Tel. 0-5391-6778



ABSTRACT
A numerical model was set up to simulate one-dimensional open-channel flow in the
Mae Lao River, Chiang Rai Province. The channel flow was monitored and simulated
over a distance of 30 km from station G.10 to station G.8 during water years 2004-2008.
This study used the HEC-RAS program to simulate the unsteady-flow water-surface
profile using an implicit finite-difference approach. The model requires information
about the channel geometry, Manning n roughness values, discharge and starting water
levels. A perturbation method was used to estimate parameter sensitivities, which
included the Manning roughness coefficients of the main channel and the left and right
overbanks, and the Mae Lao Weir and Tham Wok Weir coefficients. The model
calibration aimed to minimize the sum of squared errors (SSE) of the hourly water
levels at station G.8 during water year 2004. The most sensitive parameter is n of the
main channel (5010.80), while the other parameter sensitivities are 1865.70, 271.33,
1.40 and 1.35 for n of the left overbank, n of the right overbank, the Mae Lao Weir
coefficient and the Tham Wok Weir coefficient, respectively. The value of SSE after
calibration was 612.42 m^2. The calibrated model was later used to predict the water
levels of years 2005-2008. It appears that, without further calibration, the model gave
relatively good results. However, during the dry season, the numerical solutions
underestimated the hourly water levels in every subsequent water year.

Keywords: Open-channel Flow, Mae Lao River, HEC-RAS, Parameters Sensitivity,
Model Calibration.



1. INTRODUCTION
A flood is an overflow or accumulation of water that submerges land. Flooding may result
from the volume of water within a body of water, such as a river or lake, which overflows or
breaks levees, with the result that some of the water escapes its normal boundaries [1]. The
Mae Lao River is a principal tributary of the Kok Basin and an important river for agriculture,
so understanding its behavior through a river-flow model is important for managing Mae Lao
River flooding.
The Mae Lao River runs through Chiang Rai Province in northern Thailand. It originates
in Wiang Pa Pao District, flows through and borders the Mae Suai, Phan, Mae Lao and
Mueang Chiang Rai Districts, and enters the Kok River in Mueang Chiang Rai District over a
distance of 117 km. The Mae Lao River has two gage stations for stage recording: station
G.10 (Ban Nong Pham, Mae Suai District) and station G.8 (Ban Ton Yang, Mae Lao District).
Between the two stations there are two weirs across the river: the Mae Lao and Tham Wok
Weirs.
A one-dimensional (1-D) unsteady-flow river model for the Mae Lao River was
constructed using the HEC-RAS program [2] to compare the water-surface elevations between
the two stations during water years 2004-2008.
2. THEORY AND RELATED WORKS
2.1 One-dimensional unsteady flow model
HEC-RAS (version 4.0 beta) was developed by Hydrologic Engineering Center (HEC),
which is a division of the Institute for Water Resources (IWR), U.S. Army Corps of Engineers
[3]. It solves the mass and momentum conservation equations using implicit finite difference
approximations and Preissmann's second-order scheme [4]. The physical laws which govern
the flow of water in a stream are: (1) the principle of conservation of mass (continuity), and
(2) the principle of conservation of momentum. These laws are expressed mathematically in
the form of partial differential equations, which will hereafter be referred to as the continuity
and momentum equations [2].
Continuity equation:
\[ \frac{\partial A}{\partial t} + \frac{\partial Q}{\partial x} - q = 0 \tag{1} \]

Momentum equation:
\[ \frac{\partial Q}{\partial t} + \frac{\partial (QV)}{\partial x} + gA\left(\frac{\partial z}{\partial x} + S_f\right) = 0, \qquad S_f = \frac{Q\,|Q|\,n^2}{A^2 R^{4/3}} \tag{2} \]

where A is the cross-sectional area (m^2), Q is the discharge (m^3 s^-1), q is the lateral inflow
per unit length, V is the velocity (m s^-1), g is the acceleration due to gravity (m s^-2),
\partial z / \partial x is the water-surface slope, S_f is the friction slope, n is the Manning
friction coefficient, R is the hydraulic radius (m), x is distance (m) and t is time (s).

2.2 Perturbation method
Parameter sensitivity is the derivative of a simulated value, y', associated with an
observation, y, with respect to one parameter, b; that is, \Delta y' / \Delta b. One sensitivity is
calculated for each observation with respect to each parameter. Sensitivities are calculated
approximately using either a forward- or a central-difference approximation [5].

Forward difference:
\[ \frac{\Delta y'}{\Delta b} = \frac{y'(\mathbf{b} + \Delta\mathbf{b}) - y'(\mathbf{b})}{(\mathbf{b} + \Delta\mathbf{b}) - (\mathbf{b})} \tag{3} \]

where \Delta y' is the change in the simulated value caused by the parameter-value change
\Delta b; \mathbf{b} is a vector (a list) of the values of the estimated parameters;
\Delta\mathbf{b} is a vector in which all values are zero except the one corresponding to the
parameter for which sensitivities are being calculated; \Delta b is the nonzero value in
\Delta\mathbf{b}, called the perturbation for this parameter; and y'(\mathbf{b}) and
y'(\mathbf{b} + \Delta\mathbf{b}) indicate that the simulated value y' is calculated using the
parameter values represented by \mathbf{b} or \mathbf{b} + \Delta\mathbf{b}. The
derivative is said to be evaluated for the parameter values in \mathbf{b}, which is important
because, for nonlinear problems, the sensitivities differ depending on the values in
\mathbf{b}.

Central difference:
\[ \frac{\Delta_2 y'}{2\,\Delta b} = \frac{y'(\mathbf{b} + \Delta\mathbf{b}) - y'(\mathbf{b} - \Delta\mathbf{b})}{(\mathbf{b} + \Delta\mathbf{b}) - (\mathbf{b} - \Delta\mathbf{b})} \tag{4} \]

where \Delta_2 denotes the central difference. Again, the derivative is evaluated for the
parameter values in \mathbf{b}.
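A minimal Python sketch of these perturbation sensitivities follows; simulate stands in
for one model run returning the simulated value y' and is an assumed callable, not part of
HEC-RAS.

    def sensitivity(simulate, b, idx, db, central=True):
        """b: list of parameter values; idx: which parameter to perturb by db."""
        up = list(b); up[idx] += db
        if central:
            dn = list(b); dn[idx] -= db
            return (simulate(up) - simulate(dn)) / (2.0 * db)   # Eq. (4)
        return (simulate(up) - simulate(b)) / db                # Eq. (3)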

2.3 Stepwise regression method
Stepwise regression is a modified version of forward regression that permits
re-examination, at every step, of the variables incorporated in the model in previous steps. A
variable that entered at an early stage may become superfluous at a later stage because of its
relationship with other variables subsequently added to the model [6]. To apply this idea here,
at each step we compute the sensitivity of each parameter; the most sensitive parameter is
entered first, its value is adjusted to minimize the sum of squared errors (SSE), and the
process is repeated until the last parameter has been calibrated (a sketch of this loop follows).
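A minimal sketch of this calibration loop, assuming callables sse (one model run
returning the SSE for a parameter vector), sensitivity_of (the perturbation sensitivity of
Section 2.2) and minimize_1d (a one-dimensional search); all three names are
placeholders for this sketch.

    def stepwise_calibrate(sse, b, sensitivity_of, minimize_1d):
        remaining = list(range(len(b)))
        while remaining:
            i = max(remaining, key=lambda k: abs(sensitivity_of(k, b)))  # most sensitive first
            b[i] = minimize_1d(lambda v: sse(b[:i] + [v] + b[i + 1:]))   # 1-D SSE minimization
            remaining.remove(i)
        return b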

3. COMPUTATIONAL DETAILS
3.1 One-dimensional Mae Lao River flow model
HEC-RAS performs one-dimensional steady and unsteady flow calculations on a network
of natural or manmade open channels. Basic input data required by the model include the
channel geometry, Manning n roughness values, discharge and starting water levels. In this
case, HEC-RAS was implemented to simulate one-dimensional unsteady flow for the Mae Lao
River between the G.10 and G.8 stations. The basic data required are as follows.

3.1.1 Channel geometry
Cross sections are required at representative locations throughout a stream reach and at
locations where changes in discharge, slope, or shape occur. There are 62 cross sections from
the G.10 station to the G.8 station over a distance of 30 km, and inline structures were created
for the Mae Lao Weir at 6.990 km and the Tham Wok Weir at 15.055 km (see Figure 1).
Figure 1. Map showing the location of the Mae Lao River, weir and gage station
3.1.2 Initial condition
Discharge information was required at the starting point and the end point in order to compute
the water surface profile at the beginning time. The discharge at the starting point (G.10
station) is 15.92 m³ s⁻¹ and at the end point (G.8 station) is 7.19 m³ s⁻¹.

3.1.3 Boundary condition
Boundary conditions are necessary to define the upstream and downstream ends. The upstream
boundary condition is the hourly water level (m) at the G.10 station during 1 April 2004 -
31 March 2005 (water year 2004), and the downstream boundary condition is the friction
slope at the G.8 station.
3.2 Model sensitivity and calibration
The model sensitivity analysis and calibration used data from water year 2004. Before
calibration, the perturbation method was used to evaluate the parameter sensitivities. The
model calibration then used the stepwise regression method to optimize parameter values by
minimizing the sum of squared error (SSE). Parameter sensitivities were calculated again
after calibration. Table 1 lists the parameters and values used in the model.
After calibration, the model was verified by comparing the predicted versus the
observed stages for water years 2004-2008 at the G.8 station.

Table 1. Model parameters.

No.  Parameter                  Minimum value  Maximum value  Reference       Trial value
1.   n main channel             0.025          0.033          Chow (1959)     0.030
2.   n right overbank           0.020          0.200          Chow (1959)     0.030
3.   n left overbank            0.020          0.200          Chow (1959)     0.030
4.   Mae Lao Weir coefficient   2.460          4.000          Brunner (2002)  3.000
5.   Tham Wok Weir coefficient  2.460          4.000          Brunner (2002)  3.000

4. RESULTS AND DISCUSSION

Table 2. Sensitivity of parameters before calibration

No.  Parameter                  Method              Δb     Trial value  Sensitivity
1.   n main channel             Central difference  0.001  0.030        5010.8000
2.   n left overbank            Central difference  0.001  0.030        1865.7000
3.   n right overbank           Central difference  0.001  0.030        271.3300
4.   Mae Lao Weir coefficient   Central difference  0.001  3.000        1.3968
5.   Tham Wok Weir coefficient  Central difference  0.001  3.000        1.3536

Table 3. Calibration

No. Parameter Trial value Value SSE
1. n Main channel 0.030 0.02770 618.540
2. n Left overbank 0.030 0.05500 612.735
3. n Right overbank 0.030 0.03389 612.520
4. Mae Lao Weir coefficient 3.000 2.80000 612.518
5. Tham Wok Weir coefficient 3.000 2.55000 612.418

Table 4. Sensitivity of parameters after calibration

No.  Parameter                  Method              Δb     Value    Sensitivity
1.   n main channel             Central difference  0.001  0.02770  16531.0000
2.   n left overbank            Forward difference  0.001  0.05500  134.2500
3.   n right overbank           Forward difference  0.001  0.03389  46.7000
4.   Mae Lao Weir coefficient   Central difference  0.001  2.80000  1.5217
5.   Tham Wok Weir coefficient  Central difference  0.001  2.55000  0.7800

The Mae Lao River flow model is a one-dimensional unsteady flow model in which the
water surface varies in both time and space. The model produces the water surface at any
time and location in the calculation, and it gave relatively good results for all water years.
The hourly water levels in every subsequent water year were underestimated relative to
observations during November and December, which suggests that in the dry season some
lateral water, such as groundwater, enters the system (see Figures 2-3). This study does not
include lateral flow in the model; it computes outflow from inflow only.
Figure 2. Observed and modelled hourly water levels, water year 2004, G.10 station

Figure 3. Observed and modelled hourly water levels at the G.10 station: (a) water year 2005, (b) water year 2006, (c) water year 2007, (d) water year 2008
5. CONCLUSION
Before calibration, the parameter sensitivities are 5010.80, 1865.70, 271.33, 1.40, and 1.35
for n main channel, n left overbank, n right overbank, Mae Lao Weir coefficient, and
Tham Wok Weir coefficient, respectively. After calibration, the order of the parameter
sensitivities does not change, and their values are 16531.00, 134.25, 46.70, 1.52, and 0.78,
respectively.
The value of the sum of squared error (SSE) before calibration was 624.40 m² (water year
2004), and the values of SSE after calibration were 612.42 m², 1263.93 m², 934.30 m², 469.67
m² and 687.58 m² for water years 2004, 2005, 2006, 2007 and 2008, respectively.
The Mae Lao River flow model can calculate the water surface elevation along the river,
including the floodplain, from the initial water surface level (G.10 station) to the end of the
calculation reach (G.8 station). The model can show the water surface elevation from the
initial condition but cannot predict the water surface elevation in the future.
In future work, we will develop the Mae Lao River model to confirm that the underestimation
is caused by groundwater, divide the Mae Lao River into several intervals and calibrate each
interval for better accuracy, and apply the model to the whole of the Mae Lao River.
REFERENCES
1. Glossary of Meteorology (June 2000). Flood. Retrieved on 2009-01-09.
2. Brunner, G.W., HEC-RAS River Analysis System Hydraulic Reference Manual, US Army
Corps of Engineers, Hydrologic Engineering Center, Davis, CA., 2002, 2-22, 8-10.
3. Brunner, G.W., HEC-RAS River Analysis System Users Manual, US Army Corps of
Engineers, Hydrologic Engineering Center, Davis, CA., 2008.
4. Remo, J.W.F. and Pinter, N., Retro-modeling the Middle Mississippi River, Journal of
Hydrology, 2007, 337, 421-435.
5. Poeter, E.P. and Hill, M.C., UCODE, a computer code for universal inverse modeling,
Computers & Geosciences, 1999, 25, 457-462.
6. Kleinbaum, D.G., Kupper, L.L., Nizam, A. and Muller, K.E. Applied Regression Analysis
and Other Multivariable Methods, Fourth edition, Thomson, Canada, 2008, 395.
7. Chow, V.T., Open-Channel Hydraulics, McGraw-Hill, Tokyo, 1959, 112-113.
ACKNOWLEDGMENTS
The author would like to gratefully acknowledge the Royal Irrigation Department for the
geometry data of the Mae Lao River, and the Hydrology and Water Management Center for
Upper Northern Region (RID) for the water surface elevation data at the G.8 and G.10
stations in water years 2004-2008.
G00128
Development of Real-Time Short-Term Traffic Congestion
Prediction Method

K. Hi-ri-o-tappa^{1,A}, S. Pan-ngum^{1,B}, S. Narupiti^{2}, W. Pattara-Atikom^{3}

^{1} Department of Computer Engineering, Chulalongkorn University, Bangkok 10330, Thailand
^{2} Department of Civil Engineering, Chulalongkorn University, Bangkok 10330, Thailand
^{3} National Electronics and Computer Technology Center, Pathumthani 12120, Thailand
^{A} E-mail: kittipong.h@gmail.com, ^{B} E-mail: Setha.P@chula.ac.th


ABSTRACT
The objective of this study is to develop a congestion-precursor algorithm that works
with the traffic data available in Thailand. Processing real-time data to estimate the
short-term traffic characteristics that lead to congestion is one of the most interesting
features of this study; however, the quantity and quality of the data is a constraint, as it
is for other researchers conducting similar work. To overcome this constraint, we apply
time-series pattern matching with a dynamic time warping approach in order to classify
congested traffic and normal traffic automatically. Unlike traditional artificial-intelligence
algorithms, which require a large training data set and/or prior knowledge to achieve
satisfactory accuracy, the dynamic time warping approach requires only a small training
data set yet yields high accuracy, and it is well suited to classifying very complex time
series, as demonstrated by its high accuracy in speech recognition. The results show that
dynamic time warping has the potential to identify traffic conditions that lead to
congestion; however, noise in the raw data, affecting both training and classification, is
the main limitation of the algorithm, causing singularities that lead to misclassification
and lower accuracy. Moreover, insufficient data limited the scope of this study to a single
precursor. These issues are being addressed in ongoing studies with more extensive data.

Key Words: Congestion, Precursor, Traffic flow, Dynamic time warping



1. INTRODUCTION
With the limitation of roadway infrastructure construction, the continuing increase
in the number of vehicles causes heavier traffic congestion on many highways, especially
during rush hours in Bangkok, as in other large metropolises worldwide. According to
Nachaivieng T., the public motoring cost of traffic congestion delay is worth billions of
baht, which has made congestion management a major research topic in transportation
studies. There are numerous research algorithms for detecting the occurrence of congestion.
Most of them have focused on artificial intelligence analyzing huge historical data sets in
order to understand the behavior of traffic congestion. However, in Thailand, the installed
sensors and supporting infrastructure available to researchers do not collect enough data to
process in real time. This paper presents a method for predicting congested highway
conditions that can be trained on a small amount of traffic data from CCTV cameras. In this
paper, we focus primarily on recurrent traffic congestion and will extend to non-recurrent
traffic congestion in future work. Preliminary results based on three months of offline data
show that the new algorithms are effective, but this has to be confirmed by further field trials.
We have organized the rest of this paper as follows. In Section 2, we review the
related literature. We review background material, introduce the meaning of a precursor and
the structure of the proposed model in Section 3. Section 4 suggests the methods to determine
congestion precursors in the model using real traffic flow data and evaluates the performance
of the model. Finally, we offer some conclusions and recommend future work in Section 5.



2. THEORY AND RELATED WORKS
2.1 Issues on Traffic Congestion
In transportation studies, traffic congestion generally refers to an excess of
traffic demand on a portion of roadway at a particular time, resulting in speeds slower than
normal, and is typically categorized as either recurrent or non-recurrent.
Recurrent congestion is a consequence of fluctuations in demand greater than the
technical maximum throughput capacity of the roadway. When roadways are operated at or
near their maximum capacity, small changes in available capacity due to such factors as
differential vehicle speeds, lane changes, and acceleration and deceleration cycles can trigger
a sudden switch from flowing to stop-and-go traffic. This type of congestion occurs periodically
and might be predictable on the transportation system, for example around 8:00 and 17:00 on
weekdays, holidays or weekend festivals. However, even recurrent congestion can display a
large degree of randomness, especially in its duration and severity.
Non-recurrent congestion, usually called an incident, is the effect of an unexpected or
unplanned occurrence and depends on its nature, being caused by rain, accidents, and other
random events, such as inclement weather and debris. This type of congestion affects parts of
the transportation system less than recurrent congestion, but it occurs randomly and, as such,
cannot be easily predicted. The share of non-recurrent congestion varies from road network to
road network and is linked to the presence and effectiveness of incident response strategies,
roadwork scheduling and prevailing atmospheric conditions (snow, rain, dust, etc.).



Figure 1. The tree of congestion

2.2 Data Mining with Artificial Intelligence in Time Series
A time-series database consists of sequences of values or events obtained
continuously over repeated measurements of time. The values are typically measured at equal
time intervals (e.g., hourly, daily, weekly). Time-series databases are popular in a wide range
of applications, such as stock market analysis, engineering and traffic data. A time-series
database is also a sequence database. However, a sequence database is any database that
consists of sequences of ordered events, with or without concrete notions of time. For
example, Web page traversal sequences and customer shopping transaction sequences are
sequence data, but they may not be time-series data. Figure 2 shows a one-day example time
series of traffic data on the Khlonglhong highway from 6:00 (60000) to 18:00 (180000).

2.3 Review of the Most Famous Traffic Congestion Detection Algorithms
Traffic congestion detection algorithms have been studied extensively over the past
decade. They provide important basic algorithms that may be incorporated into prediction
methods. Many algorithms have been developed with different principles, complexity
and implementation methods; however, most of them can only operate on trained
roadway data, and their accuracy and reliability are heavily affected by the quality of the data.
In this section, we focus on algorithm development and review only the groups of
algorithms that yield good accuracy and flexibility based on their principles. These consist of
the statistical, artificial intelligence and time series algorithms.
The statistical algorithm is the oldest approach but yields satisfactory results.
It uses standard statistical theory to determine whether observed detector data differ
statistically from estimated or predicted traffic characteristics. The standard normal deviate
(SND) algorithm (Dudek et al., 1974) and the Bayesian algorithm (Levin and Krause, 1978;
Tsai and Case, 1979) are two representative types of statistical traffic congestion detection
algorithms that show impressive results on simulated and preprocessed data. However, the
quality of the raw data matters most: in field implementations, performance degrades
catastrophically due to noise and uncontrolled parameters.
The time series algorithm is the next generation of the statistical algorithm and
assumes that traffic normally follows a predictable pattern over time. It employs time series
models to predict normal traffic conditions and detects incidents when detector measurements
deviate significantly from the model outputs. Several different techniques have been used to
predict time-dependent traffic for congestion detection, including the autoregressive
integrated moving-average (ARIMA) model (Ahmed and Cook, 1977, 1980, 1982) and the
high occupancy (HIOCC) algorithm (Collins et al., 1979).
The artificial intelligence algorithm is the newest approach, arising with the age of
computerized decision making. It usually refers to a set of procedures that apply inexact or
black-box reasoning and uncertainty in complex decision-making and data-analysis processes.
The artificial intelligence techniques applied in automatic congestion detection systems
include neural networks (Ritchie and Cheu, 1993; Cheu and Ritchie, 1995; Stephanedes and
Liu, 1995; Dia and Rose, 1997; Abdulhai and Ritchie, 1999; Adeli and Samant, 2000), fuzzy
logic (Chang and Wang, 1994; Lin and Chang, 1998), and a combination of these two
techniques (Hsiao et al., 1994; Ishak and Al-Deek, 1998).

2.4 The Development of Traffic Congestion Prediction Methods
Recently, researchers have become increasingly interested in traffic congestion
prediction. The lack of traffic observation infrastructure and technology limitations slowed
development in the beginning, and few papers were published. This type of study examines a
number of traffic flow characteristics that potentially precede congested traffic. These
characteristics, observed prior to the occurrence of congestion, are usually called "congestion
precursors". Many modeling approaches have been proposed to help predict recurrent and
particularly incident (non-recurrent) traffic congestion. Among all types of incident,
accident prediction is the most interesting to researchers because of its economic and human-life
impacts. Recent studies of precursors are summarized below.
Golob and Recker classified flow conditions on an urban freeway into mutually exclusive
regimes that differ in terms of the likelihood of crashes by type, by applying nonlinear
canonical correlation with cluster analysis. Although they did not conduct a full-scale
validation of their modeling approach, they did find that accurate estimation of crash rates
heavily depends on the quality of loop data.
Oh et al. applied a nonparametric Bayesian classification method with developed
probability density functions in order to predict real-time accident likelihood using loop
detector data from a freeway section in California. The results indicate that the standard
deviation of speed 5 minutes prior to crash occurrence is the best indicator distinguishing
conditions leading to crashes from normal conditions, and that reducing the variation in
speed generally reduces the likelihood of freeway crashes. This study shows an innovative
approach; however, there are some limitations. First, only one parameter (the standard
deviation of speed) is used to evaluate the traffic characteristics leading to an accident, but
accidents normally occur as a result of a complex interaction of many factors; a single
parameter cannot sufficiently explain the broad spectrum of pre-crash conditions. Second,
the measure of crash likelihood estimated from the probability density function overlooks
such exposures as volume, distance of travel and so on. To control for these external
conditions, the variation of exposures over space and time must be taken into account in the
probability density function. Third, the method is still inefficient and produces a significant
rate of false alarms.
Lee et al. proposed a probabilistic incident prediction model using 13 months of
loop detector data from an urban freeway in Toronto. However, the model also displays some
limitations. First, the determination of precursor factors is subjective: the model assumes
that the traffic factors 5 minutes prior to crash occurrence are the important ones, although
it is uncertain whether 5 minutes is the most desirable observation period. Second, the
model uses a number of categorical variables, but the study does not clearly explain how to
choose the optimal number of categories and the boundary values of each category. Finally,
the study fails to show that the model is robust to the categorization of precursors, and that
its performance is not sensitive to boundary values that are determined subjectively. This
paper addresses these issues.
Madanat et al. developed binary logit models for predicting the likelihood of two
freeway incident types, accidents and overheating vehicles. They considered both loop and
environment data in their model development. They found the peak period, temperature,
rain, speed variance, and merge sections to be significant predictors of overheating incidents.
For the crash prediction model, only three variables were found to be statistically significant:
rain, merge section, and visibility.
Figure 2 summarizes the precursors reviewed above as signal pulses and their locations
in the traffic time series.

Figure 2. Summary of precursor models from the literature review

2.5 Dynamic Time Warping Algorithm
There has been much research developing data mining algorithms to compute
similarity between time series. Most of it uses Euclidean distance with some distortion or
extension. However, the main disadvantage of Euclidean distance is its oversensitivity to
small distortions in the time axis, which makes its results very unintuitive. To overcome this
problem, a novel algorithm was proposed in 1996 by Berndt and Clifford. Recently, dynamic
time warping (DTW) has become widespread and is used extensively in computer science,
especially in speech recognition; however, it had not previously been used to process
transportation data. To understand the algorithm, we first introduce the basics of dynamic
time warping, followed by its application to congestion matching in traffic time series.


Figure 3. Sample of DTW algorithm (J. Keogh, 2001)

Figure 3 shows that, using DTW, two identical sequences (a) will clearly produce a one-to-one
alignment (b). However, if we slightly change a local feature, in this case the depth of a
valley (c), DTW attempts to explain the difference in terms of the time axis and produces two
singularities (d).
According to Keogh and Pazzani, assume we are given two time series Q and C,
of length n and m respectively, where:

Q = q_1, q_2, \ldots, q_i, \ldots, q_n \qquad (1)

C = c_1, c_2, \ldots, c_j, \ldots, c_m \qquad (2)

To align two sequences using DTW we construct an n-by-m matrix where the (i-th, j-th)
element of the matrix contains the distance d(q_i, c_j) between the two points q_i and c_j
(typically the Euclidean distance is used, so d(q_i, c_j) = (q_i - c_j)^2). Each matrix element
(i, j) corresponds to the alignment between the points q_i and c_j. This is illustrated in
Figure 4. A warping path W is a contiguous (in the sense stated below) set of matrix elements
that defines a mapping between Q and C. The k-th element of W is defined as w_k = (i, j)_k,
so we have:

W = w_1, w_2, \ldots, w_k, \ldots, w_K, \qquad \max(m, n) \le K < m + n - 1 \qquad (3)

The warping path is typically subject to several constraints to prevent singularities
(Sakoe & Chiba, 1978). We briefly review them here.
1. Boundary conditions: w_1 = (1, 1) and w_K = (m, n). Simply stated, this requires the
warping path to start and finish in diagonally opposite corner cells of the matrix.
2. Continuity: given w_k = (a, b), then w_{k-1} = (a', b') where a - a' \le 1 and
b - b' \le 1. This restricts the allowable steps in the warping path to adjacent cells
(including diagonally adjacent cells).
3. Monotonicity: given w_k = (a, b), then w_{k-1} = (a', b') where a - a' \ge 0 and
b - b' \ge 0. This forces the points in W to be monotonically spaced in time.
There are exponentially many warping paths that satisfy the above conditions;
however, we are interested only in the path which minimizes the warping cost:

DTW(Q, C) = \min\left\{ \frac{\sqrt{\sum_{k=1}^{K} w_k}}{K} \right\} \qquad (4)

The K in the denominator is used to compensate for the fact that warping paths may
have different lengths. This path can be found very efficiently using dynamic programming to
evaluate the following recurrence, which defines the cumulative distance \gamma(i, j) as the
distance d(q_i, c_j) found in the current cell plus the minimum of the cumulative distances of
the adjacent elements:

\gamma(i, j) = d(q_i, c_j) + \min\{\gamma(i-1, j-1),\ \gamma(i-1, j),\ \gamma(i, j-1)\} \qquad (5)


Figure 4. An example of a warping path.



3. EXPERIMENTAL
In this section, we begin with the data source and its application in this research,
followed by the details of the research methodology.

3.1 Data Description
In this study, we use both real-world data and a simulation of it to prove our
proposed algorithm without additional (uncontrolled and undesirable) noise. Beginning
with the real-world data, we selected traffic data for a 10-kilometer-long stretch of the
Khonglhong highway from 158 closed-circuit television (CCTV) cameras installed around the
Bangkok metropolitan area, comprising 12 active and 146 inactive cameras (subject to change)
at the date of writing. The data were extracted by an embedded system that includes an
image-processing algorithm monitoring the vehicles entering and leaving the camera view. The
image-processing algorithm uses two reference lines to detect the vehicles entering and
leaving at each camera, recording the time at which each vehicle crosses the entry and exit
reference lines. The speed can be calculated by the physical formula: the real distance
between the entry and exit reference lines divided by the time the vehicle travels between
them. For example, in the Bangkok direction, assume a vehicle enters the reference line at
6:00:00 a.m. and leaves at 6:00:04 a.m.; with an assumed distance of 100 meters between the
two reference lines, the average speed of this vehicle is 90 km/h. Traffic flow can be
calculated by counting the number of vehicles that pass through the section in a given time
interval. The algorithm was verified by Kantip Kiratiratanapruk and Supakorn Siddhichai
(Kiratiratanapruk et al., 2006) with good accuracy of the data output.
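The speed and flow computations described above can be summarized in a short Python sketch; the distance and timestamps follow the worked example, while the flow helper is an illustrative assumption.

    # Speed from the two reference lines: distance / travel time, in km/h.
    def vehicle_speed_kmh(distance_m, t_enter_s, t_leave_s):
        return distance_m / (t_leave_s - t_enter_s) * 3.6

    # Flow: vehicles crossing the section within a given time window.
    def flow_count(crossing_times_s, start_s, end_s):
        return sum(start_s <= t < end_s for t in crossing_times_s)

    print(vehicle_speed_kmh(100.0, 0.0, 4.0))   # 90.0 km/h, as in the example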

The studied section of the Khonglhong highway is located in Pathumthani, Thailand,
and consists of three main lanes and two outer lanes per direction. The camera is installed
about 30 meters high on a PTT petrol station signboard. The data were collected from 6:00
to 18:00 over a six-month period from August 8, 2008 to February 10, 2009. Details of the
highway are shown below in Figure 5, where point A shows the position of the installed
CCTV, and Figure 6 shows an example image from the CCTV.

Figure 5. Section details of the Khonglhong highway

The congestion database was created by observers watching the CCTV monitor and
recording when congestion occurred. We divided the traffic database into two traffic
conditions: the normal traffic condition, a free-flow state, and the congested traffic
condition, in which the section average speed per direction falls below two standard
deviations of the free-flow speed. There were 64 congestion events during this period.
Figure 7 below shows examples of the two traffic conditions.
Figure 6. Example image from the CCTV (labels: Outbound, to Ayuttaya; Inbound, to Bangkok; CCTV point A; algorithm reference line)

Figure 7. Details of the dataset
3.2 Identification of Candidate Congestion Precursors
To determine the traffic congestion precursor, we first introduce the details of each
traffic precursor in transportation studies. A precursor is a parameter calculated from a time
series of traffic data. Variations in the parameter's value can indicate patterns leading to
congestion in the traffic flow. Recent research on incident prediction models has widely used
the concept of precursors for prediction.
Traffic speed has traditionally been a precursor of interest to many researchers because
it relates statistically to traffic congestion, and it has been the most statistically
significant precursor. However, it is a very sensitive precursor, and the short aggregation
time before congestion places some restrictions on using speed as a precursor in real-time
prediction systems.
Traffic flow, the vehicle count per unit of time, which is a shorter-time aggregation of
volume, is another precursor that has been used by several researchers to predict the recurrent
traffic congestion rate. Models involving hourly flow have indicated a definite correlation
between hourly flow and accident rate; in the work of Hiselius, an increasing accident rate
with hourly flow is indicated. Segregating the hourly flow rate by vehicle type, Hiselius also
observed a constant increase in accident rate with hourly flow for cars, but a decreasing rate
with hourly flow for trucks. Another study also affirms that hourly flow provides a better
understanding of interactions such as incidents; however, there has been no elaborate work on
hourly flow as a precursor in a convenient prediction model for real-time applications.
Time headway has been tried as a causal precursor. Research shows that shorter
headways have been a cause of collisions. However, again, there has been no convincingly
explanatory model for the use of this precursor in real-time incident prediction systems.
Owing to the characteristics of the Khonglhong camera, which processes data for all
lanes in each direction simultaneously, per-lane headway data cannot be extracted. Therefore,
we use the two available indicators, speed and flow, to represent the obvious difference
between normal traffic conditions and congested traffic conditions as defined above.
We select the best indicator by applying a statistical t-test based on the
statistical difference between two data sets. Three candidates are considered as indicators:
flow, mean speed and standard deviation of speed. The data set is composed of 64 congestion
samples, with the rest being non-congestion samples. For the different experiments we
performed a stratified random split of the data into training and test sets with a train/test
ratio of 60:40 (3:2), resulting in 39 congestion samples for training and 25 congestion
samples for evaluating the effectiveness of the algorithm. To determine the best indicator, a
new database containing the calculated values of mean speed, standard deviation of speed and
minutely flow count was built, supporting robust and automatic calculation. Each congestion
candidate in the training data set was compared with the non-congestion samples sharing the
most likely external and internal factors, such as 15 minutes before and after the congestion
occurred that day, and the same time of day on nearby days and weeks. The worst case,
i.e. the lowest calculated t-statistic of all cases, was selected. For example, a congestion
candidate occurring on Friday January 9, 2009 at 17:00 and finishing at 18:00 would be
compared with the following non-congestion training cases:
Case a. Friday January 9, 2009 at 16:45 (15 minutes before)
Case b. Friday January 9, 2009 at 18:15 (15 minutes after)
Case c. Thursday January 8, 2009 at 17:00 (1 day before, same time)
Case d. Saturday January 10, 2009 at 17:00 (1 day after, same time)
Case e. Friday January 2, 2009 at 17:00 (1 week before)
Case f. Friday January 16, 2009 at 17:00 (1 week after)

In addition, assume the calculated t-statistics for each case are as given in Table 1.


Table 1. Assumed t-statistics for each case

Case    a    b    c    d    e    f
t-test  2.8  2.9  3.5  3.7  6.2  3.8

If the calculated t-statistics show that case a is the worst case, Friday
January 9, 2009 at 16:45 will be selected. Because of the lack of congestion data, each
selected non-congestion case is used twice in the training data set, so that we have 39
congestion samples and 78 non-congestion samples for training the algorithm.
The most statistically significant candidate is then selected as the indicator
representing the change of traffic conditions. Table 2 summarizes the t-test results.


Table 2. t-test analysis to find the best precursor

Candidate interval  Speed average  Speed standard deviation  Flow average
1-min               4.93           3.77                      -0.83
2-min               3.39           2.31                      -0.63
3-min               2.30           2.01                      -0.63
5-min               2.57           1.99                      -0.43

We found that the most significant candidate was the 1-minute average speed, followed by
the 1-minute standard deviation of speed, while flow was statistically insignificant. For
simplicity, we therefore chose speed and its standard deviation as indicators to distinguish
congested traffic conditions leading to an accident from normal traffic conditions.
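This screening step can be sketched in a few lines of Python with SciPy's two-sample t-test; the arrays below are illustrative placeholders, not the study's data.

    from scipy import stats

    # Candidate indicator values for matched congestion / non-congestion cases.
    congested = [22.0, 25.5, 19.8, 28.1]     # e.g. 1-min mean speeds (km/h)
    normal = [61.2, 58.7, 66.0, 63.4]

    t_stat, p_value = stats.ttest_ind(congested, normal)
    print(t_stat, p_value)   # a larger |t| marks a more discriminative candidate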

3.3 Trigger Configuration for the Algorithm Decision
To conserve processor consumption, we generate a trigger to run the algorithm. For each
precursor, the mean and standard deviation under normal traffic are calculated, and a
deviation of one standard deviation from the mean is considered the trigger of the algorithm,
on statistical grounds.



Figure 8. Trigger of the speed algorithm

In Figure 8, T1 is the one-standard-deviation threshold that signals DTW to run, and T2
is the inverse of T1, signalling DTW to stop.



Figure 9. Trigger of the standard deviation algorithm

Figure 9 shows the case of the standard deviation (SD) precursor. The same approach is
used: T1 is one standard deviation from the mean of the SD precursor, signalling DTW to run
or stop depending on the direction of the crossing; the algorithm is either waiting for the
trigger to run, or running and waiting for the trigger to stop.
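A minimal sketch of this trigger logic is given below; the threshold handling (start DTW when the speed precursor falls one standard deviation below its normal-traffic mean, stop when it recovers) is an illustrative reading of T1 and T2, not the exact implementation.

    # Trigger for the speed precursor: run DTW only while the value stays
    # below the T1 threshold (normal mean minus one standard deviation).
    def make_trigger(normal_mean, normal_sd):
        t1 = normal_mean - normal_sd
        running = False
        def step(value):
            nonlocal running
            if not running and value < t1:    # T1 crossed: start DTW
                running = True
            elif running and value >= t1:     # crossed back (T2): stop DTW
                running = False
            return running
        return step

    trigger = make_trigger(normal_mean=60.0, normal_sd=8.0)
    print([trigger(v) for v in (65, 50, 48, 61)])   # [False, True, True, False]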
3.4 Training Algorithm Using Classic Dynamic Time Warping
As described in Section 3.2, with a train/test ratio of 60:40, we have 39 of the 64
congestion samples and 78 non-congestion samples for training the DTW algorithm, and 25
congestion samples for evaluating its effectiveness.
In the case of speed, we import the congestion database as speed time series into
memory and then generate moving windows to capture the speed characteristics while
congestion occurs. The length of the moving window is 15 minutes, which derives from our
objective of predicting congestion 15 minutes before it actually occurs.

Figure 10. Details of the training algorithm

The moving window captures all 39 congestion occurrences to form the precursor
representation. The standard deviation is captured with the same approach.

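The 15-minute moving-window capture can be sketched as follows, assuming 1-minute samples so that a window is 15 points ending at each labelled congestion onset; the toy series is a placeholder.

    # Capture the 15 minutes of precursor values preceding a congestion onset.
    def capture_window(series, onset_index, window=15):
        start = max(0, onset_index - window)
        return series[start:onset_index]

    speeds = [60] * 30 + [35] * 10            # toy 1-min speed series with a drop
    template = capture_window(speeds, onset_index=30)
    print(len(template))                      # 15 points before the drop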
3.5 Training Algorithm Using Modified-Boundary Dynamic Time Warping
Recent developments significantly speed up the DTW calculation with a lower-bounding
technique based on the warping envelope (Keogh, E. 2002).



Figure 10. Details of the lower boundary (Ratanamahatana, C., Keogh, E. 2002)

Figure 10 shows the two types of boundary constraint most frequently used as global
constraints in the literature. On the left of the figure is the Sakoe-Chiba band and on the
right is the Itakura parallelogram, which is more widely used than the Sakoe-Chiba band in
the speech community.
According to Ratanamahatana C. and Keogh E., the lower bound is defined only for
sequences of the same length; therefore, if the sequences are of different lengths, one of
them must be interpolated. The lower-bounding technique uses the warping-window matrix to
create a bounding envelope above and below the query time series. Thus, when applying the
boundary constraint, the calculation of the shortest path is reduced to the possible paths
that fall within the bounding envelope. The technique is illustrated in Figure 11.
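To make the envelope idea concrete, the following Python sketch computes an LB_Keogh-style lower bound for two same-length sequences within a band of radius r; it is an illustration of the technique, not the authors' code.

    import math

    # Lower bound: distance from C to the envelope built around query Q.
    def lb_keogh(Q, C, r):
        total = 0.0
        for j, c in enumerate(C):
            window = Q[max(0, j - r):j + r + 1]
            lo, hi = min(window), max(window)   # envelope at position j
            if c > hi:
                total += (c - hi) ** 2
            elif c < lo:
                total += (c - lo) ** 2
        return math.sqrt(total)

    print(lb_keogh([1, 2, 3, 2, 1], [1, 3, 3, 2, 1], r=1))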



Figure 11. The Sakoe-Chiba band (A) can be used to create an envelope (B) around a query
sequence Q. The Euclidean distance between any candidate sequence C and the closest external
part of the envelope (C) is a lower bound for the DTW distance (Ratanamahatana C. and
Keogh E.).

Because of the good results of the Itakura parallelogram in speech recognition, we use this
type of boundary for our traffic time series.


Figure 12. Example of classification with the boundary constraint.



4. RESULTS AND DISCUSSION
We have 39 congestion samples serving as precursor representatives. To evaluate
DTW, we use the remaining 25 congestion samples (not used for training) to measure the
effectiveness of the algorithm.
For classic DTW, without tuning any parameter of the algorithm, the preliminary
experiments show impressive results on both simulated and real-world data:
- For the speed precursor, the preliminary result shows 100% accuracy in simulation and
80% accuracy on real data (20 of the 25 congestion samples are classified as congestion
correctly), with a mean time to detect (MTTD) of 558 seconds.
- For the standard-deviation-of-speed precursor, the preliminary result shows that 18 of
the 25 congestion samples are classified as congestion correctly, a performance of 72%
accuracy (100% in simulation), with an MTTD of 752 seconds.
For the lower-boundary modification of the DTW distance measurement, the preliminary
experiments show that its accuracy is approximately the same as classic DTW:
- For the speed precursor, the preliminary result shows 100% accuracy in simulation and
80% accuracy on real data (20 of the 25 congestion samples classified correctly), with an
MTTD of 320 seconds.
- For the standard-deviation-of-speed precursor, the preliminary result shows that 17 of
the 25 congestion samples are classified as congestion correctly, a performance of 72%
accuracy (100% in simulation), with an MTTD of 400 seconds.


5. CONCLUSION
We propose a congestion prediction algorithm and assess microscopic traffic
variables by utilizing real-world data from CCTV image processing, and we simulate these
data in order to obtain data without noise or the effects of uncontrolled and undesirable
parameters. The technique used in this study is dynamic time warping, in both its classic
and lower-boundary-modified forms.
Simulation shows that:
- The proposed algorithm performs well compared to a well-known benchmark algorithm in an
ideal setting.
- Relative speed statistics are good indicators of congestion.
- The algorithm is robust when information is available near the congestion occurrence.
Results from real-world data show that:
- The proposed algorithm can use standard-deviation-of-speed statistics to detect most
congestion cases.
- The few missed cases occurred under high vehicle volume.
- The lower-boundary modification of the distance measurement is statistically insignificant
for improving the algorithm's accuracy; however, it speeds up the processing time of the
algorithm.


REFERENCES
1. Anderson, IB, K.M. Bauer, D.W. Harwood and K. Fitzpatrick. Relationship to safety of
geometric design consistency measures for rural two-lane highways. In Transportation
Research Record 1658,TRB, National Research Council, Washington, D.C., 1999, pp.
43-51
2. Balke, K.N. (1993). An evaluation of existing incident detection algorithms. Research
Report, FHWA/TX-93/1232-20, Texas Transportation Institute, the Texas A&M
University System, College Station, TX, November 1993.
3. Cedar, A., and M. Livneh (1982). Relationships between Road Accidents and Hourly
Traffic Flow I. Accident Analysis and Prevention, Vol. 14, pp. 19-34.
4. Chang, E.C.P., 1992. A neural network approach to freeway incident detection. In: The 3rd
International Conference on Vehicle Navigation and Information Systems (VNIS), pp.
641-647.
5. Chassiakos, A.P., Stephanedes, Y.J. Smoothing algorithms for incident detection.
Transportation Research Record 1394, 9-16, 1993.
6. Cheu, R.L., Srinivasan, D., Teh, E.T., 2003. Support vector machine models for freeway
incident detection. In: Proceedings of Intelligent Transportation Systems, 1, pp. 238-243.
7. Gribe, P. (2003). Accident Prediction Models for Urban Roads. Accident Analysis and
Prevention, Vol. 35, pp. 273-285.
8. Itakura, F. Minimum Prediction Residual Principle Applied to Speech Recognition. IEEE
Trans. Acoustics, Speech, and Signal Proc. vol. ASSP-23, pp 52-72, 1975.
9. Jin, X., Srinivasan, D., Cheu, R.L., 2001. Classification of freeway traffic patterns for
incident detection using constructive probabilistic neural networks. IEEE Transaction on
Neural Networks 12 (5), 1173-1187.
10.Kiratiratanapruk K. and Siddhichai S. Vehicle Detection and Tracking for Traffic
Monitoring System. TENCON 2006, IEEE, 2006
11.Kirchsteiger, C. Impact of Accident Precursors on Risk Estimates from Accident
Databases. Journal of Loss Prev. Process Ind., Vol. 10, No. 3, 1997, pp. 159-167.
12.Knuiman, Matthew W., F.M. Council, and D.W. Reinfurt. Association of median width
and highway accident rates. Transportation Research Record 1401, TRB, National
Research Council, Washington, D.C., 1993, pp. 70-82.
13.Kockelman, K. M., and J. Ma (2004). Freeway Speeds and Speed Variations Preceding
Crashes, within and across Lanes. Presented at ITS America 2004, 14th Annual Meeting
and Exposition, San Antonio, Texas.
14.Krammes, R.A. and S.W. Glascock. Geometric inconsistencies and accident experience on
G00128
ANSCSE14 Mae Fah Luang University, Chiang Rai, Thailand
March 23-26, 2010
702
two-lane rural highways. Transportation Research Record 1356, TRB, National
Research Council,Washington, D.C., 1992, pp. 1-10
15.Lee, C., B. Hellinga, and F. Saccomanno (2003). Real-Time Crash Prediction Model for
Application to Crash Prevention in Freeway Traffic. Transportation Research Record
1840, TRB, National Research Council, Washington, D.C., pp. 67-77.
16.Lee, C., F. Saccomanno, and B. Hellinga (2002). Analysis of Crash Precursors on
Instrumented Freeways. Transportation Research Record 1784, TRB, National
Research Council, Washington, D.C., pp. 1-8.
17.Levin, Moshe., Krause, Gerianne. Incident Detection: A Bayesian Approach.
Transportation Research Record 682 (52-58), TRB, National Research Council,
Washington D.C., 1978.
18.Ma X. and Andrasson I. Driver Reaction Time Estimation from Real Car-following Data
and Application in GM-Type Model Evaluation. Transportation Research Board 85th
Annual Meeting (2006) : pp. 1-19.
19.Nachaiwieng, T. Economic valuation of traffic congestion costs in Bangkok Final
Thesis Report, Chulalongkorn University, 2001
20.Oh, C., J. Oh, S. Ritchie, and M. Chang. Real-Time Estimation of Freeway Accident
Likelihood. 80th Annual Meeting of Transportation Research Board, Washington,
D.C., 2001.
21.Oh, J.-S., C. Oh, S. G. Ritchie, and M. Chang (2005). Real-Time Estimation of Accident
Likelihood for Safety Enhancement. Journal of Transportation Engineering, ASCE,
Vol. 131, No. 5, pp. 358-363.
22.Payne, H.J. and Tignor, S.C. (1978). Freeway incident-detection algorithms based on
decision trees with states. Transportation Research Record, No. 682, TRB, National
Research Council, pp. 30-37.
23.Ratanamahatana, C. & E. Keogh. Making Time-series Classification More Accurate
Using Learned Constraints. Proc of SIAM Intl. Conf. on Data Mining, pp. 11-22. Lake
Buena Vista, Florida, 2004.
24. Duda, Richard O., Pattern Classification, 1997, McGraw-Hill.
25.Ritchie, S.G., Abdulhai, B., 1997. Development testing and evaluation of advanced
techniques for freeway incident detection, California PATH Working Paper, UCB-ITS-
PWP-97-22, pp. 137.
26. Sakoe, H. & Chiba, S. (1978). "Dynamic programming algorithm optimization for spoken
word recognition." IEEE Trans. Acoustics, Speech, and Signal Proc., Vol. ASSP-26, pp.
43-49.
27.Squires, C.A., and P.S. Parsonson. Accident comparison of raised median and two-way
left-turn lane median treatment. Transportation Research Record 1239, TRB, National
Research Council,Washington, D.C., 1989, pp. 30-40
28.Xie, Chi (2005). A Complete Review of Incident Detection Algorithms & Their
Deployment: What Works and What Doesn't, The New England Transportation
Consortium, Project No. 00-7
29.Zegeer, C.V., D.W. Reinfurt, J. Hummer, L. Herf, and W. Hunter. Safety effects of cross-
section design for two-lane roads. Transportation Research Record 1195, TRB, National
Research Council, Washington, D.C., 1988, pp. 20-32
30.Zhang, K., Taylor, M.A.P., 2006. Effective arterial road incident detection: A Bayesian
network based algorithm. Transportation Research C 14, 403-417.


ACKNOWLEDGMENTS
This study was supported by the National Science and Technology Development Agency. The
Center for Transportation Studies, Department of Civil Engineering, Chulalongkorn
University is acknowledged for its support. The Network Technology Laboratory,
National Electronics and Computer Technology Center, cooperated in the study by providing
the necessary data.

G00129
Clustering of Search Results:
A Case Study of Thai-Language Web Pages

P. Sukriket^{1,A}, C. Sangchai^{1,B}, P. Saovapakhiran^{1,C}, A. Surarerks^{1,D},
and A. Rungsawang^{2,E}

^{1} International School of Engineering, Chulalongkorn University, Bangkok, 10330, Thailand
^{2} Computer Engineering, Faculty of Engineering, Kasetsart University, Bangkok, 10900, Thailand

E-mail: ^{A} parivat_nu@hotmail.com, ^{B} chanon18@hotmail.com, ^{C} gaia_ethos@hotmail.com,
^{D} athasit.s@chula.ac.th, ^{E} arnon@mikelab.net; Fax: 02-2186422 Tel: 02-2186422



ABSTRACT
In the field of Web search, several search-results clustering techniques have been
proposed to support different languages. However, clustering Thai-language web pages
still has many problems. For one, the machine-generated cluster labels are often not
meaningful to humans. In addition, some web pages are improperly assigned to those
cluster labels. In this paper, we propose a new framework for clustering the Thai-language
web page results returned from a search engine. The framework improves the quality of
Thai cluster labels in terms of their meaning and the properness of the assignment of the
resulting web pages to those labels. Preprocessing is done to address problems specific to
the characteristics of the Thai language and so improve the effectiveness of document
indexing. Cluster labels generated by the clustering engine are post-processed further to
yield more meaningful cluster labels. Our paper illustrates with a case study which uses a
dataset of approximately 100,000 web pages in the domains of Thai tourism and education.
The proposed framework is capable of generating cluster labels that are more meaningful
from the user's point of view. Also, improper web page assignments are substantially
reduced. Most cluster labels that are irrelevant or not meaningful are prevented from being
generated by applying noise reduction prior to clustering the search results. Web pages that
would have fallen into such clusters are then assigned to the remaining relevant clusters.

Keywords: Search Result Clustering, Clustering Framework, Search Engine, Thai-
language Web Page.



1. INTRODUCTION
At present, it can be said that all the information one wants to find is readily available on
the Internet. Internet users can find their desired information using several techniques, which
are provided in the form of a search engine's results. However, because of the enormous
amount of information, these results are sometimes irrelevant and often do not match the
user's expectations.
Therefore, the technique of search-result clustering has been proposed to improve the
effectiveness of information retrieval from the web. Up to now, numerous search-result
clustering frameworks have been developed to aid Internet users in finding more relevant and
accurate information. Nevertheless, these methods have been tuned for better efficiency in
searching web pages displayed in English. Little to no work has been done on Thai web page
clustering services.
Clustering web pages written in Thai poses many more problems than doing so in
English. One feature of English that makes clustering web pages easier is that it readily
divides a sentence into individual words using spaces. In addition, each sentence is separated
from the next by a full stop, or period (.). Thai, on the other hand, does not have these
features; hence there is a challenge in proposing a framework for search-result clustering in
Thai.
Thus, naively applying an existing search-result clustering framework that is typically
used with English web pages to cluster Thai web pages yields results with low effectiveness.
One problem is that the generated cluster labels tend not to be meaningful from the user's
point of view. In addition, due to inefficient word separation, some search results may be
clustered into incorrect categories. These problems prevent people who wish to seek
information from Thai web pages from utilizing search-result clustering technology.
Because of all the problems mentioned above, it is obvious that a higher-quality
search-result clustering framework for Thai web pages needs to be developed. In order to
do so, existing frameworks are modified.

2. THEORY AND RELATED WORKS
This project mainly involves text classification problems. In this domain, researchers
have successfully adopted several well-known machine learning techniques and algorithms
to solve text classification problems [1-4]. The algorithms mentioned below are used in this
project to solve the problem of Thai web page classification.
2.1. SUPPORT VECTOR MACHINE
Support vector machines (SVMs) are a set of related supervised learning methods used
for classification and regression. A support vector machine constructs a hyperplane or set of
hyperplanes in a high- or infinite-dimensional space, which can be used for classification,
regression or other tasks. Intuitively, a good separation is achieved by the hyperplane that has
the largest distance to the nearest training data points of any class (the so-called functional
margin), since in general the larger the margin, the lower the generalization error of the
classifier.
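As a small illustration of the maximum-margin idea (using scikit-learn on toy data, not the paper's pipeline):

    from sklearn.svm import SVC

    # Toy two-class data; a linear kernel fits a separating hyperplane
    # with the largest margin to the nearest points of each class.
    X = [[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [0.9, 1.1]]
    y = [0, 0, 1, 1]

    clf = SVC(kernel="linear")
    clf.fit(X, y)
    print(clf.predict([[0.1, 0.0], [1.0, 0.9]]))   # -> [0 1]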

2.2. SUFFIX TREE
A suffix tree is a data structure that presents the suffixes of a given string in a way that
allows a particularly fast implementation of many important string operations. The suffix
tree for a string S is a tree whose edges are labeled with strings, such that each suffix of S
corresponds to exactly one path from the tree's root to a leaf. It is thus a radix tree for the
suffixes of S.
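A minimal Python sketch of the idea, using an uncompressed suffix trie (a simple relative of the suffix tree) in which every suffix traces one root-to-end path; it is illustrative only:

    # Build a suffix trie: insert every suffix of s, ending with a "$" marker.
    def build_suffix_trie(s):
        root = {}
        for i in range(len(s)):
            node = root
            for ch in s[i:] + "$":
                node = node.setdefault(ch, {})
        return root

    # Any substring of s is a prefix of some suffix, so lookup is a walk.
    def contains(trie, pattern):
        node = trie
        for ch in pattern:
            if ch not in node:
                return False
            node = node[ch]
        return True

    print(contains(build_suffix_trie("banana"), "nan"))   # True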
In the domain of search-result clustering, much work has been done on improving the
efficiency of clustering search results. The following are some of the works our project
is interested in. These works were studied in detail in order to decide whether to adopt certain
parts of each work or not. Additionally, the strengths and weaknesses of each work had to be
identified. Many works share the same goal, and their methodologies often differ only
slightly. For our project, a decision had to be made as to which methodology best suits
our needs.

2.3. LINGO
Lingo [1] was proposed while working on the Carrot2 [5] system. Because it is a phrase-based
algorithm, it first tries to find a set of phrases in the input documents that would be
good as cluster descriptions (called abstract concepts), and then the method attempts to match
documents to the found descriptions and, in consequence, to the groups. Lingo is also based on
a vector space model, but it additionally uses other advanced techniques such as Singular Value
Decomposition (SVD) of the term-document matrix and dimensionality reduction using low-rank
approximation. As usual for description-based algorithms, Lingo creates very good,
informative cluster labels (however, they may be too specific and fit only some of the
documents of the group in some cases). Generated clusters can also contain overlapping
results. Its main disadvantage is that Lingo in its original version does not support hierarchical
clustering.
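The SVD step that Lingo relies on can be illustrated with NumPy on a toy term-document matrix; the counts and the choice of k are placeholders:

    import numpy as np

    # Rows are terms, columns are result snippets (toy counts).
    A = np.array([[2, 0, 1],
                  [1, 1, 0],
                  [0, 2, 2]], dtype=float)

    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    k = 2
    concepts = U[:, :k]     # top-k left singular vectors: "abstract concepts"
    print(concepts.shape)   # (3, 2)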

2.4. CARROT2
Carrot2 is an open-source search-results clustering engine developed by Stanisław
Osiński and Dawid Weiss. It can automatically organize (cluster) small collections of
documents, e.g. search results, into thematic categories. Apart from two specialized
document clustering algorithms, Carrot2 offers ready-to-use components for fetching search
results from various sources including the Yahoo API, Google API, MSN Live API, eTools Meta
Search, Lucene, SOLR, Google Desktop and more.

3. COMPUTATIONAL DETAILS
A modified framework composed of five parts, which are focused crawling,
preprocessing, indexing, clustering, and post-processing, is introduced. The preprocessing and
post-processing are added to improve the meaning of cluster labels and the correctness of the
assignment of web pages to the labels.

Figure 1. Process diagram

3.1. FOCUSED CRAWLING
The framework starts with the process of gathering a large number of websites that are
displayed in Thai and are in specific domains. For the project implementation, the domains of
education and tourism were chosen. This process is done by an application known as a Web
crawler.
Ordinary web crawling is done only in the forward direction. As the crawler crawls deeper,
the direction of crawling moves further away from the domains, and the topics of the results
consequently become more irrelevant. Therefore, in order to obtain more web pages that are
close to the seeds, we also search for web pages in the backward direction.
In our paper, a focused web crawler named Heritrix is used. Heritrix is open-source
software provided by the Internet Archive. It allows a number of configurations for
focusing on the web pages of interest; here, those are web pages displayed in Thai and
related to the domains of Thai education and tourism. The Google API is used to
implement the backward crawling.

3.2. PREPROCESSING
The Thai language has no spaces or full stops like English to punctuate words and
sentences. Consequently, Thai words cannot be separated and processed by the machine
directly. Word segmentation of the Thai text of the web pages is therefore required before
passing them to an indexer.
In our paper, crawled documents are not fetched directly to an indexer. These raw
documents are HTML files, which consist of HTML tags and other irrelevant material.
These contents are eliminated using an HTML parser named Jericho, open-source
software capable of managing HTML files and contents. Moreover, a specialized document
analyzer is used to tokenize terms in the documents. This analyzer is provided by
Lucene, the indexer that is used, described in detail later.
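For illustration, a common dictionary-based longest-matching segmenter is sketched below (the framework itself relies on a Lucene analyzer; the dictionary and the Latin-script input are placeholders for readability):

    # Greedy longest-matching word segmentation over a toy dictionary.
    def segment(text, dictionary, max_len=20):
        words, i = [], 0
        while i < len(text):
            for L in range(min(max_len, len(text) - i), 0, -1):
                if text[i:i + L] in dictionary or L == 1:
                    words.append(text[i:i + L])
                    i += L
                    break
        return words

    print(segment("searchresult", {"search", "result"}))   # ['search', 'result']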


3.3. INDEXING
An indexer performs indexing of the contents of the web pages referred to by
the URLs gathered by the crawler. The indexing process generates images, or partial
information, of the web pages for later use by a search engine.
In our paper, an open-source file indexer named Lucene is used. Processed web
documents from the previous parts are fed into the indexer, and an image file of the
indexed documents is created for later use when users provide an input query for searching.

3.4. NOISE REDUCTION
Before the clustering process, the returned search results enter a noise reduction stage.
This prevents irrelevant clusters from occurring, including those that may result from noise
in a web document.
In our paper, several web pages were found to consist of many frames, most of which do
not hold the sought content. Given the query from the user, we count the frequency of the
phrase in each frame of a document; the frequency in each frame indicates the frame's
importance. Frames with a low frequency of the term are taken out of the clustering process
and are not analyzed. This process hence improves clustering, which takes into account
only the essence of the given web documents.
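The frame-scoring idea can be sketched as follows; the frame texts, the query and the threshold are illustrative placeholders:

    # Keep only frames in which the query phrase occurs often enough.
    def filter_frames(frames, query, min_count=1):
        q = query.lower()
        return [text for text in frames if text.lower().count(q) >= min_count]

    frames = ["travel places in Chiang Rai ...", "login | share | menu"]
    print(filter_frames(frames, "travel"))   # drops the navigation frame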

3.5. CLUSTERING
After the search engine returns search results based on a keyword queried by the user,
those results are input into the clustering engine. The clustering engine clusters those
results using a selected clustering algorithm such as Lingo, and a label name is
generated and assigned to each cluster as a result. This framework focuses on making Thai
label names understandable by humans and on the proper assignment of web documents to
clusters.
4. RESULTS AND DISCUSSION
The framework has been deployed on a web server for online access at http://radiant.cpe.ku.ac.th:8081/carrot2-webapp-3.1.0/search and tested for the relevance and meaningfulness of the generated cluster labels. The following example shows how the proposed framework improves on the original.


Figure 2. Original search result
A set of predefined test queries was used in the examination. Figure 2 illustrates ordinary search-result clustering with the Carrot2 web application, which receives its results from another available online search engine. In this case the query term is a Thai phrase meaning Travel Location. From the result it can be seen that some of the clusters, for instance one whose Thai label means female and one reading skip to content (meaning skip to fun content), have no relevance and make no sense for the query term.





Figure 3. Improved search result


Figure 3 shows the web interface of the proposed framework. It is evident that removing the noisy information from the clustering computation yields an improvement in the results that users can perceive: the irrelevant clusters, including skip to content, no longer appear in the modified framework.

5. CONCLUSION
This paper proposes a framework for clustering the Thai-language web pages returned by a search engine. The framework improves the quality of Thai cluster labels in terms of their meaning and the properness of the assignment of the resulting web pages to those labels. The components of the framework that contribute to this improvement are the preprocessing and noise-reduction processes. A set of test queries was used to verify the improvement: irrelevant and meaningless cluster labels are significantly reduced, and most web pages are assigned to the proper cluster labels.



REFERENCES
1. Osinski, S., Stefanowski, J., Weiss, D., Lingo: Search Results Clustering Algorithm Based on Singular Value Decomposition, Proceedings of the International IIS: Intelligent Information Processing and Web Mining Conference, Advances in Soft Computing, Zakopane, Poland, Springer, 2004, 359-368.
2. Qi, X., Davison, B. D., Web Page Classification: Features and Algorithms, ACM Computing Surveys, 2009, 41(2), Article 12.
3. Leardtharatat, N., Kreesuradej, W., A New Synthesizing Cluster Labels Algorithm for Thai Web Search Results, 23rd International Technical Conference on Circuits/Systems, Computers and Communications, 2008.
4. Shen, D., Chen, Z., Yang, Q., Zeng, H., Zhang, B., Lu, Y., Ma, W., Web-page Classification through Summarization, Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, ACM Press, New York, 2004, 242-249.
5. Weiss, D., Carrot2: Design of a Flexible and Efficient Web Information Retrieval Framework, 3rd International Atlantic Web Intelligence Conference, Lodz, Poland, 2005, 439-444.

G00131
A Modified Version of Adaptive Arithmetic
Encoding Algorithm

A. Wiengpon^A and A. Surarerks^B

Computer Engineering, Faculty of Engineering, Chulalongkorn University, Bangkok, 10330, Thailand
E-mail: ^A g50awn@cp.eng.chula.ac.th, ^B athasit.s@chula.ac.th; Fax: 02-2186955; Tel: 02-21896959


ABSTRACT
In this paper, an improvement to the compression ratio of adaptive arithmetic coding is discussed. We propose a modified version of the adaptive arithmetic encoding algorithm based on the concept of unequal initial symbol probabilities. A dendrogram from hierarchical clustering is used to determine clusters of symbols on an English text-based sample dataset; the average probability in each cluster is computed, and the cluster probabilities are normalized by the lowest one. Three further techniques are applied. The first reduces the gap between the high-range cluster probabilities and the lowest one. Second, unused probabilities are eliminated; for instance, English text-based encoding can ignore the probabilities of ASCII symbols below 10 and above 127. Finally, a virtual partitioning concept is included in the encoding process. The text dataset used in this work, drawn from the Canterbury corpus, contains the Bible and nine ordinary novels. Our experimental results show smaller file sizes, with compression ratio improvements in the range 0.0830-1.2227% compared to the adaptive arithmetic encoding algorithm.

Keywords: Adaptive Arithmetic Encoding, Hierarchical Clustering, Compression
Algorithm.


1. INTRODUCTION
Encoding algorithms, also known as compression-decompression algorithms, are techniques for representing data in a new, smaller structure. Two schemes of encoding appear in the literature [1]: lossy and lossless compression. Lossy compression, e.g. wavelet compression, is usually used with multimedia files such as audio, video and images. Lossless compression can be separated into two categories, dictionary and statistical encodings. LZ77 [2] and LZ78 are examples of dictionary encoding, which replaces words with index references into a dictionary, whereas Huffman coding [3] and arithmetic encoding (AE) are popular examples of statistical encoding. Their concept is to assign codes to symbols so that code lengths match the probabilities of the symbols.
The performance of a compression algorithm is usually evaluated with measurements such as compression ratio [4-6] and coding and decoding time [7]. The compression ratio (CR) here is the ratio of the size of the original data to the size of the compressed data, as shown in (1); a higher compression ratio means a smaller compressed size. Lossy compression yields a higher CR than lossless compression because data beyond human perception is cut from the original. Dictionary encoding must allocate space to store reference data and a dictionary table of symbols; consequently, Kuroki et al. [5] proposed including this table space in the CR calculation, as illustrated by (1).

bitin bitin Orginal size
CR
Compressed size bitout bitout table
= = =
+
(1)
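
For concreteness, a minimal helper with illustrative numbers of our own choosing:

def compression_ratio(bitin, bitout, table=0):
    """CR as in (1): original size over compressed size plus dictionary table."""
    return bitin / (bitout + table)

# e.g. an 800,000-bit file compressed to 450,000 bits with a 10,000-bit table:
# compression_ratio(800_000, 450_000, 10_000) -> about 1.739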
There are many techniques for improving the CR. A text pre-processing model yields a greater CR when applied before encoding, although the pre-processing and post-processing time should also be considered [7]. Hybrid techniques [8-9], which combine several compression algorithms, also improve the CR. Estimation of symbol probabilities is an essential part that considerably affects the CR in statistical encoding techniques such as AE; proposed estimation techniques include assuming a specific distribution [5, 10] or a model such as the virtual sliding window [11].
AE can represent a data file as a real number in the interval [0, 1), subdividing the interval according to the symbol probabilities. An adaptive arithmetic encoder starts encoding with equal symbol probabilities and updates its probability model continuously; proper symbol probability estimation impacts the CR directly. In this paper, we propose a modification of the symbol probability technique used to encode a data file, and we compare the effectiveness of our algorithm with adaptive arithmetic encoding.
This paper is organized as follows: Section 2 describes the preliminaries of arithmetic encoding techniques and cluster analysis. Section 3 introduces our proposed algorithms for modifying adaptive arithmetic encoding. Experimental results are reported in Section 4. Finally, conclusions are drawn in Section 5.

2. PRELIMINARIES
There are three fundamental backgrounds: arithmetic encoding, incremental adaptive arithmetic encoding and cluster analysis.

2.1 ARITHMETIC ENCODING
Elias [12] proposed a new statistical encoding, called arithmetic encoding, in the 1960s. Later, many researchers developed AE into many variations [4, 13]. The studies in [4, 6] reported that arithmetic encoding generally outperformed Huffman and LZW coding in many cases. The main advantage of AE, however, is its flexibility: it can be applied to any model that works with symbol probabilities. It tends to be slow due to the high complexity of its arithmetic operations [13]. Symbol probabilities can be calculated directly from the input file; to obtain the exact probabilities, the input file must be scanned twice during encoding, but AE needs only one scan when the concept of an adaptive model is used. An adaptive arithmetic encoder (AAE) starts by setting the symbol probabilities equal and then recalculates them during the encoding process. Some studies have reported that AAE yields a better CR than AE [14-15].

2.2 INCREMENTAL ADAPTIVE ARITHMETIC ENCODING (IAAE)
There are two bottlenecks when implementing both AE and AAE [13, 16]. First, AAE produces its output code in the form of a real number, so in a program we need to declare a variable whose precision is sufficient to store that number. Second, AAE does not produce any output code until the last symbol of the data file has been read. Howard proposed solutions to these two difficulties by applying the concept of incremental coding to AAE [13]. An incremental adaptive arithmetic encoder emits the common leading binary bits of the interval [a, b)_2, where a and b denote the low and high ends of the range in binary, then shifts the in-process bits to the left and doubles the length of the current interval according to the shift.
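
A minimal Python sketch of this renormalization step, assuming fixed-width integer registers; the underflow case (an interval straddling the midpoint) of a full implementation is omitted for brevity.

PRECISION = 32
FULL = (1 << PRECISION) - 1
HALF = 1 << (PRECISION - 1)

def emit_common_bits(low, high, out_bits):
    """Emit the common leading bits of [low, high) and rescale the interval."""
    while True:
        if high < HALF:                      # both endpoints in the lower half: leading bit 0
            out_bits.append(0)
        elif low >= HALF:                    # both endpoints in the upper half: leading bit 1
            out_bits.append(1)
            low -= HALF
            high -= HALF
        else:                                # no common leading bit yet
            break
        low = (low << 1) & FULL              # shift left, doubling the interval length
        high = ((high << 1) | 1) & FULL
    return low, high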

2.3 CLUSTER ANALYSIS
Cluster analysis, or clustering, is a method for partitioning data with the same characteristics into non-overlapping groups; it is a tool in statistical data analysis, machine learning, data mining, pattern recognition, compression and vector quantization [17]. The main purpose of clustering is to reduce the size and complexity of the data to a few reference points. There are two main sub-categories of clustering methods: hierarchical clustering and K-means clustering. Hierarchical clustering represents the relationships among objects, without a pre-defined number of clusters, as a graphical tree structure called a dendrogram. When the cluster size (K) is predefined, the method is called K-means clustering; K-means [18] partitions N samples into K clusters around the nearest means so as to minimize the within-cluster sum of squares. In this paper, we propose a modified version of incremental adaptive arithmetic encoding in which the initial symbol probabilities are estimated with hierarchical clustering and other techniques described in Section 3.

3. MODIFICATION OF ADAPTIVE ARITHMETIC ENCODING
This experiment is separated into two phases: a learning phase and an evaluation phase. First, we construct the symbol probabilities (ASCII 0-255) from a sample English text file; we select bible.txt from the well-known Canterbury corpus [4, 8, 14] as our sample and then follow the steps of Algorithm 1, described below. Second, in the evaluation phase, we evaluate our techniques on the CR of other ordinary English novels, as described in Section 4.
Adaptive arithmetic encoding starts with equal symbol probabilities, which is unfair to the most frequently used symbols such as the space character (ASCII 32) or the letter e (ASCII 101). Moreover, many unused symbols have frequency zero, as depicted in Figure 1(a); most of the symbol frequencies are densely located around ASCII codes 90 to 130. We therefore apply cluster analysis to the symbol probabilities in order to obtain unequal symbol probabilities before the encoding process.


Figure 1. (a) Characteristic distribution of bible.txt; (b) partial hierarchical cluster analysis of the symbol frequencies of bible.txt

Symbol frequencies are equivalent to symbol probabilities and are much easier to analyze, so we use this term from here on. Let us begin with the learning phase. Algorithm 1, the construction of the output matrix (denoted M1), proceeds as follows. First, symbol frequencies are gathered from the sample file. We then identify the distinct clusters by applying hierarchical clustering (between-group linkage method, Euclidean distance interval) to the symbol frequencies using SPSS 16. The output of hierarchical clustering is a dendrogram (tree structure): the leaf nodes, which represent ASCII codes, are grouped into parents according to the similarity of their symbol frequencies, and the root node represents the largest cluster, containing all symbols in the space.
An appropriate cluster size can be selected at any level of the rescaled distance at which clusters combine. In this paper we choose the level that produces about nine clusters, as illustrated in Figure 1(b). Moving this level to the right-hand side would yield fewer clusters and consequently degrade the faithfulness to the symbol frequencies. We then compute the mean of the symbol frequencies in each cluster, round it to an integer, and assign this average frequency to all symbols in the cluster. The average frequencies are normalized by dividing them all by the lowest average; we call the frequencies at this step cluster frequencies.
Algorithm 1: Construction_of_M1
Input: file bible.txt
Output: M1 = [m_i,1], i = 0..256

method ← between_group_linkage
interval ← Euclidean_distance
for i: 0 ≤ i < 255: m_i,1 ← 0; end
m_256,1 ← 1
/// 1st: Normalization section ///
for each asciiSymbol in bible.txt
    m_i,1 += asciiSymbol[i]; end for
D ← hierarchicalClustering(m_i,1, method, interval)
cs ← appropriateClusterSize(D)
for k: 0 ≤ k < cs
    f_av(D_k) ← F(D_k); end
f_av-min(D_k) ← min(f_av(D_k))
for k: 0 ≤ k < cs
    f_av-norm(D_k) ← f_av(D_k) / f_av-min(D_k); end
for i: 0 ≤ i < 255
    m_i,1 ← f_av-norm(D_k) such that s_i ∈ D_k; end
/// 2nd: Gap reduction section ///
for i: 0 ≤ i < 255
    m_i,1 ← ⌈m_i,1 / n⌉, n ∈ ℕ − {1}; end
/// 3rd: Eliminating unused symbols section ///
for i: 0 ≤ i < 10
    m_i,1 ← 0; end
for i: 127 < i ≤ 255
    m_i,1 ← 0; end
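
A compact Python rendering of these three sections, assuming the cluster assignment has already been computed (e.g. exported from SPSS); cluster_of, a map from ASCII code to cluster id, is an assumption of this sketch.

import math
from collections import Counter, defaultdict

def build_m1(text_bytes, cluster_of, n=4):
    """Initial frequency table M1: normalization, gap reduction by n, elimination."""
    freq = Counter(text_bytes)                      # raw symbol frequencies
    members = defaultdict(list)
    for sym in range(256):
        members[cluster_of(sym)].append(sym)
    # 1st: per-cluster average frequency, normalized by the smallest positive average
    avg = {k: sum(freq[s] for s in syms) / len(syms) for k, syms in members.items()}
    low = min(v for v in avg.values() if v > 0)
    m1 = [max(1, round(avg[cluster_of(s)] / low)) for s in range(256)]
    # 2nd: gap reduction -- divide by n (n >= 2) and round up
    m1 = [max(1, math.ceil(v / n)) for v in m1]
    # 3rd: eliminate symbols that never occur in English text
    for s in range(256):
        if s < 10 or s > 127:
            m1[s] = 0
    m1.append(1)                                    # EOF symbol with frequency 1
    return m1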

Reducing the gap between the highest and lowest cluster frequencies, as described in the second section of Algorithm 1, can also improve the CR. In this study, we divide all cluster frequencies by n (n ∈ ℕ − {1}) and round them up to integers in order to keep the cluster information as close to the original clusters as possible.
Moreover, some symbols never appear in an English text file, namely ASCII symbols below 10 and above 127, so we set these symbol frequencies to zero, as described in the third section of Algorithm 1. Finally, we obtain M1, an initial 257×1 matrix of symbol frequencies (the extra entry being an EOF symbol with frequency 1 at the end of the matrix), ready for the encoding process.
Another technique that can increase the CR is file partitioning. During adaptive encoding, the coder writes output bits whenever the buffer collects more than 8 in-process bits; the CR and its change ΔCR can be calculated at that point. Whenever ΔCR falls below a small predefined threshold (ε), the current symbol frequencies are replaced with M1, and the threshold is replaced by the product of the previous threshold and a predefined factor (δ). This renewed M1 then serves as the initial symbol probabilities for the rest of the file, until the EOF is reached. The process is summarized in Algorithm 2: isFilePartition.
Algorithm 3 integrates Algorithms 1 and 2 into what we call a modified version of incremental adaptive arithmetic encoding (MIAAE); its inputs are the file, ε, δ and M1. To confirm that MIAAE improves the CR, we start ε at 0.1 and decrease it by one tenth four times, while δ starts from a set of factors F = {0.1, 0.2, 0.3, 0.4, 0.5}, each decreased by one tenth five times. At this step there are thus 4×5×5 = 100 replicates per file for studying the effectiveness of Algorithm 3; the best CR of each tested file is selected from these replicates.

Algorithm 2: isFilePartition
Input: threshold (ε), factor (δ), |bitin|, |bitout|, M1
Output: isPartition

CR ← |bitin| / |bitout|
if (ΔCR ≤ ε)
    isPartition ← true
    m_0..256,1 ← M1
    ε ← ε × δ
endif
return isPartition; end

Algorithm 3: MIAAE
Input: file, threshold (ε), factor (δ), M1
Output: sequence of 0, 1

IAAE(file)
    isPartition ← isFilePartition(ε, δ, |bitin|, |bitout|, M1)
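
A Python sketch of the partition check, following Algorithm 2; the coder state object is assumed, and ΔCR is taken as the change in CR since the previous check.

def is_file_partition(state, eps, delta, m1):
    """Reset the model to M1 when the CR gain stalls (Algorithm 2 sketch).

    `state` is assumed to expose bits_in, bits_out, the current frequency
    table `freq`, and the previous compression ratio `prev_cr`.
    """
    cr = state.bits_in / state.bits_out
    delta_cr = cr - state.prev_cr
    state.prev_cr = cr
    if delta_cr <= eps:
        state.freq = list(m1)        # restart from the trained initial frequencies
        return True, eps * delta     # tighten the threshold for the next reset
    return False, eps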

4. EXPERIMENTAL RESULTS
In the evaluation phase, nine test files were selected from ordinary novels such as A Tale of Two Cities, Moby Dick and Robinson Crusoe. The compression ratios of five methods are shown below: CR-1, CR-M1, CR-Gap/2, CR-Gap/3 and CR-Gap/4. The column Original is the original file size. Column CR-1 is the compression ratio when all initial symbol frequencies equal 1 (AAE). Column CR-M1 applies only the first section (normalization) of Algorithm 1. Finally, the columns CR-Gap/n, n ∈ {2, 3, 4}, apply the first two sections (normalization and gap reduction) of Algorithm 1.
From Table 1, CR-Gap/4 yields the best average compression ratio: 1.7778 when applying Algorithm 1 alone and 1.7806 when Algorithms 1 and 2 are applied together. For each file, the percentage under each of the last four columns is the improvement of the CR over CR-1. In this experiment, the CR of Native Son improves the most, by 1.2227%, and that of Pride and Prejudice the least, by 0.0830%.

Table 1. Compression ratios of various methods on nine English novel files using the full Algorithms 1 and 2 (all of CR-M1 through CR-Gap/4 eliminate unused symbol frequencies and use file partitioning with threshold and factor; percentages are improvements over CR-1)

File                 | Original  | CR-1   | CR-M1             | CR-Gap/2          | CR-Gap/3          | CR-Gap/4
A Tale of Two Cities | 776,629   | 1.7830 | 1.7840 (+0.0802%) | 1.7848 (+0.0979%) | 1.7849 (+0.1062%) | 1.7850 (+0.1089%)
Ethan Frome          | 203,305   | 1.7834 | 1.7865 (+0.4184%) | 1.7917 (+0.4609%) | 1.7920 (+0.4777%) | 1.7922 (+0.4893%)
Heart of Darkness    | 229,831   | 1.7764 | 1.7791 (+0.2410%) | 1.7817 (+0.2930%) | 1.7819 (+0.3062%) | 1.7820 (+0.3101%)
Moby Dick            | 1,231,973 | 1.7822 | 1.7828 (+0.0957%) | 1.7845 (+0.1302%) | 1.7847 (+0.1372%) | 1.7847 (+0.1391%)
Native Son           | 32,020    | 1.6816 | 1.6916 (+0.7194%) | 1.6994 (+1.0562%) | 1.7016 (+1.1850%) | 1.7022 (+1.2227%)**
Pride and Prejudice  | 704,158   | 1.7859 | 1.7869 (+0.0650%) | 1.7873 (+0.0787%) | 1.7874 (+0.0820%) | 1.7874 (+0.0830%)
Robinson Crusoe      | 642,573   | 1.8426 | 1.8438 (+0.1775%) | 1.8462 (+0.1977%) | 1.8463 (+0.2017%) | 1.8463 (+0.2043%)
Silas Marner         | 413,529   | 1.7865 | 1.7881 (+0.1575%) | 1.7897 (+0.1813%) | 1.7899 (+0.1913%) | 1.7899 (+0.1935%)
The Invisible Man    | 292,663   | 1.7505 | 1.7525 (+0.2464%) | 1.7556 (+0.2885%) | 1.7558 (+0.3024%) | 1.7559 (+0.3077%)
Average CR           |           | 1.7747 | 1.7772            | 1.7801            | 1.7805            | 1.7806*

* highest average CR; ** highest per-file CR improvement


5. CONCLUSION
We propose Algorithm 1, consisting of three steps: normalization, gap reduction and the elimination of unused symbols. The concept can be adapted to other languages and file types. Note, however, that with the elimination step MIAAE cannot produce an output code if a symbol below ASCII 10 or above 127 enters the encoding process, so a well-trained set of initial symbol probabilities M1 is essential. Furthermore, training M1 on other English text files interacts with the best values of the threshold and factor; general values of these two parameters that yield the optimum CR for English text files are still unknown.



REFERENCES
1. Blelloch, G. E., Introduction to Data Compression, Computer Science Department, Carnegie Mellon University, October 16, 2001.
2. Ziv, J., Lempel, A., A Universal Algorithm for Sequential Data Compression, IEEE Transactions on Information Theory, 1977, 23, 337-343.
3. Huffman, D. A., A Method for the Construction of Minimum-Redundancy Codes, Proc. IRE, 1952, 40, 1098-1101.
4. Apparaju, R., Agarwal, S., An Arithmetic Coding Scheme by Converting the Multisymbol Alphabet to m-ary Alphabet, Conference on Computational Intelligence and Multimedia Applications, 2007, 4, 142-146.
5. Kuroki, N., Manabe, T., Numa, M., Adaptive Arithmetic Coding for Image Prediction Errors, Circuits and Systems 2004 (ISCAS '04), 3, 2004, III-961-4.
6. Robert, L., Nadarajan, R., Simple Lossless Preprocessing Algorithms for Text Compression, IET Software, 2009, 3, 37-45.
7. Otten, F., Irwin, B., Thinyane, H., Evaluating Text Preprocessing to Improve Compression on Maillogs, ACM International Conference Proceeding Series, 2009, 44-53.
8. Barbir, A., A New Fast Approximate Arithmetic Coder, 28th Southeastern Symposium on System Theory (SSST '96), 1996, 482.
9. Kochanek, J., Lansky, J., Uzel, P., Zemlicka, M., The New Statistical Compression Method: Multistream Compression, First International Conference on Applications of Digital Information and Web Technologies (ICADIWT 2008), 2008, 320-325.
10. Suwannik, W., Chongstitvatana, P., Solving Large-Scale Problems Using an Estimation of Distribution Algorithm with Arithmetic Coding, ISCIT '07, 2007, 358-363.
11. Belyaev, E., Gilmutdinov, M., Turlikov, A., Binary Arithmetic Coding System with Adaptive Probability Estimation by "Virtual Sliding Window", IEEE, 2006, 1-5.
12. Jelinek, F., Probabilistic Information Theory, McGraw-Hill, New York, 1968, 46-489.
13. Howard, P. G., Vitter, J. S., Practical Implementations of Arithmetic Coding, in Image and Text Compression, Kluwer Academic Publishers, Norwell, MA, 1992, 85-112.
14. Powell, M., Evaluating Lossless Compression Methods, 2001.
15. Soyjaudah, K. M. S., Ramsamy, S., A Comparative Study of Context-Free Models of Arithmetic Coding, EUROCON 2001, 2, 2001, 428-431.
16. Bodden, E., Clasen, M., Kneis, J., Arithmetic Coding Revealed: A Guided Tour from Theory to Praxis, Sable Research Group, McGill University, 2007.
17. MacQueen, J. B., Some Methods for Classification and Analysis of Multivariate Observations, Proc. of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, 1967, 1, 281-297.
18. Bradley, P. S., Fayyad, U. M., Refining Initial Points for K-Means Clustering, Proceedings of the Fifteenth International Conference on Machine Learning, Morgan Kaufmann, San Francisco, CA, 1998, 91-99.
G00133
Application of Optical Data Glove to Hand Gesture
Interpretation

T. Kumkurn^1,C and N. Eua-anant^1

1 Department of Computer Engineering, Khon Kaen University, 123 Moo 16, Ni Muang, Muang, Khon Kaen, 40002, Thailand
C E-mail: phai_coe@hotmail.com; Tel. 089-4199500



ABSTRACT
Data gloves are human-computer interface devices with high potential for use in various applications; however, because of their high cost, they are not widely used. This research develops an inexpensive optical data glove, consisting of a glove, strings, capillary tubes and a web camera, that possesses the same functionality. One end of each string is attached to a finger joint of the glove, while the opposite end is inserted into an array of capillary tubes on the back of the hand. The web camera captures images of the string positions in the capillary array and transfers the data to the computer for processing. This research investigates the feasibility and performance of the invented data glove in hand gesture recognition using two algorithms. The first is pattern matching, comparing the Euclidean distances between the string positions observed when the glove is tested and those recorded for prototype gestures. The second is hand gesture pattern recognition using artificial neural networks. Classification errors obtained from the experiments are reported.

Keywords: ANSCSE14, Mae Fah Luang University, Computational Conferences, Hand Gesture, Data Glove.
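
As a sketch of the first algorithm (nearest-prototype matching on Euclidean distance), with an illustrative feature layout of one number per string:

import math

def classify_gesture(positions, prototypes):
    """Return the label of the recorded prototype whose string positions
    are closest to the observed positions in Euclidean distance."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return min(prototypes, key=lambda label: dist(positions, prototypes[label]))

# e.g. prototypes = {"fist": [0.1, 0.2, 0.1], "open": [0.9, 0.8, 0.9]}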



G00133
Improved Image Watermarking using Pixel Averaging

Thitiporn Pramoun and Thumrongrat Amornraksa

Computer Engineering Department, King Mongkut's University of Technology Thonburi, 126 Pracha-uthit Rd., Bangmod, Thungkru, Bangkok, 10140, Thailand
E-mail: thitiporn1016@gmail.com, t_amornraksa@cpe.kmutt.ac.th


ABSTRACT
In this paper, a method based on pixel averaging is proposed and implemented to improve the watermark retrieval performance of a color image watermarking method. In image watermarking based on the modification of image pixels [6], the strength of the embedded watermark depends on the luminance component at the embedding pixel; we apply a pixel averaging technique to create a localized luminance component and use it to fine-tune the watermark signal strength. The experimental results show improved accuracy of the retrieved watermark compared to the previous watermarking methods proposed in [6, 7]. Robustness against attacks, i.e. Gaussian noise and JPEG compression, is evaluated and compared as well.

Keywords: Digital watermarking, pixel averaging


1. INTRODUCTION
In digital communication networks, digitized information transmitted through networks, such as video streams, images and voice, is frequently referred to as digital multimedia. Its main advantage is that it can easily be exchanged and distributed between users, and it can be reproduced repeatedly without any difference from the original copy. This, however, leads to copyright problems: someone may reproduce a copy and distribute it to others without permission from the original owner. Digital watermarking has been introduced as a way to discourage people from making and distributing such illegal copies. The technique works by embedding a watermark signal in the distributed media; the signal can later be extracted and used to provide authentication and copyright information, and hence to protect the ownership of the violated media. To implement digital watermarking efficiently, the watermarking algorithm should fulfil the basic requirements mentioned in [1].
Watermarking techniques have been developed in different directions, such as [2], [3] and [4]. In this research, digital watermarking based on the modification of image pixels is considered, since the watermark capacity that can be embedded into an image is vast compared to other watermarking techniques. For example, M. Kutter et al. [5] proposed embedding a watermark bit into an image pixel in the blue channel by modifying that pixel either additively or subtractively, depending on the watermark bit, and proportionally to the luminance of the embedding pixel. Later, T. Amornraksa et al. [6] proposed techniques to enhance the watermark retrieval performance by balancing the watermark bits around the embedding pixels, tuning the strength of the embedded watermark according to the nearby luminance, and reducing the bias in predicting the original image pixel from the surrounding watermarked pixels. The authors also demonstrated how to embed a watermark image (logo) into a color image of the same resolution, i.e. m×n watermark bits into m×n image pixels. Recently, N. Mettripun et al. [7] proposed a technique that improves the quality of the watermarked images using a Human Visual System (HVS) model based on DWT masking: the DWT is used to create an approximation of the host image, and the luminance of the approximated image is used to fine-tune the strength of the embedded watermark. Their experimental results showed improvements in terms of both PSNR and NC compared to the previous technique in [6].
In this paper, a different method is proposed to further improve the retrieval performance of image watermarking based on the modification of image pixels. We apply pixel averaging to directly create the localized luminance component and use it to fine-tune the watermark signal strength. Sets of experiments were carried out to verify the effectiveness of the approach and to find the optimum settings for a practical watermarking system. Section 2 presents the concept of digital watermarking based on the modification of image pixels. Section 3 analyzes the factors that affect watermark retrieval performance and describes the proposed watermarking method based on luminance pixel averaging. Section 4 gives the experimental settings and discusses the results. Conclusions are drawn in Section 5.


2. THE PREVIOUS WATERMARKING METHOD
Principally, a unique binary bit-stream is generated and used as the watermark w(i,j) ∈ {1, −1} to be embedded into an image. It is first permuted, using an XOR operation, with a pseudo-random bit-stream generated from a key-based stream cipher in order to improve the balance of w around each (i,j). The watermark is embedded by modifying the blue component at each coordinate (i,j) in line-scan fashion; the blue component is selected because it is the one the human eye is least sensitive to [5]. The modification of the blue component B(i,j) is either additive or subtractive, depending on w(i,j), and proportional to the modified luminance of the embedding pixel, where the luminance is L(i,j) = 0.299R(i,j) + 0.587G(i,j) + 0.114B(i,j). Because changes in high-luminance pixels are less perceptible to the human eye, the luminance value is used to tune the strength of the watermark, so that more watermark energy can be added to achieve a higher level of robustness. The modified luminance L'(i,j) is obtained from a Gaussian pixel weighting mask [6]. The watermarked pixel B'(i,j) is expressed as equation (1)
B'(i,j) = B(i,j) + w(i,j) s L'(i,j)     (1)
where s is the watermark signal strength, a scaling factor applied to the whole image frame. In practice, s must be carefully selected to obtain the best trade-off between imperceptibility and robustness. At the other end, the embedded watermark is retrieved based on two assumptions. First, any pixel value within an image is close to its surrounding neighbors, so a pixel value at (i,j) can be estimated by the average of its nearby pixel values. Second, the summation of w around (i,j) is close to zero. The embedded bit at (i,j) can then be estimated by the following equation
w'(i,j) = B'(i,j) − (1/8) Σ_{m=−1}^{1} Σ_{n=−1}^{1} B'(i+m, j+n),  (m,n) ≠ (0,0)     (2)
Since w(i,j) is either 1 or −1, we set w'(i,j) = 0 as the threshold: the sign of w'(i,j) estimates the embedded bit, i.e. if w'(i,j) is positive (or negative), w(i,j) is taken to be 1 (or −1, respectively).
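
A compact numpy sketch of this embed-and-detect loop; for simplicity it tunes with the plain luminance L rather than the Gaussian-masked L' of [6].

import numpy as np

def embed(rgb, w, s=0.1):
    """Add w(i,j)*s*L(i,j) to the blue channel; rgb is float in [0,255], w is +/-1."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    lum = 0.299 * r + 0.587 * g + 0.114 * b
    out = rgb.copy()
    out[..., 2] = np.clip(b + w * s * lum, 0, 255)
    return out

def detect(b_marked):
    """Estimate each bit from the sign of B' minus the mean of its 8 neighbours."""
    p = np.pad(b_marked, 1, mode='edge')
    neigh = sum(p[1 + dy:p.shape[0] - 1 + dy, 1 + dx:p.shape[1] - 1 + dx]
                for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                if (dy, dx) != (0, 0)) / 8.0
    return np.where(b_marked - neigh >= 0, 1, -1)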

3. THE PROPOSED WATERMARKING METHOD
To analyze the factors that influence the watermark retrieval performance, equation (2) is rewritten as
w'(i,j) = B(i,j) + w(i,j) s L'(i,j)
          − (1/8) Σ_{m=−1}^{1} Σ_{n=−1}^{1} B(i+m, j+n)
          − (1/8) Σ_{m=−1}^{1} Σ_{n=−1}^{1} w(i+m, j+n) s L'(i+m, j+n),  (m,n) ≠ (0,0)     (3)
The first and second terms on the right-hand side represent the original pixel value and the watermark energy at (i,j), while the third and fourth terms represent the prediction of B(i,j) and the summation of watermark energy around (i,j), respectively. The watermark energy at (i,j) can therefore be recovered if the first term equals the third term and the fourth term equals zero. If the first assumption above holds, the difference between the first and third terms approaches zero; if the second assumption holds, the summation of watermark energy around (i,j) approaches zero. However, L'(i,j) and the luminances of its surrounding pixels in the fourth term differ, which results in different watermark energies around (i,j) and makes the fourth term always nonzero. The estimation of w'(i,j) thus depends on the second term, which is proportional to s and L'(i,j), and on the remainder of the fourth term; the estimate may be worse when this remainder has the opposite sign to the second term. It is obvious from this analysis that L'(i,j) should be modified locally to make the fourth term approach zero. That is, if the values of L'(i,j) around (i,j) are equal, equation (3) can be rewritten as
w'(i,j) = w(i,j) s L'(i,j) − (s L'(i,j) / 8) Σ_{m=−1}^{1} Σ_{n=−1}^{1} w(i+m, j+n),  (m,n) ≠ (0,0)     (4)
and if the second assumption holds, the remaining summation term now approaches zero. Based on this analysis, L'(i,j) is modified locally by pixel averaging: the new value is derived from an approximation of the luminance of the host image created with the pixel averaging technique. In the experiments, the side of the luminance block used for averaging was varied as 2^a, a = 1, 2, 3, …, and the approximation of the host image luminance is determined by the following equation:

L̄(i,j) = (1/n²) Σ_{m=0}^{n−1} Σ_{k=0}^{n−1} L(i+m, j+k),  where n = 2^a and a = 1, 2, 3, …     (5)
and the new watermarked pixel B'(i,j) is determined by

B'(i,j) = B(i,j) + w(i,j) s L̄(i,j)     (6)

where L̄(i,j) is the localized luminance component at coordinate (i,j). Figure 1 shows an example of the pixel averaging results for a block size of 4×4 pixels.



(a) Original pixel values   (b) Averaged pixel values
Figure 1. Example of the pixel averaging process at level a = 2

Figure 2 shows the luminance components of the original image Tower produced by the pixel averaging process at various levels. To implement the proposed method, luminance pixel averaging, L'(i,j) in equation (1) is simply replaced by L̄(i,j) at the same position (i,j) in the watermark embedding process; the watermark retrieval process remains the same.

(a) a = 1  (b) a = 2  (c) a = 3  (d) a = 4  (e) a = 5  (f) a = 6  (g) a = 7  (h) a = 8
Figure 2. The results of luminance pixel averaging at various levels.
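
A one-function numpy sketch of equation (5), for an image whose sides are multiples of the block size:

import numpy as np

def block_average(lum, a):
    """Replace each (2**a x 2**a) block of the luminance image by its mean,
    as in equation (5); lum's height and width must be multiples of 2**a."""
    b = 1 << a
    h, w = lum.shape
    means = lum.reshape(h // b, b, w // b, b).mean(axis=(1, 3))
    return np.kron(means, np.ones((b, b)))   # upsample back to full resolution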

4. EXPERIMENTAL SETTING AND RESULTS
In our experiments, four 256×256-pixel color images with various characteristics, Bird, Fish, Lena and Tower, were used as the original test images. A 256×256-pixel black-and-white image containing the logo CPE 2009 was used as the watermark, with black pixels taken as −1 and white pixels as 1. Some of the original host images and the watermark logo are shown in Figure 3.

(a) Bird  (b) Fish  (c) CPE 2009
Figure 3. Some original test images and the watermark logo
The quality of the watermarked images was evaluated with the PSNR (Peak Signal-to-Noise Ratio). After retrieving the embedded watermark, its quality was evaluated with the NC (Normalized Correlation), calculated as follows:

NC = [ Σ_{i=1}^{M} Σ_{j=1}^{N} w(i,j) w'(i,j) ] / [ sqrt( Σ_{i=1}^{M} Σ_{j=1}^{N} w(i,j)² ) · sqrt( Σ_{i=1}^{M} Σ_{j=1}^{N} w'(i,j)² ) ]     (7)
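
In numpy terms, (7) is the normalized inner product of the two bit maps; a minimal helper:

import numpy as np

def nc(w, w_hat):
    """Normalized correlation (7) between original and extracted watermark bits."""
    num = float(np.sum(w * w_hat))
    den = np.sqrt(np.sum(w ** 2)) * np.sqrt(np.sum(w_hat ** 2))
    return num / den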
where w(i,j) and w'(i,j) are the original and extracted watermark bits at pixel (i,j), respectively. In the first experiment, we determined the watermark strength giving the optimum balance between PSNR and NC; a signal strength of 0.1, which gave a PSNR of 29.9 dB and an NC of 0.87, was selected and used in the remaining experiments. We next compared the performance of the proposed watermarking method to the previous methods in [6] and [7]. For a fair comparison, the signal strength of each method was adjusted until all methods achieved equivalent image quality, a PSNR of 29.9 dB with differences of less than 0.03 dB; the accuracy of the retrieved watermark was then measured and compared. The average NC values obtained from the three watermarking methods at different pixel block sizes are compared in Figure 4. Notice that the two previous watermarking methods give the same NC value at every level a.


[Plot: average NC value (0.857-0.882) versus block level a = 1-8 for the previous methods in [6] and [7] and the proposed method.]
Figure 4. Average NC values at various pixel block sizes

As can clearly be seen from the figure, the performance of the proposed method improves as the pixel block size increases. The best result, judged by the highest NC, is obtained at level a = 8, i.e. a pixel block size of 256×256 pixels, and from a = 5 upwards the proposed method is superior to the previous ones. This is because of the smaller number of abrupt luminance changes encountered during the watermark retrieval process.
To deploy the watermarking method, a threshold must be established for deciding whether a genuine embedded watermark is present. Theoretically, the NC between two different watermarks, after XORing with a pseudo-random bit-stream, is approximately 0.5. We computed the NC between the original watermark and the one extracted directly from each non-watermarked test image; the highest value, 0.672, obtained from the image Fish, was used as the threshold for validating the retrieved watermark. The next experiment demonstrated the robustness of the proposed method against attacks. We selected two common attacks: additive Gaussian noise, simulating the additive noise of communication channels, and JPEG compression, the most widely used image compression standard. The watermark strength was set to 0.1 and the luminance block size to 256×256 pixels. The variance (σ²) of the zero-mean additive Gaussian noise was varied from 0.005 to 0.09, while the image quality of the JPEG compression was varied from 85% to 5%. The average NC values of the retrieved watermarks after the attacks are plotted in Figures 5 and 6.

[Plot: average NC value (0.69-0.77) versus variance of the additive Gaussian noise (0.005-0.09) for the previous methods in [6] and [7] and the proposed method.]

Figure 5. Average NC values at various of Gaussian noise


[Plot: average NC value (0.668-0.693) versus JPEG quality (85%-5%) for the previous methods in [6] and [7], the proposed method, and the detection threshold.]

Figure 6. Average NC values at various JPEG qualities

It is evident from the two figures that, for Gaussian noise, the proposed method achieves a slightly higher average NC than the other two. For JPEG compression, the performance of the three methods is essentially equivalent, judging from the close NC values in Figure 6. Note that even with JPEG compression at 20% image quality, a valid watermark can still be retrieved.


5. CONCLUSION
We have shown that locally modifying the luminance component of the host image in the watermark strength tuning process improves the watermark retrieval. The experimental results demonstrated the improved accuracy, in terms of NC, of our approach compared to the previous methods proposed in [6] and [7].

REFERENCES
1. C. De Vleeschouwer, J. F. Delaigle, and B. Macq, Invisibility and application functionalities in perceptual watermarking: an overview, Proc. of the IEEE, vol. 90, pp. 64-72, 2002.
2. G. O'Brien and G. Cook, Effects of misalignment on pixel averaging when scaling templates and images before scanning, Proc. of IEEE Southeastcon, Nashville, USA, pp. 399-406, 2000.
3. J. F. Delaigle, C. De Vleeschouwer and B. Macq, Watermarking algorithm based on a human visual model, Signal Processing, vol. 24, no. 2, pp. 319-335, May 1998.
4. S. Zaboli, A. Tabibiazar, and R. Safabakhsh, Entropy-based image watermarking using DWT and HVS, 3rd Int. Conf.: Sciences of Electronic, Technologies of Info. and Telecom., Tunisia, 27-31 March 2005.
5. M. Kutter, F. Jordan, and F. Bossen, Digital signature of color images using amplitude modulation, Journal of Electronic Imaging, vol. 7, pp. 326-332, 1997.
6. T. Amornraksa and K. Janthawongwilai, Enhanced images watermarking based on amplitude modulation, Image and Vision Computing, vol. 24, no. 2, pp. 111-119, 2006.
7. N. Mettripun, S. Tachaphetpiboon and T. Amornraksa, Digital watermarking based on human visual system using DWT masking, Proc. of the JCSSC 2009, Phuket, Thailand, vol. 1, pp. 268-278, 13-15 May 2009.

G00134
Flexible Grammar Recognization Algorithm

N. Rujeerapaiboon^A and A. Surarerks^B

Department of Computer Engineering, Faculty of Engineering, Chulalongkorn University, Bangkok, 10330, Thailand
E-mail: ^A napat.r13@gmail.com, ^B athasit.s@chula.ac.th; Fax: 02-2186955; Tel: 02-2186996


ABSTRACT
We present a new mathematical model for describing a data set, named the Flexible Grammar, which extends the deterministic finite automaton (regular grammar). The original model, the deterministic finite automaton, is a discrete model that can describe any regular language (set of sequences of symbols) by providing a mechanism for recognizing members of the language. However, it has one major disadvantage: it is not flexible enough to describe data sets containing noise or outliers. Because of this lack of robustness, many applications cannot adopt the model to achieve their goals. For this reason, we have developed a new model that can express data sets with noise and outliers more accurately and meaningfully; moreover, the Flexible Grammar can effectively handle small internal errors in each data element. To construct this model we combine two principal concepts: grammar inference and number representations. Overall, the purpose of this paper is to provide a learner algorithm that constructs a Flexible Grammar fitting an input data set at a flexibility level determined by the user. The model can be used further in various applications, for example clustering, defined as the process of partitioning a data set into several clusters based on the relationships between the elements of the set.

Keywords: Regular Grammar, Grammar Inference, Deterministic Finite Automata.

1. INTRODUCTION
At present, various applications employ data models in order to manage their information effectively and efficiently. This paper is mainly interested in pattern recognition, one of the popular approaches to clustering. There is much research on pattern recognition, for example artificial neural networks, genetic algorithms and transition models. We are interested in the transition model, which is constructed by finding the rules that represent the syntax of strings (sequences of symbols); this model can also be viewed as a grammar. The concept of learning a grammar from a data set is called grammar inference [1-3].
The purpose of this paper is to present the flexible grammar, a model that is robust to noise and outliers, and to introduce an algorithm for constructing it. The flexible grammar is capable of describing any input data set, so it can further be used for clustering an input data set, especially for regular languages.
This paper is organized as follows: Section 2 reviews the flexible interval representation and on-line computation. Our model, together with its proof, is introduced in Section 3. Results are discussed in Section 4, and Section 5 concludes this work.
2. PRELIMINARIES
First, let us restate some definitions and notations used in this paper. We start from interval arithmetic and the flexible interval representation system, and end this section by recalling the notion of an on-line finite automaton.

2.1. INTERVAL ARITHMETIC
An interval is described by a pair of numbers and represents all numbers in that range. Interval arithmetic performs operations with guaranteed error bounds: the interval [x, y] represents the set of all real numbers r such that x ≤ r ≤ y, where x and y are called the lower endpoint and upper endpoint respectively. The system is useful for working with uncertain data. The basic operations (+, −, ×, /) of interval arithmetic are defined as follows:
[a, b] + [c, d] = [a + c, b + d]
[a, b] − [c, d] = [a − d, b − c]
[a, b] × [c, d] = [min(ac, ad, bc, bd), max(ac, ad, bc, bd)]
[a, b] / [c, d] = [min(a/c, a/d, b/c, b/d), max(a/c, a/d, b/c, b/d)]
It is remarked that division by an interval containing zero is not defined.
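
A direct Python transcription of these rules:

class Interval:
    """Closed interval [lo, hi] with the arithmetic rules listed above."""
    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi
    def __add__(self, o):
        return Interval(self.lo + o.lo, self.hi + o.hi)
    def __sub__(self, o):
        return Interval(self.lo - o.hi, self.hi - o.lo)
    def __mul__(self, o):
        p = [self.lo * o.lo, self.lo * o.hi, self.hi * o.lo, self.hi * o.hi]
        return Interval(min(p), max(p))
    def __truediv__(self, o):
        if o.lo <= 0 <= o.hi:
            raise ZeroDivisionError("division by an interval containing zero")
        q = [self.lo / o.lo, self.lo / o.hi, self.hi / o.lo, self.hi / o.hi]
        return Interval(min(q), max(q))
    def __repr__(self):
        return f"[{self.lo}, {self.hi}]"

# e.g. Interval(1, 2) + Interval(3, 5) -> [4, 7]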

2.2. FLEXIBLE INTERVAL REPRESENTATION SYSTEM
The flexible interval representation system (FIRS) represents an interval as a single sequence of symbols. Its advantage is that arithmetic can be carried out much as in the classical redundant number systems. The definitions of the system and of the numerical values are as follows.

Definition 1
The flexible interval representation system is an interval system composed of the binary base and a digit set D of seven flexible digits,
D = {0, 1, 1̄, o, ō, -o, -ō}
(two of the digit glyphs were lost in extraction; 1̄ and ō are used here as stand-ins), whose lower and upper endpoints are shown in Table 1. Note that -o and -ō are the additive inverses of o and ō respectively.

Table 1. The numerical values of the flexible digits
Flexible digit | Lower endpoint | Upper endpoint
0              | 0              | 0
1              | 1              | 1
1̄              | -1             | -1
o              | 0              | 1
-o             | 0              | -1
ō              | -1             | 0
-ō             | 1              | 0


Definition 2
A representation X = x_n x_{n−1} x_{n−2} … x_0 in the flexible interval representation system represents the interval [a, b] where, taking each digit's endpoints from Table 1,
a = Σ_{i=0}^{n} lower(x_i) · 2^i   and   b = Σ_{i=0}^{n} upper(x_i) · 2^i.
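
Under that definition, evaluating a representation is a pair of positional sums; a sketch with digits written as (lower, upper) pairs:

def firs_value(digits):
    """Interval [a, b] of a FIRS string, most significant digit first.

    Each flexible digit is given as its (lower, upper) endpoint pair from
    Table 1, e.g. 0 -> (0, 0), 1 -> (1, 1), o -> (0, 1).
    """
    a = b = 0
    for lo, hi in digits:          # positional evaluation in base 2
        a = 2 * a + lo
        b = 2 * b + hi
    return a, b

# e.g. firs_value([(1, 1), (0, 1), (0, 0)]) -> (4, 6), i.e. the interval [4, 6]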

G00134
ANSCSE14 Mae Fah Luang University, Chiang Rai, Thailand
March 23-26, 2010
724

It is remarked that the flexible interval representation system is redundant (i.e., an interval can have more than one representation in the system). Some examples are given in Table 2.

Table 2. Some examples for FIRS
Interval  | Representation in FIRS
[7, 10]   | 10, 1 0 (digits partly lost in extraction)
[-3, 8]   | 0 (digits partly lost in extraction)
[-12, 2]  | 0 (digits partly lost in extraction)


The converting algorithm IntervalToFIR converts a traditional interval representation, defined by its lower and upper endpoints, into a flexible interval representation. The details of the algorithm can be found in [4]; we present it here in brief.

Algorithm IntervalToFIR
Input: interval [A, B]
    A = a_0 a_1 … a_n where a_i ∈ {0, 1, 1̄}
    B = b_0 b_1 … b_n where b_i ∈ {0, 1, 1̄}
Output: S = s_0 s_1 … s_n where s_i ∈ D
Begin
    C ← B − A        (C is a binary number)
    i ← 0
    while (not end of data) do
        if (c_i = 1) then c_i ← o endif
        i ← i + 1
    enddo
    S ← C + A
End

An algorithm for computing the basic operations between two interval representations can also be found in [4].

2.3. DETERMINISTIC FINITE AUTOMATON
Briefly, a deterministic finite automaton (DFA) is a discrete model that can describe any regular language (set of sequences of symbols) by providing a mechanism for recognizing members of the language. A DFA is composed of five components (Q, Σ, δ, q_0, A): Q is a finite set of states, Σ is a finite set of input characters, q_0 is the initial state, A is the set of accepting states, and δ: Q × Σ → Q is the transition function.

3. FLEXIBLE GRAMMAR RECOGNIZATION
Constructing a flexible grammar that describes a data set (containing only positive samples) involves several steps, described below. We use two major concepts to construct the model: the flexible interval representation system and grammar inference. In detail, we construct a special kind of deterministic finite automaton called a prefix tree automaton (PTA) from the positive samples, using flexible interval representations as its characters.
First, some background on the PTA, a specialized deterministic finite automaton. A PTA is constructed from a set of positive samples S+ (in this paper, negative samples are not needed to construct the model). Its characteristic property is that any two strings sharing a common prefix share a common prefix path starting from the initial state.

Example 1
Given a set of positive examples S+ = {a, abc, bac, abbc}, the corresponding prefix tree automaton is illustrated by Figure 1.

Figure 1. PTA corresponding to the set {a, abc, bac, abbc}.


To simplify the latter part of this paper, we give some more notations and their definitions.
Definition 3
For a given deterministic finite automaton M(Q, Σ, δ, q_0, A), we define the function t: Q → 2^Σ as follows:
t(cs) = {c | ((cs, c), ns) ∈ δ for some ns ∈ Q}.


Definition 4
For a given flexible interval representation I and a given binary number B, the notation distance(I, B) refers to the distance between I and B, formally defined as
distance(I, B) = max_{i ∈ I} |i − B|.

In this paper we mainly focus on constructing a model that describes an input data set consisting of sequences of numerical values, for example S+ = {(12, 34, 54), (10, 38, 57), (13, 37, 55, 2)}. We use the symbol d to denote a sequence in the input data set, length(d) to denote the number of values in d, and d[i] to denote the i-th element of d. For example, for the sequence d = (12, 34, 54), length(d) is 3, d[1] is 12, d[2] is 34 and d[3] is 54.
The algorithm for constructing a PTA from an input data set is given below.
Algorithm: Constructing a PTA from a set of positive samples

Input:  S+: a set of positive samples
        MIS: the maximum acceptable interval size
        N: the number of bits used to represent each numerical value
Output: M(Q, Σ, δ, q_0, A): a deterministic finite automaton corresponding to S+
        (Q: set of states; Σ: finite set of input characters; δ: transition function;
         q_0: initial state; A: set of accepting states)

begin
    Q ← {q_0}
    A ← ∅; δ ← ∅
    numState ← 1
    Σ ← the set of all flexible interval representations of length N
    foreach d in S+
        cs ← q_0
        for i = 1 to length(d)
            if there exists an interval m ∈ t(cs) such that d[i] ∈ m
                ns ← δ(cs, m)
            else
                let m be the interval in t(cs) closest to d[i], if any (else m ← null)
                if m ≠ null and distance(m, d[i]) < MIS
                    ns ← δ(cs, m)
                    δ ← δ − {((cs, m), ns)};  t(cs) ← t(cs) − {m}
                    lower-endpoint ← min(d[i], min(m))
                    upper-endpoint ← max(d[i], max(m))
                    m_1 ← IntervalToFIR(lower-endpoint, upper-endpoint)
                    δ ← δ ∪ {((cs, m_1), ns)};  t(cs) ← t(cs) ∪ {m_1}
                else
                    δ ← δ ∪ {((cs, d[i]), q_numState)};  t(cs) ← t(cs) ∪ {d[i]}
                    ns ← q_numState
                    Q ← Q ∪ {q_numState};  t(q_numState) ← ∅
                    numState ← numState + 1
                endelse
            endelse
            cs ← ns
        endfor
        A ← A ∪ {cs}
    endfor
end
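
For intuition, a simplified Python sketch of this learner, with intervals kept as plain (lo, hi) tuples instead of FIRS strings (so an interval is widened in place rather than re-encoded via IntervalToFIR):

def build_pta(samples, mis):
    """Learn a prefix-tree automaton over integer sequences.

    Each state is a dict mapping an interval (lo, hi) to the next state id;
    `mis` is the maximum acceptable interval size for widening.
    """
    trans = [dict()]                       # trans[state] : {(lo, hi): next_state}
    accepting = set()
    for d in samples:
        cs = 0
        for v in d:
            hit = next(((lo, hi) for (lo, hi) in trans[cs] if lo <= v <= hi), None)
            if hit is None:
                # nearest existing interval on this state, per Definition 4
                near = min(trans[cs],
                           key=lambda iv: max(abs(iv[0] - v), abs(iv[1] - v)),
                           default=None)
                if near and max(abs(near[0] - v), abs(near[1] - v)) < mis:
                    ns = trans[cs].pop(near)                   # widen the interval
                    hit = (min(v, near[0]), max(v, near[1]))
                    trans[cs][hit] = ns
                else:
                    trans.append(dict())                       # brand-new state
                    trans[cs][(v, v)] = len(trans) - 1
                    hit = (v, v)
            cs = trans[cs][hit]
        accepting.add(cs)
    return trans, accepting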

4. EXPERIMENTAL RESULTS
The data set used in our experiment is a set of ECGs (electrocardiograms), electrical recordings of the heart, from Chulalongkorn Hospital. The data contain the 4-hour ECGs of three normal adults. Our model is used to learn the characteristic of each lead of the ECG, where one lead is represented by a sequence of about two hundred integers varying from -7,000 to 10,000. The maximum acceptable interval size is fixed at 256. The grammar of the ECG learned by our algorithm is shown in Figure 2. Compared to previous related work [5], which achieved 73.45% recognition, the recognition accuracy for ECGs is increased.

Figure 2. Electrocardiogram recognized by our model

Table 3. Comparison between all data sets and the grammar results
Data set | No. of leads | No. of recognized leads | % recognition
1        | 15,000       | 13,993                  | 93.29%
2        | 8,000        | 7,655                   | 95.69%
3        | 11,000       | 10,908                  | 99.16%
Total    | 34,000       | 32,556                  | 95.75%

5. CONCLUSION
We propose a flexible model and a learning algorithm for recognizing a grammar (characteristic) from positive samples. The efficiency of the algorithm depends on the maximum acceptable interval size, which can be specified by the user. The recognition percentage increases in comparison with the previous work.

REFERENCES

1. K. Nakamura, M. Matsumoto, Incremental Learning of Context-Free Grammars Based on Bottom-up Parsing and Search, Pattern Recognition, 2005, 38(9), 1384-1392.
2. Y. Sakakibara, Learning Context-Free Grammars Using Tabular Representations, Pattern Recognition, 2005, 38(9), 1372-1383.
3. P. Langley, S. Stromsten, Learning Context-Free Grammars with a Simplicity Bias, Machine Learning: ECML 2000, LNAI 1810, Springer, Berlin, 2000, 336-355.
4. P. Thienprapasith, A. Surarerks, A Flexible Interval Representation System and Its Fundamental Arithmetic Operations, Proceedings of the 5th International Conference on Information Technology and Applications (ICITA 2008), Cairns, Queensland, Australia, June 23-26, 2008.
5. S. Tangtidtham, Syntactic Electrocardiography Classification, Master thesis, Chulalongkorn University, Bangkok, Thailand, 2008.
G00135
Classification of Loan Borrowers of National Pension
and Provident Fund of Bhutan: A Case Study

Kinzang Wangdi^1, Akara Prayote^2, Utomporn Phalavonk^3

1 Department of Information Technology, Faculty of Information Technology
2 Department of Computer and Information Science; E-mail: akarap@kmutnb.ac.th
3 Department of Mathematics; E-mail: upv@kmutnb.ac.th
King Mongkut's University of Technology North Bangkok, 1518 Pibulsongkram, Bangsue, Bangkok, Thailand 10800



ABSTRACT
Since the introduction of member financing schemes, such as housing and education loans, by the National Pension and Provident Fund (NPPF) of Bhutan, the number of applicants has grown every year, and granting loans has become an important decision for loan officials seeking to avoid future risk. The credibility of the current evaluation criteria, such as repayment capacity, is at stake: repayment defaulters and non-performing loans (NPL) are increasing, and in a number of cases the mortgaged assets have had to be seized. In this study, data of borrowers is investigated and different ANN models are constructed to classify loan applicants. The back-propagation algorithm is used to train the networks, and classification efficiency is evaluated with the F-measure. So far the best model classifies loan applicants at 90.02%.

Keywords: Classification, Loan Applicant, Multilayer Perceptron
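As a minimal sketch of the modeling setup this abstract describes, a multilayer perceptron trained by back-propagation and scored with the F-measure, here is an illustration using scikit-learn; the feature names and data below are invented placeholders, not the NPPF data.

import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(7)
X = rng.normal(size=(500, 4))   # e.g. income, loan size, age, tenure (made up)
y = (X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.5, size=500) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf = MLPClassifier(hidden_layer_sizes=(8,), max_iter=1000, random_state=0)
clf.fit(X_tr, y_tr)                       # gradient-based back-propagation
print("F-measure:", round(f1_score(y_te, clf.predict(X_te)), 3))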


G00137
Implementation of QRS detection with Python
on Linux system

C. Suwansaroj^{1,C}, D. Thanapatay², C. Thanawattano³ and N. Sugino⁴
1 Department of Electrical Engineering, Kasetsart University, Bangkok, Thailand
2 Department of Electrical Engineering, Faculty of Engineering, Kasetsart University, Bangkok, Thailand
3 National Electronics and Computer Technology Center (NECTEC), Thailand
4 Department of Information Processing, Tokyo Institute of Technology, Japan
C E-mail: chaiwat.naam@gmail.com; Tel. 02-9428555 ext. 1540



ABSTRACT
This paper presents an implementation of QRS detection on an x86-based Linux operating system. Python is used for programming; it has good libraries not only for networking and databases but also for numerical methods and signal processing. As the operating system for the x86 platform we chose Linux, since it has versatile networking features and is supported by many open-source applications.

Keywords: ECG, QRS complex detection, Python programming.
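As a rough illustration of the kind of pipeline such an implementation involves, the sketch below implements a Pan-Tompkins-style QRS detector with NumPy/SciPy (bandpass filter, differentiation, squaring, moving-window integration, thresholding). The filter orders, cutoffs and threshold rule are illustrative assumptions, not the authors' exact parameters.

import numpy as np
from scipy.signal import butter, filtfilt, find_peaks

def detect_qrs(ecg, fs=360.0):
    """Return sample indices of detected QRS complexes in a 1-D ECG signal."""
    # 1) Bandpass 5-15 Hz to emphasize the QRS complex.
    b, a = butter(2, [5.0 / (fs / 2), 15.0 / (fs / 2)], btype="band")
    filtered = filtfilt(b, a, ecg)
    # 2) Differentiate, square, and integrate over a ~150 ms window.
    slope = np.diff(filtered)
    energy = slope ** 2
    win = int(0.150 * fs)
    integrated = np.convolve(energy, np.ones(win) / win, mode="same")
    # 3) Peak picking with a fixed threshold and a 200 ms refractory distance.
    threshold = 0.3 * integrated.max()
    peaks, _ = find_peaks(integrated, height=threshold, distance=int(0.2 * fs))
    return peaks

if __name__ == "__main__":
    fs = 360.0
    t = np.arange(0, 10, 1 / fs)
    synthetic = np.sin(2 * np.pi * 1.2 * t) ** 21  # crude spiky "heartbeat"
    print(detect_qrs(synthetic, fs))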



G00138
Parallel Additive Operation in Flexible Interval
Representation System

K. Worrasangasilpa^{1,A}, W. Jarangkul^{1,B}, A. Surarerks^{2,C}
1 Mahidol Wittayanusorn School, 364, Moo 5, Salaya, Phutthamonthon, Nakorn Prathom, 73170, Thailand
2 Computer Engineering, Faculty of Engineering, Chulalongkorn University, Bangkok, 10330, Thailand
E-mail: A kawin_earth@hotmail.com, B naruto_yondaime_konoha@hotmail.com, C athasit.s@chula.ac.th; Fax: 02-2186955; Tel. 02-2186959



ABSTRACT
A major problem in the domain of computer arithmetic is how computation can be sped up. Much research has focused on introducing high-speed computing techniques. However, computation may not always produce the exact value, and the interval representation system was established to handle this problem. Since an interval is a pair of numbers, uncertainty in the input data is guaranteed to be representable in this system. However, the space and computational time required for interval arithmetic are very high. The flexible interval representation system was introduced in order to solve these problems. Our binary system contains some additional flexible (interval) digits, i.e., digits that can represent an interval. The computational space can be reduced by up to twenty-five percent compared with the space used by the classical signed-digit interval representation system. Unfortunately, previous results show that arithmetic operations in this system can only be performed in an on-line manner. This paper focuses on how a parallel additive operation can be applied to the flexible interval representation system. We propose a novel parallel addition algorithm for this system. Our technique divides an interval representation into several groups of digits, and a proposed digit-set conversion algorithm (together with a proof of correctness) is applied to each group separately. Theoretical results show that the algorithm can be performed with groups of three digits.

Keywords: Interval Arithmetic, Flexible Interval Representation, Redundant Number System



1. INTRODUCTION
Scientific computations performed today usually use the floating-point model. Detecting inaccurate results is a very hard problem in floating-point computation; see details in [1-2]. Most operations in floating-point arithmetic may produce a rounding error [3]. Interval arithmetic handles two types of inexact arithmetic. First, the classical computer system suffers from round-off error caused by the finite representation of real numbers; round-off error usually occurs during the computation process, and some interesting effects are studied in [2]. Second, uncertainty in the input data affects the correct value of the output data. Fundamental arithmetic operations on intervals need high computational time when an interval is classically represented by two numbers, its lower and upper endpoints.
In order to reduce the computational time, redundant number systems such as the signed-digit number representation system can be used to represent both endpoints. In the binary signed-digit system, one digit needs a two-bit representation; consequently, the size of the representation doubles. In 2008, Thienprapasith and Surarerks [4] introduced an interval representation called the flexible interval representation system, where an interval is denoted
by one sequence of digits. The concept of this system is to represent certain digits by intervals; these special digits are called flexible digits. Fundamental arithmetic operations were shown to be computable in a serial manner in [4]. A modified system was proposed by Sukontarach and Surarerks [5] in order to parallelize the operations; the concept is to introduce two more flexible digits, but the resulting system requires many steps of computation. In this work, we concentrate on simplifying the algorithm and decreasing the amount of work done.
In this paper, we focus on an algorithm for additive operations (i.e., addition and subtraction) for the flexible interval representation system, together with its proof.
This paper is organized as follows: Section 2 recalls some theoretical background on redundant number systems and the flexible interval representation system. The proposed algorithm is introduced in Section 3, where an example is also demonstrated. The paper ends with some discussion in Section 4.



2. REDUNDANT NUMBER & INTERVAL SYSTEMS
In this section, we recall some theory of redundant number systems which is necessary for parallel computation. The definition and concept of the flexible interval representation are also given.

2.1. REDUNDANT NUMBER REPRESENTATION SYSTEM
A redundant number system (β, D) consists of a base β, which can be a real or complex number such that |β| > 1, and a finite contiguous digit set D of real or complex numbers, D = {-a, -a+1, ..., -1, 0, 1, ..., a-1, a}, where a ∈ Z with β/2 ≤ a and |D| > β. A β-representation x on D is a sequence of digits in D as follows:

x = (x_n x_{n-1} ... x_0 . x_{-1} x_{-2} ...)_β,

where x_i ∈ D for i ≤ n and for some n ∈ Z. The numerical value of x in base β, denoted by ||x||, is equal to

||x|| = Σ_{i ≤ n} x_i β^i.

The characteristic of a redundant number system is that at least one number has more than one representation. For example, in the binary signed-digit number system with β = 2 and D = {-1, 0, 1}, the number 5 can be expressed as (101)_2 or (1(-1)1(-1))_2.
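To make the redundancy concrete, here is a small evaluator (our sketch, not from the paper) that computes ||x|| for a digit sequence in base β = 2 and confirms that the two representations of 5 above have the same value.

# Evaluate a beta-representation given as a list of integer digits,
# most significant first; digits may be negative (signed-digit system).
def value(digits, beta=2):
    v = 0
    for d in digits:
        v = v * beta + d  # Horner's rule: shift by the base, add the digit
    return v

print(value([1, 0, 1]))        # 101_2 -> 5
print(value([1, -1, 1, -1]))   # 1(-1)1(-1)_2 -> 5 as well: redundancy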

2.2. FLEXIBLE INTERVAL REPRESENTATION SYSTEM
The concept of this representation is to introduce some additional interval digits named flexible digits. A flexible digit can represent several digits at the same time.

Definition 1.
A flexible interval representation (FIR) consists of the base 2 and a digit set D,

D = {-1, φ, -φ, 0, ψ, -ψ, 1},

where φ and ψ are called flexible digits. The flexible digit φ denotes the interval [0, 1] and ψ denotes the interval [-1, 0]. Note that -φ, the additive inverse of φ, denotes the interval [0, -1], and -ψ, the additive inverse of ψ, denotes the interval [1, 0].

Definition 2.
The representation X = x_n x_{n-1} ... x_0 in FIRS represents an interval [a, b] if

a = Σ_{i=0}^{n} min(x_i) 2^i   and   b = Σ_{i=0}^{n} max(x_i) 2^i,

where min(x_i) and max(x_i) are the endpoints of the interval denoted by digit x_i.

An important property of FIRS is that an interval is represented using only one number or representation (i.e., a single sequence of digits). Some examples are shown in Table 1.

Table 1. Some examples of flexible interval representations
Interval | FIR
[51, 86] | 1(-1)01 or 1(-1)1(-1)
[-82, 5] | 1(-)(-1)(-)
[-31, 91] | 0(-1)0(-)(-)1

The completeness of the representation system is proved in [4]. The system is also redundant; for instance, [51, 86] has more than one representation, as shown in Table 1.
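The interval denoted by a FIR follows directly from Definition 2: take min(x_i) for the lower endpoint and max(x_i) for the upper one. The sketch below is our illustration; the digit names "phi" and "psi" and their encodings are assumptions, since the printed examples lost the flexible-digit glyphs.

# Map each FIR digit to the interval [min, max] it denotes (Definition 1).
# 'phi' denotes [0, 1], 'psi' denotes [-1, 0]; plain digits denote [d, d].
DIGIT_INTERVALS = {
    -1: (-1, -1), 0: (0, 0), 1: (1, 1),
    "phi": (0, 1), "-phi": (0, -1),
    "psi": (-1, 0), "-psi": (1, 0),
}

def fir_to_interval(digits):
    """digits: most significant first. Returns (a, b) per Definition 2."""
    a = b = 0
    for d in digits:
        lo, hi = DIGIT_INTERVALS[d]
        a = 2 * a + min(lo, hi)   # lower endpoint uses min(x_i)
        b = 2 * b + max(lo, hi)   # upper endpoint uses max(x_i)
    return a, b

print(fir_to_interval(["phi"] * 7))   # (0, 127): seven flexible digits
print(fir_to_interval([1, 0, 1]))     # (5, 5): ordinary digits give a point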



3. PARALLEL ADDITIVE OPERATION ALGORITHM
One of the most important fundamental arithmetic operations is addition (and subtraction). In this section we introduce a parallel algorithm for adding two flexible interval representations. To simplify the algorithm, the process is separated into two steps: first, intermediate digits are produced in parallel by digit-to-digit addition; second, the intermediate digits that are not allowed in FIRS must be removed from the final representation.

Theorem 1.
Addition of two numbers in the flexible interval representation system can be realized in a parallel manner.

Proof: The theorem is proved by introducing two algorithms: an intermediate addition algorithm and a normalization algorithm. The intermediate addition algorithm performs addition digit by digit, separately. The digits of the intermediate result lie in the set {-1, φ, 0, ψ, 1, -φ, -ψ} ∪ {γ, -γ}, where γ represents the interval [-1, 1] and -γ, the additive inverse of γ, represents the interval [1, -1]. The normalization algorithm must remove all unsatisfied digits, γ and -γ, from the intermediate result; the final representation is then valid in the flexible interval representation system.

Algorithm 1: Intermediate Addition
input:  FIRS X, Y with X = X_n X_{n-1} ... X_0 and Y = Y_n Y_{n-1} ... Y_0,
        where X_i, Y_i ∈ {-1, φ, 0, ψ, 1, -φ, -ψ}
output: FIRS Z with Z = Z_n Z_{n-1} ... Z_0, where Z_i ∈ D ∪ {γ, -γ}
begin
  for each i (in parallel)
    X_i = [a_i, b_i], Y_i = [c_i, d_i]
    X_i + Y_i = [a_i + c_i, b_i + d_i] = 2C_{i+1} + S_i = 2[r_{i+1}, s_{i+1}] + [t_i, u_i]
    where
      case a_i + c_i = -2: r_{i+1} = -1, t_i = 0
      case a_i + c_i = -1: r_{i+1} = -1, t_i = 1,  if a_{i-1} + c_{i-1} < 0
                           r_{i+1} = 0,  t_i = -1, if a_{i-1} + c_{i-1} >= 0
      case a_i + c_i = 0:  r_{i+1} = 0,  t_i = 0
      case a_i + c_i = 1:  r_{i+1} = 0,  t_i = 1,  if a_{i-1} + c_{i-1} < 0
                           r_{i+1} = 1,  t_i = -1, if a_{i-1} + c_{i-1} >= 0
      case a_i + c_i = 2:  r_{i+1} = 1,  t_i = 0
    and
      case b_i + d_i = -2: s_{i+1} = -1, u_i = 0
      case b_i + d_i = -1: s_{i+1} = -1, u_i = 1,  if b_{i-1} + d_{i-1} < 0
                           s_{i+1} = 0,  u_i = -1, if b_{i-1} + d_{i-1} >= 0
      case b_i + d_i = 0:  s_{i+1} = 0,  u_i = 0
      case b_i + d_i = 1:  s_{i+1} = 0,  u_i = 1,  if b_{i-1} + d_{i-1} < 0
                           s_{i+1} = 1,  u_i = -1, if b_{i-1} + d_{i-1} >= 0
      case b_i + d_i = 2:  s_{i+1} = 1,  u_i = 0
    Z_i = C_i + S_i
  endfor
end
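The following Python sketch is our direct transcription of the per-position carry/sum rule above for one endpoint stream (the lower endpoints a_i + c_i; the upper stream is identical with b, d, s, u). Each position looks only at its own digit sum and its right neighbor's, which is what makes the computation parallel. Digit encodings as endpoint integers are our assumption.

def carry_sum(e_i, e_prev):
    """Return (r_{i+1}, t_i) for endpoint sum e_i, given right neighbor e_prev."""
    if e_i == -2:
        return -1, 0
    if e_i == -1:
        return (-1, 1) if e_prev < 0 else (0, -1)
    if e_i == 0:
        return 0, 0
    if e_i == 1:
        return (0, 1) if e_prev < 0 else (1, -1)
    if e_i == 2:
        return 1, 0
    raise ValueError("endpoint digit sums must lie in -2..2")

def intermediate_add(a, c):
    """a, c: lists of endpoint digits in {-1, 0, 1}, most significant first."""
    e = [ai + ci for ai, ci in zip(a, c)]
    out = []
    for i in range(len(e)):
        e_prev = e[i + 1] if i + 1 < len(e) else 0  # beyond the LSB: treat as 0
        out.append(carry_sum(e[i], e_prev))
    return out  # list of (carry r_{i+1}, sum t_i) pairs

print(intermediate_add([1, 0, 1], [1, -1, 1]))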

Proof of algorithm: First we show that the numerical value of the result equals the addition of the two input operands (proof of correctness). Second, we show that the digits of the result are always in the set {-1, φ, 0, ψ, 1, -φ, -ψ} ∪ {γ, -γ} (proof of validation).
Proof of correctness: We have to show that X + Y = Z. From the algorithm, every case satisfies a_i + c_i = 2r_{i+1} + t_i and b_i + d_i = 2s_{i+1} + u_i for all 0 <= i <= n. Hence X + Y = Σ_i 2^i (X_i + Y_i) = Σ_i 2^i (2C_{i+1} + S_i) = Σ_i 2^i (C_i + S_i) = Σ_i 2^i Z_i = Z.
Proof of validation: We demonstrate that Z_i ∈ {-1, φ, 0, ψ, 1, -φ, -ψ} ∪ {γ, -γ}. From the algorithm, Z_i = C_i + S_i where C_i = [r_i, s_i] and S_i = [t_i, u_i]. The problem reduces to whether t_i + r_i and s_i + u_i are in the set {-1, 0, 1}. Consider the three cases of t_i. Case 1: t_i = 0; it is easy to see that r_i can be -1, 0 or 1. Case 2: t_i = 1; this occurs only if a_{i-1} + c_{i-1} < 0, i.e., a_{i-1} + c_{i-1} = -1 or -2, which forces r_i to be -1 or 0. Case 3: t_i = -1; this comes from a_{i-1} + c_{i-1} >= 0, i.e., a_{i-1} + c_{i-1} = 0, 1 or 2, so r_i = 0 or 1. We conclude that t_i + r_i is always in {-1, 0, 1}. By the same argument, s_i + u_i is in {-1, 0, 1}.

Algorithm 2: Normalization
input:  FIRS Z with Z = Z_n Z_{n-1} ... Z_0, where Z_i ∈ {-1, φ, 0, ψ, 1, -φ, -ψ} ∪ {γ, -γ}
output: FIRS F with F = F_n F_{n-1} ... F_0, where F_i ∈ {-1, φ, 0, ψ, 1, -φ, -ψ}
begin
  for each i (in parallel)
    Z_i = 2R_{i+1} + S_i   (look-up Table 2)
    F_i = R_i + S_i
  endfor
end


Proof of algorithm: In the same way as the proof of the intermediate algorithm, the proof is composed of two steps: proof of correctness and proof of validation.
Proof of correctness: We need to show that the input Z and the output F have the same numerical value: Z = Σ_i 2^i Z_i = Σ_i 2^i (2R_{i+1} + S_i) = Σ_i 2^i (R_i + S_i) = Σ_i 2^i F_i = F.
Proof of validation: We show that F_i is in the set {-1, φ, 0, ψ, 1, -φ, -ψ}. From the algorithm, Z_i = 2R_{i+1} + S_i for all 0 <= i <= n. The proof is separated into four cases according to the value of Z_i:
Case 1. Z_i = γ or -γ. For each admissible pair (R_{i+1}, S_i) given by Table 2, the possible values of R_i are constrained so that F_i = R_i + S_i remains in {-1, φ, 0, ψ, 1, -φ, -ψ}. The case Z_i = -γ is symmetric.
Case 2. Z_i = 1 (and, symmetrically, Z_i = -1). For the cases with R_{i+1} = 0 and S_i = 1, the implied values of R_i keep F_i = R_i + S_i in the target digit set; the remaining admissible pairs from Table 2 are handled in the same way.
Case 3. Z_i = 0. There are only two cases in which R_{i+1} and S_i are not both zero. First, if Z_{i-1} and Z_{i-2} are both γ, then R_{i+1} = γ and S_i = -1-γ, which implies R_i = γ, so that R_i + S_i = -1. Second, if Z_{i-1} and Z_{i-2} are both -γ, then R_{i+1} = -γ and S_i = 1+γ, which implies R_i = -γ, so that R_i + S_i = 1.











Table 2. Look-up table for R_{i+1} and S_i, indexed by Z_i and the neighboring digits Z_{i-1}, Z_{i-2}, Z_{i-3}; there are separate case blocks for Z_i = γ, Z_i = 1, Z_i = 0, and the flexible digits, with the flexible-digit cases split on the parity of i (i ≡ 0 or 1 (mod 2)). Remark: when Z_i is negative, the values of R_{i+1} and S_i are obtained by inverting the signs of the corresponding positive case.

Case 4. Z_i is a flexible digit. The proof splits into two cases depending on the parity of i. Case 4.1 (i even): for the pair R_{i+1} = φ, S_i = -1, the implied values of R_i keep F_i = R_i + S_i in {-1, φ, 0, ψ, 1, -φ, -ψ}; the remaining admissible pairs from Table 2 are handled in the same way. Case 4.2 (i odd): the argument is analogous, using the admissible pairs for odd positions.


Example 1
Addition of the two intervals 1(-1)01 and 1(-)(-1)(-) can be performed as follows.

Solution: The addition is realized in parallel, column by column, by applying Algorithms 1 and 2 in sequence, as shown in Figure 1.



Figure 1. Addition of two FIRS.

The output result is 010(-1)0(-)(-)1. Since 1(-1)01 = [51, 86] and 1(-)(-1)(-) = [-82, 5], and the output represents [-31, 91], the result is thus correct.


4. CONCLUSION
We propose an algorithm for performing parallel addition in the flexible interval representation system. Theoretical results show that our algorithm needs to consider only three neighbouring digits, compared with the previous algorithm [5], which needs six.


REFERENCES
1. Goldberg, D., What Every Computer Scientist Should Know About Floating-Point Arithmetic, ACM Computing Surveys, 1991, 23(1), 5-48.
2. Moore, R.E., Interval Analysis, Prentice Hall, Englewood Cliffs, 1966.
3. Skeel, R., Roundoff Error and the Patriot Missile, SIAM News, 1992, 25.
4. Thienprapasith, P., and Surarerks, A., A Flexible Interval Representation and its Fundamental Arithmetic Operations, 5th International Conference on Information Technology and Applications, 2008.
5. Sukontarach, J., and Surarerks, A., Parallel Addition and Subtraction for the Flexible Interval Representation System, International Conference on Theoretical and Mathematical Foundations of Computer Science, 2009, Orlando.
G00139
Web Spam Recognition by Edge Label

W. Wongsarasin^{1,A}, A. Rungsawang^{2,B}, and A. Surarerks^{1,C}
1 Computer Engineering, Faculty of Engineering, Chulalongkorn University, Bangkok, 10330, Thailand
2 Computer Engineering, Faculty of Engineering, Kasetsart University, Bangkok, 10900, Thailand
E-mail: A g50wwg@cp.eng.chula.ac.th, B arnon@mikelab.net, C athasit@cp.eng.chula.ac.th;
Fax: 02-2186955; Tel. 02-2186959



ABSTRACT
The link farm technique is employed with the intention of misleading search engines in order to increase the rank scores of certain web pages (web spam) in search engine results. Web spam decreases the quality of search engine results, which is one of the major problems in the search engine research domain. Several lines of research focus on web spam detection: link-based spam detection, content-based spam detection, and combinations of both techniques. Our work studies the characteristic link structure of a link farm. The Web Graph is selected as the mathematical model in this work, and the objective is to extract characteristics of web spam from the graph. It is remarked that the rank score of a page mostly depends on two interesting features: in/out-degrees and the number of reciprocity links. Previous work demonstrated that a logarithmic function works better for classification of link structures in the Web Graph. We therefore start by introducing a mapping function for transforming the Web Graph into its homomorphism graph. Our mapping contains the PageRank score, the TruncatedPageRank score and two edge label functions: a logarithmic function of degrees and a reciprocity generating function. Our mathematical model of the link farm is expressed as a graph grammar. The dataset used in this paper was obtained from the Yahoo research center (11,402 hosts in the .uk domain). Our experimental results show that 98.51% of the spam pages among the one-fifth highest-scoring pages can be recognized by our function.

Keywords: Web Graph, Web spam, Link farm, Graph grammar.



1. INTRODUCTION
Nowadays most people surf the internet to search for information, knowledge, news, shopping catalogs, etc. Many popular search engines such as Google, Yahoo and AltaVista have been developed to help users search for information on the internet, and researchers attempt to improve algorithms for retrieving the most relevant results with respect to submitted queries. Normally users visit the web pages with high rank scores (i.e., those appearing at the top of search results); thus, commercial web pages desire high rank scores. There are techniques that attempt to increase the rank score of web pages even if these pages are not relevant to the search keyword. Such a web page is called a Web spam.
The web spamming techniques can be categorized into two groups: boosting techniques and hiding techniques [1]. Recently, most web spam has been created by boosting techniques, which can in turn be classified into two groups: content spam and link spam. The content spam (term spam) technique increases the rank result by distorting the content of the web pages to match most queries, while the link spam technique builds a group of web pages having densely connected hyperlinks in order to increase the rank score of a target page.
Several researchers focus on how web spam can be detected. For instance, optimal link farms are studied by Y. Du et al. [2], where the technique is to express the structure that can maximize the rank score. Previous work [3] also shows that 94.78% of web spam can be recognized by PageRank and TruncatedPageRank scores. In this work, we combine both scores with the notion of describing a web spam structure, especially boosting pages, using a graph grammar inference technique. We start by introducing a mapping function for transforming the Web Graph into its homomorphism graph. Precisely, we propose a mapping function, named an edge labeling technique, for identifying hyperlinks in the Web Graph. In order to detect web spam, we then study the characteristics of the edge labels of spammed and normal pages. Our experimental results show that 98.51% of the spam pages among the one-fifth highest-scoring pages can be recognized by our function.
The remainder of this paper is organized as follows: in Section 2, we give some preliminaries and related works. The edge labeling technique is detailed in Section 3. Section 4 reports the experimental results. Finally, we conclude the paper in Section 5.

2. PRELIMINARIES AND RELATED WORKS
For better understanding, in this section we first give the definitions and notation used in this paper, and then a short review of related work on web spam detection.
2.1 WEB GRAPH
The Web Graph is a graph model for representing webs on the internet. Many researchers [4, 5] analyze web structure through the Web Graph, defined as follows.

Definition: A Web Graph is a directed graph G = (V, E), where V denotes a finite set of nodes representing web pages and E corresponds to a finite set of edges representing the hyperlinks (or links) between them. An edge (x, y) ∈ E expresses that page x has a hyperlink to page y; there are no self loops.
A hyperlink pointing into a page is called an in-link of that page, and a hyperlink pointing to another page is called an out-link. A pair of hyperlinks exchanged between two pages is called a reciprocity link. The numbers of in-links and out-links of a page are called its in-degree and out-degree, respectively. Web pages pointing into a considered page are called its in-neighbors; all pages pointed to from a considered page are called its out-neighbors.
2.2 LINK FARM
A link farm is a densely connected set of web pages, created with the purpose of deceiving a link-based ranking algorithm. Each web page in a link farm has many edges that point to others in the same link farm. A web page belonging to a link farm is called a spam page; otherwise it is called a normal page.

Figure 1. Link farm model.
In Figure 1, proposed in [1], there are three types of pages in a link farm: inaccessible, accessible and own pages. The first is a page on which the spammer cannot modify any link. On the second, the spammer can add or change links to point into his own link farm; most accessible pages are web boards or blogs. Finally, each page in the own link farm can be considered a web page whose content is fully controlled by the spammer. It is remarked that the own link farm is
composed of two types of pages: target pages and boost pages. A target page is a page whose rank score the spammer aims to increase, while a boost page is a page used to pass score to target pages.
2.3 RANKING ALGORITHMS
This section recalls some classical ranking algorithms. PageRank [6] is a ranking algorithm used to calculate rank scores for all web pages in the Web Graph model. The formal definition of the PageRank score is the following.

Definition: The PageRank score is a rank score assigned to every web page in the Web Graph, defined as

R(u) = α Σ_{v ∈ B_u} R(v) / N_v + (1 - α) / N,

where R(u) is the PageRank score of page u, R(v) is the PageRank score of page v, N_v is the out-degree of page v, B_u is the set of in-neighbors of page u, N is the number of all web pages in the Web Graph, and α is a damping factor (usually α = 0.85).
TruncatedPageRank [7] is a modified version of PageRank that decreases the influence of a page on the PageRank scores of its neighbors.
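Since the grammar below keys on PageRank-derived labels, a compact power-iteration sketch of the definition above may help. This is our illustration, not the authors' implementation; it treats dangling nodes by redistributing their mass uniformly, an assumption the paper does not state.

# Power-iteration PageRank over an adjacency list {page: [outlinks]}.
def pagerank(links, alpha=0.85, iters=50):
    nodes = list(links)
    n = len(nodes)
    r = {u: 1.0 / n for u in nodes}            # uniform start
    for _ in range(iters):
        nxt = {u: (1.0 - alpha) / n for u in nodes}
        for v in nodes:
            out = links[v]
            if out:                            # spread alpha * R(v)/N_v to out-neighbors
                share = alpha * r[v] / len(out)
                for u in out:
                    nxt[u] += share
            else:                              # dangling node: spread uniformly
                share = alpha * r[v] / n
                for u in nodes:
                    nxt[u] += share
        r = nxt
    return r

# Tiny example: two boosting pages b1, b2 inflating target t.
print(pagerank({"t": ["b1"], "b1": ["t", "b2"], "b2": ["t"]}))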

2.4 WEB SPAM DETECTION
There is much research on web spam detection. In 2006, Becchetti et al. [5] applied statistical analysis to spam detection, using several metrics such as degree correlations, the number of neighbors at different distances from the target host, rank propagation through links, TrustRank, etc. They analyzed the performance of each metric and of combinations of several metrics; tested on the .uk domain, their approach detects 80.4% of web spam with 1.1% false positives. In the same year, Ntoulas et al. [8] focused on content analysis, using features such as the number of words in a page or page title, the average word length, the amount of anchor text, independent n-gram likelihoods, etc. They studied the effectiveness of web spam classification with different features; 86.2% of spam pages could be detected, with a misidentification rate of 3.1%. Recently, Castillo et al. [9] presented a spam detection system combining link-based and content-based features. They demonstrated three methods of incorporating the Web Graph topology into the predictions of their classifier: the first clusters the host graph and assigns the label of all hosts in each cluster by majority vote; the second propagates the predicted labels to neighboring hosts; the last uses the predicted labels of neighboring hosts as new features and retrains the classifier. Their best classifier detects 88.4% of spam hosts, with a false positive rate of 6.3%. Moreover, in 2008, Chobtham et al. [4] proposed a graph grammar for recognizing web spam structure; 99% of spam hosts are described by their grammar.

3. LINK FARM GRAPH GRAMMAR
Spammers create a link farm in order to boost the rank scores of target pages. We found that the structure of a link farm usually needs a lot of boosting pages, which can be considered a characteristic of link farm structure. To simplify the experiment, a Host Graph (i.e., a graph in which each node defines a host) is used in place of the full Web Graph to represent our model. Our concept is to explain a link farm structure as a graph grammar over a link farm of hosts. A mapping function from the Web Graph to its homomorphism graph must be defined; precisely, the hyperlinks in the Web Graph should be systematically labeled. In this paper, we propose edge labeling techniques in order to recognize spam host structures. We are interested in the boosting characteristic of spam structures, applying the two link-based features in/out-degree and number of reciprocity links. Previous work [9] found that a logarithmic transformation works better for classification than a linear one. Our edge labeling therefore combines a PageRank function (E_1), a TruncatedPageRank function (E_2), a logarithmic function of degrees (E_3) and a reciprocity generating function (E_4). The edge labeling functions, shown in Figure 2, are defined as follows:

E_i = 1 if E_{t,i} > E_{n,i}, and E_i = 0 otherwise,

where, for all 1 <= i <= 4,
E_{j,1} = PageRank score of node j,
E_{j,2} = TruncatedPageRank score of node j,
E_{j,3} = log(in-degree of node j) / log(out-degree of node j),
E_{j,4} = reciprocity ratio of node j (back links / out-degree),
and j ∈ {t (target node), n (neighbor node)}.


Figure 2. Edge labeling function.
It is noted that our edge label functions form a binary encoding which can generate sixteen patterns of edge labels in the Web Graph. In our grammar we are interested in only the four patterns shown in Figure 3.


Figure 3. Four patterns of edge label.
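For concreteness, here is a small sketch (ours, not the authors') computing the 4-bit edge label <E_1, E_2, E_3, E_4> for a directed edge, given precomputed per-node statistics. The dictionary field names and the +1/+2 smoothing terms (to avoid log(0) and division by zero) are our assumptions.

import math

def edge_label(target, neighbor):
    """Label <E1, E2, E3, E4> for an edge from neighbor n to target t."""
    def f3(node):  # logarithmic degree feature E_{j,3}
        return math.log(node["in_deg"] + 1) / math.log(node["out_deg"] + 2)
    def f4(node):  # reciprocity feature E_{j,4}: back links / out-degree
        return node["back_links"] / max(node["out_deg"], 1)
    feats_t = (target["pagerank"], target["trunc_pagerank"], f3(target), f4(target))
    feats_n = (neighbor["pagerank"], neighbor["trunc_pagerank"], f3(neighbor), f4(neighbor))
    return tuple(1 if ft > fn else 0 for ft, fn in zip(feats_t, feats_n))

t = {"pagerank": 0.04, "trunc_pagerank": 0.01, "in_deg": 120, "out_deg": 3, "back_links": 2}
n = {"pagerank": 0.01, "trunc_pagerank": 0.02, "in_deg": 10, "out_deg": 40, "back_links": 1}
print(edge_label(t, n))  # (1, 0, 1, 1) for this made-up pair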
To characterize the grammar of the link farm, we propose the graph grammar in Figure 4, which describes the structure of a link farm. The edge labels in the proposed grammar are defined only on the boosting nodes (b), which have a directed out-link to the target nodes (t), and on all out-neighbors (o). The experimental results show that a link farm can be distinguished from normal hosts using at least one of two conditions. The first is that E_1 = 1, E_2 = 1 and 0 <= r_i <= 1 or r_i > 42, where

r_i = f_i(P_2) / f_i(P_3)   (1)

is the ratio of the numbers of applications of productions P_2 and P_3. The second is that more than 50% of the in-links have the same values of E_3 and E_4. These conditions can be used to identify spam hosts. The final grammar is expressed in Figure 4.
(Figure 2 depicts an edge from a source host H_s to a destination host H_d carrying the label <E_1, E_2, E_3, E_4>, computed from E_{t,1..4} and E_{n,1..4}. Figure 3 shows the four patterns of interest: <1,1,0,0>, <1,1,0,1>, <1,1,1,0> and <1,1,1,1>.)


Figure 4. Graph grammar for recognizing spam hosts.

4. WEB SPAM RECOGNITION
In our experiment we used the dataset from Yahoo Research on the .uk domain [10], crawled on June 4th, 2007. It contains 11,402 different hosts, whose types were labeled by a team of volunteers. The host labels fall into three groups, normal, spam, and undecided, as shown in Table 1.

Table 1. Number of normal, spam and undecided hosts

Type | Frequency | Percent (%)
Normal | 4,948 | 43.40
Spam | 674 | 5.91
Undecided | 5,780 | 50.69

We focus only on the spam hosts classified among the one-fifth highest PageRank score hosts. Using our grammar of the link farm, 98.51% of these spam hosts can be recognized. Compared with our previous work, our grammar increases the number of accepted spam hosts.










(Figure 4 lists the productions P_1-P_6 of the grammar, where S is the starting node, T a target node, B a boosting node, O an out-neighbor node, and b, o and t are terminal nodes; the edges into target nodes carry labels <1,1,i,j> with i, j ∈ {0,1}.)
5. CONCLUSIONS AND DISCUSSION
In this paper, we propose an edge labeling function for recognizing web spam structures. We use a mapping function to transform the Web Graph into its homomorphism graph. The experimental result on the Yahoo dataset demonstrates that 98.51% of the spam hosts among the one-fifth highest PageRank scores can be recognized by our function. In order to increase the recognition rate, a node labeling function should be introduced into the graph grammar; this is our future work.



REFERENCES
1. Z. Gyongyi and H. Garcia-Molina, Web Spam Taxonomy, In the Proceedings of the 1st International Workshop on Adversarial Information Retrieval on the Web, 2005.
2. Y. Du, Y. Shi and X. Zhao, Using Spam Farm to Boost PageRank, AIRWEB07.
3. W. Wongsarasin, A. Rungsawang and A. Surarerks, Link Farm Formalization by Graph
Grammar, KST 2009, Burapha University, Chonburi, Thailand, July 24-25, 2009.
4. K. Chobtham, A. Surarerks, A. Rungsawang, Formalization of Link Farm Structure using
Graph Grammar, AINA, Japan, March 25-28, 2008, 904-911.
5. L. Becchetti, C. Castillo, and D. Donato, Link-Based Characterization and Detection of
Web Spam, AIRWEB06, August 10, 2006, Seattle, Washington, USA.
6. L. Page, S. Brin, R. Motwani, and T. Winograd, The PageRank Citation Ranking: Bringing Order to the Web, Technical report, Stanford Digital Libraries, 1998.
7. L. Becchetti, C. Castillo, and D. Donato, Using Rank Propagation and Probabilistic
Counting for Link-Based Spam Detection, WebKDD 2006.
8. A. Ntoulas, M. Najork, M. Manasse and D. Fetterly, Detecting Spam Web Pages through
Content Analysis, WWW 2006, May 23-26,2006, Edinburgh, Scotland
9. C. Castillo, D. Donato, A. Gionis, V. Murdock, and F. Silvestri, Know your Neighbors:
Web Spam Detection using the Web Topology, SIGIR07, Amsterdam, The Netherlands,
23-27 July 2007, 423-430.
10. C. Castillo, D. Donato, L. Becchetti, and P. Boldi, A Reference Collection for Web Spam,
SIGIR Forum, 40(2), December 2006, 11-24.

G00140
Enhanced Image Watermarking Using Adaptive Pixel
Prediction and Local Variance

Prat Nudklin¹ and Thumrongrat Amornraksa²
Multimedia Communication Laboratory,
Computer Engineering Department, Faculty of Engineering,
King Mongkut's University of Technology Thonburi, Bangkok, Thailand
E-mail: prat.nud@kmutt.ac.th, t_amornraksa@cpe.kmutt.ac.th; Fax: +662-872-5050; Tel. +662-470-9083



ABSTRACT
This paper proposes an adaptive original-pixel prediction to enhance the watermark retrieval performance of the image watermarking method proposed in [8]. Conceptually, our method adaptively removes one or more surrounding pixels around the predicted pixel, depending on a threshold derived from the local variance of the nearby image pixels, to improve the accuracy of the original-pixel prediction process. The experimental results show an improvement in terms of the accuracy of the retrieved watermark, compared to the previous retrieval method in [8]. The robustness against various types of attacks is also evaluated and compared.

Keywords: Digital image processing, Digital watermarking, Adaptive pixel prediction.



1. INTRODUCTION
Digital watermarking is a method used to provide an electronic proof of ownership and/or receipt in distributed copies of digital media. Essentially, decent watermarking methods should provide efficient watermark retrieval even if the watermarked media is intentionally attacked. Apart from imperceptibility, reliability, security and robustness, a decent watermarking method should provide a blind detection property, so that watermark recovery can be achieved without the original image. More details on efficient digital watermarking, including its general requirements, can be found in [1,2]. Various efficient watermarking methods have been developed; they can be divided into two categories depending on the domain of watermark embedding, i.e., the spatial and frequency domains. Spatial-domain watermarking directly embeds the watermark into image pixels, e.g. [3,4], while frequency-domain watermarking embeds the watermark into image coefficients in a transformed domain, e.g. [5,6]. The main advantage of the spatial-domain approach is that the embedding methods are more straightforward and less computationally expensive than the ones using transforms, while the frequency-domain approach normally offers a higher degree of robustness, especially against most image compression schemes, e.g. the JPEG compression standard.

In this research, a spatial-domain image watermarking technique based on the modification of image pixels was considered due to its ability to convey a large amount of watermark within a host image, compared to many other watermarking methods. For instance, M. Kutter et al. [7] proposed a method to embed a watermark bit into an image pixel in the blue channel by modifying that pixel either additively or subtractively, depending on the watermark bit, and proportionally to the luminance of the embedding pixel. Their blind watermark retrieval was achieved by a prediction technique based on a linear combination of pixel values in a cross-shaped neighborhood around the embedded pixels. This watermarking method was further developed to improve the watermark retrieval performance by balancing the watermark bits around the embedding pixels, tuning the strength of the embedded watermark in accordance with the nearby luminance, and reducing the bias in the process of predicting the original image pixel from the surrounding watermarked pixels [8]. The authors also demonstrated that the developed watermarking scheme can embed a watermark image (logo) into a color image having the same resolution, i.e., m×n watermark bits into m×n image pixels. However, in their third technique, where the bias was reduced by removing the one surrounding pixel that most differs from the predicted pixel, we observed that if the variation of the seven remaining watermarked pixels is still high, the probability of correctly retrieving the watermark is only slightly improved. This situation frequently occurs when the host image contains many high-frequency components.

In this paper, we hence propose a new watermark retrieval method based on adaptive pixel prediction and local variance in order to maximize the watermark retrieval performance. Precisely, our proposed method adaptively removes one or more surrounding pixels around the predicted pixel, depending on a threshold derived from the local variance of the nearby image pixels.



2. THE PREVIOUS WATERMARKING METHOD [8]
In the watermark embedding process, a unique binary bit-stream is generated and considered as a watermark w(i,j) ∈ {1,-1} to be embedded into an image. It is then permuted, using an XOR operation, with a pseudo-random bit-stream generated from a key-based stream cipher, to improve the balance of w around (i,j) and the security of the embedded watermark. The watermark embedding is performed by modifying the blue component at a given coordinate (i,j), in a line-scan fashion. Note that the blue component is selected to be watermarked because it is the one the human eye is least sensitive to [7]. The modification of the blue component in each pixel B(i,j) is either additive or subtractive, depending on w(i,j), and proportional to the modified luminance of the embedding pixel, where L(i,j) = 0.299R(i,j) + 0.587G(i,j) + 0.114B(i,j). Since changes in high-luminance pixels are less perceptible to the human eye, the luminance value is used to tune the strength of the watermark, so that more watermark energy can be added to achieve a higher level of robustness. The modified luminance L'(i,j) is obtained from a Gaussian pixel weighting mask [8]. The watermarked pixel B'(i,j) is expressed as

B'(i,j) = B(i,j) + w(i,j) s L'(i,j),   (1)

where s is a watermark signal strength, a scaling factor applied to the whole image frame. In practice, s must be carefully selected to obtain the best trade-off between imperceptibility and robustness. In the watermark retrieval process, the embedded watermark is recovered based on two assumptions. First, any pixel value within an image is close to its surrounding neighbors, so that a pixel value at a given coordinate (i,j) can be predicted by the average of its nearby pixel values. Second, the summation of w around (i,j) is close to zero, so that the embedded bit at (i,j) can be predicted by

w'(i,j) = B'(i,j) - (1/8) ( Σ_{m=-1}^{1} Σ_{n=-1}^{1} B'(i+m, j+n) - B'(m_max, n_max) ),   (2)

where w'(i,j) is the estimate of the embedded watermark w around (i,j), and B'(m_max, n_max) is the neighboring pixel around (i,j) that most differs from B'(i,j). Since w(i,j) can be either 1 or -1, the value w'(i,j) = 0 is set as a threshold, and the sign of w'(i,j) is used to estimate the value of w(i,j): if w'(i,j) is positive (or negative), w(i,j) is estimated as 1 (or -1, respectively). The magnitude of w'(i,j) reflects the confidence level of the estimate.
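A minimal NumPy sketch of the embedding rule (1) follows; the Gaussian-mask luminance modification L'(i,j) is simplified here to a plain Gaussian blur of the luminance, which is our assumption rather than the exact mask of [8].

import numpy as np
from scipy.ndimage import gaussian_filter

def embed(rgb, w, s=0.09):
    """rgb: float array (H, W, 3) in [0, 255]; w: array (H, W) of +/-1 bits."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    lum = 0.299 * r + 0.587 * g + 0.114 * b       # luminance L(i,j)
    lum_mod = gaussian_filter(lum, sigma=1.0)     # stand-in for L'(i,j)
    out = rgb.copy()
    out[..., 2] = np.clip(b + w * s * lum_mod, 0, 255)   # equation (1)
    return out

rng = np.random.default_rng(0)
img = rng.uniform(0, 255, size=(64, 64, 3))
bits = rng.choice([-1.0, 1.0], size=(64, 64))
print(embed(img, bits).shape)  # (64, 64, 3)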
G00140
ANSCSE14 Mae Fah Luang University, Chiang Rai, Thailand
March 23-26, 2010
745
3. THE PROPOSED RETRIEVAL METHOD
From equation (2), it can be seen that a large variation in the pixels around (i,j) can bias the original-pixel prediction, resulting in an erroneous original pixel value. We can reduce this bias by removing the surrounding pixels around (i,j) that differ most from the remaining ones; for instance, if there are two extreme pixel values within the eight surrounding pixels, those two pixels should be removed from the prediction process. Based on the first assumption made previously, the prediction accuracy for an original pixel depends mainly on its nearby pixel values. In other words, a small difference in variance (σ²) between the original surrounding pixels and the watermarked surrounding pixels gives a higher accuracy of original-pixel prediction than a large one. Therefore, if the variance of the eight pixels around B'(i,j), σ²_B', is much different from that of the eight pixels around B(i,j), σ²_B, the pixel around B'(i,j) that affects the variance most should be removed in order to bring the new variance close to σ²_B. Based on this concept, σ²_B is used as the prediction threshold in the original-pixel prediction process. Since blind detection is assumed in the watermark retrieval, the real value of σ²_B cannot be determined; in this research we therefore used the variance of the local area around (i,j) in the watermarked image in its place. Accordingly, we propose a new watermark retrieval method based on the removal of the surrounding pixel(s) that most affect the variance, compared to the threshold derived from the watermarked image's local variance. If one surrounding pixel around (i,j) has been removed and the resulting variance of the remaining pixels is still much different from the threshold, we remove another surrounding pixel that most affects the variance, and so on, until the resulting variance of the remaining pixels around (i,j) is close enough to the threshold. However, the maximum number of removals is limited to 4 pixels, to sustain the number of surrounding pixels left for the prediction. The steps of our proposed watermark retrieval method are as follows:
1. To predict an original pixel of the watermarked image at coordinate (i,j), the eight surrounding pixels around (i,j) are stored in the neighbor array.
2. Sort the pixel values in the neighbor array in order from the smallest value to the largest.
3. Compare the variance of the neighbor array to the local variance of the watermarked image. If the neighbor array variance is higher, remove one pixel at the first or last position of the array. The pixel to be removed depends on the new variance obtained after removing it: if removing the pixel at the first position gives a lower new variance than removing the pixel at the last position, the pixel at the last position is removed.
4. Repeat step 3 until the variance of the neighbor array is lower than the threshold or four surrounding pixels have been removed.

















The pseudo code implementing our proposed watermark retrieval method is given below.

Function adaptive original pixel prediction at coordinate (i,j)
    optimum threshold - value derived from the watermarked image's local variance
    neighbor[] - array of surrounding pixels sorted from the smallest value to the largest
    neighbor size - minimum number of surrounding pixels used for the prediction
    criterion1 - (size of neighbor[] >= neighbor size)
    criterion2 - (variance of neighbor[] > optimum threshold)
    v1 - variance of neighbor[] after removing the pixel at the first position
    v2 - variance of neighbor[] after removing the pixel at the last position

    While (criterion1 and criterion2)
        If (v1 > v2)
            temp[] = neighbor[] without the pixel at the first position
        Else
            temp[] = neighbor[] without the pixel at the last position
        End if
        neighbor[] = temp[]
    End while
    Return ( sum(neighbor[]) / size of neighbor[] )
4. RESULTS AND DISCUSSION
In all experiments, four 256×256-pixel color images having various characteristics, Bird, Fish, Lena and Tower, were used as the original test images, while a 256×256-pixel black & white image containing the logo "CPE 2009" was used as the watermark, i.e., with black pixels taken as -1 and white pixels as 1. To evaluate the quality of the watermarked images, the PSNR (Peak Signal-to-Noise Ratio) was used. After retrieving the embedded watermark, its quality was evaluated by the NC (Normalized Correlation), defined as follows.

NC = ( Σ_{i=1}^{M} Σ_{j=1}^{N} w(i,j) w'(i,j) ) / ( sqrt( Σ_{i=1}^{M} Σ_{j=1}^{N} w(i,j)² ) sqrt( Σ_{i=1}^{M} Σ_{j=1}^{N} w'(i,j)² ) )   (3)

where w(i,j) and w'(i,j) are the original and retrieved watermark bits at pixel (i,j), respectively. First, we illustrate the relationship between the quality of the watermarked image and the signal strength used to embed the watermark. The PSNR values obtained from the watermarked test images at various signal strengths were averaged, and the results are given in Table 1.
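As a quick reference implementation of (3) (our sketch, with synthetic data):

import numpy as np

def normalized_correlation(w, w_ret):
    """NC between original and retrieved +/-1 watermark bit arrays, per (3)."""
    num = np.sum(w * w_ret)
    den = np.sqrt(np.sum(w ** 2)) * np.sqrt(np.sum(w_ret ** 2))
    return num / den

rng = np.random.default_rng(1)
w = rng.choice([-1.0, 1.0], size=(256, 256))
noisy = np.where(rng.random((256, 256)) < 0.1, -w, w)  # flip 10% of bits
print(round(normalized_correlation(w, noisy), 3))       # about 0.8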
Table 1. The average PSNR value at various signal strengths.

S 0.06 0.07 0.08 0.09 0.1 0.11 0.12 0.13
PSNR (dB) 34.32 32.99 31.86 30.83 29.92 29.11 28.352 27.66

According to the results obtained, we selected the signal strength of 0.09, which gave the best trade-off between the quality of the watermarked image and of the retrieved watermark, i.e., an average PSNR of 30.83 dB and an average NC of 0.86. We used this signal strength for the rest of the experiments. Next, we determined the optimum performance of our
proposed watermark retrieval method. Two factors are needed to implement our method in a practical watermarking scheme: the optimum threshold used to decide whether the surrounding pixel around (i,j) that most affects the group variance is removed or not, and the maximum number of pixels to be removed in the original-pixel prediction process. To determine the optimum threshold, we evaluated the retrieval performance at various local variances derived from the image area with a window size of m×n pixels around the center pixel B'(i,j), where m = n is an odd number (3, 7, 9, ...). Note that B'(i,j) is not included in the calculation of the local variance. In the experiment, we set the window sizes of the image area to range from 3×3 pixels up to the full image size of 256×256 pixels. To determine the number of pixels used in the prediction process, we evaluated the retrieval performance at various numbers of surrounding pixels, setting the maximum number of pixel removals from one pixel to four pixels. The average NC values obtained from the four removal settings at various window sizes are shown in Fig. 1.
According to the results obtained, on average the best performance of our retrieval method occurred at around a 21×21-pixel window for the image area around (i,j). For the maximum number of pixel removals, removing 1 pixel on average obtained the best performance. Hence, based on these results, we selected as the default setup the optimum threshold given by the local variance of the 21×21-pixel image area around (i,j) with a maximum of one pixel removal, and used it in the remaining experiments for performance comparison.


Figure 1. Average NC values obtained from the four removal settings at various window sizes for the variance σ²_B'.

In the next experiment, we compared the performance of our proposed watermark retrieval method to the previous method in [8]. Note that, at the same signal strength, the watermarked images from both methods have the same PSNR. The performance of the two watermark retrieval methods at various signal strengths is compared in Table 2.

Table 2. Average NC values between two watermark retrieval methods at
various signal strengths.

Signal strength 0.07 0.08 0.09 0.1 0.11 0.12
Previous 0.851 0.857 0.862 0.868 0.872 0.876
Proposed 0.853 0.860 0.865 0.871 0.875 0.880

The table shows that our proposed method consistently outperformed the previous one at all signal strength values, albeit by a small margin. To verify the existence of a genuine embedded watermark, a specific NC value must be established and used as a threshold. This is achieved by computing the NC value between the original watermark and the one retrieved directly from the non-watermarked images. The highest NC value obtained in this experiment, 0.686, came from the image Bird; we thus used this NC value as the threshold to validate the extracted watermark. The plots of the averaged NC values of the retrieved watermark after attacks, by the JPEG compression standard at various image qualities and by image cropping at various percentages, are shown in Figs. 2-3. Note that, to extract the embedded watermark from a cropped watermarked image, the missing parts of the watermarked image were replaced by black pixels.
From the results obtained, the performance of our retrieval method against the considered attacks was higher than that of the previous method. Notice that with our proposed method a valid watermark could still be retrieved even when JPEG compression at 12% image quality was applied.


Figure 2. NC values at various JPEG image qualities.

5. CONCLUSION
In this paper, we have presented a new watermark retrieval method for image watermarking based on the modification of image pixels. In brief, our proposed method adaptively removes one or more surrounding pixels around the predicted pixel, depending on a given threshold, in order to maximize the watermark retrieval performance. The experimental results showed the efficiency of our retrieval method, i.e., improved performance in terms of NC compared to the previous method in [8].



REFERENCES
[1] I.J. Cox, M.L. Miller, J.A. Bloom, J. Fridrich, and T. Kalker, Digital Watermarking and Steganography, Morgan Kaufmann, Los Altos, CA, USA, 2002.
[2] F.Y. Shih, Digital Watermarking and Steganography: Fundamentals and Techniques, CRC Press, New York, USA, 2007.
[3] O. Bruyndonckx, J.J. Quisquater, and B. Macq, Spatial method for copyright labeling of
digital images, IEEE Nonlinear Signal Processing Workshop, Thessaloniki, Greece, pp.
456-459, 1995.
[4] M. M. Yeung and F. Mintzer, An invisible watermarking techniques for image
verification, IEEE ICIP97, vol. 2, pp. 680-683, 1997.
[5] Y. Wang, J.F. Doherty and R.E.V. Dyck, A wavelet based watermarking algorithm for
ownership verification of digital images, IEEE trans. Image Processing, vol. 11, no. 2,
pp. 77-88, 2002.
[6] M.A. Suhail and M.S. Obaidat, Digital Watermarking Based DCT and JPEG Model,
IEEE trans. on Instrumentation and Measurement, vol. 52, no. 5, pp. 1640-1647, 2003.
[7] M. Kutter, F. Jordan and F. Bossen, Digital Signature of Colour Images using Amplitude
Modulation, Journal of Electronic Imaging, vol. 7, pp. 326-332, 1998.
[8] T. Amornraksa and K. Janthawongwilai, Enhanced Images Watermarking Based on
Amplitude Modulation, Image and Vision Computing, vol. 24, no. 2, pp. 111-119, 2006.
G00141
KML Generator for Visualizing Numerical Results from
Weather and Ocean Wave Simulation in Google Earth API

S. Chuai-Aree^{1,C}, W. Kanbua²
1 Department of Mathematics and Computer Science, Faculty of Science and Technology, Prince of Songkla University, Pattani Campus, 181, Rusamilae, Muang, Pattani, 94000, Thailand
2 Thai Marine Meteorology, Bangkok, 10330, Thailand
C E-mail: csomporn@bunga.pn.psu.ac.th; Fax: 073-312179; Tel. 080-4261112



ABSTRACT
KML is a file format, based on XML, used to visualize geographic data in an earth browser such as Google Earth or Google Maps. The Thai Meteorological Department produces daily numerical forecasts: weather simulation based on the MM5 model and ocean wave simulation based on the WAM model. To visualize these numerical simulation results, this paper proposes software that generates KML scripts gathering the available simulated images for viewing in Google Earth. The result can be viewed in a web browser and in Google Earth, and animations of weather and ocean wave evolution are available on the internet for early warning systems. The animation is automatically updated daily up to the latest result of each simulation. The software is programmed in the Delphi programming language. The prototype can also be used for other purposes, such as tsunami simulation from the SiTProS model and virtual cloud visualization from satellite images in the VirtualCloud3D software.

Keywords: KML (Keyhole Markup Language), Google Earth, Google Maps, Meteorological Numerical Simulation.
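To illustrate the kind of script such a generator emits, here is a sketch of ours (with made-up file names and coordinates) that writes a KML document of time-stamped GroundOverlays, one per simulated image, which Google Earth can animate in sequence.

# Write a KML document that overlays a sequence of simulation images on the
# globe, each tagged with a TimeSpan so Google Earth animates them in order.
OVERLAY = """  <GroundOverlay>
    <name>{name}</name>
    <TimeSpan><begin>{begin}</begin><end>{end}</end></TimeSpan>
    <Icon><href>{href}</href></Icon>
    <LatLonBox>
      <north>21.0</north><south>5.0</south>
      <east>106.0</east><west>97.0</west>
    </LatLonBox>
  </GroundOverlay>
"""

def make_kml(frames, out_path="wave_forecast.kml"):
    """frames: list of (image_url, iso_begin, iso_end) tuples."""
    body = "".join(
        OVERLAY.format(name=f"frame {k}", href=url, begin=b, end=e)
        for k, (url, b, e) in enumerate(frames)
    )
    kml = ('<?xml version="1.0" encoding="UTF-8"?>\n'
           '<kml xmlns="http://www.opengis.net/kml/2.2">\n<Document>\n'
           f"{body}</Document>\n</kml>\n")
    with open(out_path, "w", encoding="utf-8") as fh:
        fh.write(kml)

make_kml([("http://example.org/wam_000.png", "2010-03-23T00:00:00Z", "2010-03-23T03:00:00Z"),
          ("http://example.org/wam_003.png", "2010-03-23T03:00:00Z", "2010-03-23T06:00:00Z")])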



REFERENCES
1. Bock, H.G., Chuai-Aree, S., Jaeger, W., Kanbua, W., Kroemker, S., and Siripant, S., 3D cloud and storm reconstruction from satellite image, Proc. of Intern. Conf. on High Performance Scientific Computing (HPSC Hanoi 2006), March 6-10, Hanoi, Vietnam, 2006.
2. Chuai-Aree, S., and Kanbua, W., SiTProS: Fast and Real-Time Simulation of Tsunami Propagation, AMS2007: Asia Modelling Symposium, Phuket, Thailand, March 27-30, 2007.
3. Google Earth API, http://code.google.com/apis/earth/

G00143
Power Management for WLAN Dam Environmental
Monitoring System

Pariwat Wongsamran¹, Kiyomichi Araki³, Rachaporn Keinprasit⁴,
Udom Lewlomphaisarl⁴ and Teerasit Kasetkasem²
1 TAIST Tokyo Tech, ICTES Program, Department of Electrical Engineering, Kasetsart University, Bangkok 10900, Thailand
2 Department of Electrical Engineering, Kasetsart University, Bangkok 10900, Thailand
3 Department of Electrical and Electronic Engineering, Tokyo Institute of Technology, Tokyo 152-8550, Japan
4 Industrial Control and Automation Laboratory, National Electronics and Computer Technology Center, Pathumthani 12120, Thailand
E-mail: g5165456@ku.ac.th, araki@mobile.ee.titech.ac.jp, rachaporn.keinprasit@nectec.or.th, udom.lewlomphaisarl@nectec.or.th, fengtsk@ku.ac.th



ABSTRACT
In general, a wired fiber optic network is the backbone of a traditional dam environmental monitoring system. Nevertheless, in a real working environment, the dam structure comprises the main dam, a spillway, and small dams, which are mostly separated by mountains and forests. Hence, using only a fiber optic network for the monitoring system would be very expensive and would leave very limited flexibility for installation and maintenance in the future. A Wireless Local Area Network (WLAN) is a great alternative solution that overcomes those problems. However, when WLAN technology is used in an outdoor working environment such as a dam, it is mostly powered by a DC power source such as a solar panel rather than connected directly to an AC power source. This raises the problem of how to manage the power consumption of the system efficiently. This paper therefore proposes a simulation of power management methodologies for a WLAN dam environmental monitoring system, deployed in a real working environment at Rajjaprabha Dam, Surat Thani, Thailand. In order to prove that the power management solution works efficiently, it is tested in the worst-case scenario, a rainy working condition in which the battery cannot be charged properly.

Keywords: Power Management, WLAN, Dam Environmental Monitoring System.



1. INTRODUCTION
Nowadays, the progress of information technology is changing the world of the traditional
dam environmental monitoring system. Wired fiber optic is no longer the only communication
medium: the monitoring system can communicate over both wired and wireless networks. The
wireless system can be designed using WLAN, as shown in Figure 1. It offers lower cost and
greater flexibility compared with a fiber optic cabling system. Nevertheless, a WLAN system
designed for dam environmental monitoring is mostly used in an outdoor working environment,
so in practice it needs a sufficient alternative DC power source, such as a solar panel, rather
than a direct AC connection as in an indoor system. Power management is therefore a problem
for a WLAN dam environmental monitoring system. Many research papers propose solutions
to this power management problem, for example the application-driven approach in [4] and
[8], the network topology design approach in [7], and the state-machine policy approach in
[3], [5], and [6]. This paper introduces a technique based on the low-power circuit design
approach and compares the power consumption in different scenarios. During system testing,
we used a Linksys WRT54GL as the router, working under real conditions at the Rajjaprabha
Dam in Surat Thani, southern Thailand.


Figure 1. Overview of the WLAN dam environmental monitoring system design. [Diagram: a solar panel and battery supply 12 V to the Linksys WRT54GL router; over the external I2C bus (3.3 V/5 V) the router connects to the I2C transceiver and the temperature and humidity sensor, and, through a 4-20 mA I/V converter and ADC (5 V), to the pressure sensor.]



2. LINKSYS WRT54GL MODEL
According to the Linksys WRT54GL model shown in Figure 2, the power requirement is
12 volts DC at 1 amp. Its processor is a Broadcom MIPS (Microprocessor without Interlocked
Pipeline Stages) processor of the BCM5352 family, whose specification requires a 5 V DC
supply. The WiFi component is an 802.11b/g radio chipset, the BCM2050, which requires a
1.8 V DC supply. Inside the WRT54GL, the incoming 12 V at 1 A is immediately split
between two power regulation chips, which deliver the reduced voltages of 5 V DC at 2 A to
the components on the board and 3.3 V DC at 2 A as an output. Overall, during testing under
moderate load with the wireless set to the default power, the WRT54GL draws approximately
30 milliamps; the receive-mode current consumption is about 110 milliamps at 1.8 V DC,
whereas transmit mode typically consumes 80 milliamps at 1.8 V DC.




Figure 2. Linksys WRT54GL Block Diagram.



3. EXTERNAL PCB SCHEMATIC DESIGN
The schematic design for the I2C transceiver module is shown in Figure 3. It has two
components: an I2C real-time clock (a DS1307 chip) and an open-collector HEX inverter
buffer (an HD74LS05P chip). The real-time clock is needed because the Linksys WRT54GL
has none. The open-collector HEX inverter buffer raises the 3.3 V signals coming from the
WRT54GL to the 5 V required by the components on the external PCB circuit. In this design,
the I2C transceiver module draws about 6.6 milliamps at 5 V DC.



Figure 3. Schematic design for the I2C transceiver module.

The SHT15 chip is used in the temperature and humidity sensor module; its schematic
design is shown in Figure 4. This module measures temperature and humidity as digital data
and then sends the data over the I2C bus through the router. In this design, the temperature
and humidity sensor module draws about 550 µA at 5 V DC.

Figure 4. Schematic design for temperature and humidity sensor module.

Because the pressure sensor outputs its measurement as an analog 4-20 mA current
signal, it is first converted from current to voltage by an I/V converter module; we use four
RCV420 chips for the four connection ports. The schematic design is shown in Figure 5. This
analog signal must then be converted to digital by an A/D converter; we use an LTC2408
ADC with a reference input from an LTC1236CS8, as shown in Figure 6. In this design, the
RCV420 draws about 3 milliamps at 12 V DC, the LTC2408 consumes about 200 µA at 5 V
DC, and the LTC1236CS8 consumes about 2 milliamps at 12 V DC.



Figure 5. Schematic design for I/V module.



Figure 6. Schematic design for ADC module.

In total, under moderate load on the external PCB board and with the wireless set to the
default power, the WLAN dam environmental monitoring system draws about 0.565 watts.


4. RESULTS AND DISCUSSION
According to the system requirements, the pressure, temperature, and humidity
parameters must be measured and transmitted to the server every hour. In the rainy season,
the weather over the Rajjaprabha Dam is typically rainy and cloudy for up to 7 days at a time,
during which the solar panel cannot deliver any power to the battery. The system therefore
requires a backup time of about 7 days, or 168 hours. The system was tested under three
scenarios in order to compare the battery size and solar panel required by each technique.

The first scenario is "always on": every component in the system stays on all day, for a
power consumption of about 0.565 W, as shown in Figure 7. The second scenario is
Wake-on-WLAN: the system keeps only the CPU on, in receive mode, while the other
components are turned off, giving a consumption of about 0.216 W. Every hour, when the
parameters need to be measured, the main router sends a wake-up request to the radio receiver
to wake the system; during the measuring period the consumption rises to 0.565 W, and after
measuring and sending the data to the server the system returns to the off mode. The whole
process takes about 5 minutes; the power consumption timeline for this scenario is shown in
Figure 8. The final scenario is a scheduling timer, set to wake the system every hour. In this
case the system keeps on only the CPU and the I2C transceiver module containing the
real-time clock, for a consumption of about 0.051 W. Again, each wake-up takes
approximately 5 minutes; the power consumption timeline is shown in Figure 9. The solar
cell supply used in this test is specified at 10 V open circuit and 1 A short circuit, with a
charging time of one hour. The resulting battery and solar panel requirements are shown in
Table 1.



Figure 7. Scenario 1: Always on (constant power draw of 0.565 W over time).

Figure 8. Scenario 2: Wake-on-WLAN (0.216 W baseline, rising to 0.565 W during each hourly measurement).

Figure 9. Scenario 3: Scheduling timer (0.051 W baseline, rising to 0.565 W during each hourly measurement).


Table 1. Battery and solar panel requirements.

Scenario            Battery requirement   Solar panel requirement
Always on           7.91 amp-hours        23.73 watt-hours
Wake-on-WLAN        3.03 amp-hours        9.09 watt-hours
Scheduling timer    0.72 amp-hours        2.16 watt-hours
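
The battery figures in Table 1 follow from each scenario's baseline power drawn over the 168-hour backup window at the 12 V bus voltage; a minimal sketch of this sizing arithmetic (our reconstruction, not code from the paper):

BACKUP_HOURS = 168      # 7 rainy days without solar charging
BATTERY_VOLTS = 12.0    # system bus voltage

baseline_watts = {      # baseline draw of each scenario, from the paper
    "Always on": 0.565,
    "Wake-on-WLAN": 0.216,
    "Scheduling timer": 0.051,
}

for scenario, watts in baseline_watts.items():
    energy_wh = watts * BACKUP_HOURS        # energy needed over the outage
    battery_ah = energy_wh / BATTERY_VOLTS  # amp-hours at 12 V
    print(f"{scenario}: {battery_ah:.2f} Ah")
# Prints 7.91, 3.02, 0.71 Ah, matching Table 1 to rounding.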

5. CONCLUSION
In short, managing power efficiently is a challenge for a WLAN dam environmental
monitoring system, and several solutions can overcome this problem. In our system tests, the
results showed that the low-power circuit design approach, combined with an appropriate
policy, can significantly reduce the power consumption of the system; the cost of the required
battery and solar panel falls proportionally.



REFERENCES
1. Asadoorian, P., Pesce, L., and Siles, R., Linksys WRT54G Ultimate Hacking, 1st ed., Syngress Publishing, Inc., Elsevier, Inc., Burlington, 2007.
2. B. Glisic, D. Inaudi, P. Kronenberg and S. Vurpillot, Dam Monitoring Using Long SOFO Sensor, Hydropower, 1999.
3. Chuan Lin, Yan-Xiang He and Naixue Xiong, An Energy-Efficient Dynamic Power Management in Wireless Sensor Networks, IEEE ISPDC'06, 2006.
4. Eduardo S. C. Takahashi, Application Aware Scheduling for Power Management on IEEE 802.11, Department of Electrical and Computer Engineering, Carnegie Mellon University, USA, 2000.
5. Huan Chen and Cheng-Wei Huang, Power Management Modeling and Optimal Policy for IEEE 802.11 WLAN Systems, Department of Electrical Engineering, National Chung Cheng University, Taiwan, 2004.
6. Mishra, N., Chebrolu, K., Raman, B., and Pathak, A., Wake-on-WLAN, IW3C2, 2006.
7. P. Kronenberg, N. Casanova, D. Inaudi and S. Vurpillot, Dam Monitoring with Fiber Optic Deformation Sensors, SPIE Conference on Smart Structures and Materials, 1997.
8. Rodrigo M. Passos, Claudionor J. N. Coelho Jr., Antonio A. F. Loureiro, and Raquel A. F. Mini, Dynamic Power Management in Wireless Sensor Networks: An Application-driven Approach, IEEE WONS'05, 2005.



ACKNOWLEDGMENTS
This research is financially supported by the Thailand Advanced Institute of Science and
Technology - Tokyo Institute of Technology (TAIST-Tokyo Tech), the National Science and
Technology Development Agency (NSTDA), the Tokyo Institute of Technology (Tokyo Tech),
and the Center of Promoting Research and Development of Satellite Image Application in
Agriculture, Kasetsart University. Furthermore, information support was provided by the Civil
Department, Rajjaprabha Dam, Electricity Generating Authority of Thailand (EGAT).

G00151
Thai Numeric Handwritten Character Recognition by
Counterpropagation and Hopfield Neural Networks

W. Waiyawut
Department of Computer Engineering, Rajamangala University of Technology, 1381 Piboonsongkram Road, Bangsue, Bangkok, 10800, Thailand
E-mail: wanapun@hotmail.com; Tel. 0-2913-2424



ABSTRACT
This work compares the recognition of handwritten characters using counterpropagation
neural networks (CPN) and Hopfield networks. Handwritten character recognition is one
of the most important tasks in pattern recognition. Neural networks recognize characters
by learning, without any mathematical model: for a given input, the system output is
estimated from already-learned patterns. Both CPN and Hopfield networks are robust
and fast, but as the number of test data sets increases, CPN achieves a higher success
rate than the Hopfield network.

Keywords: Character recognition, counter propagation neural networks, Hopfield
networks.



REFERENCES
1. Hecht-Nielsen, R., Counterpropagation Networks, Applied Optics, 26(23), 1987, pp. 4979-4984.
2. Hopfield, J. J., Neural Networks and Physical Systems with Emergent Collective Computational Abilities, Proceedings of the National Academy of Sciences, 1982, pp. 2554-2558.
3. Hopfield, J. J., Neural Computation of Decisions in Optimization Problems, Biol. Cybern., 1985, pp. 141-152.
4. Kohonen, T., Self-Organization and Associative Memory, 2nd ed., Springer-Verlag, 1988.
5. Grossberg, S., Competitive Learning: From Interactive Activation to Adaptive Resonance, Cognitive Science, 1987, pp. 23-63.

G00152
A Novel Hybrid Clustering Method for Customer
Segmentation

Sithar Dorji^1 and Phayung Meesad^2,C

^1 Department of Information Technology, Faculty of Information Technology,
^2 Department of Teacher Training in Electrical Engineering, Faculty of Technical Education,
King Mongkut's University of Technology North Bangkok
^C E-mail: sithar.dorji@gmail.com, pym@kmutnb.ac.th


ABSTRACT
Fuzzy c-means and hierarchical clustering are two of the most popular clustering
methods. Real data, in most cases, cannot be partitioned into well-separated clusters.
Fuzzy c-means clustering allows each data object to belong to more than one cluster
with varying degrees of membership. However, the output of fuzzy c-means clustering
depends on the initial choice of clusters, cluster centers, and degrees of membership.
Agglomerative hierarchical clustering, on the other hand, does not require the number
of clusters to be specified in advance. In this paper, we present a novel two-stage hybrid
clustering algorithm for segmenting real-world data on cellular phone customers. In the
first stage, agglomerative hierarchical clustering is carried out to obtain the initial
clusters and centroids. Using these initial data from the first stage, fuzzy c-means is run
in the second stage to obtain the final customer segments. Real customer data is used to
evaluate the proposed method.

Keywords: Customer Segmentation, Hierarchical Clustering, Fuzzy c-means, Cluster
analysis

1. INTRODUCTION
The telecommunications industry generates and stores a tremendous amount of data, so
much that manual analysis is difficult, if not impossible. Data mining technology may be the
solution to this problem, and for this reason the telecommunications industry was an early
adopter of data mining [1]. Using data mining, telecom companies can build customer
segmentation models that identify groups of customers with similar characteristics, which
helps them understand their customers; eventually such models become the foundation of
customer analysis and marketing strategy development [2]. The most popular data mining
technique for customer segmentation is cluster analysis [3].
Fuzzy c-means and hierarchical clustering are two of the most popular clustering methods.
Most often, the objects in real data cannot be partitioned into well-separated clusters, and
there is a certain arbitrariness in assigning an object to a particular cluster. In such cases, it
may be more appropriate to assign to each object and each cluster a weight that indicates the
degree to which the object belongs to the cluster [4]. However, the output of fuzzy c-means
depends on the initial choice of clusters, cluster centers, and degrees of membership.
Agglomerative hierarchical clustering, on the other hand, does not require the number of
clusters to be specified in advance, but its main disadvantage is that it does not know when to
terminate. Discarding the weak points and adopting the advantages of these two clustering
methods, we present a novel two-stage hybrid clustering algorithm for segmenting real-world
data on cellular phone customers. In the first stage, agglomerative hierarchical clustering is
carried out to obtain the initial clusters and centroids. Using these initial data from the first
stage, fuzzy c-means is run in the second stage to obtain the final customer segments. The
method is applied to real customer data from Bhutan Telecom Ltd.'s mobile service.
The remainder of this paper is organized as follows. Section 2 contains the theory and
literature related to our hybrid method. Section 3 describes the methodology of our algorithm.
Section 4 presents the experimental results of our algorithm, and Section 5 draws the
conclusions of our research.

2. THEORY AND RELATED WORKS
Agglomerative Hierarchical Clustering
Agglomerative hierarchical clustering is one of the two basic hierarchical clustering
methods (the other being divisive hierarchical clustering). It is a relatively old method
compared with many clustering algorithms, but it still enjoys widespread use [4].
The agglomerative hierarchical clustering method starts with each data object as a separate
cluster; the closest pair of clusters is merged at each successive iteration, according to some
similarity criterion, until all of the data objects are in one cluster.
The algorithm for the agglomerative clustering is as follows [4]:
1. Assign all data points as individual clusters
2. Compute the proximity matrix
3. repeat
4. Merge the closest pair of clusters based on the proximity matrix
5. Update the proximity matrix to reflect the proximity between the new cluster and the
original clusters
6. Until only one cluster remains.
Hierarchical clusters presented in the form of trees called dendrograms are of great interest for
many application domains. They provide a view of data at different levels of abstraction.
Fuzzy Clustering
In hard clustering, each data object belongs to exactly one cluster. In fuzzy clustering,
each data object can be associated with more than one cluster, with varying degrees of
membership. Since the concept of fuzzy sets was introduced, fuzzy clustering has been widely
discussed, studied, and applied in various areas [8]. One of the most widely used fuzzy
clustering algorithms is fuzzy c-means (FCM), also called fuzzy k-means; it is an extension of
the k-means algorithm to fuzzy clustering.
The basic FCM algorithm is given below [4]:
1. Select an initial fuzzy pseudo-partition
2. Repeat
3. Compute the centroid of each cluster using the fuzzy pseudo-partition
4. Recompute the fuzzy pseudo-partition
5. Until the centroids don't change.
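
For reference, the standard update rules behind steps 3 and 4 (the usual FCM formulation with membership exponent m > 1; the paper does not restate them) are

\[ c_j = \frac{\sum_{i=1}^{N} u_{ij}^{m} x_i}{\sum_{i=1}^{N} u_{ij}^{m}}, \qquad u_{ij} = \left[ \sum_{k=1}^{c} \left( \frac{\lVert x_i - c_j \rVert}{\lVert x_i - c_k \rVert} \right)^{2/(m-1)} \right]^{-1}, \]

where \( u_{ij} \) is the membership of object \( x_i \) in cluster \( j \) and \( c_j \) is the centroid of cluster \( j \).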
Related Works
Bernard Chen et al. [10] effectively applied a hybrid hierarchical agglomerative k-means
method to cluster microarray gene expression data; according to them, the hybrid method
outperforms the single k-means algorithm and also deals with outliers better. K. Thammi
Reddy et al. [9] applied a two-stage hierarchical k-means algorithm to document clustering
and also reported better results. R. J. Kuo [3] applied a hybrid SOM + genetic k-means
method to freight transport industry market segmentation. Sheng-Chai Chi and Chih Chieh
Yang [11] used ant colony SOM + k-means for clustering on public data sets (iris, wine,
optical data). It is thus increasingly clear that none of these approaches is sufficient by itself,
and that applying various techniques allows different aspects of the data to be explored [10].

3. METHODOLOGY
Research Data
This research used the call detail record (CDR) data of post-paid customers of Bhutan
Telecom Ltd.'s cellular mobile service for the months of July, August, and September 2009.
The CDR data is summarized for each customer with the following features: (1) average calls
out/day, (2) average calls in/day, (3) different numbers called, (4) average call time (in
seconds), (5) % daytime calls (9am-5pm), (6) average SMS out/day, (7) average SMS in/day,
(8) % weekday calls (Monday-Friday), (9) % off-net calls (calls to competitors' subscribers),
and (10) % international calls.

The Algorithm

Figure 1. Flowchart of the new hybrid algorithm

The new algorithm is designed by combining the hierarchical agglomerative and fuzzy
c-means clustering algorithms. The hierarchical agglomerative clustering method is used to
find the initial clusters and cluster centroids, which saves fuzzy c-means from random
initialization: the fuzzy c-means algorithm uses the clusters formed at the hierarchical stage
as its initialization. Initialization and termination on the cluster centers was adopted rather
than on the conventional membership matrix. This was done for the convenience of the new
hybrid algorithm; moreover, according to [12], initializing and terminating on V rather than
on U is advantageous as it is convenient, faster, and has lower memory usage.
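
A minimal sketch of this two-stage design (our illustration, not the authors' implementation; function and parameter names are ours):

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import cdist

def hybrid_cluster(X, c, m=2.0, tol=1e-3, max_iter=100):
    # Stage 1: agglomerative clustering (Euclidean distance, average linkage),
    # cut into c clusters; its centroids initialize the fuzzy c-means stage.
    labels = fcluster(linkage(X, method="average"), t=c, criterion="maxclust")
    V = np.vstack([X[labels == j].mean(axis=0) for j in range(1, c + 1)])
    # Stage 2: fuzzy c-means, initialized and terminated on the centers V.
    for _ in range(max_iter):
        d = np.fmax(cdist(X, V), 1e-12)  # (N, c) distances to centers
        U = 1.0 / ((d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1.0))).sum(axis=2)
        Um = U ** m
        V_new = Um.T @ X / Um.sum(axis=0)[:, None]  # membership-weighted centroids
        if np.abs(V_new - V).max() < tol:           # terminate on V, not on U
            return V_new, U
        V = V_new
    return V, U

Here X would be the normalized (N x 10) feature matrix and c the chosen number of clusters; the returned U gives each customer's degree of membership in each segment.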

4. EXPERIMENTAL RESULTS AND DISCUSSION
Our algorithm was applied to the CDR data of cellular mobile post-paid customers of
Bhutan Telecom Ltd. For the hierarchical agglomerative method, we used Euclidean distance
with the average linkage method. Data was normalized by mapping each row's mean to 0 and
deviation to 1. For fuzzy c-means, the value of m (the membership exponent) was set to 2, the
maximum tolerance to 0.001, and the maximum number of iterations to 100.
The number of clusters is not known in advance, so the optimal number of clusters was
obtained using the validation indexes Partition Coefficient (PC), Classification Entropy (CE),
Partition Index (PI), Separation Index (SI), and Xie and Beni's Index (XB). The elbow
criterion was used to find the optimal cluster count: one should choose a number of clusters
such that adding another cluster does not add sufficient information.
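
For reference, common formulations of three of these indices (our addition; the paper does not restate them) are

\[ PC = \frac{1}{N} \sum_{j=1}^{c} \sum_{i=1}^{N} u_{ij}^{2}, \qquad CE = -\frac{1}{N} \sum_{j=1}^{c} \sum_{i=1}^{N} u_{ij} \log u_{ij}, \qquad XB = \frac{\sum_{j=1}^{c} \sum_{i=1}^{N} u_{ij}^{m} \lVert x_i - v_j \rVert^2}{N \cdot \min_{j \neq k} \lVert v_j - v_k \rVert^2}, \]

with PI and SI built analogously from within-cluster compactness relative to between-cluster separation; higher PC (and lower CE, PI, SI, XB) generally indicate better partitions.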
Figure 2. Values of PC, CE, PI, SI and XB against number of clusters

Figure 2 shows the plots of the values of PC, CE, PI, SI, and XB against the number of
clusters. No single validation method is reliable by itself, so all the validation indexes are
shown, and the optimal number of clusters is found by comparing the values of all of them.
Partitions with fewer clusters are considered better when the differences between the values
of a validation index are minor. The Partition Coefficient and Classification Entropy values
suggest that the optimal number of clusters is 9, but as mentioned earlier we do not take these
values into consideration. The Partition Index, Separation Index, and Xie and Beni's Index
values, however, suggest that the optimal number of clusters is 4. Therefore, we consider the
optimal number of clusters to be 4.
Table 1. Cluster results.

Feature     (1)    (2)    (3)     (4)     (5)    (6)   (7)   (8)    (9)   (10)
Average     8.63   6.15   121.09  104.15  57.75  1.10  0.35  75.35  7.21  4.24
Cluster 1   17.17  12.01  226.43  73.70   60.88  1.61  0.53  76.27  8.06  2.76
Cluster 2   8.82   6.60   129.33  85.09   59.16  1.21  0.41  75.68  7.18  2.80
Cluster 3   4.34   3.25   66.43   81.03   58.74  0.51  0.16  75.79  6.84  3.22
Cluster 4   4.19   2.76   62.18   176.77  52.21  1.06  0.29  73.67  6.75  8.18

Table 1 shows the segmentation results. The feature numbers are as follows: (1) average
calls out/day, (2) average calls in/day, (3) different numbers called, (4) average call time in
seconds, (5) % daytime calls (9am-5pm), (6) average SMS out/day, (7) average SMS in/day,
(8) % weekday calls (Monday-Friday), (9) % off-net calls (calls to competitors' subscribers),
and (10) % international calls.
The clusters thus formed can be described as follows:
Cluster 1: Customers in this cluster send and receive a relatively higher number of calls
than the rest. They also have more contacts than others, call more often during the daytime
and on weekdays, and use SMS more than others. However, their call durations are relatively
shorter, and they make far fewer international calls than others.
Cluster 2: These customers send and receive a more-than-average number of calls and
SMS and have a more-than-average number of contacts. Their call duration is less than
average, they make fewer-than-average calls to competitors' customers, and they make few
international calls.
Cluster 3: These customers make fewer calls and SMS (less than average) and have a
less-than-average number of contacts. Their calls to competitors' customers are relatively few,
and they make slightly more international calls than those in the first two clusters, but still
less than average.
Cluster 4: Customers in this segment make the most international calls, and their call
duration is very high compared to the rest. They make relatively more calls during the
evening and on weekends, make fewer calls and SMS, and have far fewer contacts.

5. CONCLUSION
In this paper we presented a new hybrid algorithm to segment the post-paid customers of
Bhutan Telecom Ltd.'s cellular mobile service. The hybrid algorithm combines two of the
most popular clustering algorithms, hierarchical agglomerative and fuzzy c-means, in a
two-stage fashion: in the first stage the hierarchical agglomerative algorithm is run to obtain
the initial clusters and cluster centroids, and fuzzy c-means is used in the second stage to
produce the final clustered output. The total number of clusters, or customer segments, was
found to be 4 based on various validation methods. The outcome of this research is expected
to assist Bhutan Telecom Ltd. in applying targeted marketing strategies.



REFERENCES
1. Gary M. Weiss, Data Mining in Telecommunications, in O. Maimon and L. Rokach (eds.), Data Mining and Knowledge Discovery Handbook: A Complete Guide for Practitioners and Researchers, Kluwer Academic Publishers, pp. 1189-1201, 2005.
2. Corinne Baragoin et al., Mining Your Own Business in Telecoms Using DB2 Intelligent Miner for Data, IBM Corporation, September 2001.
3. R. J. Kuo, Y. L. An, H. S. Wang and W. J. Chung, Integration of Self-Organizing Feature Maps Neural Network and Genetic K-means Algorithm for Market Segmentation, International Journal of Expert Systems with Applications, 30(2), pp. 313-324, February 2006.
4. Pang-Ning Tan, Michael Steinbach, Vipin Kumar, Introduction to Data Mining, Addison Wesley, 2006.
5. S. M. H. Jansen, Customer Segmentation and Customer Profiling for a Mobile Telecommunications Company Based on Usage Behavior, University of Maastricht (UM), Netherlands, 2007.
6. Qining Lin, Mobile Customer Clustering Analysis Based on Call Detail Records, Communications of the IIMA, Volume 7, Issue 4, pp. 95-100, 2007.
7. A. Vellido, P. J. G. Lisboa, and K. Meehan, Segmentation of the On-line Shopping Market Using Neural Networks, Expert Systems with Applications, Vol. 17, pp. 303-314, 1999.
8. Guojun Gan, Chaoqun Ma, Jianhong Wu, Data Clustering: Theory, Algorithms, and Applications, Society for Industrial and Applied Mathematics, 2007.
9. K. Thammi Reddy, M. Shashi, and L. Pratap Reddy, Hybrid Clustering Approach for Concept Generation, IJCSNS International Journal of Computer Science and Network Security, Vol. 7, No. 4, April 2007.
10. Bernard Chen, Phang C. Tai, R. Harrison and Yi Pan, Novel Hybrid Hierarchical-K-Means Clustering Method (H-K-means) for Microarray Analysis, Proceedings of the 2005 IEEE Computational Systems Bioinformatics Conference Workshops (CSBW'05), 2005.
11. Sheng-Chai Chi, Chih Chieh Yang, A Two-stage Clustering Method Combining Ant Colony SOM and K-means, Journal of Information Science and Engineering, Volume 24, pp. 1445-1460, 2008.
12. Mahdi Amiri, http://ce.sharif.edu/~m_amiri/project/yfcmc/index.htm, last accessed 22 January 2010.
G00156
Development of Mobile Robot Based on Differential Drive
Integrated with Accelerometer

Surachai Panich
Mechanical Engineering, Srinakharinwirot University,
Ongkharak, Nakornnayok, Thailand
E-mail: surachap@swu.ac.th



ABSTRACT
This paper proposes a fusion of encoders and an accelerometer for distance-increment
measurement, with a Kalman filter to estimate the robot's position and heading. A
differential encoder system with an integrated accelerometer is developed, analyzed,
and experimentally tested on a square path. Applying Kalman filtering theory, we
successfully fused the differential encoders and accelerometer to obtain improved
position and heading-angle estimates. Finally, the experiment and simulation present
the different trajectories generated by the differential encoders alone and by the
differential encoders integrated with the accelerometer.

Keywords: Accelerometer, Encoder, Sensor Data Fusion.



1. INTRODUCTION
The easiest way of making a mobile robot go to a particular location is to guide it. This is
usually done by burying an inductive loop or magnets in the floor, painting lines on the floor,
or by placing beacons, markers, barcodes, etc. in the environment. Such Automated Guided
Vehicles (AGVs) are used in industrial scenarios for transportation tasks. They can carry
several thousand pounds of payload, and their positioning accuracy can be as good as 50 mm.
AGVs are robots built for one specific purpose, controlled by a pre-supplied control program
or a human controller. Modifying them for alternative routes is difficult and may even require
modifications to the environment. AGVs are expensive to install or to modify after
installation, because they require specially prepared environments. It would be desirable,
therefore, to build robot navigation systems that allow a mobile robot to navigate in
unmodified ("natural") environments. As said before, the fundamental competences of
navigation are localization, path planning, map-building, and map-interpretation. If we want
to build an autonomous navigating robot that operates in unmodified environments, we have
to anchor all of these competences in some frame of reference, rather than following a
pre-specified path. In robotics, such a frame of reference is typically either a Cartesian (or
related) coordinate system or a landmark-based system. The former uses dead-reckoning
strategies based on odometry; the latter uses sensor-signal-processing strategies to identify
and classify landmarks. There are also hybrid models that have elements of both dead
reckoning and landmark recognition. Because of odometry drift problems, it is rare to find
mobile robot navigation systems based solely on dead reckoning. It is more common to find
navigation systems that are fundamentally based on dead reckoning but that use sensory
information in addition. One good example of a robot navigation system purely based on a
Cartesian reference frame and dead reckoning is the use of certainty grids (Elfes, 1987). As
the robot explores its environment, estimating its current position by dead reckoning, more
and more cells are converted into "free" or "occupied" cells, depending on range-sensor data,
until the entire environment is mapped. Occupancy-grid systems start with an empty map,
which is completed as the robot explores its environment. They therefore face the problem
that any error in the robot's position estimate will affect both the map-building and the
map-interpretation. One can get round some of these difficulties by supplying the robot with
a ready-made map (Kampmann & Schmidt, 1991). Their robot MACROBE is supplied with a
2D geometric map before it starts to operate. MACROBE then determines free space and
occupied space by tessellating the environment into traversable regions, and plans paths
using this tessellation. To be able to do this, the robot needs to know precisely where it is.
MACROBE does this by a combination of dead reckoning and matching the map against
precise range data obtained from a 3D laser range camera.

2. SENSOR SYSTEM AND DATA FUSION
This section describes the background of the developed sensor system for mobile robot
position and heading estimation.
2.1 The differential encoders system
The basic concept of the differential encoder is the transformation from wheel revolution to
linear translation on the floor. This transformation is affected by errors such as
- wheel slippage,
- unequal wheel diameters and inaccurate wheel-base distance,
- unknown and other factors.
The real incremental translation measured by a wheel encoder is therefore prone to
systematic and some non-systematic errors. Optical incremental encoders are widely used in
industry for shaft angular displacement measurement. Normally the encoder is mounted on
the motor shaft, and one has to power the motor in order to acquire and observe the encoder
signals.



Figure 1. Mobile robot navigation in the global coordinate frame.

Equations (1)-(5) hold true when wheel revolutions can be translated accurately into
linear displacement relative to the floor. One can calculate the integration of incremental
motion information over time using the information from the shaft encoders. But since the
encoder measurement is based on the wheel shaft, it inevitably leads to unbounded
accumulation of errors if wheel slippage occurs. Specifically, orientation errors cause large
lateral position errors, which increase proportionally with the distance traveled by the robot.
As described above, one can estimate the position and heading angle of the mobile robot
using the feedback from the two encoders on the left and right wheels, as shown in Figure 1.
The distance and heading increments can be obtained as follows:

\[ \delta d_k^{en} = \frac{\delta d_{R,k}^{en} + \delta d_{L,k}^{en}}{2} \qquad (1) \]

\[ \Delta\theta_k = \frac{\delta d_{R,k}^{en} - \delta d_{L,k}^{en}}{B} \qquad (2) \]

where B is the wheel-base distance. Then the position and heading (X, Y, \(\theta\)) can be estimated as

\[ X_{k+1} = X_k + \delta d_{x,k}^{en} \qquad (3) \]

\[ Y_{k+1} = Y_k + \delta d_{y,k}^{en} \qquad (4) \]

\[ \theta_{k+1} = \theta_k + \Delta\theta_k \qquad (5) \]

where \( \delta d_{x,k}^{en} = \delta d_k^{en} \cos\theta_k \) and \( \delta d_{y,k}^{en} = \delta d_k^{en} \sin\theta_k \).
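
A minimal dead-reckoning sketch of equations (1)-(5) (our illustration; the names are ours):

import math

def odometry_update(x, y, theta, d_right, d_left, wheel_base):
    # Eq. (1): distance increment from the two wheel-encoder increments
    d = (d_right + d_left) / 2.0
    # Eq. (2): heading increment over the wheel base B
    d_theta = (d_right - d_left) / wheel_base
    # Eqs. (3)-(5): update the pose using the current heading
    x += d * math.cos(theta)
    y += d * math.sin(theta)
    theta += d_theta
    return x, y, theta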

2.2 Accelerometer
An accelerometer measures the acceleration it experiences relative to free fall. This is
equivalent to inertial acceleration minus the local gravitational acceleration, where inertial
acceleration is understood in the Newtonian sense of acceleration with respect to a fixed
reference frame, which the Earth is often considered to approximate. As a consequence, quite
counter-intuitively, an accelerometer at rest on the Earth's surface will actually indicate 1 g
upwards along the vertical axis. To obtain the inertial acceleration (due to motion alone), this
gravity offset must be subtracted. Along all horizontal directions, the device yields
acceleration directly. Conversely, the device's output will be zero during free fall, where the
acceleration exactly follows gravity. This includes use in an Earth-orbiting spaceship, but not
a (non-free) fall with air resistance, where drag forces reduce the acceleration until terminal
velocity is reached, at which point the device would once again indicate the 1 g vertical
offset. The reason for the appearance of a gravitational offset is Einstein's equivalence
principle, which states that the effects of gravity on an object are indistinguishable from
acceleration of the reference frame. When held fixed in a gravitational field, for example by a
ground reaction force or an equivalent upward thrust, the reference frame for an
accelerometer (its own casing) accelerates upwards with respect to a free-falling reference
frame. The effect of this reference-frame acceleration is indistinguishable from any other
acceleration experienced by the instrument. For the practical purpose of finding the
acceleration of objects with respect to the Earth, such as for use in an inertial navigation
system, knowledge of local gravity is required.
This paper selects the ADXL202, a complete dual-axis acceleration measurement system
on a single monolithic IC. It contains a polysilicon surface-micromachined sensor and signal
conditioning circuitry implementing an open-loop acceleration measurement architecture. For
each axis, an output circuit converts the analog signal to a duty-cycle-modulated (DCM)
digital signal that can be decoded with a counter/timer port on a microprocessor. The
ADXL202 is capable of measuring both positive and negative accelerations to a maximum
level of 2 g. The accelerometer measures static acceleration forces such as gravity, allowing
it to be used as a tilt sensor. The sensor is a surface-micromachined polysilicon structure built
on top of the silicon wafer. Polysilicon springs suspend the structure over the surface of the
wafer and provide a resistance against acceleration forces. Deflection of the structure is
measured using a differential capacitor that consists of independent fixed plates and central
plates attached to the moving mass. The fixed plates are driven by 180° out-of-phase square
waves. Acceleration deflects the beam and unbalances the differential capacitor, resulting in
an output square wave whose amplitude is proportional to acceleration. Phase-sensitive
demodulation techniques are then used to rectify the signal and determine the direction of the
acceleration. The output of the demodulator drives a duty cycle modulator (DCM) stage
through a 32 kΩ resistor. At this point a pin is available on each channel to allow the user to
set the signal bandwidth of the device by adding a capacitor. This filtering improves
measurement resolution and helps prevent aliasing.


Figure 2. ADXL module.

The ADXL can generate the distance increments in the X and Y directions, \( \delta d_x^{acc} \) and \( \delta d_y^{acc} \).
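
The DCM output can be decoded with the data-sheet relation (our addition, not stated in the paper): 0 g reads as a 50% duty cycle and the scale factor is 12.5% duty cycle per g. A minimal sketch:

def adxl202_accel_g(t1, t2):
    # ADXL202 data-sheet relation: a [g] = (T1/T2 - 0.5) / 0.125,
    # where T1 is the pulse high time and T2 the DCM period.
    return (t1 / t2 - 0.5) / 0.125

# Example: a 55% duty cycle corresponds to +0.4 g
print(adxl202_accel_g(0.55, 1.00))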


2.3 Data Fusion Algorithm
Consider an algorithm comprising two sensors: the accelerometer and the encoders. The
distance increments from the accelerometer of the previous section are given as
\( \delta d_{x,k}^{acc} \) in the X-direction and \( \delta d_{y,k}^{acc} \) in the Y-direction. For two
sensors corrupted by measurement error, the optimum estimate (Gelb, 1974) can be expressed
as follows:

\[ \hat{X}_k = \left( \frac{\sigma_{2,k}^2}{\sigma_{1,k}^2 + \sigma_{2,k}^2} \right) Z_{1,k} + \left( \frac{\sigma_{1,k}^2}{\sigma_{1,k}^2 + \sigma_{2,k}^2} \right) Z_{2,k} \qquad (6) \]

where
\( Z_{1,k} \): the measurement from sensor 1, with error variance \( \sigma_{1,k}^2 \),
\( Z_{2,k} \): the measurement from sensor 2, with error variance \( \sigma_{2,k}^2 \).

Then the information from the encoders and the accelerometer can be combined by means of the following equations:

In the X-direction:

\[ \delta d_{x,k}^{en,acc} = \left( \frac{\sigma_{x\text{-}acc,k}^2}{\sigma_{x\text{-}en,k}^2 + \sigma_{x\text{-}acc,k}^2} \right) \delta d_{x,k}^{en} + \left( \frac{\sigma_{x\text{-}en,k}^2}{\sigma_{x\text{-}en,k}^2 + \sigma_{x\text{-}acc,k}^2} \right) \delta d_{x,k}^{acc} \qquad (7) \]

In the Y-direction:

\[ \delta d_{y,k}^{en,acc} = \left( \frac{\sigma_{y\text{-}acc,k}^2}{\sigma_{y\text{-}en,k}^2 + \sigma_{y\text{-}acc,k}^2} \right) \delta d_{y,k}^{en} + \left( \frac{\sigma_{y\text{-}en,k}^2}{\sigma_{y\text{-}en,k}^2 + \sigma_{y\text{-}acc,k}^2} \right) \delta d_{y,k}^{acc} \qquad (8) \]

where
\( \delta d_{x,k}^{en,acc} \), \( \delta d_{y,k}^{en,acc} \): the estimated distance increments in the X- and Y-directions,
\( \delta d_{x,k}^{en} \), \( \delta d_{y,k}^{en} \): the distance increments from the encoders in the X- and Y-directions,
\( \delta d_{x,k}^{acc} \), \( \delta d_{y,k}^{acc} \): the distance increments from the accelerometer in the X- and Y-directions,
\( \sigma_{x\text{-}acc,k}^2 \), \( \sigma_{y\text{-}acc,k}^2 \): the variances of the accelerometer distance increments in the X- and Y-directions,
\( \sigma_{x\text{-}en,k}^2 \), \( \sigma_{y\text{-}en,k}^2 \): the variances of the encoder distance increments in the X- and Y-directions.

The resulting estimate based on data fusion is passed to the Kalman filter loop, and then the
position \( P_k \) and heading \( \theta_k \) are finally estimated, as shown in Figure 4.
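
A minimal sketch of the per-axis fusion in equations (6)-(8) (our illustration):

def fuse(z_en, var_en, z_acc, var_acc):
    # Variance-weighted optimal combination of two noisy measurements:
    # the lower-variance sensor receives the larger weight.
    w_en = var_acc / (var_en + var_acc)
    w_acc = var_en / (var_en + var_acc)
    return w_en * z_en + w_acc * z_acc

# Example: an encoder increment of 0.10 m (variance 1e-4 m^2) fused with an
# accelerometer increment of 0.12 m (variance 4e-4 m^2) yields 0.104 m,
# closer to the lower-variance encoder measurement.
print(fuse(0.10, 1e-4, 0.12, 4e-4))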


3. EXPERIMENTS AND RESULTS
The experiment carried out and presented in this section demonstrates the feasibility,
accuracy, and performance of the proposed system. It focused on the position accuracy of the
differential encoder system after combination with the accelerometer.

Figure 4. Block diagram of the proposed system. [Diagram: the differential encoder system outputs \( \delta d_{R,k}^{en} \) and \( \delta d_{L,k}^{en} \); the data fusion algorithm combines the resulting \( \delta d_{x,k}^{en} \), \( \delta d_{y,k}^{en} \) with the accelerometer increments \( \delta d_{x,k}^{acc} \), \( \delta d_{y,k}^{acc} \); the fused \( \delta d_{x,k}^{en,acc} \), \( \delta d_{y,k}^{en,acc} \) and heading feed the navigation system with Kalman filter, which outputs \( P_k \) and \( \theta_k \).]

Figure 5. Simulation of the mobile robot's position for the proposed system in the X-coordinate with Matlab software.
The results are compared and discussed below. We started by calibrating the
accelerometer and controlled the vehicle along a 3 × 3 m² square path with start point (0, 0).
The trajectory was estimated based on the shaft encoders from start point (0, 0) to (0, 3),
(3, -3), (0, -3), and back to the end point (0, 0). Figure 5 shows the recorded experimental
results. The estimation error of the fusion algorithm between encoders and accelerometer at
the end point was 23.5 cm in the X-coordinate; Figure 6 shows that the estimation error in the
Y-coordinate was 32 cm. We see that estimation with sensor fusion can improve the
performance of the mobile robot's localization.






Figure 6. Simulation of the mobile robot's position for the proposed system in the Y-coordinate with Matlab software.



4. CONCLUSION
This paper dealt mainly with a fusion of encoders and an accelerometer for distance
increments, with a Kalman filter to estimate the robot's position and heading. Experimental
results confirm that our system based on sensor data fusion improves the accuracy and
confidence of position estimation. Our experiments show that when an accelerometer is used
on a mobile robot, careful calibration is very effective in minimizing errors. In summary, we
conclude that accelerometers have good potential to be useful in mobile robot positioning,
especially as part of a multi-sensor system.
REFERENCES
1. Alberto Elfes, Sonar-Based Real-World Mapping and Navigation, IEEE Journal of Robotics and Automation, Vol. RA-3, Issue 3, pp. 249-265, June 1987.
2. Peter Kampmann and Guenther Schmidt, Indoor Navigation of Mobile Robots by Use of Learned Maps, in [Schmidt 91], pp. 151-169.
3. Borenstein, J., & Feng, L. (1996). Measurement and Correction of Systematic Odometry Errors in Mobile Robots. IEEE Trans. on Rob. and Autom., 12(6), 869-880.
4. Figueroa, F., & Mahajan, A. (1994). A Robust Navigation System for Autonomous Vehicles Using Ultrasonics. Control Engineering Practice, 2(1), 49-59.
5. Haykin, Simon (1996). Adaptive Filter Theory (3rd ed.). Englewood Cliffs NJ: Prentice Hall, Inc.
6. Iyengar, S.S., Prasad, L., & Min, H. (1995). Advances in Distributed Sensor Technology. Englewood Cliffs NJ: Prentice Hall, Inc.
7. Kim, J.H., & Seong, P.H. (1996). Experiments on Orientation Recovery and Steering of Autonomous Mobile Robot Using Encoded Magnetic Compass Disc. IEEE Trans. on Instrum. and Meas., 45(1), 271-273.
8. Precision Navigation, Inc. (1996). Vector 2X Compass Module, Application Notes, Version 1.03, January.
9. Gelb, A. (1974). Applied Optimal Estimation. MIT Press, Cambridge, Massachusetts.

G00160
Credit Application Classification: A Case Study of National
Pension and Provident Fund of Bhutan

Kinzang Wangdi^1,C, Akara Prayote^2, Utomporn Phalavonk^3

^1 Department of Information Technology, Faculty of Information Technology
^2 Department of Computer and Information Science, E-mail: akarap@kmutnb.ac.th
^3 Department of Mathematics, E-mail: upv@kmutnb.ac.th
King Mongkut's University of Technology North Bangkok, 1518 Pibulsongkram, Bangsue, Bangkok, Thailand 10800
^C E-mail: kinzang_w@hotmail.com

ABSTRACT
Since the National Pension and Provident Fund (NPPF) introduced member financing schemes
such as housing and education loans, the number of applicants has grown every year, and
granting loans has become an important decision for loan officials seeking to avoid potential
future risk. The credibility of current evaluation criteria such as repayment capacity is at stake:
the occurrence of repayment defaulters and non-performing loans (NPL) is increasing, and in a
number of cases the mortgages have had to be seized. This study investigates different models
to classify loan applicants. Classification efficiency is evaluated with the F-measure. We are
now able to construct models which correctly classify applicants at 96.5%, with 80.4%
true-positive detection of bad borrowers.
Keywords: Classification, Loan Application, Multilayer Perceptron, SMOTE, Segmentation
1. INTRODUCTION
Granting loans at the National Pension and Provident Fund (NPPF) has become an important
decision for loan officers, as the number of applicants increases every year. It is a general perception
that the human capability to distinguish good from bad loan applications is rather poor, since human
decisions are sometimes biased and unable to draw inferences from large historical data [1].
Credit risk evaluation decision support systems play an important role in enabling faster credit
decisions. Credit scoring is the most commonly used technique for evaluating the creditworthiness of
loan applicants [1-4]. Many different types of classification algorithms are used to evaluate credit
risk, such as logistic regression; linear and quadratic discriminant analysis; linear programming;
support vector machines; decision trees and rules; Bayesian network classifiers; neural networks;
and k-nearest neighbor classifiers [5]. Compared with statistical methods such as linear discriminant
analysis and logistic regression, artificial intelligence methods like neural networks and genetic
algorithms achieve higher prediction accuracy [6]. However, the inability of neural networks to
explain the reasons for their decisions is a deterrent to the transparency of credit scoring, which is
the main factor for a successful deployment. In [6], transparency of the decision system was
achieved using evolutionary neuro-fuzzy algorithms.
However, there is no single algorithm that performs best across datasets from different domains
in classifying applications as good or bad. An algorithm may be best on some particular datasets but
perform worse on others, so it is necessary to study a number of different algorithms to find the best
one [8,9].
In this paper, we study five classification algorithms, namely Multilayer Perceptron, Radial
Basis Function Networks, Sequential Minimal Optimization, J48, and Random Forest, on the NPPF
dataset. The performance of the five models in classifying good and bad borrowers is compared
using the F-measure. We are most interested in classifying bad borrowers correctly, i.e., in
true-positive detection.
The rest of the paper is organized as follows: Section 2 describes the classification algorithms,
Section 3 the dataset and preprocessing, Section 4 the experimental results and discussion, and
Section 5 concludes.
2. CLASSIFICATION ALGORITHMS
2.1 Multilayer Perceptron
Artificial neural networks (ANNs) are mathematical models of neurons that simulate the
functioning of the human brain [4]. The most important feature of ANNs is their ability to learn:
just like the human brain, ANNs can learn by example and dynamically modify themselves to fit the
data presented. Moreover, ANN models are able to learn from noisy, distorted, or incomplete
sample data. ANNs are highly robust with respect to data distributions (non-parametric), and no
assumptions are made about the relationships between variables, since the models do not depend on
any assumptions regarding the data.


Figure 1. Multilayer Feedforward Neural Network.

A multilayer perceptron (MLP) is a network consisting of a number of highly interconnected
simple computing units, called neurons or nodes, organized in layers. Each neuron performs the
simple task of converting received inputs into processed outputs [4,5,7]. Figure 1 shows the
architecture of a three-layer feedforward neural network consisting of an input layer, hidden layers,
and an output layer. Detailed explanations of how an MLP is used for classification can be found in
[1,5]. For the network to work as a classifier, it has to go through a learning process, i.e., it must
learn how to classify a certain input. The backpropagation algorithm is the most commonly used
method for training ANNs; gradient descent is a widely known optimization algorithm for nonlinear
unconstrained optimization problems.
The useful properties and capabilities of ANNs are nonlinearity, adaptability, and generalization.
However, most neural network results are of a "black box" kind, with no ability to explain the
reasoning behind a decision.
2.2 Radial Basis Function Network
The radial basis function (RBF) network is a special type of neural network that is very useful
for pattern classification problems. An RBF network consists of three layers: the input layer, a
hidden layer, and the output layer. The information from the input neurons is transferred to the
hidden-layer neurons; the RBFs in the hidden layer respond to the input information and generate
the outputs in the neurons of the output layer. The advantage of the RBF network is that a hidden
neuron gives a non-zero response provided the input falls within its pre-defined input range;
otherwise the response is zero. The time required to train an RBF network is proportional to the
number of neurons [10].
G00160
ANSCSE14 Mae Fah Luang University, Chiang Rai, Thailand
March 23-26, 2010


774
2.3 J48
J48 is a simple decision tree classifier. To classify a new item, the algorithm first creates a
decision tree based on the attribute values of the available training data, using the fact that each
attribute can be used to make a decision by splitting the data into smaller subsets. It examines the
information gain (difference in entropy) that results from choosing an attribute for splitting the data,
and the decision is made on the attribute with the highest normalized information gain. The
algorithm then recurses on the smaller subsets until all instances in a subset belong to the same
class, which becomes a leaf node. If none of the features gives any information gain, it creates a
decision node higher up in the tree using the expected value of the class. The advantages of the J48
algorithm are that it can handle both continuous and discrete attributes, training data with missing
attribute values, and attributes with differing costs. Moreover, it provides an option for pruning the
tree after creation.
2.4 Random Forest
Random forest is a class of ensemble methods designed mainly for decision tree classifiers.
A random forest combines the predictions made by multiple decision trees, each tree being generated
from the values of an independent set of random vectors. The main features of random forests are
that they run efficiently on large databases, can handle thousands of input variables, give estimates
of which variables are important, and have methods for balancing error in the class populations of
unbalanced datasets [11].
2.5 Sequential Minimal Optimization
Sequential Minimal Optimization (SMO) is a support vector machine (SVM) learning
algorithm. Unlike numerical quadratic programming (QP) approaches, SMO chooses to solve the
smallest possible optimization problem at every step [12]. Its advantage is that two Lagrange
multipliers can be solved analytically, avoiding numerical QP optimization. Moreover, very large
SVM training problems can fit in the memory of an ordinary personal computer or workstation,
since SMO requires no extra matrix storage at all. No matrix algorithms are used in SMO, so it is
also less susceptible to numerical precision problems.

3. DATASET AND PREPROCESSING
3.1 Dataset
The dataset contains borrower information from the NPPF for the last five years (2004-2009).
The classification of borrowers is based on their characteristics, the total loan applied for, the
collateral, and the repayment capacity. All attribute names and values were changed to meaningless
symbols to protect the confidentiality of NPPF's borrower information. The last column/attribute is
the outcome of the application ("g" means a good application, "b" a bad application). We collected
8539 instances of borrower information; of these, 8162 (95.6%) are good applications and 356
(4.4%) are bad applications. In addition, 194 instances were discarded due to missing attribute
values. We used 16 different attributes to study the characteristics of good and bad applications. A
borrower who defaulted more than three times per year is considered a bad application, and one who
defaulted three times or fewer a good application.
The goal of the study is to develop a model that detects bad applications. The risk of predicting
a bad application as good is higher than the risk of predicting a good application as bad: [14]
explains that mistakenly predicting a bad application as good costs the lost principal, lost interest,
and lost administration fees, and incurs legal fees, insurance coverage, and property taxes.

3.2 Data Preprocessing
The data is preprocessed before it is fed into the classification algorithm. Since the MLP
algorithm requires discrete or continuous numerical values, categorical data needs to be translated
into numerical values; we used the 1-out-of-N encoding technique for this [15]. The numerical data
was then normalized to the range 0 to 1, as required by the MLP model. The normalization used in
this paper, called rescaling, was done by finding the maximum value of each numerical attribute and
dividing each instance of the attribute by this maximum.
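
A minimal sketch of this preprocessing (our illustration; the column names are hypothetical, not the NPPF attributes):

import pandas as pd

df = pd.DataFrame({
    "loan_type": ["housing", "education", "housing"],  # categorical attribute
    "loan_amount": [500000, 120000, 350000],           # numerical attribute
})
# 1-out-of-N encoding of the categorical attribute
encoded = pd.get_dummies(df, columns=["loan_type"])
# Rescaling: divide each instance of a numerical attribute by its maximum
encoded["loan_amount"] = encoded["loan_amount"] / encoded["loan_amount"].max()
print(encoded)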
4. EXPERIMENTAL RESULTS AND DISCUSSIONS
4.1 A Comparison of Classification Algorithms
The five algorithms were run separately to construct classification models. Evaluation is
measured by true positives (TP), false positives (FP), true negatives (TN), false negatives (FN),
accuracy, precision, recall, and F-measure, as shown in Table 1.

Table 1. An experimental result of five classification models.
Algorithms TP FN FP TN Accuracy Precision Recall F-Measure
MLP 4 363 14 8158 95.585 0.222 0.011 0.021
RBFNetwork 0 367 0 8172 95.7021 0 0 0
SMO 0 367 0 8172 95.7021 0 0 0
J48 0 367 0 8172 95.7021 0 0 0
RandomForest 3 364 17 8155 95.5381 0.15 0.008 0.016
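
For reference, the measures reported in Table 1 follow the usual definitions (e.g., for the MLP row: precision = 4/(4+14) ≈ 0.222 and recall = 4/(4+363) ≈ 0.011):

\[ \text{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN}, \quad \text{Precision} = \frac{TP}{TP + FP}, \quad \text{Recall} = \frac{TP}{TP + FN}, \quad F = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}. \]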

From Table 1, RBFNetwork, SMO, and J48 are slightly better than MLP and RandomForest in
terms of accuracy. However, the true-positive column shows that only MLP detects any bad
borrowers, followed by RandomForest, while the other three fail to detect any. Only MLP and
RandomForest have non-zero values of precision, recall, and F-measure, and those values are not
satisfactory. The experimental results indicate that all of the algorithms are very poor at detecting
bad borrowers; the classification problems are due to imbalanced data.
Imbalance problems occur when one class has more instances than the others, reducing
classification performance on the minority classes. The class that contains more instances is called
the majority class, and the class that contains fewer instances the minority class. A study in [13]
shows the effect of the imbalanced-data problem on classification performance. In this experiment,
among the five classification algorithms, the maximum number of true positives was detected by
MLP (4 bad borrowers), so we selected the MLP algorithm for further study.
4.2 MLP with SMOTE
The dataset consists of 95.6% good borrowers (majority class) and only 4.4% bad borrowers (minority class), which is clearly an imbalanced dataset. Common data-level solutions for imbalanced datasets are under-sampling and over-sampling. As mentioned in [13], under-sampling excludes some instances of the majority class so that the ratio between the majority and minority classes becomes more balanced, whereas over-sampling balances the data by adding instances to the minority class, for example with the popular sampling method for numerical data called SMOTE (Synthetic Minority Over-sampling Technique).
SMOTE over-samples the minority class by generating synthetic minority instances that lie on the boundary between majority class and minority class instances [13]. The advantage of SMOTE is that it makes the decision region larger and less specific. In [14], SMOTE is fully integrated into a standard boosting procedure, improving the prediction of the minority class without sacrificing the accuracy on the whole testing set.
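The experiments in this paper use Weka's SMOTE filter; an equivalent sketch with the imbalanced-learn library (a stand-in for Weka, with X and y as placeholders for the encoded features and the good/bad labels) might look like this:

    import numpy as np
    from imblearn.over_sampling import SMOTE

    # Placeholder data standing in for the preprocessed NPPF features and
    # labels (1 = bad borrower, 0 = good borrower).
    rng = np.random.default_rng(0)
    X = rng.random((8539, 10))
    y = np.array([1] * 367 + [0] * 8172)

    # Double the minority class (367 -> 734), as in Section 4.2, while the
    # 8172 majority instances are left unchanged.
    smote = SMOTE(sampling_strategy={1: 734}, k_neighbors=5, random_state=0)
    X_res, y_res = smote.fit_resample(X, y)
    print(X_res.shape)  # (8906, 10), i.e. 8.2% bad borrowers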
We applied Weka's supervised instance SMOTE filter to synthetically increase the number of bad borrowers. The bad borrower instances increased to 734 while the good borrower instances remained the same. The new dataset consists of 8906 instances, of which 8172 are good borrowers; in other words, 8.2% of the dataset consists of bad borrowers. We conducted the experiment using the MLP algorithm from Weka 3.6.1. The empirical result shows that the overall accuracy decreased to 91.3%. However, this model was able to detect 49 instances of bad borrowers correctly (i.e. 6.6% correct classification of bad borrowers), which is better than the earlier model. Precision increased from 0.222 to 0.353, recall from 0.011 to 0.067, and F-measure from 0.021 to 0.112.
4.3 MLP with SMOTE and Segmentation
MLP with SMOTE improved the classification accuracy on bad borrowers from merely 1% to 6.6%. However, this result is still not satisfactory, as the model detects only 49 of the 734 bad borrowers. To further address the bad borrower classification problem, we implemented data segmentation.
Data segmentation is another solution to the imbalanced data problem: researchers can pay more attention to target segments and ignore other segments. Segmentation techniques differ among researchers. In [9] it is argued that an alternative method of learning minority classes on an imbalanced dataset is one-class learning, in which the characteristics of the rare class are recognized by learning only the minority class rather than involving the majority class.
We applied two segmentation schemes, keeping 50% and 25% of the whole dataset, in order to increase the concentration of the minority class (bad borrowers). First we segmented 50% of the SMOTE-balanced dataset of Section 4.2 using Weka's unsupervised instance RemovePercentage filter. The new dataset contains 4453 instances, of which 4021 are good borrowers and 432 are bad borrowers (i.e. 9.7% bad borrowers). We again conducted the experiment using the MLP algorithm. The empirical result shows that the overall accuracy increased to 96.1%. Moreover, this model detects 303 true positive instances (i.e. 70.1% of bad borrowers), which is far better than the earlier two models.
We further segmented to 25% of the whole dataset to increase the concentration of bad borrowers. The dataset decreased to 2225 instances, of which 1844 are good borrowers and 382 are bad borrowers, so the concentration of bad borrowers increased to 17.2%. A similar experiment shows a slight increase in overall accuracy from 96.1% to 96.3%. This model detects 307 true positive instances (i.e. 80.4% of bad borrowers), which is better than the earlier three models.
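The segmentation itself was done with Weka's RemovePercentage filter; a minimal Python sketch of the same idea (unstratified random removal, with X_res and y_res from the SMOTE step above as placeholders):

    import numpy as np

    def keep_percentage(X, y, keep, seed=0):
        """Randomly keep a fraction of the instances, mimicking Weka's
        unsupervised RemovePercentage filter (no stratification)."""
        rng = np.random.default_rng(seed)
        idx = rng.choice(len(y), size=int(len(y) * keep), replace=False)
        return X[idx], y[idx]

    # The 50% and 25% segments of Section 4.3:
    X50, y50 = keep_percentage(X_res, y_res, keep=0.50)
    X25, y25 = keep_percentage(X_res, y_res, keep=0.25)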

Figure 2. Accuracy (overall, good, bad) versus model. Figure 3. F-measure (good, bad) versus model.
Figure 2 is a graph of classification accuracy versus model. As the graph indicates, the accuracy on good borrowers (average 99.29%) is higher than the overall accuracy (average 94.81%) for all models. The accuracy on bad borrowers increased steadily from 1% to 80.37% across the models. Model 4 is the best model for detecting bad borrowers in the NPPF dataset.
Figure 3 is a graph of F-measure versus model. The F-measure of good borrowers changes little across models, while the F-measure of bad borrowers improves drastically. The higher the F-measure, the better the classification.

Figure 4. Accuracy versus number of hidden layers. Figure 5. F-measure versus number of hidden layers.
From the experimental results, Model 4 is the best among the models. We performed experiments varying the number of hidden layers from 1 to 15 and found only a marginal increase in accuracy as hidden layers are added. Figure 5 is a graph showing the F-measure values of good and bad borrowers.
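The hidden-layer sweep was run in Weka's MLP; a comparable sketch with scikit-learn's MLPClassifier (a stand-in for Weka, with X25 and y25 as placeholders for the 25% segment and a hypothetical width of 10 neurons per layer) would be:

    from sklearn.neural_network import MLPClassifier
    from sklearn.metrics import f1_score

    for n_layers in range(1, 16):
        mlp = MLPClassifier(hidden_layer_sizes=(10,) * n_layers,
                            max_iter=500, random_state=0)
        mlp.fit(X25, y25)
        # F-measure on the bad-borrower class (label 1).
        print(n_layers, f1_score(y25, mlp.predict(X25)))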

5. CONCLUSION
In this study, we developed classification models that can detect bad borrowers in the NPPF dataset. The experimental results indicate that classification of the minority class can be improved by making use of the SMOTE and segmentation techniques. We were able to increase the classification of bad borrowers without much impact on the classification of good borrowers. The highest accuracy is 96.5%, with 80.4% true positive detection of bad borrowers.
Future work includes the development of classification models using under-sampling techniques for the imbalanced dataset.

REFERENCES
1. M. Handzic, F. Tjandrawibawa and J. Yeo, How Neural Networks Can Help Loan Officers to Make Better Informed Application Decisions, The University of New South Wales, Sydney, Australia, June 2003.
2. L. Lahsasna, R.N. Ainon and T.Y. Wah, Intelligent Credit Scoring Model using Soft Computing Approach, Faculty of Computer Science and Information Technology, University of Malaya, 50503 Kuala Lumpur, Malaysia, IEEE, August 2008, 395-402.
3. K. Li, Z. Xu and B. Wang, Research of Intelligent Decision Support System based on Neural Networks, Yangtze University, Hubei Province, P.R. China, 434023, IEEE, August 2008, 124-126.
4. K. Li, Z. Xu and B. Wang, Research of Intelligent Decision Support System based on Neural Networks, Yangtze University, Hubei Province, P.R. China, 434023, IEEE, August 2008, 124-126.
5. Q. Ye (College of Economics & Management, China Three Gorges University, Yichang, Hubei, 443002, China) and B. Liu (Zhejiang University of Finance & Economics, Hangzhou, 310018, China), Classification Algorithms Based on Neural Network and Its Application in the Credit Market, July 2006.
6. L. Lahsasna, R.N. Ainon and T.Y. Wah, Credit Risk Evaluation Decision Modeling Through Optimized Fuzzy Classifier, Faculty of Computer Science and Information Technology, University of Malaya, 50503 Kuala Lumpur, Malaysia, August 2008.
7. F. Khalid, Measure-based Learning Algorithms: An Analysis of Back-propagated Neural Networks, Department of Interaction and System Design, Blekinge Institute of Technology, SE-372 25 Ronneby, Sweden, June 2008.
8. R.S. Oetama and R. Pears, Dynamic Credit Scoring Using Payment Prediction, Computer and Mathematical Sciences, AUT, 2007.
9. G.M. Weiss, Mining with rarity: a unifying framework, ACM SIGKDD Explorations Newsletter, 2004, 7-19.
10. W. Lai, P.W. Tse, G. Zhang and T. Shi, Classification of gear faults using cumulants and the radial basis function network, Department of Mechatronics, School of Mechanical Science and Engineering, Huazhong University of Science and Technology, Wuhan 430074, and Department of Manufacturing Engineering and Engineering Management, City University of Hong Kong, Hong Kong, People's Republic of China, 2004.
11. P.-N. Tan, M. Steinbach and V. Kumar, Introduction to Data Mining, Pearson International Edition, 2006.
12. J.C. Platt, Sequential Minimal Optimization: A Fast Algorithm for Training Support Vector Machines, Microsoft Research, 1998.
13. N.V. Chawla, K.W. Bowyer, L.O. Hall and W.P. Kegelmeyer, SMOTE: Synthetic Minority Over-sampling Technique, Journal of Artificial Intelligence Research, 2002, 15, 321-356.
14. G.N. Nayak and C.G. Turvey, Credit Risk Assessment and the Opportunity Cost of Loan Misclassification, Canadian Journal of Agricultural Economics, 1996, 45(3), 285-299.
15. K.A. Smith, Introduction to Neural Networks and Data Mining for Business Applications, Eruditions Publishing, Australia, 1999.





G00164
Development of Free Bulge Test Tooling for Flow Stress
Curve Determination of Tubular Materials

Perawat Boonpuek¹, Suwat Jirathearanat², Nattawut Depaiwa³ and Naoto Ohtake⁴,C

¹Graduate Student, TAIST Tokyo Tech Automotive Engineering (International Program), International College, King Mongkut's Institute of Technology Ladkrabang, Bangkok 10520, Thailand
²National Metal and Materials Technology Center, National Science and Technology Development Agency, Bangkok 10400, Thailand
C E-mail: K.perawat.b@gmail.com; Fax: (662) 5647001-5; Tel. 089-0789857, (662) 5647000



ABSTRACT

This study aims to design die inserts for use in free bulge testing to determine the flow stress curves of tubular materials. Hwang's model is used to determine the suitable free bulge shape of the tubular materials being tested. The results of finite element simulation of free bulge forming are compared with the analytical results of Hwang's model. The results show that the designed die inserts are able to form tubular materials into free bulge shapes suitable for proper flow stress curve determination. A numerical study of the effects of the K and n values on the hydroformability of a fuel filler pipe is conducted with the finite element software DYNAFORM. STKM 11A carbon steel tubing is evaluated for its plastic deformation through consideration of the forming limit diagram and thickness distribution. It is also shown that the hydroformability of the fuel filler pipe is highly sensitive to the magnitude of the K and n values. Therefore, proper determination of the flow stress curves of tubular materials is of great importance.

Keywords: Tube Hydroforming, Free Bulge Test, Flow Stress Curve


1. INTRODUCTION
Hydroforming processes have become popular in various industries, such as the bicycle, automotive, aircraft and aerospace industries, due to increasing demands for lightweight, high-strength parts with fewer weld lines. This relatively new manufacturing technology can form more complex shapes than stamping. In comparison to conventional metal forming, tube hydroforming has several advantages: (1) reduction in the number of workpieces, tool cost and product weight, (2) improvement of structural stability and increase of strength and stiffness of the formed parts, (3) more uniform thickness distribution, and (4) fewer secondary operations. On the other hand, the technology has some disadvantages, such as long cycle times, expensive equipment and the lack of an effective database for tooling and process design [1-3].
Several studies concerning axisymmetric hydraulic bulge forming have been reported in the literature. Yeong-Maw Hwang, Yi-Kai Lin and T. Altan [4, 5] proposed an analytical model to evaluate the effects of the die entrance radius on the internal pressure and thickness distribution of formed tubes. In their papers, the bulge profile is assumed to be an elliptical surface, and the thin wall thickness of the free bulge region is approximated using a quadratic distribution. However, they did not consider the influence of the R_d/t and L/OD ratios, where R_d, L, t and OD are the die entrance radius, bulge length, thickness and outer diameter of the tube, respectively.
This paper aims at designing proper die inserts for use in the tube bulge test, considering the effects of the R_d/t and L/OD ratios. FEA simulation is used to investigate the die insert parameters that allow forming of tubes in the range of proper free bulge shapes. Hwang's model is used to determine the proper free bulge shape for comparison with the FEA simulation results. In addition, FEA simulation is also used to study the effects of the strength coefficient (K) and strain-hardening exponent (n) on the hydroformability of fuel filler pipe forming.

2. ANALYTICAL MODEL
In Hwang's model, the effective stress and effective strain are derived from the hydraulic bulge profile of the tube and the internal pressure. The assumptions on the free bulge profile are as follows:
(1) The profile of the forming tube in the free bulge region is assumed to be an elliptical curve, as shown in Figure 1.
(2) The two ends of the tube, spanning the bulge length (L), are completely fixed.


Figure 1. Schematic of tube free-bulge test [4, 5]

From the bulge model (Figure 1), the coordinates of the contact point e, R_e and Z_e, can be calculated by Eq. (1) and Eq. (2):

R_e = R_0 + R_d (1 - cos θ_e)   (1)

Z_e = L/2 - R_d sin θ_e   (2)

Nomenclature
D_p: diameter of the forming tube
K: strength coefficient
L: bulge length
n: strain hardening exponent
P_i: internal pressure
R_0: initial tube outer radius
R_d: entrance radius of the die
R_e, Z_e: coordinates of contact point e
R_p, R_z: half lengths of the minor and major axes of the elliptical tube surface
t_p: tube thickness at pole p
θ_e: contact angle
ρ_θ, ρ_φ: circumferential and meridian radii of curvature at any point on the tube surface
ρ_θp, ρ_φp: circumferential and meridian radii of curvature at pole p
ε_θ, ε_t: strains at the pole in the hoop and thickness directions
σ̄, ε̄: effective stress and effective strain
σ_φ, σ_θ: stresses in the meridian and circumferential directions
σ_0, ε_0: initial yield stress and initial yield strain
β: principal strain ratio
ε_1, ε_2: major strain and minor strain
where θ_e, L, R_0 and R_d are the contact angle, bulge length, initial tube outer radius and die entrance radius, respectively. The half lengths of the minor and major axes of the elliptical surface in the free bulge region, R_p and R_z, can be obtained from Eq. (3) and Eq. (4) given in [4, 5]:

R_p = ...   (3)

R_z = ...   (4)

The circumferential and meridian radii of curvature at pole p of the tube bulge shape, ρ_θp and ρ_φp, can be expressed, respectively, as:

ρ_θp = R_p   (5)

ρ_φp = R_z^2 / R_p   (6)

Eq. (1) - Eq. (6) are used to calculate ρ_θp and ρ_φp in the bulge region. Finally, the flow stress curve can be derived from the effective stress and effective strain in the biaxial stress state of plastic deformation. The flow stress can be expressed using an exponential strain hardening law, Eq. (7) [4-6]:

σ̄ = K (ε_0 + ε̄)^n   (7)

where σ̄, ε̄, K, ε_0 and n are the effective stress, effective strain, strength coefficient, initial strain and strain hardening exponent, respectively.
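For concreteness, a minimal numerical sketch of the recoverable relations, Eqs. (1), (2), (5), (6) and (7), assuming the reconstructed form of Eq. (6); all input values below are illustrative only, not measured data:

    import numpy as np

    def hwang_bulge(R0, Rd, L, theta_e, Rp, Rz, K, n, eps0, eps_bar):
        """Evaluate Eqs. (1), (2), (5), (6) and (7) of the analytical model."""
        Re = R0 + Rd * (1.0 - np.cos(theta_e))  # Eq. (1): contact point radius
        Ze = L / 2.0 - Rd * np.sin(theta_e)     # Eq. (2): contact point axial position
        rho_theta_p = Rp                        # Eq. (5): circumferential radius at pole
        rho_phi_p = Rz**2 / Rp                  # Eq. (6): meridian radius at pole
        sigma_bar = K * (eps0 + eps_bar)**n     # Eq. (7): flow stress law
        return Re, Ze, rho_theta_p, rho_phi_p, sigma_bar

    # Illustrative values for a 25.4 mm tube with Rd = 5 mm, L = 38.1 mm.
    print(hwang_bulge(R0=12.7, Rd=5.0, L=38.1, theta_e=np.radians(30.0),
                      Rp=16.0, Rz=25.0, K=479.3, n=0.226, eps0=0.0, eps_bar=0.2))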


3. DIE INSERT DESIGN FOR FREE BULGE TEST
3.1 FEA with varied die insert dimensions
The objective of this FEA study for tooling design is to investigate the die insert dimensions (R_d and L) that allow forming of proper free bulge shapes. The schematic of the free bulge study is shown in Figure 2.


Figure 2. Schematic of free bulge test and the design parameters

From Figure 2, the R_d and L values are the design parameters of interest. R_d governs the mobility of the tube into the die cavity; L governs the bulging ability of the tube. Both factors affect the thickness distribution. The dimensions of the die inserts investigated in the FEA simulations are defined in Table 1. R_d and L are varied to investigate the forming of bulge shapes ranging from initial bulge to cracking.



Table 1. Parameters of the finite element models

Model   Rd (mm)   t_o (mm)   Rd/t_o   L (mm)   OD (mm)   L/OD
1       5         1          5        25.4     25.4      1
2       5         1          5        38.1     25.4      1.5
3       5         1          5        50.8     25.4      2
4       15        1          15       25.4     25.4      1
5       15        1          15       38.1     25.4      1.5
6       15        1          15       50.8     25.4      2
7       25        1          25       25.4     25.4      1
8       25        1          25       38.1     25.4      1.5
9       25        1          25       50.8     25.4      2

DYNAFORM is used to conduct the FEA simulations of the hydroforming processes, with the LS-DYNA solver adopted directly to solve the nonlinear problems. During the simulation, the two ends of the tube are fixed in the closed die inserts. For the process design, the die insert models are assumed to be rigid bodies, and the plastic deformation of the tube is in a biaxial stress state. The material used for the tube blank model is mild steel CQ T36 (USA standard), with OD = 25.4 mm and thickness = 1 mm. The mechanical properties of the tubular material are taken as the software defaults. Heat and temperature effects are neglected. A constant friction coefficient (μ) of 0.125 is assumed at the interface between the tube and the dies. The tube model is meshed with shell elements at the middle surface. The die mesh element size is 2 mm maximum, 0.5 mm minimum; the tube mesh element size is 1 mm maximum, 0.5 mm minimum. The feeding distance is zero (ends fixed). The density of water is 9.95e-7 kg/mm³ and its bulk modulus is 2.2 GPa.

3.2 Results of Finite Element Simulation
All the free bulging simulation results are analyzed and summarized in this section. The bulge shape from each simulation is considered at the time when a crack is first predicted (Figure 3). Figure 4 shows the forming tube at various intermediate forming steps. The bulge shapes from all 9 cases are compared in Figure 5.



Figure 3. Half model of tube free bulging (symmetry plane indicated)



Figure 4. Intermediate forming steps with thickness distribution: (a) tube ends are fixed by
punches, (b) early deformation by internal pressurization, (c) end of free bulge forming



Figure 5. Comparison of predicted bulge shapes from different die insert geometry
The thickness at the pole versus bulge height is compared for the different ratios of R_d/t and L/OD, in order to determine a bulge shape suitable for flow stress curve determination of tubes and thus to select proper tooling dimensions. The thickness versus bulge height is plotted in Figure 6.




Figure 6. Bulge heights versus thickness at the pole p

From Figure 5, it can be seen that Models #3 and #5-#9 have improper bulge shapes: they do not resemble an elliptical shape and crack too early. These 6 of the 9 models have cracked by the end of the simulation. This leads to the conclusion that Models #1, #2 and #4 are suitable for the testing, as they form large bulges without any cracks. This is confirmed by the larger wall thickness of these three models compared to the rest (Figure 6). To compare among Models #1, #2 and #4, the curvatures of the bulges (i.e. the circumferential and meridian radii) are compared with those calculated from Hwang's model (see Figures 7, 8 and 9).



Figure 7. Deviation of simulated bulge shape curvatures from Hwang's model
Figure 7 graphs the curvature deviations of simulated Models #1, #2 and #4 from the corresponding analytical results based on Hwang's model. Model #2 best fits Hwang's model, showing the least curvature deviation. Figures 8 and 9 compare the simulated curvatures (from Model #2) with the exact curvatures (from Hwang's model). It can be seen that the bulge curvatures (both circumferential and meridian) follow Hwang's model well. Therefore, the die geometry of Model #2 (R_d/t = 5 and L/OD = 1.5) is taken as the proper die geometry for the flow stress curve determination experiment, and these values are chosen for manufacturing the free bulge test tooling.



Figure 8. Meridian radius comparison between simulation results and Hwang's model




Figure 9. Circumferential radius comparison between simulation results and Hwang's model

4. SENSITIVITY STUDY OF K AND n ON PART HYDROFORMABILITY
The objective of this FE simulation is to study the effects of the material properties, i.e. varied strength coefficient (K) and strain-hardening exponent (n). To evaluate the effect of these material parameters on hydroformability, the hydroforming of a fuel filler pipe is modeled and simulated. The material used for the tube blank models is mild steel CQ T36 (USA standard), with OD = 28.6 mm and thickness = 1.2 mm. The FE simulation matrix is summarized in Table 2.

Table 2. FEA matrix for the sensitivity study of K and n

No.   Change    K (MPa)   n
1     Default   479.3     0.226
2     +10% K    527.23    0.226
3     +20% K    575.16    0.226
4     -10% K    431.37    0.226
5     -20% K    383.44    0.226
6     +10% n    479.3     0.2486
7     +20% n    479.3     0.2712
8     -10% n    479.3     0.2034
9     -20% n    479.3     0.1808
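Because the tube material follows the flow stress law of Eq. (7), the Table 2 matrix amounts to scaling K and n around their defaults. A short sketch of how these perturbations shift the flow stress curve (taking ε_0 = 0 for illustration):

    import numpy as np

    K0, n0 = 479.3, 0.226                  # default values from Table 2
    eps = np.linspace(0.05, 0.5, 4)        # illustrative effective strain range

    for label, K, n in [("default", K0, n0),
                        ("+10% K", 1.1 * K0, n0), ("-10% K", 0.9 * K0, n0),
                        ("+10% n", K0, 1.1 * n0), ("-10% n", K0, 0.9 * n0)]:
        sigma = K * eps**n                 # Eq. (7) with eps0 = 0
        print(label, np.round(sigma, 1))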

The maximum feeding distance of the punch is defined as 74.4 mm, and a friction coefficient (μ) of 0.125 is assumed. The FE model is meshed using shell elements. The die mesh element size is 1 mm maximum, 0.5 mm minimum; the tube mesh element size is 1 mm maximum, 0.5 mm minimum. The density of water is 9.95e-7 kg/mm³ and its bulk modulus is 2.2 GPa. The FE model is shown in Figure 10.



Figure 10. Half FEA model of hydroforming of the fuel filler pipe

Figure 11 shows the thickness distribution along the indicated cutting line, obtained from selected FE models (see Table 2). Figure 12 shows the strain paths of the element at the pole for all the FE models.


Figure 11. Thickness distribution: (a) default FE model, (b) varying K with n constant, (c) varying n with K constant


Figure 12. Forming limit diagram

From the simulation results of the K and n study, it is seen that fuel filler pipes made of tubes whose mechanical properties are varied by only 10% in the K and n values (see Table 2) have totally different thickness distributions compared to the pipe with the default K and n values. Moreover, the strain paths of the element at the pole from the different models change drastically in direction away from the default case. This suggests that a slight change in the material parameter values (K and n) will significantly affect the part thickness distribution and strain paths, and this effect appears more pronounced in tube hydroforming than in stamping.



5. CONCLUSION
An FEA approach for determining proper tube free bulge tooling geometry is proposed. The analytical model assumes the tube profile in the free bulge region to be an elliptical surface. The effects of the die entrance radius (R_d) and bulge length (L) on the bulge shape are investigated using finite element simulations. The circumferential and meridian radii obtained from FE model measurements are compared with the values calculated from Hwang's model to determine the proper die insert geometry. Finally, a die insert suitable for the free bulge test is found (R_d = 5 mm, L = 38.1 mm). In the study of the K and n effects, the finite element simulations show that the strength coefficient (K) and strain hardening exponent (n) have a pronounced effect on the thickness distribution and hydroformability of the fuel filler pipe.



REFERENCES
1. Muammer Koç, Hydroforming for Advanced Manufacturing, first published, England: Woodhead Publishing Limited and CRC Press LLC, 2008.
2. F. Dohmann and Ch. Hartl, Liquid-bulge-forming as a flexible production method, Journal of Materials Processing Technology, 1994, 45, 377.
3. F. Dohmann and Ch. Hartl, Hydroforming - a method to manufacture light-weight parts, Journal of Materials Processing Technology, 1996, 60, 669.
4. Y.M. Hwang and Y.-K. Lin, Analysis of tube bulge forming in an open die considering anisotropic effect of the tubular material, International Journal of Machine Tools and Manufacture, 2006, 46, 1921-1928.
5. Y.M. Hwang, Y.-K. Lin and T. Altan, Evaluation of tubular materials by a hydraulic bulge test, International Journal of Machine Tools and Manufacture, 2007, 47, 343-351.
6. W.F. Hosford and R.M. Caddell, Metal Forming: Mechanics and Metallurgy, third edition, Cambridge University Press, Cambridge, UK, 2007.



ACKNOWLEDGMENTS
The authors would like to extend their appreciation to the National Science and Technology Development Agency, Thailand Science Park. The advice and financial support of MTEC are gratefully acknowledged. The authors would also like to thank TAIST Tokyo Tech Automotive Engineering (International Program), International College, King Mongkut's Institute of Technology Ladkrabang, Thailand.

Author Index


A

Amornraksa, T. 716, 743
Ang, L. S. 135, 258, 263, 269
Anpalagan, A. 490
Anuntalabhochai, S. 39
Anussornnitisarn, P. 598, 610
Aquino, A. J. A. 177
Araki, K. 662, 751
Arsawang, U. 182, 188
Asaruddin, M. R. 178
Asavanant, J. 349
Atiwongsangthong, N. 436

B

Bangphoomi, K. 106
Banjongkan, A. 471
Barbatti, M. 130
Benyajati, C. 362
Beyer, A. 191
Bieber, J. W. 274, 346
Boonchieng, E. 678
Boonkon, P. 317
Boonpuek, P. 779
Boonsin, R. 505
Boonyawan, D. 176
Bovornratanaraks, T. 159
Buachan, C. 275
Budcharoentong, D. 554
Busaman, A. 399

C

Chaijaruwanich, J. 31, 39, 677
Chaiyarat, W. 345
Chan, J. H. 59, 65
Chanajaree, R. 180
Chanapote, T. 293
Chanpoom, T. 247
Chantawannakul, P. 84
Chantrapornchai (Phongpensri), C. 676
Chantrapornchai, C. 454

Chantratita, W. 25
Charalambides, M. N. 385
Charawae, N. 675
Charnsethikul, P. 220
Charoenroop, N. 632
Cheevadhanarak, S. 44
Cheirsirikul, S. 326, 331
Chiverton, H. 571
Chiverton, J. 571, 209, 565
Choi, S. B. 99
Choong, Y. S. 92, 98
Choowongkomon, K. 106
Chotipanbandit, P. 575
Chuai-aree, S. 51, 377, 399, 619,
675, 750
Chuayjan, W. 348
Chumkiew, S. 32
Chuwhite, M. 641
Chuychai, P. 281
Cheirsirikul, S. 436
Cutler, R. 31

D

Daengngern, R. 120, 130
Decha, P. 160, 189
Dechaumphai, P. 356
Depaiwa, N. 779
Deraman, R. 135, 258
Distsatien, A. 494
Do, D.D. 141
Dokmaisrijan, S. 140, 179
Dolwithayakul, B. 454
Dorji, S. 759
Dungkaew, W. 318, 324

E

Ekgasit, S. 170
Engchuan, W. 59
Eua-anant, N. 715
Eungwanichayapant, A. 632, 683
Evenson, P. 346



F

Fauzi, M. 214
Fritzsche, S. 180

G

Gamonpilas, C. 385
Gleeson, M. P. 119
Guayjarernpanishk, P. 349

H

Hadiwijaya, B. 71
Haller, K. J. 314, 315, 316,
317, 318, 324, 325
Hamid, S. A. 190
Hannongbua, S. 112, 119, 160, 161,
180, 182, 188, 189
Haomao, B. 646
Harding, D. 140
Harding, P. 140
Harnsamut, N. 203
Harris, M. KY-1
Hirankarn, N. 65
Hirao, H. INV-1
Hi-ri-o-tappa, K. 689
Hongsthong, A. 44
Hussim, M. H. 135, 258

I

Ikram, N. K. K. 121
Intharathep, P. 182, 188, 189
Itngom, P. 170

J

Jaeger, W. 51
Janpuk, S. 652
Jarangkul, W. 730
Jaroensutasinee, K. 1, 7, 13, 19, 32, 50,
282, 511, 518, 549

Jaroensutasinee, M. 1, 7, 13, 19, 32, 50,
282, 511, 518, 549
Jayasooriya, U. A. 269
Jinpon, P. 518
Jinuntuya, N. 254
Jirathearanat, S. 779
Jitonnom, J. 174
Jungsuttiwong, S. 162
Juntasaro, V. 356
Juntasaro, E. 411
Jutapruet, S. 19
Juttijudata, V. 400

K

Kaemarungsi, K. 662
Kaiyawet, N. 160, 161
Kamyan, N. 276
Kanbua, W. 370, 377, 399, 535,
750
Kanchanawarin, C. INV-3
Kasetkasem, T. 751
Keinprasit, R. 751
Kerdcharoen, T. 175, 581
Kerdprasop, K. 543
Kerdprasop, N. 543
Khamchuay, P. 549
Khamkaew, R. 593
Khantuwan, W. 417, 658
Khetchaturat, C. 370, 377, 535
Khiripet, J. 501, 658
Khiripet, N. 417, 501, 505, 658
Khunhao, S. 423
Kijsipongse, E. 448
Kittiwutthisakdi, K. 241
Klomkliang, N. 141
Klungien, N. 423
Koichi, A. 391
Kokpol, S. 181
Konglok, S. A. 202
Krachodnok, S. 325
Krasienapibal, T. S. 170
Krittinatham, W. 274
Krongdang, S. 84


Kuhapong, U. 50
Kulsirirat, K. 308
Kumkurn, T. 715
Kungwan, N. 120, 130, 162

L

Lao-ngam, Ch. 201
Laopaiboon, R. 347
Lee, V. S. INV-5, 39, 84,
118, 174, 176, 179
Lee, Y. V. 92
Lewlomphaisarl, U. 751
Lischka, H. 177
Lokavee, S. 581

M

Mahasirimongkol, S. 25
Malaisree, M. 160
Malan, P. 411
Mamat, M. 214
Martínez, T. J. 150
Marukatat, S. 524
Masayuki, I. 391
Matthaeus, W. 275
Matthaeus, W. H. 281
Medhisuwakul, M. 179
Meechai, A. 59, 65
Meeprasert, A. 161
Meepripruek, M. 315
Meesad, P. 759
Mekha, P. 677
Meleshko, S.V. 234
Mohamed-Ibrahim, M. I. 135, 258, 263, 269
Mohd, I. 214
Mookum, T. 369
Morokuma, K. 175
Moshkin, N. P. 398
Muanglhua, R. 423
Muangsin, V. 429
Muchtaridi 151
Mueanpong, D. 646
Mulholland, A.J. 174

N

Nakjai, P. 603, 626
Naknoi, S. 241
Namvong, A. 668
Narathanathanan, T. 25
Narupiti, S. 689
Na-udom, A. 603
Nawi, M. S. M. 190
Ngamsaad, W. 85
Nghauylha, W. 641
Ngiamsoongnirn, K. 411
Niemcharoen, S. 423
Nilthong, R. 632, 668, 683
Nimmanpipug, P. INV-4, 84, 174,
176, 179
Nokthai, P. 118
Noonsang, P. 13
Noppakuat, N. 442
Normi, Y. M. 99
Nudklin, P. 743
Nunthaboot, N. INV-6
Nupairon, N. 464
Nuttavut, N. 253

O

Oeckler, O.M. 314
Ohtake, N. 779
Orankitjaroen, S. 369
Osamu, N. 391

P

Pabchanda, S. 347
Page, A. J. 175
Paithoonrangsarid, K. 44
Panich, S. 765
Pan-ngum, S. 689
Pansuwan, P. 529
Panyakampol, J. 44
Parasuk, V. 159, 177, 189
Park, K. 412
Pattanasiri, B. 253


Pattara-Atikom, W. 689
Payaka, A. 122
Pecharapa, W. 308
Pengchan, W. 326, 331, 436
Phalavonk, U. 728, 772
Phansri, B. 412
Phatanapherom, S. 460
Pheera, W. 7, 282
Phengsuwan, J. 464
Phetchakul, T. 326, 331
Phongphanchanthra, N. 423
Phongphanchantra, N. 436
Phonyiem, M. 196
Phuchamnong, A. 362
Piansawan, T. 120
Pianwanit, A. 188
Pianwanit, S. 181
Pipatpaisan, J. 676
Pitaksapsin, N. 362
Plengvidhya, V. 44
Pochai, N. 202
Pongcharoen, P. 529
Pongjanla, S. 598
Poodchakarn, S. 554
Poopanya, P. 287
Poulter, J. 254
Poyai, A. 326, 331
Pramoun, T. 716
Prasitsathapron, C. 652
Prasittichok, K. 163
Prasitwattanaseree, S. 163
Pratumwal, Y. 299
Prayote, A. 728, 772
Premvaranon, P. 299
Prommeenate, P. 44
Prom-on, S. 25, 59, 65
Promsuwan, P. 220
Prueksaaroon, S. 471
Pungpo, P. 191
Punkvang, A. 191
Puntharod, R. 316
Punwong, C. 150
Putpan, J. 347


R

Rahman, N. A. 190
Raksapatcharawong, M 587
Remsungnen, T. 180, 182, 188
Ritraksa, S. 619
Rivaie, M. 214
Ruangphanit, A. 326, 331, 423
Ruangpornvisuti, V. 170
Ruangrassamee, A. 429
Ruengjitchatchawalya, M. 112
Ruffolo, D. 274, 275, 276, 281,
346
Rujeerapaiboon, N. 722
Rukwong, N. 529
Rungrattanaubol, J. 603, 626
Rungrotmongkol, T. INV-8, 160, 161,
182, 188
Rungsawang, A. 703, 737
Ruttanapun, C. 229

S

Sabri, M. 490
Saelim, R. 619
Saengsawang, O. 182, 188
Saenton, S. 683
Sagarik, K. 196, 201
Saimek, S. 554
Sairattanain, S. 683
Saisa-ard, O. 318, 324
Sáiz, A. 275, 276, 346
Sangarun, P. 282
Sangchai, C. 703
Sanglub, S. 646
Sangprasert, W. 176
Sa-nguandee, I. 587
Saovapakhiran, P. 703
Saparpakorn, P. 112
Sattayanon, C. 120
Seangrat, J. 442
Senachak, J. 44
Sengkeaw, P. 71
Seripienlert, A. 275, 281


Shank, L. 118
Schulz, E. 234
Silachan, K. 484
Sinthupinyo, S. 524
Sippl, W. INV-7, 181
Siridejachai, S. 229
Siripant, S. 51, 399
Siripatana, A. 511
Sirisup, S. 203, 235, 350
Somboon, T. 119
Somhom, S. 593
Somkror, B. 678
Somphon, W. 314
Sompornpisut, P. 160, 161, 189
Soparat, J. 299, 362
Sornbundit, K. 85
Sornmee, P. 182, 188
Sornsiriaphilux, P. 662
Srimungkala, A. 356
Sriprapai, D. 554
Srisaikum, A. 336
Suapang, P. 641, 646, 652
Subpaiboonkit, S. 31
Sugino, N. 729
Sujaree, K. 441
Sukhasem, R. 610
Sukrat, K. 177, 182
Sukriket, P. 703
Sulaiman, S. 135, 258, 263, 269
Surarerks, A. 703, 709, 722, 730,
737
Suwannasri, P. 398
Suwansaroj, C. 729

T

Tanadkithirun, R. 235
Tang, I.M. 348
Tangmanee, S. 202
Tangsathitkulchai, C. 141
Tanpipat, N. 336
Tantirungrotechai, Y. INV-2
Tashiro, K. KY-2
Techitdheera, W. 308, 341

Teralapsuwan, A. 299
Thammarongtham, C. 31
Thanadngarn, C. 554
Thanapatay, D. 662, 729
Thanawattano, C. 729
Thassana, C. 341
Thawornrattanawanit, S. 429
Thompho, S. 180
Thongbai, P. 71
Toadithep, N. 494
Toh, P. 269
Tomkratoke, S. 203, 350
Tongraar, A. 122
Tooprakai, P. 276, 281
Treepong, P. 65
Triampo, D. 85, 253
Triampo, W. 85, 253
Truong, T. N. 120
Tunega, D. 177

U

Udommaneethanakit, T. 189
Udomvech, A. 175, 581
Udomwong, P. 39
U-ruekolan, S. 448
Uthayopas, P. 442, 460
Uttama, S. 517, 565

V

Vannarat, S. 179, 229, 293, 350,
448, 471
Varavithya, V. 471
Vasupongayya, S. 478
Vchirawongkwin, V. 170
Veerakachen, W. 587
Vilaithong, T. 179
Viriyarattanasak, P. 391
Virochsiri, K. 429
Visuthsiri, K. 370
Vongachariya, A. 159
Vongpradhip, S. 575


W

Wacharanad, S. 441
Wahab, H. A. 77, 92, 98, 99, 121,
151, 178, 190
Waiyawut, W. 758
Wanchai, V. 44
Wangdi, K. 728, 772
Warnitchai, P. 412
Watcharabutsarakham, S. 524
Wattanawongsakun, P. 362
Wichapong, K. 181
Wiengpon, A. 709
Wikaisuksakul, S. 675
Wilaisil, W. 494
Williams, I. D. 325
Williams, J. G. 385
Wiwatanapataphee, B. 348, 369
Wolschann, P. 189, 191
Wong, K. 235
Wongkoblap, A. 141
Wongkoon, S. 1
Wongpornchai, S. 163
Wongsamran, P. 751
Wongsarasin, W. 737
Wood, B. R. 316
Worrasangasilpa, K. 730
Wu, Y. H. 348, 369

Y

Yakhantip, T. 120, 162
Yam, W. K. 77
Yana, J. 179
Yangthaisong, A. 287, 293, 336, 345,
347
Yongpisanphop, J. 112
Yooyeunyong, S. 454

ANSCSE 14 COMMITTEE

Steering Committee
Vanchai Sirichana, MFU
Ted Tesprateep, MFU
Supa Hannongbua, KU
Pornpan Pungpo, UBU
Jeerayut Chaijaruwanich, CMU
Ekachai Juntasaro, TGGS
Vara Varavithya, KMUTNB
Anucha Yangthaisong, UBU
Putchong Uthayopas, KU
Anant Eungwanichayapant, MFU
Piyawut Srichaikul, NECTEC
Supot Hannongbua, CU
David Ruffolo, MU
Vudhichai Parasuk, CU
Sornthep Vannarat, NECTEC




Technical Committee
Jeerayut Chaijaruwanich, CMU
Pornpan Pungpo, UBU
Ekachai Juntasaro, TGGS
Anucha Yangthaisong, UBU
Vara Varavithya, KMUTNB
Supa Hannongbua, KU
Vudhichai Parasuk, CU
Supot Hannongbua, CU
Jack Asavanant, CU
Nikolay Moshkin, SUT
Varangrat Juntasaro, KU
Vejapong Juttijudata, KU
Sirod Sirisup, NECTEC
David Ruffolo, MU
Piyawut Srichaikul, NECTEC
Putchong Uthayopas, KU
Sornthep Vannarat, NECTEC
Vannajan Sanghiran Lee, CMU




Local Organizing Committee
Siriwat Wongsiri, MFU
Thongchai Yooyativong, MFU
Preecha Upayokin, MFU
Anant Eungwanichayapant, MFU
Rungrote Nilthong, MFU
Panom Winyayong, MFU
Ekachai Chukeatirote, MFU
Uraiwan Intatha, MFU
Kanchana Watla-iad, MFU
Phunrawie Promnart, MFU
Surapong Uttama, MFU
Punnarumol Temdee, MFU
John Chiverton, MFU
Machima Chotipunvitayakul, MFU
Kanlayanee Moonkhum, MFU
Warawan Nawa, MFU
Nittaya Ngammoh, MFU
Roungsan Chaisricharoen, MFU
Kayun Chantarasathaporn, MFU
Piyanate Chuychai, MFU
Vittayasak Rujivorakul, MFU
Pruet Putjorn, MFU
Kumpol Chailert, MFU
Piyanuch Siriwat, MFU
Hataikan Poncharoensil, MFU
Prapassorn Eungwanichayapant, MFU
Nattakan Soykeabkaew, MFU
Suwanna Deachathai, MFU
Tophan Thandorn, MFU
Natthawut Yodsuwan, MFU
Wacharapong Srisang, MFU
Padoungkiart Chaisawat, MFU
Nut-rada Saeng-usamat, MFU
Weerayut Wongsupa, MFU
Kwansuda Singpun, MFU
Narumon Chantonganun, MFU
Rawiwan Sukphol, MFU
Suchart Rattanaroam, MFU
Watcharin Jombua, MFU
