
Code Runner: Recognizing and Executing Handwritten Code


WenXiao Du
Department of Electrical Engineering, Stanford University

Abstract: This paper presents "Code Runner", an Android application that recognizes and executes code handwritten on paper by the user. Current optical character recognition (OCR) systems recognize printed text in scanned documents very well, but they struggle with handwriting because the same character can be written in many different styles and the spacing between characters and between lines is uneven. To address this problem, we use and train a popular OCR system, Tesseract, and combine it with several image processing methods so that handwritten code can be recognized and executed. Test runs show that the system can recognize both individual characters and real code. The paper covers everything from the algorithms to the implementation of the system.

Keywords: handwriting, recognition, Tesseract, image processing

I. INTRODUCTION

In many programming interviews, candidates are asked to write a piece of code on paper. With this approach the interviewer can see the logic of the code but cannot execute it to find out whether it really runs correctly. Code Runner is a solution for such situations: the application photographs the code with an Android smartphone, processes it, and displays the output of the code after it has been executed.

For convenience, the C language was chosen for recognition because its syntax is simple but strict and the set of words used in its statements is smaller than in other modern programming languages. Another advantage of C is that whitespace and indentation are not significant (unlike some other languages such as Python), so the system only needs to focus on recognizing characters.

Many similar applications already exist, for example Iris, an application that can recognize and execute code written in Ruby [1]. A team at Jadavpur University has also published several papers on handwriting recognition using Tesseract-OCR [2] [3]. Some ideas on how to implement such a system were also raised in earlier course projects [4], ...

In the following sections we present the overall structure of the whole system. Each part is then described in detail, including preprocessing, Tesseract training, and code processing. Finally, the results and an evaluation of the application's accuracy are presented.

II. SYSTEM OVERVIEW

Because Tesseract OCR is computationally heavy and complex, running the whole system on the phone is not reasonable. To solve this problem, the application is built on a client-server model. The client is the Android phone: it captures the image, sends it to the server, and displays the result returned by the server to the user. The server is responsible for computation and processing. The main stages of the processing pipeline are:
1) Capture and send the image: The Android phone captures the image and sends it to the computer.
2) Preprocessing: In this step, image processing methods are applied so that the recognition system can handle the submitted image. These methods include image binarization, removal of noise objects, and morphological opening. Details are given in Section III.
3) Character recognition: Next, the system recognizes the text with Tesseract-OCR and separates it into words. Out of the box, Tesseract cannot recognize handwriting, so it must be trained before it can do so. Details are given in Section IV.
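4) Post-processing: In this step, several algorithms are used to relate the words and characters recognized in step 3 to one another, so that the recognized text is closer to valid C code. Details are given in Section V.
5) Review and confirmation by the user: Even with all of the steps above, the system cannot guarantee that recognition is 100% correct, and a single small error is enough to make compilation fail. Therefore, after recognition the server sends the recognized text back to the phone, where the user can check and confirm it before the code is compiled.
6) Compile, run, and display the result: After the user has checked the recognized text, the phone sends it back to the server, which executes it and returns the result to the phone.

The overall block diagram of the system is shown in Figure 1.

Figure 1. System model.

III. PREPROCESSING

Images captured by a phone camera can be affected by many non-ideal factors, such as uneven lighting across the image and regions that are noisy, dusty, or blurred because of low resolution. These factors can cause recognition with Tesseract-OCR to fail. Preprocessing aims to limit and remove these disturbances, and it is divided into the following stages.

A. Image binarization
If the input is a color image, it must first be converted to black and white using an adaptive binarizer. The image is divided into windows of size 20x20, and each window is binarized with Otsu's method. Neighboring windows overlap by a region 10 pixels wide to give a smoother result.

B. Removing noise objects
Dust on the camera or on the paper can produce noise in the image. To remove this noise, objects in the image that are smaller than 12 pixels are deleted. The 12-pixel threshold was chosen so that noise regions are removed without deleting small characters such as punctuation marks.

C. Morphological opening
The edges of characters are sometimes misclassified by the binarization step. Morphological opening is used to smooth the character boundaries and make the characters look more realistic. An example of an image before and after preprocessing is shown in Figure 2.

Figure 2. Image before (left) and after (right) preprocessing.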
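As a rough illustration of the three preprocessing stages of Section III, the following pure-Python sketch applies windowed Otsu binarization, small-object removal, and a morphological opening to a grayscale image stored as a list of rows of 8-bit values. It is not the implementation used in Code Runner. The function names, the rule for combining overlapping windows (mean threshold), the reading of the 12-pixel limit as a pixel count, and the 3x3 structuring element are all assumptions made for the example.

```python
from collections import deque


def otsu_threshold(values):
    """Otsu's method on 8-bit values: return the t that maximizes the
    between-class variance; values <= t are treated as ink (dark)."""
    hist = [0] * 256
    for v in values:
        hist[v] += 1
    total, sum_all = len(values), sum(i * n for i, n in enumerate(hist))
    best_t, best_var, w0, sum0 = 0, -1.0, 0, 0
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue
        if w1 == 0:
            break
        var = w0 * w1 * (sum0 / w0 - (sum_all - sum0) / w1) ** 2
        if var > best_var:
            best_t, best_var = t, var
    return best_t


def binarize(gray, win=20, step=10):
    """Otsu per win x win window; windows start every `step` pixels, so
    neighbours overlap. ASSUMPTION: each pixel is compared with the mean
    threshold of all windows covering it. Returns a 0/1 grid (1 = ink)."""
    h, w = len(gray), len(gray[0])
    thr_sum = [[0] * w for _ in range(h)]
    thr_cnt = [[0] * w for _ in range(h)]
    for top in range(0, h, step):
        for left in range(0, w, step):
            rows = range(top, min(top + win, h))
            cols = range(left, min(left + win, w))
            t = otsu_threshold([gray[y][x] for y in rows for x in cols])
            for y in rows:
                for x in cols:
                    thr_sum[y][x] += t
                    thr_cnt[y][x] += 1
    return [[int(gray[y][x] * thr_cnt[y][x] <= thr_sum[y][x])
             for x in range(w)] for y in range(h)]


def remove_small(mask, min_size=12):
    """Erase 8-connected ink components with fewer than min_size pixels.
    ASSUMPTION: object size is measured as a pixel count."""
    h, w = len(mask), len(mask[0])
    out = [row[:] for row in mask]
    seen = [[False] * w for _ in range(h)]
    for sy in range(h):
        for sx in range(w):
            if not out[sy][sx] or seen[sy][sx]:
                continue
            component, queue = [], deque([(sy, sx)])
            seen[sy][sx] = True
            while queue:
                y, x = queue.popleft()
                component.append((y, x))
                for ny in range(max(y - 1, 0), min(y + 2, h)):
                    for nx in range(max(x - 1, 0), min(x + 2, w)):
                        if out[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            if len(component) < min_size:
                for y, x in component:
                    out[y][x] = 0
    return out


def _morph(mask, rule):
    """Apply rule (all = erosion, any = dilation) to each 3x3 neighbourhood."""
    h, w = len(mask), len(mask[0])
    return [[int(rule(mask[ny][nx]
                      for ny in range(max(y - 1, 0), min(y + 2, h))
                      for nx in range(max(x - 1, 0), min(x + 2, w))))
             for x in range(w)] for y in range(h)]


def opening(mask):
    """3x3 morphological opening: erosion followed by dilation."""
    return _morph(_morph(mask, all), any)
```

A call such as `opening(remove_small(binarize(gray)))` chains the stages in the order in which the section describes them. On real photographs an image library would be far faster; the sketch only makes the logic of each stage explicit.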
IV. CHARACTER RECOGNITION WITH TESSERACT

Tesseract is an optical character recognition engine developed by HP Labs between 1985 and 1995 and later improved and extended by Google. Tesseract was originally designed to recognize only English letters, but after considerable improvement and development the tool can now recognize many different languages [6]. Therefore, if we define a new kind of language, we can train Tesseract to recognize it.
A. Preparing handwriting samples
Instead of using datasets from the Internet, we created our own set of training images: we wrote 10 pages of code containing 2184 characters. Although the characters are not evenly distributed, we can guarantee that every character has at least 20 training samples, which satisfies the requirements for training Tesseract.

One reason we did not use training pages that each contain a single character is that Tesseract analyzes layout and words internally. That is, it uses knowledge of the language during recognition (for example word formation rules, dictionary words, ...). Training on realistic words is therefore more effective.

After the samples were prepared, the phone camera was used to scan all of the code pages, which were then preprocessed and used for training, as in Figure 3.

Figure 3. A training sample.

B. Manual labeling
For each training image, Tesseract requires a corresponding file that contains all of the characters in the image together with their correct positions (row and column). This file is not generated automatically by Tesseract and has to be made by hand. We used an online tool to do this character labeling. A screenshot of the labeling process is shown in Figure 4.

Figure 4. Screenshot of the online editing tool.

After this step, we can follow the steps in the Tesseract documentation to build the trained model.

C. A few tricks
A few tricks can be applied to improve the performance of the system.
1) Provide freq-dawg during training. This is an optional part of Tesseract training, but it is quite effective. The freq-dawg contains the words that occur with high frequency in the language being trained. Because the vocabulary of C is fairly small, adding these words is not difficult, and it gives better recognition results.
2) Avoid ambiguous characters. Ambiguity in a character is one of the factors that lead to misclassification during recognition. In this project we therefore changed the way some characters are written to avoid such confusion. For example, in Figure 2 the double quote is joined at the top like the letter pi so that it is not confused with two single quotes, and the dot of the letter i and the dot of the semicolon ; are joined to the rest of the character so that they are not confused with a period. Experiments confirm that this adjustment is effective.
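To make trick 1 more concrete, the sketch below writes a plain-text list of frequent words, one word per line, built from the 32 keywords of C89. The paper does not list the words that were actually supplied, so the keyword choice, the extra identifiers in the demo, and the file name are assumptions for illustration only.

```python
import os
import tempfile

# The 32 keywords of C89/C90. ASSUMPTION: the actual list used for
# Code Runner is not given in the paper.
C_KEYWORDS = [
    "auto", "break", "case", "char", "const", "continue", "default", "do",
    "double", "else", "enum", "extern", "float", "for", "goto", "if",
    "int", "long", "register", "return", "short", "signed", "sizeof",
    "static", "struct", "switch", "typedef", "union", "unsigned", "void",
    "volatile", "while",
]


def write_frequent_words(path, extra_words=()):
    """Write the keywords plus extra_words, sorted and de-duplicated,
    one word per line. Returns the list that was written."""
    words = sorted(set(C_KEYWORDS) | set(extra_words))
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(words) + "\n")
    return words


if __name__ == "__main__":
    # Hypothetical extras: identifiers that appear in most small C programs.
    target = os.path.join(tempfile.mkdtemp(), "frequent_words_list")
    written = write_frequent_words(target, ["main", "printf", "include"])
    print(len(written), "words written to", target)
```

In the Tesseract 3 training workflow such a list is compiled into the freq-dawg with the `wordlist2dawg` tool, for example `wordlist2dawg frequent_words_list c.freq-dawg c.unicharset`, where the language name `c` is hypothetical.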
V. POST-PROCESSING

The characters recognized at the output of Tesseract may still contain many errors. Before the text is shown to the user for checking and correction, several algorithms that automatically adjust characters to match known words are applied in order to minimize manual correction. They rely on the set of high-frequency words created as described in Section IV-C. The recognized characters are split into words and compared with the words in that set; if the difference between a recognized word and a word in the set is below a certain threshold, the recognized word is replaced by the word from the set.

This method also has some limitations. For example, because int is in the word set, a variable that the user names ant will be replaced by int. More sophisticated algorithms could be tried in the future to overcome this limitation.
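The replacement rule of Section V can be sketched as follows. The paper does not say how the difference between two words is measured or what threshold is used, so the Levenshtein edit distance, the default threshold of 1, and the names `edit_distance` and `correct_token` are assumptions chosen for the example.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum number of single-character
    insertions, deletions, and substitutions turning a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, 1):
        current = [i]
        for j, char_b in enumerate(b, 1):
            current.append(min(
                previous[j] + 1,                      # delete char_a
                current[j - 1] + 1,                   # insert char_b
                previous[j - 1] + (char_a != char_b)  # substitute or match
            ))
        previous = current
    return previous[-1]


def correct_token(token, vocabulary, max_distance=1):
    """Return the closest vocabulary word if it is within max_distance
    edits of token; otherwise return token unchanged."""
    if token in vocabulary:
        return token
    # Ties are broken alphabetically so the result is deterministic.
    best = min(vocabulary, key=lambda word: (edit_distance(token, word), word))
    return best if edit_distance(token, best) <= max_distance else token


if __name__ == "__main__":
    vocabulary = {"int", "char", "while", "return", "for", "if"}
    for token in ["whi1e", "retvrn", "counter", "ant"]:
        print(token, "->", correct_token(token, vocabulary))
```

With this vocabulary, a misread keyword such as whi1e is mapped back to while and an unrelated identifier such as counter is left alone, but ant is rewritten to int, which is exactly the limitation described above.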

VI. RESULTS AND EVALUATION

A. Screenshot
Figure 5 shows a screenshot of the application on the phone during the final step of the processing pipeline, the manual correction by the user.

Figure 5. Screenshot of the Code Runner application.

B. Accuracy on the test images
In this step we use 3 test images to evaluate the accuracy of Code Runner. The 3 images differ completely in quality: image 1 is a normal image taken straight on, image 2 was taken from a lowered viewing angle, and image 3 is rotated.

Figure 6. Test images 1, 2, 3.

All 3 images contain 216 characters. The recognition accuracy for each image is shown in Table 1:

No.        Accuracy
Image 1    83.79%
Image 2    77.31%
Image 3    0%

Table 1: Accuracy on the test images

We can see that Code Runner achieves high accuracy on a normal image that has not been rotated or taken from a different viewing angle. Changing the viewing angle does not reduce the accuracy of the system very much, but for the rotated image the system cannot recognize anything at all.

C. Accuracy evaluation
We also tested the system on several test sets of images with isolated characters (that is, characters written independently of one another that do not form words, with each set containing only one character). For each character 10 samples were used, for a total of 560 characters tested. The resulting accuracy is 70.58%. Figure 7 shows the confusion matrix of the classification.

Figure 7. Classification matrix.

We can see that the accuracy drops sharply compared with the experiments in Part B. The reasons for this change are: 1) for images with isolated characters, the set of high-frequency words cannot help to increase accuracy; 2) Tesseract's layout analysis does not work very well on images of isolated characters.

VII. CONCLUSION AND FUTURE WORK

Overall, Code Runner can recognize and execute most of the code given to it, but it still has limitations that need to be addressed.

The first limitation is that it cannot handle skewed images. We used the Hough transform to correct the skew, but it did not give results as good as expected. We think one way to fix this is to detect and draw the horizontal rules that correspond to the text lines before applying the Hough transform. Another problem is that Code Runner currently cannot take input while the code is running, which greatly limits the practical usefulness of the system.

In addition, the processing could be improved to work better. For example, rules of C programming could be used to improve recognition, such as the rule that a variable must be declared before it is used, ... The recognition in the current system is limited to the author's own handwriting because of the lack of training samples. For the system to recognize the handwriting of an arbitrary person in the future, more training sets will be needed.

Finally, there is the recognition and execution of code in other programming languages. One obvious challenge is that languages with rules about indentation and spacing, such as Python, are hard to recognize because Tesseract cannot recognize indentation accurately, so another method is needed to overcome this limitation.

REFERENCES
[1] Gonzalez, Brian M. "Iris: A Solution for Executing Handwritten Code." University of Agder, 1 June 2012. Web. 06 June 2014.
[2] Rakshit, Sandip. "Recognition of Handwritten Roman Numerals Using Tesseract Open Source OCR Engine." arXiv:1003.5898. Jadavpur University, 30 May 2010. Web. 06 June 2014.
[3] Rakshit, Sandip. "Recognition of Handwritten Roman Script Using Tesseract Open Source OCR Engine." arXiv:1003.5891. Jadavpur University, 30 May 2010. Web. 06 June 2014.
[4] Eric Thong. "Codeable: Generating Compilable Source Code from Handwritten Source Code." EE 368 course project.
[5] https://code.google.com/p/tesseract-ocr/
[6] https://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3
[7] http://pp19dd.com/tesseract-ocr-chopper/
