4/24/2015

Linux Commando: OCR Scanning

Linux Commando
This blog is about the Linux Command Line Interface (CLI), with an occasional foray into GUI territory. Instead of just giving you information like some man page, I hope to illustrate each command in real life scenarios.


Thursday, January 9, 2014

OCR Scanning
This post describes how to scan pages from a printed book and convert the image to text using Optical Character Recognition (OCR) technology.
The tools that I use are:
1. Simple Scan
2. tesseract
Preparation
http://linuxcommando.blogspot.com/2014/01/ocr-scanning.html


Simple Scan is a GUI scan application that comes pre-installed in many Linux distributions (including Debian Wheezy).
To manually install it on Debian:
$ sudo apt-get install simple-scan
tesseract is a command-line OCR program.
To install:
$ sudo apt-get install tesseract-ocr
If English is the language used, that is all you need to install. If you require another language, you must install additional tesseract language packs. Examples are tesseract-ocr-rus for Russian, tesseract-ocr-deu for German, and tesseract-ocr-fra for French.
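Before running OCR in another language, it can help to confirm that tesseract actually sees the matching language pack. The following is a minimal sketch, assuming your tesseract build supports the --list-langs option. The missing_langs function and the TESSERACT override variable are names made up for this example; they are not part of tesseract.

```shell
# missing_langs CODE...: print each language code that tesseract does not
# list as installed. TESSERACT is a hypothetical override for the binary
# name (handy for testing); it defaults to plain "tesseract".
missing_langs() {
    # Some versions print the language list on stderr, so capture both streams.
    installed=$("${TESSERACT:-tesseract}" --list-langs 2>&1)
    for code in "$@"; do
        # -x matches whole lines only, so "deu" does not match "deu_frak".
        printf '%s\n' "$installed" | grep -qx "$code" || echo "$code"
    done
}
```

Running missing_langs rus deu fra prints only the codes that are not installed. If it prints rus, for example, the tesseract-ocr-rus package is probably still missing.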
OCR Procedure

1. Scan the pages using Simple Scan.
2. Save the image.
3. Run the tesseract command:
$ tesseract OnWritingWell.jpg out
Tesseract Open Source OCR Engine v3.02 with Leptonica
The first parameter is the input image filename. The second parameter is the desired basename of the output text file. The default txt extension is added to the basename, e.g., out.txt.

If the language is not English, you need to specify the language on the command line using a 3-character language code (refer to the tesseract man page). The following command specifies the use of 3 languages: Russian, German and French.
$ tesseract OnWritingWell.jpg myout -l rus+deu+fra
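A book usually means more than one scanned page, so the same command can be wrapped in a loop. This is only a sketch under a few assumptions: every image filename has an extension, and the ocr_pages function plus the TESSERACT and OCR_LANG variables are hypothetical names introduced here, not tesseract settings.

```shell
# ocr_pages IMAGE...: run tesseract on each image, writing the text next to
# it under the same basename (page1.jpg -> page1.txt).
# TESSERACT and OCR_LANG are hypothetical variables for this sketch:
#   TESSERACT  binary to run (default: tesseract)
#   OCR_LANG   value passed to -l (default: eng; e.g. rus+deu+fra)
ocr_pages() {
    for img in "$@"; do
        base="${img%.*}"    # strip the last extension
        # Stop at the first page that fails so errors are not buried.
        "${TESSERACT:-tesseract}" "$img" "$base" -l "${OCR_LANG:-eng}" || return 1
        echo "$img -> $base.txt"
    done
}
```

With the function defined, ocr_pages page*.jpg would process every matching scan in the current directory. Setting OCR_LANG=rus+deu+fra beforehand selects the same three languages as the command above.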


Accuracy

In the above example, there were a total of 734 words. Within the output text file, 119 words (16% of total) require some form of manual correction. This roughly translates to 84% OCR accuracy.
The sample size is too small to be scientific, or statistically valid. What is the performance that you are getting from OCR?
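The percentages follow directly from the two word counts. As a quick check, the arithmetic can be reproduced in the shell; the variable names are just for illustration.

```shell
# Word counts reported in the post.
total=734     # words on the scanned page
wrong=119     # words needing manual correction

# The shell only does integer math, so let awk handle the division.
error_rate=$(awk -v t="$total" -v w="$wrong" 'BEGIN { printf "%.1f", w / t * 100 }')
accuracy=$(awk -v t="$total" -v w="$wrong" 'BEGIN { printf "%.1f", (t - w) / t * 100 }')

echo "error rate: ${error_rate}%  accuracy: ${accuracy}%"
```

This gives an error rate of 16.2% and an accuracy of 83.8%, which round to the 16% and 84% quoted above. Substituting your own counts lets you compare results on a different scan.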
Posted by Peter Leung at 5:07 PM


3 comments:

Jesus Emilio Villa Giraldo said...
Thanks a lot. very easy.
February 4, 2014 at 10:35 AM

professor de sociologia said...
Thanks, man! It really helped me!
September 15, 2014 at 6:02 PM

Anonymous said...
Many thanks for clear command line example
sriharikonakanchi
November 23, 2014 at 8:01 AM
