Contents
What are CAPTCHAs?
Problem with current audio CAPTCHAs
Testing of current CAPTCHAs
Categories of audio CAPTCHA
Algorithm used and its details
Need for audio reCAPTCHA
Applications
Pitfalls
Conclusion
THE ALGORITHM
Given the .wav file of an audio CAPTCHA:
Segmentation: select the portions of the audio that are most likely digits or letters.
Recognition:
  Extract features from the segment.
  Classify the segment as a digit/letter or as noise, and output the label.
Stop once a maximum number of segments has been classified.
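The recognition loop above can be sketched as follows. This is only a minimal illustration, not the implementation behind these slides: `solve_captcha`, `extract_features`, `classify`, and the `"noise"` label are hypothetical names, and dropping noise segments from the final answer is an assumption.

```python
from typing import Callable, Iterable, Sequence

NOISE = "noise"  # hypothetical label for segments that are not digits/letters


def solve_captcha(
    segments: Iterable[Sequence[float]],
    extract_features: Callable[[Sequence[float]], object],
    classify: Callable[[object], str],
    max_segments: int,
) -> str:
    """Label candidate segments in order and join the non-noise labels."""
    labels = []
    classified = 0
    for segment in segments:
        if classified >= max_segments:
            break  # stop once the maximum number of segments is classified
        label = classify(extract_features(segment))
        classified += 1
        if label != NOISE:  # assumption: noise is left out of the answer
            labels.append(label)
    return "".join(labels)
```

Any feature extractor and classifier can be plugged in, provided the classifier returns a string label.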
THE ALGORITHM
Input: audio CAPTCHA as an audio file.
Segmentation: find the highest energy peak and extract a fixed-size segment centered at that peak.
Recognition:
  Extract features from the segment.
  Give the segment to the classifier and obtain a label.
Stop extracting segments once all segments have been labeled or a maximum solution size is reached.
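The segmentation step can be illustrated with a short NumPy sketch. The function name `extract_peak_segments`, the non-overlapping framing with `frame_len=256`, and the zeroing of each extracted region before searching for the next peak are all assumptions made for this example. The slides state only that a fixed-size segment is centered on the highest energy peak.

```python
import numpy as np


def extract_peak_segments(signal, segment_len, max_segments, frame_len=256):
    """Repeatedly cut a fixed-size segment centered on the highest-energy frame."""
    x = np.asarray(signal, dtype=np.float64).copy()
    n_frames = len(x) // frame_len
    segments = []
    for _ in range(max_segments):
        # Short-time energy of each non-overlapping frame.
        frames = x[: n_frames * frame_len].reshape(n_frames, frame_len)
        energy = (frames ** 2).sum(axis=1)
        if energy.size == 0 or energy.max() == 0.0:
            break  # no energy left to segment
        peak = int(energy.argmax())
        center = peak * frame_len + frame_len // 2
        start = max(0, center - segment_len // 2)
        end = min(len(x), start + segment_len)
        segments.append((start, x[start:end].copy()))
        x[start:end] = 0.0  # assumption: mask the region so the next peak differs
    # Return (start_index, samples) pairs in temporal order.
    return sorted(segments, key=lambda s: s[0])
```

Sorting by start index restores the spoken order, since segments are found in order of decreasing energy rather than in order of time.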
Google, reCAPTCHA, Digg
THE GOAL
Make a secure audio CAPTCHA which will be easier for a human to pass and harder for a computer to pass. Equate solving a CAPTCHA with doing some useful work. In other words, create an audio reCAPTCHA.
WHAT IS reCAPTCHA?
reCAPTCHA helps digitize text on which OCR fails by using the text as its CAPTCHA. Since millions of people solve CAPTCHAs each day, millions of words get digitized each day!
Applications
Preventing comment spam in blogs
Protecting website registration
Protecting email addresses from scrapers
Online polls
Preventing dictionary attacks
Worms and spam
ANALYSIS OF SECURITY
Speaker-independent recognition is difficult.
Open vocabularies make it even more difficult for ASR systems.
AM broadcasts and .mp3 compression cause the loss of important data needed for automatic analysis.
CONCLUSION
CAPTCHAs need to be more accessible, yet remain secure and not too difficult for humans.
Deploy audio reCAPTCHA through the reCAPTCHA site.
Help make knowledge captured in audio available in text form.
Thank you