A 3-layer Dynamic CAPTCHA Implementation

JingSong Cui, WuZhou Zhang, Yang Peng, Yu Liang, Bang Xiao, JingTing Mei, Da Zhang, Xia Wang
School of Computer Science
Wuhan University
Wuhan, Hubei, China
Email: cuijs@qq.com
Abstract-To defend against large-scale attacks by malicious computer programs, the CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart) mechanism has been introduced to distinguish humans from computers. Owing to the rapid development of pattern recognition and artificial intelligence, traditional 2D static CAPTCHAs exhibit a growing number of security loopholes, so that malicious programs can launch serious attacks by breaking them. In this article, we propose a practical and secure 3-layer dynamic CAPTCHA that combines biological vision theory with the single-frame zero-knowledge theory in an original way, so that each single frame is extremely hard for programs to recognize while the CAPTCHA remains easy for humans to identify. It also exploits the weakness of computers in recognizing numerous moving objects against a complicated background, so that it remains very difficult for computer programs to break even when several frames are used. A security analysis is carried out to verify this. Moreover, the 3-layer structure makes the design of the CAPTCHA clearer, giving it high extensibility and plenty of room for continued optimization.
Keywords-CAPTCHA; 3-layer; dynamic; single-frame zero-
knowledge theory; biological vision theory; moving objects
recognition
I. INTRODUCTION
In September 1999, www.slashdot.com conducted a survey to find out which university had the best computer science graduates [1]. Although the voting system prevented the same IP address from voting more than once, students from CMU wrote a program that made the number of votes for CMU rise rapidly. The next day, students from MIT adopted a similar approach, with the result that the number of votes for each of these two universities far exceeded those of the other universities. Moreover, a report from the Barracuda Network Security Corporation in the USA said that in 2007 nearly 95% of the mail received by the world's Internet users was junk mail. Similar abuses include registering user accounts maliciously and cracking account passwords by brute force. All of these pose a great threat to the network.
To prevent such incidents from happening again, the CAPTCHA mechanism came into being; the name is short for Completely Automated Public Turing Test to Tell Computers and Humans Apart [2, 3]. In 2000, Carnegie Mellon University set up the first CAPTCHA group, and many scholars have since studied CAPTCHAs to find better ways to tell humans and computers apart.
Currently, to prevent malicious programs from recklessly posting advertisements or other useless information, the message boards of BBSs, blogs and wikis widely use the CAPTCHA mechanism, requiring users to input the correct letters before leaving a message.
CAPTCHA also plays a significant role in rate limiting. For example, automated use of a particular service may be allowed until such use exceeds a certain level and affects other users. When that happens, we can limit the usage by introducing the CAPTCHA mechanism.
CAPTCHA is also used in a variety of online transaction systems, such as online banks and reservation systems, to prevent malicious programs from attempting a large number of transactions. Similarly, email services such as Gmail and Hotmail have introduced the CAPTCHA mechanism to limit the frequency of registrations or logins and so avoid the trouble caused by massive amounts of junk mail.
This paper is organized as follows. Section 2 presents an
overview of related work on CAPTCHAs. The origin of 3-
layer dynamic CAPTCHA design is discussed in Section 3.
Then the specific design and implementation of 3-layer
dynamic CAPTCHA is given in Section 4, followed by the
security analysis of this new design in Section 5. Finally, a
conclusion is given in Section 6.
II. RELATED WORK
Currently, there are three main kinds of methods for implementing the CAPTCHA mechanism: the OCR (optical character recognition) visual method, the non-OCR visual method and the non-visual method.
The 2D static CAPTCHA based on the OCR visual method has advantages in overcoming language barriers, security and ease of use, and has become the most widely used kind of CAPTCHA [4], as shown in Fig. 1 (a). Common examples are the Gimpy series designed by Carnegie Mellon University in 2000, the Pessimal Print CAPTCHA designed by Henry Baird of PARC (Palo Alto Research Center) in 2000, and the Baffle Text CAPTCHA designed by Baird in cooperation with Monica Chew of the University of California, Berkeley in 2003 [5]. However, with the rapid development of neural-network-based OCR technology and the emergence of various character segmentation techniques, the CAPTCHAs of many websites have been attacked. A Russian programmer once cracked the Yahoo CAPTCHA mechanism with a 35% success rate. The CAPTCHA mechanism of Microsoft Live Mail has also been plagued by junk mail many times. Given facts like these, newly designed CAPTCHAs have become increasingly complex, to the point that some of them are extremely difficult to identify, as shown in Fig. 1 (b).

Acknowledgement: This work is supported by NSFC 60603012, NSFC 60703009 as well as National 863 plan projects 2008AA01Z404, NSFC 90718006.
2010 Second International Workshop on Education Technology and Computer Science
978-0-7695-3987-4/10 $26.00 © 2010 IEEE
DOI 10.1109/ETCS.2010.575
Authorized licensed use limited to: Sudhakar Palanichamy. Downloaded on May 19,2010 at 06:12:40 UTC from IEEE Xplore. Restrictions apply.

Although there are many different implementations of the non-OCR visual method, it ultimately comes down to a recognition problem much like OCR, requiring users to identify images. It is not widely used: so far, apart from some research sites, commercial sites rarely adopt it. Specific algorithms include the CAPTCHA based on real-object image identification designed by R. Datta et al. [6] and the CAPTCHA based on image similarity judgment designed by J. Elson et al. [7]. The non-visual method is designed for special occasions and certain user groups, so it has very limited applications. Examples include the voice-based CAPTCHA intended for visually impaired people, designed by G. Kochanski et al. [8], and the CAPTCHA based on collaborative filtering designed by M. Chew et al. [9].
In conclusion, the OCR-based 2D static visual method is currently the main way to implement the CAPTCHA mechanism. However, it can no longer strike a balance between security and ease of use, which calls for a new kind of CAPTCHA to address this increasingly prominent problem.

Figure 1. 2D static CAPTCHAs based on the OCR visual method: (a) a widely used example; (b) a complex example that is difficult to identify
III. ORIGIN OF THE DESIGN
Given the deficiencies of the traditional 2D static CAPTCHA, we came up with the idea of a 3-layer dynamic CAPTCHA. We first describe the origin of the whole design in detail.
A. Origin of the Idea
For humans to identify them easily, the images of a traditional static CAPTCHA must contain sufficient valid information. However, the easier a CAPTCHA is for humans to identify, the less secure it generally is. This suggests distributing the valid information among multiple frames according to certain rules, so that every single frame is difficult to identify. If we can also ensure that the CAPTCHA remains very difficult for computer programs to crack even when they use multiple frames, yet easy for humans to identify, then the new design can achieve a better balance between security and practicality.
B. Security of Single Frame
In order to ensure that every single frame is difficult to identify, we put forward a design theory for dynamic CAPTCHAs: the single-frame zero-knowledge theory. Suppose $x_1 \in T$ is an object in the animated video, where $T$ denotes a set of objects. Then for any single frame $p_i$, there are reasons to believe that the object in $p_i$ could be any $x_{i,j}$, where $x_{i,j} \in R_i \subseteq T$ and $x_1 \in R_i$. By using an appropriate algorithm to guarantee that $\min\{|R_i|\} > \varepsilon$, with $\varepsilon$ large enough, we can theoretically prevent existing 2D image text identification algorithms from attacking any single frame of the animation. In practice, we can simply add enough interference to each frame to implement the single-frame zero-knowledge theory.
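
The condition above can be illustrated with a small Python sketch. The candidate sets are invented for illustration, and the helper name `satisfies_zero_knowledge` and the threshold values are assumptions rather than part of the original design; the paper does not say how the sets $R_i$ would be estimated in practice.

```python
from typing import Dict, Set


def satisfies_zero_knowledge(candidates: Dict[int, Set[str]],
                             true_object: str,
                             epsilon: int) -> bool:
    """Check that min|R_i| > epsilon and that the true object lies in every R_i.

    candidates maps a frame index i to R_i, the set of objects that frame
    p_i could plausibly show (hypothetical data, not measured results).
    """
    if not candidates:
        return False
    truth_in_every_frame = all(true_object in r for r in candidates.values())
    smallest_set = min(len(r) for r in candidates.values())
    return truth_in_every_frame and smallest_set > epsilon


# Hypothetical candidate sets for three frames of a CAPTCHA showing "L".
frames = {
    1: {"L", "I", "1", "E", "F"},
    2: {"L", "T", "I", "U"},
    3: {"L", "E", "C", "1", "J", "7"},
}
print(satisfies_zero_knowledge(frames, "L", epsilon=3))  # smallest set has 4 members
print(satisfies_zero_knowledge(frames, "L", epsilon=4))  # 4 > 4 does not hold
```

In this reading, adding interference to a frame corresponds to enlarging its candidate set, so that no single frame narrows the answer down to fewer than $\varepsilon$ possibilities.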
C. Security of Multiple Frames
To ensure that the CAPTCHA is also very difficult for computer programs to break using multiple frames, we make the characters move, thereby reducing a multi-frame attack to the problem of identifying and tracking moving objects in images. In the current fields of artificial intelligence and image processing, identifying moving objects is far more difficult than the OCR problem, particularly identifying and tracking multiple moving objects against a complex background [10], which is widely recognized as a difficult academic problem. This is the origin of "dynamic" in this design.
D. Easy Identification for Humans
A dynamic CAPTCHA can be not only extremely hard for computer programs to crack using multiple frames, but also easy for humans to identify. According to the anatomical, physiological and functional characteristics of the visual system, there are two visual pathways in the brain: the ventral pathway, whose function is to identify objects, and the dorsal pathway, whose function is to identify the spatial location and movement of objects, as shown in Fig. 2. Both the identifiability and the contrast ratio of images affect the perception of moving objects. In the right hemisphere, 3D movement evokes stronger brain activity than 2D movement [11]. Biological vision theory holds that the ability to perceive moving objects far exceeds the ability to perceive static ones. For example, we can easily recognize a running cheetah in a jungle but can hardly notice a stationary one. The reason is that the human visual system can easily reconstruct the overall shape of a moving object merely from vague displacements of its parts.

Figure 2. Two visual pathways in the brain


IV. SPECIFIC DESIGN AND IMPLEMENTATION
To make the design of the dynamic CAPTCHA clearer, and to give it strong scalability and ample room for optimization, we further introduce the concept of layers. The three layers of this design are the character layer, the background interference layer and the foreground interference layer, as shown in Fig. 3.
Figure 3. Diagram of 3-layer model
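
The layering can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the stacking order (background at the bottom, foreground on top) is inferred from the layer names, and the grid size, the number of noise points, the block standing in for a glyph and the function names `blank` and `composite` are all hypothetical.

```python
import random

WIDTH, HEIGHT = 24, 8
BLACK, WHITE = 0, 1


def blank(value=None):
    """Return a HEIGHT x WIDTH grid; None marks a transparent pixel."""
    return [[value] * WIDTH for _ in range(HEIGHT)]


def composite(background, characters, foreground):
    """Stack the layers: foreground over characters over background."""
    frame = blank(BLACK)
    for y in range(HEIGHT):
        for x in range(WIDTH):
            for layer in (background, characters, foreground):
                if layer[y][x] is not None:
                    frame[y][x] = layer[y][x]  # later (upper) layers win
    return frame


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Background interference layer: scattered white noise points.
background = blank()
for _ in range(30):
    background[rng.randrange(HEIGHT)][rng.randrange(WIDTH)] = WHITE

# Character layer: a white block standing in for a rendered glyph.
characters = blank()
for y in range(2, 6):
    for x in range(8, 12):
        characters[y][x] = WHITE

# Foreground interference layer: a black line that cuts through the glyph.
foreground = blank()
for x in range(WIDTH):
    foreground[3][x] = BLACK

frame = composite(background, characters, foreground)
print("\n".join("".join("#" if p else "." for p in row) for row in frame))
```

Keeping each layer as a separate grid means an interference source can be swapped or added without touching the character layer, which is one way to read the scalability argument.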
The implementation of the 3-layer dynamic CAPTCHA is straightforward, as described in detail below:
Note: Steps (1)-(4) involve the character layer, while steps (5) and (6) involve the background interference layer and the foreground interference layer, respectively.
(1) Determination of the number of characters. CAPTCHA
often consists of 4-7 characters, and we choose the minimum
length 4.
(2) Random selection of characters. Our program randomly
chooses 4 characters from a total of 62 characters consisting of
26 lowercase letters, 26 uppercase letters and 10 Arabic
numerals.
(3) Determination of character attributes. Optional character attributes include size, font, color, tilt, twist and spin. Within the same CAPTCHA, using a variety of fonts or sizes readily increases the difficulty of attack, and varying color, tilt, twist and spin can also greatly improve security. In this case we set the font to Courier New and the size to 40 pixels. To reduce the final GIF size for fast network transmission, we use only two colors, black and white, and render the identifying characters in white.
(4) Determination of character positions. Smooth mathematical curves such as the sine, cosine, tangent, cotangent and logarithmic curves not only produce good visual effects but, because they are non-linear, also pose great challenges for computer programs trying to break the CAPTCHA. We use the 3D coordinate $(x_i, y_i, z_i)$ to indicate the position of the $i$th character. In this case we adopt the cosine curve, and the formula is as shown in (1):

$$
\begin{cases}
x_i = a_{xi}\cos(w_{xi} t + b_{xi}) + c_{xi} \\
y_i = a_{yi}\cos(w_{yi} t + b_{yi}) + c_{yi} \\
z_i = z_0
\end{cases}
\qquad (i = 1, 2, 3, 4) \tag{1}
$$
To further enhance security, we make the characters overlap partially. After extensive repeated testing, we settled on the parameters shown in Table I.
TABLE I. VALUES OF ALL PARAMETERS
i   a_xi    w_xi   b_xi   c_xi    a_yi    w_yi   b_yi   c_yi   z_0
1   -0.14   0.18   0.3    -0.85   -0.08   0.15   0.3    -0.2   -1.0
2   -0.07   0.17   0.5    -0.45   -0.08   0.16   0.5    -0.2   -1.0
3   -0.07   0.16   0.7    0.01    -0.08   0.17   0.7    -0.2   -1.0
4   -0.07   0.15   0.9    0.45    -0.08   0.18   0.9    -0.2   -1.0
The reason that $z_0$ equals -1.0 is to move the characters one unit into the screen to produce a better visual effect, which is also one of the reasons for introducing the "3D" concept.
We can also set different parameters, so that trajectories of
characters in each CAPTCHA image are different, further
increasing the difficulty of attack.
(5) Implementation of the background interference layer. The background interference in this design can include not only traditional interference sources used in 2D static images, such as background color transformation and messy pixels or characters, but also new interference sources available in 3D dynamic videos, such as light, smoke and texture rendering. In this case we combine interference points with an interference character: we randomly select some regions and generate a large number of interference points as well as one interference character.
(6) Implementation of the foreground interference layer. Unlike the background interference layer, the foreground interference layer makes the identifying characters in the character layer incomplete, further increasing the difficulty of attack whether a single frame or multiple frames are used. Foreground interference can involve character interference, line interference and point interference. In this case we combine all three.
(7) Saving the CAPTCHA. In this case we use the GIF format because of its smaller size.
As animated GIF images cannot be displayed here, please refer to [12] to see the actual dynamic effect.
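
Steps (1), (2) and (4) can be sketched in Python using the parameters of Table I. This is a sketch under assumptions, not the authors' program: the paper does not state the unit of $t$ (the frame index is used here), does not name a random source (`random.SystemRandom` is an illustrative choice), and the function names are hypothetical. Characters are drawn with repetition, which is consistent with the example "L9ML".

```python
import math
import random
import string

# 26 lowercase + 26 uppercase + 10 digits = 62 symbols, as in step (2).
ALPHABET = string.ascii_lowercase + string.ascii_uppercase + string.digits
CAPTCHA_LENGTH = 4  # step (1)

# Rows of Table I: (a_x, w_x, b_x, c_x, a_y, w_y, b_y, c_y) for i = 1..4.
TABLE_I = [
    (-0.14, 0.18, 0.3, -0.85, -0.08, 0.15, 0.3, -0.2),
    (-0.07, 0.17, 0.5, -0.45, -0.08, 0.16, 0.5, -0.2),
    (-0.07, 0.16, 0.7, 0.01, -0.08, 0.17, 0.7, -0.2),
    (-0.07, 0.15, 0.9, 0.45, -0.08, 0.18, 0.9, -0.2),
]
Z0 = -1.0


def choose_characters(rng):
    """Pick CAPTCHA_LENGTH characters uniformly at random (repeats allowed)."""
    return "".join(rng.choice(ALPHABET) for _ in range(CAPTCHA_LENGTH))


def position(i, t):
    """Position (x_i, y_i, z_i) of the i-th character (1-based) at time t, per (1)."""
    ax, wx, bx, cx, ay, wy, by, cy = TABLE_I[i - 1]
    x = ax * math.cos(wx * t + bx) + cx
    y = ay * math.cos(wy * t + by) + cy
    return (x, y, Z0)


rng = random.SystemRandom()  # assumption: the paper names no random source
text = choose_characters(rng)
print(text, len(ALPHABET) ** CAPTCHA_LENGTH)  # 62**4 possible strings
for frame in range(3):  # assumption: t is the frame index
    print(frame, [tuple(round(v, 3) for v in position(i, frame)) for i in range(1, 5)])
```

With 4 characters drawn from 62 symbols there are 62^4 = 14,776,336 possible strings, so blind guessing succeeds with probability of about 6.8e-8 per attempt.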
Single-frame results of the CAPTCHA "L9ML" are shown in Table II.
TABLE II. SINGLE-FRAME RESULTS OF CAPTCHA "L9ML"
Image of CAPTCHA    Background interference character    Foreground interference character
(image)             Letter a                             Letter i
(image)             Letter c                             Letter k
(image)             Letter h                             Letter s
V. SECURITY ANALYSIS
Since possible threats mainly come from the superposition and composition of multiple frames, we employ variance, a commonly used numerical measure in probability theory, to analyze them as follows:
Suppose $P$ denotes a dynamic CAPTCHA, $P(k)$ denotes its $k$th frame, and $P_{i,j}(k)$ denotes the gray value of the pixel at position $(i, j)$ in that frame. We define the addition or subtraction of frames as the addition or subtraction of the corresponding pixel gray values. The variance formula is as shown in (2):

$$
\mathrm{Var}(P) = \sum_{1 \le k \le n} \left[ P(k) - \overline{P(k)} \right]^2 \tag{2}
$$
Applying it to multiple frames of the CAPTCHA "MiVj", we obtain the result shown in Fig. 4, which reveals no obvious loopholes.
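
A pixel-wise variance analysis of this kind can be sketched as follows. The sketch assumes that the term subtracted in (2) is the pixel-wise mean of the n frames and that no 1/n normalization is applied; the paper does not describe how Fig. 4 was actually produced, and the tiny 2x2 frames and function names below are invented for illustration.

```python
def mean_frame(frames):
    """Pixel-wise mean of a list of equally sized gray-value frames."""
    n = len(frames)
    height, width = len(frames[0]), len(frames[0][0])
    return [[sum(f[i][j] for f in frames) / n for j in range(width)]
            for i in range(height)]


def variance_image(frames):
    """Pixel-wise sum over k of (P(k) - mean)^2, one reading of (2)."""
    mean = mean_frame(frames)
    height, width = len(mean), len(mean[0])
    return [[sum((f[i][j] - mean[i][j]) ** 2 for f in frames)
             for j in range(width)]
            for i in range(height)]


# Three hypothetical 2x2 frames with gray values in 0..255.
frames = [
    [[0, 255], [255, 0]],
    [[0, 0], [255, 255]],
    [[0, 255], [255, 255]],
]
var = variance_image(frames)
for row in var:
    print([round(v, 1) for v in row])
```

Pixels that never change across frames have zero variance. If the variance image showed the outline of the characters, superposing frames would leak information, so a result without visible structure is what the analysis is looking for.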

Figure 4. Variance analysis of multiple frames of CAPTCHA "MiVj"


VI. CONCLUSION
In this article, we propose a novel, practical and secure 3-layer dynamic CAPTCHA that combines biological vision theory with the single-frame zero-knowledge theory in an original way, so that every single frame is extremely hard for programs to recognize while the CAPTCHA remains easy for humans to identify. It also exploits the weakness of computers in recognizing numerous moving objects against a complicated background, so that it remains very difficult for computer programs to break even when several frames are used. Moreover, the 3-layer structure makes the design of the CAPTCHA clearer, giving it high extensibility and plenty of room for continued optimization. The security analysis indicates that the new design can effectively resist attacks from existing algorithms as well as possible attacks using multiple frames. Furthermore, the transition from 2D to 3D improves the visual effect and provides a new idea for CAPTCHA design. In short, this article can serve as a guide for the design of next-generation CAPTCHAs, and how to design a more practical and more secure 3-layer dynamic CAPTCHA will be the focus of our further research.
REFERENCES
[1] JIN Hai-kun, DU Wen-jie, SHA Li-min. Research on security model with Chinese CAPTCHA [J]. Computer Engineering and Design, 2006, 27(6): 985-987 (in Chinese).
[2] Luis von Ahn, Manuel Blum, Nicholas J. Hopper and John Langford. The CAPTCHA Web Page: http://www.captcha.net, 2000.
[3] Luis von Ahn, Manuel Blum and John Langford. Telling Humans and Computers Apart Automatically: How Lazy Cryptographers do AI. Communications of the ACM, 2004.
[4] L. von Ahn, M. Blum, N. Hopper, and J. Langford. CAPTCHA: Using hard AI problems for security. In Proceedings of Eurocrypt, 2003.
[5] HU Jin-rong, WANG Ling. Technique of randomized question reading CAPTCHA based on character feature [J]. Computer Engineering and Design, 2008, 29(7): 1619-1621 (in Chinese).
[6] R. Datta, J. Li, and J. Z. Wang. IMAGINATION: a robust image-based CAPTCHA generation system. Proc. of 13th ACM Int. Conf. on Multimedia (MULTIMEDIA '05), pp. 331-334, November 2005.
[7] J. Elson, J. R. Douceur, J. Howell, and J. Saul. ASIRRA: a CAPTCHA that exploits interest-aligned manual image categorization. Proc. of 14th ACM Conf. on Computer and Communications Security (CCS 2007), pp. 366-374, October-November 2007.
[8] G. Kochanski, D. Lopresti, and C. Shih. A Reverse Turing Test Using Speech. Proc. of 7th Int. Conf. on Spoken Language Processing, pp. 1357-1360, September 2002.
[9] M. Chew and J. Tygar. Collaborative filtering CAPTCHAs. Proc. of 2nd Int. Workshop on Human Interactive Proofs (HIP 2005), vol. 3517 of Lecture Notes in Computer Science, pp. 66-81, May 2005.
[10] Lin Hongwen, Tu Dan, and Li Guohui. Moving Objects Detection Method Based on Statistical Background Model. Computer Engineering, Vol. 29, No. 16, pp. 97-99, September 2003 (in Chinese).
[11] Luo Yanlin, Luo Yuejia. Research Status Of Brain Mechanism Of Visual Motion Perception [J]. Advances in Psychological Science, 2003, 11(2): 132-135 (in Chinese).
[12] http://img.bimg.126.net/photo/i0qg9hqHVxtd_gp86Szrdg==/2569022112438987652.jpg. September 2009.