ABSTRACT
We address a particular case of video genre classification, namely the classification of animated movies. This task is achieved using two categories of content descriptors, temporal and color based, which are adapted to this particular content. Temporal descriptors, like rhythm or action, quantify the perception of the action content at different levels. Color descriptors are determined using color perception, which is quantified in terms of statistics of color distribution, elementary hues, color properties (e.g. amount of light colors, cold colors, etc.) and color relationship. The potential of the proposed descriptors for the classification task has been demonstrated through experimental tests conducted on more than 159 hours of video footage. Despite the high diversity of the video material, the proposed descriptors achieve average precision and recall ratios up to 90% and 92%, respectively, and a global correct detection ratio up to 92%.

Categories and Subject Descriptors
I.2.10 [Artificial Intelligence]: Vision and Scene Understanding—color, action descriptors; I.5.3 [Pattern Recognition]: Clustering—video genre, animated movies.

General Terms
Algorithms, Performance

Keywords
animated genre classification, action content, color properties, video indexing.

1. INTRODUCTION
Significant efforts are made to develop innovative automatic content-based indexing techniques to cope with challenges caused by accessing large collections of video footage. Of particular interest is the automatic cataloging of video footage into some predefined semantic categories. This can be performed globally, by classifying videos into one of several main genres, e.g. cartoons, music, news, sports. Also, sub-genres can be involved, e.g. identifying specific types of sports (football, hockey, etc.), movies (drama, thriller, etc.), and so on. Another solution aims at classifying movie content locally, thus considering video segments and specific concepts, e.g. outdoor vs. indoor, action, violence, etc. [1].

In this paper we address the global classification of a particular genre, namely animated movies. The animated movie industry has recently witnessed a spectacular development and gain in popularity: an abundance of entertainment cartoon movies, festivals and expos, e.g. France - Annecy International Animated Film Festival, Canada - Ottawa International Animation Festival, Portugal - CINANIMA International Animation Film Festival, etc. Animated movies now target children and adults equally, becoming a distinctive industry similar to artistic movies.

In the context of automatic content-based retrieval, a common task related to this field is the automatic selection of "animated" content from other genres. Regardless of the approach, the main challenge is to derive attributes which are discriminant enough to distinguish between genres while maintaining a reduced dimensionality of the feature space. To this purpose, several approaches have been proposed in the literature.
Using the same reasoning and keywords specific to each color property, we define:

• dark color ratio, denoted $P_{dark}$, where $W_{dark} \in$ {"dark", "obscure", "black"};

• hard color ratio, denoted $P_{hard}$, which reflects the amount of saturated colors. $W_{hard} \in$ {"hard", "faded"} $\cup\ \Gamma_e$, where $\Gamma_e$ is the elementary color set (see equation 6, elementary colors are 100% saturated colors);

• weak color ratio, denoted $P_{weak}$, which is opposite to $P_{hard}$, $W_{weak} \in$ {"weak", "dull"};

• warm color ratio, denoted $P_{warm}$, which reflects the amount of warm colors; in art, some hues are commonly perceived to exhibit some levels of warmth, namely: "Yellow", "Orange", "Red", "Yellow-Orange", "Red-Orange", "Red-Violet", "Magenta", "Pink" and "Spring";

• cold color ratio, denoted $P_{cold}$, where "Green", "Blue", "Violet", "Yellow-Green", "Blue-Green", "Blue-Violet", "Teal", "Cyan" and "Azure" reflect coldness.

Color relationship. The final two parameters are related to the concept of perceptual relation of color in terms of adjacency and complementarity. $P_{adj}$ reflects the amount of similar perceptual colors in the movie (neighborhood pairs of colors on a perceptual color wheel, e.g. Itten's color wheel), thus:

\[ P_{adj} = \frac{\mathrm{Card}\{c_e \mid Adj(c_e, c'_e) = True\}}{2 \cdot N_{c_e}} \qquad (9) \]

where $c_e \neq c'_e$ are the indexes of two significant elementary colors from the movie, $Adj()$ is the adjacency operator returning the true value if the two colors are analogous on Itten's color wheel, and $N_{c_e}$ is the movie's total number of elementary colors. Using the same reasoning, we define $P_{compl}$, which reflects the amount of opposite perceptual color pairs (antipodal).

5. EXPERIMENTAL RESULTS
In order to obtain the most pertinent results, validation tests were conducted on a very large video database, i.e. 749 clips, with a high diversity of genres and sub-genres (more than 159 hours of video footage retrieved mainly from several TV channels).
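As a rough illustration of the adjacency ratio in equation (9), the following Python sketch computes it for a list of significant elementary colors. It rests on several assumptions that are not stated in the text: the set in the numerator is read as the set of ordered pairs of distinct adjacent colors, adjacency means "direct neighbors" on a 12-hue Itten wheel, and $N_{c_e}$ is taken as the number of distinct colors supplied. The names `ITTEN_WHEEL`, `is_adjacent` and `adjacency_ratio` are hypothetical.

```python
# Itten's 12-hue color wheel in circular order (hue names chosen for this sketch).
ITTEN_WHEEL = [
    "Yellow", "Yellow-Orange", "Orange", "Red-Orange", "Red", "Red-Violet",
    "Violet", "Blue-Violet", "Blue", "Blue-Green", "Green", "Yellow-Green",
]


def is_adjacent(color_a, color_b):
    """Hypothetical Adj() operator: True if the hues are direct neighbors on the wheel."""
    size = len(ITTEN_WHEEL)
    gap = (ITTEN_WHEEL.index(color_a) - ITTEN_WHEEL.index(color_b)) % size
    return gap in (1, size - 1)


def adjacency_ratio(elementary_colors):
    """Assumed reading of equation (9): ordered adjacent pairs / (2 * N_ce)."""
    colors = list(dict.fromkeys(elementary_colors))  # drop duplicates, keep order
    if not colors:
        return 0.0
    adjacent_pairs = sum(
        1 for a in colors for b in colors if a != b and is_adjacent(a, b)
    )
    return adjacent_pairs / (2 * len(colors))


print(adjacency_ratio(["Red", "Red-Orange", "Blue"]))
print(adjacency_ratio(ITTEN_WHEEL))
```

Under this reading, the factor 2 in the denominator normalizes the ratio to 1 when every hue of the wheel is present, since each hue then has exactly two neighbors.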
[Figure 3 plots omitted: three panels, "Animated vs. all (KNN)", "Animated vs. all (LDA)" and "Animated vs. all (SVM)", each showing precision (vertical axis) vs. recall (horizontal axis) for the runs on action, all, hGW, hE and properties.]

Figure 3: Precision vs. recall curves for different runs (action descriptors, hGW, hE, color properties and all parameters together) and amounts of training data (% of training is increasing along the curves).
5.1 Descriptor examples
For a preliminary analysis of the discriminant power of the proposed descriptors, Figure 2 depicts average color (see Section 4) and action (see Section 3) feature vectors for each genre. When compared to the other genres, the animated movies show a relatively different signature, e.g. they have a different color pattern (more variations of basic hues being used, see the peaks in hGW), most of the common hues are used in important amounts (see hE), and they tend to have a reduced global visual rhythm (see $\bar{v}_T$); while commercials and music clips have a high visual rhythm and action content (see $\bar{v}_T$ and HA), sports have a predominant hue (see the predominant peak in hE), and so on. The discriminant power of the features is, however, evidenced in the classification task below.

5.2 Classification approach
Animated genre classification is carried out with a binary classification approach, i.e. considering two classes: animated and non animated. Each movie is represented with a feature vector, according to the previously presented content descriptors (several combinations are tested). For the classification we use three approaches: the k-Nearest Neighbors algorithm (KNN, with k=5, cosine distance and majority rule), Support Vector Machines (SVM, with a linear kernel) and Linear Discriminant Analysis (LDA, applied on a PCA-reduced feature space) [17]. The method parameters were set to optimal values for this scenario after several preliminary tests.

As the choice of the training set may distort the accuracy of the results, we adopted an exhaustive testing strategy. Tests were performed for different amounts of training data (see the beginning of Table 3). For each set, tests are repeated using a cross validation approach, thus generating all possible combinations between training and test data, in order to shuffle all sequences.

To assess performance, we adopt several strategies. First, we evaluate average precision (P) and recall (R) ratios, thus:

\[ P = \frac{TP}{TP + FP}, \qquad R = \frac{TP}{TP + FN} \qquad (10) \]

where TP, FP and FN represent the average number of good detections (true positives), false detections (false positives) and non detections (false negatives), respectively, over all experiments for a certain amount of training data (all combinations between test and training sequences).
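The KNN configuration described above (k=5, cosine distance, majority rule) and the ratios of equation (10) can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the implementation used for the experiments: the two-dimensional feature vectors, the class label "other" and the function names are all hypothetical, and the toy data stands in for the real action and color descriptors.

```python
import math
from collections import Counter


def cosine_distance(u, v):
    """Return 1 - cos(u, v); both vectors are assumed to be non-zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def knn_predict(train_x, train_y, query, k=5):
    """Label `query` by majority vote among its k nearest training vectors."""
    ranked = sorted(
        zip(train_x, train_y),
        key=lambda pair: cosine_distance(pair[0], query),
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def precision_recall(predicted, actual, positive="animated"):
    """Compute P = TP/(TP+FP) and R = TP/(TP+FN), as in equation (10)."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p == positive and a == positive)
    fp = sum(1 for p, a in pairs if p == positive and a != positive)
    fn = sum(1 for p, a in pairs if p != positive and a == positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy two-dimensional feature vectors (hypothetical, for illustration only).
train_x = [(1.0, 0.1), (0.9, 0.2), (1.0, 0.3), (0.1, 1.0), (0.2, 0.9), (0.3, 1.0)]
train_y = ["animated"] * 3 + ["other"] * 3
test_x = [(0.8, 0.1), (0.2, 0.8), (0.7, 0.6)]
test_y = ["animated", "other", "other"]

predictions = [knn_predict(train_x, train_y, q) for q in test_x]
precision, recall = precision_recall(predictions, test_y)
print(predictions, precision, recall)
```

With two classes and an odd k, the majority vote cannot tie, which is one practical reason for choosing k=5 in a binary setting.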
[Figure 4 plots omitted: two panels, CD and Fscore (vertical axes), plotted against the amount of training data in % (horizontal axes, 10 to 90) for the KNN, LDA and SVM runs on action, all, hGW, hE and properties.]

Figure 4: Average correct detection CD and Fscore ratios for different amounts of training data.
However, the best method in terms of both precision and recall proves to be KNN run on all action-color descriptors together. The resulting average precision, recall, TP, FP and FN are presented in detail in Table 3 (for visualization purposes, the actual values are rounded to the nearest integer). The results are very promising considering the diversity of the video material (including a high variety of animated genres, see the beginning of Section 5) and also the size of the test dataset.

For only 10% of training, average precision and recall are around 70% when testing on 674 sequences, of which 188 are animated, while for 50% training precision approaches 90% and recall 80%. Also, one may observe the reduced number of false detections while maintaining a good detection ratio. For instance, using 70% training we obtain on average 48 good detections, only 6 false detections and 14 non detections. Nevertheless, all false detections drop as the training set increases, being very low for an amount of training above 50%.

5.4 Global evaluation
Figure 4 depicts the obtained average correct detection (CD) and Fscore ratios for different amounts of training and runs. Based on this information, the most powerful approach proves to be, again, the combination of all descriptors and KNN classification, followed by LDA classification.

We obtained average CD and Fscore ratios up to 91.63% and 83.82%, respectively. For only 50% of training data, the correct detection ratio is above 90%, thus from 374 sequences more than 336 were labeled correctly in one of the two categories,
animated or non animated. The results are significant even for the lowest amount of training. For only 10% training data, i.e. 75 sequences (for all genres, see Table 3), 626 of the 674 test sequences were correctly labeled into the two categories.

6. CONCLUSIONS AND FUTURE WORK
We addressed a particular case of video genre classification, i.e. the classification of the animated genre. We proposed two categories of content descriptors which are adapted to animated content, namely: temporal descriptors, e.g. rhythm, action, for which user experiments have been conducted on animated movies to quantify the perception of the action content at different levels, and color descriptors, for which color perception is quantified in terms of statistics of color distribution, elementary hues, color properties (e.g. amount of light colors, cold colors, etc.) and color relationship.

These descriptors were used with several binary classification techniques to classify video footage into animated and non animated content. To provide a pertinent evaluation, tests were performed on an extensive data set, namely 749 sequences containing various genres of animated movies, but also other video genres: commercials, documentaries, movies, news, sports and music. We achieve very promising results when using all descriptors together (considering the size of the test database and the diversity of video material), namely average precision and recall ratios up to 90% and 92%, respectively, and a global correct detection ratio up to 92%.

However, these descriptors alone prove to be efficient only for this particular classification task, being not discriminative enough to retrieve all other genres (thorough tests proved that genre classification requires a multimodal approach, e.g. using audio-visual features). Future work on this matter should push the descriptors to a higher semantic level, like exploiting human concept detection.

7. ACKNOWLEDGMENTS
This work has been co-funded by the Sectoral Operational Programme Human Resources Development 2007-2013 of the Romanian Ministry of Labour, Family and Social Protection through the Financial Agreement POSDRU/89/1.5/S/62557. The authors would like to thank CITIA - The City of Moving Images and Folimage Animation Company for providing them with access to their animated movie database.

8. REFERENCES
[1] A. F. Smeaton, P. Over, W. Kraaij, "High-Level Feature Detection from Video in TRECVid: a 5-Year Retrospective of Achievements", Multimedia Content Analysis, Theory and Applications, Springer Verlag-Berlin, pp. 151-174, ISBN 978-0-387-76567-9, 2009.
[2] V. Athitsos, M.J. Swain and C. Frankel, "Distinguishing photographs and graphics on the world wide web", IEEE Workshop on Content-Based Access of Image and Video Libraries, pp. 10-17, 1997.
[3] T.I. Ianeva, A.P. Vries and H. Rohrig, "Detecting cartoons: a case study in automatic video-genre classification", IEEE International Conference on Multimedia and Expo, 1, pp. 449-452, 2003.
[4] M. Roach, J. S. Mason, and M. Pawlewski, "Motion-based classification of cartoons", International Symposium on Intelligent Multimedia, pp. 146-149, 2001.
[5] R. Glasberg, A. Samour, K. Elazouzi and T. Sikora, "Cartoon-recognition using video and audio-descriptors", 13th European Signal Processing Conference, Antalya, Turkey, 2005.
[6] X. Gao, J. Li and N. Zhang, "A Cartoon Video Detection Method Based on Active Relevance Feedback and SVM", Springer Lecture Notes in Computer Science, 3972, pp. 436-441, 2006.
[7] B. Ionescu, D. Coquin, P. Lambert and V. Buzuloiu, "A Fuzzy Color-Based Approach for Understanding Animated Movies Content in the Indexing Task", Eurasip Journal on Image and Video Processing, doi:10.1155/2008/849625, 2008.
[8] B. Ionescu, L. Ott, P. Lambert, D. Coquin, A. Pacureanu and V. Buzuloiu, "Tackling Action - Based Video Abstraction of Animated Movies for Video Browsing", SPIE - Journal of Electronic Imaging, 19(3), 2010.
[9] D. Brezeale, D.J. Cook, "Automatic Video Classification: A Survey of the Literature", IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, 38(3), pp. 416-430, 2008.
[10] M. Montagnuolo, A. Messina, "Parallel Neural Networks for Multimodal Video Genre Classification", Multimedia Tools and Applications, 41(1), pp. 125-159, 2009.
[11] CITIA - Animaquid Animated Movie Indexing System, http://www.annecy.org/home/index.php? Page ID=44.
[12] B. Ionescu, V. Buzuloiu, P. Lambert, D. Coquin, "Improved Cut Detection for the Segmentation of Animation Movies", IEEE International Conference on Acoustics, Speech and Signal Processing, Toulouse, France, 2006.
[13] H.W. Chen, J.-H. Kuo, W.-T. Chu and J.-L. Wu, "Action movies segmentation and summarization based on tempo analysis", ACM International Workshop on Multimedia Information Retrieval, pp. 251-258, New York, 2004.
[14] W.A.C. Fernando, C.N. Canagarajah, D.R. Bull, "Fade and Dissolve Detection in Uncompressed and Compressed Video Sequence", IEEE International Conference on Image Processing, Kobe, Japan, pp. 299-303, 1999.
[15] C.-W. Su, H.-Y.M. Liao, H.-R. Tyan, K.-C. Fan, L.-H. Chen, "A Motion-Tolerant Dissolve Detection Algorithm", IEEE Transactions on Multimedia, 7(6), pp. 1106-1113, 2005.
[16] R. W. Floyd and L. Steinberg, "An adaptive algorithm for spatial gray scale", Proceedings of Society for Information Display International Symposium, pp. 36-37, Washington, DC, USA, April 1975.
[17] I. H. Witten, E. Frank, "Data Mining: Practical Machine Learning Tools and Techniques", Second Edition, Eds. Morgan Kaufmann, ISBN 0-12-088407-0, 2005.