
International Research Journal of Computer Science (IRJCS)

Issue 1, Volume 2 (January 2015)

ISSN: 2393-9842
www.irjcs.com

Classification of Single-Food Images by Combining Local HSV-AKAZE Features and Global Features

Yusuke Kajiwara*
Ritsumeikan University

Munehiro Nakamura                    Haruhiko Kimura
Kanazawa Institute University        Kanazawa University

Abstract: This paper presents a system for assisting nutrition management for solitary elderly persons. Since dealing with disease is one of the important issues for the solitary elderly, their health management in daily life has attracted attention in recent years. As a preprocessing step toward a nutrition management system for the solitary elderly, systems for discriminating the category of a food image have been proposed. However, classification of food images remains a challenging task due to the variety of their shapes and colors. In order to improve classification performance, we propose three regions of interest extracted by HSV-AKAZE. Various local features such as AKAZE, HSV-AKAZE, and color information are extracted from the three regions, which enhances the classification performance. Evaluation experiments on 2000 food images in 50 categories have shown that the classification accuracy increased by 8% compared with the existing system.
Keywords: HSV-AKAZE, Machine Learning, Solitary Elderly Persons, ROI, Single Food Image
I. INTRODUCTION
Since solitary elderly persons are inferior to young people in the power of chewing, swallowing, and bowel peristalsis, as well as in the sense of taste, they often suffer from undernutrition due to a lack of protein and energy. Undernutrition is one of the geriatric syndromes that cause reduction of muscle strength, body fat, immune strength, and so on, and it can even lead to a bedridden state and long-term care [1]. As a solution to this problem, systems for assisting nutritional management have been required for solitary elderly persons.
In order to develop such systems, methods for classifying food images have been proposed. The existing methods can be divided into two approaches. One focuses on extracting structural features of food images. For example, Yang et al. [2] proposed the pairwise local feature distribution, which considers the combination of ingredients such as meat and vegetables. Zong et al. [3] proposed a method of learning structural features extracted by the Scale-Invariant Feature Transform (SIFT) [6] and local binary patterns (LBP) [7]. Joutou et al. [4] [5] proposed a method of combining multiple kernels that learn features extracted by SIFT, CSIFT [8], Gabor features [11], and Histograms of Oriented Gradients (HOG) [10]. However, classification of food images is still a challenging task because food images in different categories often have similar overall colors and shapes.
This paper is organized as follows. Section II explains how features are extracted from food images and how they are learned. Section III describes the evaluation environment, in which the proposed method is compared with a baseline that does not extract the three regions, applying both methods to food images in 50 categories. Section IV reports the experimental results, Section V discusses them, and Section VI addresses the conclusion and future work.
II. PROPOSED METHODS
In general, the overall color and shape of a food image depend on the positions of its ingredients and sauces. In order to learn the color and shape of food images effectively, we propose a method of extracting local features from three regions located near the boundaries between ingredients and sauces. In the proposed method, the food image is first divided into a hue image, a saturation image, and a value image. Second, HSV-AKAZE [9] is used to extract the three regions from each of these images. Finally, machine learning is executed to learn various features extracted from each of the regions. In HSV-AKAZE, the color format is converted from RGB to HSV (hue H, saturation S, value V) as follows:

H = \begin{cases}
60\,\dfrac{G-B}{MAX_{rgb}-MIN_{rgb}} & \text{if } MAX_{rgb}=R \\
60\,\dfrac{B-R}{MAX_{rgb}-MIN_{rgb}}+120 & \text{if } MAX_{rgb}=G \\
60\,\dfrac{R-G}{MAX_{rgb}-MIN_{rgb}}+240 & \text{if } MAX_{rgb}=B
\end{cases}   (1)

S = \dfrac{MAX_{rgb}-MIN_{rgb}}{MAX_{rgb}}   (2)

V = MAX_{rgb}   (3)

where MAX_{rgb} is the maximum value among R, G, and B, and MIN_{rgb} is the minimum value among R, G, and B. In Eq. (1), the value 360 is added to H when H is negative.
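A minimal sketch of Eqs. (1)-(3) for a single pixel, assuming RGB values scaled to [0, 1]; in practice a library routine such as OpenCV's cvtColor performs the same conversion.

```python
# Sketch of Eqs. (1)-(3); r, g, b are assumed to lie in [0, 1].
def rgb_to_hsv(r, g, b):
    """Convert one RGB pixel to (H, S, V), with H in degrees."""
    mx, mn = max(r, g, b), min(r, g, b)
    if mx == mn:                       # achromatic pixel: hue is undefined
        h = 0.0
    elif mx == r:
        h = 60.0 * (g - b) / (mx - mn)
    elif mx == g:
        h = 60.0 * (b - r) / (mx - mn) + 120.0
    else:
        h = 60.0 * (r - g) / (mx - mn) + 240.0
    if h < 0.0:                        # Eq. (1): add 360 when H is negative
        h += 360.0
    s = (mx - mn) / mx if mx > 0.0 else 0.0   # Eq. (2)
    v = mx                                    # Eq. (3)
    return h, s, v
```

For example, rgb_to_hsv(1.0, 0.5, 0.0) yields (30.0, 1.0, 1.0), an orange hue.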



Fig. 1 The original image is divided into the hue image, saturation image, and value image by HSV-AKAZE. The circles in each image are the key points detected by HSV-AKAZE.

Fig. 2 Each of the hue region, saturation region, and value region is represented as a rectangular area.

HSV-AKAZE is applied to each of the hue, saturation, and value images independently, and key points are extracted from regions where the change of each color component is significant. The extracted key points are robust against changes of scale, luminance, and angle.
Next, three local regions, called the hue region, saturation region, and value region, are extracted from each of the images as below (a sketch of this step follows the list).
1. Key points are extracted from the hue image, saturation image, and value image using HSV-AKAZE, as shown in Fig. 1.
2. As shown in Fig. 2, each of the hue region, saturation region, and value region is represented as the rectangular area whose upper-left corner is (xl, yu), lower-left corner is (xl, yd), upper-right corner is (xr, yu), and lower-right corner is (xr, yd). Here, xl and xr are the leftmost and rightmost edges of the key points, calculated from the two-sided 90% confidence interval of the probability density along the x-axis, and yu and yd are the uppermost and lowermost edges of the key points, calculated from the two-sided 90% confidence interval of the probability density along the y-axis.
3. In this paper, we define the features extracted from the hue region as the H-feature, the features extracted from the saturation region as the S-feature, the features extracted from the value region as the V-feature, and the features extracted from the original image as the O-feature.
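A minimal sketch of steps 1-2, using OpenCV's stock AKAZE detector as a stand-in for HSV-AKAZE (whose modifications are not reproduced here) and taking the two-sided 90% interval from empirical percentiles of the key-point coordinates:

```python
import cv2
import numpy as np

def extract_hsv_regions(bgr_image):
    """Return one bounding rectangle (xl, yu, xr, yd) per HSV channel."""
    hsv = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2HSV)
    detector = cv2.AKAZE_create()
    regions = {}
    for name, channel in zip(("hue", "saturation", "value"), cv2.split(hsv)):
        # Key points lie where the change of this color component is significant.
        keypoints = detector.detect(channel, None)
        if not keypoints:
            continue  # no stable color changes in this channel
        xs = np.array([kp.pt[0] for kp in keypoints])
        ys = np.array([kp.pt[1] for kp in keypoints])
        # Two-sided 90% interval: drop the outer 5% of key points per side.
        xl, xr = np.percentile(xs, [5.0, 95.0])
        yu, yd = np.percentile(ys, [5.0, 95.0])
        regions[name] = (int(xl), int(yu), int(xr), int(yd))
    return regions
```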
III. EVALUATION ENVIRONMENT
We prepared food images in 50 categories; Table I lists all of them. The food images were collected by searching Google Images for each food name, and 40 images were taken from the top of the search results for each category. However, we removed food images that did not match their food name.



TABLE I
ALL THE 50 FOOD CATEGORIES FOR THE CLASSIFICATION.

(1) broiled eel and rice, (2) shrimps with chili sauce, (3) oden, (4) omelet, (5) savoury pancake with various ingredients, (6) udon, (7) pork cutlet on rice, (8) curry and rice, (9) kimpira gobo, (10) gratin,
(11) croissant, (12) corn soup, (13) croquette, (14) rice, (15) zaru soba, (16) sandwiches, (17) Pacific saury, (18) stew, (19) sukiyaki, (20) spaghetti,
(21) rice fried with chicken, (22) fried rice, (23) toast, (24) hamburger on a bun, (25) hamburger, (26) pizza, (27) bibimbap, (28) hot dog, (29) potato salad, (30) ramen,
(31) stuffed cabbage, (32) sushi, (33) yakisoba, (34) rice topped with chicken and eggs, (35) sweet-and-sour pork, (36) chawan-mushi, (37) fried chicken, (38) tempura udon, (39) Tianjin rice bowl, (40) tendon,
(41) niku-jyaga, (42) natto, (43) mabo-tofu, (44) miso soup, (45) sunny-side up, (46) vegetable tempura, (47) sauteed vegetables, (48) Hiyashi chuka, (49) cold tofu, (50) gyoza

TABLE II
MEAN OF RECALL, PRECISION, AND F-MEASURE OBTAINED IN THE CLASSIFICATION.

Machine learning    O-feature                       HSV-feature
                    Recall  Precision  F-measure    Recall  Precision  F-measure
Naive Bayes         0.46    0.43       0.44         0.49    0.45       0.47
Random Forest       0.58    0.57       0.57         0.62    0.61       0.61
Linear SVM          0.38    0.34       0.36         0.52    0.51       0.51
RBF SVM             0.31    0.25       0.28         0.42    0.34       0.37

The existing systems [4] [5] use the features extracted by CSIFT, SIFT, and HOG, together with color information and Gabor features. The proposed method uses the same features except those extracted by CSIFT and SIFT, because the proposed method extracts the three regions with HSV-AKAZE and AKAZE instead. For the AKAZE features, HSV-AKAZE features, and color information, Bag of Features (BoF) [13] creates frequency histograms.
In BoF, the number of visual words was configured as 50. After converting the frequency histograms into a vector, spatial pyramid matching [15] is applied to the vector to add location information. The spatial pyramid divides a color image into 1×1, 2×2, and 3×3 blocks, and a frequency histogram is calculated block by block. In HOG, the number of quantized orientations is configured as 9, the number of cells in an image as 8×8, and the number of cells in a block as 3×3. In the extraction of the Gabor features, the number of resolutions is configured as 4 and the number of directions as 6.
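As a hedged illustration of the settings above, the following sketch builds a 50-word codebook with k-means++ [16] and concatenates per-block visual-word histograms over the 1×1, 2×2, and 3×3 grids in the manner of spatial pyramid matching [15]; the descriptor matrix and key-point coordinates are assumed to come from the AKAZE detection step described earlier.

```python
import numpy as np
from sklearn.cluster import KMeans

def build_codebook(training_descriptors, n_words=50):
    """Cluster all training descriptors into 50 visual words (k-means++)."""
    return KMeans(n_clusters=n_words, init="k-means++").fit(training_descriptors)

def pyramid_histogram(descriptors, points, image_shape, codebook):
    """Concatenate visual-word histograms over 1x1, 2x2, and 3x3 grids."""
    h, w = image_shape[:2]
    words = codebook.predict(descriptors)  # visual word of each key point
    blocks = []
    for g in (1, 2, 3):
        hist = np.zeros((g, g, codebook.n_clusters))
        for (x, y), word in zip(points, words):
            row = min(int(y * g / h), g - 1)  # block containing the key point
            col = min(int(x * g / w), g - 1)
            hist[row, col, word] += 1
        blocks.append(hist.ravel())
    return np.concatenate(blocks)
```

For the HOG settings, skimage's hog offers the corresponding parameters: orientations=9, cells_per_block=(3, 3), and pixels_per_cell chosen so that the image is divided into 8×8 cells, assuming that implementation is an acceptable stand-in for the one used in the paper.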
We compared two cases: (a) only the O-feature is used to build each classifier, and (b) the HSV-AKAZE features are used to build each classifier. We refer to case (a) as O-feature and case (b) as HSV-feature in this paper. As classification algorithms, we implemented Naive Bayes, Random Forest, linear-kernel SVM, and radial-basis-function-kernel (RBF) SVM in the R library for statistical computing, with the parameters of all classifiers left at the R defaults. The evaluation was conducted by 10-fold cross-validation, in which F-measures (the harmonic mean of recall and precision) were calculated. The experiments were executed on a computer with an Intel Core i7 870 (2.93 GHz) and 16 GB of memory.
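A sketch of this evaluation protocol, shown for one of the four classifiers; the paper used R with default parameters, so this scikit-learn version is an equivalent illustration rather than the authors' implementation, and X and y stand for the assembled feature vectors and category labels.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_recall_fscore_support
from sklearn.model_selection import cross_val_predict

def evaluate(X, y):
    """10-fold cross-validation with macro-averaged recall/precision/F."""
    # Out-of-fold predictions: each image is predicted by a model
    # trained on the other nine folds.
    predictions = cross_val_predict(RandomForestClassifier(), X, y, cv=10)
    precision, recall, f_measure, _ = precision_recall_fscore_support(
        y, predictions, average="macro")
    return recall, precision, f_measure
```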
IV. EXPERIMENTAL RESULTS
Table II shows the results of the classification. From the table, we can see that, compared with O-feature, the average classification accuracy of HSV-feature increased by 3%, 4%, 15%, and 9% for Naive Bayes, Random Forest, Linear SVM, and RBF SVM, respectively. Accordingly, the average classification accuracy obtained by HSV-feature is 8% higher than that of O-feature. Next, Fig. 3 shows the results of the classification of the food images in the 50 categories. The figure likewise shows an average increase of 8% for HSV-feature over O-feature, which indicates that the proposed method works well on food images. From the figure and Table I, we can also find that the classification accuracy for kimpira gobo, corn soup, and tempura udon increased by more than 20%. This result shows that HSV-feature works well on the classification of food images.


Fig. 3 Mean of F-measures for each of the categories in Random Forest.

Fig. 4 Comparison between HSV-feature and O-feature with respect to the number of food images incorrectly discriminated in the classification.

Fig. 4 shows details of the eight categories above. In the figure, the vertical axis represents the difference value, calculated as +1 when HSV-feature correctly discriminated a food image that was incorrectly discriminated by O-feature and as -1 when O-feature correctly discriminated a food image that was incorrectly discriminated by HSV-feature; the horizontal axis represents the category incorrectly assigned by HSV-feature or O-feature in the calculation of the difference values. From the figure, we can find that the incorrect discriminations decreased remarkably with HSV-feature: kimpira gobo was confused less often with curry and rice, mabo-tofu, and sauteed vegetables; croissant improved as well; corn soup was confused less often with Pacific saury and ramen; and tempura udon was confused less often with oden, udon, and stew.
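As a small illustration only, the difference value just described can be computed from two boolean arrays marking which images each method classified correctly; the function name is hypothetical.

```python
import numpy as np

def difference_values(correct_hsv, correct_o):
    """+1 where only HSV-feature is correct, -1 where only O-feature is,
    and 0 where both agree; per-category sums give the bars in Fig. 4."""
    correct_hsv = np.asarray(correct_hsv, dtype=int)
    correct_o = np.asarray(correct_o, dtype=int)
    return correct_hsv - correct_o
```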
V. DISCUSSION
As described in Section IV, the classification performance on some food images was significantly improved by HSV-feature compared with O-feature. Focusing on those images, this section describes the effect of the proposed method.
Here, we show pork cutlet on rice and pizza as representative examples in which the RGB and density values of the original images are quite similar to each other. In the case of pork cutlet on rice, the changes of the color information are significant on the boundary between the white plate and the yellow pork and on the boundary between the green trefoil and the yellow pork. In the hue color space, the HSV-AKAZE algorithm extracts key points where the changes of the color


information are significant. Therefore, many key points are located near the borderline between the white plate and the yellow pork and near the borderline between the green trefoil and the yellow pork, as shown in Fig. 5. On the other hand, in the image of pizza, various ingredients such as shrimp and squid are placed on top. As a result, key points are extracted from the whole image because the color values are not stable.

Fig. 5 Example of food images correctly discriminated by HSV-feature.

Fig. 6 Example of food images incorrectly discriminated by HSV-feature.
The HSV-feature also contributes to increasing the classification performance by learning the local features extracted from the three regions located around the borderlines between ingredients and sauces. Fig. 5 shows an example of food images correctly classified by the proposed method. In this figure, while each of the three regions is located at a different place, the three regions overlap. For example, key points in each region were located near the borderlines between ingredients in the image of chawan-mushi, and near the borderlines between the sauce and the meat and between the curry sauce and the rice in the image of curry and rice. The hue, saturation, and value are also unstable around these borderlines. These characteristics were found in the images of curry and rice and tempura udon.
Fig. 6 shows some of the food images incorrectly discriminated by both HSV-feature and O-feature. From this figure, we can see that the image pairs of sukiyaki and bibimbap, chawan-mushi and stew, and kimpira gobo and yakisoba are similar to each other with respect to the color and shape of all three regions.


The reason is probably that the original images have similar colors and shapes and that the positions of the ingredients are similar to each other. In order to classify such images correctly, we need to consider other information such as the situation, smell, and amount of food.
VI. CONCLUSION
As a preprocessing step toward a nutrition management system for the solitary elderly, this paper has presented a method of discriminating single-food images. Experimental results on discriminating 50 food categories have shown that the mean classification rate increased by 5.5% compared with the existing system. In addition, we showed that the proposed method improves the classification accuracy for images whose color and density information are similar to each other. On the other hand, we found that it is still difficult to classify images that have similar information across the three regions.
As future work, we need to consider other information such as the situation, smell, and amount of food. Moreover, we would like to develop a system that displays nutritional energy by measuring the amount of food with a 3D camera.
REFERENCES
[1] Yukawa, H., Longitudinal study on dietary intake and health by the elderly in an urban community: The Japan Association for the Integrated Study of Dietary Habits. 2005, 16(2):100-103.
[2] Yang, S., Chen, M., Pomerleau, D., Sukthankar, R., Food recognition using statistics of pairwise local features: Proc. IEEE Computer Vision and Pattern Recognition, 2010, 2249-2256.
[3] Zong, Z., Nguyen, D.T., Ogunbona, P., Li, W., On the combination of local texture and global structure for food
classification: IEEE International Symposium on Multimedia, 2010, 204-211.
[4] Joutou, T., Hoashi, H., Yanai, K., 50-Class Food-Image Recognition Employing Multiple Kernel Learning: The Transactions of the Institute of Electronics, Information and Communication Engineers (IEICE). 2010, J93-D(8):1397-1406.
[5] Hoashi, H., Yanai, K., Recognition of multi-food images by detecting candidate regions: The Transactions of the Institute of Electronics, Information and Communication Engineers (IEICE). 2012, J95-D(8):1554-1564.
[6] Lowe, D., Distinctive Image Features from Scale-Invariant Keypoints: International Journal of Computer Vision.
2004, 60(2):91-110.
[7] Ojala, T., Pietikäinen, M., Harwood, D., A comparative study of texture measures with classification based on featured distributions: Pattern Recognition. 1996, 29:51-59.
[8] Abdel-Hakim, A.E., Farag, A.A., CSIFT: A SIFT descriptor with color invariant characteristics: IEEE Computer Society Conference on Computer Vision and Pattern Recognition. 2006, 2:1978-1983.
[9] Alcantarilla, P.F., Nuevo, J., Bartoli, A., Fast Explicit Diffusion for Accelerated Features in Nonlinear Scale Spaces: British Machine Vision Conference (BMVC), 2013, 1-11.
[10] Dalal, N., Triggs, B., Histograms of oriented gradients for human detection: Proc. of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2005, 886-893.
[11] Jones, J., Palmer, L., An evaluation of the two-dimensional Gabor filter model of simple receptive fields in cat
striate cortex: J Neurophysiol. 1987, 58(6):1233-1258.
[12] Varma, M., Ray, D., Learning the discriminative power-invariance trade-off: ICCV 2007, 2007, 1-8.
[13] Bosch, A., Zisserman, A., Munoz, X., Scene Classification Using a Hybrid Generative/Discriminative Approach:
IEEE Transactions on Pattern Analysis and Machine Intelligence. 2008, 30(4):712-727.
[14] Csurka, G., Dance, C.R., Fan, L., Bray, C., Visual Categorization with Bags of Keypoints: European Conference on
Computer Vision, 2004, 1-22.
[15] Lazebnik, S., Schmid, C., Ponce, J., Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories: Proc. IEEE Computer Vision and Pattern Recognition, 2006, 2169-2178.
[16] Arthur, D., Vassilvitskii, S., k-means++: the advantages of careful seeding: Proceedings of the eighteenth annual
ACM-SIAM symposium on discrete algorithms, 2007, 1027-1035.
[17] Freund, Y., Schapire, R.E., A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting: Journal of Computer and System Sciences. 1997, 55(1):119-139.
[18] Breiman, L., Random Forests: Machine Learning. 2001, 45(1):5-32.
[19] Snoek, C.G.M., Worring, M., Smeulders, A.W.M., Early versus late fusion in semantic video analysis: Proceedings
of the 13th annual ACM international conference on Multimedia, 2005, 399-402.
