Classification of Types of Apples with K-Nearest Neighbor Method

Devira Monda Astri Harahap
S-1 Applied Multimedia Engineering Technology
Faculty of Applied Sciences
Telkom University
Bandung, West Java, Indonesia
deviramondastri@gmail.com

Agus Pratondo
S-1 Applied Multimedia Engineering Technology
Faculty of Applied Sciences
Telkom University
Bandung, West Java, Indonesia
pratondo@gmail.com

Abstract: Classifying apples by type is easy for humans but not for computers. Human perception, however, tends to be subjective toward an object because of the color composition of the object. Therefore, a tool that classifies apples by type automatically is needed. This study aims to classify apple types using the K-Nearest Neighbor (K-NN) algorithm, leave-one-out, and histogram feature extraction. The data used in this study consist of training data and testing data, amounting to 200 images. The result of this study is a system that can classify types of apples with an accuracy rate of 87.5% and a total computation time of 150.64 seconds.

Keywords: K-Nearest Neighbor, Feature Extraction, and Apples

I. INTRODUCTION

Apples are annual fruit plants originating from the West Asian region with sub-tropical climates [1]. In Indonesia, apples have been planted since 1934 [2]. Apples contain vitamins A, B1, and C [3]. Not only the flesh but also the skin of the fruit can be consumed, because it contains a flavonoid called quercetin [4]. Quercetin has antioxidant activity that prevents attack by free radicals and so can protect the body from possible cancer [5]. This makes apples one of the fruits favored by the general public in Indonesia [6].

Apples come in many types, each distinctive in its physical and chemical properties. This study considers four types of apples that differ in skin, taste, shape, and texture: Royal Gala, Fuji RRT, Malang Cherry, and Envy. Large production volumes and the need to classify products by certain parameters make product classification one of the important stages before the marketing phase [7]. In general, product classification is done manually by people, but this is limited by the human senses, which tend to be subjective and are easily fooled when used to measure textures, shapes, colors, and other parameters [8]. The limited ability of the human senses and their inconsistency in assessing the physical properties of a product result in data inaccuracies.

Given the problems above, an automatic classification process is one suitable solution. This study therefore aims to identify the types of apples automatically using the K-Nearest Neighbor (K-NN) algorithm, so that Royal Gala, Fuji RRT, Malang Cherry, and Envy apples are classified more accurately. K-NN is a supervised learning algorithm in which a query instance is classified based on the majority category among its K nearest neighbors [9].

II. RESEARCH METHODS

A. K-Nearest Neighbor (K-NN) Classification Algorithm

The classification process is one of the important stages before carrying out the product marketing process [7]. The classification process aims to identify the characteristics, patterns, or structures in the image and then group them into more specific types [10]. One of the commonly used classification methods is the K-Nearest Neighbor (K-NN) algorithm. K-NN is a classification method that groups data based on the distance between the data and some of the closest data (the nearest neighbors) [11]. K-NN can give a value in the range 0 to 1, which is easily applied using the fuzzy K-NN rule [12]. The method first stores all training data in the database. The data to be tested are then compared with the stored training data to determine the K nearest neighbors (K is an integer). The classification is decided by the points that are closest to the data being tested.

In the training phase, the K-NN algorithm only stores the feature vectors and class labels of the training data. In the classification phase, the same features are calculated for the testing data. The distance from the new vector to every training vector is calculated, and the K closest ones are taken. The new point is assigned the class that occurs most often among these points. The value of K used depends on the data. In general, a high K value reduces the effect of noise on the classification [13].

The steps in the K-NN algorithm are as follows:
a. Determine the K parameter (an integer).
b. Calculate the square of the Euclidean distance from the object to each item of the training data provided.
c. Sort the squared Euclidean distances from the object to the training data in ascending order (from lowest to highest value).
d. Collect the Y categories (the classes of the nearest neighbors, based on the K value).
e. Using the majority category among the nearest neighbors, predict the class or type of the object [14].

B. Image Acquisition

This section explains how and where the data were taken. The primary apple data were collected at random from a supermarket. Secondary data were obtained from journals and papers on apple research. Examples of the apple images to be classified are shown here:

[Figure: sample images of the Royal Gala, Fuji RRT, Malang Cherry, and Envy apples]

Preparation for taking data is as follows:
a. Images are taken using a Nikon D5300 DSLR camera with an 18-55mm lens.
b. The shooting height is 30 cm.
c. Camera settings include a focal length of 18mm, ISO-400, and an aperture of f/11.
d. Data are taken outdoors at 08.30 - 11.00 WIB.
e. Apples are placed on white HVS paper.
f. The data comprise 200 photos of apples: 50 photos of Royal Gala apples, 50 photos of Fuji RRT apples, 50 photos of Malang Cherry apples, and 50 photos of Envy apples.
g. Each photo is 6000 x 4000 pixels, with a resolution of 300 dpi and a bit depth of 24, in RGB.

The equipment used in this research includes:
a. A DSLR camera, used to photograph the apples.
b. Adobe Photoshop CS6 (64-bit), used to edit the apple photos.
c. Windows 10 (64-bit), used as the operating system.
d. Matlab R2016a, used to process the datasets.
e. Microsoft Word, used as a writing aid.

Image processing begins with image retrieval, followed by image quality improvement and image representation [15]. In the image quality improvement step, color (RGB) images are converted into grayscale images. This is done to simplify the image. A grayscale image has no color, only degrees of gray. A color (RGB) image is converted to grayscale by taking into account the contribution of each RGB color component. The standard formula for converting a color (RGB) image to grayscale is:

Z = 0.299 * R + 0.587 * G + 0.114 * B (8)

Cropping is the process of cutting parts of an image to reduce its size and focus on the object to be processed. Besides reducing the size, cropping also determines the part of the image that will be processed in the next stage. Each image is cropped to 350 x 400 pixels, because the Nikon D5300 DSLR camera produces images of 6000 x 4000 pixels.

[Table: each type of apple (Royal Gala, Fuji RRT, Malang Cherry, Envy) shown before cropping and after cropping]

C. Feature Extraction

Feature extraction is one way to recognize an object based on the specific histogram that the object has. It aims to perform calculations and comparisons that can be used to classify an image based on its histogram characteristics [16]. Feature extraction also aims to obtain values that are important and not redundant, as is done with biometric images for recognizing individuals.

The values taken from the apple image are RGB values, consisting of Red, Green, and Blue. These values are used because they are expected to differ for each class of apple. The values are retrieved from all pixels of the image that are included in the segmentation, so that only values from the fruit are obtained, without the image background. Each pixel position must lie within the image height and the image width, and it must also fall inside the segmented area.
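The collection of RGB values from the segmented pixels described under Feature Extraction can be illustrated with a small sketch. This is not the authors' Matlab code: the function name `rgb_histogram_features`, the nested-list image layout, the boolean mask, and the choice of `bins` are all assumptions made for illustration.

```python
def rgb_histogram_features(pixels, mask, bins=8):
    """Return normalized R, G, B histograms over the masked (fruit) pixels.

    pixels: list of rows, each row a list of (r, g, b) tuples in 0..255
    mask:   list of rows of booleans, True where the pixel is inside the
            segmented area (hypothetical representation of the segmentation)
    bins:   number of histogram bins per channel (assumed value)
    """
    bin_width = 256 / bins
    hist = [[0] * bins for _ in range(3)]  # one histogram per channel
    count = 0
    for row_pixels, row_mask in zip(pixels, mask):
        for rgb, inside in zip(row_pixels, row_mask):
            if not inside:
                continue  # background pixels are ignored
            for channel, value in enumerate(rgb):
                index = min(int(value / bin_width), bins - 1)
                hist[channel][index] += 1
            count += 1
    if count == 0:
        return [0.0] * (3 * bins)
    # Concatenate the three histograms into one vector of relative frequencies.
    return [h / count for channel_hist in hist for h in channel_hist]


# Tiny 2 x 2 example: three reddish fruit pixels and one white background pixel.
image = [[(255, 0, 0), (250, 10, 5)],
         [(255, 255, 255), (200, 30, 20)]]
fruit_mask = [[True, True],
              [False, True]]
features = rgb_histogram_features(image, fruit_mask, bins=4)
print(features)
```

Normalizing by the number of masked pixels keeps the feature vector comparable between apples that occupy different numbers of pixels in the image.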
D. Gray scaling

Gray scaling is the process of changing color images into grayscale images. This process converts the three red, green, and blue channels into one channel with intensity values from 0 to 255. The following is an example of the gray scaling process using the Matlab function.

[Figure: RGB and grayscale versions of the Royal Gala, Fuji RRT, Malang Cherry, and Envy apple images]
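As a rough illustration of the three-channel to one-channel conversion, the sketch below applies the widely used luminance weights 0.299, 0.587, and 0.114 to each pixel. The function name `to_grayscale` and the nested-list image layout are assumptions; the paper itself relies on a Matlab function, whose exact rounding may differ.

```python
def to_grayscale(pixels):
    """Convert rows of (r, g, b) tuples (0..255) into rows of gray levels (0..255)."""
    return [
        [int(round(0.299 * r + 0.587 * g + 0.114 * b)) for (r, g, b) in row]
        for row in pixels
    ]


# White, black, and the three pure primaries.
sample = [[(255, 255, 255), (0, 0, 0)],
          [(255, 0, 0), (0, 255, 0), (0, 0, 255)]]
print(to_grayscale(sample))
```

Because the three weights sum to 1, a pure white pixel stays at 255 and a pure black pixel stays at 0, so the result keeps the 0-255 intensity range.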

E. Leave One Out Cross Validation


Leave One Out Cross Validation is a special case of K-Fold Cross Validation in which K equals the total number of data items, so the dataset is broken down into K parts of the same size, one item each. In every run, one item acts as testing data while the remaining items become training data [17]. The process is carried out K times, so that each item serves as testing data exactly once and as training data K-1 times [18]. The advantage of this method is that there are no problems in dividing the data. The disadvantage is that the process must be repeated K times, which means K times the computational time [19].
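The leave-one-out procedure can be sketched in a few lines. The helper names below (`leave_one_out_accuracy`, `majority_class`) are hypothetical, and the majority-class rule is only a stand-in classifier chosen to keep the example short; any function with the same signature could be passed in instead.

```python
from collections import Counter


def leave_one_out_accuracy(features, labels, classify):
    """Hold out each item once, train on the rest, and return the accuracy.

    classify(train_features, train_labels, query) must return a predicted label.
    """
    correct = 0
    for i in range(len(features)):
        train_features = features[:i] + features[i + 1:]  # all items except i
        train_labels = labels[:i] + labels[i + 1:]
        if classify(train_features, train_labels, features[i]) == labels[i]:
            correct += 1
    return correct / len(features)


def majority_class(train_features, train_labels, query):
    """Stand-in classifier: always predict the most common training label."""
    return Counter(train_labels).most_common(1)[0][0]


toy_features = [[0.1], [0.2], [0.3], [0.9]]
toy_labels = ["a", "a", "a", "b"]
print(leave_one_out_accuracy(toy_features, toy_labels, majority_class))  # 0.75
```

With N items the classifier is run N times, which is the computational cost noted in the text.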

So that the data obtained can be validated, all data are divided into two parts: of the 200 images, 50% are used as training data and 50% as testing data.

III. RESULTS AND DISCUSSION

This section describes the materials and tools used in this study, in the form of hardware and software, as well as the stages of research related to the previous chapter. The research algorithms and the analysis of the results are also explained in this section.

a. Experimental Setup

The literature used includes journals, books, and documentation from the internet. The literature was screened by choosing the theories that are used for this research. The literature study was carried out both independently and under the guidance of the supervisor. The theories studied include image processing techniques, digital image histograms, apple varieties, the K-Nearest Neighbor (K-NN) algorithm, image acquisition, feature extraction, and leave-one-out.

The equipment, devices, and materials used in this study are as follows:
1. Nikon D5300 DSLR camera.
2. White HVS paper.
3. Intel® Core™ i7-6500U CPU @ 2.50GHz processor.
4. 8GB RAM.
5. 1000GB capacity hard drive.
6. 64-bit Windows 10, used as the operating system.
7. Matlab R2016a, used to process the datasets.
8. Microsoft Word, used as a writing aid.

The dataset is taken using a Nikon D5300 camera at a distance of 30 cm between the camera and the apple, producing images of 6000 x 4000 pixels. The background of each image is then cropped (cut) to produce an image of 350 x 400 pixels. The dataset used therefore consists of color images of 350 x 400 pixels. It contains 200 apple images obtained from supermarkets: 50 images of Envy apples, 50 images of Fuji RRT apples, 50 images of Malang Cherry apples, and 50 images of Royal Gala apples. The 200 images are stored in different folders according to their class. Examples from the apple dataset are:

No.  Apple image         Class
1    envy (1).jpg        Envy
2    fuji (1).jpg        Fuji RRT
3    malang (1).jpg      Malang Cherry
4    royalgala (1).jpg   Royal Gala

Each image in each folder is read by the system with the "imread" command in Matlab. After all the images are read, the next step is the image histogram. For the image histogram, the R, G, and B channels of each image are taken. After the R, G, and B channels are taken, the image, which previously had 3 channels, is converted to 1 channel (gray scaling).
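A folder-per-class dataset like the one described can be traversed so that each image path is paired with its class label. The sketch below is only an assumption about the layout: the function name `list_labeled_images`, the accepted file extensions, and the idea that the folder name equals the class label are not stated in the paper, which reads the images with Matlab's "imread".

```python
from pathlib import Path


def list_labeled_images(root, extensions=(".jpg", ".jpeg", ".png")):
    """Return (image_path, class_label) pairs, using each subfolder name as the label."""
    samples = []
    class_dirs = sorted(p for p in Path(root).iterdir() if p.is_dir())
    for class_dir in class_dirs:
        for image_path in sorted(class_dir.iterdir()):
            # Keep only files with an image extension; skip anything else.
            if image_path.is_file() and image_path.suffix.lower() in extensions:
                samples.append((image_path, class_dir.name))
    return samples
```

Calling `list_labeled_images("dataset")` on a hypothetical `dataset` directory containing one subfolder per apple type would return one pair per image file, ready to be loaded by whatever image reader is available.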
After the image has been converted to grayscale, its size is changed from 350 x 400 pixels to 50 x 50 pixels using the "imresize" command in Matlab.

Furthermore, the resized grayscale image is converted into a vector, as shown below.

Next, the image that has been converted to a vector is transposed to match the way Matlab works, as shown below.

Next, the feature vectors and classes are created: all transposed images are collected into one matrix, as shown below.

[Figure: feature vector matrix and feature class vector]

The K-Nearest Neighbor algorithm is a method for classifying objects based on the learning data that are closest in distance to the object in question. K is the number of nearest neighbors considered, and an odd value of K is usually used. The decision-making process in this program design uses the basic concept of the K-NN algorithm. The results of the comparison with the 200 images are sorted by similarity.

Leave-one-out divides the dataset into random partitions. A number of experiments are then run, where each experiment uses one partition as testing data and the rest of the partitions as training data. The dataset of 200 images is divided into 200 parts, where one part is used as testing data and the remainder as training data. There are therefore 200 experiments, and each image serves as testing data exactly once.
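The basic K-NN decision rule (compute squared Euclidean distances, sort them in ascending order, take the K nearest, and vote) can be sketched as follows. The names `squared_euclidean` and `knn_classify` and the toy two-dimensional features are assumptions for illustration; this is not the authors' Matlab program.

```python
from collections import Counter


def squared_euclidean(a, b):
    """Squared Euclidean distance between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn_classify(train_features, train_labels, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest neighbors."""
    # Pair each training label with its distance to the query, nearest first.
    distances = sorted(
        ((squared_euclidean(f, query), label)
         for f, label in zip(train_features, train_labels)),
        key=lambda pair: pair[0],
    )
    nearest_labels = [label for _, label in distances[:k]]
    # On a tie, Counter keeps first-seen order, so the label seen nearest wins.
    return Counter(nearest_labels).most_common(1)[0][0]


train = [[0, 0], [0, 1], [5, 5], [6, 5], [5, 6]]
classes = ["Envy", "Envy", "Fuji RRT", "Fuji RRT", "Fuji RRT"]
print(knn_classify(train, classes, [1, 0], k=3))  # Envy
print(knn_classify(train, classes, [5, 4], k=3))  # Fuji RRT
```

Choosing an odd K avoids ties when there are two classes; with four classes, as in this study, ties are still possible, which is why the sketch states its tie-breaking behavior explicitly.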
b. Results

In this section, a trial is conducted on the system that has been built in order to determine the capability of the theories used, observe system performance, and identify the obstacles that arise. Evaluation is done by comparing the accuracy of the data classification. The testing process is done 50 times for each class, so there are 200 test results. Accuracy is reported for K = 1, 3, 5, 7, 9. The classification test results for each value of K are as follows:

K=1
Class           Succeed   Percentage
Envy            45        90%
Fuji RRT        47        94%
Malang Cherry   50        100%
Royal Gala      38        76%

K=3
Class           Succeed   Percentage
Envy            44        88%
Fuji RRT        46        92%
Malang Cherry   50        100%
Royal Gala      36        72%

K=5
Class           Succeed   Percentage
Envy            43        86%
Fuji RRT        47        94%
Malang Cherry   50        100%
Royal Gala      35        70%

K=7
Class           Succeed   Percentage
Envy            40        80%
Fuji RRT        47        94%
Malang Cherry   50        100%
Royal Gala      35        70%

K=9
Class           Succeed   Percentage
Envy            39        78%
Fuji RRT        48        96%
Malang Cherry   50        100%
Royal Gala      35        70%

The accuracy of the testing process for K = 1, 3, 5, 7, 9 is summarized in the following table:

                       Accuracy Success
No.  Test Data         k=1    k=3    k=5     k=7    k=9
1.   Envy              90%    88%    86%     80%    78%
2.   Fuji RRT          94%    92%    94%     94%    96%
3.   Malang Cherry     100%   100%   100%    100%   100%
4.   Royal Gala        76%    72%    70%     70%    70%
     Average           90%    88%    87.5%   86%    86%
     Overall accuracy  87.5%

Computing time is tested to find out how long the system takes to complete the process until it obtains the classification results. In this test, the computational time is measured for each value of K.

No.  Condition  Computational Time
1.   K=1        26.11 seconds
2.   K=3        23.40 seconds
3.   K=5        51.88 seconds
4.   K=7        23.24 seconds
5.   K=9        26.01 seconds
     Total      150.64 seconds
     Average    30.12 seconds
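The averages in the results tables follow directly from the per-class success counts. The short calculation below reproduces them, assuming 50 test images per class as stated in the text; the variable and function names are illustrative only.

```python
IMAGES_PER_CLASS = 50

# Correctly classified images per class, copied from the result tables.
correct = {
    1: {"Envy": 45, "Fuji RRT": 47, "Malang Cherry": 50, "Royal Gala": 38},
    3: {"Envy": 44, "Fuji RRT": 46, "Malang Cherry": 50, "Royal Gala": 36},
    5: {"Envy": 43, "Fuji RRT": 47, "Malang Cherry": 50, "Royal Gala": 35},
    7: {"Envy": 40, "Fuji RRT": 47, "Malang Cherry": 50, "Royal Gala": 35},
    9: {"Envy": 39, "Fuji RRT": 48, "Malang Cherry": 50, "Royal Gala": 35},
}


def accuracy_percent(counts):
    """Overall accuracy (in percent) across all classes for one value of K."""
    return 100 * sum(counts.values()) / (IMAGES_PER_CLASS * len(counts))


accuracy_by_k = {k: accuracy_percent(counts) for k, counts in correct.items()}
overall_accuracy = sum(accuracy_by_k.values()) / len(accuracy_by_k)

# Computation times in seconds for K = 1, 3, 5, 7, 9, copied from the time table.
times = [26.11, 23.40, 51.88, 23.24, 26.01]
total_time = round(sum(times), 2)

print(accuracy_by_k)     # {1: 90.0, 3: 88.0, 5: 87.5, 7: 86.0, 9: 86.0}
print(overall_accuracy)  # 87.5
print(total_time)        # 150.64
```

For example, for K = 1 the system classifies 45 + 47 + 50 + 38 = 180 of 200 images correctly, which gives 90%.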
c. Discussion

Implementation of the apple classification system using the K-NN method, feature extraction, image acquisition, and leave-one-out, starting with the determination of

The classification program created using the K-Nearest Neighbor (K-NN) algorithm managed to classify Envy, Fuji RRT, Malang Cherry, and Royal Gala apples using the leave-one-out method and color characteristics.

Based on the tests conducted, the accuracy is 90% for K = 1, 88% for K = 3, 87.5% for K = 5, 86% for K = 7, and 86% for K = 9. It can therefore be concluded that the best K value tested is 1, with 90% accuracy.

The classification of apple images using the K-Nearest Neighbor (K-NN) algorithm on a dataset of 200 apple images, consisting of 50 images of Envy apples, 50 images of Fuji RRT apples, 50 images of Malang Cherry apples, and 50 images of Royal Gala apples, obtains an overall accuracy of 87.5%.

The computing time needed by the system, from detection to the end of the classification process, is 150.64 seconds in total, with an average of 30.12 seconds per value of K.

It can therefore be concluded that the K-NN algorithm with histogram feature extraction can be applied quite well to the classification of apple types.

Future studies are expected to identify more types of apples than previous studies. Testing with more training data is expected to increase accuracy in the data classification process. Images should also be detectable under different lighting, tilt, and data collection distances.

In research on feature extraction for the classification of apple types, several things need to be considered in order to improve future work, namely the use of the same image and data background so that feature extraction is carried out more optimally. Data collection needs to pay attention to adequate shooting angle and lighting.

This system can be developed using other methods to classify types of apples better.

REFERENCES

[1] Notodimedjo, Soewarno, 1995, "Horticulture Plant Cultivation, Especially the Fruit Plant", Faculty of Agriculture, Universitas Brawijaya, Malang.
[2] Suhariyono, 2013, "History of Apple Development in Indonesia", Agricultural Research and Development Agency.
[3] Apritantono, 1987, "Food Science", University of Indonesia, Jakarta.
[4] Soelarso, R. Bambang, 1996, "Apple Cultivation", Kanisius, Yogyakarta.
[5] S. Jatmika and D. Purnamasari, 2014, "Designing an Apple Maturity Detection Tool Using the Image Processing Method Based on Color Composition", vol. 8.
[6] Radityo, Dimas Riski, et al., 2012, "Fruit Maturing and Checking Tools Using Color Sensors", Jakarta: Bina Nusantara University.
[7] R. T. Sari, 2013, "Comparison of Backpropagation Artificial Neural Network Methods and Matching Algorithms in Identifying Tomato Maturity Based on RGB Color Characteristics".
[8] M. I. Sikki, 2009, "Face Recognition Using K-Nearest Neighbor with Wavelet Transform Process", Paradigm, vol. 10, p. 2.
[9] Leidiyana, 2010, "Application of K-Nearest Neighbor Algorithm for Determining Risk of Motorized Vehicle Ownership Credit", Computer Science Research Journal, STMIK Nusa Mandiri.
[10] Avelita, B., 2013, "K-Nearest Neighbor Classification", from www.academia.edu: https://www.academia.edu/9131959/A_Klasifikasi_K-Nearest_Neighbor, accessed on November 20, 2018.
[11] Santoso, B., 2007, "Data Mining Techniques for Using Data for Business Purposes (1 ed.)", Yogyakarta: Graha Ilmu.
[12] Agus Pratondo, Chee-Kong Chui, Sim-Heng Ong, 2016, "Integrating Machine Learning with Region-Based Active Contour Models in Medical Image Segmentation", National University of Singapore.
[13] Yofianto, Evan, 2010, "Book TA: K-Nearest Neighbor (K-NN)", http://kuliahinformatika.wordpress.com/2010/02/13/buku-ta-k-nearest-neighbor-knn/, accessed on November 20, 2018.
[14] Informatics, 2017, "K-Nearest Neighbor (K-NN) Algorithm", https://informatikalogi.com/algoritma-k-nn-k-nearest-neighbor/, accessed on November 20, 2018.
[15] R. S. Bahri and I. Maliki, 2012, "Comparison of Template Matching Algorithms and Feature Extraction in Optical Character Recognition", Journal of Computers and Informatics.
[16] I Nyoman Gede Arya, 2016, "Histogram Equalization and Gaussian Filtering Method", Makassar: Indonesian Muslim University.
[17] Saprina Mamase, Joko Lianto Buliali, 2015, "K-Means Hybrid Method and Generalized Regression Neural Network for Traffic Flow Prediction", Informatics Engineering: Ten November Institute of Technology.
[18] Yuliana Hermawan, 2012, "Basic Theory of Estimating Classifier Accuracy in the Holdout Method".
[19] M. Reza Faisal, 2013, "Learning Science Data Series: Classification with R Programming Language".
