Neighbor Method
Abstract: Classifying apples by type is easy for humans but not for computers. Human perception, however, tends to be subjective, because it depends on the color composition of the object. A tool that classifies apples by type automatically is therefore needed. This study aims to classify apple types using the K-Nearest Neighbor (K-NN) algorithm, leave-one-out, and histogram feature extraction. The data used in this study are of two kinds, training data and testing data, amounting to 200 images. The result of this study is a system that can classify types of apples with an accuracy of 87.5% and a total computation time of 150.64 seconds.

Keywords: K-Nearest Neighbor, Feature Extraction, Apples

I. INTRODUCTION

Apples are perennial fruit plants originating from the West Asian region, which has a sub-tropical climate [1]. In Indonesia, apples have been planted since 1934 [2]. Apples contain vitamins A, B1, and C [3]. Not only the flesh but also the skin of the fruit can be consumed, because it contains a flavonoid called quercetin [4]. Quercetin has antioxidant activity that prevents attack by free radicals and can thereby protect the body from possible cancer [5]. This makes apples one of the fruits favored by the general public in Indonesia [6].

Apples come in many types, each with distinctive physical and chemical properties. Among them are four types that differ in skin, taste, shape, and texture: Royal Gala, Fuji RRT, Malang Cherry, and Envy. Large production volumes and the need to classify products by certain parameters make product classification one of the important stages before marketing [7]. In general, product classification is done manually by people, but this is limited by the human senses, which tend to be subjective and easily deceived when used to measure texture, shape, color, and other parameters [8]. The limited ability of the human senses and their inconsistency in assessing the physical properties of a product lead to inaccurate data.

Given these problems, a classification process is one suitable solution to implement. This study therefore aims to identify types of apples automatically using the K-Nearest Neighbor (K-NN) algorithm, so as to classify Royal Gala, Fuji RRT, Malang Cherry, and Envy apples more accurately. K-NN is a supervised algorithm in which query instances are classified based on the majority category among the K nearest neighbors [9].

II. RESEARCH METHODS

A. K-Nearest Neighbor (K-NN) Classification Algorithm

The classification process is one of the important stages before products are marketed [7]. It aims to identify the characteristics, patterns, or structures in an image and then group them into more specific types [10]. One commonly used classification method is the K-Nearest Neighbor (K-NN) algorithm, which groups data based on their distance to the closest data (nearest neighbors) [11]. K-NN yields a value in the range 0 to 1, which is easily applied using the fuzzy K-NN rule [12]. The method first stores all training data in the database. The data to be tested are then compared with the stored training data, using a chosen number of neighbors K (K is an integer). The classification is determined by the points closest to the data being tested.

In the training phase, the K-NN algorithm only stores the feature vectors and class labels of the training data. In the classification phase, the same features are computed for the testing data. The distance from the new vector to every training vector is calculated, and the K closest ones are taken. The new point is assigned to the class that occurs most often among these points. The value of K used depends on the data. In general, a high value of K reduces the effect of noise on the classification [13].

The steps in the K-NN algorithm are as follows:

a. Determine the parameter K (an integer).
b. Calculate the squared Euclidean distance from the object to each of the training data provided.
c. Sort the squared Euclidean distances from the object to the training data in ascending order (from lowest to highest value).
d. Collect the categories Y of the nearest neighbors (based on the value of K).
e. Using the majority category among the nearest neighbors, predict the class or type of the object [14].

The standard formula for converting a color (RGB) image to a grayscale image is:

Z = 0.299 * R + 0.587 * G + 0.114 * B (8)
D. Gray scaling

Gray scaling is the process of converting a color image into a grayscale image. It works by combining the three channels (red, green, and blue) into a single channel with intensity values from 0 to 255. The following is an example of the gray scaling process using the MATLAB function.

[Figure: gray scaling examples; panels: Fuji RRT Apple, Envy Apple]
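As a rough illustration of the gray scaling step, the sketch below applies a weighted sum of the three channels to each pixel. It is written in plain Python rather than MATLAB, and it assumes the common luminance weights 0.299, 0.587, and 0.114 for R, G, and B (rounded from the weights that MATLAB's rgb2gray uses). The function names and the 2 x 2 sample image are hypothetical and serve only as an example.

```python
def rgb_to_gray(pixel):
    """Convert one (R, G, B) pixel with 0-255 channels to a 0-255 gray level."""
    r, g, b = pixel
    # Weighted sum of the three channels; the weights add up to 1.0,
    # so the result stays within the 0-255 range.
    z = 0.299 * r + 0.587 * g + 0.114 * b
    return int(round(z))


def image_to_gray(image):
    """Apply rgb_to_gray to every pixel of an image stored as a list of rows."""
    return [[rgb_to_gray(p) for p in row] for row in image]


# Hypothetical 2 x 2 image (pure red, pure green, pure blue, white).
sample = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (255, 255, 255)],
]
gray = image_to_gray(sample)
print(gray)
```

Green contributes most to the gray level and blue least, which is why a pure green pixel comes out brighter than a pure red or pure blue one.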
To allow the results to be validated, the 200 images in the dataset are divided into two parts: 50% for training data and 50% for testing data.

III. RESULTS AND DISCUSSION

This section describes the materials and tools used in this study, in the form of hardware and software, as well as the stages of research related to the previous chapter. The research algorithms and the analysis of the results are also explained in this section.

a. Experimental Setup

The literature used includes journals, books, and documentation from the internet. Literature screening was done by choosing the theories used in this research. The literature study was carried out both independently and under the guidance of a supervisor. The theories studied include image processing techniques, digital image histograms, apple varieties, the K-Nearest Neighbor (K-NN) algorithm, image acquisition, feature extraction, and leave-one-out.

The equipment, devices, and materials used in this study are as follows:
1. Nikon D5300 DSLR camera.
2. White HVS paper.
3. Intel® Core™ i7-6500U CPU @ 2.50 GHz processor.
4. 8 GB RAM.
5. 1000 GB hard drive.
6. 64-bit Windows 10 as the operating system.
7. Matlab R2016a as the application for processing the datasets.
8. Microsoft Word for writing up the research.

The dataset was taken using a Nikon D5300 camera at a camera-to-apple distance of 30 cm, at 6000 x 4000 pixels. Each image was then cropped to remove the background, producing an image of 350 x 400 pixels. The dataset therefore consists of color images of 350 x 400 pixels. It contains 200 apple images obtained from supermarkets: 50 images of Envy apples, 50 images of Fuji RRT apples, 50 images of Malang Cherry apples, and 50 images of Royal Gala apples. The images are stored in separate folders according to class. Examples from the apple dataset are:

No.  Apple image         Class
1    envy (1).jpg        Envy
2    fuji (1).jpg        Fuji RRT
3    poor (1).jpg        Malang Cherry
4    royalgala (1).jpg   Royal Gala
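With 50 images per class stored in per-class folders, the division into 50% training data and 50% testing data can be sketched as follows. The source does not say how the images were assigned to each half, so the per-class random shuffle and the fixed seed below are assumptions, and the function name is hypothetical. Images are represented only by a (class, index) pair rather than by real file paths.

```python
import random

CLASS_NAMES = ["Envy", "Fuji RRT", "Malang Cherry", "Royal Gala"]
IMAGES_PER_CLASS = 50


def split_half_per_class(class_names, per_class, seed=0):
    """Return (train, test) lists of (class_name, image_index) pairs.

    Each class contributes half of its images to each part, so both
    parts keep the same class balance as the full dataset.
    """
    rng = random.Random(seed)  # fixed seed so the split is repeatable
    train, test = [], []
    for name in class_names:
        indices = list(range(1, per_class + 1))  # images numbered 1..per_class
        rng.shuffle(indices)
        half = per_class // 2
        train += [(name, i) for i in indices[:half]]
        test += [(name, i) for i in indices[half:]]
    return train, test


train_set, test_set = split_half_per_class(CLASS_NAMES, IMAGES_PER_CLASS)
print(len(train_set), len(test_set))
```

Splitting inside each class, instead of shuffling all 200 images together, keeps 25 images of every apple type in both halves.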
Next, the image that has been converted to a vector is transposed to match the way MATLAB works, as shown below.
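A rough Python sketch of this step is given here for orientation. It assumes that the vector in question is the 256-bin histogram of the grayscale image and that the transposition turns a column vector into a row vector, so that each image becomes one row of a feature matrix. Neither assumption is stated at this point in the source, and the function names and the tiny sample image are hypothetical.

```python
def gray_histogram(gray_image, levels=256):
    """Count how many pixels fall on each gray level 0..levels-1."""
    counts = [0] * levels
    for row in gray_image:
        for value in row:
            counts[value] += 1
    return counts


def transpose(matrix):
    """Swap the rows and columns of a list-of-lists matrix."""
    return [list(col) for col in zip(*matrix)]


# Hypothetical 2 x 3 grayscale image, for illustration only.
gray = [
    [0, 0, 255],
    [128, 255, 255],
]
hist = gray_histogram(gray)

# Store the histogram as a 256 x 1 column, then transpose it into a
# 1 x 256 row so that each image can become one row of a feature matrix.
column = [[count] for count in hist]
row = transpose(column)
print(len(column), len(row), len(row[0]))
```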
b. Results

In this section, the system that has been built is tested to determine the capability of the theories used, observe system performance, and identify any obstacles that arise. Evaluation is done by comparing the accuracy of the classification results. Testing is carried out 50 times for each class, giving 200 test results. Accuracy is measured for K = 1, 3, 5, 7, and 9. The classification results for the different values of K are as follows.

K=1
Class           Successes   Percentage
Envy            45          90%
Fuji RRT        47          94%
Malang Cherry   50          100%
Royal Gala      38          76%

K=3
Class           Successes   Percentage
Envy            44          88%
Fuji RRT        46          92%
Malang Cherry   50          100%
Royal Gala      36          72%

K=5
Class           Successes   Percentage
Envy            43          86%
Fuji RRT        47          94%
Malang Cherry   50          100%
Royal Gala      35          70%

K=9
Class           Successes   Percentage
Envy            39          78%
Fuji RRT        48          96%
Malang Cherry   50          100%
Royal Gala      35          70%

Accuracy (success rate)
No.  Test Data          k=1    k=3    k=5     k=7    k=9
1.   Envy               90%    88%    86%     80%    78%
2.   Fuji RRT           94%    92%    94%     94%    96%
3.   Malang Cherry      100%   100%   100%    100%   100%
4.   Royal Gala         76%    72%    70%     70%    70%
     Average            90%    88%    87.5%   86%    86%
     Overall accuracy   87.5%

Computing time is tested to find out how long the system takes to complete the process up to the point where it obtains the classification results. In this test, the computing time is measured for each value of K.

No.  Condition   Computational Time
1.   K=1         26.11 seconds
2.   K=3         23.40 seconds
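The accuracy figures in this section come from running K-NN with several values of K. As a rough sketch, and not the authors' MATLAB implementation, steps (a) to (e) of Section II.A combined with a leave-one-out evaluation could look like this in Python. The two-dimensional feature vectors, the labels, and all function names are made up for illustration; in the study itself the features would come from the image histograms.

```python
from collections import Counter


def squared_euclidean(u, v):
    """Step b: squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def knn_predict(query, train_x, train_y, k):
    """Steps b to e: label `query` by majority vote of its k nearest neighbours."""
    distances = [(squared_euclidean(query, x), y) for x, y in zip(train_x, train_y)]
    distances.sort(key=lambda pair: pair[0])         # step c: ascending order
    nearest = [label for _, label in distances[:k]]  # step d: k closest labels
    # Step e: most common label. On a tie, the label that appears first
    # among the nearest neighbours (the closer one) is returned.
    return Counter(nearest).most_common(1)[0][0]


def leave_one_out_accuracy(xs, ys, k):
    """Classify every sample using all the other samples as training data."""
    correct = 0
    for i in range(len(xs)):
        rest_x = xs[:i] + xs[i + 1:]
        rest_y = ys[:i] + ys[i + 1:]
        if knn_predict(xs[i], rest_x, rest_y, k) == ys[i]:
            correct += 1
    return correct / len(xs)


# Made-up 2-D feature vectors for two classes, for illustration only.
features = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (5.0, 5.0), (5.1, 4.8), (4.9, 5.2)]
labels = ["Envy", "Envy", "Envy", "Fuji RRT", "Fuji RRT", "Fuji RRT"]

for k in (1, 3):  # step a: choose K
    print(k, leave_one_out_accuracy(features, labels, k))
```

Using the squared distance avoids a square root for every comparison and does not change the ordering of the neighbours, since the square root is monotonic.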
c. Discussion

The apple classification system was implemented using the K-NN method, feature extraction, image acquisition, and leave-one-out, starting with the determination of

The classification program, built with the K-Nearest Neighbor (K-NN) algorithm, succeeded in classifying Envy, Fuji RRT, Malang Cherry, and Royal Gala apples using the leave-one-out method and color characteristics. Based on the tests conducted, the accuracy is 90% for K = 1, 88% for K = 3, 87.5% for K = 5, 86% for K = 7, and 86% for K = 9. It can therefore be concluded that the optimal value of K in these tests is 1, with 90% accuracy.

The classification of apple images using the K-Nearest Neighbor (K-NN) algorithm on a dataset of 200 apple images, consisting of 50 images of Envy apples, 50 images of Fuji RRT apples, 50 images of Malang Cherry apples, and 50 images of Royal Gala apples, obtains an overall accuracy of 87.5%.

The computing time needed by the system, from detection through to classification, totals 150.64 seconds, with an average of 30.12 seconds per value of K.

It can therefore be concluded that the K-NN algorithm combined with histogram feature extraction can be applied quite well to the classification of apple types.

Future studies are expected to identify more types of apples than previous studies. Testing with more training data is expected to increase accuracy in the classification process. Images should also be recognizable under different lighting, tilt, and capture distances.

In research on feature extraction for the classification of apple types, several things need to be considered for future improvement, namely using the same background for all images so that feature extraction works more optimally. Data collection also needs an adequate camera angle and adequate lighting.

This system can be developed further using other methods to classify types of apples better.

REFERENCES

[7] RT Sari, "Comparison of Backpropagation Artificial Neural Network Methods and Matching Algorithms in Identifying Tomato Maturity Based on RGB Color Characteristics," 2013.
[8] MI Sikki, "Face Recognition Using K - Nearest Neighbor with Wavelet Transform Process," Paradigm, vol. 10, p. 2, 2009.
[9] Leidiyana, 2010, "Application of K-Nearest Neighbor Algorithm for Determining Risk of Motorized Vehicle Ownership Credit", Computer Science Research Journal, STMIK Nusa Mandiri.
[10] Avelita. B, 2013, "K-Nearest Neighbor Classification", from www.academia.edu: https://www.academia.edu/9131959/A_Klasifikasi_K-Nearest_Neighbor, Accessed on November 20, 2018.
[11] Santoso. B, 2007, "Data Mining Techniques for Using Data for Business Purposes (1 ed).", Yogyakarta: Graha Ilmu.
[12] Agus. Pratondo, Chee-Kong. Chui, Sim-Heng. Ong, 2016, "Integrating Machine Learning with Region-Based Active Contour Models in Medical Image Segmentation", National University of Singapore.
[13] Yofianto. Evan, 2010, "Book TA: K-Nearest Neighbor (K-NN)", http://kuliahinformatika.wordpress.com/2010/02/13/buku-ta-k-nearest-neighbor-knn/, Accessed on November 20, 2018.
[14] Informatics, 2017, "K-Nearest Neighbor (K-NN) Algorithm", https://informatikalogi.com/algoritma-k-nn-k-nearest-neighbor/, Accessed on November 20, 2018.
[15] RS Bahri and I. Maliki, 2012, "Comparison of Template Matching Algorithms and Feature Extraction in Optical Character Recognition", Journal of Computers and Informatics.
[16] I Nyoman Gede Arya, 2016, "Histogram Equalization and Gaussian Filtering Method", Makassar: Indonesian Muslim University.
[17] Saprina. Mamase, Joko. Lianto. Buliali, 2015, "K-Means Hybrid Method and Generalized Regression Neural Network for Traffic Flow Prediction", Informatics Engineering: Ten November Institute of Technology.
[18] Yuliana. Hermawan, 2012, "Basic Theory of Estimating Classifier Accuracy in the Holdout Method".
[19] M. Reza. Faisal, 2013, "Learning Science Data Series: Classification with R Programming Language".