R.S.M. Farook, H. Ali, A. Harun, Ndzi. D. L., A. Y. M. Shakaff, Mahmad Nor Jaafar, Z. Husin, A.H.A. Aziz
ABSTRACT
Harumanis mango (Mangifera indica) is known as one of the best tropical table fruits, due to its aroma and sweetness. The Harumanis mango cultivar is included in the national agenda as a specialty fruit from Perlis, Malaysia for the world. Despite overwhelming demand both locally in Malaysia and internationally, the fruit supply never meets the demand. Mango flowering stem prediction is important as one of the factors used to predict mango yield in order to implement effective forward marketing. Forward marketing is a contract signed between supplier and client that fixes the amount and the price of a future delivery, based on the predicted yield. In this paper, machine learning techniques are used to predict the flowering tree branches, which could in turn be used to predict yield in mango trees. Results show that machine learning techniques could be used to predict the flowering branches.

Categories and Subject Descriptors
I.5.2 [Computing Methodologies]: Classifier Design and Analysis, Pattern Analysis.

General Terms
Algorithms, Performance.

Keywords
Machine learning; mango flowering stem prediction; soft computing
1. INTRODUCTION
Harumanis mango is one of the fruits with high economic demand and export potential for Malaysia, especially for the state of Perlis. Perlis exported 3.1 metric tons of Harumanis mango to Japan in 2010 and has targeted an increase in exports to 100 metric tons by 2020. The Harumanis mango tree bears fruit once a year, and its reproductive phase usually starts in January and ends around June. This type of mango is highly sensitive to climate and only grows in Perlis and part of Surabaya in Indonesia. It requires a significant period of dry weather to initiate flowering, and the productive phase can be significantly affected by changes in weather. Although it does grow in Surabaya, the variety that grows in Perlis is often highly valued for export and therefore attracts high foreign earnings. Demand outstrips supply most of the time, and there is a need to study and understand the yield cycle in order to accurately predict supply.

Research Notes in Information Science (RNIS)
Volume 13, May 2013
doi:10.4156/rnis.vol13.10
Harumanis Mango Flowering Stem Prediction using Machine Learning Techniques

The growth and reproductive phases of Harumanis mango in Perlis, and the months associated with each phase, are illustrated in Figure 1. The vegetative growth period runs approximately from July to December. December and January are considered the pre-flowering phase, in which the flower induction process is stimulated. January to February is the period when the flowers grow and bloom. Fruit bearing occurs from March to April, leading to harvest from May to June.

[...] that include machine learning techniques in agricultural domains for yield prediction [1,5], crop classification, crop disease detection [19], and management and advisory expert systems [7,8,9,13,14,20].

Machine learning is about learning structures from data. Machine learning techniques can be used to perform classification and prediction for future observations. Classification is the task of assigning objects to one of several predefined classes in order to create a classification model, whereas prediction is where the classification model is used to label new observations. A classification technique such as the k-NN classifier, a rule-based classifier or a naïve Bayes classifier employs a learning algorithm to find a model that best fits the relationship between the attributes and the classification categories.
Attribute                        Type
Lysimeter                        Nominal
Length of first whorl            Continuous
Length of second whorl           Continuous
Length of third whorl            Continuous
Diameter of the third whorl      Continuous
Stem state                       Categorical

Five attributes have been used in this research: the lysimeter, which describes the type of lysimeter that the tree is planted in; the lengths of the first, second and third stem whorls (in mm); and the diameter of the third whorl.

The lysimeter feature is assigned a value between 1 and 3, which represents the root zone of the mango tree. The mango trees are planted in 3 different root-zone conditions: micro-lysimeter (1), lysimeter (2) and unrestricted root zone (3). The micro-lysimeter is 50 cm deep and 50 cm in diameter, while the lysimeter is 0.75 m deep and 1.5 m in diameter. Mango trees with unrestricted root zones are planted directly into the ground.

Figure 2. The typical mango terminal stems displaying the three whorls from which the measurements are taken. The whorls represent the termination of each previous flush of vegetative growth.

The machine learning techniques used in this work are k-NN, naïve Bayes, SVM, CAT and RF. The k-NN algorithm classifies a new observation based on the labels of its closest neighbors. In this technique, the distance (similarity) between the test example and the examples in the training set is used to determine the nearest-neighbor list. It is important to choose an appropriate value of k (the number of neighbor instances to be compared). A small k value might make the classifier susceptible to overfitting because of noise in the training data. On the other hand, a large k value might cause misclassification, because examples located far from the neighborhood of the test example will also be included in the classification decision.

The k-NN models use the Euclidean distance metric, shown in Equation 1, to find the nearest neighbors:
$$d = \sqrt{\sum_{i} (x_i - y_i)^2}$$
(Equation 1)

5: $y = \arg\max_{v} \sum_{(x_i, y_i) \in D_{x^*}} I(v = y_i)$
6: end
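
The Euclidean distance of Equation 1 and the majority vote of step 5 can be sketched in Python as follows. This is a minimal illustration, not the implementation used in the study: the function names `euclidean` and `knn_predict`, the attribute ordering and the stem measurements are all hypothetical, and the lysimeter attribute is left out to keep the example short.

```python
import math
from collections import Counter


def euclidean(x, y):
    """Equation 1: square root of the summed squared attribute differences."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


def knn_predict(train, query, k):
    """Classify `query` by majority vote among its k nearest training examples.

    `train` is a list of (attribute_vector, label) pairs.
    """
    # Sort the training examples by distance to the query and keep the k closest.
    neighbors = sorted(train, key=lambda pair: euclidean(pair[0], query))[:k]
    # Step 5: choose the label v that matches the largest number of neighbor labels.
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical stems (made-up values, not data from the study):
# (first, second, third whorl length in mm, third whorl diameter in mm)
train = [
    ((120.0, 110.0, 100.0, 9.0), "flowering"),
    ((125.0, 115.0, 105.0, 9.5), "flowering"),
    ((118.0, 112.0, 98.0, 8.8), "flowering"),
    ((60.0, 55.0, 50.0, 5.0), "vegetative"),
    ((65.0, 58.0, 52.0, 5.2), "vegetative"),
]

prediction = knn_predict(train, (122.0, 111.0, 101.0, 9.1), k=3)
```

With an even k, two labels can receive the same number of votes; in this sketch the tie is resolved by whichever label was counted first, which is only one of several reasonable choices.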
$$v = \arg\max_{v_j \in V} P(v_j) \prod_{i=1}^{n} P(a_i \mid v_j)$$
(Equation 2)

$$p_j = P(v_j) \prod_{i=1}^{n} P(a_i \mid v_j)$$
(Equation 3)

which, after normalization so that the sum of the $p_j$ is equal to 1, represents the class probabilities. The class probabilities and conditional probabilities (priors) in the above formulae are estimated from the training data: the class probability is equal to the relative class frequency, while the conditional probability of an attribute value given the class is computed as the proportion of instances whose i-th attribute equals $a_i$ among the instances that belong to class $v_j$.

Relative frequency is used when computing the prior conditional probabilities. If the total number of training examples is $n$ and $n_c$ is the number of training examples that satisfy the specific condition, the relative frequency corresponding to the probability is

$$P = \frac{n_c}{n}$$
(Equation 4)

SVM [17] is one of the techniques most widely used in classification problems. SVM separates the classes with the hyperplane that maximizes the margin, that is, the distance from the separating hyperplane to the nearest point in the data set.

CAT is an algorithm that recursively splits the training instances and builds a tree consisting of a root node, internal nodes and leaf nodes as a model for classifying the test examples.

RF builds several classification trees on a data set and combines the predictions from all the trees. A classification tree is fitted to each bootstrap sample drawn from the data. At each node, a small number of randomly selected variables are made available for binary partitioning. In a random forest, the importance of each variable is also measured.

The classifier models have been built with these machine learning techniques and tested using 10-fold cross-validation. The k-NN technique has been applied to build a model that classifies flowering and vegetative branches, using the Euclidean metric as the learning metric. The classification accuracy has been recorded on a scale from 0 to 1, where 0 is the worst classification and 1 is the best. Two learning (distance) metrics for k-NN, Euclidean and Hamming, are compared to find the one better suited to the classification model. The results are displayed in Figure 4. Since the Euclidean metric shows better learning ability, it has been used throughout the training and testing of the data.

The other techniques, naïve Bayes, SVM, CAT and RF, have also been used to build classifier models, which were tested and compared in order to report the technique that best predicts the flowering stems. The results are displayed and discussed in the following section.

3. RESULT AND DISCUSSION
Figure 3 displays the classification accuracy of the k-NN technique as the number of neighbors varies from 1 to 10, using 10-fold cross-validation testing. The highest classification accuracy is achieved using the Euclidean metric for k values from 1 to 7, which is
0.6794. The accuracy decreases for k from 8 to 10, with values from 0.6711 to 0.6589. On the other hand, the classification accuracy of the Hamming learning metric is lower than that of the Euclidean metric; its highest accuracy, 0.6534, is achieved at k = 9.

The SVM and classification tree classifier models outperform the other methods applied in this study in predicting both the flowering and the vegetative stems, with accuracy levels of more than 70%.
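
The kind of comparison reported above, k-NN accuracy for k from 1 to 10 under two distance metrics with 10-fold cross-validation, could be set up along the following lines. This is a hedged sketch rather than the experimental code of the study: the helper names (`cross_val_accuracy`, `knn_predict`), the fold assignment scheme and the synthetic data generator are assumptions made for illustration, and the generated values are not the measured stems, so the accuracies it produces do not correspond to the reported figures.

```python
import math
import random
from collections import Counter


def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def hamming(x, y):
    # Number of attribute positions whose values differ.
    return sum(a != b for a, b in zip(x, y))


def knn_predict(train, query, k, metric):
    neighbors = sorted(train, key=lambda pair: metric(pair[0], query))[:k]
    return Counter(label for _, label in neighbors).most_common(1)[0][0]


def cross_val_accuracy(data, k, metric, folds=10, seed=0):
    """Mean k-NN accuracy over `folds` disjoint test folds."""
    rows = data[:]
    random.Random(seed).shuffle(rows)
    scores = []
    for f in range(folds):
        test = rows[f::folds]  # rows whose index modulo `folds` equals f
        train = [r for i, r in enumerate(rows) if i % folds != f]
        correct = sum(knn_predict(train, x, k, metric) == y for x, y in test)
        scores.append(correct / len(test))
    return sum(scores) / folds


# Synthetic stand-in data (an assumption, NOT the measurements of the study):
# (lysimeter code, three whorl lengths, third whorl diameter) with a class label.
rng = random.Random(42)
data = []
for _ in range(100):
    label = rng.choice(["flowering", "vegetative"])
    centre = 110.0 if label == "flowering" else 70.0
    features = (
        rng.randint(1, 3),
        rng.gauss(centre, 15.0),
        rng.gauss(centre, 15.0),
        rng.gauss(centre, 15.0),
        rng.gauss(centre / 12.0, 1.0),
    )
    data.append((features, label))

# Accuracy for every (metric, k) combination, k = 1..10.
results = {
    (name, k): cross_val_accuracy(data, k, metric)
    for name, metric in (("euclidean", euclidean), ("hamming", hamming))
    for k in range(1, 11)
}
```

Printing `results` gives one accuracy per metric and k, which is the shape of data behind a plot such as Figure 3. Note that the Hamming metric only counts mismatching attribute values, so on continuous measurements it carries little information unless the values are discretized first.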
Figure 3. The Classification Accuracy using k-NN (k = 1 to 10) and Different Learning Metrics

4. CONCLUSION
The development of accurate yield prediction methods is invaluable to both suppliers and importers. This is even more critical for high-value crops, where such predictions can help communities and government to plan expenditure. The results presented in this paper show that the SVM and classification tree classifier models outperform the other methods tested in this study in predicting the flowering stems. The results also demonstrate that machine learning techniques could be used to classify flowering and non-flowering stems. This classification algorithm can be used in a decision support system that could predict tree yield every season using biotic factors.
[7] [...] estimation of mango (Mangifera indica L. cv. Chok Anan) fruit yields under different irrigation regimes. Agricultural Water Management, (2012).
[8] Hernández-Sánchez, C., Luis, G., Moreno, I., et al. Differentiation of mangoes (Magnifera indica L.) conventional and organically cultivated according to their mineral content by using support vector machines. Talanta 97, (2012), 325-30.
[9] Hu, J., Li, D., Duan, Q., Han, Y., Chen, G., and Si, X. Fish species classification by color, texture and multi-class support vector machine using computer vision. Computers and Electronics in Agriculture 88, (2012), 133-140.
[10] Núñez-Elisea, R. and Davenport, T.L. Requirements for mature leaves during floral induction and floral transition in developing shoots of mango. Acta Hortic. 296, (1992), 33-37.
[11] Núñez-Elisea, R. and Davenport, T.L. Flowering of mango trees in containers as influenced by seasonal temperature and water stress. Scientia Horticulturae 58, 1-2 (1994), 57-66.
[12] Núñez-Elisea, R. Effect of leaf age, duration of cool temperature treatment, and photoperiod on bud dormancy release and floral initiation in mango. Scientia Horticulturae 62, 1-2 (1995), 63-73.
[13] Papageorgiou, E.I., Markinos, A.T., and Gemtos, T.A. Fuzzy cognitive map based approach for predicting yield in cotton crop production as a basis for decision support system in precision agriculture application. Applied Soft Computing Journal 11, 4 (2011), 3643-3657.
[14] Pomar, J. and Pomar, C. A knowledge-based decision support system to improve sow farm productivity. Expert Systems with Applications 29, 1 (2005), 33-40.
[15] Ramírez, F. and Davenport, T.L. Mango (Mangifera indica L.) flowering physiology. Scientia Horticulturae 126, 2 (2010), 65-72.
[16] Tan, P.-N., Steinbach, M., and Kumar, V. Introduction to Data Mining. Addison Wesley, 2006.
[17] Vapnik, V.N. Statistical Learning Theory. John Wiley & Sons, Inc., 1998.
[18] Whiley, A.W. Environmental effects on phenology and physiology of mango - a review. Acta Hortic. 341, (1993), 168-176.
[19] Yang, C.-C., Prasher, S.O., Landry, J.-A., and Ramaswamy, H.S. Development of a herbicide application map using artificial neural networks and fuzzy logic. Agricultural Systems 76, 2 (2003), 561-574.
[20] Zheng, H. and Lu, H. A least-squares support vector machine (LS-SVM) based on fractal analysis and CIELab parameters for the detection of browning degree on mango (Mangifera indica L.). Computers and Electronics in Agriculture 83, (2012), 47-51.