
Harumanis Mango Flowering Stem Prediction using Machine Learning Techniques

R.S.M. Farook, H. Ali, A. Harun, Ndzi. D. L., A. Y. M. Shakaff, Mahmad Nor Jaafar, Z. Husin, A.H.A. Aziz

R.S.M. Farook (a,*), H. Ali (b), A. Harun (c), Ndzi. D. L. (d), A. Y. M. Shakaff (c),
Mahmad Nor Jaafar (e), Z. Husin (a), A.H.A. Aziz (a)

(a) School of Computer and Communication Engineering, University Malaysia Perlis (UNIMAP), Perlis, Malaysia,
Block A, Kompleks Pusat Pengajian Seberang Ramai
No. 12 & 14, Jalan Satu, Taman Seberang Jaya Fasa 3
Seberang Ramai, 02000 Kuala Perlis
Perlis Darul Sunnah
Telephone Number: 604-9851654
Email: rohani@unimap.edu.my
Fax Number: 604-9851695
(b) Information Technology Department, Polytechnic Tuanku Syed Sirajuddin (PTSS), Perlis, Malaysia
(c) School of Mechatronic Engineering, University Malaysia Perlis (UNIMAP), Perlis, Malaysia
(d) School of Engineering, University of Portsmouth, Portsmouth, UK
(e) Agrotechnology Research Station, University Malaysia Perlis (UNIMAP), Perlis, Malaysia
rohassanie@yahoo.com, David.ndzi@port.ac.uk
{ zulhusin, hallis, aliyeon, mahmad }@unimap.edu.my

ABSTRACT
Harumanis Mango (Mangifera indica) is known as one of the best tropical table fruits, due to its
aroma and sweetness. The Harumanis mango cultivar is included in the national agenda as a specialty
fruit from Perlis, Malaysia for the world. Despite overwhelming demand both locally in Malaysia and
internationally, the fruit supply never meets the demand. Mango flowering stem prediction is
important as one of the factors used to predict mango yield in order to implement effective forward
marketing. Forward marketing is a contract signed between supplier and client that fixes the amount
and the price of a future delivery, based on the predicted yield. In this paper, machine learning
techniques are used to predict the flowering tree branches, which could in turn be used to predict
yield in mango trees. Results show that machine learning techniques could be used to predict the
flowering branches.

Categories and Subject Descriptors
I.5.2 [Computing Methodologies]: Classifier Design and Analysis, Pattern Analysis.

General Terms
Algorithms, Performance.

Keywords
Machine learning; mango flowering stem prediction; soft computing
Research Notes in Information Science (RNIS), Volume 13, May 2013
doi:10.4156/rnis.vol13.10

1. INTRODUCTION
Harumanis mango is one of the fruits with high economic demand and potential for Malaysia's export
business, especially for the state of Perlis. Perlis exported 3.1 metric tons of Harumanis mango to
Japan in 2010 and has targeted an increase in export demand to 100 metric tons by 2020. The
Harumanis mango tree bears fruit once a year, and its reproductive phase usually starts in January
and ends around June. This type of mango is highly sensitive to the climate and only grows in Perlis
and part of Surabaya in Indonesia. It requires a significant dry weather period to initiate
flowering, and the productive phase can be significantly affected by changes in weather. Although it
does grow in Surabaya, the variety that grows in Perlis is often highly valued for export and
therefore attracts high foreign earnings. The demand outstrips supply most of the time, and there is
a need to study and understand the yield cycle in order to accurately predict supply.

Figure 1 depicts the growth and reproductive phases of Harumanis mango in Perlis by month. The
vegetative growth period runs approximately from July to December. December and January are
considered the pre-flowering phase, in which the flower induction process is stimulated. January to
February is the period when the flowers grow and bloom. Fruit bearing occurs during March and April,
leading to harvest from May to June.

Figure 1. Harumanis Mango Growth and Reproductive Phases

The pre-flowering and flowering phases are identified as important stages in the plant reproductive
physiology. The Harumanis mango flowering induction event can be influenced by a few factors such as
pruning, defoliation and nitrogen fertilizer application [3,4,10]. Since Harumanis mango trees grow
in Malaysia, a tropical country, flowering induction is influenced by climatic factors [2,6,18] and
biotic factors [10,12,15]. In tropical climates, it is important that the terminal stems of the
trees are allowed to rest after the previous vegetative growth in order to produce reproductive
shoots [11]. This resting time is necessary to give the stem ample time to mature and to grow
sufficient whorls, length and diameter.

Mango yield predictions are essential to enable forward marketing contracts between Malaysia and
mango importing countries such as Japan. The forward marketing contract includes the specific mango
quantity in tons to be delivered at specific times. A Harumanis mango exporter needs to know the
approximate yield before agreeing to the terms of the contract. The yield can be approximated once
the farmer has predicted the possible flowering stems.

The application of machine learning techniques in agriculture is a relatively new approach for
classification and yield prediction. A few research studies apply machine learning techniques in
agricultural domains for yield prediction [1,5], crop classification, crop disease detection [19],
and management and advisory expert systems [7,8,9,13,14,20].

Machine learning is about learning structure from data. Machine learning techniques can be used to
perform classification and prediction for future observations. Classification is the task of
assigning objects to one of several predefined classes, for which a classification model is created,
whereas prediction applies the classification model to new observations. A classification technique
such as the k-NN classifier, a rule-based classifier or the naïve Bayes classifier employs a
learning algorithm to find a model that best fits the relationship between the attributes and the
classification categories.

A training set consists of data whose classes are known and is used to build a classification model.
The classification model is later applied to a test data set to test the classification accuracy of
the model. Evaluation of the classification model is based on the counts of test records correctly
and incorrectly classified by the model. The model performance can be summarized using performance
metrics that indicate the level of accuracy.

The performance of the model can be evaluated using several methods such as the Holdout Method,
Random Subsampling, Cross Validation and Bootstrap [16]. A Cross Validation technique is used here
to evaluate the performance of the classification models.

In this paper, machine learning techniques are applied to identify the possible flowering tree
branches using biotic factors. The performance of five different classifiers in predicting the
possible flowering branches is compared. The classifiers used are k-Nearest Neighbour (k-NN), Naïve
Bayes, Support Vector Machine (SVM), Classification Trees (CAT) and Random Forest (RF).

The rest of the paper is outlined as follows: Section 2 describes the data sets and methods. Results
and discussion are presented in Section 3, and the paper ends with the conclusion in Section 4.

2. METHODS
In this paper, the data from the Harumanis Greenhouse at the Institute of Agrotechnology, University
Malaysia Perlis are used. The biotic data consist of 254 stems of generative and vegetative flushes.
The attributes and the data types are given in Table 1:


Table 1. The Attributes and the Data Type of the Biotic Factors of Harumanis Branches

Attributes                       Data Type
Lysimeter                        Nominal
Length of first whorl            Continuous
Length of second whorl           Continuous
Length of third whorl            Continuous
Diameter of the third whorl      Continuous
Stem State                       Categorical

Five attributes have been used in this research. These are the lysimeter, which describes the type
of lysimeter that the trees are planted in, the lengths of the first, second and third stem whorls
(in mm) and the diameter of the third whorl, as listed in Table 1.

The lysimeter feature is assigned a value from 1 to 3, which represents the root zone of the mango
tree. The mango trees are planted in 3 different lysimeter sizes: micro-lysimeter (1), lysimeter (2)
and unrestricted root zone (3). The micro-lysimeter is 50 cm deep with a diameter of 50 cm, while the
lysimeter is 0.75 m deep and 1.5 m in diameter. Mango trees with unrestricted root zones are planted
directly into the ground.

Stem State describes the state of the stem. It is assigned a state value of flowering or vegetative.
Stem State is the class that the machine learning techniques learn during the training process in
order to build an appropriate model. The test data set is used to validate the models developed to
predict the Stem State.

Figure 2 displays the features of the mango stem that are measured and used to classify the flowering
stems. The lengths of the stem whorls are measured and recorded accordingly.

Figure 2. The typical mango terminal stems displaying the three whorls where the measurements are
taken from. The whorls represent the termination of each previous flush of vegetative growth.

The machine learning techniques used in this work are k-NN, Naïve Bayes, SVM, CAT and RF. The k-NN
algorithm classifies a new observation based on the labels of its closest neighbors. In this
technique the distance (similarity) between the test instance and each training instance is used to
determine the nearest-neighbor list. Choosing an appropriate k (the number of neighbor instances to
be compared) is important. A small k value might make the classifier susceptible to overfitting
because of noise in the training data. On the other hand, a large k value might cause
misclassification because neighbors that are located far from the neighborhood will also be included
in the classification decision.

The k-NN models use the Euclidean distance metric shown in Equation 1 to find the nearest neighbors:

$$d(x, y) = \sqrt{\sum_{i} (x_i - y_i)^2}$$   (Equation 1)

Algorithm for k-NN

Data: Training samples $D = \{x_{1:N}, c_{1:N}\}$, test point $x^*$.
Result: Class of new point $c^*$

1: Let k be the number of nearest neighbors.
2: for i = 1 to N do
3:     calculate the distance $d(x_i, x^*)$ between $x^*$ and sample $x_i$ in D;
4: end
5: find $D_{x^*}$, the set of the k training samples closest to $x^*$ (smallest distances);
6: $y = \arg\max_{v} \sum_{(x_i, y_i) \in D_{x^*}} I(v = y_i)$
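Equation 1 and the k-NN steps above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in the paper: the function names `euclidean` and `knn_predict` and the five training stems are hypothetical values chosen only to make the example runnable.

```python
import math
from collections import Counter


def euclidean(x, y):
    """Equation 1: square root of the summed squared attribute differences."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


def knn_predict(train_x, train_y, query, k):
    """Return the majority class among the k training samples closest to query."""
    # Distance from the query point to every training sample.
    distances = [(euclidean(x, query), label) for x, label in zip(train_x, train_y)]
    # Keep the k samples with the smallest distances.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Majority vote over the labels of the k nearest neighbors.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical stems (NOT data from the paper):
# (first, second, third whorl length in mm, third whorl diameter in mm)
train_x = [
    (120.0, 95.0, 80.0, 9.5),
    (118.0, 90.0, 85.0, 9.0),
    (60.0, 50.0, 40.0, 5.0),
    (65.0, 55.0, 42.0, 5.5),
    (58.0, 48.0, 39.0, 5.2),
]
train_y = ["flowering", "flowering", "vegetative", "vegetative", "vegetative"]

prediction = knn_predict(train_x, train_y, (115.0, 92.0, 82.0, 9.2), k=3)
print(prediction)
```

With k = 3 the query stem has two flowering neighbors and one vegetative neighbor, so the vote returns the flowering class. Attributes are used unscaled here; whether the paper normalized them is not stated.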


The naïve Bayesian classifier is a classification algorithm in which an instance is described by n
attributes $a_i$ (i = 1 to n), and the predicted class v from the set of possible classes V is given
as follows:

$$v = \arg\max_{v_j \in V} P(v_j) \prod_{i=1}^{n} P(a_i \mid v_j)$$   (Equation 2)

Compute a vector of elements

$$p_j = P(v_j) \prod_{i=1}^{n} P(a_i \mid v_j)$$   (Equation 3)

which, after normalization so that the sum of $p_j$ is equal to 1, represents the class
probabilities. The class probabilities and conditional probabilities (priors) in the above formulae
are estimated from the training data: the class probability is equal to the relative class frequency,
while the conditional probability of an attribute value given the class is computed as the proportion
of instances with a value of the i-th attribute equal to $a_i$ among the instances from class $v_j$.

Relative frequency is used when computing prior conditional probabilities. If n is the total number
of training examples and $n_c$ is the number of training examples that satisfy the specific
condition, the relative frequency corresponding to the probability is

$$P = \frac{n_c}{n}$$   (Equation 4)

SVM [17] is one of the techniques that is widely used in classification problems. SVM separates the
classes with the hyperplane that maximizes the margin, that is, the distance from the separating
hyperplane to the nearest point in the data set.

CAT is an algorithm that splits the training instances and builds a tree consisting of a root node,
internal nodes and leaf nodes as a model used to classify the test examples.

[Figure: classification tree structure with a root node, an internal node and leaf nodes]

RF builds several classification trees on a data set and combines the predictions from all the trees.
A classification tree is fitted to each bootstrap sample of the data. At each node, a small number of
randomly selected variables are made available for binary partitioning. In a random forest, the
importance of each variable is also measured.

The classifier models using machine learning techniques have been built and tested using 10-fold
cross validation. The k-NN technique has been applied to build a model that classifies the flowering
and vegetative branches. The classification accuracy has been recorded with values from 0 to 1, where
0 is the worst classification and 1 is the best classification. Two distance metrics, Euclidean and
Hamming, are compared to find the one that performs better in the k-NN classification model. The
results are displayed in Figure 3. Since the Euclidean metric shows better learning ability, it has
been used throughout the training and testing of the data.

The other techniques, Naïve Bayes, SVM, CAT and RF, have also been used to build classifier models,
which were tested and compared to report the technique that best predicts the flowering stems. The
results are displayed and discussed in the following section.

3. RESULT AND DISCUSSION
Figure 3 displays the classification accuracy of the k-NN technique as the number of neighbors varies
from 1 to 10, using 10-fold cross validation testing. The highest classification accuracy, 0.6794,
is achieved using the Euclidean metric for k values from 1 to 7. The accuracy decreases for k from 8
to 10, with values from 0.6711 to 0.6589. On the other hand, the classification accuracy of the
Hamming metric is lower than that of the Euclidean metric. For the Hamming metric, the highest
accuracy is achieved at k = 9, where the accuracy is 0.6534.

SVM and Classification Trees classifier models outperform the other methods applied in this study in
predicting the flowering stems and also the vegetative stems, with accuracy levels of more than 70%.
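The k-NN evaluation described in this section (k from 1 to 10, Euclidean versus Hamming distance, 10-fold cross validation) can be sketched as follows. This is an illustrative sketch only: the data come from a hypothetical generator `make_synthetic_stems`, not from the 254 measured stems, and the accuracy is pooled over all folds, which is an assumption since the paper does not state how fold results were combined. The printed numbers therefore bear no relation to the values reported above.

```python
import math
import random
from collections import Counter


def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def hamming(x, y):
    # Number of attribute positions whose values differ.
    return sum(a != b for a, b in zip(x, y))


def knn_predict(train, query, k, metric):
    """Majority class among the k rows of train closest to query under metric."""
    nearest = sorted(train, key=lambda row: metric(row[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def k_fold_indices(n_samples, n_folds, seed=0):
    """Shuffle the sample indices and deal them into n_folds disjoint folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::n_folds] for i in range(n_folds)]


def cross_val_accuracy(data, k, metric, n_folds=10):
    """Fraction of samples classified correctly when each fold is held out once."""
    correct = 0
    for fold in k_fold_indices(len(data), n_folds):
        held_out = set(fold)
        train = [row for i, row in enumerate(data) if i not in held_out]
        correct += sum(knn_predict(train, data[i][0], k, metric) == data[i][1] for i in fold)
    return correct / len(data)


def make_synthetic_stems(n=60, seed=1):
    """Hypothetical data: (lysimeter, three whorl lengths, diameter) -> stem state."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        flowering = rng.random() < 0.4
        base = 100.0 if flowering else 80.0
        features = (
            rng.choice([1, 2, 3]),
            rng.gauss(base, 15), rng.gauss(base, 15), rng.gauss(base, 15),
            rng.gauss(9.0 if flowering else 7.0, 1.5),
        )
        data.append((features, "flowering" if flowering else "vegetative"))
    return data


data = make_synthetic_stems()
results = {
    (name, k): cross_val_accuracy(data, k, metric)
    for name, metric in (("euclidean", euclidean), ("hamming", hamming))
    for k in range(1, 11)
}
for (name, k), acc in sorted(results.items()):
    print(f"{name:9s} k={k:2d} accuracy={acc:.4f}")
```

On continuous measurements the Hamming distance only registers whether two values are identical, so in this sketch it is driven almost entirely by the nominal lysimeter attribute; that is one plausible reason for a metric of this kind to trail the Euclidean distance on such data.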

Figure 3. The Classification Accuracy using k-NN (k = 1 to 10) and Different Learning Metrics

Table 2 displays the classification accuracy of the various classification models in predicting
flowering branches. The Classification Trees classifier model outperforms the others with a 77.95%
accuracy rate, followed by the SVM classifier model. The Naïve Bayes classifier model also achieves a
72.43% accuracy rate. k-NN and Random Forest achieve less than 70% accuracy.

Table 2. Classification Accuracy, Sensitivity and Specificity of the Classifier Models

Classifier Model        Classification Accuracy (%)   Sensitivity (%)   Specificity (%)
k-NN                    67.94                         29.79             82.50
Naïve Bayes             72.43                         65.96             76.25
SVM                     74.86                         85.11             68.75
Classification Trees    77.95                         76.6              78.75
Random Forest           66.17                         26.6              89.38

The sensitivity rates are also given in Table 2, where SVM outperforms the other techniques in
classifying or predicting the flowering branches with a sensitivity of 85.11%, followed by the
Classification Trees at 76.6%. The Naïve Bayes model achieves a lower sensitivity rate of 65.96%. The
k-NN and Random Forest sensitivity is very low, at 29.79% and 26.6%, respectively.

4. CONCLUSION
The development of accurate yield prediction methods is invaluable both to suppliers and to
importers. This is more critical for high-value crops, where it can help communities and government
to predict and plan expenditure. The results presented in this paper show that the SVM and
Classification Trees classifier models outperform the other methods tested in this study in
predicting the flowering stems. The results also demonstrate that machine learning techniques could
be used to classify flowering and non-flowering stems. Such a classification algorithm can be used in
a decision support system that predicts tree yield every season using biotic factors.

5. ACKNOWLEDGMENT
Special thanks to the UniMAP Agricultural Station for providing the samples for data collection. This
project is funded by UniMAP Short Grant (9001-00423), University Malaysia Perlis. Rohani S. Mohamed
Farook acknowledges the sponsorship provided by UniMAP.

6. REFERENCES
[1] Basso, B. Spatial validation of crop models for precision agriculture. Agricultural Systems 68,
2 (2001), 97-112.

[2] Chacko, E.K. Physiology of vegetative and reproductive in mango (Mangifera indica L.) trees.
Proc. 1st Australian Mango Research Workshop, (1986), 54-70.

[3] Davenport, T.L., Ying, Z., Kulkarni, V., and White, T.L. Evidence for a translocatable florigenic
promoter in mango. Scientia Horticulturae 110, 2 (2006), 150-159.

[4] Davenport, T.L. Reproductive physiology. In: Litz RE (ed), The Mango, Botany, Production and
Uses, 2008, 69-146.

[5] Elwell, D.L., Curry, R.B., and Keener, M.E. Determination of potential yield-limiting factors of
soybeans using SOYMOD/OARDC. Agricultural Systems 24, 3 (1987), 221-242.

[6] Farook, R.S.M., Aziz, A.H.A., Harun, A., et al. Data Mining on Climatic Factors for Harumanis
Mango Yield Prediction. Intelligent Systems, Modelling and Simulation, International Conference on 0,
(2012), 115-119.

[7] Fukuda, S., Spreer, W., Yasunaga, E., Yuge, K., Sardsud, V., and Müller, J. Random Forests
modelling for the estimation of mango (Mangifera indica L. cv. Chok Anan) fruit yields under
different irrigation regimes. Agricultural Water Management, (2012).

[8] Hernández-Sánchez, C., Luis, G., Moreno, I., et al. Differentiation of mangoes (Magnifera indica
L.) conventional and organically cultivated according to their mineral content by using support
vector machines. Talanta 97, (2012), 325-30.

[9] Hu, J., Li, D., Duan, Q., Han, Y., Chen, G., and Si, X. Fish species classification by color,
texture and multi-class support vector machine using computer vision. Computers and Electronics in
Agriculture 88, (2012), 133-140.

[10] Núñez-Elisea, R. and Davenport, T.L. Requirements for mature leaves during floral induction and
floral transition in developing shoots of mango. Acta Hortic. 296, (1992), 33-37.

[11] Núñez-Elisea, R. and Davenport, T.L. Flowering of mango trees in containers as influenced by
seasonal temperature and water stress. Scientia Horticulturae 58, 1-2 (1994), 57-66.

[12] Núñez-Elisea, R. Effect of leaf age, duration of cool temperature treatment, and photoperiod on
bud dormancy release and floral initiation in mango. Scientia Horticulturae 62, 1-2 (1995), 63-73.

[13] Papageorgiou, E.I., Markinos, A.T., and Gemtos, T.A. Fuzzy cognitive map based approach for
predicting yield in cotton crop production as a basis for decision support system in precision
agriculture application. Applied Soft Computing Journal 11, 4 (2011), 3643-3657.

[14] Pomar, J. and Pomar, C. A knowledge-based decision support system to improve sow farm
productivity. Expert Systems with Applications 29, 1 (2005), 33-40.

[15] Ramírez, F. and Davenport, T.L. Mango (Mangifera indica L.) flowering physiology. Scientia
Horticulturae 126, 2 (2010), 65-72.

[16] Tan, P.-N., Steinbach, M., and Kumar, V. Introduction to Data Mining. Addison Wesley, 2006.

[17] Vapnik, V.N. Statistical Learning Theory. John Wiley & Sons, Inc., 1998.

[18] Whiley, A.W. Environmental effects on phenology and physiology of mango-a review. Acta Hortic.
341, (1993), 168-176.

[19] Yang, C.-C., Prasher, S.O., Landry, J.-A., and Ramaswamy, H.S. Development of a herbicide
application map using artificial neural networks and fuzzy logic. Agricultural Systems 76, 2 (2003),
561-574.

[20] Zheng, H. and Lu, H. A least-squares support vector machine (LS-SVM) based on fractal analysis
and CIELab parameters for the detection of browning degree on mango (Mangifera indica L.). Computers
and Electronics in Agriculture 83, (2012), 47-51.
