
Disease Recognition in Sugarcane Crop Using Deep Learning

Malik Hashmat Shadab1*, Mahavir Dwivedi2, SN Omkar4, Tahir Javed3, Abdul Bakey3, Mohammad Raqib Pala3, Akshay Chakravarthy5

Indian Institute of Science, Bangalore, India.

hashmat.shadab.malik@gmail.com
mahaviredx@gmail.com
omkar@aero.iisc.ernet.in
tahirjmakhdoomi@gmail.com
mirbakey@gmail.com
raqib0027@gmail.com

Abstract. Crop disease recognition is one of the considerable concerns faced by the agricultural industry. However, recent progress in visual computing, together with improved computational hardware, has cleared the way for automated disease recognition. Results on publicly available datasets using Convolutional Neural Network (CNN) architectures have demonstrated the viability of this approach. To investigate how current state-of-the-art classification models would perform in uncontrolled conditions, as would be faced on site, we acquired a dataset of five diseases of the sugarcane plant taken from fields across different regions of Karnataka, India, captured by camera devices under different resolutions and lighting conditions. Models trained on our sugarcane dataset achieved a top accuracy of 93.40% (on the test set) and 76.40% on images collected from different trusted online sources, demonstrating the robustness of this approach in identifying complex patterns and variations found in realistic scenarios. Furthermore, to accurately localize the infected regions, we used two different types of object-detection algorithms: YOLO and Faster R-CNN. Both networks were evaluated on our dataset, achieving a top mean average precision score of 58.13% on the test set. Taking everything into account, the approach of using CNNs on a considerably diverse dataset would pave the way for automated disease recognition systems.

Keywords: Computer Vision, Deep Learning, Object Detection, Crop Diseases, Automated Plant Pathology

1 Introduction

Food is one of the mainstays of human existence. Although there has been a boom in food yields, food security is still challenged by several factors, including rising temperatures and plant diseases [1]. Plant diseases affect the agricultural yield of the country as well as the livelihoods of farmers, who contribute 80% of the total agricultural production [2]. Fortunately, plant diseases can be managed if a timely and accurate diagnosis is available.

Traditionally, plant disease diagnosis has relied on visually observing the symptoms on a sample and identifying the disease. This task requires human expertise, restricting its reach and making it difficult for farmers to access. Moreover, the large variation in symptoms sometimes makes it difficult even for experts to identify the disease. An Artificial Intelligence based system would offer valuable assistance in diagnosing the disease through image-based observation. Advances in computer vision provide an opportunity to enhance plant disease diagnosis and extend the field of computer vision into precision agriculture. Especially if such a system is affordable, it would be practical even for farmers to make a timely diagnosis of the disease and act accordingly, apart from making the work of experienced pathologists more precise and accurate.

Deep learning for computer vision, particularly object classification and detection, has made significant progress in recent years. The ImageNet Large Scale Visual Recognition Challenge (ILSVRC) [4], based on the ImageNet dataset [5], has been used as a standard benchmark for various computer vision problems. In 2012, AlexNet [7], a CNN [6] based deep learning classification model, achieved a 16.4% top-5 error rate on the ImageNet dataset, beating all other state-of-the-art classification models, which were non-CNN based. Subsequent progress in deep convolutional networks has brought the error rate down to 3.57% [7-10], illustrating the technical feasibility of a deep-learning based approach to plant disease detection and the reliability of its results. Training these networks takes a considerable amount of time, but once trained they can classify images in real time, making them suitable for consumer applications on smartphones. Smartphones in particular provide a novel avenue for deploying these systems, partly because of their computational power, advanced cameras and high-resolution displays, and partly because of their affordability. This will make it technically feasible for farmers to diagnose a plant disease from a mere picture of a plant leaf. In fact, it is estimated that around 6 billion smartphones will be in circulation by 2020 [3].

We focused our work on sugarcane, a crop that has seen a considerable increase in production, especially in India. India has become the largest producer and consumer of sugar in the world. This has been made possible by 45 million sugarcane farmers and a large agricultural workforce, constituting about 7.5% of the total rural population [11]. This indicates the importance of sugarcane both to the economy of the country and to the sustenance of a large number of sugarcane cultivators, the majority of whom operate at small scale. Unfortunately, the onset of diseases remains a threat to the large-scale production of this crop, mostly because of the lack of pathological facilities. Moreover, the majority of cultivation takes place in rural areas, and cultivators fail to obtain an accurate, timely diagnosis of diseases, eventually leading to damage to the crop. Fortunately, this problem can be addressed if an effective means of timely diagnosis is available.

In this paper, we present an effective and reliable way of diagnosing major sugarcane diseases that directly affect the crop. We present a classification report on 5 diseases (or their absence) using 2940 images with a convolutional neural network approach. We based our work on the physical characteristics and symptoms of each disease. We measure the performance of the VGG-19, ResNet-34 and ResNet-50 models based on their ability to predict the correct disease (or its absence). Going a step further, we localize the exact diseased spots in the leaf images for 4 classes, thus distinguishing the infected areas from the rest of the leaf. For this purpose, we used two powerful detection models, Faster R-CNN [12] and YOLOv3 [13]. Our results are a significant step towards an automated plant disease diagnosis system.

This paper is organized as follows. Section 2 briefly reviews related work in the field of image processing and specifically in crop disease recognition. The sugarcane dataset is introduced in Section 3. Sections 4 and 5 present the approaches followed for the classification and detection tasks respectively, as well as the achieved results. Conclusions and future work are given in Section 6.

2 Related Work

In recent years CNNs [6] have shown tremendous advancement and are being applied in different domains, including crop disease detection. Mohanty et al. [21] used two deep learning architectures, GoogLeNet [9] and AlexNet [7], on the PlantVillage dataset [22] to identify 26 diseases among 14 crops, achieving a peak test accuracy of 99.35%. Working on the same dataset, [23] reports a test accuracy of 90.4% using a VGG-16 [24] model. Another work [25] also uses a deep learning system, identifying 13 types of diseases in 5 crops using images from the internet and achieving an accuracy of up to 96.3%.

2.1 PlantVillage Dataset

The PlantVillage dataset [22] contains 54,306 images distributed among 38 classes, with disease names used as class labels. In some cases the dataset contains more than one image of the same leaf, varying in orientation. All the images were taken in a controlled laboratory environment.

3 Sugarcane Dataset

Although all the works mentioned above show prolific results, the problem with them is that either the images were downloaded from the internet [25] or they were taken in a controlled laboratory environment [21, 23], which calls into question their applicability in the real world, where we may encounter numerous variations in images. [21] also mentions that the accuracy of their work drops substantially, to 31.4%, when testing is done on images taken under conditions different from those under which the training images were taken (laboratory conditions). To address these issues, we collected a more realistic sugarcane dataset for real-world applicability.

The dataset contains 2940 images of sugarcane leaves belonging to 6 different classes (5 diseases and 1 healthy). These include the major diseases that affect the crop in India. All the images were taken in a natural environment with numerous variations. The images were taken at various cultivation fields, including the University of Agricultural Sciences, Mandya, Bangalore, and nearby farms belonging to farmers. All the images were taken using phone cameras at various angles, orientations and backgrounds, accounting for most of the variations that can appear in images taken in the real world. The dataset was collected in the company of experienced pathologists (Section 6). For localizing the infected spots on the leaves (object detection) corresponding to four of the diseases, we manually annotated the dataset. Most of the images contain multiple infected spots of varying patterns, and each spot was individually annotated with its own bounding patch. The distribution of images across the different classes is detailed in Table 1. Fig. 1 shows the classes used for classification and detection and their corresponding distributions.

Table 1. Distribution of images into different classes

S. No | Class                      | Count
1     | Cercospora Leaf Spot       | 346
2     | Helminthosporium Leaf Spot | 410
3     | Rust                       | 382
4     | Red Rot                    | 454
5     | Yellow Leaf Disease        | 420
6     | Healthy                    | 928

Fig. 1. Distribution of images into the classes used for classification (a) and detection (b). The letters represent diseases: 'H' denotes Helminthosporium Leaf Spot, 'RR' Red Rot, 'C' Cercospora, 'R' Rust and 'Y' Yellow Leaf Disease.

Fig. 2. Example leaf images from our dataset, one per class: 1) Helminthosporium Leaf Spot, 2) Red Rot, 3) Cercospora Leaf Spot, 4) Rust, 5) Yellow Leaf Disease, 6) Healthy.

4 Classification
4.1 Approach

We used three popular architectures, ResNet-50 [10], ResNet-34 [10] and VGG-19 [24], and assessed their applicability for the classification problem at hand. All of these architectures are CNN [6] based and are discussed below.

VGG-19 [24] follows a simple architecture: a set of stacked convolution layers followed by fully connected layers. Concretely, it contains 16 convolution layers and three fully connected layers, finally followed by a softmax layer. It only uses 3x3 convolutions with both stride and padding of 1. The stacks of convolution layers also contain max-pooling layers, which use a 2x2 filter and a stride of 2. It contains around 143.6 million parameters. It is appealing because of its uniform structure while still achieving high accuracy, though this comes at the cost of slow training. It uses ReLU activations throughout. This architecture was the runner-up at the ILSVRC [4] 2014 competition.

ResNets [10] introduce skip connections (or shortcuts) that pass the input from a previous layer to a later layer without any modification. This architecture emerged as the winner of ILSVRC [4] 2015 in image classification, detection and localization. In addition to using a set of stacked convolution layers finally followed by a fully connected layer, it uses skip connections. Rather than relying on the stacked layers to learn the desired underlying mapping directly, it lets these layers fit a residual mapping. If the desired underlying mapping is denoted H(x), the stacked non-linear layers fit another mapping F(x) related to H(x) as:

F(x) = H(x) - x (1)

This relies on the observation that it is easier to optimize the residual mapping than to optimize the original, unreferenced mapping. Even in the extreme case, if an identity mapping were optimal, it would be easier to push the residual to zero than to fit an identity mapping with a stack of non-linear layers. ResNets also use 3x3 convolutions. ResNet-34 and ResNet-50 are two variants of this architecture, which are the same except for the number of layers, the former having 34 and the latter 50. Both variants have considerably fewer parameters than VGG-19: ResNet-34 has 21.8 million and ResNet-50 has 25.6 million parameters.
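As a concrete illustration of equation (1), the following is a minimal PyTorch sketch of a basic residual block of the kind used in ResNet-34; the channel sizes and layer choices here are illustrative, not the exact configuration of the pretrained models we used.

```python
import torch
import torch.nn as nn

class BasicBlock(nn.Module):
    """Basic residual block: the stacked layers fit F(x) = H(x) - x, and the output is F(x) + x."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        identity = x                          # skip connection carries x through unchanged
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))       # stacked layers produce the residual F(x)
        return self.relu(out + identity)      # H(x) = F(x) + x
```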

We evaluated the performance of our models in two ways: by training the entire model in one case, and only the fully connected part in the other. Transfer learning was used in both cases, starting from weights pretrained on the ImageNet dataset. We note that the weights obtained from training only the fully connected part were used as the starting point for training the entire network. In total, we have 6 experimental configurations depending on the following parameters (a setup sketch follows the list below):

Deep Learning Architecture:
- VGG-19
- ResNet-34
- ResNet-50

Training mechanism:
- Training only the fully connected layers
- Training all layers
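As an illustration of these two training mechanisms, the sketch below uses torchvision (rather than the fastai API used for the actual experiments) to load an ImageNet-pretrained ResNet-50 and either freeze the convolutional backbone or leave all layers trainable; the 6-class head matches our dataset, and the function name is ours.

```python
import torch.nn as nn
from torchvision import models

def build_resnet50(num_classes=6, train_all_layers=False):
    """Load an ImageNet-pretrained ResNet-50 and adapt its head to our 6 classes."""
    model = models.resnet50(pretrained=True)
    if not train_all_layers:
        # "Training only the fully connected layers": freeze every pretrained parameter ...
        for param in model.parameters():
            param.requires_grad = False
    # ... then replace the final fully connected layer; its new weights are always trainable.
    model.fc = nn.Linear(model.fc.in_features, num_classes)
    return model
```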

The LR (learning rate) range test showed 0.01 to be a reasonable choice for the base learning rate of the fully connected layers in all cases. Then, using differential learning rates, the initial convolution layers used a base learning rate of 0.0001 and the later ones used 0.001. This also ensured a reasonably fair comparison between the results, as the rest of the hyperparameters were the same in all configurations. To get the best accuracies possible on these datasets, the following techniques were used while training the networks, the solver being SGDR [26] (Stochastic Gradient Descent with Restarts) in all cases:

Differential Learning Rates [27]. We first found the optimal base learning rate for the fully connected layers of the network (training only the fully connected layers) by running the learning rate range test for a few epochs, as shown in Fig. 3, and choosing a value in the optimal LR (learning rate) range. 1e-2 turned out to be a reasonable value in all cases. The base learning rates of the network are set so that they increase by an order of magnitude from the initial to the later convolution layers, and again from the later convolution layers to the fully connected part. Using different learning rates for different parts of the network allows us to control the rate at which the weights of each part change during training.
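A minimal sketch of differential learning rates using PyTorch optimizer parameter groups, with the base rates quoted above (1e-4 for early convolution layers, 1e-3 for later ones, 1e-2 for the fully connected part); the exact split of ResNet-34 into "early" and "later" groups is an illustrative assumption.

```python
import torch
from torchvision import models

model = models.resnet34(pretrained=True)

# Illustrative grouping of ResNet-34 layers for differential learning rates.
early_convs = (list(model.conv1.parameters()) + list(model.bn1.parameters()) +
               list(model.layer1.parameters()) + list(model.layer2.parameters()))
later_convs = list(model.layer3.parameters()) + list(model.layer4.parameters())
head = list(model.fc.parameters())

optimizer = torch.optim.SGD([
    {"params": early_convs, "lr": 1e-4},   # earliest layers change slowest
    {"params": later_convs, "lr": 1e-3},   # later convolution layers, one order of magnitude higher
    {"params": head,        "lr": 1e-2},   # fully connected part uses the base LR from the range test
], lr=1e-2, momentum=0.9)
```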

Cyclic Learning Rate [26]. Starting from its base value, the learning rate is cosine-annealed to zero, completing one cycle in one epoch. Cycling the learning rate allows the network to escape sharp minima and settle into more robust ones, while annealing the learning rate allows it to move quickly from the initial parameters to a range of good parameters and then perform a more thorough search of the weight space as it approaches a minimum, accounting for a substantial increase in the performance of the network.
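A sketch of this schedule using PyTorch's CosineAnnealingWarmRestarts; stepping the scheduler with a fractional epoch value makes one cosine cycle (from the base learning rate down to zero) span exactly one epoch, approximating the SGDR policy described above. The model, loss and batch count here are placeholders.

```python
import torch
import torch.nn as nn
from torch.optim.lr_scheduler import CosineAnnealingWarmRestarts
from torchvision import models

model = models.resnet34(pretrained=True)
model.fc = nn.Linear(model.fc.in_features, 6)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2, momentum=0.9)

batches_per_epoch = 100                                  # placeholder; in practice len(train_loader)
scheduler = CosineAnnealingWarmRestarts(optimizer, T_0=1, eta_min=0.0)

for epoch in range(15):
    for i in range(batches_per_epoch):
        # ... forward pass on one mini-batch, loss.backward(), optimizer.step() ...
        # Fractional epochs make the learning rate anneal to zero within each epoch
        # and restart from the base value at the start of the next one.
        scheduler.step(epoch + i / batches_per_epoch)
```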

Test Time Augmentation. At test time we performed a 1.1x zoom of the original image and took 5 random crops of it. The network was evaluated on these augmented images and generalized exceptionally well, as shown by the test accuracies in Tables 2 and 3.
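A sketch of this test-time augmentation using torchvision transforms; averaging the softmax outputs over the five crops is our assumption about how the augmented predictions are combined, and the input size of 224 is illustrative.

```python
import torch
import torchvision.transforms as T
from PIL import Image

def predict_with_tta(model, image_path, image_size=224, n_crops=5):
    """Zoom the image by ~1.1x, take several random crops and average the class probabilities."""
    zoom_and_crop = T.Compose([
        T.Resize(int(image_size * 1.1)),   # ~1.1x zoom relative to the network input size
        T.RandomCrop(image_size),
        T.ToTensor(),
    ])
    image = Image.open(image_path).convert("RGB")
    batch = torch.stack([zoom_and_crop(image) for _ in range(n_crops)])  # 5 random crops
    model.eval()
    with torch.no_grad():
        probs = torch.softmax(model(batch), dim=1)
    return probs.mean(dim=0)               # averaged probabilities over the augmented views
```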

Fig. 3. Learning rate range test for ResNet-34 (shows 1e-2 to be a reasonable value)

Solver: Stochastic Gradient Descent with Restarts (SGDR) [26]

Learning Rate Policy: Cyclic learning rate, cosine-annealed from the base learning rate to zero (one cycle per epoch)

All of the above training was done using fastai, an open-source deep learning library.
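For reference, a minimal sketch of the overall workflow in the fastai v1 API (the exact calls depend on the library version, fit_one_cycle's schedule is not identical to the SGDR schedule described above, and the dataset path and image size are placeholders):

```python
from fastai.vision import *   # fastai v1 idiom; brings in ImageDataBunch, cnn_learner, etc.

# Images arranged in one folder per class; 80:20 train/validation split.
data = (ImageDataBunch.from_folder("sugarcane_dataset/", valid_pct=0.2,
                                   ds_tfms=get_transforms(), size=224)
        .normalize(imagenet_stats))

learn = cnn_learner(data, models.resnet50, metrics=accuracy)   # pretrained, backbone frozen

learn.lr_find()                                      # learning rate range test (Fig. 3)
learn.fit_one_cycle(10, 1e-2)                        # stage 1: train only the fully connected head
learn.unfreeze()                                     # stage 2: train all layers ...
learn.fit_one_cycle(15, max_lr=slice(1e-4, 1e-2))    # ... with differential learning rates
```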

4.2 Results

PlantVillage Dataset. Since this dataset is publicly available, we also carried out classification on it, training the three well-known CNN architectures VGG-19 [24], ResNet-34 [10] and ResNet-50 [10] and achieving exemplary accuracies of up to 99.84% at test time. All the results are shown in Table 2; they are taken from the best epoch. As before, the weights obtained from training only the fully connected part were used as the starting point for training the entire network. Loss plots for all networks are given in Fig. 4.

Table 2. Accuracies for different networks.

Model     | Testing Accuracy (Training FC layers only, 5-10 epochs) | Testing Accuracy (Training all layers, 10-15 epochs) | Testing Accuracy (Training all layers, testing with augmentation)
VGG-19    | 96.66% | 99.55% | 99.72%
ResNet-34 | 97.13% | 99.62% | 99.68%
ResNet-50 | 97.41% | 99.81% | 99.84%

Fig. 4. Loss plots for the different architectures on the PlantVillage dataset



Sugarcane Dataset. The overall accuracy we obtained on our dataset ranged from 83.20% (VGG-19, training only the FC layers) to 93.20% (ResNet-50, training all layers and testing with augmentation). All the networks were trained using an 80:20 train-test split.

We used mean F1 score, mean recall [28], mean precision [28] and overall accuracy as the metrics for evaluating our models. All of these metrics were calculated on the entire test set and are briefly discussed below:

Mean Precision. Precision measures the ability of the classifier not to label a negative sample as positive. Mathematically,

Precision (for a particular class) = True Positives (for that class) / (True Positives (for that class) + False Positives (for that class)) (2)

Averaging this precision score over all classes (in our case 6) gives the mean precision.

Mean Recall. Recall measures the ability of the classifier to find all the positive samples. Mathematically,

Recall (for a particular class) = True Positives (for that class) / (True Positives (for that class) + False Negatives (for that class)) (3)

Averaging this recall score over all classes gives the mean recall.

Mean F1 Score. The F1 score considers both precision and recall. Mathematically,

F1 score (for a particular class) = 2 * Recall (for that class) * Precision (for that class) / (Recall (for that class) + Precision (for that class)) (4)

Averaging these scores over all classes gives the mean F1 score.

Overall Accuracy. This is a simple measure of the performance of a model: the ratio of the number of items correctly classified to the total number of items in the test set.
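The sketch below computes the per-class precision, recall and F1 from equations (2)-(4), averages them over classes and reports overall accuracy; scikit-learn's classification_report provides the same macro-averaged quantities.

```python
import numpy as np

def mean_metrics(y_true, y_pred, num_classes=6):
    """Macro-averaged precision, recall and F1 (eqs. 2-4) plus overall accuracy."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    precisions, recalls, f1s = [], [], []
    for c in range(num_classes):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    accuracy = float(np.mean(y_true == y_pred))
    return np.mean(precisions), np.mean(recalls), np.mean(f1s), accuracy
```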

Table 3 shows these metrics on the test set across all our experimental configurations (taken from the best epoch). All the networks ran for a total of 10 epochs when training only the fully connected layers and 15 epochs when training all the layers. To check for over-fitting, we obtained a small dataset of 52 images belonging to these 6 classes from trusted online sources and tested our best-performing model, ResNet-50, on them, obtaining an overall accuracy of 76.4% and illustrating the generalization of our model. The results achieved are significantly better than those of [21], who achieved an accuracy of 31.4% in a similar test setup.

We note that these promising results were obtained on a rather small dataset containing only 2940 images, suggesting that the results would only improve if a reasonably large dataset were used. Loss plots for the networks are shown in Fig. 6.

Table 3. Mean metrics (precision, recall, F1) and overall accuracies across various experimental configurations.

Model     | Mean Precision | Mean Recall | Mean F1 Score | Overall Testing Accuracy (Training all layers, testing with augmentation) | Testing Accuracy (Training all layers, testing without augmentation) | Testing Accuracy (Training only FC layers)
VGG-19    | 0.9044 | 0.9087 | 0.9026 | 0.9200 | 0.9120 | 0.8320
ResNet-34 | 0.9095 | 0.9107 | 0.9066 | 0.9240 | 0.9240 | 0.8360
ResNet-50 | 0.9260 | 0.9200 | 0.9213 | 0.9320 | 0.9280 | 0.8600

Fig. 5. (a) Example image of a leaf from our test dataset suffering from Helminthosporium Leaf Spot. (b) Visualization of the activations in the first convolution layer of the ResNet-50 architecture (trained using the 80:20 split) during a forward pass on the image in (a), showing that the model has learnt to activate on the diseased spots of the leaf.

Fig. 6. Loss plots for the different architectures (ResNet-50, ResNet-34, VGG-19) on our dataset; the decrease in validation loss shows the networks learning.

5 Detection

5.1 Approach
To accurately recognize infected regions in the images, two state-of-the-art detection networks, YOLOv3 and Faster R-CNN, were used for evaluation. These models are significantly faster than the models that preceded them, such as R-CNN [15] and Fast R-CNN [16]. [15, 16] used a more time-consuming method, selective search [17], to find the regions on which a CNN is then run separately to classify the label. A few thousand regions of interest are generated by [17], each of which is passed separately to the network for classification, making [15, 16] ill-suited for real-time inference. In Faster R-CNN, region proposals are predicted on a convolutional feature map by a region proposal network (RPN) [12] after the image has been passed through a CNN; an RoI pooling layer then helps assign a class to each proposed region. Region proposals are generated in significantly less time, and it takes around 2.3 seconds to make an inference [18]. Faster R-CNN with a VGG-16 backbone [12], when evaluated on the PASCAL VOC dataset [19], achieved a mAP of 0.76 at 5 frames per second. YOLOv3 does everything in one pass through the network: instead of separate modules for predicting regions and localizing objects, YOLOv3 [13] does all of this in a single CNN. YOLO [14] performs real-time detection at 45 fps while still achieving a comparable mAP of 63.4% [20] when trained and tested on the above-mentioned dataset [19]. The feature extractor of YOLOv3 is a 106-layer CNN, a variant of Darknet. For our work we used Faster R-CNN with a VGG-16 backbone, and YOLOv3.

We evaluated implementations of these architectures on our dataset by training the entire model, starting in both cases from weights for the convolutional block pretrained on the ImageNet dataset [5]. Faster R-CNN was trained on 600x1000 resolution images for 15 epochs and tested on images of the same resolution. YOLOv3 was trained on 416x416 resolution images for 6000 iterations and tested on both 416x416 and 608x608 resolution images. Faster R-CNN was trained using the implementation in [29], while YOLOv3 was trained using the implementation in [30].
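For illustration, the sketch below adapts torchvision's Faster R-CNN (with a ResNet-50-FPN backbone, rather than the VGG-16 backbone of the implementation [29] we actually used) to our four infected-spot classes and runs inference on a single image tensor; it is a sketch of the general fine-tuning recipe, not the authors' training code.

```python
import torch
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

# Pretrained detector; the ResNet-50-FPN backbone stands in for the VGG-16 backbone of [29].
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)

# Replace the box-predictor head: 4 disease classes + 1 background class.
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes=5)

# (Fine-tuning on the annotated sugarcane images would go here.)

# Inference on one image tensor of shape (3, H, W) with values in [0, 1].
model.eval()
with torch.no_grad():
    predictions = model([torch.rand(3, 600, 1000)])   # dummy image for illustration
boxes, labels, scores = (predictions[0][k] for k in ("boxes", "labels", "scores"))
```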

Fig. 7. Pictorial representation of the detection process: input image → feature extractor → classifier → class label and localization.



5.2 Results
The most common metric used for examining the performance of these models is mean average precision (mAP) [20]. It is briefly discussed below.

Mean Average Precision (mAP) [20]. It averages the maximum precision values across different recall values. A prediction is counted as a true positive if its Intersection over Union (IoU) with the matching ground-truth label is ≥ 0.5. To calculate mAP, we first calculate, for each class individually, the maximum precision value APr at 11 recall values, viz 0, 0.1, 0.2, ..., 0.9 and 1.0. The maximum precision APr at a recall value ȓ is the highest precision value for any recall ≥ ȓ. We then calculate the average precision (AP), again for each class individually, as the average of these 11 precision values:

AP = (1/11) Σ r ∈ {0, 0.1, ..., 1.0} APr (5)

mAP is the average of these APs over all classes. As mentioned, we used an IoU threshold of 0.5 for both models. We also used other metrics for assessing the performance of our models, including precision (2), recall (3) and F1 score (4). All the scores are on the entire test set.
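A sketch of the 11-point interpolated AP for a single class, given precision/recall pairs already computed at increasing confidence thresholds (with matches counted at IoU ≥ 0.5); mAP is then the mean of these per-class APs. The function names are ours.

```python
import numpy as np

def eleven_point_ap(recalls, precisions):
    """Eq. (5): average over r in {0, 0.1, ..., 1.0} of the max precision at recall >= r."""
    recalls, precisions = np.asarray(recalls), np.asarray(precisions)
    ap = 0.0
    for r in np.linspace(0.0, 1.0, 11):
        mask = recalls >= r
        ap += precisions[mask].max() if mask.any() else 0.0   # AP_r for this recall level
    return ap / 11.0

def mean_average_precision(per_class_pr):
    """mAP: average of per-class APs. per_class_pr maps class name -> (recalls, precisions)."""
    return float(np.mean([eleven_point_ap(r, p) for r, p in per_class_pr.values()]))
```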

YOLOv3, trained on 416x416 resolution images for 6000 iterations, produced a mAP score of 51.73% on the validation set. When tested on 608x608 images, the score improved by 2.38 percentage points to 54.11%, showing that although the model was trained on lower-resolution images, evaluation at a relatively higher resolution leads to better results in this case. We believe this is due to an increase in the accuracy of detecting small-scale regions when inference is done on a higher-resolution image. Results for this model are detailed in Tables 4 and 5; mean precision and mean recall were evaluated on a withheld validation set at different confidence thresholds to find a suitable trade-off between the two. Faster R-CNN, trained on 600x1000 resolution images for 15 epochs, produced a mAP score of 58.30%, performing slightly better than YOLOv3 in terms of accuracy, as expected. However, the significantly lower inference time of YOLOv3 makes it more suitable for deployment in automated disease recognition systems.

Fig. 8. mAP score plots of the models used for detection

Table 4. Results produced by YOLOv3 on 416x416 resolution images at different confidence thresholds.

Threshold | Precision | Recall | F1-score | Avg. IoU
0.00      | 0.22      | 0.60   | 0.32     | 14.57%
0.25      | 0.58      | 0.46   | 0.51     | 39.24%
0.40      | 0.64      | 0.43   | 0.52     | 43.46%
0.50      | 0.67      | 0.43   | 0.52     | 45.53%
0.60      | 0.69      | 0.42   | 0.52     | 46.79%
0.70      | 0.70      | 0.40   | 0.51     | 47.75%

Table 5. Results produced by YOLOv3 on 608x608 resolution images at different confidence thresholds.

Threshold | Precision | Recall | F1-score | Avg. IoU
0.00      | 0.15      | 0.65   | 0.24     | 9.96%
0.25      | 0.45      | 0.47   | 0.46     | 30.23%
0.40      | 0.49      | 0.44   | 0.46     | 33.25%
0.50      | 0.54      | 0.41   | 0.47     | 36.39%
0.60      | 0.56      | 0.40   | 0.47     | 38.49%
0.70      | 0.61      | 0.39   | 0.48     | 41.55%



Fig. 9. (a) An annotated image from our test set and (b) the predictions made by Faster R-CNN when that image was passed through it.

Fig. 9(b) shows the output produced by Faster R-CNN when the image in Fig. 9(a) was passed through it. The predicted image contains more bounding-box predictions of diseased spots than the annotated input image 9(a): the model is bounding even those diseased spots that were not annotated in the input image, showing that it has learnt the diseased spots thoroughly. This difference between the number of diseased spots predicted by the model and the number annotated in the input image reduces the mAP score, but at the same time it illustrates the accuracy of the model in predicting the diseased spots. Moreover, since the images were taken in realistic conditions, they show wide variations in the patterns of diseased spots, leading to variably sized bounding boxes in the training images, as in 9(a), sometimes encompassing a single diseased spot and sometimes a cluster of them. The model learns all these variations and predicts all types of bounding boxes, as in 9(b). Since the mAP score [20] depends on the matching of bounding boxes between the annotated and predicted images, this statistically reduces the mAP score, but the qualitative results in 9(b) show the robustness of the model. The training set contains many images of this kind. Keeping all of this in view, together with the small size of the dataset, the scores are quite reasonable. Given a larger dataset and better annotations of the training data, with an even distribution of objects at different scales, the statistics are only expected to improve.

Fig. 10. Visualizations of predictions made by YOLOv3 on the held-out test set.

6 Conclusion

The conventional approach of finding anomalies in plants through the supervision of expert pathologists is a difficult and time-consuming task; the limited availability of experts and the time taken in the process can delay the identification of diseases. CNNs have been found to be very robust at finding visually observable patterns in images, and the growth of computational hardware has made it viable to utilize them. Using CNNs to find visible anomalies in plants will result in faster identification and quicker interventions to subdue the effects of diseases on the plants. Our work focused on evaluating current state-of-the-art classification architectures on a publicly available dataset of plant diseases comprising 54,306 images of different plants, and on exhibiting the contrast in test accuracy when the trained model is tested on images taken in controlled versus uncontrolled conditions: performance decreases manifold when evaluation is done on images from different sources. To circumvent this, we introduced a sugarcane dataset collected while taking into account the issues that would be faced in identifying diseases on site. CNN models reached an accuracy of 93.20% on this dataset. The results of our model on images collected from different sources also showed a significant improvement compared to performances reported in the literature (e.g. Mohanty et al. 2016 [21] report 31% accuracy for a problem with 38 plant disease classes), with an accuracy of 76.4% on six classes. These results show the huge potential of these deep learning models and indicate that, for a robust plant disease identification system, a more diverse set of training data from different areas and under different conditions is needed.

Furthermore, it is apparent that localizing the infected region is the next step in the progression from coarse to fine inference. Both models, Faster R-CNN and YOLOv3, were trained on a subset of the sugarcane dataset. Given the diversity and uncontrolled conditions of the dataset, both frameworks showed promising results in successfully detecting four different diseases in sugarcane. We believe that a larger dataset for both tasks would improve the results further.

Acknowledgements. We thank the College of Agriculture, Mandya, Bangalore for helping us in the collection of the sugarcane dataset and for providing expertise in identifying the different types of diseases the sugarcane plants were suffering from while we collected images of their leaves.

7 References
1. Strange RN, Scott PR (2005) Plant disease threat to global food security. Phytopathology 43
2. UNEP (2013) Smallholders, food security and the environment
3. CNBC News Channel https://www.cnbc.com/2017/01/17/6-billion-smartphones-will-be-in-
circulation-in-2020-ihs-report.html

4. Russakovsky O et al. (2015) ImageNet Large Scale Visual Recognition Challenge. International Journal of Computer Vision (IJCV) 115(3): 211-252
5. Deng J et al. (2009). ImageNet: A large-scale hierarchical image database in Computer Vi-
sion and Pattern Recognition, 2009, IEEE Conference on. (IEEE), pp. 248-255
6. LeCun Y, Bottou L, Bengio Y, Haffner P (1998) Gradient-Based Learning Applied to doc-
ument recognition
7. Krizhevsky A, Sutskever I, Hinton GE (2012) Imagenet classification with deep convolu-
tional Neural Networks in Advances in Neural information processing systems. pp. 1097-
1105
8. Zeiler MD, Fergus R (2014) Visualising and understanding convolutional networks in Com-
puter Vision-ECCV 2014. (Springer), pp. 818-833
9. Szegedy C et al. (2015) Going deeper with convolutions in Proceedings of the IEEE confer-
ence on Computer Vision and Pattern Recognition. pp. 1-9
10. He K, Zhang X, Ren S, Sun J (2015) Deep residual learning for image recognition. ArXiv
preprint: 1512.03385
11. Nandhini T.S D, Padmavathy V (2017) A Study on Sugarcane Production in India Interna-
tional Journal of Advanced Research in Botany, 3(2): 13-17 doi:
http://dx.doi.org/10.20431/2455-4316.0302003
12. Ren S, He K, Girshick R, Sun J (2016) Faster R-CNN: Towards Real-Time Object Detection
with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 39, 1137–1149
13. Redmon J, Farhadi A (2018) Yolov3: An incremental improvement. arXiv preprint
ArXiv:1804.02767.
14. Redmon J, Divvala S, Girshick R, Farhadi A (2016) You only look once: Unified, real-time object detection. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 779-788
15. Girshick R, Donahue J, Darrell T, Malik J (2014) Rich feature hierarchies for accurate object
detection and semantic segmentation. arXiv preprint, arXiv:1311.2524
16. Girshick R (2015) Fast R-CNN. arXiv preprint, arXiv:1504.08083
17. Uijlings J, Sande K, Gevers T, Smeulders A (2013) Selective Search for Object Recognition
18. https://webcourse.cs.technion.ac.il/236815/Spring2016/ho/WCFiles/RCNN_X3_6pp.pdf
19. Everingham M, Eslami S, Gool L, Williams C, Winn J, Zisserman A (2014) The PASCAL
Visual Object Classes (VOC) Challenge: A Retrospective Int J Comput Vis (2015) 111:98–
136 DOI 10.1007/s11263-014-0733-5
20. Beitzel SM, Jensen EC, Frieder O (2009) MAP In: LIU L, ÖZSU M.T (eds) Encyclopedia
of Database Systems. Springer, Boston, MA
21. Mohanty S.P, Hughes D.P, Salathé M (2016) Using deep learning for image-based plant
disease detection. Front. Plant Sci. 7 http://dx.doi.org/10.3389/fpls.2016. 01419. Article:
1419.
22. Hughes DP, Salathé M (2015) An open access repository of images on plant health to enable
the development of mobile disease diagnostics. CoRR abs/1511.08060.
23. Wang G, Sun Y, Wang J (2017) Automatic Image-Based Plant Disease Severity Estimation
Using Deep Learning. Comput. Intell. Neurosci. 2017:2917536. doi:
10.1155/2017/2917536.
24. Simonyan K, Zisserman A (2014) Very Deep Convolutional Networks for Large-Scale Im-
age Recognition. ArXiv:1409.1556v6
25. Sladojevic S, Arsenovic M, Anderla A, Culibrk D, Stefanovic D (2016) Deep Neural Net-
works Based Recognition of Plant Diseases by Leaf Image Classification. Comput. Intell.
Neurosci. 2016:3289801. doi: 10.1155/2016/3289801

26. Smith LN (2015) Cyclical Learning Rates for Training Neural Networks. arXiv:1506.01186
27. Fastai deep learning course lesson 1. https://course.fast.ai/videos/?lesson=1
28. https://www.biostat.wisc.edu/~page/rocpr.pdf
29. https://github.com/chenyuntc/simple-faster-rcnn-pytorch
30. https://github.com/AlexeyAB/darknet
