Professional Documents
Culture Documents
series as per USDA Soil Taxonomy as in [8][9]. The solid Genetic Algorithms are optimization and
phase of soil can be divided into mineral matter and Machine Learning algorithms that are based loosely on the
organic matter. The mineral particles can be further process of biological evolution. Interest in genetic
subdivided into classes based on size. The classification of algorithms has increased recently in conjunction with an
soil particles according to size are Sand, Silt, Clay. The increase in interest in other algorithms based on natural
proposition of Sand, Silt, Clay present in soil determines processes, including simulated annealing and Neural
its texture. Networks. Genetic Algorithms have been widely used in a
wide variety of applications including combinational
A. Soil Data optimization and knowledge discovery. A Genetic
Algorithm proceeds by choosing chromosomes to serve as
The soil data used in this paper consists of 111 instances parents and then replacing members of the current
with 8 attributes like (i.e., Depth, Sand, Silt, Clay, population with new chromosomes that are (possibly
Sandbysilt, Sandbyclay, Sandbysiltclay, TextureClass). The modified) copies of the parents. The process of
texture of the Soil data is varied from sand to silty clay reproduction and population replacement goes on until a
loam where as in sub-surface horizons it varied from sand stopping criterion (the achievement of a performance
to clay as in [2]. Table 1 shows the different soil survey target or the usage of an allotted amount of CPU time for
symbols. instance) is met.
TABLE 1
SOIL SURVEY SYMBOLS
The classification of Soil data [10] using GATree
S Sand
tool, which is a decision tree builder which is based on
Sicl Silty Clay Loam Genetic Algorithms (GAs). GATree offers some unique
Sic Silty Clay features that are not found in any other tree inducers while
C Clay at the same time it can produce better results for many
Sl Sandy loam difficult problems.
Cl Clay loam
Sil Silty Loam B. Fuzzy Classification Rules
L Loam
Ls Loamy sand Fuzzy Logic was initiated to represent or
Scl Sand Clay Loam manipulate data and information processing, non statistical
Sc Sand Clay uncertainties. It is mathematically used to represent
uncertainty and vagueness and to provide formalized tools
for dealing with the impression intrinsic to many
III. CLASSIFICATION ALGORITHMS problems. Fuzzy Logic provides an inference morphology
that enables approximate human reasoning capabilities
Systems that construct classifiers are one of the that are applied to knowledge based systems. The theory of
commonly used tools in data mining. Such systems take a Fuzzy Logic provides a mathematical strength to capture
collection of cases as input, each belong to a small number the uncertainties associated with human cognitive
of classes described by a fixed set of attributes, and outputs processes, such as thinking and reasoning. The
a classifier that can accurately predict the class to which a development of fuzzy logic was motivated in large measure
new case belongs. by the need for a conceptual frame-work which can address
Datasets can have nominal, numeric or mixed the issue of uncertainty and lexical imprecision.
attributes and classes. Not all classification algorithms
perform well for different types of attributes, classes and Fuzzy classification rules are widely considered as
for datasets of different sizes. In order to design a generic a well suited representation of classification knowledge
classification tool, one should consider the behavior of with uncertainty and they allow readable and interpretable
various existing classification algorithms on different rule bases. The most important task in the design of fuzzy
datasets. In this paper, the data mining algorithms like classification systems is to find a set of fuzzy rules from
GATree, Fuzzy classification rules and Fuzzy C-Means training data with uncertainty to deal with a specific
algorithm are analyzed. These are applied to Soil data and classification problem.
then compared for efficient algorithm to classify soil
texture. We proposed an algorithm that was implemented
A. GATree in programming language ‘C’. It generated Fuzzy rules
from training data to deal the soil data classification by
first defining the membership functions for the input 4 0.67 2/3
attributes of the soil data, and then generating the initial 5 0.33 1/3
fuzzy rules for the training data [11]. 6 1 3/3
7 0.33 1/3
C. Fuzzy C – Means Algorithm 8 0.33 1/3
9 0.33 1/3
In unsupervised classification, class-labels are not 10 0.67 2/3
known. In our study, Fuzzy C-means algorithm is used to
deal Unsupervised data with uncertainty. The goal of
Fuzzy C-means algorithm is to group the objects into In the above table for the training size of 27 and
clusters based only on their observable features such that test size of 3 for different tests the accuracy and the
each cluster contains objects that share some important correctly classified instances is shown. The average
properties. accuracy for the instances is 0.46.
Most analytical fuzzy clustering algorithms are B. Fuzzy Classification Rules
based on optimization of the basic c-means objective
function, or some modification of it. Fuzzy C-Mean (FCM) The proposed Fuzzy Classification algorithm
clustering is grouped into n clusters with every data point presented in the paper [11] generates rules for the defined
in the dataset belonging to every cluster that will have a membership functions of the input attributes and we have
high degree of belonging or membership to that cluster and converted each training datum into a fuzzy rule. This
another data point that lies far away from the center of a algorithm is implemented in C language. Table 3
cluster will have a low degree of belonging or membership represents average accuracy rate and average number of
to that cluster. rules generated for the existing method [13] and the
proposed method [11].
We have applied fuzzy C-means algorithm for
Soil data which consists of 10 texture classes. The fuzzy TABLE 3
classifiers classify each texture class by clustering them. THE AVERAGE ACCURACY RATE AND AVERAGE NUMBER OF
This is implemented in MATLAB [12]. RULES
Method I II
IV. COMPARATIVE STUDY
Average 96.6% 99%
This section deals a comparative study of different accuracy rate
algorithms which are mentioned in section III. The Average number of 9.01 9.9
objective of this study is to know the efficient technique to rules generated
classify the soil texture.
The program that was developed to generate fuzzy
A. GATree classification rules was tested for complexity using the tool
SourceMonitor. The metric summary sheet that is
With the Genetic Algorithms, the binary Decision generated from the SourceMonitor tool is displayed below
Tree explains the prediction of the category of Soil data for your reference.
that is built first. To generate the decision tree for the
dataset it takes more time and then the rules are classified C. Fuzzy C-Means Algorithm
with that decision tree. The accuracy of Genetic Algorithm
is shown in the table 2. Genetic Algorithms are not Fuzzy C-means clustering is a partitioning
suitable for managing imprecise or uncertainty in soil data. method, carried out through an iterative optimization of
Genetic based Decision rules are effective for soil data. the objective function. In this method, each feature vector
representing the data has a degree of membership into all
TABLE 2
THE ACCURACY OF GENETIC ALGORITHM the clusters and the algorithm works to minimize an
Test Accuracy Correct Classified objective function. The Fuzzy C-Mean (FCM) clustering is
1 0 0/3 grouped into n clusters with soil data belonging to every
2 0.33 1/3 cluster that will have a high degree of belonging or
membership to that cluster and another data point that lies
3 0.67 2/3
far away from the center of a cluster that will have a low
degree of belonging or membership to that cluster. The Classification rules were used for supervised learning.
average accuracy to classify the clusters is 90.90 [12]. The However, classification based on Fuzzy rules gives much
average number of generated fuzzy rules with this method performance than GATree. For Unsupervised learning
is zero due to the fact that it is clustering technique. Fuzzy C-Means algorithm was used for classifying the soil
data. This helps one to classify soil texture based on soil
D. The accuracy of three Algorithms properties effectively, which influences fertility, drainage,
water holding capacity, aeration, tillage, and bearing
Table 4 represents the Accuracy and interesting strength of soils.
rules generated by GATree for general data and Fuzzy
Classification rules for uncertain data. It concludes that for REFERENCES
supervised soil data, Fuzzy Classification rules generated
more interesting rules than GATree. However, GATree is [1] Cunningham, S. J., and Holmes, G. (1999).
used for classification of classic soil data and Fuzzy Developing innovative applications in
classification rule method is used for classification of agriculture using data mining. In the
Fuzzy soil data. Proceedings of the Southeast Asia Regional
Computer Confederation Conference,1999.
TABLE 4 [2] A Thesis by D.Basavaraju, Characterisation and
THE ACCURACY AND RULES OF
GATREE AND FUZZY CLASSIFICATION RULES classification of soils in Chandragiri mandal of
Chittoor district,Andhra Pradesh.
Method Average Average [3] Jiawei Han and Micheline Kamber, Data Mining
accuracy number of concepts and Techniques, Elsevier..
rate rules [4] Minaei-Bidgoli, B., Punch, W. Using Genetic
generated Algorithms for Data Mining Optimization in an
Genetic Algorithm 46.67% 17 Educational Web-based System. Genetic and
Evolutionary Computation, Part II. 2003.
Fuzzy Classification 99% 9.9 pp.2252–2263.
Rules [5] Quinlan, J.R. C4.5: Programs for Machine
Finally, the accuracy rate of Supervised and Learning. Morgan Kaufman. 1993.
Unsupervised algorithms are given in table 7. Eventhough [6] Isbell, R. F. (1996). The Australian Soil
accuracy rate of Fuzzy Classification rules is higher than Classification.Australian soil and land survey
Fuzzy C-means algorithm; Fuzzy C-means algorithm is handbook. (Vol. 4).Collingwood, Victoria,
more suitable algorithm for Unsupervised Fuzzy soil data. Australia: CSIRO Publishing.
[7] Ibrahim, R. S. (1999). Data Mining of Machine
TABLE 5 Learning Performance Data. Unpublished
THE AVERAGE ACCURACY RATES OF Master of Applied Science (Information
SUPERVISED AND UNSUPERVISED ALGORITHMS Technology), Publisher; RMITUniversity Press.
Method Average [8] Soil Survey Staff 1998, Keys to soil taxonomy.
accuracy Eight Edition, Natural Resource Conservation
rate Services, USDA, Blacksburg, Virginia.
Genetic Algorithm 46.67% [9] Soil Survey Staff 1951, Soil Survey Manual. US
Fuzzy Classification 99% Department of Agricultural Hand book No.
Rules [10] 18.
Fuzzy Clustering 90.90% P. Bhargavi, and Dr. S. Jyothi , Soil
Classification Using GATree. International
journal of computer science & information
Technology (IJCSIT) Vol.2, No.5, October 2010.
V. CONCLUSION [11] P. Bhargavi, and Dr. S. Jyothi , Soil
Classification by Generating Fuzzy rules.
In this paper, we have compared the effectiveness International Journal on Computer Science and
of the classification algorithms Genetic algorithm, Fuzzy Engineering, Vol. 02, No. 08, 2010, 2571-2576.
classification and Fuzzy clustering. The achieved [12] P. Bhargavi, and Dr. S. Jyothi , Fuzzy C-Means
performances are compared and analyzed on the collected Classifier for Soil Data. International Journal of
Supervised and Unsupervised Soil data. GATree and Fuzzy Computer Applications (0975 – 8887), Volume