
Mining Wiki Usage Data for Predicting Final Grades of Students

Gökhan Akçapınar, Erdal Coşgun, Arif Altun


Hacettepe University
gokhana@hacettepe.edu.tr, erdal.cosgun@hacettepe.edu.tr, altunar@hacettepe.edu.tr

Abstract
This study aims to predict students' final grades (A, B, C, D and F) based on their wiki
usage data. With default settings, usage data are stored in the wiki database only in a limited
way; therefore, an extension was developed to extend its capability to log students' login and
navigation data. A tool was developed to extract information from these data and to pre-process
them. The dataset includes the server-side wiki usage logs of 81 students over three
months. The classification performance of the Random Forest, Support Vector Machines, Naive
Bayes and Boosted Classification Tree algorithms is compared for classifying students, and ten-fold
cross-validation is used to evaluate the performance of the models. According to our
findings, SVM outperforms the other methods with the best classification performance.
Keywords: wiki, classification, educational data mining, predicting final grade.
Main Conference Topic: New Trends and Experiences, Educational Data Mining
Introduction
The use of wikis in online learning environments has increased in recent years, especially
with the growing demand for collaborative learning. Wikis can be used in the following ways to
support learning: in-class collaboration, group projects outside of class, a collaborative
environment for learning from peers, peer and teacher feedback and review, and assessment
and management of group performance [1]. Although wikis have great potential for online
learning environments, assessing individual contributions is difficult and time-consuming
with traditional methods, since many students may contribute to the creation of the same
content. On the other hand, as in other online learning environments (e.g. forums, LMSs,
VLEs), a large amount of student-system interaction data is stored in the wiki database. By
analyzing these data with the help of statistical and data mining (DM) techniques, much useful
information can be extracted for tutoring, assessment, or understanding learning and learner
behavior [2, 3].
Educational data mining is a notable research area that has emerged in recent years; it
is defined as the application of data mining techniques to datasets that come from educational
settings in order to address important educational questions [4]. According to Romero and
Ventura [2], these questions include the analysis and visualization of data, providing feedback
to support instructors, recommendations for students, predicting student performance, student
modeling, and grouping students. To answer these questions, educational data mining research
uses different DM methods such as prediction, clustering, relationship mining, discovery with
models, and distillation of data for human judgment [5].
Among these, one of the key applications of educational data mining is predicting
student performance. Prediction of a student's performance is one of the oldest and most
popular applications of DM in education, and many different techniques and models have been
applied so far [2].
In a recent study, Lopez et al. [6] demonstrated the potential of the classification-via-clustering
approach for predicting students' final marks (passed or failed) on the basis of their
participation in forums. Their results showed that student participation in the course forum
was a good predictor of the final marks for the course. Fausett and Elwasif [7] found that
neural networks can be trained to predict students' grades in Calculus I based on their
placement test responses; they used students' test response patterns as input and the grade in
Calculus I as the target. Martinez [8] suggested that students' pre-college assessment data
can be used to predict academic success (a grade of A, B, or C) in community college courses
with discriminant function analysis. Minaei-Bidgoli and Punch [9] presented an approach for
classifying students by using genetic algorithms to predict their final grade based on data
logged in an online learning environment. Superby et al. [10] used classification to determine
the factors influencing the academic success of first-year university students by means of
discriminant analysis, neural networks, random forests and decision trees. Kotsiantis et al.
[11] compared six different machine learning algorithms for predicting students' marks (pass
or fail) on Hellenic Open University data; they also compared six regression algorithms for
predicting students' marks on similar data [12]. Delgado et al. [13] trained a neural network
on Moodle access logs to predict whether students would pass a course; their model showed
that it is possible to identify students who may have problems passing. Two recent studies
compared different data mining methods and techniques for classifying students based on their
Moodle interaction data in order to predict the final marks obtained in the course [14, 15].
In this study we sought to examine the extent to which we can predict students' course
grades (A, B, C, D and F) on the basis of their wiki usage. MediaWiki, a free, open-source
and easy-to-use engine for creating wiki-based web sites, was used as the wiki engine. We
developed an extension to log students' login and navigation data, which are not tracked in
the default configuration.
Background
We applied four of the most commonly used classification algorithms to predict
students' final grades and compared their prediction performance. The following paragraphs
describe these methods briefly.
Random Forest: A random forest (RF) is a decision-tree ensemble classifier in which each
tree is grown using some form of randomization. Random forests can process huge amounts of
data at high training speeds, building on Classification and Regression Trees (CART) [16].
CART is a simple statistical tool that applies recursive binary partitioning of the feature
space and is well known for its efficiency in coping with large datasets. However, as the
data become noisier and less information is contained in each variable, the predictive
ability of CART diminishes. RF overcomes this problem by introducing random elements into
the model: subsets of variables are chosen at random, and bootstrap samples are selected
with replacement for tree growing [17].
For each classification tree, a bootstrap sample is drawn from the original samples
[18]. At each non-leaf node of a classification tree, the best split feature is selected from a
small random subset of the original features. When the forest receives an input vector, each
classification tree casts a unique vote, and the final prediction is determined by the majority
vote of all the trees in the random forest. Since the bootstrap sample is drawn with replacement,
the samples that are not in the bootstrap sample are called out-of-bag (OOB) data [18, 19].
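
As an illustration, the following R sketch fits such a forest with the randomForest package, using the settings later reported in Table 3 (1000 trees, mtry = 5). This is a minimal sketch, not the authors' actual code: the data frame wiki is a randomly generated stand-in for the real dataset, assuming only the variable names from Table 1.

    # A minimal sketch; 'wiki' is a synthetic stand-in for the real dataset
    # of 81 students, with the Table 1 usage features and a grade label.
    library(randomForest)
    set.seed(42)
    wiki <- data.frame(replicate(7, rpois(81, 50)))
    names(wiki) <- c("n_Session", "a_Time", "n_MainPageReturn",
                     "n_UniquePage", "n_Revisits", "n_Edit", "n_Word")
    wiki$f_Grade <- factor(sample(c("A", "B", "C", "D", "F"), 81, replace = TRUE))

    # 1000 trees with 5 candidate variables per split (the Table 3 settings);
    # each tree is grown on a bootstrap sample drawn with replacement.
    rf <- randomForest(f_Grade ~ ., data = wiki, ntree = 1000, mtry = 5)
    print(rf)  # confusion matrix and OOB error estimate

Because each tree never sees its out-of-bag rows, the OOB error printed above provides an internal estimate of generalization error without a separate test set.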
Boosted Classification Tree: The algorithm for boosting trees evolved from the
application of boosting methods to regression trees. The general idea is to compute a
sequence of simple CARTs, where each successive tree is built for the prediction residuals of
the preceding tree. The method builds binary trees, i.e., it partitions the data into two
samples at each split node. Suppose the user were to limit the complexity of the trees to
three nodes: a root node and two child nodes, i.e., a single split. Then, at each step of the
boosting trees algorithm, a simple (best) partitioning of the data is determined, and the
deviations of the observed values from the respective means (the residuals for each partition)
are computed. The next 3-node tree is then fitted to those residuals, to find another partition
that will further reduce the residual (error) variance of the data, given the preceding sequence
of trees.
It can be shown that such "additive weighted expansions" of trees can eventually
produce an excellent fit of the predicted values to the observed values, even if the specific
nature of the relationships between the predictor variables and the dependent variable of
interest is very complex (nonlinear in nature). Hence, the method of gradient boosting,
fitting a weighted additive expansion of simple trees, represents a very general and powerful
machine learning algorithm [20].
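
Assuming the gbm package named in the Results section, such a boosted classification tree could be sketched as follows; the stump-sized trees and the learning rate of 0.1 mirror the description above and Table 3, while the wiki data frame is again a synthetic stand-in.

    # A minimal sketch; 'wiki' is the same kind of synthetic stand-in used
    # in the random-forest example above.
    library(gbm)
    set.seed(42)
    wiki <- data.frame(replicate(7, rpois(81, 50)))
    names(wiki) <- c("n_Session", "a_Time", "n_MainPageReturn",
                     "n_UniquePage", "n_Revisits", "n_Edit", "n_Word")
    wiki$f_Grade <- factor(sample(c("A", "B", "C", "D", "F"), 81, replace = TRUE))

    # interaction.depth = 1 limits every tree to a single split (a stump),
    # i.e. the 3-node trees described above; shrinkage is the learning rate.
    bct <- gbm(f_Grade ~ ., data = wiki, distribution = "multinomial",
               n.trees = 1000, interaction.depth = 1, shrinkage = 0.1)

    # Class probabilities after all 1000 boosting iterations; the predicted
    # grade is the class with the highest probability.
    prob <- drop(predict(bct, wiki, n.trees = 1000, type = "response"))
    pred <- colnames(prob)[max.col(prob)]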
Support Vector Machines: SVMs are a relatively new computational learning method
based on the statistical learning theory presented by Vapnik [21]. In SVMs, the original input
space is mapped into a high-dimensional dot-product space called a feature space, and in the
feature space the optimal hyperplane is determined so as to maximize the generalization
ability of the classifier. The maximal hyperplane is found by exploiting optimization theory
and respecting the insights provided by statistical learning theory [22].
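
A minimal sketch of this classifier, using the e1071 package named in the Results section with the radial basis kernel reported in Table 3 (the cost and gamma parameters are left at package defaults here and would normally be tuned):

    # A minimal sketch; 'wiki' is again a synthetic stand-in dataset.
    library(e1071)
    set.seed(42)
    wiki <- data.frame(replicate(7, rpois(81, 50)))
    names(wiki) <- c("n_Session", "a_Time", "n_MainPageReturn",
                     "n_UniquePage", "n_Revisits", "n_Edit", "n_Word")
    wiki$f_Grade <- factor(sample(c("A", "B", "C", "D", "F"), 81, replace = TRUE))

    # The radial (Gaussian) kernel implicitly maps the inputs into a
    # high-dimensional feature space in which a maximal-margin hyperplane
    # separating the grade classes is sought.
    sv <- svm(f_Grade ~ ., data = wiki, kernel = "radial")
    table(predicted = predict(sv, wiki), actual = wiki$f_Grade)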
Naive Bayes: Bayesian classifiers are statistical classifiers. They can predict class
membership probabilities, such as the probability that a given sample belongs to a particular
class. Bayesian classification is based on Bayes' theorem. Naive Bayesian classifiers assume
that the effect of an attribute value on a given class is independent of the values of the other
attributes. This assumption is called class conditional independence. It is made to simplify the
computation involved and, in this sense, is considered naive [23].
Let X = (x1, x2, ..., xn) be a sample whose components represent values measured on a
set of n attributes. In Bayesian terms, X is considered evidence. Let H be some hypothesis,
such as that the sample X belongs to a specific class C. For classification problems, our goal
is to determine P(H|X), the probability that the hypothesis H holds given the evidence, i.e.,
the observed data sample X. In other words, we are looking for the probability that sample X
belongs to class C, given that we know the attribute description of X. According to Bayes'
theorem, the probability we want to compute, P(H|X), can be expressed in terms of the
probabilities P(H), P(X|H), and P(X) as P(H|X) = P(X|H) P(H) / P(X) [23].
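
The e1071 package also provides this classifier; the minimal sketch below (again on synthetic stand-in data) fits it and returns the posterior probabilities P(H|X) for each grade, which is exactly the quantity computed by Bayes' theorem under the class conditional independence assumption.

    # A minimal sketch; numeric attributes are modelled with one Gaussian
    # per class, and posteriors follow from Bayes' theorem.
    library(e1071)
    set.seed(42)
    wiki <- data.frame(replicate(7, rpois(81, 50)))
    names(wiki) <- c("n_Session", "a_Time", "n_MainPageReturn",
                     "n_UniquePage", "n_Revisits", "n_Edit", "n_Word")
    wiki$f_Grade <- factor(sample(c("A", "B", "C", "D", "F"), 81, replace = TRUE))

    nb <- naiveBayes(f_Grade ~ ., data = wiki)
    # type = "raw" returns P(H|X) for every grade; the five probabilities
    # in each row are normalized to sum to one.
    head(predict(nb, wiki, type = "raw"))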
Description of the Data Used
The dataset used in this study was gathered from a wiki used by university students
during a third-year course. Students used the wiki to write reflections on concepts they
learned in the Computer Network and Communication course. The variables selected for this
experiment were extracted from two different tables. One was the revision table of the wiki,
which stores all changes made by students. The other was a table that stores students' login
and navigation data collected via the extension.
The revision table included more than 1900 records, and we used the WikLog tool developed
by Akçapınar and Aşkar [24] to extract information from this table automatically. The usage
data comprised the server-side wiki usage logs of 81 students, with a total of 1800 sessions
and 40,000 page requests over three months. The tool was developed for extracting information
from these data and pre-processing them. The variables extracted from the two tables are shown
in Table 1, and Table 2 summarizes their descriptive statistics.

Table 1. Variables of a student in a wiki

Name               Domain         Description
n_Session          Usage log      Total session count
a_Time             Usage log      Average time in one session
n_MainPageReturn   Usage log      Main page return rate
n_UniquePage       Usage log      Unique page visits
n_Revisits         Usage log      Total number of revisited web pages
n_Edit             MediaWiki db   Total number of edits
n_Word             MediaWiki db   Total word count
f_Grade            Class          Final grade of the student

Table 2. Descriptive statistics for variables

Variable           Mean     SD       Median   Min    Max
n_Session          22.69    20.77    18.00    1.00   143.00
a_Time             17.81    7.57     17.38    1.47   46.49
n_MainPageReturn   25.19    13.99    22.00    6.00   80.00
n_UniquePage       143.25   77.83    146.00   2.00   265.00
n_Revisits         56.15    18.91    60.00    0.00   87.00
n_Edit             21.90    29.24    9.00     0.00   130.00
n_Word             161.98   251.66   60.00    0.00   1240.00

Results
Naive Bayes, Support Vector Machines, Boosted Classification Tree and Random
Forest were implemented by R software. We used the gbm package for BCT, the
randomForest package for RF, and the e1071 package for SVM and Naive Bayes. The models
were generalized with 10-fold Cross Validation (CV).
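
As an illustration of this protocol, the sketch below runs 10-fold CV by hand for the SVM; the same loop applies to the other three classifiers. Because the wiki data frame is once more a randomly generated stand-in, the resulting accuracy will not reproduce Table 3.

    # A minimal sketch of 10-fold cross-validation for one classifier.
    library(e1071)
    set.seed(42)
    wiki <- data.frame(replicate(7, rpois(81, 50)))
    names(wiki) <- c("n_Session", "a_Time", "n_MainPageReturn",
                     "n_UniquePage", "n_Revisits", "n_Edit", "n_Word")
    wiki$f_Grade <- factor(sample(c("A", "B", "C", "D", "F"), 81, replace = TRUE))

    # Randomly assign each student to one of 10 folds; train on 9 folds,
    # test on the held-out fold, and average the accuracy over the 10 runs.
    folds <- sample(rep(1:10, length.out = nrow(wiki)))
    acc <- sapply(1:10, function(k) {
      fit <- svm(f_Grade ~ ., data = wiki[folds != k, ], kernel = "radial")
      mean(predict(fit, wiki[folds == k, ]) == wiki$f_Grade[folds == k])
    })
    mean(acc)  # cross-validated true classification rate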
In this study, the true classification rates of the four data mining techniques for
classifying students are compared. Table 3 shows the classification accuracy of each
technique. According to these results, the best-performing method on our data is the SVM.
Table 3. Classification accuracy of classification algorithm
Algorithm
Random Forest1
Support Vector Machine2
Nave Bayes3
Boosted Classification Tree4

Classification Accuracy (%)


63,3
67,1
59,6
61,4

RF: 1000 tree, 5 mTry.


SVM: Radial Based Kernel.
3
Naive Bayes: Threshold: 0.100, Sub-Sample Rate: 0,30.
4
Boosted Classification Tree: 1000 tree, Number of Additive Terms: 200, Learning Rate: 0.1000.
2

Conclusions
Although mining educational data to predict students' performance is not a new
phenomenon, there is no published paper on the use of data mining techniques to predict
student performance based on their wiki usage data until now. This paper reports the
comparison of Random Forest, Support Vector Machines, Naive Bayes, and Boosted
Classification Tree for classifying students for predicting final grades obtained in an
undergraduate course on the basis of their wiki usage data. In recent years, these methods
became popular and robust for the prediction problems. We compared different classification
algorithm because there is not one single algorithm that obtains the best classification
accuracy in all cases and all datasets [15, 25]. According to our findings, SVM outperforms
other methods. Possible reason of this result could be that our classification problem is nonlinear. On the other hand, tree based methods have enough performance for prediction as
well. These findings showed that data mining methods can help researchers to assess students
individual contributions to wiki if the necessary information is stored in a database or in log
files. Presented study also showed that students navigation logs and wiki usage data are good
predictors of their course performance. For future research, instructors can use the extracted
knowledge for decision making and for classifying new students [15]. Feedback is an
important variable in changing behavior, and studies suggests that many students will respond
appropriately in the face of feedback that they understand [26]. These extracted knowledge
can also be used as a feedback to help students who are potentially at risk and intervene in
their problems early enough to allow them to change their behavior.

References
1. Ben-Zvi, D., Using Wiki to Promote Collaborative Learning in Statistics Education. Technology Innovations in Statistics Education, 2007. 1(1).
2. Romero, C. and S. Ventura, Educational Data Mining: A Review of the State of the Art. IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, 2010. 40(6): p. 601-618.
3. Rudas, I.J. and P. Tóth. Web Mining Usage in Course Development. in The SEFI Annual Conference 2011. 2011. Lisbon, Portugal.
4. Romero, C. and S. Ventura, Data mining in education. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 2013. 3(1): p. 12-27.
5. Baker, R.S.J.d., Data Mining for Education, in International Encyclopedia of Education, 3rd Ed., B. McGaw, P. Peterson, and E. Baker, Editors. 2011, Oxford, UK: Elsevier.
6. Lopez, M.I., et al. Classification via clustering for predicting final marks based on student participation in forums. in 5th International Conference on Educational Data Mining, EDM 2012. 2012. Chania, Greece.
7. Fausett, L.V. and W. Elwasif. Predicting performance from test scores using backpropagation and counterpropagation. in Neural Networks, 1994. IEEE World Congress on Computational Intelligence, 1994 IEEE International Conference on. 1994.
8. Martinez, D. Predicting Student Outcomes Using Discriminant Function Analysis. 2001.
9. Minaei-Bidgoli, B. and W. Punch, Using Genetic Algorithms for Data Mining Optimization in an Educational Web-Based System, in Genetic and Evolutionary Computation - GECCO 2003, E. Cantú-Paz, et al., Editors. 2003, Springer Berlin/Heidelberg. p. 206-206.
10. Superby, J.F., J.P. Vandamme, and N. Meskens. Determination of Factors Influencing the Achievement of the First-year University Students using Data Mining Methods. in Workshop on Educational Data Mining. 2006.
11. Kotsiantis, S., C. Pierrakeas, and P. Pintelas, Predicting Students' Performance in Distance Learning Using Machine Learning Techniques. Applied Artificial Intelligence, 2004. 18(5): p. 411-426.
12. Kotsiantis, S.B. and P.E. Pintelas. Predicting students' marks in Hellenic Open University. in Advanced Learning Technologies, 2005. ICALT 2005. Fifth IEEE International Conference on. 2005.
13. Delgado, M., et al. Predicting Students' Marks from Moodle Logs using Neural Network Models. in Current Developments in Technology-Assisted Education. 2006. Badajoz.
14. Romero, C., et al. Data mining algorithms to classify students. in Proc. Int. Conf. Educ. Data Mining. 2008. Montreal, Canada.
15. Romero, C., et al., Web usage mining for predicting final marks of students that use Moodle courses. Computer Applications in Engineering Education, 2010.
16. Ko, B., S. Kim, and J.-Y. Nam, X-ray Image Classification Using Random Forests with Local Wavelet-Based CS-Local Binary Patterns. Journal of Digital Imaging, 2011. 24(6): p. 1141-1151.
17. Chen, C.C.M., et al., Methods for Identifying SNP Interactions: A Review on Variations of Logic Regression, Random Forest and Bayesian Logistic Regression. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2011. 8(6): p. 1580-1591.
18. Breiman, L., Random Forests. Machine Learning, 2001. 45(1): p. 5-32.
19. Lin, X., et al., A method for handling metabonomics data from liquid chromatography/mass spectrometry: combinational use of support vector machine recursive feature elimination, genetic algorithm and random forest for feature selection. Metabolomics, 2011. 7(4): p. 549-558.
20. StatSoft, Inc., Electronic Statistics Textbook. 2011, StatSoft: Tulsa, OK.
21. Vapnik, V., Statistical Learning Theory. 1998: Wiley.
22. Widodo, A., B.-S. Yang, and T. Han, Combination of independent component analysis and support vector machines for intelligent faults diagnosis of induction motors. Expert Systems with Applications, 2007. 32(2): p. 299-312.
23. Han, J., M. Kamber, and J. Pei, Data Mining: Concepts and Techniques, Second Edition (The Morgan Kaufmann Series in Data Management Systems). 2006: Morgan Kaufmann.
24. Akçapınar, G. and P. Aşkar. Measuring Author Contributions to the MediaWiki. in IADIS International Conference WWW/Internet 2009. 2009. Rome, Italy.
25. Osmanbegović, E. and M. Suljić, Data Mining Approach for Predicting Student Performance. Economic Review, 2012. 10(1).
26. Bienkowski, M., M. Feng, and B. Means, Enhancing Teaching and Learning Through Educational Data Mining and Learning Analytics: An Issue Brief. 2012: Washington, D.C.
