Professional Documents
Culture Documents
Feature Description
mean Mean
std Standard deviation
var Variance
min Minimum value
max Maximum value
acf_mean Auto correlation mean
acf_std Auto correlation standard deviation
Figure 1: Plot of Resultant vector data for different users acv_mean Auto covariance mean
acv_std Auto covariance standard deviation
It can be seen that there exists a certain degree of distinction
skew Skewness
between the walking patterns of different users as testified by
kurtosis Kurtosis value
the plot of the data. Our goal now remains to train a machine
error Deviation from mean
learning agent to distinguish between different users by
Table 1: Features extracted from raw accelerometer data
looking at this raw data alone.
The preprocessing of the raw data and labelling of the feature
IV. METHODS vectors was done using a script written in python which
The attributes present in our dataset are all continuous and as performed a windowing operation on the raw data. The above
seen from the plot they do possess a certain periodic pattern. mentioned 12 features were extracted from each of the 4 axis
However for any machine learning classifier to classify this hence giving us total 48 different attributes from the raw
raw data into different users we have to train it using a training sensor data.
set of feature vectors. Hence the problem of user classification
is formulated as follows: The size of the window was selected to contain 100 data
points with 50% overlap to spread out the gait features
1- Record the raw acceleration data from accelerometers for between consecutive windows. Figure 3 shows the windowing
different users. operation performed on one of the axes.
V. EXPERIMENTS
The resulting labelled data was split into 60% for training and
40% for testing purposes. This section describes the
performance of six different classifiers on the testing data and
the confusion matrix for each showing the accuracy of
classification:
to overfitting. To develop a general model of the data a depth
1. Dummy Classifier: of six was found optimal.
For the baseline a Dummy classifier was chosen from the
collection of classifiers in Python’s scikit-learn library which The Figure 6 shows the confusion matrix for the classification
uses a stratified strategy to generate predictions by respecting by the Decision Tree classifier resulting into an accuracy of
the training set’s class distribution. The accuracy of 61.7%.
classification was very poor being only 4.9%. Figure 4 shows
the confusion matrix for baseline classifier.
Figure 4: Confusion matrix for Baseline classifier (Prediction Figure 6: Confusion matrix for Decision Tree classifier (Prediction
accuracy: 4.9%) accuracy: 61.7%)
6. SVM (Kernel = ‘Linear’): Figure 10: Summary of Cross validation scores for different
To further increase the accuracy of prediction the last classifiers
classifier used was again a Support Vector Machine but this
time with a Kernel of type Linear. With the kernel changed the The gray bars show the mean of 10-fold cross-validation
accuracy of prediction further improved to a value of 94.1% scores and the blue whiskers show the standard deviation of
which is the best so far for each of the classifier tested. The the scores.
confusion matrix also improved as the diagonal values
increased showing that the classifier did not confuse users In order to further evaluate the performance of this classifier
with each other and classified them correctly for most of the on the training data we calculate some other metrics such as
instances. Precision, Recall and F-score which are defined by the
following equations:
𝑇𝑝 𝑇𝑝 𝑃𝑋𝑅
𝑃= 𝑇𝑝 +𝐹𝑝
𝑅= 𝑇𝑝 +𝐹𝑛
𝐹=2 𝑃+𝑅
Where:
𝑇𝑝 = 𝑇𝑟𝑢𝑒 𝑝𝑜𝑠𝑖𝑡𝑖𝑣𝑒𝑠
𝐹𝑝 = 𝐹𝑎𝑙𝑠𝑒 𝑃𝑜𝑠𝑖𝑡𝑖𝑣𝑒𝑠
𝐹𝑛 = 𝐹𝑎𝑙𝑠𝑒 𝑛𝑒𝑔𝑎𝑡𝑖𝑣𝑒𝑠
The complete code and dataset for the project can be found at:
https://github.com/theumairahmed/User-Identification-and-
Classification-From-Walking-Activity
REFERENCES
[1] Lily Lee and Eric Grimson. Gait analysis for recognition
and classification. In IEEE Conference on Face and Gesture
Recognition, pages 155–161, 2002.