Professional Documents
Culture Documents
Spectrum Disorder
Bram van den Bekerom
University of Twente
P.O. Box 217, 7500AE Enschede
The Netherlands
b.vandenbekerom@student.utwente.nl
ABSTRACT strategies [19]. That study also showed that the average
Autism Spectrum Disorders are not hard to detect, but it age at first diagnosis is between 3 to 6 years. This gap be-
requires that physicians have proper training and experi- tween possible age at first diagnosis(18 months) and the
ence. At the moment however Autism Spectrum Disorder average age at first diagnosis is a gap that should be closed
is detected much later than is actually possible. Early de- as much as possible. As stated before, physicians can use
tection of Autism Spectrum Disorder improves the overall certain reliable formal screening tools to increase the ac-
mental health of the child. In this research we use ma- curacy of the estimate regarding the developmental status
chine learning to determine a set of conditions that to- of children. However, only a minority of physicians use
gether prove to be predictive of Autism Spectrum Disor- those tools that are available[22]. One way to improve the
der. This will be a of great use to physicians, helping accuracy would be routine developmental screenings for
them detect Autism Spectrum Disorder at a much ear- all children[11]. These kind of routine screenings might
lier stage. This will be done through literature review, prove to be costly.
data exploration and evaluation. Predicting if a child has
Autism Spectrum Disorder proved possible by using de- An alternative approach is developing a decision support
velopmental delay, learning disability and speech or other system, which can be used by physicians to help them
language problems as attributes and also include physical estimate the developmental status of children more accu-
activity, premature birth and birth weight to improve the rately. Recent research has predicted cases of child abuse
accuracy. Using the 1-away method it was also possible using structured data with reasonable performance[5][7].
to predict the severity of Autism Spectrum Disorder quite The question that rises is if it would be possible to use
reasonably. The 1-away method improved the accuracy structured data to predict ASD in an earlier stage. Us-
from 54.1% to 90.2%, which is a significant increase. This ing structured data we propose a decision support based
and the fact that the severity was based on input from approach to help physicians improve the accuracy of their
just the caretakers of the children, prompts the need for estimates regarding the developmental status of children
further research in this matter. and specifically in the case of ASD.
1
Health contain co-occurring conditions with Autism selection algorithm will be used on the data set, to deter-
Spectrum Disorder? mine if there are attributes that seem strongly correlated
to ASD. The method that will be used to determine the
3. Are the co-occurring conditions predictive of Autism relevant attributes is correlation based feature selection,
Spectrum Disorder, using machine learning? which has been proven to be very good for this task[8][14].
3. LITERATURE REVIEW
This literature review will look into co-occurring condi-
tions with ASD and machine learning methods that can
be used to analyze structured data sets. The review will
be done based on the earlier introduced research questions
and will follow the review method as proposed by Wolf-
swinkel et al[26].
2
physical activity of adolescents (13-18 years old) with ASD[15]. all of these attributes were used for the evaluation step,
Another study into the 2011-2012 NSCH showed that chil- because it was determined that some of the attributes were
dren with very low birth weight (<1500 g) had 3.2 times too closely related to the actual class of the child (either
higher odds of autism/ASD than normal birth weight chil- autistic or not). By excluding those leading attributes
dren [24]. They also determined that children born pre- and actually including some attributes that we determined
maturely had 2.3 times higher odds of autism/ASD than earlier by performing our literature review the following
term children. Yates and Couteur propose that difficulties attributes remained:
and delay in social interaction are often the earliest fea-
tures in ASD, but that they are also easily missed, because • Learning disability
they are often subtle[28].
• Developmental delay
The findings in this literature review into co-occurring con- • Speech or other language problems
ditions with ASD, provides us with the following list of
• Birth weight
co-occurring conditions:
• Prematurely born
• Developmental delay
• Physical activity
• Obesity
• Attendance to religious events
• Less physical activity
• Body Mass Index
• Very low birth weight
We had to exclude social interaction delay from the condi-
• Premature birth tions found through literature review, because this was not
represented in the data set. The attributes are visualized
• Social interaction delay in Figure 2. The blue color means they are autistic and
the red color means they are not. A lower score means
There were many more possible co-occurring conditions they do not have, for example, a learning disability. A
with ASD, but they didn’t make the terms of inclusion. higher score means they do.
By applying the five-stage grounded-theory method for
reviewing literature by Wolfswinkel et al, the conditions 4.2 Data sets
mentioned above remained and will be used during the After exploring the data it became clear that there was
machine learning process if possible. a possibility to create a data set that had 2 classes (no
ASD or ASD) and a data set that had 4 classes (no ASD,
3.2 Machine Learning methods mild ASD, moderate ASD and severe ASD). Both sets will
There are many classification algorithms that can be used be used for our machine learning and the results will be
for the type of classification task we will perform in this compared to each other.
research. Three of the most popular algorithms [1] will be
used for this classification task. Although Aggarwal and
Zhai propose that these are the most popular algorithms
5. RESULTS
for text classification, they also state they are just as capa- Below, the results from machine learning are described.
ble to classify the structured data we have in this research. First we divided our data set in only 2 classes, namely no
The classification algorithms to be used are: Naive Bayes ASD or ASD. In a further iteration we divided the ASD
[13], Random Forest[3] and Support Vector Machine[4]. class in 3 separate classes, which caused our data set to
Besides these three popular algorithms, we will also use have 4 classes: no ASD, mild ASD, moderate ASD and
an older method J48 [20], which is a java implementation severe ASD.
of a well known algorithm called C4.5[27], to be able to 5.1 Machine learning results using 2 classes
compare the Random Forest method with a different tree
We applied our 4 previously determined classification al-
algorithm. Using these diverse algorithms will ensure our
gorithms to the data set when it was divided in only 2
results are more reliable and help us determine whether a
classes, either a child had ASD or a child did not have
certain algorithm is not usable for this classification task.
ASD. This was done by using the attributes determined
in the data exploration step. The results of these clas-
4. DATA EXPLORATION sification algorithms (Naive Bayes(NB), Support Vector
The NSCH data set contains 95577 records of children, Machines(SVM), J48 and Random Forest(RF) are sum-
with 367 variables. Because only a small percentage of the marized in Table 2.
dataset was positivily recognized as having ASD (around
2000 children), we used the random under-sampling ap-
proach[6]. Using this approach we created a data set which Table 2. Performance scores for classifier using 2
contains roughly 50 percent children with ASD and 50 per- classes
cent children without ASD. By pre-processing this data Algorithm Precision Accuracy Recall
set, we managed to reduce the amount of variables from
NB 0.866 0.865 0.865
367 to 256, which still is a large amount to use as input
SVM 0.835 0.833 0.833
for the classification algorithms.
J48 0.871 0.871 0.871
RF 0.854 0.851 0.851
4.1 Attribute selection
The next step consisted of applying the correlation based These results show that the chosen attributes prove to be
feature selection method mentioned earlier in section 2.2. very predictive of Autism Spectrum Disorder, with an av-
This method resulted in only 16 remaining attributes. Not erage correctly predicted classes of roughly 86%. After fine
3
Figure 2. Overview of attributes
4
ing several iterations in our machine learning process, it
Table 3. Performance scores for classifier using 4 became clear that only a few attributes were necessary
classes for our classification algorithms, which rises the question
Algorithm Precision Accuracy Recall whether these attributes are perhaps too leading towards
ASD. Our machine learning algorithms did however show
NB 0.479 0.512 0.512
significant promise and warrants further research.
SVM 0.475 0.493 0.493
J48 0.524 0.541 0.541 6.1 Predicting ASD or no ASD
RF 0.489 0.507 0.507 When the population was split in 2 almost equal size
groups, where one half had some form of ASD and the
other half had no ASD, we could apply our algorithms to
tuning the algorithms it became clear certain attributes good success. It showed that strongly co-occurring condi-
where linked much stronger to ASD than others. When tions are developmental delay, learning disability, speech
looking further at the breakdown of these attributes in or other language problems and it also showed that some
regards to the 2 classes it can be concluded that Devel- co-occurring conditions were also influential but to a lesser
opmental delay, Learning disability and Speech and other degree, such as physical activity, premature birth and birth
language problems are strongly linked to ASD. Physical weight. The other conditions, Body Mass Index and at-
activity is also linked to ASD, but much less so than the tendance to religious events didn’t help predicting ASD.
aforementioned attributes. Premature birth, birth weight Our machine learning process generated a simple tree,
and attendance to religious events are even less valuable to which shows that developmental delay, learning disability
our classification task and Body Mass Index is not used at and speech or other language problems are really strongly
all. These conclusions can be derived from examining the linked to ASD and if a child has a certain combination of
decision tree generated from the fine tuned J48 algorithm, those conditions, he or she should definitely do a formal
which can be seen in Figure 3. screening for ASD.
5.2 Machine learning results using 4 classes 6.2 Predicting severity of ASD
After seeing the positive results when using 2 classes, we After proving that it is possible to predict whether some-
were keen to see if we could use this data set in other ways. one is likely to have ASD or not, we proceeded to divide
The data set contained information about the severity of the data into 4 groups, namely No ASD, Mild ASD, Mod-
the child’s ASD, which allowed us to divide the data set in erate ASD and Severe ASD. With this we tried to actually
4 classes. A child had either no ASD, mild ASD, moderate predict the severity of ASD. This scale from No ASD to
ASD, severe ASD. The severity that is reflected in the data Severe ASD was derived from our data set, but supporting
set is based on a question asked to the caretaker of the documents of the data set did not describe what this scale
subject child. The same attributes as those used for the of severity actually meant. Those classes are based on the
machine learning process when using 2 classes, were used judgment of the primary caretakers of the subject child,
to predict the severity of ASD (4 classes). After applying which raises questions to the validity. Only a small group
the same algorithms (NB, SVM, J48 and RF) it looked like of the children had severe ASD, which makes it harder to
these attributes were not sufficient enough to predict the properly predict the severity. After applying the 1-away
severity of ASD. This is clearly shown in Table 3, which method though, we got some promising results for the 4
shows that it only correctly predicted the class in about classes. This led us to the conclusion that our proposed
50 percent of the cases. co-occurring conditions are, besides useful to predict if a
We tried to increase this number by including other at- child has Autism Spectrum Disorder, also predictive of the
tributes, but this didn’t prove fruitful. The accuracy went severity of ASD. The problem however is that the 1-away
up to about 70 percent, but it was evident that this hap- method on only 4 classes is really influential in the results.
pened because leading attributes were included. Therefore we conclude that our proposed co-occurring con-
We went back to the original attributes that proved pre- ditions are not sufficient enough to determine the severity
dictive for 2 classes and applied the 1-away method [18]. of ASD, but are well suited to determine if a child would
By applying this method to our most successful algorithm need further formal screenings for ASD. With these screen-
J48 we increased the accuracy from 54.1% to 90.2%. The ings, physicians could then determine the severity of ASD
results are shown in Table 4. and help in the child’s development.
5
Figure 3. J48 decision tree
both parent and physician ratings of behavioural indica- could for example be tested on data sets that include tex-
tors would almost certainly improve the diagnostic classi- tual data. Running the same algorithms on data sets gen-
fication. erated from the input of physicians would be a nice way
to compare both types of data.
7. FUTURE WORK Another important step would be testing these attributes
This research focused on determining a way to predict to more classes than just ASD. It might very well be
whether a child had ASD or not. We managed to do that developmental delay, learning disability and speech
this, but new challenges and questions arose during this or other language problems are predictive of other dis-
research. When splitting up the ASD group into 3 gra- orders as well or more predictive of those disorders and
dations of ASD (mild, moderate, severe) it showed that therefore less predictive of ASD. We believe it’s important
it is a lot harder to predict the severity of ASD. Future that these conditions are tested on this data set as well,
work can therefore focus on including textual data from but this was outside the scope of this research.
physicians to help determine the possible severity of ASD.
Predicting the severity of ASD was possible with this kind 8. ACKNOWLEDGEMENTS
of data set after applying the 1-away method, but the big I would like to thank Chintan Amrit for his guidance and
increase in accuracy does warrant further research. This support during this research. I’d also like to thank Miha
research could verify whether the 1-away method is actu- Lavric for his ideas and input. Hopefully my research can
ally useful in this case, or should not be used. It could assist them on their future research into the subject.
also look into other attributes that are linked better to
the severity of ASD.
9. REFERENCES
We also recommend testing the proposed attributes on [1] C. Aggarwal and C. Zhai. A survey of text
data sets which include data of the same child over time classification algorithms. In Mining Text Data,
to test the true predictive value of these attributes. This volume 9781461432, chapter 6, pages 163–222. 2012.
would require data sets from large health organizations or [2] C. Amrit, T. Paauw, R. Aly, and M. Lavric. Using
gathered from a large research. text mining and machine learning for detection of
Furthermore future work should focus on verifying our child abuse. CoRR, abs/1611.0:31, nov 2016.
findings by using the proposed decision tree in practice [3] D. Cutler, T. Edwards Jr., K. Beard, A. Cutler,
or test it on other data sets. The proposed decision tree K. Hess, J. Gibson, and J. Lawler. Random forests
6
for classification in ecology. Ecology, 88(11), 2007. [19] J. A. Pinto-Martin, M. Dunkle, M. Earls,
[4] H. Drucker, D. Wu, and V. Vapnik. Support vector D. Fliedner, and C. Landes. Developmental stages of
machines for spam categorization. IEEE developmental screening: Steps to implementation of
Transactions on Neural Networks, 10(5), 1999. a successful program. American Journal of Public
[5] P. Gillingham. Predictive Risk Modelling to Prevent Health, 95(11):1928–1932, 2005.
Child Maltreatment and Other Adverse Outcomes [20] J. R. Quinlan. Induction of Decision Trees. Machine
for Service Users: Inside the âĂŸBlack Box’ of Learning, 1(1):81–106, 1986.
Machine Learning. British Journal of Social Work, [21] J. D. Rodrı́guez, A. Pérez, and J. A. Lozano.
46(4):1044–1058, 2016. Sensitivity analysis of k-fold cross validation in
[6] H. He and E. Garcia. Learning from imbalanced prediction error estimation. IEEE transactions on
data. IEEE Transactions on Knowledge and Data pattern analysis and machine intelligence,
Engineering, 21(9), 2009. 32(3):569–575, 2010.
[7] H. Horikawa, S. P. Suguimoto, P. M. Musumari, [22] N. Sand. Pediatricians’ Reported Practices
T. Techasrivichien, M. Ono-Kihara, and M. Kihara. Regarding Developmental Screening: Do Guidelines
Development of a prediction model for child Work? Do They Help? Pediatrics, 116(1):174–179,
maltreatment recurrence in Japan: A historical 2005.
cohort study using data from a Child Guidance [23] P. Setoh, P. B. Marschik, C. Einspieler, and
Center. Child Abuse and Neglect, 59:55–65, 2016. G. Esposito. Autism spectrum disorder and early
[8] G. Ilczuk, R. Mlynarski, W. Kargul, and motor abnormalities: Connected or coincidental
A. Wakulicz-Deja. New feature selection methods for companions? Research in Developmental
qualification of the patients for cardiac pacemaker Disabilities, 60:13–15, 2017.
implantation. In Computers in Cardiology, [24] G. K. Singh, M. K. Kenney, R. M. Ghandour, M. D.
volume 34, 2007. Kogan, and M. C. Lu. Mental Health Outcomes in
[9] C. Johnson, S. Myers, P. Lipkin, J. Cartwright, US Children and Adolescents Born Prematurely or
L. Desch, J. Duby, E. Elias, E. Levey, G. Liptak, with Low Birthweight. Depression Research and
N. Murphy, A. Tilton, D. Lollar, M. Macias, Treatment, 2013:7–18, 2013.
M. McPherson, D. Olson, B. Strickland, S. Skipper, [25] S. Ward, K. Sullivan, and L. Gilmore. Combining
J. Ackermann, M. Del Monte, T. Challman, parent and clinician ratings of behavioural
S. Hyman, S. Levy, S. Spooner, and indicators of autism spectrum disorder improves
M. Yeargin-Allsopp. Identification and evaluation of diagnostic classification. Early Child Development
children with autism spectrum disorders. Pediatrics, and Care, 2016.
120(5):1183–1215, 2007. [26] J. Wolfswinkel, E. Furtmueller, and C. P. Wilderom.
[10] P. K, S. Srinath, S. Seshadri, S. Girimaji, and Using grounded theory as a method for rigorously
J. Kommu. Lost time: Need for more awareness in reviewing literature. European Journal of
early intervention of autism spectrum disorder. information systems, 22(August 2009):1–11, 2013.
Asian Journal of Psychiatry, 25:13–15, 2017. [27] X. Wu, V. Kumar, Q. J. Ross, J. Ghosh, Q. Yang,
[11] A. Keil, C. Breunig, S. Fleischfresser, and H. Motoda, G. J. McLachlan, A. Ng, B. Liu, P. S.
E. Oftedahl. Promoting routine use of Yu, Z. H. Zhou, M. Steinbach, D. J. Hand, and
developmental and autism-specific screening tools by D. Steinberg. Top 10 algorithms in data mining,
pediatric primary care clinicians. Wisconsin Medical volume 14. 2008.
Journal, 113(6), 2014. [28] K. Yates and A. Le Couteur. Diagnosing
[12] R. Kohavi. A Study of Cross-Validation and autism/autism spectrum disorders. Paediatrics and
Bootstrap for Accuracy Estimation and Model Child Health, 26(12):513–518, 2016.
Selection. International Joint Conference on
Artificial Intelligence, 14(12):1137–1143, 1995.
[13] I. Kononenko. Inductive and Bayesian Learning in
Medical Diagnosis. 1993.
[14] I. Koprinska, M. Rana, and V. G. Agelidis.
Correlation and instance based feature selection for
electricity load forecasting. Knowledge-Based
Systems, 82:29–40, 2015.
[15] W. Mangerud, O. Bjerkeset, and S. Lydersen.
Physical activity in adolescents with psychiatric
disorders and in the general population. Child and
Adolescent Psychiatry and Mental Health, 8(1), 2014.
[16] S. M. McCoy, J. M. Jakicic, and B. B. Gibbs.
Comparison of Obesity, Physical Activity, and
Sedentary Behaviors Between Adolescents With
Autism Spectrum Disorders and Without. Journal
of Autism and Developmental Disorders,
46(7):2317–2326, 2016.
[17] C. National Center for Health. National Survey of
Children’s Health, 2013.
[18] J. Piggott and C. Amrit. How Healthy is my Project
? Open Source Project Attributes as Indicators of
Success. Open Source Software: Quality Verification,
pages 1–16, 2013.