You are on page 1of 178

Rochester Institute of Technology

RIT Scholar Works


Theses Thesis/Dissertation Collections

3-27-2017

Machine Learning Based Autism Detection Using


Brain Imaging
Gajendra Jung Katuwal
gjk3423@rit.edu

Follow this and additional works at: http://scholarworks.rit.edu/theses

Recommended Citation
Katuwal, Gajendra Jung, "Machine Learning Based Autism Detection Using Brain Imaging" (2017). Thesis. Rochester Institute of
Technology. Accessed from

This Dissertation is brought to you for free and open access by the Thesis/Dissertation Collections at RIT Scholar Works. It has been accepted for
inclusion in Theses by an authorized administrator of RIT Scholar Works. For more information, please contact ritscholarworks@rit.edu.
Machine Learning
Based Autism Detection
Using Brain Imaging

A dissertation

by

Gajendra Jung Katuwal


Submitted in partial fulfilment

of

Doctor of Philosophy in Imaging Science

Chester F. Carlson Center for Imaging Science

Rochester Institute of Technology

Rochester, New York, USA

March 27, 2017


Chester F. Carlson Center for Imaging Science
Rochester Institute of Technology
Rochester, New York, USA
CERTIFICATE OF APPROVAL
PhD Degree Dissertation
The PhD Degree Dissertation of Gajendra Jung Katuwal has been examined and approved
by the following dissertation committee as satisfactory for a PhD degree in Imaging Science.

Dr. Andrew Michael, Dissertation Advisor Date

Dr. Stefi Baum, Dissertation Advisor Date

Dr. Manuela Campanelli, Committee Chair Date

Dr. Nathan Cahill, Committee Member Date

Dr. John Kerekes, Committee Member Date


Dedicated to my parents.
ACKNOWLEDGMENTS

First of all, I would like to express my sincere gratitude to my advisors, Dr. Andrew Michael and

Dr. Stefi Baum for their guidance and training during this fulfilling journey of four years and some

months. Andrew believed in me despite of my zero experience in brain imaging and guided me to

this point. Stefi, despite being physically on the other side of the 49th parallel, always inspired me

to ask the right questions, which was pivotal for the evolution of my thought process and was

immensely helpful during the course of my research.

I would also like to thank the rest of the dissertation committee: Dr. Nathan Cahill, Dr. John

Kerekes, and Dr. Manuela Campanelli, for many insightful comments and questions. My sincere

gratitude also goes to my cool lab mates Chao, Chase, and Viraj. Without them, this journey would

have been difficult. My sincere gratitude goes to all the individuals behind the data, ideas,

publications, software, etc. used in my research and all helpful folks from stackexchange. Without

their contributions, I would not have made this far.

I would like to thank Ronnie and Rob for their help in grammatical debugging. Finally, I am

grateful to several baristas from the beautiful town of Lewisburg to the hip places in Cambridge

and Somerville for helping me to overclock my brain—a necessity for finishing this dissertation.
ABSTRACT

Autism Spectrum Disorder (ASD) is a group of heterogeneous developmental disabilities

that manifest in early childhood. Currently, ASD is primarily diagnosed by assessing the behavioral

and intellectual abilities of a child. This behavioral diagnosis can be subjective, time consuming,

inconclusive, does not provide insight on the underlying etiology, and is not suitable for early

detection. Diagnosis based on brain magnetic resonance imaging (MRI)—a widely used non-

invasive tool—can be objective, can help understand the brain alterations in ASD, and can be

suitable for early diagnosis. However, the brain morphological findings in ASD from MRI studies

have been inconsistent. Moreover, there has been limited success in machine learning based ASD

detection using MRI derived brain features. In this thesis, we begin by demonstrating that the low

success in ASD detection and the inconsistent findings are likely attributable to the heterogeneity

of brain alterations in ASD. We then show that ASD detection can be significantly improved by

mitigating the heterogeneity with the help of behavioral and demographics information. Here we

demonstrate that finding brain markers in well-defined sub-groups of ASD is easier and more

insightful than identifying markers across the whole spectrum. Finally, our study focused on brain

MRI of a pediatric cohort (3 to 4 years) and achieved a high classification success (AUC of 95%).

Results of this study indicate three main alterations in early ASD brains: 1) abnormally large

ventricles, 2) highly folded cortices, and 3) low image intensity in white matter regions suggesting

myelination deficits indicative of decreased structural connectivity. Results of this thesis

demonstrate that the meaningful brain markers of ASD can be extracted by applying machine

learning techniques on brain MRI data. This data-driven technique can be a powerful tool for

early detection and understanding brain anatomical underpinnings of ASD.

i
ABBREVIATIONS

ABIDE: Autism Brain Imaging Data Exchange

ADHD: Attention Deficit Hyperactivity Disorder

AS: Autism Severity

ASD: Autism Spectrum Disorder

AUC: Area Under the Receiver Operating Curve

CARS: Childhood Autism Rating Scale

CDC: Center for Disease Control and Prevention

CTR: Controls

CSF: Cerebrospinal Fluid

Curvind: Curvature Index

DB: Demographics and Behavioral Measures

DC: Diencephalon

FAST: FMRIB's Automated Segmentation Tool

Foldind: Folding Index

FS: FreeSurfer

FSL: FMRIB Software Library

Gauscurv: Gaussian Curvature

GM: Gray Matter

Meancurv: Mean Curvature

MRI: Magnetic Resonance Imaging

SPM: Statistical Parametric Mapping

ii
Std.: Standard Deviation

TDC: Typically Developing Controls

TIV: Total Intracranial Volume

VIQ: Verbal Intelligent Quotient

WM: White Matter

iii
CONTENTS

ABSTRACT ........................................................................................................................................ i

ABBREVIATIONS ........................................................................................................................... ii

FIGURES.. ......................................................................................................................................... x

TABLES… ........................................................................................................................................ xii

CHAPTER 1: INTRODUCTION ................................................................................................... 1

1.1 Autism Spectrum Disorder ..................................................................................................................1

1.2 Importance of early detection of ASD .................................................................................................2

1.3 Problem 1: Behavior based diagnosis of ASD is late and does not help to understand

underpinnings of ASD ..................................................................................................................................3

1.4 Problem 2: Inconsistent brain anatomical findings in ASD ................................................................4

1.5 Problem 3: Low Classification success in large multi-site data ............................................................4

1.6 Thesis Contributions ............................................................................................................................4

1.6.1 Demonstrating MRI as a potential tool for detection ................................................................5

1.6.2 Relating inconsistent findings and low classification success to ASD heterogeneity and

methodological differences ....................................................................................................................5

1.6.3 Providing the evidence of brain image processing tool dependent findings and suggesting

multi-variate techniques as a partial solution ........................................................................................6

1.6.4 Divide and Conquer: we propose identifying brain biomarkers in sub-groups of ASD is easier

and more meaningful than across the whole spectrum .........................................................................7

1.6.5 Brain biomarkers discovery for early detection of ASD .............................................................7

1.7 Thesis Organization .............................................................................................................................8

1.7.1 Chapter 1: Introduction .............................................................................................................8

iv
1.7.2 Chapter 2: Background ..............................................................................................................9

1.7.3 Chapter 3: Method Dependent Findings ...................................................................................9

1.7.4 Chapter 4: Brain Morphology Based ASD Detection and Tackling ASD Heterogeneity ......10

1.7.5 Chapter 5: Early Detection of ASD .........................................................................................10

1.7.6 Chapter 6: Conclusion .............................................................................................................10

1.7.7 Appendices ...............................................................................................................................10

2 CHAPTER 2: BACKGROUND ................................................................................................ 12

2.1 Brain Imaging .....................................................................................................................................12

2.2 MRI………………………………………………………………………………………………………………………………….14

2.2.1 MRI acquisition .......................................................................................................................14

2.3 Brain Feature extraction from MRI ...................................................................................................16

2.3.1 SPM……….. ............................................................................................................................16

2.3.2 FSL…….. .................................................................................................................................18

2.3.3 FS………… .............................................................................................................................19

2.4 Analysis of neuroimaging data ............................................................................................................23

2.4.1 Voxel-based ..............................................................................................................................23

2.4.2 Vertex-based or Surface-based.................................................................................................24

2.4.3 Feature-based ...........................................................................................................................24

2.5 Machine Learning ..............................................................................................................................25

2.5.1 Random Forest (RF) .................................................................................................................27

2.5.2 Gradient Boosting Machine (GBM) .........................................................................................29

3 CHAPTER 3: DEPENDENCY OF BRAIN FINDINGS ON IMAGE PROCESSING

TOOLS AND POTENTIAL SOLUTIONS ................................................................................. 31

3.1 Introduction ........................................................................................................................................31

3.2 Methods…… ........................................................................................................................................35

v
3.2.1 Data……. .................................................................................................................................35

3.2.2 Tissue Segmentation and Brain Volumes Estimation ..............................................................36

3.2.3 Statistical Analysis ....................................................................................................................38

3.3 Results……...........................................................................................................................................40

3.3.1 Estimated brain volumes and inter-method differences ...........................................................40

3.3.2 ASD vs. TDC inter-group differences is dependent on the method used ................................42

3.3.3 Differential bias of methods to the diagnostic group ................................................................45

3.4 Discussion....................................................................................................................................... 46

3.4.1 ASD vs. TDC inter-group differences dependent on the method used ...................................47

3.4.2 Comparison of ASD vs. TDC brain volume differences with previous studies .......................48

3.4.3 Methods have different biases for diagnostic group, and many of them are larger than inter-

group differences .................................................................................................................................50

3.4.4 Locations of the inter-method segmentation discrepancies .....................................................50

3.4.5 Sources of the inter-method discrepancies in tissue segmentation ...........................................55

3.4.6 Implications to future neuroimaging studies ............................................................................56

3.5 Conclusion ..........................................................................................................................................60

4 CHAPTER 4: UTILIZING NON-BRAIN INFORMATION TO IMPROVE ASD

DETECTION BASED ON BRAIN MORPHOLOGY ............................................................... 62

4.1 Introduction ........................................................................................................................................62

4.2 Methods……….. ...................................................................................................................................65

4.2.1 MRI Data .................................................................................................................................65

4.2.2 Reasons for using male subjects only .......................................................................................66

4.2.3 MRI Data processing ...............................................................................................................67

4.2.4 Classification Algorithms ..........................................................................................................67

4.2.5 Metrics Used ............................................................................................................................68

vi
4.2.6 Adding AS, VIQ, and age information to the brain morphometric features ..........................69

4.3 Results……...........................................................................................................................................70

4.3.1 Classification using only brain morphometric properties ........................................................70

4.3.2 Age and VIQ used as training features in conjunction with brain morphometric features .....70

4.3.3 Sub-grouping subjects by AS, VIQ, and age ...........................................................................71

4.3.4 Multivariate analysis: Important features for classification and their variability across sub-

groups…………………………………………………………………………………………….….75

4.4 Discussion …………………………………………………………………………………………………………………………81

4.4.1 Classification becomes difficult with increase in AS.................................................................82

4.4.2 Folding index and curvature features may be important markers for early detection of ASD 84

4.4.3 ASD heterogeneity in brain morphometry ..............................................................................86

4.4.4 Low classification success in large multi-site data due to ASD heterogeneity ..........................87

4.4.5 Future direction for neuroimaging studies ..................................................................................88

4.5 Conclusion ..........................................................................................................................................93

5 CHAPTER 5: EARLY DETECTION OF AUTISM USING BRAIN MORPHOLOGY .... 94

5.1 Introduction ........................................................................................................................................95

5.2 Material and Methods ........................................................................................................................96

5.2.1 Subjects…… .............................................................................................................................96

5.2.2 Brain Features Extraction ........................................................................................................98

5.2.3 Univariate Analysis...................................................................................................................99

5.2.4 Multivariate Analysis: ASD vs. CTR Classification .................................................................99

5.3 Results…….........................................................................................................................................100

5.3.1 ASD vs. CTR Brain Differences ............................................................................................100

5.3.2 ASD vs. CTR Classification ...................................................................................................104

5.3.3 Important Features for ASD vs. TDC Classification .............................................................105

vii
5.4 Discussion ……………………………………………………………………………………………………………………….110

5.4.1 Amygdala in ASD ..................................................................................................................111

5.4.2 Early brain overgrowth in ASD .............................................................................................111

5.4.3 Overgrowth is related to increase in cortical surface area but not thickness .........................111

5.4.4 Higher cortical folding in ASD brains and its relation to early brain overgrowth and cortical

expansion……...................................................................................................................................112

5.4.5 Larger Ventricles and extra-axial volume in ASD may cause additional cortical folding .....114

5.4.6 WM in frontal and temporal regions less myelinated in ASD ...............................................115

5.4.7 Brain overgrowth in ASD is followed by arrested growth and even degeneration ................116

5.5 Conclusion ........................................................................................................................................118

CHAPTER 6: CONCLUSION AND FUTURE WORK ........................................................... 120

5.6 Contributions ....................................................................................................................................121

5.6.1 We demonstrated MRI as a potential tool for ASD detection ...............................................121

5.6.2 We showed that the methodological differences are a cause of inconsistent brain imaging

findings……………. .........................................................................................................................121

5.6.3 We showed that identifying brain biomarkers in sub-groups of ASD is easier and more

meaningful than across the whole spectrum .....................................................................................122

5.6.4 We identified potential brain markers for early detection of ASD.........................................123

5.7 Future Work ....................................................................................................................................123

5.7.1 Critical need for large longitudinal studies .............................................................................123

5.7.2 Data-driven brain features .....................................................................................................124

BIBLIOGRAPHY ......................................................................................................................... 126

Appendix A..................................................................................................................................... 151

Appendix B ..................................................................................................................................... 154

viii
Appendix C ..................................................................................................................................... 161

ix
FIGURES

Figure 1.1 Thesis Organization ...................................................................................................... 9

Figure 2.1: Sagittal view of a T1 weighted brain MRI. ................................................................ 13

Figure 2.2: MRI acquisition process. ............................................................................................ 15

Figure 2.3: SPM Tissue Segmentation. ........................................................................................ 18

Figure 2.4: Freesurfer (FS) recon-all processing workflow .............................................................. 20

Figure 2.5: Sub-cortical segmentation & cortical parcellation in Freesurfer (FS). ........................ 22

Figure 2.6: Random Forest is an ensemble of decision trees. ....................................................... 28

Figure 3.1: Distribution of estimated brain volumes and inter-method differences. .................... 42

Figure 3.2: ASD – TDC brain volume differences are method dependent .................................. 44

Figure 3.3: Inter-method segmentation comparison .................................................................... 52

Figure 3.4: Multi-variate pattern is more robust across methods ................................................. 59

Figure 4.1: Improvement in classification by sub-grouping .......................................................... 72

Figure 4.2: Important features for classification are different across sub-groups.......................... 77

Figure 4.3: Variability of the important features. ......................................................................... 78

Figure 4.4: Folding index and curvature features are important for classification in young subjects

............................................................................................................................................... 85

Figure 4.5: ASD vs. TDC group difference in ventricular volumes decreases with verbal IQ (VIQ)

............................................................................................................................................... 91

Figure 5.1: Subject Selection Flowchart ....................................................................................... 97

Figure 5.2: Classification Flowchart............................................................................................ 100

Figure 5.3: Statistically Significant Brain Alterations. ................................................................ 101

x
Figure 5.4: Global Volume Features ........................................................................................... 104

Figure 5.5: ASD vs. CTR Classification Scores. ......................................................................... 105

Figure 5.6: Important brain feature for ASD vs. CTR classification.......................................... 107

Figure 5.7: Important morphology features for ASD vs. CTR classification. ............................ 108

Figure 5.8: Important intensity features for ASD vs. CTR classification. .................................. 109

Figure 5.9: Cortical expansion theory on cortical gyrification ................................................... 113

Figure 5.10: Higher cortical folding in ASD brain. .................................................................... 115

Figure 5.11: Inefficient communication in ASD brain. .............................................................. 116

Figure 5.12: Arrested brain growth in ASD ............................................................................... 118

Figure 8.1: Gradient Boosting Machine: Improvement in classification by sub-grouping based on

autism severity (AS), age and Verbal IQ (VIQ). ................................................................. 154

Figure 8.2. Gradient Boosting Machine (GBM): Important features for classification are variable

across sub-groups. ............................................................................................................... 155

Figure 8.3: Inter and intra sub-groups classification AUC scores .............................................. 156

Figure 8.4: Random Forest: Important features for classification in sub-groups (10% threshold).

............................................................................................................................................. 157

Figure 8.5: Random Forest: Important features for classification by in sub-groups (25% threshold).

............................................................................................................................................. 158

Figure 8.6: Classification performance degrades with Autism Severity (AS). ............................. 159

Figure 9.1: Histogram of the effect sizes of the statistically significant (at 0.05) differences in brain

features ................................................................................................................................ 161

xi
TABLES

Table 3.1: Subject demographics and behavioral (DB) measures ................................................ 35

Table 3.2: Estimated brain volumes and inter-method differences .............................................. 41

Table 3.3: ASD vs. TDC brain volume differences ...................................................................... 43

Table 3.4: Differential bias for diagnostic group (ASD) ............................................................... 46

Table 4.1: Previous ASD vs. TDC Classification studies ............................................................. 64

Table 4.2: Subject demographics and behavioral (DB) measures ................................................ 66

Table 4.3: Sub-groups definition .................................................................................................. 70

Table 4.4: Classification AUC in sub-groups created by AS, VIQ and age................................. 73

Table 7.1: Estimated brain volumes and inter-method differences in NYU .............................. 151

Table 7.2: ASD vs. TDC brain volume differences in NYU ...................................................... 152

Table 7.3: Differential bias for diagnostic group (ASD) in NYU ............................................... 153

Table 8.1: Correlation between the feature importance scores from RF and GBM .................. 160

xii
1 CHAPTER 1: INTRODUCTION

1.1 Autism Spectrum Disorder

Autism Spectrum Disorder (ASD) is a group of lifelong developmental disabilities that

manifest in early childhood. According to DSM 5 (American Psychiatric Association, 2013), ASD

is characterized by impairment in social-communication and behavior domains, including

repetitive behaviors and restrictive interests. According to a recent Center for Disease Control and

Prevention (CDC) survey (CDC, 2014), 1 in 68 children (1 in 42 boys and 1 in 189 girls) have ASD.

A more recent CDC survey of parents indicated that this number can be as high as 1 in 45

(Zablotsky et al., 2015). In addition to the substantial difficulties faced by ASD individuals and their

parents, the economic burden of ASD is high. A recent study by Leigh and Du (2015) has estimated

ASD’s economic cost for 2015 to be $268 billion in the United States alone. The study projects

annual costs rising to $461 billion in 2025 if ASD’s prevalence remains constant and more than $1

trillion by 2025 if ASD’s prevalence continues the steep rise seen over the last decade.

ASD is highly heterogeneous in its etiology, comorbidity, pathogenesis, genetics, and

severity (Betancur, 2011; Happé et al., 2006; Jeste & Geschwind, 2014; Ronald et al., 2006). It has

a strong genetic basis and is highly heritable (Ronald et al., 2006). It has been reported that 10%–

20% of individuals with ASD have an identified genetic etiology. The genetic architecture of ASD

is highly heterogeneous (Geschwind & Levitt, 2007; Jeste & Geschwind, 2014); ASD has been

linked to more than 100 different genes affecting different aspects of neurodevelopment and

function (Betancur, 2011). In addition, ASD shows high comorbidity with other psychiatric

1
disorders (Dougherty et al., 2016b; Matson et al., 2013). For example, it has been estimated that

14 to 78% of children with ASD also meet the criteria for Attention Deficit and Hyperactivity

Disorder (ADHD) (Gargaro et al., 2011), up to 42% meet the criteria for anxiety disorders (Matson

et al., 2013), and 25 to 70% have some level of intellectual disability (Fombonne, 2009). Similarly,

brain alterations in ASD have been found to be highly heterogeneous which is discussed in Section

1.6.2

1.2 Importance of early detection of ASD

Early detection of ASD is important because it allows for the application of early

intervention methods. It has been shown that early intervention is effective in decreasing

impairments (Dawson et al., 2010) and may result in more positive long-term outcomes for the

child (Pickles et al., 2016; Rogers & Vismara, 2010). The annual economic cost of ASD has been

significant—and the largest factors contributing to the cost are lost productivity and adult care

(Ganz, 2007). With successful application of early intervention methods, in addition to improving

the quality of life of ASD individuals, the economic cost related to ASD can also be significantly

reduced. And, for successful application of early intervention methods, early detection of ASD is

required.

Early detection of ASD may help to disentangle the effects of genetic and environmental

risk factors of ASD and may improve our knowledge of its underlying etiology. A large study by

Sandin et al. (2014) including 2 million children (~14500 ASD) have reported that the risk of ASD

is influenced equally by genetic and environmental factors. Environmental factors include a wide

range of influences such as parental age (Sandin et al., 2015), obstetric conditions (Kolevzon et al.,

2007), medication used (Boukhris et al., 2015), maternal nutrition (Lyall et al., 2013; Schmidt et

al., 2014), exposure to chemicals during prenatal stage, prenatal stress (Kinney et al., 2009), etc.

2
The environmental factors related to the risk of ASD can interact with each other making the

etiology of ASD more complex. If detection can be achieved at the neonatal stage, it can help in

separating out the effects of post-natal environmental risk factors of ASD—thus, improving our

knowledge of ASD etiology.

In addition, it may be the case that ASD is closer to normal development in its presentation

at a younger age, before environmental factors have influenced its postnatal development. That

is, when a child is tested for ASD as early as possible, fewer environmental factors will have

contributed to the development of ASD after birth—which means that the underlying etiology and

perhaps its manifestation may be more homogenous—which in turn, would make it easier to detect

and understand the underlying etiology of ASD in children through the application of biomarkers

at very young age.

Below we briefly discuss the major challenges faced by current techniques for ASD

detection and studies investigating brain morphology of ASD.

1.3 Problem 1: Behavior based diagnosis of ASD is late and does not help

to understand underpinnings of ASD

Currently ASD diagnosis is based on clinical assessments based on observations of the

individual's behavior and intellectual abilities. This diagnosis procedure can be subjective, time

consuming, and inconclusive due to factors such as comorbidity (Close et al., 2012). An objective

diagnostic tool is highly valuable and its importance further increases from the fact that currently

existing behavioral diagnosis has been found to be highly subjective especially at a young age where

there is so little to observe. In addition, since the present diagnosis procedure is based only on

behavioral symptoms, it does not provide insight on the brain anatomical underpinnings and

underlying etiology of ASD. Furthermore, it is difficult to utilize it for early diagnosis and

3
intervention. According to a recent report by CDC (CDC, 2014), the median age of ASD diagnosis

is 53 months. Similarly, it has been reported that more than half of school-aged kids were age 5 or

older when they were first diagnosed with ASD and less than 20% were diagnosed by age 2 years

(Pringle et al., 2012).

1.4 Problem 2: Inconsistent brain anatomical findings in ASD

A number of studies utilizing structural Magnetic Resonance Imaging (MRI) data to

investigate the brain morphology of ASD have reported alterations in brain regions involved in

language and social behavior, particularly fronto-temporal regions (Bigler et al., 2007; Ha et al.,

2015), and the amygdala-hippocampus complex (Groen et al., 2010; Nordahl et al., 2012). Early

brain overgrowth (Campbell et al., 2014), alterations in corpus callosum (Wolff et al., 2015),

cerebellum (D’Mello et al., 2015), and fusiform (Dougherty et al., 2016a; van Kooten et al., 2008)

have also been reported by studies. However, these findings have been somewhat inconsistent

(Amaral et al., 2008; Chen et al., 2011; Katuwal et al., 2015a).

1.5 Problem 3: Low Classification success in large multi-site data

In recent years, several studies have applied machine learning on MRI derived brain

features, targeted at ASD detection. These studies have been successful with high classification

accuracies (80%) for well-matched small datasets (n < 200). However, the studies using the large

multi-site ABIDE dataset (n > 700) (Haar et al., 2014; Katuwal et al., 2015b; Sabuncu &

Konukoglu, 2014) have reported low classification accuracies (~60%).

1.6 Thesis Contributions

4
1.6.1 Demonstrating MRI as a potential tool for detection

As a solution to the Problem 1 i.e. “Behavior based diagnosis of ASD is late and does not help to

understand the underpinnings of ASD”, we demonstrate that ASD can be successfully detected using

machine learning on MRI data. Using brain images of both children and adult subjects, data-

driven potential brain biomarkers that are consistent with the current biological understanding of

ASD were found. In addition, we could successfully classify ASD subjects from non-ASD subjects

in many cases. From the results of this thesis, we can safely argue that the application of machine

learning on MRI data can be a useful technique for early detection of ASD and for understanding

its brain anatomical underpinnings. This analysis and results are presented in Chapter 4 and

Chapter 5.

1.6.2 Relating inconsistent findings and low classification success to ASD

heterogeneity and methodological differences

In this thesis, we have attempted to identify the causes of Problem 2: “Inconsistent brain

anatomical findings in ASD” and Problem 3: “Low Classification success in large multi-site data”. We

attribute methodological differences and heterogeneity of ASD, or a combination of both, as the

likely underlying causes of these discrepancies.

The methodological differences can arise from the differences in image preprocessing tools,

analytical models, covariates, etc. In this thesis, we focus on methodological differences due to the

choice of image preprocessing tool. This analysis and results are presented in Chapter 3.

The heterogeneity of ASD brain morphology can be mainly attributed to the significant

dependence of brain alterations on demographics and behavioral (DB) measures such as age, sex,

IQ, handedness, etc. Previous studies have reported variability in brain alterations with factors

such as age (Lin et al., 2015), gender (Lai et al., 2013), handedness (Floris et al., 2013) etc. In this

5
thesis, we focus on the variability ASD brain alterations with respect to three DB measures: age,

verbal IQ (VIQ), and autism severity (AS). This analysis and results are presented in Chapter 4.

1.6.3 Providing the evidence of brain image processing tool dependent findings and

suggesting multi-variate techniques as a partial solution

We investigate differences in image preprocessing tools as a possible cause behind the

inconsistent anatomical findings in ASD and in neuroimaging in general. In particular, we focus

on the estimation of the following brain volumes: gray matter (GM), white matter (WM),

cerebrospinal fluid (CSF), and total intracranial volume (TIV) from T1-weighted MRIs using three

popular preprocessing methods: SPM, FSL, and FreeSurfer (FS). We found that the estimated

brain volumes did not agree across methods and the methods showed differential biases for ASD,

and many of the biases were larger than ASD versus TDC differences. In summary, we

demonstrated that the differences in the choice of image preprocessing tool is a reason behind the

inconsistent neuroimaging findings in ASD.

Improving the current brain image preprocessing tools to make them more accurate and

standardizing them is the best possible solution to remove the effects of methodological differences

in the end findings. However, when the existing popular tools have to be used, we suggest

investigating multi-variate relationships (in addition to univariate relationships) in brain alterations

because multi-variate relationships are more robust across methods and scanners. In this thesis, we

provide few examples to corroborate this view. This analysis and results are presented in Chapter

3.

6
1.6.4 Divide and Conquer: we propose identifying brain biomarkers in sub-groups of

ASD is easier and more meaningful than across the whole spectrum

We first propose and then demonstrate that the heterogeneity in ASD brain morphometry

can be mitigated by augmenting the information from demographics and behavioral measures

(DB) of the subjects. We demonstrate that identifying ASD brain alterations in relatively

homogenous sub-groups is easier and more insightful than across the whole heterogeneous ASD

spectrum. We use automatically extracted multiple brain morphological features and multiple

classification techniques for this investigation. We explore if DB measures of the subjects can be

utilized to mitigate the ASD heterogeneity and hence facilitating a better understanding of the

brain alterations associated with ASD and aiding in its detection. First, we investigate the

incremental predictive power that can be gained by adding DB measures such as age, VIQ, and

AS to the brain morphological features derived from MRI. Second, we probe if sub-grouping the

subjects by the above-mentioned DB measures helps to improve ASD vs. TDC classification and

provides better insight into the brain morphometric alterations in ASD. This analysis and results

are presented in Chapter 4.

1.6.5 Brain biomarkers discovery for early detection of ASD

We apply machine learning on automatically extracted multiple brain morphometric and

intensity features to classify young ASD subjects (3 to 4 years) from the controls of the same age

group. We achieved very high success rate (> 90% AUC) for ASD vs. control classification. We

also identified three potential brain markers for early detection of ASD: larger ventricles, larger

TIV, higher amount of cortical folding, and myelination deficits particularly in frontal and

temporal regions. We also noticed that larger TIV of ASD brain is related to larger cortical surface

area but relatively independent to cortical thickness. Further, we could show that the higher

7
amount of cortical folding in ASD brains may be an aftereffect of the early brain overgrowth in

ASD. We also hypothesize that the higher amount of cortical folding in ASD is due to the greater

compressive stress in the cortex induced by both hyper-expansion of the cortex and abnormally

large ventricles. To our knowledge this is the first study to leverage clinical imaging archives to

investigate early brain markers in ASD. The high degree of success in classification and the

biologically relevant potential brain markers indicate that application of advanced analytic

methods on brain features holds promise for aiding early identification of ASD.

1.7 Thesis Organization

This thesis is divided into six chapters and three appendices; see Figure 1.1.

1.7.1 Chapter 1: Introduction

This Chapter contains a brief introduction to ASD and studies investigating brain

alterations in ASD using MRI. It starts with the shortcomings of behavioral based ASD diagnosis

and then provides motivation for brain morphology based ASD detection using machine learning.

This chapter makes an effort to educate the reader about the importance of early detection of ASD

and then presents MRI based ASD detection as a potential technique for it. Finally, this chapter

briefly discusses the challenges to characterize the ASD brain morphology using MRI and then

provides the possible solutions as the aims and contributions of this thesis.

8
Figure 1.1 Thesis Organization

1.7.2 Chapter 2: Background

This Chapter contains the basic elements required to get acquainted with brain imaging in

general and brain MRI data processing and analysis in particular. It includes a brief introduction

to MRI, extraction of brain features from MRI, and univariate and multi-variate analysis

techniques used to identify meaningful patterns from these brain features.

1.7.3 Chapter 3: Method Dependent Findings

This Chapter is based on the Frontiers in Neuroscience research article “Inter-Method

Discrepancies in Brain Volume Estimation May Drive Inconsistent Findings in Autism” (Katuwal

et al., 2016b). Using the case study of brain volumes estimation from three popular brain image

processing tools, this Chapter provides the evidence that the differences in the choice of processing

tool is a cause behind the inconsistent brain anatomical findings in ASD.

9
1.7.4 Chapter 4: Brain Morphology Based ASD Detection and Tackling ASD

Heterogeneity

This Chapter is based on the PLOS ONE journal research article “Divide and Conquer:

Sub-Grouping of ASD Improves ASD Detection Based on Brain Morphometry” (Katuwal et al.,

2016c). Using the brain images of adult subjects (6 to 40 years) from the ABIDE dataset, this

Chapter shows how information from DB measures such as age, VIQ, and AS can help to mitigate

the heterogeneity in ASD brain morphology. This Chapter demonstrates that searching for the

brain biomarkers in meaningful sub-groups of ASD is easier and more meaningful than searching

them across the whole ASD spectrum.

1.7.5 Chapter 5: Early Detection of ASD

Utilizing the brain images of subjects 3 to 4 years of age from a clinical imaging archive at

Geisinger Health System, this Chapter demonstrates that applying machine learning on MRI-

derived brain morphology features, early brain markers of ASD can be identified in a data-driven

fashion and be used for early detection of ASD.

1.7.6 Chapter 6: Conclusion

This Chapter summarizes the methodology and the results of this thesis and provides a

discussion on the implications of the findings of this thesis. Finally, it concludes by discussing the

future work.

1.7.7 Appendices

Appendix A, B, and C contain the supplementary materials form Chapter 3, 4, and 5

respectively.

10
11
2 CHAPTER 2: BACKGROUND

2.1 Brain Imaging

Brain imaging includes the use of diverse techniques to image the structure, function, and

biochemical processes of the brain. Imaging technologies of various modalities now provide the

visualization of the structure and function of the brain with high resolution. Brain imaging is

becoming an important technique in both research and clinical care. It is being successfully used

to facilitate the understanding of the structure and the functions of the brain and also has been a

vital diagnostic tool for many neurological disorders (O’Brien, 2007). Brain imaging can be broadly

categorized into three categories: structural, functional, and molecular imaging (Asbury, 2011).

Structural imaging techniques can capture the anatomical structure of the brain and

include X-ray, Computer Assisted Tomography (CT), MRI, Diffusion Tensor Imaging (DTI) etc.

MRI is based on the principle of nuclear magnetic resonance and uses radiofrequency waves to

probe tissue structure. DTI is a special form of MRI technique that measures the diffusion of water

as a function of spatial location and is used to characterize the microstructural changes in the tissues

(Le Bihan et al., 2001). It is widely used to estimate the WM connectivity patterns in neurological

disorders such as ASD and ADHD (Assaf & Pasternak, 2008).

Functional imaging techniques capture the function or physiology of the brain or measure

the brain metabolism (Raichle, 1998). Most of the techniques such as functional Magnetic

Resonance Imaging (fMRI) and Positron Emission Tomography (PET), measure the amount of

cerebral blood flow as a proxy to the brain metabolism rate assuming that there is more blood flow

in functionally active brain regions.

12
Molecular and cellular imaging techniques probe the biochemical activities of cells and

their molecules (Massoud & Gambhir, 2003). They generally use different kinds of light

microscopes on light-emitting probes. The light-emitting probes are the molecules emitting radio

frequencies of various wavelengths to contrast the target cells.

In addition to the unimodal studies based on brain images of a single modality, there are

several other studies utilizing the joint information from the brain images of multiple modalities

(Michael, 2009; Michael et al., 2010, 2011; Pichler et al., 2010; Townsend & Cherry, 2001).

This thesis makes an effort to identify brain markers for ASD detection using the brain

morphology captured by MRI. So, we will limit our scope to MRI from here on. To be more

particular, MRI refers to structural MRI in this thesis. All the brain MRI images used in this thesis

were T1-weighted. In T1-weighted images, tissues with high fat content (e.g. WM) appear bright

whereas fluids (e.g. CSF) and air appear dark (Hornak, 1996). Figure 2.1 shows the sagittal view

of a typical T1-weighted brain MRI.

Figure 2.1: Sagittal view of a T1 weighted brain MRI.

13
2.2 MRI

MRI is a non-invasive tool for examining the anatomy of the human body. It is based on

the principle of magnetic resonance imaging and it utilizes the magnetic properties of the proton

of the hydrogen atom which is abundant in our body.

2.2.1 MRI acquisition

Here we briefly explain the MRI acquisition process. For detailed information, please refer

to https://radiopaedia.org/articles/mri-introduction, Pooley (2005), Hornak (1996) and Section

2.1 of Michael (2009). A typical MRI machine consists of three major components: a very strong

main magnet, gradient magnets, and a radio frequency emitter. A typical MRI process includes

four major components: slice selection, phase encoding, frequency encoding, and signal

reconstruction (see Figure 2.2).

Slice selection: The protons spin around their axes effectively acting as small magnetic

dipoles. Initially, the magnetic dipoles (protons) are randomly aligned. When a strong external

magnetic field of the MRI machine is applied, in addition to the spinning motion, the dipoles start

to precess along the direction of the external field at the Larmor frequency, which is proportional

to the external magnetic field. After that, a combination of gradient magnets is turned on, by effect

of which, the net magnetic field varies linearly along the axis perpendicular to the 2D slice of

interest—i.e. the net magnetic field is uniform in each slice but is different across the slices—which

means the protons at each slice are precessing with unique frequency and hence can only absorb

electromagnetic waves of particular frequencies to resonate, echo or excite. In other words, varying

net magnetic field allows us to selectively excite the hydrogen atoms only in the slice of interest.

14
Figure 2.2: MRI acquisition process.

Phase encoding. A suitable radio frequency pulse is emitted for a brief time so that the

hydrogen atoms only in the slice of interest are excited. Then the radio frequency is turned off and

the excited hydrogen atoms are allowed to return to the low energy state through a gradual decay.

During relaxation, electromagnetic waves are emitted at the precessing frequency of the Hydrogen

atoms. Then another set of gradient magnets is turned on for a short interval, by effect of which,

the net magnetic field along one axis of the slice of interest varies. Since the precession frequency

is dependent on the external magnetic field, protons at different locations precess with different

frequencies creating location dependent phase lag relative to their initial state. This location

dependent phase lag effectively encodes the spatial information in the slice of interest. In practice,

several phase encoding steps are repeated so that the captured signal has enough information to

reconstruct the anatomical image.

Frequency encoding. After phase encoding, another gradient is applied along the axis

perpendicular to the phase encoding axis so that the hydrogen atoms at different columns precess

with different frequencies proportional to the net magnetic field.

15
Signal reconstruction. The excited hydrogen atoms return to the low energy state

releasing electromagnetic waves. These waves have frequency and phase dependent on their spatial

location and are picked up by receiver coils and the collected 2D signal is called the k-space. A

Fourier Transform is applied on the k-space image to reconstruct the 2D anatomical image of the

slice. Multiple 2D slices are collected and combined to form the 3D MRI volume.

2.3 Brain Feature extraction from MRI

MRI data have to be processed to extract brain features before performing further analyses

on them. There are a number of automatic software tools available from different labs and research

universities for MRI data processing. Among them, the most popular are SPM (Ashburner &

Friston, 2005), FS (Dale et al., 1999a; Fischl et al., 1999b), FSL (Jenkinson & Smith, 2001), AFNI

(Cox, 1996), and LONI (Dinov et al., 2010). MRI data processing can be roughly divided into two

steps. The first step includes preprocessing tasks such as motion correction, non-uniform intensity

normalization, skull stripping, etc. These tasks ensure that the final image is suitable for further

processing. In other words, these preprocessing tasks are image processing operations performed

on raw brain images so that they have desirable properties that are consistent with the assumptions

made by the subsequent registration and segmentation steps. The second step includes tasks such

as registration of a brain image to a standard brain template and segmentation of cortical and sub-

cortical brain regions.

Below we briefly discuss the brain feature extraction steps of three popular neuroimaging

preprocessing tools that were used in this thesis.

2.3.1 SPM

SPM (http://www.fil.ion.ucl.ac.uk/spm/) is a brain imaging software tool mainly designed

for the analysis of the brain image sequences. It was developed and is maintained by the members

16
& collaborators of the Wellcome Trust Centre for Neuroimaging at University College London.

In Chapter 3 of this thesis, SPM was utilized for the segmentation of following brain tissues: GM,

WM, and CSF.

Tissue segmentation in SPM is performed using the New Segment tool (Ashburner et al.,

2013). New Segment utilizes a Gaussian Mixture Model (GMM) with components for GM, WM,

CSF, bone, soft tissue, and air/background where each component is modeled as one or more

Gaussians. The distribution or histogram of image intensities in a T1 weighted brain image (Figure

2.1A) are modeled by a GMM (Figure 2.3c) where the different peaks correspond to the different

image intensities of the tissue classes. The mixture model is updated by combining the spatial

information from a standard tissue probability map (TPM) and the intensity information of the

input MRI (Ashburner & Friston, 2005). The standard TPM (Figure 2.3d) of a tissue contains

information about the spatial location of the tissue in a probabilistic sense. The value at a certain

voxel of a TPM represents the probability of a certain tissue (GM, WM or CSF) belonging to that

voxel. In this example, a voxel value of the TPM in Figure 2.3d represents the probability that the

voxel contains GM. New Segment uses ICBM-452 T1 brain atlases (Mazziotta et al., 2001a) as

standard TPMs.

At first the T1 image is registered in the same space as the standard TPM. After the

registration, the registered T1 image is segmented roughly based on intensity thresholds to get an

initial segmentation map (Figure 2.3b) to create an initial TPM. Then, the segmented tissue map

is refined using Bayes’ theorem with the standard TPM as a priori. The maximum likelihood

estimate of the parameters of the mixture model are estimated using the Expectation Maximization

(EM) algorithm (Do & Batzoglou, 2008). At each iteration of EM, model parameters and the TPM

are updated. The iteration continues until the final parameters of the mixture model are estimated.

A voxel value in the final TPM (Figure 2.3e) represents the probability that the voxel contains the

17
particular tissue. Further details on SPM tissue segmentation can be found at (Ashburner et al.,

2013).

Figure 2.3: SPM Tissue Segmentation.


The distribution (histogram) of image intensities of the T1 image (upper left) are modeled by a Gaussian
Mixture Model (bottom left) where the different peaks correspond to the different image intensities of the
tissue classes. First, the T1 image is registered in same space as standard Tissue Probability Map (TPM)
which contains the anatomical information of a particular tissue in probabilistic sense. Second, the T1 image
is segmented roughly based on intensity thresholds. Third, the segmented tissue map is refined using Bayes
theorem with the standard TPM as a priori. This process is repeated until there is no significant change in
the segmented tissue map. A voxel in the final tissue map represents the probability of that voxel containing
a particular tissue type. Note: A part of this figure was borrowed from Mietchen and Gaser (2009).

2.3.2 FSL

FSL (https://fsl.fmrib.ox.ac.uk/fsl/fslwiki) is a brain imaging software tool designed for the

analysis of both structural and functional brain images. It was developed and is maintained by the

Analysis Group at the Oxford Centre for Functional MRI of the Brain, at the Oxford University.

In Chapter 3 of this thesis, FSL was utilized to for the segmentation of following brain tissues: GM,

WM, and CSF.

Tissue classification in FSL is performed by FMRIB's Automated Segmentation Tool

(FAST) (Zhang et al. 2001). FAST utilizes a hidden Markov random field (HMRF) (Rabiner &

18
Rabiner, 1989) which is a generalized version of the finite mixture model. In the HMRF model

utilized by FAST, each tissue class in the mixture model is represented by a Gaussian. In addition

to a basic mixture model, HMRF model incorporates a Markov random field (MRF) to utilize the

neighborhood information of the voxels. The hidden variables specifying the identity of the

mixture component (parametric distribution or Gaussian in this case) of each observation (voxel

intensity) are related by a Markov process in the HMRF model unlike a finite mixture model where

they are independent of each other. The HMRF model in FAST does not use a standard TPM as

a priori; instead it uses K-means segmentation (Kanungo et al., 2002) to estimate the initial

parameters of the tissue classes. It uses the EM algorithm to find the maximum likelihood estimate

of the model parameters. Before tissue segmentation using FAST, the brain region is extracted

using the brain extraction tool (Smith, 2002).

2.3.3 FS

FS (https://surfer.nmr.mgh.harvard.edu/) is a brain imaging software tool designed for the

analysis of both structural and functional brain images, and was originally developed for the

construction of cortical surface models. It was developed and is maintained by the Laboratory for

Computational Neuroimaging at the Athinoula A. Martinos Center for Biomedical Imaging. In

Chapter 4 and 5 of this thesis, FS was utilized to compute various morphometric properties of a

number of cortical and sub-cortical structures of the brain. In this thesis, the recon-all workflow of

FS v. 5.3.0 (Dale et al., 1999a; Fischl et al., 1999a, 2002; Ségonne et al., 2004) was used to extract

brain morphometric features.

Recon-all is a fully automated workflow (see Figure 2.4) that performs all the FS cortical

reconstruction and sub-cortical segmentation steps in a unified pipeline. It includes several

processing stages such as motion correction, non-uniform intensity normalization, Talairach

19
(Talairach & Tournoux, 1988) transform computation, intensity normalization, skull stripping,

sub-cortical segmentation, and cortical parcellation steps. Detailed steps of recon-all workflow can

be found at https://surfer.nmr.mgh.harvard.edu/fswiki/recon-all. Cortical parcellation and sub-

cortical segmentation steps are briefly explained below.

Figure 2.4: Freesurfer (FS) recon-all processing workflow


Recon-all is a fully automated workflow for cortical reconstruction and sub-cortical segmentation steps.
Detailed steps of recon-all workflow can be found at https://surfer.nmr.mgh.harvard.edu/fswiki/recon-all

2.3.3.1 Sub-cortical Segmentation

FS utilizes the Bayesian approach for the segmentation of sub-cortical structures (Fischl et

al., 2002). At first, a brain image is affine registered to a standard probabilistic atlas called Aseg atlas

(Fischl et al., 2002) which contains the information regarding statistical properties of 37 anatomical

structures. At each voxel of the standard atlas, one of the 37 anatomical labels including left and

right caudate, putamen, pallidum, thalamus, lateral ventricles, hippocampus, and amygdala (see

Figure 2.5a), can belong to a voxel. The intensity distribution of each label is modeled as a

Gaussian. Local spatial relationships between the labeled brain structures are encoded by an

anisotropic non-stationary MRF. After registration to the atlas, the maximum posteriori estimate

of the segmentation or the probability of an anatomical label belonging to each voxel is calculated.

In this Bayesian approach, two forms of prior information are utilized. The first is the global spatial

20
information or the probability that an anatomical label can be present at a particular voxel. This

information is derived from the atlas and the affine transformation of the input image to the atlas.

The second is the local spatial information or the local spatial relationship between anatomical

labels such as “posterior amygdala is frequently superior to anterior hippocampus, but never

inferior to it” (Fischl et al., 2002). This spatial relationship is encoded by an anisotropic non-

stationary MRF.

2.3.3.2 Cortical parcellation

FS subdivides the cerebral cortex on a brain MRI into a number of gyral based regions of

interest or parcellations (Fischl et al., 2004; Desikan et al., 2006). It utilizes the prior information

from a standard atlas such as the Desikan-Killiany atlas (Desikan et al., 2006) which was created

using a dataset of 40 MRI scans, where in each scan, 35 cortical regions were manually identified

in each of the individual hemispheres (see Figure 2.5b). The local surface geometry of a location in

the cortex and the prior information derived from the atlas are combined in a Bayesian framework

to calculate the maximum posteriori estimate of a brain parcellation belonging to that location.

Similar to sub-cortical segmentation, the prior information has two sources. The first source is the

global spatial information; it is the probability that a given parcellation occurs at a particular

location in the atlas, independent of the local surface geometry. This information is provided by

the atlas and the registration of the input image to the atlas. The second source is the local spatial

information or the local spatial relationship between parcellation labels such as “precentral gyrus

can be anterior, superior or inferior to central sulcus but never posterior to it” (Fischl et al., 2004).

This spatial relationship is encoded by an anisotropic non-stationary MRF.

21
Figure 2.5: Sub-cortical segmentation & cortical parcellation in Freesurfer (FS).
Figure a) was borrowed from http://slideplayer.com/slide/5222876 and figure b) was borrowed from
Klein and Tourville (2012).

2.3.3.3 FS Features

After the segmentation of cortical and sub-cortical brain regions, different morphometric

and intensity properties or features of these regions can be calculated. Volume, intensity mean,

intensity standard deviation (std.) of 40 sub-cortical brain structures are automatically estimated by

the recon-all workflow. Similarly, recon-all automatically estimates the volume, surface area,

Gaussian curvature, mean curvature, curvature index, folding index, thickness mean, thickness

standard deviation, intensity mean, and intensity std. for the 34 cortical brain regions of each

hemisphere.

22
!"#$ %!"&'
Mean curvature is the arithmetic mean of principal curvatures where K *+, and
(

K *-. are maximum and minimum principal curvatures respectively. Mean curvature measures the

extrinsic curvature. Extrinsic curvature refers to the amount of folding created with no distortion.

Gaussian curvature is the square of the geometric mean of principal curvatures K *-. K *+, .

Gaussian curvature measures the intrinsic curvature and its sign can be used to characterize the

surface. Intrinsic curvature refers to the folding created with distortion or shearing i.e. it is

excess/deficit in the surface area compared to a plane at the point (Schaer et al., 2008).
/
Folding index is defined as K *+, K *+, − K *-. 𝑑𝐴 (VanEssen & Drury, 1997).
01

Folding index is an overall measure of the folding of a surface. For any sulci having the shape of a

half-cylinder, its folding index is proportional to its length.

2.4 Analysis of neuroimaging data

Neuroimaging data analysis can be mainly categorized in three groups: voxel-based, vertex-

based, and feature-based.

2.4.1 Voxel-based

In voxel-based studies, the voxel intensities are compared between the images. The analysis

is usually preceded by the registration of the brain images to a standard space to ensure voxel-to-

voxel correspondence across images by adjusting the 3D coordinates of a voxel to best match the

MRI intensities across subjects at that particular voxel (Greve, 2011). Image registration is

generally followed by image smoothing and the amount of smoothing or the size of the low pass

filter used for smoothing is proportional to the size of the region of interest. The intensity of a voxel

in a final smoothed image may represent intensity, volume, or concentration of a tissue class at that

location. An independent univariate parametric statistical model is fitted for each voxel.

23
Representative voxel-based neuroimaging studies include: Nishida et al. (2011) reporting larger

anterior insular volume in patients with obsessive–compulsive disorder, May (2011) demonstrating

experience-dependent structural plasticity in the adult human brain, Matsuda et al. (2012)

detecting Alzheimer’s, Walther et al. (2009) relating brain structural differences to body mass index

in older females, Demirakca et al. (2011) reporting diminished GM in the hippocampus of cannabis

users, Kühn, Schubert, and Gallinat (2010) reporting reduced thickness of medial orbitofrontal

cortex in smokers, etc.

2.4.2 Vertex-based or Surface-based

In vertex-based or surface-based studies, the surface area of a brain structure is explicitly

represented by a mesh of triangles. The point where the vertices of the neighboring triangles meet

is called a vertex. A vertex can be localized by two spherical coordinates (longitude and latitude)

on a surface. After a brain surface such as the pial surface is represented by a mesh of triangles,

several geometric features such as area, curvature, thickness, etc. can be computed. For group

analysis, the surface based registration is performed where the 2D spherical coordinate of a vertex

is adjusted to match the curvature across subjects so that the folding patterns are aligned as

described by Greve (2011). Representative vertex-based neuroimaging studies include: Tondelli et

al. (2012) reporting atrophy in the right medial temporal lobe and the right hippocampus in

Alzheimer’s patients 10 years before clinical diagnosis, Im et al. (2006) reporting cortical thickening

in women in localized anatomical regions, Zarei et al. (2010) reporting significant bilateral regional

atrophy in the dorsal-medial part of the thalamus in Alzheimer’s patients.

2.4.3 Feature-based

In feature-based studies, the different features or properties of the brain regions are

analyzed. The features can be morphometric such as volume and area of a brain region or intensity

24
based such as the mean intensity of a brain region. The feature-based analysis is usually preceded

by the segmentation of various cortical and sub-cortical brain regions and the calculation of

different features of these regions. The feature extraction may involve many vertex and voxel based

techniques. In this thesis, all analyses are feature-based and the scope is limited to feature-based

analysis from here on.

Usually in feature-based studies, the group differences in a feature or a group of features

are investigated. When a study involves only one feature or treats each feature in a group to be

independent of each other, it is called a univariate study. A typical univariate study utilizes a

statistical model such as General Linear Model (Nelder et al., 2006) to investigate the group

differences in brain features. In contrast, when a study treats the individual features to be

dependent of each other and tries to identify a multi-variate pattern between two groups, it is called

a multi-variate study. A typical multi-variate study utilizes machine learning techniques to identify

multi-variate differences in a data-driven fashion.

2.5 Machine Learning

Machine learning is the process of learning the underlying structure of a dataset and using

that knowledge to make predictions on unseen data (Alpaydin, 2010). An algorithm or a collection

of algorithms are applied on samples of a population to learn underlying structure or study the

probability distribution of the population. Machine learning techniques capture the multi-variate

relationships in data and hence are well-suited to detect subtle and distributed differences in the

data. So, compared to univariate techniques, machine learning techniques can perform better in

capturing the brain morphology of heterogeneous conditions like ASD. Thus, they hold promise

for improving our knowledge of ASD brain morphology and identifying brain biomarkers helpful

for ASD diagnosis.

25
Machine learning techniques cover three major areas: supervised learning, unsupervised

learning, and reinforcement learning.

In supervised learning a training dataset with class labels are given and the algorithm learns

the input to output mapping function or a pattern in the training data. The learned mapping

function is then used to predict the labels of an unseen test data. The inherent assumption here is

that the training and testing data are similar. In other words, the same probability function

generates both training and test data. Supervised learning can be a classification problem when

the target label is categorical, or a regression problem when the target label is continuous. Some

applications of supervised learning are stock prediction (Rather et al., 2017), mortality prediction

(Katuwal & Chen, 2016), spam detection (Chakraborty et al., 2016), cancer detection (Bazazeh &

Shubair, 2016), ASD detection (Kim et al., 2016) etc. Some examples of supervised learning

algorithms are regression models, naïve Bayes, decision trees, Random Forest (RF), Gradient

Boosting Machine (GBM), neural networks, etc.

In unsupervised learning, class label information is absent and the algorithm has to learn

the underlying structure of the data by itself. Unsupervised learning can be a goal in itself, such as

finding sub-groups or clusters in the data or can be an intermediate step such as dimensionality

reduction for subsequent machine learning steps. Some applications of unsupervised learning are

human action categorization (Niebles et al., 2008), extracting hierarchical features for object

recognition (Ranzato et al., 2007), discovery of human neural-behavioral maps (Vogelstein et al.,

2014), identifying sub-groups of ASD (Gupta, 2015), etc. Some examples of unsupervised learning

algorithms are K-means, mixture models, principal component analysis, autoencoder, and

generative adversarial networks.

In reinforcement learning, an agent or a group of agents is trained in an environment to

maximize a cumulative reward. The environment is generally represented by a Markov decision

26
process (Puterman & L., 1994) with its inputs as the actions of the agent and outputs as the

observations and rewards sent to the agent. Reinforcement learning is particularly well-suited to

problems where trade-off between long-term and short-term reward is important. Some

applications of reinforcement learning are robot control (Kober et al., 2013), playing Go (Silver et

al., 2016) , and playing poker (Brown & Sandholm, 2017).

In this thesis, we applied supervised learning techniques namely RF (Breiman, 2001) and

GBM (Friedman, 2000). These two algorithms are briefly explained below.

2.5.1 Random Forest (RF)

RF (Breiman, 2001) is an ensemble of decision trees and its output class is the mode value

of the output classes of the individual decision trees (see Figure 2.6). It is an ensemble technique

that relies on the reduction of the variance of the general error term. For the squared error loss,

the expected generalization error of a model φ at a given point 𝐱 can be decomposed into three

components (Gelman & Hill, 2006) as in Equation 2.1.

E Error φ(x) = noise 𝐱 + bias ( 𝐱 + var(𝐱) (2.1)

In Equation 2.1, the first term noise 𝐱 is the irreducible error or Bayes error. It is the

theoretical lower bound on the generalization error and is independent of both learning algorithm

and data. The second term bias ( 𝐱 is the difference between the average prediction of the model

and the prediction of the Bayes model. The third term var(𝐱) is the variability of the predictions

at point 𝐱 over the models learned from all possible subsets of population.

The main idea of RF is to decrease the variance term by keeping the bias constant, thereby

decreasing the overall error of the ensemble. It achieves this variance reduction by averaging the

high variance classifiers or decision tree classifiers. The more diverse or uncorrelated the decision

trees, the more error reduction is achieved by averaging. To make the decision trees different from

27
each other, RF introduces randomness while constructing the trees, hence the name ‘random

forest’. The randomization is introduced at first during data sampling and then while constructing

the decision trees. Each tree learns from a bootstrap replica of the data obtained by random

sampling with replacement in the original data. This introduces a degree of randomness in the

decision trees because they are trained with different bootstrap replicas. While growing decision

trees, the quality of a node split is based only on a random subsample of the variables instead of all

of them. Most of the randomization comes from this step.

Figure 2.6: Random Forest is an ensemble of decision trees.


a) Generalized error broken down into three terms: noise, bias, and variance. Random Forest (RF) relies
on the reduction of variance without increasing bias, thus reducing the overall error. The directions of the
green arrows correspond to the direction of change of the error terms due to ensembling. b) General block
diagram of RF as an ensemble of diverse decision trees.

A decision tree is a simple classifier which iteratively partitions the data space into more

homogenous partitions using simple decision rules. The quality of the data partitions or the child

nodes after a node splitting in a decision tree is measured by their homogeneity. The homogeneity

of the data samples in a partition is generally quantified using the metrics such as Gini impurity

28
(Breiman, 1996), entropy (Gray & M., 1990), etc. and the quality of the node split is quantified by

change in these metrics.

In a RF, the importance of a variable is the sum of the weighted impurity decreases in all

node splits by the variable, averaged over all trees in the forest (Breiman, 2001; Louppe, 2015). It

is calculated as in Equation 2.2.


T
1
Imp XK = 1(jO = j) p(t)Δi(sO , t) (2. 2)
M
*U/ S"

Where,

Imp XK = the importance of variable XK

jO = the identifier of the variable used for splitting node t

sO = a split at node t due to variable XK

XY
p t = the proportion of samples at node t where NO is the number of samples at node t
X

and N is the total number of samples

Δi sO , t = decrease in node impurity by split sO

φ* = mth decision tree

M = the number of decision trees in the forest

In this thesis, RF was used as the first choice classifier since it is inherently suitable for

parallel processing, has very few hyper parameters to tune, does not require data scaling, is

theoretically resistant to overfitting, provides variable importance, and has been found to be very

good for a variety of datasets (Breiman, 2001; Caruana & Niculescu-Mizil, 2006).

2.5.2 Gradient Boosting Machine (GBM)

Boosting is an ensemble technique that relies on bias reduction to reduce the generalized

error of an ensemble (Bauer & Kohavi, 1999). A general boosting technique iteratively combines
29
several weak or base learners with high bias and low variance such as decision tree stumps into one

strong learner. The base learners are combined so that the ensemble bias decreases while variance

remains the same, thereby reducing the net ensemble error. At each iteration or boosting step,

GBM constructs a new base learner to be the most parallel to the negative gradient of a loss

function along the observed data so that the new base learner focuses on the weakness of the model

(Natekin & Knoll, 2013). In other words, it performs functional approximation of a model by

consecutively improving along the negative direction of a loss function. In this thesis, a binomial

loss function was used and decision trees were used as base learners.

30
3 CHAPTER 3: DEPENDENCY OF BRAIN

FINDINGS ON IMAGE PROCESSING TOOLS

AND POTENTIAL SOLUTIONS

Some material from (Katuwal et al., 2016b) has been reused in this chapter under Creative Commons Attribution

(CC BY) license.

Previous studies applying automatic brain image processing methods on MRI report

inconsistent neuroanatomical alterations in ASD. In this study, we investigate methodological

differences as a possible cause behind these inconsistent findings. In particular, we focus on the

estimation of the following brain volumes: GM, WM, CSF, and TIV. T1-weighted MRIs of 417

ASD subjects and 459 TDC from the ABIDE dataset were estimated using three popular

preprocessing methods: SPM, FSL, and FS. The estimated brain volumes were correlated but had

significant inter-method biases. ASD vs. TDC differences in all brain volume estimates were highly

dependent on the method used. When methods were compared with each other, they showed

differential biases for ASD, and several biases were larger than ASD vs. TDC differences of the

respective methods. After manual inspection, we found inter-method segmentation mismatches in

the cerebellum, sub-cortical structures, and inter-sulcal CSF. In summary, method dependent ASD

vs. TDC differences indicate that the inter-method discrepancy can contribute to inconsistent

neuroimaging findings in ASD. We suggest cross-validation across methods and emphasize the

need to develop better methods to increase the robustness of neuroimaging findings.

3.1 Introduction

31
MRI is a powerful tool used to investigate the human brain in vivo and to find associations

between brain morphometry and brain disorders. Although a large number of MRI studies have

been conducted, consistent MRI markers for brain disorders are yet to be found (Chen et al. 2011).

This can mainly be attributed to low statistical power of neuroimaging and neuroscience studies

(Button et al., 2013). Inconsistent results may be due to differences in the demographics of data

(Stanfield et al., 2008), image acquisition settings (Auzias et al., 2014a; Styner et al., 2002),

assumptions made on data and algorithms used (Eggert et al., 2012; Fellhauer et al., 2015;

Nordenskjöld et al., 2013), and even machines used to process the data (Gronenschild et al., 2012).

With the increasing use of automated preprocessing methods in neuroimaging studies, the

effect of inter-method variations in neuroimaging findings merit an investigation. Although

automatic methods are more objective than manual methods, they possess method-specific bias

and variance (Eggert et al., 2012; Nordenskjöld et al., 2013). The bias and variance across methods

arise mainly due to method specific assumptions made on data, varying definition of brain

structures, different image processing algorithms, varying sensitivity to imaging artifacts such as

motion, and use of inconsistent a priori information such as brain templates. A number of previous

studies have reported significant inter-method inconsistencies (Eggert et al., 2012; Fellhauer et al.,

2015; Hansen et al., 2015; Ren et al., 2005; Tsang et al., 2008). Eggert et al. (2012) reported

pronounced differences (11%) in mean segmented GM volumes from four standard segmentation

algorithms: SPM8 New Segment, SPM8 VBM, FSL v. 4.1.6 and FS v. 4.5. According to Hansen et

al. (2015), compared to manual segmentation, FS 4.5 underestimated TIV by 7 %. Similarly,

differences in segmentation accuracies between FSL and SPM5 were reported by Tsang et al.

(2008).

One important concern is that inconsistent results driven by inter-method variations can

change the end results of a study and hence change the subsequent biological interpretation.

32
Several previous studies have shown that the magnitude and even the direction of the effect size

can be dependent on the method used. Boekel et al. (2015) performed a replication study on 17

brain structure-behavior correlations from five neuroimaging studies and were not able to replicate

any of the correlations. A response paper by Muhlert and Ridgway (2015) pointed out that one of

the reason the correlations could not be replicated is the methodological differences between SPM

and FSL. Nordenskjöld et al. (2013), compared SPM8 and FS v. 5.1.0 TIV estimates to reference

TIV obtained from manual segmentation of proton density weighted images. They report that

both SPM8 and FS overestimated TIV. In addition, SPM showed systematic bias associated with

gender (systematic overestimation of TIV in females) and aging atrophy while FS showed bias for

reference TIV (systematic overestimation of TIV for larger skull size). Notably, hippocampal

volume showed different associations with education depending on which TIV measure (SPM or

FS) was used for hippocampal volume normalization. When normalized with SPM TIV, there was

no association between hippocampal volume and education, whereas when normalized with FS

TIV, the association was significant. Similarly, (Callaert et al., 2014) measured the effect of age on

GM volume using four different methods: SPM8 Unified Segmentation, SPM8 New Segment, FSL

v. 4.1.5, and a method combining intensity based segmentation and atlas-to-image non-rigid

registration. They found that the age specific effect changed with the different methods. Age related

differences according to the Unified Segmentation and New Segment were significantly larger and

smaller respectively than other methods. Similarly, according to Rajagopalan et al. (2014), in ALS

patients with frontotemporal dementia, FSL v. 4.1.5 showed that the GM volume in motor region

is significantly reduced, whereas SPM8 did not show any significant changes in GM. The above

results suggest that inter-method discrepancies are a source of inconsistent findings in

neuroimaging and that the choice of preprocessing method can affect end results. Thus, the effect

of inter-method variations in neuroimaging results deserves a detailed investigation.

33
In this study, we investigate inter-method discrepancies as a source of inconsistent

neuroimaging findings in ASD. Brain anatomical findings in ASD compared to that of TDC have

been highly inconsistent across studies (Amaral et al., 2008; Chen et al., 2011; Jumah et al., 2016;

Katuwal et al., 2016c). Recently a number of studies (Haar et al., 2014; Kucharsky Hiess et al.,

2015; Valk et al., 2015; Riddle et al., 2016) have used the large multi-site (~1000 subject, age 6-65

years) Brain Imaging Data Exchange (ABIDE) (Di Martino et al., 2014) to investigate brain

anatomical differences in ASD. Haar et al. (2014) using ABIDE did not replicate many of the

previously reported anatomical alterations in ASD except significantly larger ventricular volumes,

smaller corpus callosum volume (central segment only), and several cortical areas with increased

thickness in the ASD group. One of the most replicated findings in ASD is that toddlers with ASD

(age 2–4 years) on average have a larger head size than TDC (Carper et al., 2002; Courchesne et

al., 2011b, 2011a; Campbell et al., 2014). However, several recent large studies (Raznahan et al.,

2013; Zwaigenbaum et al., 2014) have shown that there is no overall difference in head

circumference between ASD and TDC over the first 3 years and the results of previous studies

reporting large head sizes in ASD may be due to the bias in population norms. Campbell et al.

(2014) have reported that abnormally rapid rate of brain growth during the first years of life seem

to occur in a very small subgroup of ASD children.

Here we investigate if inter-method differences are a source of inconsistent

neuroanatomical findings in ASD. We address this question by using global brain volume

measures. We estimate GM, WM, CSF volumes, and TIV of ASD and TDC subjects using the

large ABIDE dataset applying three widely used preprocessing methods: SPM, FSL, and FS. We

answer the following three questions in this study:

1. How large are inter-method differences of brain volume estimates and how do they

influence ASD vs. TDC group differences?

34
2. Do inter-method differences show differential bias towards diagnosis group (in this case

ASD) and how does it compare with ASD vs. TDC differences?

3. What are potential reasons behind inter-method differences in brain volume

estimation?

3.2 Methods

3.2.1 Data

A total of 1,112 MRI scans were downloaded from the ABIDE consortium representing

imaging data from 17 different sites. Each MRI was visually inspected to detect significant motion

and other artifacts. In total, 172 images with poor image quality and motion artifacts were

discarded. An additional 64 subjects were discarded due to failure during segmentation by FS or

FSL; none of the images failed during SPM segmentation. T1 weighted brain MRIs from a total

of 876 subjects from 15 sites were retained for volumetric analyses. Of the 876 subjects, 417 (367

males, 50 females) were ASD and 459 (382 males, 77 females) were TDCs. Subject demographics

and behavioral measures are presented in Table 3.1.

Table 3.1: Subject demographics and behavioral (DB) measures


ASD TDC ASD vs. TDC
t-test P-value
N 417 459
M=367, F=50 M=382, F=77
Age(years) 17.8 ± 8.9 17.7 ± 8.0 0.88
(7 to 64) (6.47 to 56.2)
VIQ 104.6 ± 17.8 112.4 ± 12.9 3.1E-10*
PIQ 105.0 ± 16.7 108.1 ± 12.9 5.2E-3*
FIQ 105.4 ± 16.5 111.5 ± 12.1 4.3E-9*
ADOS 11.9 ± 3.7 NA NA
* statistical significance at 0.05
M: Male; F: Female; VIQ: Verbal IQ; PIQ: Performance IQ; FIQ: Full IQ, ADOS: Diagnostic
Observation Schedule

35
3.2.2 Tissue Segmentation and Brain Volumes Estimation

All MRIs were processed with SPM8 (Ashburner & Friston, 2005), FSL 5.0.4 (Jenkinson &

Smith, 2001), and FS 5.3.0 (Dale et al., 1999b; Fischl et al., 1999b). In order to minimize manual

intervention and to make the study more objective and replicable, default parameters set by the

respective toolboxes were used. Final results of the automatic segmentations were manually

inspected and subjects with segmentation failures (64 in total) were discarded from the study.

SPM: Using New Segment tool (Ashburner et al., 2013), each T1 brain image was segmented

into probabilistic maps of GM, WM, and CSF tissues. Spm_get_volumes script was used to calculate

the tissue volumes using c1, c2, and c3 probabilistic maps corresponding to native space tissue

maps of GM, WM, and CSF respectively. Native space volumes were selected to minimize volume

changes due to spatial transformations. TIV was calculated as the sum of the GM, WM, and CSF

volumes in the native space of the MRI. This method of TIV calculation was performed according

to SPM’s recommendation (https://en.wikibooks.org/wiki/SPM/VBM) and has been utilized in

several previous studies (Nordenskjöld et al., 2013; Ridgway et al., 2011).

FSL: Using FAST tool (Zhang et al. 2001), each T1 brain image was segmented into

probabilistic maps of GM, WM, and CSF tissues. Brain regions were extracted using the brain

extraction tool (BET) (Smith, 2002) before tissue segmentation using FAST. Fractional intensity

threshold (f option in BET) was kept at the default value of f = 0.5. Brain slices (from the slicedir

directory) produced by FSL script fslvbm_1_bet –b were used to visually verify that brain regions

were accurately extracted. Images for which brain extraction was not successful with the default

value of f = 0.5 were excluded from the study. Fslstats script was used to calculate the tissue volumes

using partial volume maps (pve_0, pve_1, pve_2 images) in the native space produced by FAST as

36
recommended by FSL

(http://fsl.fmrib.ox.ac.uk/fsl/fslwiki/FAST#Tissue_Volume_Quantification). TIV was

calculated using the SIENAX function of FSL as recommended by FSL (Smith et al., 2002) and

the ENIGMA protocol (http://enigma.ini.usc.edu). SIENAX first strips out non-brain tissues using

BET to extract the regions corresponding to the brain and the skull. After brain extraction, the

skull image is affine registered to the MNI52 template (Mazziotta et al., 2001b) with a scaling factor

between the subject's image and the standard space as the output. The scaling factor is computed

as the determinant of the affine transformation matrix that registers the subject’s image to the

MNI152 template. Finally, the TIV of the subject’s image was calculated by dividing the TIV of

MNI152 template brain (1.847712 L) by the scaling factor (Mazziotta et al., 1995, 2001a, 2001b).

FS: Tissue segmentation in FS was performed by the recon-all preprocessing workflow

(https://surfer.nmr.mgh.harvard.edu/fswiki/recon-all). Recon-all is a fully automated workflow

that performs all the FS cortical reconstruction processes. As recommended by FS, GM and WM

volumes were extracted from aseg.stats file which is an output of the recon-all workflow (see

https://surfer.nmr.mgh.harvard.edu/fswiki/MorphometryStats). FS does not output total CSF

volume. In FS TIV was calculated by a technique similar to FSL (see

http://surfer.nmr.mgh.harvard.edu/fswiki/eTIV). However, FSL uses both brain and skull

whereas FS uses only the brain to guide the registration of a subject’s image to a template.

Estimated TIV (eTIV) is calculated by dividing the atlas mask volume from MNI305 template by

the determinant of the affine transformation matrix (T) that maps the native space image into MNI

space (Buckner et al., 2004). Talairach registration, the third step of recon-all, computes the affine

transform T that transforms the original image to the MNI305 template (Evans et al., 1992).

37
3.2.3 Statistical Analysis

Statistical software package R v3.2.0. (http://www.R-project.org/) was used for all

statistical analyses. Brain volume estimates from the three preprocessing methods are compared

with each other to test inter-method differences and their biases for diagnostic group (ASD). All p-

values reported in this study were corrected for multiple comparisons using false discovery rate

(Benjamini & Hochberg, 1995) unless otherwise explicitly mentioned.

3.2.3.1 Inter-method differences in brain volumes estimation

Inter-method differences across all 876 subjects are summarized by the following statistics:

mean volume difference, percentage difference, correlation coefficient, Cohen’s d, and paired t-

test p-value. Separate comparisons were performed for different volume types. Cohen’s d (Cohen,

1988) was used as a measure of effect size. Cohen’s d is the standardized difference between two

means and is defined as (mean1−mean2)/SDpooled where SDpooled is the weighted average of the standard

deviations of two groups. Paired t-tests were performed to test the statistical significance of the

inter-method differences in brain volume estimates.

3.2.3.2 ASD vs. TDC inter-group differences in brain volumes

ASD vs. TDC brain volume differences were tested using independent two sample t-tests

for each preprocessing method. In addition, ASD vs. TDC brain volume differences were tested

after adjusting for the effects of age, sex, and site by fitting a linear mixed-model using lmer4

package in R (Bates et al., 2014). As fixed effects, we entered diagnostic group (ASD/TDC), sex,

age, and age2. As random effects, we included random intercepts and slopes for the effect of

diagnostic group at each level of site. Fixed effect of diagnostic group is reported as the ASD vs.

TDC group difference. The p-values were calculated using the Satterthwaite’s approximated

degrees of freedom (Satterthwaite, 1946) implemented in lmertest (Kuznetsova et al., 2013). In

38
another separate experiment, in addition to age, sex, and site, the effect of full scale IQ (FIQ) was

adjusted, where subjects with missing FIQ (N = 63) were excluded. For each method, separate

models were built for different brain volume types.

3.2.3.3 Method bias for diagnostic group

The inter-method difference in brain volumes (∆y = y( − y/ ) were modeled by a linear

mixed-model. Here, y/ and y( are brain volume estimates from two different methods where y( is

considered as a reference value. The fixed and random effects of the model were same as in the

model described in previous section. The fixed effect of diagnostic group on ∆y (β_`a ) can be

interpreted as the amount of brain volume by which method y/ systematically over/under

estimates ASD subjects compared to TDC. Here our null hypothesis is that different methods do

not have systematic differential bias to the diagnostic group. Our null hypothesis will be rejected

when the fixed effect of diagnostic group β_`a is statistically significant. If β_`a is statistically

significant, we will conclude that with reference to y/ , method y( has systematic biases for ASD or

bcde
TDC subjects. Percentage bias of y( for ASD (with reference to y/ ) was calculated as ×100,
fg

where y/ is the mean volume across all subjects according to y/ . For each pair of methods, separate

models were fitted for the different brain volumes. Multiple comparisons correction across different

tissues was performed separately for each pair.

3.2.3.4 Experiments repeated with NYU Data

Data used for the above experiments were acquired from multiple scanning sites and the

effects of scanning site were adjusted by fitting a linear mixed-model with site as a random effect.

This model does not capture the non-linear site effects. In order to completely eliminate the effects

of site and to focus only on the differences due to methods, we repeated the above experiments

39
using only the NYU site. NYU was selected because it had the largest number of subjects (71 ASD

and 58 TDC) and all its subjects had FIQ information. NYU results are presented in Appendix A.

3.3 Results

3.3.1 Estimated brain volumes and inter-method differences

Brain volumes estimated by SPM, FSL, and FS are presented in Table 3.2. Statistics of the

inter-method differences (mean difference, correlation coefficient, Cohen’s d, and paired t-test p-

value) for all 876 subjects are presented in the shaded cells. The distributions of the estimated brain

volumes are visualized as boxplots in Figure 3.1A. To visualize the inter-method distribution, the

estimates from SPM, FSL, and FS are plotted against the estimates from SPM in Figure 3.2B.

The brain volume estimates from different methods moderately agreed except for CSF (see

Figure 3.1B). SPM vs. FSL correlation coefficients were 0.85, 0.64, 0.82, and 0.50 for TIV, GM,

WM, and CSF volumes respectively (see Table 3.2). Similarly, SPM vs. FS correlation coefficients

were 0.83, 0.77, and 0.93 for TIV, GM, and WM volumes respectively. FSL vs. FS correlation

coefficients were 0.83, 0.75, and 0.81 for TIV, GM, and WM volumes respectively. In summary,

the inter-method correlation coefficients were high for TIV and WM, followed by GM and were

the lowest for CSF.

Except WM, the average estimates of all the brain volumes by SPM were higher than that

of FSL and FS; see purple boxplot in Figure 3.1A. FSL estimates of TIV were the lowest while FSL

estimates of WM were the highest. Similarly, FS estimates of WM were the lowest. In summary,

the following inter-method differences were observed: TIVSPM (1.57 L) > TIVFS (1.56 L) > TIVFSL

(1.38 L), GMSPM (0.75 L) > GMFS (0.71 L) > GMFSL (0.64 L), WMFSL (0.53 L) > WMSPM (0.52 L)

> WMFS (0.49 L) and CSFSPM (0.30 L) > CSFFSL (0.22 L). Except TIVSPM vs. TIVFS, all inter-

40
method differences were statistically significant (p < 0.001; Cohen’s d > 0.5 in 7 and d > 1 in 4 out

of 10 comparisons).

Table 3.2: Estimated brain volumes and inter-method differences


TIV GM WM CSF
SPM (L) 1.566 ± 0.15 0.748 ± 0.07 0.519 ± 0.06 0.299 ± 0.04
SPM – FSL mean diff. (ml) 183.4 106.0 -14.4 77.9
SPM
Correlation Coefficient 0.851 0.639 0.821 0.494
vs.
Cohen’s d 1.20 1.41 -0.21 1.62
FSL
Paired t-test p-value <1E-100* <1E-100* 7E-18* <1E-100*
FSL (L) 1.383 ± 0.15 0.642 ± 0.08 0.533 ± 0.08 0.221 ± 0.05
FSL – FS mean diff. (ml) -177.4 -70.0 40.3 204.9
FSL
Correlation Coefficient 0.829 0.745 0.808 NA
vs.
Cohen’s d -1.05 -0.82 0.53 NA
FS
Paired t-test p-value <1E-100* <1E-100* <1E-100* NA
FS (L) 1.560 ± 0.18 0.708 ± 0.08 0.493 ± 0.07 NA
SPM – FS mean diff. (ml) 6.1 40.1 25.9
SPM
Correlation Coefficient 0.830 0.767 0.934 NA
vs.
Cohen’s d 0.04 0.54 0.42 NA
FS
Paired t-test p-value 0.07 <1E-90* <1E-100* NA
Mean and standard deviation of the brain volumes estimated by SPM, FSL, and FS are presented. Cells
corresponding to CSFFS are filled as ‘NA’ since FS does not output total CSF volume. Inter-method
differences and corresponding statistics are presented in shaded cells. Correlation coefficient is used to
measure the association between the brain volumes estimated by two different methods. Cohen’s d is used
to measure the effect size of the inter-method difference and paired t-test was used to check the statistical
significance. Statistically significant differences are denoted by * for p < 0.05. Correlation coefficients show
that methods agree on volume estimates but the Cohen’s d and paired t-test p-values indicate significant
inter-method biases.

The inter-method correlation coefficients within NYU subjects were slightly higher than

that from using all subjects (see Table 7.1 in Appendix A). Inter-method differences were mostly

similar to that of the experiment using all subjects.

41
Figure 3.1: Distribution of estimated brain volumes and inter-method differences.
A) The distribution of brain volumes estimated by SPM (purple), FSL (blue) and FS (orange) are
presented by boxplots and indicate significant inter-method differences. CSFFS is not presented since
FS does not output total CSF volume. B) Volumes estimated by FSL and FS are plotted against the
volumes estimated by SPM. The brain volume estimates from different methods moderately agreed
except for CSF.

3.3.2 ASD vs. TDC inter-group differences is dependent on the method used

The mean ASD vs. TDC (ASD – TDC) group difference for TIV, GM, WM, and CSF

according to SPM, FSL, and FS are presented in Table 3.3. The ASD vs. TDC distribution of the

brain volume estimates are presented as boxplots in Figure 3.2A. The mean inter-group differences

in percentage are presented as bar plots in Figure 3.2B.

42
TIV GM WM CSF
diff diff p-val diff diff p-val diff diff p-val diff diff p-val
(ml) % (ml) % (ml) % (ml) %
Raw Volumes
SPM 24.0 1.53 0.019* 11.1 1.49 0.016* 3.7 0.71 0.33 9.2 3.08 0.001*
FSL 11.8 0.86 0.26 -1.6 -0.24 0.77 -1.4 -0.26 0.80 3.8 1.70 0.30
FS 5.6 0.36 0.65 0.3 0.04 0.96 -4.5 -0.92 0.33 NA NA NA
Adjusted for age, sex, and site
SPM 20.8 1.34 0.04* 13.7 1.84 0.02* 3.7 0.71 0.34 5.9 1.99 0.011*
FSL 12.7 0.92 0.32 1.37 0.21 0.85 -0.1 -0.02 0.99 -1.6 -0.73 0.47
FS 8.3 0.53 0.60 3.11 0.44 0.64 -2.9 -0.59 0.57 NA NA NA
Adjusted for age, sex, site, and FIQ§
SPM 28.1 1.81 0.001* 17.5 2.35 0.006* 6.2 1.22 0.13 6.5 2.22 0.006*
FSL 19.1 1.44 0.16 3.5 0.55 0.67 2.2 0.42 0.70 -1.4 0.62 0.55
FS 16.1 1.03 0.33 7.9 1.11 0.26 0.3 0.07 0.95 NA NA NA
Table 3.3: ASD vs. TDC brain volume differences
§63 subjects with missing FIQ were excluded from the particular analysis.
Mean (ASD – TDC) difference (diff) in brain volume estimates according to SPM, FSL, and FS. Percentage
ASD vs. TDC group difference was calculated as 100*(𝐴𝑆𝐷 − 𝑇𝐷𝐶)/𝑇𝐷𝐶, where 𝑇𝐷𝐶 is the group mean
of the TDC subjects. Cells corresponding to CSFFS are filled as ‘NA’ since FS does not output total CSF
volume. Statistically significant differences are denoted by * for p < 0.05. ASD vs. TDC differences are
dependent upon the method used and only in SPM TIV, GM and CSF volumes in ASD were significantly
larger than TDC.

43
Figure 3.2: ASD – TDC brain volume differences are method dependent
A) The distribution of raw brain volumes estimated by SPM, FSL, and FS for ASD and TDC. B) ASD vs.
TDC difference in the estimated brain volumes as a percentage of mean TDC is presented as a bar plot for
each method. ASD vs. TDC brain volume difference varied with methods which suggests that subsequent
interpretations are highly dependent on the method of choice.

Results show that ASD vs. TDC differences are highly dependent upon the method used.

According to SPM, ASD had 1.53% (p = 0.019) more TIV than TDC; see Table 3.3. and purple

bar in Figure 3.2B. Whereas according to FS and FSL, ASD had only 0.36% (p = 0.65) and 0.86%

(p = 0.26) more TIV than TDC respectively (see yellow bar in Figure 3.2B.). Similarly, according

to SPM, ASD had 1.49% (p = 0.016) more GM than TDC. In contrast, FSL estimates show that

ASD has 0.24% (p = 0.77) less GM than TDC. Whereas, according to FS, there was small

difference in GM of ASD and TDC. Similar method dependent ASD vs. TDC differences in WM

44
and CSF volume estimates were noted. Method dependence of ASD vs. TDC differences persisted

even after the effects of age, sex, and site were removed (see Table 3.3). Similar results were

observed even after removing the effects of FIQ where 63 subjects with missing FIQ were excluded

from the analysis. We verified our results by removing the effects of site by using subjects from only

one scanning site (NYU). Here again, ASD vs. TDC volume differences were dependent on the

method used (see Table 7.2 in Appendix A).

3.3.3 Differential bias of methods to the diagnostic group

Biases were pairwise calculated between methods using estimates from one method as the

reference and are presented in Table 4. Compared to FSL, SPM showed statistically significant

bias for ASD subjects in TIV (0.8%, p = 0.039), GM (1.9%, p = 0.007), WM (1%, p = 0.038), and

CSF (3%, p = 0.006) volumes. Similarly, compared to FS, SPM showed statistically significant bias

for ASD subjects in GM (1.4%, p = 0.004) and WM (1.5%, p = 0.004) volume estimates. We also

noted that several method biases were larger than ASD vs. TDC differences according to the same

methods. For example, with reference to FSL, SPM bias for ASD subjects in WM estimation was

7 ml, whereas, the inter-group difference in WM volumes according to SPM was 3.7ml (see Table

3). In summary, with reference to FSL and FS, SPM showed positive bias for ASD subjects in

multiple brain volumes. In other words, with reference to SPM, FSL and FS showed negative bias

in brain volume estimation of ASD subjects. Many of the biases were larger than ASD vs. TDC

inter-group differences according to the respective methods. When we repeated these experiments

using subjects from only one scanning site (NYU) we got similar results; see Table 7.3 in Appendix

A for details.

45
Table 3.4: Differential bias for diagnostic group (ASD)
TIV GM WM CSF
bias bias beta bias bias beta bias bias beta bias bias beta
(ml) % p-val (ml) % p-val (ml) % p-val (ml) % p-val
SPM
vs. 11 0.8 0.039* 12 1.9 0.003* 5 1.0 0.038* 7 3.0 0.006*
FSL#
FSL
vs. 5 0.3 0.75 -1 -0.2 0.75 2 0.37 0.75 NA NA NA
FS#
SPM
vs. 13 0.8 0.180 10 1.4 0.004* 8 1.5 0.004* NA NA NA
FS#
# reference method

bias (ml): brain volume (in ml) by which a method systematically overestimates in ASD subjects than in
TDCs.
% bias: the percentage of brain volume by which a method systematically overestimates in ASD subjects
than in TDCs.

3.4 Discussion

In this work, we investigated discrepancies in brain volumes (TIV, GM, WM, and CSF)

estimated by three different preprocessing methods (SPM, FSL, and FS). Brain volume estimates

between the methods had significant correlation, but the absolute values of brain volume estimates

were significantly different between methods. In other words, there was significant method specific

biases while estimating brain volumes. These biases had an influence on ASD vs. TDC brain

volume differences and our results indicate that ASD vs. TDC brain volume differences were

dependent on the method used to estimate the brain volumes. When the methods were compared

pair-wise, significant differential biases for the diagnostic group (ASD) were revealed, and most

biases were larger than ASD vs. TDC differences according to the respective methods. Below we

compare our results with previous findings and discuss potential reasons behind brain tissue volume

discrepancies by investigating segmentation disagreements at anatomical locations. We further

46
provide discussions on inter-method discrepancies at a conceptual level. Finally, we discuss

methods for minimizing inter-method discrepancies.

3.4.1 ASD vs. TDC inter-group differences dependent on the method used

In this study, we found that ASD vs. TDC inter-group differences in brain volumes are

dependent upon the method used. This dependency was evident in all brain volume types. For

example, SPM showed 1.53% more TIV in ASD compared to TDC with statistical significance.

Whereas FS showed that ASD had only 0.36% more TIV compared to TDC and the difference

was not statistically significant. In other words, a research study using SPM to estimate TIV will

report statistically significant TIVASD > TIVTDC but a different study using FS on the same data

will not report statistically significant difference. Similarly, ASD had 1.49% more GM than TDC

according to SPM, and the difference was statistically significant. Whereas according to FSL, ASD

had 0.24 % less GM than TDC. In other words, a study using SPM would report larger GM

volumes in ASD whereas a study using FSL would report smaller GM volumes in ASD. These

results show that the magnitude and even the direction of the effect under investigation is

dependent on the method used, and previous studies have reported similar findings. For example,

Callaert et al. (2014) reported that the age effect on GM volume was significantly dependent upon

the method used to estimate the GM volume. Similarly, Nordenskjöld et al. (2013) reported that

hippocampal volume showed different associations with education depending on which TIV

measure (SPM or FS) was used for hippocampal volume normalization. Rajagopalan et al. (2014)

found significant GM reduction in the motor region of the brain of ALS patients using SPM but

could not replicate the finding using FSL. These results demonstrate that the choice of a

preprocessing method used to estimate brain volumes can have a significant effect on the end

results of a study.

47
3.4.2 Comparison of ASD vs. TDC brain volume differences with previous studies

Due to various heterogeneous neuroimaging findings in ASD our comparisons here are

limited only to meta analytic studies (Via et al., 2011; Radua et al., 2011; Stanfield et al., 2008;

Redcay & Courchesne, 2005) and studies that used the ABIDE dataset (Haar et al., 2014;

Kucharsky Hiess et al., 2015; Riddle et al., 2016). It should be noted that the exclusion criteria and

the number of subjects included in the studies that used ABIDE data were not consistent across

studies. In our study, TIV in ASD was larger than TDC according to all methods, however, the

difference was statistically significant only according to SPM. Riddle et al. (2016) using SPM8

VBM also report greater TIV in ASD (1.58%) with statistical significance. Kucharsky Hiess et al.

(2015) used a different toolbox (ART Brainwash, www.nitrc.org/projects/art) and reported higher

TIV in ASD (1.73%) with statistical significance. Haar et al. (2014) using FS found TIV to be

higher only in 2 of the 18 sites they used and when these 2 sites were removed (26 subjects) overall

ASD vs. TDC difference was not statistically significant and this result is similar to our FS results.

Further, Redcay and Courchesne, (2005) and Stanfield et al. (2008) have also reported similar

results. Interestingly, 1.53% more TIV in ASD reported by SPM in our study is very close to an

estimate (1.534%) predicted by a model proposed by Redcay and Courchesne ( 2005). This model

prediction was for ASD and TDC subjects at age 17.75 years which is the mean age of the subjects

used in our study. Kucharsky Hiess et al. (2015) have also reported similar prediction.

In our study, GM volume in ASD was greater than TDC (1.5%) with statistical significance

according to SPM. An ABIDE study by Riddle et al. (2016) using SPM8 VBM also report a similar

result (1.58% more in ASD). However, in our study GM volume in ASD was slightly smaller

according to FSL (0.2%) without statistical significance and there was little difference in GM

volume according to FS. Haar et al. (2014) report slightly larger GM volume in ASD but without

48
statistical significance, using both FS (d = 0.01 in cortical GM; d = 0.13 in cerebellar GM) and

FSL (d = 0.18). A large meta-analytic study by Via et al. (2011) report no GM volume differences

(d = 0.006) between ASD and TDC. In summary, studies using SPM tend to report slightly larger

GM volume in ASD but results of studies using FSL and FS are inconsistent.

In our study, WM volume in ASD was slightly greater than in TDC (0.7%) without

statistical significance according to SPM. Riddle et al. (2016) also using SPM report similar results

(0.67% more in ASD). Similarly, Haar et al. (2014) report slightly larger WM volume in ASD but

without statistical significance, using both FSL (d = 0.03) and FS (d = 0.13 in cortical WM; d =

0.04 in cerebellar WM). However, in our study, WM volume in ASD was slightly smaller according

to both FSL (0.3%) and FS (0.9%) without statistical significance. A large meta-analytic study by

Radua et al. (2011) report slightly smaller WM volume (d = 0.006) in ASD. In summary, studies

using SPM tend to report slightly larger WM volume in ASD but results are inconsistent in studies

that used FSL or FS.

In our study, CSF volume was larger in ASD according to SPM (3.1%) with statistical

significance. Similarly Lin et al. (2015) using SPM8 New Segment also found greater CSF volume in

ASD (4.75%) with statistical significance. Riddle et al. (2016) using SPM8 VBM also report 1.54%

more CSF in ASD but with no statistical significance. Haar et al. (2014) using FSL also report

slightly larger CSF volume in ASD (d = 0.15) without statistical significance and this is similar to

our FSL finding: 1.7% greater in ASD without statistical significance. In summary, our result

agrees with previous finding of greater CSF volume in ASD and that the magnitude of the

difference is method dependent.

49
3.4.3 Methods have different biases for diagnostic group, and many of them are larger

than inter-group differences

We found that methods have systematic differential biases for diagnostic group (ASD) and

several biases were larger than ASD vs. TDC differences according to the respective methods. In

other words, inherent systematic bias of a method to a variable of interest (ASD) is larger than the

actual effect of the variable (brain volume difference due to ASD). The differential biases shown

by the methods for explains the method dependent ASD vs. TDC group difference in brain

volumes presented in section 3.2.2. With reference to FSL and FS, SPM showed positive bias for

ASD subjects in multiple brain volumes. From different perspective or considering SPM as the

reference method, it can be said that FSL and FS showed negative bias for ASD subjects. In other

words, with reference to SPM, FSL and FS systematically underestimated brain volumes in ASD

subjects compared to TDC. This might be one reason why SPM shows greater brain volumes in

ASD compared to TDC while FSL and FS do not. To conclude which method captures the true

ASD vs. TDC difference further investigation using ground truth data is necessary.

Similar results have been reported in previous studies (Nordenskjöld et al., 2013), where it

was reported that SPM showed bias associated with gender and atrophy while FS showed bias

dependent on skull size. In summary, the above results indicate that systematic differential biases

of a preprocessing method can be assigned as brain volume difference due to thus leading to

incorrect findings.

3.4.4 Locations of the inter-method segmentation discrepancies

To identify the locations of inter-method segmentation discrepancies, twenty subjects were

randomly chosen and their tissue probability maps (TPM) were individually inspected using

MRIcron (http://www.mccauslandcenter.sc.edu/mricro/mricron/index.html). For each tissue

50
type, a subject with the most common segmentation discrepancy was chosen and these

discrepancies are presented in Figure 3.3 where TPMs of different methods are overlaid using

MRIcron.

3.4.4.1 Overestimation of GM by SPM compared to FSL and FS

In Figure 3.3A(i), red represents the regions where voxel probabilities in SPM TPM for

GM are higher than that of FSL TPM; green, vice-versa and yellowish-green represents regions

where voxel probabilities from both methods are similar. Figure 3.3A(ii) compares the histograms

of the voxel probabilities in GM TPMs produced by SPM and FSL.

Our results showed that SPM overestimated GM volume compared to FSL in the following

four main brain regions.

1) Cerebellum: FSL under segments GM (less GM volume or less GM voxel probability)

in the cerebellum – see red regions in box 1 of Figure 3.3A(i).

2) Subcortical Structures: The proportion of GM (compared to WM) according to

SPM is greater in subcortical structures – see red regions in box 2 in Figure 3.3A(i).

Probability values in subcortical voxels are generally greater than 0.8 in GM TPM for

SPM. This accounts for the higher red curve for voxel probabilities greater than 0.8 in

the histogram of voxel probability presented in Figure 3A(ii). In FSL, however,

subcortical voxel probability values are in the 0.3–0.6 range, which accounts for the

higher green curve in the 0.3–0.6 range.

51
Figure 3.3: Inter-method segmentation comparison
Tissue Probability Maps (TPMs) from different methods are overlaid on one another. Red/green represents
the voxels where only one TPM has non-zero probability value. Yellowish green or orange represents
overlapping regions. (A) SPM vs. FSL GM segmentation, (B) SPM vs. FSL CSF segmentation, (C) SPM
vs. FS WM segmentation and (D) SPM vs. FSL full brain map (GM+WM+CSF). A(ii) & B(ii) are histograms
of voxel probability values in GM & CSF TPMs respectively. Although TPMs of different methods predominantly
overlap, there are mismatching regions/voxel values of segmentation that contribute to inter-method differences in brain volumes
estimates.

3) Boundaries of GM and other Structures: SPM assigns regions close to the GM

boundaries of different brain structures as GM indicated by red lines in box 3 and box

5 of Figure 3.3A(i).

52
4) Inter-sulcal CSF: SPM segments some inter-sulcal CSF as GM – indicated by red

regions in box 4. The overestimation of GM in these brain regions by SPM explains the

higher GM estimates by SPM.

3.4.4.2 Low correlation in CSF volume estimates by SPM and FSL

SPM and FSL produced similar ventricular CSF segmentations; overlap or agreement is

presented in orange, box 2 in Figure 3.3B(i). However, the probability values in ventricular voxels

are in the 0.95–0.99 range in CSF TPM of SPM, while probability values are exactly 1 in CSF

TPM of FSL. This accounts for the shift of the FSL (green) curve to the right in the 0.9–1 range in

Figure 3.3B(ii). Discrepancies in non-ventricular CSF estimates were mainly from following three

brain regions. 1) Brain Boundary: The estimation of CSF surrounding the brain, indicated by the

red regions in box 3 and in other brain slices of Figure 3.3B(i) was higher for SPM. This accounts

for the higher SPM (red) curve in the 0.7–0.99 range in the histogram of CSF. This may be due to

the fact that the a priori CSF TPM used by SPM (in New Segment) during segmentation has a thick

layer of CSF at the boundary of the brain. 2) Inter-sulcal CSF: FSL segments greater CSF

compared to SPM in inter-sulcal regions (see green regions in Box1 of Figure 3.3B(i)), where CSF

TPMs of SPM and FSL have probability values in the ranges of 0–0.3 and 0.3–0.7, respectively.

This introduces higher SPM (red) curve in the 0–0.3 range in Figure 3.3B(ii).

Our results indicate that the segmentation discrepancy in non-ventricular CSF

segmentation is the primary cause of the low inter-method correlation in CSF volumes.

Misclassification of bone/air as CSF or vice-versa can be another major source of discrepancy. T1-

weighted images provide a reasonable amount of contrast between GM (dark gray), WM (lighter

gray) and CSF (black). However, dense bone and air also appear dark like CSF. This makes the

segmentation of CSF challenging, especially at the sulcal regions since it is difficult to distinguish

53
between the inner skull and sulcal CSF. Accuracy in CSF segmentation can be improved by

augmenting information from the T2-weighted image as it provides additional contrast between

CSF (bright) and brain tissue (dark).

3.4.4.3 Underestimation of WM by FS compared to SPM

The WM volumes estimated by FS were the lowest in general. The final results of surface

reconstruction and parcellation produced by recon-all were used to report WM segmentation by FS.

Our study indicates that FS produces a considerable number of areas where WM is misclassified

as non-WM (red dots in box 1 of Figure 3.3C). The misclassified areas were primarily due to WM

hypo-intensities misclassified as GM or partial voluming in which WM+GM voxels look like non-

WM and are segmented as non-WM. WM hypo-intensities have values much lower than the

average WM intensity. Although recon-all automatically adds the volume of WM hypo-intensities

to the total WM volume, our results indicate that it cannot still identify all the hypo intensities and

the WM segmentations of FS require significant manual editing.

3.4.4.4 Brain Mask (SPM vs. FSL)

Brain masks in SPM and FSL presented in Figure 3.3D(i) were created by the summation

of GM, WM, and CSF TPMs. The brain mask of SPM (red) is larger than that of FSL (green).

This is mainly due to the overestimation of CSF surrounding the brain by SPM compared to FSL.

In the FSL brain mask, voxel probabilities have only two values, 0 or 1, while the voxel probabilities

in SPM brain masks are continuous in the 0–1 range (see Figure 3.3D(ii)). In FAST of FSL, the

HMRF model used for classification has only three components (GM, WM and CSF); hence, the

summation of these TPMs add up to one in the brain regions. In SPM, however, the Gaussian

mixture model uses a mixture of six Gaussian components for GM, WM, CSF, bone, soft tissue

and air/background; but the brain mask was created by summing only three mixture components:

54
GM, WM and CSF. Therefore, the voxel probabilities in the brain mask of SPM are in the 0–1

range.

3.4.5 Sources of the inter-method discrepancies in tissue segmentation

Inter-method segmentation discrepancies in different locations of the brain were shown in

the previous section. This section discusses the possible reasons behind the inter-method

segmentation discrepancies at a conceptual level. Inter-method differences in tissue segmentation

can be mainly attributed to differences in two factors: method dependent differences in the brain

template and method dependent differences in the spatial normalization process.

3.4.5.1 Brain templates

A brain template or atlas is an anatomical representation of a brain. It is a pre-segmented

standard brain image generated from a single subject or a cohort of subjects. A brain template is

generally used as an a priori to guide tissue classification. The different preprocessing methods used

for tissue segmentation in this study utilize different standard brain templates for prior spatial

information of the brain structures. For example, New Segment of SPM uses ICBM-452 T1 brain

atlas (Mazziotta et al., 2001a) and FS uses MNI 305 (Collins et al., 1994) brain atlas. Whereas

FAST of FSL does not use brain atlas but utilizes HMRF model to encode spatial information

through contextual constraints of neighboring pixels in an image (Zhang et al., 2001). The

differences among brain templates can arise primarily from two sources. First, the definition of

brain structures can vary among the brain templates. Second, the brain templates are created from

different cohorts of subjects scanned under different scanners and hence have different biases for

subject demographics, image quality, and scanner settings. The inter-method discrepancies in

brain structures segmentation can be minimized with the use population specific brain templates

(Mandal et al., 2012; Tanga et al., 2010).

55
3.4.5.2 Spatial normalization

Spatial normalization is the process whereby individual MRIs are registered to the common

anatomical space defined by the standard brain template. Spatial normalization conceptually

consists of two elements: image representation and transformation. The differences due to spatial

normalization starts from the choice in the mathematical representation of an image, i.e. how an

image is mathematically represented to be used in subsequent mathematical operations. For image

representation, several assumptions are made about the properties of the images, and these

assumptions vary with the preprocessing methods. The effect of the differences in these

assumptions propagate further and are finally evident with the discrepant segmentation results.

Similarly, differences arise also from the choice of the transformation applied to the individual

images to register it to a space defined by a standard template. In addition, different spatial

normalization techniques behave differently with different image acquisition parameters, motion

artifacts, and imaging artifacts such as bias field and intensity inhomogeneity. This adds further

discrepancy to the segmentations.

3.4.6 Implications to future neuroimaging studies

Our findings have important implications for the ongoing search for neuroimaging

biomarkers in ASD and other brain disorders. Inconsistencies across previous studies and lack of

evidence for brain biomarkers in ASD may in part be a result of failure to account for the issues

we have raised in this study. To reduce the impact of inter-method differences, we suggest the

following directions that need further investigation.

3.4.6.1 Cross-validation of findings

Results of our study and of many previous studies (Callaert et al., 2014; Eggert et al., 2012;

Nordenskjöld et al., 2013; Rajagopalan et al., 2014) have demonstrated the method dependence

56
of neuroimaging results. Each method has its own strengths and weakness, and there is no general

agreement on which method is optimal. Therefore, we suggest using multiple methods to segment

brain images to cross validate results across methods. In addition, a multi-variate classifier can be

trained on the outputs of several methods to improve the overall segmentation results. A simple

classifier would be a majority voting system where the final decision is made based on the majority

votes. For example, when SPM, FSL, and FS are used for binary tissue segmentation and if SPM

and FSL labels a voxel as CSF whereas FS labels it as GM, then the multivariate system would

label the voxel as CSF.

3.4.6.2 Development of better methods

The present preprocessing schemes apply spatial transformations that may introduce errors

in tissue segmentations and are one of the major causes of inter-method differences. Machine

learning based methods can be a way to perform segmentation with minimal preprocessing, and

among them, deep-learning methods have proven to be more promising. Recently, a few studies

have successfully performed the segmentation of brain structures from MRI using deep learning.

De Brébisson and Montana (2015) have reported competitive accuracy for segmentation of cortical

and sub-cortical structures from MRIs without performing any non-linear registration. Similarly,

(Kim et al., 2013; Lai, 2015) have reported successful segmentation of hippocampus using deep

learning.

3.4.6.3 Focus on improving data itself in addition to methods

Accurate segmentations may not be possible using a single type of MRI since it may not

have sufficient contrast to discriminate the boundaries of brain structures.
For example, T1-

weighted MRI do not have enough information to distinguish between brain structures. It is

difficult to segment CSF surrounding the brain region using T1-weighted MRIs since dense bone

57
as well as air appear dark in the CSF images. A very low inter-method correlation of 0.5 in CSF

volume estimates obtained in this study also demonstrates the difficulty. Whereas CSF appears

bright and air appears dark in T2-weighted MRI and this is very helpful for the CSF segmentation.

Segmentation accuracy of CSF as well as other brain structures can be improved if information

from T1 and T2 images are used together if T2 images are available.

3.4.6.4 Multi-variate patterns are more robust across methods and scanners

In addition to currently popular mass univariate group difference characterization

techniques, a pattern within extracted biomarkers can be explored using multivariate classification

techniques so that the pattern is robust across methods and sites. In this study, we have shown an

example of how a multi-variate pattern is more robust compared to a univariate. In Figure 3.4, the

relation between GM volume and age according to SPM, FSL, and FS are shown. According to

FSL and FS there is linear decrease in GM volume from 6 to 40 years. This is consistent with the

biological understanding of GM in human brain. However, according to SPM the GM volume

first increases until 25 years old age and then decreases afterwards. So, there is a clear mismatch

between the information provided by the three methods. But when the GM volume is divided by

TIV, and the % GM volume is plotted against age, all three methods show the same trend in GM

decrease with age. This example supports our claim that the multi-variate patterns are more robust

across methods.

58
Figure 3.4: Multi-variate pattern is more robust across methods
Gray matter (GM) age dependence is method dependent. But the %GM (100*GM/TIV) is a very basic
multivariate pattern and is robust across methods.

Some recent anatomical studies have utilized multivariate classification techniques to

identify anatomical patterns differing across ASD and TDC group instead of focusing on just one

measure at a time. These studies have reported remarkable accuracies in decoding the group

59
identity in subject-level using anatomical measures such as cortical thickness, geometry curvature,

surface area, etc. (Ecker et al., 2010a; Haar et al., 2014; Jiao, 2011; Uddin et al., 2011). In addition,

it is very likely that brain alterations in heterogeneous condition such as ASD are multi-variate in

nature.

Moreover, novel techniques that are robust to inter-scanner variability can be

implemented. For example, Vardhan et al. (2014) has proposed GM/WM ratio as an indirect way

to model longitudinal growth curves in early childhood since there is change in the GM/WM

contrast as the brain undergoes myelination. According to the above study, the contrast ratio is

relatively invariant to location, scanner type and scanning conditions, which is a highly desirable

feature for multi-site studies

3.5 Conclusion

We demonstrate that ASD vs. TDC group differences in brain volumes are method

dependent. According to SPM, ASD brain volumes were higher than TDC with statistical

significance but according to FSL and FS the difference was not significant. Inter-method brain

volume differences can be attributed to varying definitions of brain structures, use of different

templates, differences in image processing algorithms, and the varying effects of imaging artifacts

and acquisition settings. We suggest that research studies should cross-validate findings across

multiple methods before providing biological interpretations. To our knowledge current studies do

not account for the method dependency of results. Accounting for methodological differences will

be an important step in increasing the reliability and consistency of future neuroimaging findings

of ASD and other brain disorders, leading to a greater likelihood of establishing valid and reliable

neuroimaging biomarkers. We also emphasize that future work is needed to investigate the reasons

behind inter-method discrepancies and the need to develop better methods. Moreover, we also

60
suggest on using multi-variate techniques to characterize the brain alterations in ASD because the

multi-variate patterns are more robust across methods and it is very likely that brain alterations in

heterogeneous condition such as ASD are multi-variate in nature.

61
4 CHAPTER 4: UTILIZING NON-BRAIN

INFORMATION TO IMPROVE ASD DETECTION

BASED ON BRAIN MORPHOLOGY

Some material from (Katuwal et al., 2016c) has been reused in this chapter under Creative Commons Attribution

(CC BY) license.

Low success (<60%) in ASD classification based on brain morphology in the large multi-

site datasets and inconsistent findings on brain morphometric alterations in ASD can be attributed

to the ASD heterogeneity. Morphometric features from MRIs of 734 males (ASD: 361, controls:

373) of ABIDE were derived using FS. Applying the RF classifier, an AUC of 0.61 was achieved.

By augmenting the information from VIQ and age to brain morphometric features we were able

to increase the classification performance. The important features were mainly from the frontal,

temporal, ventricular, right hippocampal and left amygdala regions. However, the important

features highly varied with AS, VIQ, and age. The curvature and folding index features from

frontal, temporal, lingual, and insular regions were dominant in younger subjects suggesting their

importance for early detection. Our findings suggest that identifying brain biomarkers in sub-

groups of ASD can yield more robust and insightful results than searching across the whole

spectrum. Further, it may allow identification of sub-group specific brain biomarkers that are

optimized for early detection and monitoring, increasing the utility of MRI as an important tool

for early detection of ASD.

4.1 Introduction

62
ASD diagnosis based on MRI is highly desirable because it can be objective and can be

utilized for early detection (Glenn, 2010). In addition, it can improve our understanding of brain

anatomical underpinnings of ASD. However, characterizing the brain morphology of ASD has

been challenging due to its heterogeneity (Lenroot & Yeung, 2013). A number of studies have

investigated brain anatomical alterations in ASD compared to that of TDC subjects. The reported

anatomical alterations are mostly contained in the fronto-temporal regions (Bigler et al., 2007; Ha

et al., 2015), the amygdala-hippocampus complex (Groen et al., 2010; Nordahl et al., 2012), corpus

callosum (Wolff et al., 2015), and cerebellum (D’Mello et al., 2015). However, previously reported

regions have not been consistent across studies (Amaral et al., 2008; Chen et al., 2011; Katuwal et

al., 2015a). The variability in differences can be due to the heterogeneity of ASD, differences in

methodological approaches, or a combination of both.

More recently, several studies have applied machine learning on MRI derived brain

features, in an effort for ASD detection. Previous studies on ASD vs. TDC classification can be

mainly categorized into two groups based on the number of subjects used in the study: (1) small

dataset (n < 200) matched for DB measures such as age, sex, and IQs (Ecker et al., 2010b; Jiao et

al., 2010; Uddin et al., 2011; Wee et al., 2014) and (2) large heterogeneous datasets such as the

ABIDE dataset (n > 700) (Haar et al., 2014; Katuwal et al., 2015b; Sabuncu & Konukoglu, 2014).

Table 4.1, below, shows classification accuracies from previous studies and the disparity of results

is clear. For instance, Group 1 using small datasets reports high classification accuracies while

Group 2 using the large heterogeneous ABIDE dataset reports classification accuracies less than

60%.

63
Table 4.1: Previous ASD vs. TDC Classification studies
Study Sample MRI Feature Classification Classification
Accuracy (%)
#ASD/#TDC Technique
Group 1 (small dataset; n < 200)
Ecker et al. 2010 22/22 (Males) Gray matter SVM 81
Ecker, 20/20 (right- Cortical thickness in SVM 90
Marquand, et handed) left hemisphere
al. 2010
Jiao et al. 2010 22/16 Regional cortical LMT 87
thickness
Uddin et al. 24/24 Gray matter in SVM 90
2011 default mode
network regions
Wee et al. 2014 58/59 Regional and inter- SVM 96
regional cortical and
subcortical features
Group 2 (large dataset; n > 700)
Haar et al. 2014 539/573 Regional volume, LDA, QDA <60
surface area and
cortical thickness
Sabuncu & 325/325 Regional volume, SVM, <60
Konukoglu 2014 surface area and
NAF, RVM
cortical thickness
Katuwal et al. 373/361 Volume, surface RF, GBM, SVM 60
2015 area, cortical
thickness, thickness
std., mean curvature,
Gaussian curvature,
folding index
Katuwal et al., 373/361 Zernike moments of RF <60
2016a sub-cortical
structures
SVM: Support Vector Machine; LMT: Logistic Model Trees; LDA: Linear Discriminant Classifier; QDA: Quadratic
Discriminant Classifier; NAF: Neighborhood Approximation Forest; RVM: Bayesian Relevance Vector Machine; RF:
Random Forest; GBM: Gradient Boosting Machine

In this study, we investigate the heterogeneity in ASD as a major reason behind the

inconsistent neuroanatomical findings and the disparity in classification accuracies. We investigate

if DB measures of the subjects can be utilized to mitigate the ASD heterogeneity and facilitate our

effort to understand ASD brain alterations, thereby helping to predict ASD using brain

64
morphology. We use multiple automatically extracted brain morphometric features and multiple

classification techniques for this investigation. The aims of this chapter are as follows:

1. We investigate the incremental predictive power that can be gained by adding DB

measures such as age, VIQ, and AS to the brain morphometric features derived from

MRI.

2. We investigate if sub-grouping the subjects by the above-mentioned DB measures

helps to improve ASD vs. TDC classification.

3. We explore the important features for classification in the sub-groups and how they

change with DB measures.

4. We explain the discrepancies in the reported neuroimaging findings on brain

alterations in ASD and classification accuracies in relation to the heterogeneity in

ASD brain morphology.

4.2 Methods

4.2.1 MRI Data

The ABIDE (Di Martino et al., 2014) dataset with 1,112 MRIs from 17 different sites was used

in this study. Each MRI was inspected visually; those with significant motion or other artifacts were

excluded from analysis. Seventy-six subjects from two ABIDE sites were excluded due to poor

image quality (motion/artifacts). An additional 96 subjects were excluded due to poor image

quality, and another 64 subjects were discarded due to FS (Fischl, 2012) segmentation failure.

Since ASD is highly prevalent in males (Fombonne, 2005) and also to avoid the gender effects, 142

female subjects were excluded. Finally, 734 male subjects (ASD: 361, TDC: 373) were used from

65
the remaining 876 subjects. Summary of DB measures of the used sample is presented in Table

4.2 . Scanner information of the individual sites can be obtained from Table 1 in (Haar et al., 2014)

Table 4.2: Subject demographics and behavioral (DB) measures


ASD TDC ASD vs. TDC
t-test
(p-value)
N 361 373
Age(years) 17.9 ± 8.7 18.1 ± 8.2 0.7

(7 to 64) (6.47 to 56.2)


VIQ 104.5 ± 17.8 112.4 ± 12.9 6.2E-10*
PIQ 105.2 ± 16.8 108.7 ± 13.2 5.7E-3*
FIQ 105.2 ± 16.6 111.8 ± 12.3 4.8E-9*
ADOS 11.9 ± 3.7 NA NA
AS 7.1 ± 2.1 NA NA

4.2.2 Reasons for using male subjects only

We decided to use only male subjects in this study for several reasons. First, we wanted to

remove the gender effects on the ASD brain heterogeneity and focus only on the brain

morphometry of ASD males. This decision was motivated by the previous findings that there are

significant differences between the brain anatomy of ASD males and females. Brain alterations in

female ASD subjects reported by studies using only female subjects (Calderoni et al., 2012; Craig

et al., 2007) have a very small overlap with the alterations reported by the studies performing meta

analyses of predominantly male subjects (Radua et al., 2011; Via et al., 2011). A recent study by

Lai et al. (2013) focusing on the brain anatomical differences of ASD males and females has

reported that the neuroanatomy of adult ASD males and females differed and there was minimal

spatial overlap in both grey and white matter. Second, compared to females, ASD is more

66
prevalent in males (Fombonne, 2005). In addition, among the 876 high quality MRIs available

from ABIDE, 84 % (734) of them were of males.

4.2.3 MRI Data processing

The recon-all preprocessing workflow of FS v. 5.3.0 (Dale et al., 1999a; Fischl et al., 1999a,

2002; Ségonne et al., 2004) was used to extract brain morphometric features. Volume of 40 sub-

cortical structures from Aseg atlas (Fischl et al., 2002) and volume, surface area, Gaussian curvature,

mean curvature, folding index, thickness mean and thickness standard deviation of 34 cortical

structures from Desikan-Killiany atlas (Desikan et al., 2006) were derived from each MRI. In total,

538 brain morphometric features were derived for each subject. The volume features of each

subject were normalized by TIV since percentage or relative brain volumes have been found to be

more robust across scanner types and scanner drifts (Takao et al., 2011) and thus this should reduce

sensitivity to the fact that the data were collected at multiple sites.

4.2.4 Classification Algorithms

RF (Breiman, 2001) classification models were trained using brain morphometric features

for ASD vs. TDC classification. RF was used since it is inherently suitable for parallel processing,

has very few hyper parameters to tune, does not require scaling the data, is theoretically resistant

to overfitting, provides variable importance and has been found to be very good for a variety of

datasets (Breiman, 2001; Caruana & Niculescu-Mizil, 2006). To avoid the results from being

affected by model selection bias, we repeated the experiments using GBM (Friedman, 2000)

classifier models and the results are presented in Appendix B. For both RF and GBM, scikit-learn

0.16.1 (Pedregosa & Varoquaux, 2011) was used. For all classifiers, hyper parameter tuning was

performed with cross-validation and classification performance was estimated by 10-fold cross-

67
validation. RF and GBM classification models are briefly explained below and are explained in

detail in the cited publications.

The optimal number of predictors used to split a node of a decision tree was automatically

estimated by performing grid search within (Öm -Öm /2, Öm + Öm/2) where m is the number of

features. Gini impurity (Breiman, 1996) was minimized while growing decision trees. Gini impurity

is the probability that a randomly chosen sample would be incorrectly labeled if the samples were

labeled according to the distribution of class labels. In general, the higher the number of decision

trees in a RF, the more reliable is the prediction and the interpretability of the variable importance

(Strobl et al., 2009). So, a large number (5000) of decision trees were used in the RF classification

models used in this study.

A grid search on the depth of decision tree (1 to 12) and the subsample ratio (0.5 and 0.7)

was performed to automatically estimate their optimum values. To increase the stability of the

model and increase the reliability of the variable importance, a large number (5000) of decision

trees were used in the GBM classification models used in this study. To avoid overfitting, a very

low learning rate (0.001) was used.

4.2.5 Metrics Used

Classification accuracy and area under the ROC curve (AUC) were used to measure the

success of classification. Accuracy is a threshold-based metric and AUC is a ranking based metric.

AUC is the probability that a classifier will rank a randomly chosen positive example higher than

a randomly chosen negative example with the assumption that the positive example ranks higher

than the negative example (Bradley, 1997). The practical difference or effect size of ASD vs. TDC

group difference was quantified using Cohen’s d (Cohen, 1988).

68
4.2.6 Adding AS, VIQ, and age information to the brain morphometric features

We reduced the heterogeneity of subjects by adding DB measures (AS, VIQ, and age) to

brain morphometric features. The use of AS information for ASD classification may appear to be

a self-fulfilling prophecy. In this study, AS information was used in only to answer the following

questions: 1) Is classification difficulty dependent on severity? and 2) Are important features for

classification consistent with severity?

AS, VIQ, and age information were used under the following two schemes. First, VIQ, and

age were used as training features in conjunction with the morphometric features.

Sub-grouping: In the second scheme, the subjects were sub-grouped by AS, VIQ, and

age into three sub-groups by each as defined in Table 4.3. TDC subjects were not used while sub-

grouping by AS. AS = 5 was used as the threshold between low and mid AS sub-groups according

to Table 2 in (Gotham et al., 2009). The AS = 8 was used as the threshold between mid and high

AS sub-groups since the mean and median for AS ³ 6 are 7.9 and 8 respectively. Only 167 ASD

subjects who had AS information were used in the sub-groups by AS. In sub-grouping by VIQ,

the threshold of VIQ = 90 instead of natural choice 85 (one standard deviation below the median

100) was used to divide the low and normal VIQ sub-groups because there were very few subjects

with VIQ £ 85. Only 586 subjects (296 ASD, 290 TDC) who had VIQ information were used in

the sub-groups by VIQ. In sub-grouping by age, the thresholds 13 and 18 years were chosen

because they approximately reflect pre-puberty, adolescence and early adulthood, and also yielded

in sub-groups with similar sample sizes. Subjects greater than 40 years were not used as they were

very few in number (17).

In each sub-group, ASD vs. TDC classification models were trained using morphometric

features. The sub-grouping resulted in an unbalanced classification problem in sub-groups i.e.

69
classes with uneven sizes (Chawla, 2005). This problem was addressed in two ways: up-sampling

the smaller class in each training fold and down-sampling the larger class. AUC has been used to

evaluate the performance of classification models in sub-groups since AUC is insensitive to the

unbalanced classes (Fawcett, 2006).

Table 4.3: Sub-groups definition


Sub-groups Definition #ASD/#TDC
Up-sampling Down-sampling
AS mild 4 ≤ AS ≤ 5 20/373 20/20
moderate 6 ≤ AS ≤ 7 60/373 60/60
high 8 ≤ AS ≤ 10 76/373 76/76
VIQ low 75 ≤ VIQ ≤ 90 57/15 15/15
normal 90 < VIQ < 115 146/142 142/142
high 115 ≤ VIQ ≤ 150 93/133 93/93
Age young 6 ≤ Age < 13 116/120 NA
(years) mid 13 ≤ Age < 18 114/108 NA
old 18 ≤ Age ≤ 40 121/138 NA
NA: Not Applicable. Down-sampling was not performed in the sub-groups by age since the number of
subjects were comparable and very few subjects had AS and VIQ to use for matching the subjects.

4.3 Results

4.3.1 Classification using only brain morphometric properties

Applying RF on brain morphometric features, classification accuracy of 60% and AUC of

0.61 was achieved. Similar classification performance has been reported by previous studies using

the ABIDE dataset: Katuwal et al. 2015 (60%), Haar et al. 2014 (<60%) and Sabuncu &

Konukoglu 2014 (<60%).

4.3.2 Age and VIQ used as training features in conjunction with brain morphometric

features

We first sought to determine whether simply adding age or VIQ as training features would

improve brain morphometric classification. When age was added to the brain morphometric

70
features for training the classifier, AUC improved to 0.62. When VIQ was added, AUC improved

to 0.66. When both age and VIQ were added, AUC improved to 0.68. In addition, site information

was explicitly added to the morphometric features for training the classifier. One-hot coding

method was used to represent the site information, i.e. for each scanning site, a binary feature was

added with values of one for subjects from the scanning site and values of zero for other subjects.

In total, 17 binary features for 17 sites representing the site information were added to the 538

morphometric features. After adding site information to the morphometric features for training,

AUC did not improve and was same as that from using only the morphometric features.

4.3.3 Sub-grouping subjects by AS, VIQ, and age

4.3.3.1 Up-sampling smaller class in each training fold

In each sub-group, the smaller class was randomly up-sampled in each training fold to

match the number of ASD and TDC subjects. The AUC scores achieved in the sub-groups are

presented in Figure 4.1A and Table 4.4. In Figure 4.1A, a point represents the mean and an error

bar represents the one standard deviation of the AUC scores from 10 test folds. AUC scores and

number of ASD and TDC subjects are presented below the error bar.

71
A) ASD/TDC subjects numbers matched by upsampling the smaller group in each training fold
AS VIQ age
0.9

0.8

0.7 AUC=0.78 0.8


60/373
#ASD/TDC=20/373 0.72
0.75
AUC

0.6 76/373
57/15 0.63 0.62 0.66 0.65
146/142 116/120
0.5 93/133 121/138

0.4 0.5
114/108
0.3
4 ≤ AS ≤ 5 6 ≤ AS ≤ 7 8 ≤ AS ≤ 10 75 ≤ VIQ ≤ 90 90 < VIQ < 115 115 ≤ VIQ ≤ 150 6 ≤ Age < 13 13 ≤ Age < 18 18 ≤ Age ≤ 40

B) ASD/TDC subjects numbers matched by downsampling the larger group


AS VIQ
1.0

0.9 age, VIQ matched age matched

0.8
AUC=0.92
#ASD/TDC=20/20
0.7 0.81
AUC

60/60
0.6 0.69 0.8
76/76 16/16
0.5
0.62 0.59
142/142
93/93
0.4

0.3
4 ≤ AS ≤ 5 6 ≤ AS ≤ 7 8 ≤ AS ≤ 10 75 ≤ VIQ ≤ 90 90 < VIQ < 115 115 ≤ VIQ ≤ 150

Figure 4.1: Improvement in classification by sub-grouping


The AUC scores of the classification in the sub-group by grouping based on autism severity (AS), age, and
Verbal IQ (VIQ) are presented. A point represents the mean and an error bar represents the one standard
deviation of the AUC scores from 10 test folds. A) Smaller classes were up-sampled in each training fold to
balance the number of ASD & TDC subjects. Sub-grouping improved the classification with the most and
least improvements from sub-grouping by AS and age respectively. B) Larger classes were down-sampled
matching the demographics of the smaller classes. This scheme further improved the classification
performance.

In sub-groups by AS, AUC was 0.78, 0.8 and 0.72 for low, moderate, and high sub-groups

respectively. The sample sizes in the sub-groups by AS were unequal. To determine if the results

may have been due to unequal sample sizes, separate classification models were built for the

subjects with AS = 4-5 (#ASD/#TDC = 20/373), 6 (33/373), 7 (27/373), 8 (25/373), 9 (29/373)

and 10 (22/373). The results of this experiment are presented in Figure 8.6 in Appendix B. In this

experiment, where the sample sizes in the sub-groups were comparable, AUC decreased with the

AS according to both RF and GBM. There was strong negative correlation (RF: r = -0.72, p =

0.1, GBM: r = -0.86, p = 0.028*) between mean AUC and mean AS of sub-groups.

72
Table 4.4: Classification AUC in sub-groups created by AS, VIQ and age
Up-sampling Down-sampling
Sub- AUC AUC
groups RF GBM RF GBM
AS mild 0.78 ± 0.09 0.74 ± 0.16 0.92 ± 0.12 0.92 ± 0.11
moderate 0.80± 0.10 0.76 ± 0.09 0.81 ± 0.12 0.80 ± 0.10
high 0.72 ± 0.07 0.69 ± 0.07 0.69 ± 0.08 0.68 ± 0.11
VIQ low 0.75 ± 0.20 0.71 ± 0.31 0.80 ± 0.30 0.80 ± 0.31
normal 0.63 ± 0.05 0.65 ± 0.05 0.62 ± 0.11 0.63 ± 0.11
high 0.62 ± 0.08 0.62 ± 0.07 0.59 ± 0.12 0.54 ± 0.13
Age young 0.66 ± 0.11 0.67 ± 0.10 NA NA
(years) mid 0.50 ± 0.13 0.51 ± 0.14 NA NA
old 0.65 ± 0.13 0.66 ± 0.14 NA NA
NA: Not Applicable. Down-sampling was not performed in the sub-groups by age since the number of
subjects were comparable and very few subjects had AS and VIQ to use for matching the subjects. Mean
and standard deviation of the AUC across 10 test folds are presented.

In sub-groups by VIQ, AUC decreased with VIQ, with AUC of 0.75, 0.63 and 0.62 for

low, normal and high VIQ sub-groups respectively. In sub-groups by age, AUC was modest in

young and old sub-groups with AUC of 0.66 and 0.65 respectively. AUC was low (0.5) in mid-age

sub-group.

In summary, sub-grouping the subjects by AS, VIQ, and age improved the classification

rate with the most and least improvements from sub-grouping by AS and age respectively. The

results from GBM were similar and are presented in Table 4.4 and Figure 8.1 in Appendix B.

4.3.3.2 Down-sampling the bigger class to match the demographics of the smaller class

In the above section, although the subjects were more homogenous after sub-grouping, the

distribution of other DB measures of ASD and TDC subjects in the sub-groups might be different.

This raises a concern that the results from the up-sampling scheme could have been influenced by

the difference in DB measures distribution. To check if the results are not due to the different

demographics, ASD and TDC subjects in each sub-group were matched on demographics. In each

73
sub-group, the bigger class was down-sampled to match the ASD and TDC subjects on age and/or

VIQ; see Table 4.3 for the number of subjects. Subjects were matched by age and VIQ in the sub-

groups by AS and by age in the sub-groups by VIQ. For sub-groups by age, down-sampling was

not performed as the number of subjects in each group were comparable and very few subjects

had AS and VIQ.

Classification performance further improved after matching the subject demographics. A

high AUC of 0.92 was achieved in the low AS sub-group; see Table 4.4 and in Figure 4.1B.

Similarly, high AUCs of 0.81 and 0.80 were achieved for moderate AS and low VIQ sub-groups

respectively. The AUC trends from this experiment were the same as that from the up-sampling

scheme presented above, i.e. AUC decreases with AS and VIQ. The results from GBM were

similar and are presented in Table 4.4 and Figure 8.1 of Appendix B. When separate classification

models were built for the ASD subjects with each level of AS, AUC sharply decreased with AS

according to both RF and GBM (RF: r = -0.86, p = 0.029*, GBM: r = -0.87, p = 0.026*); see

Figure 8.6 of Appendix B.

To confirm that the increase in classification performance after sub-grouping is not due to

optimization issues and is actually due to the reduction in the heterogeneity in the sub-groups, we

tested the RF classification model trained in one sub-group on other sub-groups. To obtain the

distribution of test scores, the classification model trained in one sub-group was tested on 200

bootstrap replications from another sub-group. Classification results are presented in Figure 8.3 of

Appendix B, where each sub-plot represents a sub-group and three data points correspond to the

performance scores (when tested on the sub-group) of the models trained in three sub-groups. The

AUC scores in intra-subgroup classification were much larger than the AUC scores in inter-

subgroup classification in 16 out of 18 comparisons. AUC score of intra-subgroup classification

was lower than inter-subgroup classification in 2 comparisons. This disparity occurred in the mid-

74
age sub-group where the intra-subgroup classification was close to chance (50% success) and the

inter-subgroup rates were also close to chance (53% and 52% success). Moreover, the AUC scores

decreased when the difference between the training and testing sub-groups increased along the

variable by which sub-groups were defined. For example, when the classification models were

tested on the low-VIQ sub-group, the AUC scores decreased from 0.75, 0.64, and 0.35 respectively

as the subjects from the low, mid, and high VIQ sub-groups were used for training the models.

4.3.4 Multivariate analysis: Important features for classification and their variability

across sub-groups

The top 10 important features for classification in each sub-group with matched subjects

(i.e. from section 4.3.3.2) are presented in Figure 4.2. The top features for classification across all

subjects are in Figure 4.2D. Each feature is represented by a bar whose length is proportional to

its importance for the classification. The feature importance was calculated as an average of the

importance scores from 10 test folds. Before each feature, ASD vs. TDC Cohen’s d and two sample

t-test significance (P<0.005** and P<0.05*) are presented. The different morphometric features

are color coded and have been grouped together. Findings for the volume features reported in this

study are after they were normalized by TIV.

The important features for classification varied across the sub-groups. However, the

important features were mainly from the frontal, temporal, insular, ventricular, right hippocampal,

and left amygdala regions. Most of the important features from RF and GBM were common; see

Figure 4.2D for RF results and Figure 8.2 of Appendix B for GBM results.

75
A) AS
4 ≤ AS ≤ 5 6 ≤ AS ≤ 7 8 ≤ AS ≤ 10
1.65**lh_lateralorbitofrontal_area 0.78**X3rd.Ventricle_volume 0.63**Right.choroid.plexus_volume
1.31**rh_medialorbitofrontal_area 0.73**lh_parahippocampal_volume 0.21 SubCortGrayVol_volume
1.75**lh_fusiform_thicknessstd 0.77**rhSurfaceHoles_volume 0.48**X3rd.Ventricle_volume
1.38**lh_bankssts_thicknessstd 0.92**rh_inferiorparietal_meancurv 0.63**rh_medialorbitofrontal_thicknessstd
1.17**lh_inferiortemporal_thickness 0.83**lh_superiortemporal_meancurv 0.65**rh_middletemporal_thicknessstd
1.31**rh_parsopercularis_thicknessstd 0.8** lh_middletemporal_meancurv 0.61**rh_parsopercularis_thicknessstd
1.22**lh_inferiorparietal_meancurv 0.5* rh_precentral_meancurv 0.42* lh_superiorparietal_thicknessstd
1.25**lh_lateraloccipital_meancurv 0.89**rh_inferiortemporal_meancurv 0.67**lh_middletemporal_meancurv
1.13**rh_lateraloccipital_gauscurv 0.09 rh_caudalmiddlefrontal_gauscurv 0.57**lh_frontalpole_meancurv
0.58 lh_supramarginal_gauscurv 0.3 lh_supramarginal_gauscurv 0.05 lh_frontalpole_gauscurv

B) VIQ
75 ≤ VIQ ≤ 90 90 < VIQ < 115 115 ≤ VIQ ≤ 150
1.66**rh_parsopercularis_area 0.14 rh_parahippocampal_volume 0.25 lhCorticalWhiteMatterVol_volume
1.48**rh_WhiteSurfArea_area 0.26* lh_insula_volume 0.24 CorticalWhiteMatterVol_volume
1.33**lh_WhiteSurfArea_area 0.01 rh_parsorbitalis_volume 0.32* Right.Hippocampus_volume
1.44**rh_medialorbitofrontal_area 0.42**Left.Lateral.Ventricle_volume 0.3* Left.Hippocampus_volume
0.92* rh_parahippocampal_area 0.37**Right.Lateral.Ventricle_volume 0.36* Right.vessel_volume
1.41**rh_middletemporal_area 0.35**rh_parahippocampal_thickness 0.22 rhCorticalWhiteMatterVol_volume
1.03* lh_rostralanteriorcingulate_area 0.09 lh_parsopercularis_foldind 0.5** lh_frontalpole_thicknessstd
1.45**lh_precentral_area 0.13 lh_supramarginal_foldind 0.46**lh_inferiortemporal_thicknessstd
1.14**rh_precuneus_area 0.07 rh_precuneus_foldind 0.32* lh_frontalpole_gauscurv
0.13 lh_rostralanteriorcingulate_foldind 0.22 lh_frontalpole_gauscurv 0.02 rh_lateralorbitofrontal_gauscurv

C) Age (years)
6 ≤ Age < 13 13 ≤ Age < 18 18 ≤ Age ≤ 40
0.23 rh_insula_foldind 0.48**CC_Central_volume 0.5** lh_transversetemporal_volume
0.2 lh_lateralorbitofrontal_foldind 0.35* CC_Mid_Anterior_volume 0.49**rh_bankssts_volume
0.29* lh_fusiform_foldind 0.39**X3rd.Ventricle_volume 0.45**lh_inferiortemporal_volume
0.19 lh_superiortemporal_foldind 0.36* Left.Lateral.Ventricle_volume 0.42**rh_superiortemporal_volume
0.18 lh_lateralorbitofrontal_gauscurv 0.03 lh_caudalmiddlefrontal_area 0.34* lhCortexVol_volume
0.28* lh_frontalpole_gauscurv 0.42**lh_lingual_thickness 0.27* Left.Putamen_volume
0.01 rh_medialorbitofrontal_gauscurv 0.34* rh_lingual_thicknessstd 0.32* Right.Hippocampus_volume
0.02 rh_lingual_gauscurv 0.29* lh_pericalcarine_thickness 0.41**Left.Amygdala_volume
0.35* lh_fusiform_gauscurv 0.26 lh_caudalanteriorcingulate_thickness 0.17 lh_medialorbitofrontal_thicknessstd
0.25 rh_parsopercularis_gauscurv 0.17 lh_entorhinal_gauscurv 0.38**lh_pericalcarine_meancurv

0 25 50 75 100 0 25 50 75 100 0 25 50 75 100


Feature Importance(%)
D) ALL Subjects
ALL
0.25**Left.Lateral.Ventricle_volume
Morphometric Attributes
0.02 rh_parahippocampal_volume
volume
0.24**CSF_volume area
0.23**Left.Inf.Lat.Vent_volume thickness
0.14 Left.Amygdala_volume foldind
0.17* CorticalWhiteMatterVol_volume meancurv
gauscurv
0.17* lhCorticalWhiteMatterVol_volume
0.07 rh_bankssts_volume Group Difference
0.04 rh_parstriangularis_area a TDC > ASD
a ASD > TDC
0.13 lh_frontalpole_gauscurv

0 25 50 75 100
Feature Importance(%)
76
Figure 4.2: Important features for classification are different across sub-groups.
Top 10 important features for ASD spectrum disorder (ASD) vs. typically developing controls (TDC)
classification in each sub-group (by AS, VIQ, age) are presented. Each feature is represented by a colored
bar; the length of the bar represents the relative % importance for classification with respect to the top
feature. The features have been grouped and color-coded by volume, area, thickness mean, thickness
standard deviation, folding index, mean curvature and Gaussian curvature. Before each feature, Cohen’s d
and two sample t-test significance (P<0.005** and P<0.05*) of ASD vs. TDC group difference are
presented. The important features for classification varied across the sub-groups demonstrating the
heterogeneity in ASD brain morphometry.

To remove the concern that the arbitrary cutoff of top 10 might have influenced our results,

the important features were also selected by another technique based on cumulative distribution

of the feature importance scores. After sorting the features in descending order of their importance

scores, the scores were cumulatively added starting from the most important feature. The features

required to reach 10% of the total sum of the scores were considered important and the

corresponding feature importance plot for RF is presented in Figure 8.4 of Appendix B. In

addition, we relaxed our criteria for important features and used 25% threshold; see Figure 8.5 of

Appendix B. Even after using this different technique to select the important features with multiple

thresholds, the top features for classification were highly dissimilar across the sub-groups.

The important features according to two classifiers were similar suggesting that the results

are not influenced by model choice. To statistically verify the similarity, we performed the

Pearson’s correlation test between the importance scores of all features from the two classifiers. We

performed the test separately in nine sub-groups and the correlation coefficients are reported in

Table 8.1 of Appendix B. All correlation coefficients were high (r > 0.75 in 9 and r > 0.85 in 7

sub-groups) and statistically significant (p < E-16). The high similarity between the feature

importance scores from two different classifiers supports that the important features reported in

this study are not affected by model choice and hence are robust.

77
To demonstrate the extent of the heterogeneity in brain alterations, variability of the 13

important features with AS, VIQ and age are presented in Figure 4.3. The 13 features include the

top feature from each sub-group (9 in total) and 4 important features from the classification using

all the subjects.

Statistical Significance (p<0.05) FALSE TRUE

lh_fusiform_thicknessstd lh_inferiortemporal_thicknessstd Left.Lateral.Ventricle_volume


rh_inferiorparietal_meancurv rh_insula_foldind CorticalWhiteMatterVol_volume
Right.choroid.plexus_volume CC_Mid_Anterior_volume Left.Amygdala_volume
lh_rostralanteriorcingulate_foldind lh_pericalcarine_meancurv
rh_parahippocampal_volume lh_frontalpole_gauscurv
AS VIQ age

1.5

1.0
ASD - TDC effect size (Cohen's d)

0.8

0.5

0.3
0.2
0.1
0.0
-0.1
-0.2
-0.3

-0.5

-0.8

4 ≤ AS ≤ 5 6 ≤ AS ≤ 7 8 ≤ AS ≤ 10 75 ≤ VIQ ≤ 90 90 < VIQ < 115 115 ≤ VIQ ≤ 150 6 ≤ Age < 13 13 ≤ Age < 18 18 ≤ Age ≤ 40
Sub-groups

Figure 4.3: Variability of the important features.


A total of 13 important features for classification are presented; the top feature for each sub-group (9 in
total) and the 4 important features for all subjects. The magnitude and direction of the ASD vs. TDC group
differences of the top features varied with ASD severity (AS), verbal IQ (VIQ), and age demonstrating the
heterogeneity in brain morphometry.

78
Curvature and thickness based features were predominant in the sub-groups by AS; see

Figure 4.2. Interestingly, there were no important volume features in the low-AS sub-group.

Volume features were present in moderate and high-AS sub-groups and many of them were from

ventricles. Thickness standard deviation of the left fusiform gyrus (red line in Figure 4.3) was the

most important feature in the low AS sub-group and had very large ASD vs. TDC group difference

(d = 1.75, p = 4E-16*). The group difference decreased with AS but was still high (d = 0.57, p =

0.0006) in the high AS sub-group. Interestingly, the group difference even changed its direction

with VIQ. The difference was negative (ASD < TDC) with small effect size (d = -0.11, p = 0.7) in

the low-VIQ sub-group but was positive and statistically significant with medium effect size (d =

0.36, p = 0.01*) in the high-VIQ sub-group. Similarly, mean curvature of the inferior parietal

gyrus (blue line in Figure 4.3) was the most important feature in the moderate-AS sub-group. It

was significantly larger in ASD (d = 0.91, 2E-6*). The group difference decreased with AS but was

still high (d = 0.51, p = 0.0006*) in the high AS sub-group. This feature also showed the reversal

in the direction of the group difference- ASD > TDC with medium effect size (d = 0.41, p = 0.02*)

in the young-age sub-group and ASD < TDC with medium effect size (d = -0.3, p = 0.03*) in the

old-age sub-group. Right choroid plexus volume (green line in Figure 4.3) was the most important

feature in the high-AS sub-group and had positive (ASD>TDC) group difference with large effect

size (d = 0.55, p = 9E-4*). Across all subjects, it was larger in ASD with small effect size (d = 0.18,

p = 0.02*). There was large positive group difference (d = 0.71, p = 0.05) in the low-VIQ sub-

group, however, it decreased with VIQ and was negative in the high-VIQ sub-group (d = -0.15, p

= 0.3).

Folding index of left rostral anterior cingulate gyrus (orange line in Figure 4.3) was the most

important feature in the low-VIQ sub-group with small negative group difference (d = -0.13, p =

0.7). It is an interesting observation that it is the most important feature for classification even when

79
the ASD vs. TDC group difference is very small and statistically insignificant. One thing to

remember is that it is the most important in the multi-variate setting where the importance of a

feature is dependent on its relationship with other features. For example, the group difference of

the ratio of folding index of left and right rostral anterior cingulate gyrus was large (d = 0.7, p =

0.05). This demonstrates the superiority of MVPTs over univariate techniques by its ability to

automatically find inter-variable relationships important for inter-group distinction. Similarly,

volume of right parahippocampal gyrus (black line in Figure 4.3) was the most important feature

in the mid-VIQ sub-group with small negative group difference (d = -0.15, p = 0.2). It was also an

important feature in classification using all subjects; see Figure 4.2D. Thickness standard deviation

of left inferior temporal gyrus (brown line in Figure 4.3) was the most important feature in the high-

VIQ sub-group where it was larger in ASD with medium effect size (d = 0.46, p = 0.002*).

However, the group difference was nearly zero in the normal-VIQ sub-group and even flipped its

direction in the low-VIQ sub-group (d = -0.45, p = 0.2).

The important features across the sub-groups by age were distinct. Folding index and

Gaussian curvature features from the frontal and temporal regions were predominant and there

were very few volume, thickness, and area based important features in the young-age sub-group.

The volume features became more dominant with increase in age and most of the important

features in the old-age sub-group were volume-based. Folding index of right insula gyrus (purple

line in Figure 4.3) was the most important feature in the low-age sub-group with small positive

ASD vs. TDC group difference (d = 0.23, p = 0.08). The group difference decreased with age and

was nearly zero for the old-age sub-group. Volume of the mid anterior corpus callosum (dotted red

line in Figure 4.3) was the most important feature in the mid-age sub-group where it was smaller

in ASD (d = 0.35, p = 0.03*). However, in the young-age sub-group, it was larger in ASD (d =

0.15, p = 0.4). Mean curvature of the left pericalcarine gyrus (dotted blue line in Figure 4.3) was

80
the most important feature in the old-age sub-group and was larger in ASD (d = 0.3, p = 0.02*)

but was smaller in ASD (d = -0.38, p = 0.002*) in the old-age sub-group.

Across all subjects, Gaussian curvature of frontal pole was the most important feature. The

volume features were predominant and were mainly from the left amygdala, right

parahippocampal, ventricular and temporal regions. As other important curvature based features,

the group difference in Gaussian curvature of frontal pole (dotted green line in Figure 4.3) was the

largest in younger subjects (d = 0.28, p = 0.03*) and was the smallest for older subjects (d=0.02, p

= 0.9). Among the ventricular volumes, left lateral ventricle volume (dotted orange line in Figure

4.3) was the most important across all subjects (d=0.24, p = 0.002*). It was larger in ASD and the

group difference decreased with VIQ and increased with age. Across all subjects, all the ventricles

were larger in ASD compared to TDC and the group differences were statistically significant

(before multiple comparisons). In general, except 3rd and 4th ventricles, the group difference in

ventricles decreased with VIQ; see Figure 4.3. Left amygdala (dotted brown line in Figure 4.3) was

also an important feature for classification across all subjects. It was larger in ASD in the old-age

sub-group with medium effect size (d = 0.41, p = 0.001*) but was smaller in the young-age sub-

group (d = -0.15, p = 0.3). Likewise, most of the important features showed high variability with

AS, VIQ, and age and even changed the ASD vs. TDC group difference direction.

4.4 Discussion

Modest classification success is achieved using only brain morphometric properties. We

demonstrated that the low success rate was due to the heterogeneity in ASD alterations by showing

the variability of important features across the sub-groups created by demographics and behavioral

measures AS, VIQ and age. To mitigate the challenges imposed by the ASD heterogeneity, we

then utilized AS, VIQ and age information in conjunction with the brain morphometric features.

81
We could significantly improve the classification success after utilizing extra information from AS,

VIQ and age and demonstrated this using two different classification techniques. When the

classification models trained in one sub-group were tested on other sub-groups, the inter-subgroup

classification scores were much lower than the intra-subgroup classification scores. Moreover, the

classification scores decreased when the difference between the training and testing sub-groups

increased along the variable by which sub-groups were defined. These results support our

hypothesis that the sub-grouping of subjects results in the heterogeneity reduction and hence the

improvement in classification performance.

The analysis of the important features for classification in conjunction with the univariate

tests provided valuable insight on structural alterations of autistic brains. The alterations were

mainly from ventricular, frontal, temporal, left amygdala and right hippocampal regions of the

brain. The important features from two different classification techniques were similar,

demonstrating the robustness of our results. Below we discuss some interesting observations on

heterogeneity of ASD brain morphometry in relation to the previous inconsistent neuroanatomical

findings and discrepancies in the classification accuracies. In addition, challenges and future

directions for neuroimaging studies on ASD prediction using brain morphometry are discussed.

4.4.1 Classification becomes difficult with increase in AS

Classification AUC was the lowest in the high-AS sub-group for both up-sampling and

down-sampling schemes; see Figure 4.1A. The results from GBM were similar and are presented

in Figure 8.1. Moreover, AUC decreased with AS when sub-groups were created for each AS

values according to both RF and GBM classification models; see Figure 8.6. This result is opposite

to that of Katuwal et al. (2015b) where it was reported that the classification accuracy increases

with Autism Diagnostic Observation Schedule (ADOS) score. AS is the standardized version of

82
ADOS (Gotham et al., 2009). This discrepancy is likely due to the difference in experiment design-

the classification model was trained across all subjects in the experiment of Katuwal et al. (2015b)

and separate classification models were trained in each sub-group in this study. In addition, only

167 ASD subjects had AS scores available and were present in AS sub-groups of this study, while

in Katuwal et al. (2015b), 361 ASD subjects were used to train classification models in leave-one-

out cross validation framework.

The decrease in AUC suggests that the most severely autistic subjects are the most difficult

to classify, perhaps because the most severely autistic subjects are the most heterogeneous. In each

sub-group, to check the heterogeneity in ASD data, we calculated the net variance as mean of the

relative standard deviation (s/mean) of all features. The net variance in ASD brain morphometry

increased with AS – 0.42, 0.47 and 0.49 for low, mid and high AS sub-groups respectively. Even

in the down-sampling scheme with demographics matched subjects, the net variance of the brain

morphometry of ASD subjects increased with AS- 0.46, 0.55 and 0.57 for low, mid and high AS

sub-groups respectively. This suggests that the brain morphology of ASD subjects becomes

increasingly dissimilar in more severe cases, reinforcing the difficulties in the classification.

In this study, we are not suggesting that the predictive models constructed for each AS sub-

group can be directly used in clinical practice. ASD subjects were sub-grouped by AS only to

investigate how ASD vs. TDC classification success and the features important for classification

change with AS. By demonstrating that the important features are very dissimilar across the

severity scores, we are suggesting that stratifying ASD subjects by severity scores might be helpful

to better understand the brain alterations in ASD subjects. In future clinical practice, improved

knowledge of brain alterations in ASD subjects will provide evidences to support currently existing

clinical diagnosis that are based on behavior.

83
4.4.2 Folding index and curvature features may be important markers for early

detection of ASD

In the young age sub-group, where the classification was performed between young ASD

and young TDC subjects, the most common top important features for classification were folding

index and curvatures (mean and Gaussian) of frontal, temporal, lingual and insular regions. The

importance of these features decreased with age and eventually in the old-age sub-group, volume

features were predominant; see Figure 4.2. The folding index and curvature features that were

important in the young-age sub-group and/or whose ASD vs. TDC group differences across all

734 subjects were statistically significant (multiple comparisons uncorrected) are presented in

Figure 4.4. All features (except Gaussian curvature of right lingual gyrus) were larger in young ASD

subjects compared to young TDC subjects. The group difference decreased with age and the

direction for 12 out of 17 features even flipped the direction (smaller in ASD) in the old-age sub-

group. This result not only demonstrates the heterogeneity in ASD brain morphometric differences

but also gives an important clue for the early diagnosis of ASD using brain morphometry. This

result suggests that the most prominent brain morphometric alterations in young ASD subjects

may be the folding index and (mean and Gaussian) curvatures of the frontal, temporal, lingual and

insular regions. A study by Nordahl et al. (2007) has also reported that cortical shape alterations

(measured by sulcal depth) were pronounced in children of age 7.5 to 12.5 years. Similarly, a study

by Auzias et al. (2014a) has reported a statistically significant and consistent pattern of shape

alterations in central, intra-parietal and frontal medial sulci in the children (18-108 months). The

study also reported that the shape descriptors of several sulci from frontal and temporal regions

and age were statistically significant. Moreover, the study also reported significant correlations

84
between the different sulcus shape descriptors and Childhood Autism Rating Scale (CARS) and

ADOS scores.

rh_superiortemporal_meancurv lh_fusiform_gauscurv
rh_rostralmiddlefrontal_meancurv rh_parsopercularis_gauscurv
rh_middletemporal_meancurv rh_fusiform_foldind
rh_inferiortemporal_meancurv rh_insula_foldind
lh_frontalpole_meancurv lh_fusiform_foldind
lh_bankssts_meancurv lh_lateralorbitofrontal_foldind
lh_lateralorbitofrontal_gauscurv lh_superiortemporal_foldind
lh_frontalpole_gauscurv lh_pericalcarine_meancurv
rh_lingual_gauscurv

Statistical Significance (p<0.05) FALSE TRUE

age

0.5

0.4

0.3
ASD - TDC effect size (Cohen's d)

0.2

0.1

0.0

-0.1

-0.2

-0.3

6 ≤ Age < 13 13 ≤ Age < 18 18 ≤ Age ≤ 40


Sub-groups

Figure 4.4: Folding index and curvature features are important for classification in young
subjects
The important folding index and curvature features from the young-age sub-group and/or whose ASD vs.
TDC group differences across all subjects were statistically significant (multiple comparisons uncorrected)
are presented. The features are mainly from frontal, temporal, lingual and insular region, and are larger in

85
ASD. However, the group differences decrease with age and even the direction of the group difference flips
for 12 out of 17 features.
Knowledge of brain morphometric differences in young ASD subjects is comparatively

more valuable than that in older ASD subjects. Successful identification of robust brain biomarkers

for ASD diagnosis in young patients would allow early intervention, likely increasing the success of

ASD treatment. Our results show that the shape alterations of frontal, temporal, lingual, and

insular brain regions are important to classify ASD from TDC in young children. For this reason,

the shape of these brain regions merit special attention. However, most of the previous studies on

ASD brain alterations are based on volume, area, and thickness features and very few studies are

based on shape features such as curvature, folding index, and sulcal depth. We therefore

emphasize the need to include less conventional features of brain morphometry in order to improve

existing classification procedures for ASD.

4.4.3 ASD heterogeneity in brain morphometry

We were able to increase the classification AUC up to 0.68 from 0.61 by adding AS, VIQ

and age with the brain morphometric features. This shows that the brain morphometry and DB

measures have some non-overlapping information. In addition, when the subjects were sub-

grouped by AS, VIQ, and age, and the classification models were trained in each sub-group, there

was significant improvement in classification performance. The important features for

classification and the strength and the directionality of the ASD vs. TDC group difference in the

important features highly varied across the sub-groups. This variability was presented in detail in

section 4.4. Previous studies have also reported variability in brain alterations with factors such as

age, gender, handedness etc. A study by M.-C. Lai et al. (2013) reported that the brain regions

affected in ASD males and ASD females have little overlap. Similarly, Floris et al. (2013) has

reported strong rightward lateralization in the posterior and anterior mid-body of corpus callosum

86
demonstrating that the pattern of the lateralization is strongly depended on the handedness of the

subjects with ASD. A recent study by Lin et al. (2015) reported that the regional brain volume

differences between ASD and TDC males are highly age-dependent. Through the age-stratified

analyses, they showed that the patterns in GM and WM volumetric alterations in ASD are distinct

among the subsamples of children, adolescents and adults. Therefore, the results from this study

and the previous studies support the hypothesis that brain alterations in ASD are highly

heterogeneous across the ASD population.

4.4.4 Low classification success in large multi-site data due to ASD heterogeneity

The heterogeneous nature of the brain morphometry in ASD partially explains the

discrepancy in the predictive performances reported by the two groups of previous studies: small

sample size studies reporting high classification accuracies and the studies using the large multi-site

ABIDE dataset reporting low classification accuracies. This is counterintuitive to the general idea

that the generalized classification performance increases with the training sample size (Sordo &

Zeng, 2005). Large number of subjects from the multi-site datasets such as ABIDE provide more

information about brain morphometry than that from small sample size. However, the amount of

variance added due the ASD heterogeneity can surpasses the extra information gained with the

increase in sample size. As the heterogeneity of the data increases, the training, validation, and

testing folds used for estimating generalized predictive performance of a model become

increasingly dissimilar to each other. As a result, the model trained using training and validation

folds performs poorly in the test fold, hence decreasing the generalized predictive performance

estimated by cross validation. This explains the low accuracies (<60%) achieved by the previous

studies using the ABIDE dataset. A much larger dataset is required so that the information gained

from the increase in sample size is greater than the increase in variance introduced by the

87
heterogeneity of ASD. Larger standardized datasets easily accessible to the research community

would, therefore, be highly valuable. On the other hand, subjects collected in a site and matched

for factors such as age, sex, IQs etc. are relatively homogenous. The training, validation, and

testing folds are more similar to each other, hence, the predictive performances of the models

estimated by cross validation are larger. This explains the high accuracies achieved by the previous

studies using small well-matched data. In addition, although all previous studies using small sample

size have reported the use of cross validation to estimate the predictive performance, many of them

have not explicitly mentioned that the feature selection and classification steps were done under

the same cross validation framework. When the two steps are under different cross validation

framework i.e. when the feature selection is performed using all data, there will be a data leak. The

data leak results in an over-fitted model whose predictive performance cannot be generalized with

confidence. A recent study by Katuwal et al. (2015b) has reported that the high classification

accuracies reported in some of the previous studies might be due to over-fitted models caused by

data leak. In addition, the study demonstrated that the amount of over-fitting and hence the

overestimated predictive performance increases with the decrease in sample size. Another reason

for high classification accuracies in small datasets may arise due to manual feature engineering

performed specific to the dataset. Caution should be taken to interpret the manually crafted

features used in these studies. The interpretation of the important features for classification would

be more relevant and meaningful with respect to the subjects’ DB measures such as age, sex, IQs,

handedness etc.

4.4.5 Future direction for neuroimaging studies

Developing a single successful classification model to predict ASD type and severity using

brain morphometry is likely the ultimate objective of the research on ASD diagnosis using brain

88
morphometry. However, there are a few limitations and road blocks to be overcome before this

objective can be achieved. First, it requires a much larger standardized data set than currently

available datasets such as ABIDE. Second, based on the results shown here, it might not provide

insight into the neuroanatomical basis of ASD as it would be difficult to perform exploratory

analysis across large heterogeneous subjects compared to that in distinct homogenous sub-groups.

4.4.5.1 Divide and conquer: focus on smaller distinct homogenous sub-groups

ASD as currently diagnosed is a collection of autisms. It is highly heterogeneous in its

etiology, comorbidity, pathogenesis, genetics, severity, and brain morphology (Betancur, 2011;

Happé et al., 2006; Jeste & Geschwind, 2014; Lenroot & Yeung, 2013). As shown in Figure 4.2

and Figure 4.3, brain alterations in ASD compared to TDC are highly variable across AS, VIQ,

and age. For highly heterogeneous conditions such as ASD, it is very hard to find a robust global

brain biomarker. A better alternative may be to focus on the relatively more homogenous smaller

sub-groups defined by several criteria such as age, sex, IQs, handedness, severity, etc. Several

previous studies have suggested the same (Lenroot & Yeung, 2013; Volkmar et al., 2009). Dividing

the ASD population into distinct sub-groups provides more exploratory power to the study and

provides deeper insights into the anatomical alterations in ASD. The brain biomarkers identified

by this technique albeit local with respect to some DB measure, are more robust and provide

valuable insight on their effects in relation to the respective DB measure. For example, when the

subjects were grouped by age and classification was done separately, folding index and curvature

features were predominant in younger subjects (see Figure 4.2C and Section 4.3.4). ASD detection

in young age is more desirable as it would provide more time for early intervention. So, for the

classification in young subjects, one approach might be to focus on the folding index and curvature

based features from insular, fusiform, frontal and temporal regions. Similarly, although the

89
ventricular volumes were larger in ASD and were important features for classification, the ASD

vs. TDC group difference sharply decreased with VIQ; in subjects with high VIQ, the

directionality of group difference even reversed for some ventricular volumes; see Figure 4.5. This

result suggests that the ventricle volumes may be robust biomarkers in the low and normal VIQ

population but certainly not for the high VIQ population. Thus, brain biomarkers identified in

distinct sub-groups will be more robust and insightful and hence more helpful to understand the

neuroanatomical basis of the ASD heterogeneity.

In addition to the sub grouping of the ASD population, we propose adding the DB

measures of the subjects with the MRI features to train multivariate machine learning models. This

data-driven automatic approach would help to identify the relationship of DB measures with the

multi-variate brain morphometry and hence provide better insights on brain alterations in ASD.

For example, identifying the robust relationship between age and multi-variate patterns in brain

morphometry would be highly valuable to the understanding of the pathogenesis of ASD.

Eventually, this could make MRI a powerful tool for early detection of ASD.

90
Left.Lateral.Ventricle_volume X3rd.Ventricle_volume
Left.Inf.Lat.Vent_volume X4th.Ventricle_volume
Left.choroid.plexus_volume X5th.Ventricle_volume
Right.Lateral.Ventricle_volume TotalVentricular_volume
Right.Inf.Lat.Vent_volume EstimatedTotalIntraCranialVol_volume
Right.choroid.plexus_volume

Statistical Significance (p<0.05) FALSE TRUE

VIQ
1.5

1.0
ASD - TDC effect size (Cohen's d)

0.8

0.5

0.3

0.2

0.1

0.0

-0.1

-0.2
75 ≤ VIQ ≤ 90 90 < VIQ < 115 115 ≤ VIQ ≤ 150
Sub-groups

Figure 4.5: ASD vs. TDC group difference in ventricular volumes decreases with verbal IQ
(VIQ)
The ventricular volumes were normalized by total intracranial volume (TIV). Across all subjects, except 3rd
and 4th, all ventricles and TIV were larger in ASD. When the subjects were sub-grouped by VIQ, the group
differences were the largest in the low-VIQ sub-group but decreased with VIQ. For some ventricular, the
direction of group difference even flipped in the high-VIQ sub-group i.e. volumes were larger in TDC for
some ventricles.

91
4.4.5.2 Need for better features and better methods

The morphometric features estimated using the preprocessing tools may not accurately

reflect the underlying morphology for several reasons. First, T1-weighted MRIs do not have

enough information to distinguish between all the brain structures. For example, they do not have

enough contrast between inter-sulcal CSF and skull. Second, preprocessing tools make several

assumptions which do not always hold in all subjects, leading to erroneous estimates. For example,

Morey et al. (2009) showed that the correlation of the amygdala volume from manual tracing with

the volumes estimated from FS and FSL were only 0.56 and 0.24 respectively. This suggests that

the uncertainty or the noise added during the automatic extraction of brain features is one of the

major obstacles to automatic ASD detection using brain morphometry. So, the development of

better automatic preprocessing tools is needed. In addition, preprocessing tools can be optimized

to utilize data fusion techniques to decrease the uncertainty in the estimation of brain features. For

example, T1 and T2-weighted images contain complementary information which can be utilized

for more robust estimation of brain features.

In this study, we utilized regional-level features i.e. values describing the morphometric

features of the brain regions. There may be hindrances in achieving high classification performance

through the exploration of these features, such as the need for impractically large sample sizes,

suggesting directions for future work. Features from different spatial scales can be unified to

improve the predictive performance of the classification using MRI. In addition, shape features

can be incorporated in the classification model. Moreover, hierarchical architecture such as

convolution neural network (CNN) (LeCun et al., 2015) can be applied for automatic identification

and extraction of features. This may help to improve the predictive performance and may also

provide better insights into the neuroanatomical underpinnings of ASD due to the hierarchical

92
nature of its features. Additionally, it would avoid the use of preprocessing tools and hence the

uncertainty associated with them.

4.5 Conclusion

This study demonstrated that brain alterations in ASD are highly heterogeneous, and the

heterogeneity makes the understanding and diagnosis of ASD using brain morphometry a

challenging problem. We showed that the heterogeneity can be mitigated when the demographics

and behavioral (DB) measures such as autism severity, VIQ and age are utilized in conjunction

with brain morphometric features, hence making the problem of understanding and diagnosis of

ASD easier. Utilizing DB measures, the ASD vs. TDC classification success rate was significantly

improved. Focusing on relatively homogenous sub-groups of ASD by sub-grouping the subjects

according to autism severity, VIQ and age, interesting and valuable relationships between these

DB measures and brain morphometry were observed. The heterogeneity in ASD brain

morphometric differences demonstrated in this study explains the inconsistent neuroanatomical

findings in ASD and the low classification success in large multi-site data. The results of this study

suggest that identifying brain biomarkers in relatively homogenous sub-groups defined by different

measures such as age, IQs, severity, sex, handedness etc., compared to identifying markers across

the whole ASD population, is easier and provides better insights on neuroanatomical

underpinnings of ASD.

93
5 CHAPTER 5: EARLY DETECTION OF AUTISM

USING BRAIN MORPHOLOGY

A comprehensive investigation of early brain alterations in ASD is critical for

understanding the neuroanatomical basis of ASD and for establishing methods for early diagnosis.

Most previous brain imaging studies in ASD, however, are based on children older than 6 years—

well after the median age of ASD diagnosis (46 months). In this study, we use brain images that

were collected as part of routine clinical scans from patients who were later diagnosed with ASD.

Using 15 subjects with ASD and 18 control (CTR) subjects of age 3 to 4 years, we perform a

comprehensive comparison of different brain morphometric and image intensity features. We find

that, although TIV of ASD was 5.5% larger than CTR, brain volumes of many other brain areas

(as a percentage of TIV) were smaller in ASD and can be partly attributed to larger (>10%)

ventricles in ASD. The folding indices of 58 of 68 cortices were higher in ASD indicating increased

gyrification. The larger TIV in ASD was related to larger surface area and increased amount of

cortical folding but was relatively independent of cortical thickness. Further, predominately in the

frontal and temporal regions the white matter was less bright suggesting myelination deficit. We

achieved 95% AUC in ASD classification using all brain features. When the classification was

performed separately for each brain feature type, image intensity yielded the highest predictive

power (95% AUC), followed by folding index (69%), volume (69%), and surface area (68%). The

high degree of success in prediction indicates that application of advanced analytic methods on

brain features holds promise for aiding early identification of ASD. To our knowledge this is the

first study to leverage a clinical imaging archive to investigate early brain markers in ASD.

94
5.1 Introduction

Early detection of ASD is important for three main reasons. First, early detection allows for

the application of early intervention methods. It has been shown that early intervention is effective

in reducing the impact of impairments (Dawson et al., 2010) and may result in more positive long-

term outcomes for the child (Pickles et al., 2016; Rogers & Vismara, 2010). Second, the early

detection of ASD can improve our knowledge of ASD etiology by separating out the effects of post-

natal environmental risk factors of ASD. The risk for ASD is influenced by genetic and pre, peri,

and post-natal environmental factors (Sandin et al. 2014). The environmental factors related to the

risk for ASD can interact with genetic factors making the identification of etiology of ASD

extraordinarily complex. So, if ASD can be detected early very few post-natal risk factors would

come into play and hence easier to understand its etiology. Third, ASD detection may be relatively

easier at younger age because of the less complex ASD etiology and manifestations.

Currently ASD diagnosis is based on a clinical assessment of the individual's behavior and

intellectual abilities. This approach is limited, however, as early diagnosis is not straightforward.

According to a recent report by CDC (2014), the median age of ASD diagnosis is 46 months. In

addition, behaviorally based diagnosis procedures can be subjective, time consuming, and

inconclusive due to factors such as comorbidity (Close et al., 2012). In addition, such an approach

does not provide insight into the neural underpinnings and the underlying etiology of ASD since

it is based only on behavioral symptoms. MRI is a non-invasive tool widely used to capture brain

morphometry. ASD diagnosis based on MRI can be objective and can potentially be utilized even

at the prenatal and neonatal stage (Glenn, 2010) and hence can be a useful tool for brain biomarker

discovery for early detection of ASD. MRI has already been successfully utilized for early diagnosis

of brain disorders such as Alzheimer’s (Frisoni et al., 2010; Hampel et al., 2010).

95
Most MRI studies that investigate brain alterations in ASD have used subjects older than

6 years; very few studies’ participants are in early childhood (<4 years) (Auzias et al., 2014b; Hazlett

et al., 2017; Nordahl et al., 2012). Of the studies using subjects less than age 4 years, only a small

subset of morphometric features such as volume and shape have been investigated. As a result,

there is a significant knowledge gap in brain imaging research on early brain alterations of ASD.

A comprehensive investigation of brain alterations in early childhood population with ASD is

needed to better characterize this disorder and merits special attention due to the importance of

early detection of ASD.

In this study, we compare brain features of ASD subjects in early childhood (3 to 4 years)

to non-ASD (CTR) subjects of the same age group using a comprehensive set of morphometric

and intensity features of brain cortical and sub-cortical structures. Multi-variate brain anomalies

are identified in a purely data-driven fashion by machine learning models trained for ASD vs. CTR

classification using the morphometric and intensity features.

5.2 Material and Methods

5.2.1 Subjects

We used clinical imaging archive of Geisinger Health System in this study. Initially, we had

access to the brain images of 413 CTR and 167 ASD subjects who were less than 4 years of age.

In total, there were 777 coronal 3D T1 images (631 from 413 CTR and 146 from 167 ASD); note

that some subjects did not have coronal 3D T1 images. The flowchart of subject selection process

is presented in Figure 5.1. We performed a strict data quality check to remove images with imaging

artifacts, motion, lesions, and abnormally large ventricles. The remaining images (247 from CTR

and 122 from ASD) were processed using FS recon-all workflow with default settings. FS

segmentation failed for 12 images (5 from CTR and 7 from ASD) and were excluded from the

96
study. We then removed 120 images of CTRs mental disorders identified using ICD 9 codes. This

resulted in a total of 112 images from CTR and 115 from ASD subjects. Since ASD is highly

prevalent in males (Fombonne, 2005) and also to avoid gender confounds (Lai et al., 2013), 54

female subjects (20 CTR, 34 ASD) were excluded.

The lower threshold for age was decided based on the trade-off between the need to

investigate the brain morphological alterations in ASD as early as possible and the applicability of

FS on the brain images of very young subjects. FS which was initially designed for adult subjects

may not provide truthful results for subjects younger than 3 years. Images of subjects as young as

3 years have been successfully segmented by FS (Retico et al., 2016). So, we decided to use the

subjects as young as 3 years. The upper threshold was chosen as 4 years based on the trade-off

between the age dependent brain alterations in ASD (Lin et al., 2015) and sample size

requirements. Finally, we used 41 images (23 from CTR, 18 from ASD) from male subjects of age

3 to 4 years.

Figure 5.1: Subject Selection Flowchart


Initially we had access to 777 coronal 3D T1 images from Geisinger Health System clinical archive; note
that some subjects did not have coronal 3D T1 images. After excluding inferior quality images, images for
which the segmentation failed, controls with mental disorders, female subjects, subjects outside the age
group of 36 to 48 months, 33 images were retained at the end and were used for further analyses.

97
5.2.2 Brain Features Extraction

Brain features were extracted using the recon-all workflow of FS v. 5.3.0 (Dale et al., 1999a;

Fischl et al., 1999a, 2002; Ségonne et al., 2004). The volume of 40 sub-cortical structures from Aseg

atlas (Fischl et al., 2002) were extracted. Similarly, volume, surface area, Gaussian curvature, mean

curvature, folding index, curvature index, thickness mean, thickness standard deviation (std.),

intensity mean, and intensity std. of 34 cortical structures from Desikan-Killiany atlas (Desikan et al.,

2006) were extracted. In total, 687 brain morphometric and intensity features were derived for

each image. Intensity mean and intensity std. of a brain structure is the mean and standard

deviation respectively of the voxel intensities within a certain brain structure. The volume features

of each subject were normalized by TIV since relative volumes are easy to compare and are more

robust against scanner effects (Takao et al., 2011). To eliminate the effects of image intensity biases,

intensity and intensity standard deviation features of each image were normalized by their

respective sums across all the brain structures.

The cortical and sub-cortical segmentation results of FS were assessed using the ENIGMA

protocols (http://enigma.ini.usc.edu/protocols/imaging-protocols/). Five CTR images and one

ASD image with imperfect segmentations were excluded from the study. After that, two subjects

had two images each and the image with lower quality was excluded for each subject. The

remaining 18 CTR and 15 ASD images were used in the subsequent analyses. Finally, a group of

15 males with ASD (42.6 ± 3.5 months; age range = 37.7 to 47.3 months) and 18 CTR males (41.4

± 2.9 months; age range = 37.0 to 47.0 months) were used in this study. There was no difference

in the age distribution of ASD and CTR subjects (Kolgmogorov-Smirnov statistic = 0.23, p-value

= 0.76).

98
5.2.3 Univariate Analysis

The statistical significances of ASD vs. CTR brain feature differences were estimated using

two sample t-tests. For each feature type, multiple comparisons correction was performed using

false discovery rate (Benjamini & Hochberg, 1995). The effect size of the differences were

quantified by Cohen’s d (Baron-Cohen et al., 2000).

5.2.4 Multivariate Analysis: ASD vs. CTR Classification

ASD vs. CTR classification was performed by RF (Breiman, 2001) classification models

trained with brain morphometric and intensity features. The classification flow chart is presented

in Figure 5.2. At first, the models were built with all 637 brain features. In addition, separate models

for each feature type were trained. Area under the ROC curve (AUC) was used to measure the

success of classification and the average classification success was estimated by 5-fold cross-

validation with stratified folds. Scikit-learn 0.17.1 (Pedregosa & Varoquaux, 2011) was used to

perform all the multivariate analyses.

‘Information gain’ measure was used to measure the quality of a node split while growing

decision trees. The following hyper parameters were used for optimal model selection: max_depth,

the number of features used to judge the quality of the impurity of a node, min_samples_split, the

minimum number of samples required in a node for further splitting, and min_samples_leaf, the

minimum number of samples required to be at a leaf node. Hyper parameter optimization was

performed through random search (Bergstra & Bengio, 2012).

99
Figure 5.2: Classification Flowchart.
Classification flowchart. Independent experiments were performed for each feature type. For each
classification experiment, the model success was quantified by AUC metric and the average model
assessment was estimated by 5–fold cross-validation. The 5-folds were generated by stratified sampling to
ensure dissimilarity between the folds. Each classification was performed using RF classifier where the node
purity was quantified by ‘Information Gain’. The optimum hyper parameters (max_depth, min_samples_split,
min_samples_leaf) were estimated using random search. Feature importance scores across the 5 folds were
averaged.

5.3 Results

5.3.1 ASD vs. CTR Brain Differences

The brain features for which ASD vs. CTR differences were statistically significant (p <

0.05, uncorrected) are presented in Figure 5.3. Effect sizes of all the differences were moderate to

large (Cohen’s d > 0.5); see Figure 9.1 in Appendix C.

100
Figure 5.3: Statistically Significant Brain Alterations.
Brain features with statistically significant differences (at p =0.05, uncorrected for multiple comparisons).
The numeric text near the points denote p-values.

Area: Statistically significant area features were mainly located in frontal, temporal,

supramarginal, and posterior cingulate regions. Except for the area of the left temporal pole, the

other 8 statistically significant area features were larger in ASD. The thickness mean of entorhinal

gyrus was greater in ASD and the effect size was large (d > 1; p = 9E-4) whereas the thickness std.

101
of the right entorhinal gyrus was smaller in ASD (d = 0.75; p = 0.035). Similarly, thickness std. of

the right superior frontal, fusiform, and caudal middle frontal gyri were larger in ASD (d > 0.60).

The total white surface area (GM/WM surface) was 7.2% larger in ASD without statistical

significance (d = 0.53, p = 0.14).

Volume: Most of the volumes with statistically significant differences were smaller in ASD

except for the right postcentral, supramarginal, and 5th ventricle volumes. However, one should

note that these reported volume features have been normalized by TIV. Raw volumes of these

structures were in fact larger in ASD but the differences were not statistically significant. This

discrepancy is mainly because TIV in ASD was larger than in CTR by 5.5% (d = 0.4, p = 0.17);

see Figure 5.4. Larger TIV in ASD was mainly due to the larger ventricles. All ventricles were

larger (>10%) in ASD, in particular the 5th ventricle was 290% larger in ASD (p = 0.015). Total

ventricular CSF volume was 27.89% larger in ASD (d = 0.38, p = 0.28) and after normalizing by

TIV, it was 19.14% larger (d = 0.29, p = 0.42). All global raw volumes were larger in ASD, but

when normalized by TIV, most of them were smaller in ASD. After normalizing, cerebral GM

was slightly larger in ASD whereas cerebellum GM was 5% smaller in ASD (d = -0.45, p = 0.20)

and cerebral WM was 2.1% smaller in ASD (p = 0.57) whereas cerebellum WM was 3.5% larger

in ASD (p = 0.6).

Folding index and curvature index. The folding index and curvature index features

with statistically significant differences were from frontal, temporal, cingulate, postcentral, and

precuneus regions and all were larger in ASD. Folding indices of 58 out of 68 cortices were higher

in ASD. The average amount of cortical folding (average of the folding indices of the cortices) was

12.7% greater in ASD with statistical significance (ASD = 62.2, TDC = 55.2, d = 0.71, p = 0.05).

The differences in curvature index of the left posterior cingulate gyrus and the folding index of the

102
right precuneus gyrus were large (d >1). Similarly, the Gaussian curvature of the left posterior

cingulate gyrus and left pericalcarine gyri were larger in ASD.

Intensity. In image intensity mean features, there were two distinct type of differences—

in general, intensity mean in WM regions were greater in ASD while intensity mean in sub-cortical

regions were lower in ASD. Intensity mean of the WM structures near frontal, supramarginal,

precuneus, precentral, and pars opercularis gyri were lower in ASD. In contrast, intensity mean of

the right thalamus, left caudate, left putamen, and left accumbens were higher in ASD. In general,

intensity std. of WM near gyri was higher in ASD whereas, intensity std. in cerebellum WM and

corpus callosum (not significant) were smaller in ASD.

103

40 d=0.43
p=0.23


30 d=0.34
p=0.34 ●
d=0.38
p=0.28

volume type
● ●
d=0.32 ● raw
ASD−CTR %

20 d=0.26
p=0.37 ● ●
p=0.46
d=0.5
d=0.29 ● normalized
p=0.42
p=0.16

● d=0.24
● ASD > CTR
d=0.17 p=0.51 ●
10 ● p=0.63
d=0.31 ● CTR > ASD
d=0.43 p=0.38
p=0.22
● ●
d=0.49 d=0.51 ●
p=0.17 p=0.15 d=0.41 ●
● d=0.18 ●
d=0.21
p=0.25 d=0.19
0 ● ● p=0.59 p=0.6 p=0.54
d=0.09 d=0.02 ●
p=0.8 p=0.96 d=−0.14 ● ●
d=−0.17 ●
p=0.69 d=−0.2 d=−0.21
p=0.57 p=0.63
● p=0.56
d=−0.45
p=0.2 ●
d=−0.61
−10 p=0.09
e

e
m

um

um

um

um

um

m
lu

lu

lu

lu

lu

lu

lu

lu
ol

ol

ol

ol

ol
vo

vo

vo

vo

vo
_v

_v

_v

_v

_v
__

__

__

__

__

__

__
_

r_

e_

_
IV

la

le

le

le

SF
te

te

te

te

te

te

cl
da

r ic

ric

ric
eT

at

at

at

at

at

at

tr i

−C
nt

nt

nt
yg
M

−M

en

ar
Ve

Ve

Ve
y−

y−

y−

te

Am

−V
te

te

ul
hi

d−

h−
ra

ra

ra

hi

hi

ric
l

al
ft−
lW

ra
−G

−G

3r

4t
−W

er

nt
te
l−

Le
ra

l−

at
l

ve
La
ra

ta

eb

ta

−L
llu

To

l−
eb

lu

ft−
er

To
e

ta
ht
el
er

C
eb

Le

To
ig
eb
C

er

R
er
C

Figure 5.4: Global Volume Features


Global Volume Features. TIV was larger in ASD and was mainly due to abnormally large ventricles.
Normalized global volumes were smaller in ASD.

5.3.2 ASD vs. CTR Classification

ASD vs. CTR Classification AUC scores for each feature type are presented in Figure 5.5.

The point and error bar represent the mean and standard deviation of the AUC scores from 5 test

folds.

104
Figure 5.5: ASD vs. CTR Classification Scores.
Mean classification AUC for all brain features and each feature type across 5 folds of testing. The error bar
represents the standard deviation of the AUC scores of 5 folds.

An average AUC of 0.92 was achieved under 5-fold cross-validation when all brain features

were used. When classification was performed separately for each brain feature type, we found that

most of the predictive power came from intensity mean (AUC = 0.83), folding index (AUC=0.69),

volume (AUC=0.69), and area (AUC=0.69) features. When intensity mean and intensity std.

features were used together, AUC of 0.95 was achieved. Thickness features yielded near chance

classification success rates.

5.3.3 Important Features for ASD vs. TDC Classification

The important features in classification, only for the feature types which yielded high

AUCs—intensity mean, intensity std., folding index, volume, and area—are presented. See Figure

5.6 for important brain features when all features were used for classification (ALL). See Figure 5.7

for important brain morphology features, and see Figure 5.8 for important brain intensity features.

After sorting the importance scores of the features in descending order, the features whose

cumulative sum of scores was at least 50% of the total importance scores were deemed as important

for classification. The length of the bar corresponding to a feature is proportional to its contribution

105
for the classification, relative to the most important feature. The numbers at the left of the bars

represent ASD-CTR Cohen’s d and the stars represent statistical significance at 0.05 (*) and 0.01

(**).

In classification using all features, the most important feature was the intensity mean of the

WM neighboring the rostral middle frontal gyrus. Intensity means of WM connecting left temporal

pole and right caudal middle frontal gyrus, were also important for classification and were smaller

in ASD. Other important features were volumes of the left acumbens-area, the right ventral

diencephalon, the left and right putamen, the left inferior temporal gyrus, etc.

Volume. In classification using volume features (Figure 5.7a), volumes of the left putamen,

left inferior temporal gyrus, and left accumbens-area were important for classification and all were

smaller in ASD. One should note that these volume features have been normalized by the TIV of

respective subject.

Area. In classification using area features (Figure 5.7b), the surface areas of the left middle

temporal gyrus, left posterior cingulate gyrus, right bank of superior temporal sulcus (bankssts), and

right rostral middle frontal gyrus were important for classification. All important area features were

larger in ASD.

106
Figure 5.6: Important brain feature for ASD vs. CTR classification
The length of the bar corresponding to a feature is proportional to its contribution towards classification.
The numbers at the left of the bars represent ASD-CTR Cohen’s d (positive is represented by red) and the
stars represent statistical significance at 0.05 (*) and 0.01 (**).

Folding index. In classification using folding index features (Figure 5.7c), the left middle

temporal gyrus was the most important structure for classification. It was also the most important

structure while performing classification using area features. Other important structures were left

middle temporal, left rostral middle frontal, and left inferior parietal. All important folding index

features were larger in ASD.

107
Figure 5.7: Important morphology features for ASD vs. CTR classification.
The length of the bar corresponding to a feature is proportional to its contribution towards classification.
The numbers at the left of the bars represent ASD-CTR Cohen’s d (positive is represented by red) and the
stars represent statistical significance at 0.05 (*) and 0.01 (**).

108
Figure 5.8: Important intensity features for ASD vs. CTR classification.
The length of the bar corresponding to a feature is proportional to its contribution towards classification.
The numbers at the left of the bars represent ASD-CTR Cohen’s d (positive is represented by red) and the
stars represent statistical significance at 0.05 (*) and 0.01 (**).

Intensity mean: In classification using image intensity mean features (Figure 5.8a), the

most important feature was the intensity mean of the WM neighboring rostral middle frontal gyrus

109
and compared to CTR, it was lower in ASD (d = 0.77, p = 0.04). Note that the same region was

the most important feature in the classification using all features. The intensity means of the WM

neighboring the right caudal middle frontal, left temporal pole, and the right pars opercularis were

also important for classification and were smaller in ASD. Other important brain structures were

the left accumbens-area, WM hypo-intensities, and right vessel.

Intensity std. In classification using image intensity std. features (Figure 5.8b), the most

important feature was the intensity std. of the WM hypo-intensities. Most of the important brain

features were from WM regions.

5.4 Discussion

High ASD vs. CTR classification success using brain morphometric and intensity features

were achieved in this study. The important features for classification were mainly from frontal and

temporal regions and these regions have been consistently associated with ASD (Bigler et al., 2007;

Ha et al., 2015). Most of the discriminative or predictive power for classification came from the

intensity features followed by folding index, volume, and area. In summary, three main brain

alterations in ASD were noted: abnormally large ventricles, higher gyrification, and less intensity

mean in WM of frontal and temporal region. This is the first study to perform ASD vs. CTR

classification in a very young population (3 to 4 years) using comprehensive brain features and to

achieve high classification success rates.

Very few studies have performed similar investigations to identify brain alterations in ASD.

Xiao et al. (2016) achieved an AUC of 0.88 while classifying ASD vs. subjects with developmental

delay using cortical thickness features. However, in our study thickness features yielded near

chance classification success. This may be due to the fact that their comparison control group had

developmental delay.

110
5.4.1 Amygdala in ASD

The amygdala is one of the most studied brain structure in relation to ASD, mainly

motivated by the amygdala theory of ASD (Baron-Cohen et al., 2000). Our study found smaller

amygdala volume in ASD without statistical significance: left(1%), right (2.8%) for raw volumes

and left (7.5%), right (5.9%) for normalized volumes. Nordahl et al. (2012) has reported the

opposite, the amygdala being 6% and 9% larger in ASD during 2 to 4 years.

5.4.2 Early brain overgrowth in ASD

One of the most replicated findings in ASD is that toddlers with ASD (age 2–4 years) on

average have a larger head size than TDC (Carper et al., 2002; Courchesne et al., 2011b, 2011a;

Campbell et al., 2014; Hazlett et al., 2012). A recent study by Hazlett et al. (2017) also reported

5.5% larger TIV in high risk ASD compared to negative high risk ASD at the age of two years.

Our study also shows 5.5% larger TIV and 7 % in larger cortical surface area in ASD.

The larger TIV in ASD in our study was mainly due to the larger ventricular volumes in

ASD (27.9 % raw, 19.1% normalized) whereas other normalized global volumes were smaller in

ASD. Padilla et al. (2015) has also reported smaller total volumes of temporal, occipital, insular,

and limbic regions in ASD after adjusting for total brain volume.

5.4.3 Overgrowth is related to increase in cortical surface area but not thickness

In our study, cortical surface area was 7.2% larger in ASD compared that of controls. Eight

out of nine area features with statistically significant differences were larger in ASD. However,

there was only one statistically significant thickness feature. In addition, area features yielded an

AUC of 69% whereas thickness features yielded near chance classification success. A study by

Hazlett et al. (2011) has also reported no differences in cortical thickness but larger surface area in

ASD at both 2 years and 4.5 years. Moreover, a recent study by Hazlett et al. (2017) has also

111
reported 7% larger cortical surface area in ASD but no difference in cortical thickness.

Importantly, they also report significant contribution of area features but insignificant contribution

of thickness features in ASD classification. These results suggest that the early brain overgrowth in

ASD brain is related with the increase in cortical surface area but is relatively independent to the

cortical thickness.

5.4.4 Higher cortical folding in ASD brains and its relation to early brain overgrowth

and cortical expansion

Folding indices of most of the gyri (58 out of 68) were greater in ASD and folding index

features yielded AUC of 0.69. The cortex of ASD brains were 12.7% more folded in average. This

suggests that ASD brains generally exhibit more gyrification and the amount of cortical folding is

a potential biomarker for early detection of ASD. Similar to our study, Ecker et al. (2016) have

reported that ASD individuals had a significant increase in gyrification around the left pre and

post-central gyrus. A study by Katuwal et al. (2016c) reported that the folding index features from

frontal, temporal, lingual, and insular regions were important in ASD vs. CTR classification for

subjects aged 6-12 years old. Similarly, a study by Auzias et al. (2014) reported higher folding of

the right intraparietal, the left medial frontal, and the left central sulci in children with ASD.

The higher amount of cortical folding in ASD may be the aftereffect of early brain

overgrowth in ASD toddlers. We hypothesize that higher folding in ASD could be due to larger

compressive stress in the cortex produced by the tangential hyper-expansion of the cortical layer.

We also hypothesize that the abnormally larger ventricles in ASD induces additional compressive

stress in the cortex and hence more cortical folding. A popular hypothesis on brain cortical

gyrification is that the tangential expansion of the cortical layer relative to sublayers generates a

compressive stress, leading to the mechanical folding of the cortex (Tallinen et al., 2014, 2016); see

112
Figure 5.9. This hypothesis has been substantiated by both physical and numerical models of the

brain and has shown that gyrification increases with brain growth (Tallinen et al., 2016).

Figure 5.9: Cortical expansion theory on cortical gyrification


The tangential expansion of the cortical layer relative to sublayers (white matter) generates a compressive
stress in cortex (GM), leading to its mechanical folding (Tallinen et al., 2014, 2016)

A recent study by Hazlett et al. (2017) which reported very similar findings to our study in

TIV and total surface area of the cortex, also reports that the rate of cortical surface area expansion

significantly increased in individuals with ASD from 6 to 12 months. Most importantly, they also

report the association between the subsequent brain overgrowth and the emergence of social

deficits. These results support our understanding that the hyper-expansion of the cortex and

subsequent brain overgrowth is related to ASD. In this study, in addition to relating hyper-

expansion of the cortex to ASD, we also find that the hyper-expansion may be associated with the

increased cortical folding we observe in ASD. We found that there was a correlation of 0.68 (p =

0.005) for ASD and 0.78 (p = 0.0002) for TDC between the average amount of cortical folding

and total surface area of the cortex. Similarly, there was a correlation of 0.53 (p = 0.04) for ASD

and 0.65 (p = 0.03) for TDC between the average amount of cortical folding and TIV.

113
5.4.5 Larger Ventricles and extra-axial volume in ASD may cause additional cortical

folding

We suggest that the increase in cortical folding may not only be due to the cortical hyper-

expansion; see Figure 5.10. In addition to the hyper-expansion, the abnormally larger ventricles in

ASD produce additional mechanical stress in the cortex, thereby introducing additional folds

according to the model proposed by Tallinen et al. (2014). In this study, we note that the correlation

between average amount of cortical folding and total ventricular CSF was 0.56 (p = 0.02) for ASD,

but was -0.13 (p = 0.6) for TDC. The statistically significant positive correlation in ASD suggests

the possibility of some degree of cortical folding may be due to the larger ventricles. In contrast,

the non-significant correlation in TDC suggests that the cortical folding is not affected by the

ventricles because they are of normal sizes and hence do not extend additional compressive stress

to the cortex.

Furthermore, it is possible that the larger volume of extra-axial fluid (CSF in the sub-

arachnoid space) in ASD also contributes to the compressive stress in the cortex and hence more

folding. A study by Shen et al. (2013) has reported that extra-axial fluid in ASD was 25% more

than in low-risk typical infants at the age of 6 to 24 months. In our study, we could not perform

the analysis for the extra-axial fluid volume because FS does not output the measure since T1 MRI

does not contain enough information to discriminate between CSF and air as both appear dark.

114
Figure 5.10: Higher cortical folding in ASD brain.
Higher amount of cortical folding in ASD brain may be the aftereffect of the hyper expansion of cortex,
larger ventricles, and more extra-axial volume.

5.4.6 WM in frontal and temporal regions less myelinated in ASD

Image intensity mean of the WM neighboring frontal and temporal regions were smaller

in ASD and some of these features were the most important for classification. Less image intensity

in WM means it is less bright suggesting less myelination surrounding the axons. In addition, an

AUC of 0.83 was achieved with intensity mean features and an AUC of 0.95 was achieved

combining intensity mean and intensity std. features. Most importantly, most of the important

features in all of these cases were WM neighboring frontal and temporal regions. This suggests

myelination deficits in frontal and temporal regions may be a potential early brain marker of ASD.

Several previous studies have also reported WM myelination deficits in ASD (Croteau-

Chonka et al., 2016; Peters et al., 2012; Zinkstok et al., 2012). Zinkstok et al. (2012) reported that

individuals with ASD had significantly less myelin content in numerous brain regions and WM

tracts. In addition, they have reported that the observed myelination deficit increased with autism

115
severity. Similarly, Peters et al. (2012) have reported myelination deficits in WM in autistic brains

of age 0.5 to 25 years. A study by Lazar et al. (2014) has reported that in ASD males of 18 to 25

years old, the axonal water fraction (a measure of axonal caliber and density) and intra-axonal

diffusivity were significantly lower and were associated with reduced processing speed of the brain.

All of these results suggest that brain disconnectivity resulting from insufficient development of the

myelin sheath may be one of the underlying causes of ASD and myelination deficits in frontal and

temporal regions in particular, may be a potential marker for early detection of ASD; see Figure

5.11.

Figure 5.11: Inefficient communication in ASD brain.


Insufficient development of the myelin sheath in frontal and temporal regions of ASD brain may cause
inefficient communication.

5.4.7 Brain overgrowth in ASD is followed by arrested growth and even degeneration

Several studies have reported that the early brain overgrowth in ASD might be followed

by arrested brain growth and even degeneration (Courchesne et al., 2011b; Schumann et al., 2010).

Our results also show similar results; see Figure 5.12a. From 3 to 4 years, TIV of ASD slightly

decreases to stabilize at the normal adult brain volume of 1.5 L whereas TIV of TDC continues to

116
increase. The limited sample size of our data does not have enough statistical power to discriminate

if TIV in ASD goes through arrested growth or degeneration or a combination of both. However,

from our results, we can certainly see that a normal brain continues to grow whereas the growth

rate of an autistic brain significantly decreases.

We noticed a similar pattern in the total cortical surface area with age; see Figure 5.12b.

The surface area in ASD remains somewhat constant from the age of 3 to 4 years whereas the

surface area in TDC continues to increase. Interestingly, the total ventricular volume decreases

steeply in ASD from the age of 3 to 4 years whereas remains constant in TDC; see Figure 5.12c.

Similarly, the average amount of cortical folding decreases in ASD but remains constant in TDC;

see Figure 5.12d. One important observation here is the decrease of the cortical folding in ASD

even when both cortical surface area and TIV are somewhat constant. However, the total

ventricular volume decreases during the time course and this decrease in ventricular volume may

be a reason behind the decrease in cortical folding. This pattern corroborates our hypothesis that

the larger ventricles in ASD induce additional compression stress in the cortex forcing it to fold

more. Moreover, it can be noticed that the amount of cortical folding in TDC remains constant

even when both TIV and cortical surface area increase. This may be due to the fact that the cortical

folding of a normal brain, caused by the compressive stress of standard brain growth, is already

stabilized by the age of 3 years.

117
Figure 5.12: Arrested brain growth in ASD
a) TIV of ASD slightly decreases to stabilize at the normal adult brain volume of 1.5 L whereas TIV of
TDC continues to increase. b) The surface area in ASD remains somewhat constant from the age of 3 to 4
years whereas the surface area in TDC continues to increase. c) The total ventricular volume decreases
steeply in ASD from the age 3 to 4 years whereas remains constant in TDC. d) The average amount of
cortical folding decreases in ASD but remains constant in TDC. The cortical folding in ASD decreases
even when both cortical surface area and TIV are somewhat constant. This decrease in cortical folding is
most likely due to the decrease in the total ventricular volume.

5.5 Conclusion

118
Using brain features derived from MRI, we were able to classify ASD from CTR subjects

with high accuracies. We also identified three potential brain markers for early detection of ASD:

larger ventricles, higher amount of cortical folding, and myelination deficits particularly in frontal

and temporal regions. Further, we were able to show that higher cortical folding in ASD brains

may be an aftereffect of early brain overgrowth and the additional compression in the cortex due

the abnormally large ventricles. In addition, we showed that the growth of autistic brain

significantly decreases after the age of 3 years.

119
6 CHAPTER 6: CONCLUSION AND FUTURE

WORK

In this thesis, we sought imaging biomarkers for ASD detection by applying machine

learning on brain morphological featues captured by MRI data. We started by pointing out the

drawbacks of currently used behavioral based ASD diagnosis. We then discussed how MRI based

ASD detection can provide a better alternative for ASD detection because it is objective, can be

utilized for earlier diagnosis, and will facilitate better understanding of the neuroanatomical basis

of ASD. After that, we presented the current literature in ASD brain morphometry captured using

MRI. We pointed out two major problems in MRI based studies: inconsistent brain anatomical

findings in ASD and low ASD classification success in the large multisite ABIDE dataset. We

attributed the inconsistent findings and low classification success to methodological differences and

ASD heterogeneity; and then investigated each of these factors one by one. In Chapter 3, we

investigated the effect of methodological differences, particularly the choice of brain image

processing tools, on the final results. We found that the brain anatomical findings were affected by

the choice of tools to a great extent. We concluded by suggesting the improvement of the image

processing tools, cross-validating the findings across tools, and the use of machine learning to

capture the multi-variate patterns. In Chapter 4, we investigated the heterogeneity in ASD brain

morphology and its effects on ASD classification. We found that the ASD brain morphology is

highly heterogeneous and the heterogeneity can be mitigated and hence ASD classification

improved by utilizing the additional information from demographics and behavioral measure. We

concluded that investigating brain markers of ASD in more meaningful sub-groups is easier and

more insightful than across the whole spectrum. Finally, in Chapter 5, we were able to identify

120
biologically plausible brain markers for early detection of ASD and were able to detect ASD at the

age of 3 to 4 years with greater than 90% AUC.

6.1 Contributions

The main contributions of this thesis and their implications are discussed below.

6.1.1 We demonstrated MRI as a potential tool for ASD detection

Using brain images of both children and adult subjects, we demonstrated that MRI has a

potential to successfully detect ASD based on brain morphology. Applying machine learning on

MRI derived brain morphological features, data-driven potential brain markers were identified;

the identified markers are consistent with the current biological understanding of ASD. In addition,

we could successfully classify ASD subjects from non-ASD subjects in many cases. The results of

this thesis prove that applying machine learning on MRI data is a potential technique for early

detection of ASD and understanding its brain anatomical underpinnings.

6.1.2 We showed that the methodological differences are a cause of inconsistent brain

imaging findings

In Chapter 3, we showed that methodological differences, the differences in brain image

processing tools are a source behind the inconsistent brain anatomical findings in ASD and

neuroimaging in general. We found that the ASD vs. TDC group differences in brain tissue

volumes were highly dependent on the tools used to estimate the volume of these tissues. Form this

work which was published in Frontier in Neuroscience (Katuwal et al., 2016b), we made an effort to

caution the neuroimaging community about the effects of brain image processing tools and the

pitfalls of blindly using them. In this work, we concluded by suggesting the need for the

121
improvement of the image processing tools, cross-validating the findings across tools, and the use

of machine learning to capture the multi-variate patterns.

Improving the current brain image preprocessing tools to make them more accurate and

standardizing them is the best possible solution to remove the effects of methodological differences.

However, when the existing popular tools have to be used, we suggest investigating multi-variate

relationships in addition to univariate ones in brain alterations because multi-variate relationships

are more robust across methods and scanners. In this thesis, we have provided an example to

corroborate this view.

6.1.3 We showed that identifying brain biomarkers in sub-groups of ASD is easier

and more meaningful than across the whole spectrum

In Chapter 3 of this thesis, we demonstrated that the heterogeneity in ASD brain

morphometry can be mitigated by augmenting the information from demographics and behavioral

(DB) measures. We showed that the ASD classification can be improved significantly utilizing the

additional information from DB measures. We also demonstrated that identifying ASD brain

alterations in relatively homogenous sub-groups is easier and more insightful than across the whole

heterogeneous ASD spectrum. From this work, which was published in PLOS ONE (Katuwal et al.,

2016c), we were able to demonstrate how ASD includes a wide range of variations and each sub-

group might be distinct and might have to be treated separately. We showed that characterizing

the brain morphology of ASD across the whole spectrum is a very complex problem. This complex

problem can be solved by breaking it down to simpler sub-problems where each sub-problem

would be identifying the brain alterations in a sub type of ASD.

122
6.1.4 We identified potential brain markers for early detection of ASD

Using multiple features automatically extracted from brain images of young subjects (3 to

4 years), we achieved very high success rates (>90% AUC) in ASD vs. control classification. We

also identified three potential brain markers for early detection of ASD: larger ventricles, higher

amount of cortical folding, and myelination deficits particularly in frontal and temporal regions.

Further, we were able to show that higher gyrification in ASD brains may be an aftereffect of the

early brain overgrowth. Furthermore, we demonstrated that the higher amount of cortical folding

in ASD is due to the greater compressive stress in the cortex induced by both hyper-expansion of

the cortex and abnormally large ventricles.

To our knowledge this is the first study to leverage clinical imaging archives to investigate

early brain markers in ASD. The high degree of success in classification and the biological relevant

potential brain markers indicate that application of advanced analytic methods on brain features

holds promise for aiding early identification of ASD.

6.2 Future Work

6.2.1 Critical need for large longitudinal studies

Brain overgrowth that begins before two years of age is clearly one of the brain markers of

ASD. However, there have been very few longitudinal studies to investigate this overgrowth

(Courchesne et al., 2011a; Schumann et al., 2010; Hazlett et al., 2017). The sample sizes of these

studies are limited, especially considering the heterogeneity of ASD. The question of when

overgrowth starts, its underlying neurobiological causes, and its mechanical effects in the anatomy

of the brain is overarching. In addition, it is important to find the causal association between the

brain overgrowth and the autistic symptoms. We therefore see that a large longitudinal study,

123
preferably starting from the neonatal stage, is a future work with a very high importance. In

alignment with this critical need, Infant Brain Imaging Study (IBIS) (http://ibisnetwork.org/) is

collecting longitudinal brain scans of children at risk for ASD (i.e., younger siblings of older autistic

individuals) from 3 to 24 months of age.

6.2.2 Data-driven brain features

Most of the studies investigating the brain markers of ASD, including this thesis, have used

pre-defined sets of global brain features. It is likely that the set of these brain features might not

have enough information to discriminate sub-types of ASD and to classify ASD from the normal

population. First, our knowledge of ASD brain morphology is very limited and the findings on

ASD brain alteration have been highly inconsistent. Consequently, limited and/or misguided

information may inhibit us from identifying the pertinent brain features to characterize ASD.

Second, ASD is highly heterogeneous and the global brain features that are being used might be

too simplistic to effectively capture the brain alterations in ASD.

Data-driven brain features can contain much richer information and are objective. Deep

learning is one of the very promising techniques for the extraction of informative data-driven

features especially due to the fact that the extracted features are hierarchical in nature. In addition,

the use of deep learning directly on MRI scans also bypasses the problem of method dependent

brain features. Recently, there have been a few studies that have applied deep learning to MRI

data for survival prediction of ALS patients (van der Burgh et al. 2016), early detection of

Alzheimer’s (Liu et al. 2014), ASD (Hazlett et al. 2017), and other ailments.

Considering the same underlying methodology shared by the studies on different brain

disorders and the promise of data-driven techniques, I have initiated a personal project to develop

a unified framework to identify the brain markers of major brain disorders in data-driven fashion.

124
State-of-the-art machine learning techniques can be applied on brain scans of multiple modalities

in this proposed framework. This will allow us to detect and investigate the etiology and progression

of several brain disorders within a common framework. In addition, the framework will provide

an unbiased platform to investigate the comorbidities among the brain disorders. For detailed and

up to date information, please visit http://predictbrain.com.

125
BIBLIOGRAPHY

Alpaydin, E. (2010). Introduction to machine learning. MIT Press.

Amaral, D.G., Schumann, C.M. & Nordahl, C.W. (2008). Neuroanatomy of autism. Trends in

neurosciences. 31 (3). p.pp. 137–45.

American Psychiatric Association (2013). Diagnostic and Statistical Manual of Mental Disorders:

DSM-5. 5th Edn Arlington, VA: American Psychiatric Association.

Asbury, C. (2011). Brain imaging technologies and their applications in neuroscience. The Dana

Foundation. p.pp. 1–45.

Ashburner, J., Chen, C., Moran, R., Henson, R., Glauche, V. & Phillips, C. (2013). SPM8 Manual.

Ashburner, J. & Friston, K.J. (2005). Unified segmentation. NeuroImage. 26 (3). p.pp. 839–51.

Assaf, Y. & Pasternak, O. (2008). Diffusion tensor imaging (DTI)-based white matter mapping in

brain research: A review. Journal of Molecular Neuroscience. 34 (1). p.pp. 51–61.

Auzias, G., Breuil, C., Takerkart, S. & Deruelle, C. (2014a). Detectability of brain structure

abnormalities related to autism through MRI-derived measures from multiple scanners. In:

IEEE-EMBS International Conference on Biomedical and Health Informatics (BHI). June 2014, Ieee,

pp. 314–317.

Auzias, G., Viellard, M., Takerkart, S., Villeneuve, N., Poinso, F., Fonséca, D. Da, Girard, N. &

Deruelle, C. (2014b). Atypical sulcal anatomy in young children with autism spectrum

disorder. NeuroImage: Clinical. 4. p.pp. 593–603.

Baron-Cohen, S., Ring, H. a., Bullmore, E.T., Wheelwright, S., Ashwin, C. & Williams, S.C.R.

(2000). The amygdala theory of autism. Neuroscience and Biobehavioral Reviews. 24 (3). p.pp. 355–

364.

126
Bates, D., Mächler, M., Bolker, B. & Walker, S. (2014). Fitting Linear Mixed-Effects Models using

lme4. arXiv. 67 (1). p.p. arXiv:1406.5823.

Bauer, E. & Kohavi, R. (1999). An Empirical Comparison of Voting Classification Algorithms:

Bagging, Boosting, and Variants. Machine Learning. 36 (1/2). p.pp. 105–139.

Bazazeh, D. & Shubair, R. (2016). Comparative study of machine learning algorithms for breast

cancer detection and diagnosis. In: 2016 5th International Conference on Electronic Devices, Systems

and Applications (ICEDSA). December 2016, IEEE, pp. 1–4.

Benjamini, Y. & Hochberg, Y. (1995). Controlling the false discovery rate: a practical and powerful

approach to multiple testing. Journal of the Royal Statistical Society. 57 (1) p.pp. 289–300.

Bergstra, J. & Bengio, Y. (2012). Random Search for Hyper-Parameter Optimization. Journal of

Machine Learning Research. 13. p.pp. 281–305.

Betancur, C. (2011). Etiological heterogeneity in autism spectrum disorders: More than 100 genetic

and genomic disorders and still counting. Brain Research. 1380. p.pp. 42–77.

Bigler, E.D., Mortensen, S., Neeley, E.S., Ozonoff, S., Krasny, L., Johnson, M., Lu, J., Provencal,

S.L., McMahon, W. & Lainhart, J.E. (2007). Superior temporal gyrus, language function, and

autism. Developmental neuropsychology. 31 (2). p.pp. 217–38.

Le Bihan, D., Mangin, J.-F., Poupon, C., Clark, C.A., Pappata, S., Molko, N. & Chabriat, H.

(2001). Diffusion tensor imaging: Concepts and applications. Journal of Magnetic Resonance

Imaging. 13 (4). p.pp. 534–546.

Boekel, W., Wagenmakers, E.J., Belay, L., Verhagen, J., Brown, S. & Forstmann, B.U. (2015). A

purely confirmatory replication study of structural brain-behavior correlations. Cortex. 66.

p.pp. 115–133.

Boukhris, T., Sheehy, O., Mottron, L. & Bérard, A. (2015). Antidepressant Use During Pregnancy

and the Risk of Autism Spectrum Disorder in Children. JAMA Pediatrics. p.p. 1.

127
Bradley, A.P. (1997). The use of the area under the ROC curve in the evaluation of machine

learning algorithms. Pattern Recognition. 30 (7). p.pp. 1145–1159.

De Brébisson, A. & Montana, G. (2015). Deep neural networks for anatomical brain segmentation.

IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops. 2015–Octob.

p.pp. 20–28.

Breiman, L. (2001). Random Forrest. Machine Learning. p.pp. 1–33.

Breiman, L. (1996). Technical note: Some properties of splitting criteria. Machine Learning. 24. p.pp.

41–47.

Brown, N. & Sandholm, T. (2017). Safe and Nested Endgame Solving for Imperfect-Information Games Prior

Approaches to Endgame Solving in.

Buckner, R.L., Head, D., Parker, J., Fotenos, A.F., Marcus, D., Morris, J.C. & Snyder, A.Z. (2004).

A unified approach for morphometric and functional data analysis in young, old, and

demented adults using automated atlas-based head size normalization: reliability and

validation against manual measurement of total intracranial volume. NeuroImage. 23 (2). p.pp.

724–38.

Button, K.S., Ioannidis, J.P. a, Mokrysz, C., Nosek, B. a, Flint, J., Robinson, E.S.J. & Munafò,

M.R. (2013). Power failure: why small sample size undermines the reliability of neuroscience.

Nature reviews. Neuroscience. 14 (5). p.pp. 365–76.

Calderoni, S., Retico, A., Biagi, L., Tancredi, R., Muratori, F. & Tosetti, M. (2012). Female

children with autism spectrum disorder: An insight from mass-univariate and pattern

classification analyses. NeuroImage. 59 (2). p.pp. 1013–1022.

Callaert, D. V., Ribbens, A., Maes, F., Swinnen, S.P. & Wenderoth, N. (2014). Assessing age-

related gray matter decline with voxel-based morphometry depends significantly on

segmentation and normalization procedures. Frontiers in Aging Neuroscience. 6 (JUN). p.pp. 1–

128
14.

Campbell, D.J., Chang, J. & Chawarska, K. (2014). Early generalized overgrowth in autism

spectrum disorder: Prevalence rates, gender effects, and clinical outcomes. Journal of the

American Academy of Child and Adolescent Psychiatry. 53 (10). p.pp. 1063–1073.

Carper, R.A., Moses, P., Tigue, Z.D. & Courchesne, E. (2002). Cerebral lobes in autism: early

hyperplasia and abnormal age effects. NeuroImage. 16 (4). p.pp. 1038–51.

Caruana, R. & Niculescu-Mizil, A. (2006). An empirical comparison of supervised learning

algorithms. Proceedings of the 23rd international conference on Machine learning. C (1). p.pp. 161–168.

CDC (2014). Prevalence of autism spectrum disorder among children aged 8 years - autism and developmental

disabilities monitoring network, 11 sites, United States, 2010.

Chakraborty, M., Pal, S., Pramanik, R. & Ravindranath Chowdary, C. (2016). Recent

developments in social spam detection and combating techniques: A survey. Information

Processing & Management. 52 (6). p.pp. 1053–1073.

Chawla, N. V. (2005). Data Mining for Imbalanced Datasets: An Overview. Data Mining and

Knowledge Discovery Handbook. p.pp. 853–867.

Chen, R., Jiao, Y. & Herskovits, E.H. (2011). Structural MRI in autism spectrum disorder. Pediatric

research. 69 (5 Pt 2). p.p. 63R–8R.

Close, H.A., Lee, L.-C., Kaufmann, C.N. & Zimmerman, A.W. (2012). Co-occurring Conditions

and Change in Diagnosis in Autism Spectrum Disorders. Pediatrics. 129 (2). p.pp. e305–e316.

Cohen, J. (1988). Statistical power analysis for the behavioral sciences. Statistical Power Analysis for

the Behavioral Sciences. 2nd p.p. 567.

Collins, D.L., Neelin, P., Peters, T.M. & Evans, A.C. (1994). Automatic 3D intersubject

registration of MR volumetric data in standardized Talairach space. Journal of computer assisted

tomography. 18 (2). p.pp. 192–205.

129
Courchesne, E., Campbell, K. & Solso, S. (2011a). Brain growth across the life span in autism:

Age-specific changes in anatomical pathology. Brain Research. 1380. p.pp. 138–145.

Courchesne, E., Karns, C.M., Davis, H.R., Ziccardi, R., Carper, R. a, Tigue, Z.D., Chisum, H.J.,

Moses, P., Pierce, K., Lord, C., Lincoln, A.J., Pizzo, S., Schreibman, L., Haas, R.H.,

Akshoomoff, N. a & Courchesne, R.Y. (2011b). Unusual brain growth patterns in early life in

patients with autistic disorder: an MRI study. Neurology. 57 (24). p.pp. 2111–2111.

Cox, R.W. (1996). AFNI: Software for Analysis and Visualization of Functional Magnetic

Resonance Neuroimages. Computers and Biomedical Research. 29 (3). p.pp. 162–173.

Craig, M.C., Zaman, S.H., Daly, E.M., Ter, W.J.C., Robertson, D.M.W., Hallahan, B., Toal, F.,

Reed, S., Ambikapathy, A., Ammer, M.B.R., Murphy, C.M. & Murphy, D.G.M. (2007).

Women with autistic-spectrum disorder:magnetic resonance imaging study of brain anatomy.

British journal of psychiatry. 191. p.pp. 224–228.

Croteau-Chonka, E.C., Dean, D.C., Remer, J., Dirks, H., O’Muircheartaigh, J. & Deoni, S.C.L.

(2016). Examining the relationships between cortical maturation and white matter

myelination throughout early childhood. NeuroImage. 125. p.pp. 413–421.

D’Mello, A.M., Crocetti, D., Mostofsky, S.H. & Stoodley, C.J. (2015). Cerebellar gray matter and

lobular volumes correlate with core autism symptoms. NeuroImage: Clinical. 7. p.pp. 631–639.

Dale, A.M., Fischl, B. & Sereno, M.I. (1999a). Cortical Surface-Based Analysis:I. Segmentation,

Surface Reconstruction. NeuroImage. 9 (2). p.pp. 179–194.

Dale, A.M., Fischl, B. & Sereno, M.I. (1999b). Cortical surface-based analysis. I. Segmentation

and surface reconstruction. NeuroImage. 9 (2). p.pp. 179–94.

Dawson, G., Rogers, S., Munson, J., Smith, M., Winter, J., Greenson, J., Donaldson, A. & Varley,

J. (2010). Randomized, Controlled Trial of an Intervention for Toddlers With Autism: The

Early Start Denver Model. Pediatrics. 125 (1). p.pp. e17–e23.

130
Demirakca, T., Sartorius, A., Ende, G., Meyer, N., Welzel, H., Skopp, G., Mann, K. & Hermann,

D. (2011). Diminished gray matter in the hippocampus of cannabis users: Possible protective effects of

cannabidiol.

Desikan, R.S., Ségonne, F., Fischl, B., Quinn, B.T., Dickerson, B.C., Blacker, D., Buckner, R.L.,

Dale, A.M., Maguire, R.P., Hyman, B.T., Albert, M.S. & Killiany, R.J. (2006). An automated

labeling system for subdividing the human cerebral cortex on MRI scans into gyral based

regions of interest. NeuroImage. 31 (3). p.pp. 968–80.

Dinov, I., Lozev, K., Petrosyan, P., Liu, Z., Eggert, P., Pierce, J., Zamanyan, A., Chakrapani, S.,

van Horn, J., Parker, D.S., Magsipoc, R., Leung, K., Gutman, B., Woods, R. & Toga, A.

(2010). Neuroimaging study designs, computational analyses and data provenance using the

LONI pipeline. PLoS ONE. 5 (9).

Do, C.B. & Batzoglou, S. (2008). What is the expectation maximization algorithm? Nature

Biotechnology. 26 (8). p.pp. 897–899.

Dougherty, C.C., Evans, D.W., Katuwal, G.J. & Michael, A.M. (2016a). Asymmetry of fusiform

structure in autism spectrum disorder: trajectory and association with symptom severity.

Molecular Autism. 7 (1). p.p. 28.

Dougherty, C.C., Evans, D.W., Myers, S.M., Moore, G.J. & Michael, A.M. (2016b). A

Comparison of Structural Brain Imaging Findings in Autism Spectrum Disorder and

Attention-Deficit Hyperactivity Disorder. Neuropsychology Review. 26 (1). p.pp. 25–43.

Ecker, C., Andrews, D., Dell’Acqua, F., Daly, E., Murphy, C., Catani, M., Thiebaut De Schotten,

M., Baron-Cohen, S., Lai, M.C., Lombardo, M. V., Bullmore, E.T., Suckling, J., Williams,

S., Jones, D.K., Chiocchetti, A. & Murphy, D.G.M. (2016). Relationship between cortical

gyrification, white matter connectivity, and autism spectrum disorder. Cerebral Cortex. 26 (7).

p.pp. 3297–3309.

131
Ecker, C., Marquand, A., Mourão-Miranda, J., Johnston, P., Daly, E.M., Brammer, M.J.,

Maltezos, S., Murphy, C.M., Robertson, D., Williams, S.C. & Murphy, D.G.M. (2010a).

Describing the brain in autism in five dimensions--magnetic resonance imaging-assisted

diagnosis of autism spectrum disorder using a multiparameter classification approach. The

Journal of neuroscience : the official journal of the Society for Neuroscience. 30 (32). p.pp. 10612–10623.

Ecker, C., Rocha-Rego, V., Johnston, P., Mourao-Miranda, J., Marquand, A., Daly, E.M.,

Brammer, M.J., Murphy, C. & Murphy, D.G. (2010b). Investigating the predictive value of

whole-brain structural MR scans in autism: A pattern classification approach. NeuroImage. 49

(1). p.pp. 44–56.

Eggert, L.D., Sommer, J., Jansen, A., Kircher, T. & Konrad, C. (2012). Accuracy and Reliability

of Automated Gray Matter Segmentation Pathways on Real and Simulated Structural

Magnetic Resonance Images of the Human Brain Y. Fan (ed.). PLoS ONE. 7 (9). p.p. e45081.

Evans, A.C., Marrett, S., Neelin, P., Collins, L., Worsley, K., Dai, W., Milot, S., Meyer, E. & Bub,

D. (1992). Anatomical mapping of functional activation in stereotactic coordinate space.

NeuroImage. 1 (1). p.pp. 43–53.

Fawcett, T. (2006). An introduction to ROC analysis. Pattern Recognition Letters. 27 (8). p.pp. 861–

874.

Fellhauer, I., Zöllner, F.G., Schröder, J., Degen, C., Kong, L., Essig, M., Thomann, P.A. & Schad,

L.R. (2015). Comparison of automated brain segmentation using a brain phantom and

patients with early Alzheimer’s dementia or mild cognitive impairment. Psychiatry Research:

Neuroimaging. 233 (3). p.pp. 299–305.

Fischl, B. (2012). FreeSurfer. NeuroImage. 62 (2). p.pp. 774–81.

Fischl, B., Van Der Kouwe, A., Destrieux, C., Halgren, E., Ségonne, F., Salat, D.H., Busa, E.,

Seidman, L.J., Goldstein, J., Kennedy, D., Caviness, V., Makris, N., Rosen, B. & Dale, A.M.

132
(2004). Automatically Parcellating the Human Cerebral Cortex. Cerebral Cortex. 14 (1). p.pp.

11–22.

Fischl, B., Salat, D.H., Busa, E., Albert, M., Dieterich, M., Haselgrove, C., van der Kouwe, A.,

Killiany, R., Kennedy, D., Klaveness, S., Montillo, A., Makris, N., Rosen, B. & Dale, A.M.

(2002). Whole brain segmentation: automated labeling of neuroanatomical structures in the

human brain. Neuron. 33 (3). p.pp. 341–55.

Fischl, B., Sereno, M.I. & Dale, A.M. (1999a). Cortical Surface-Based Analysis II: Inflation,

Flattening, and a Surface-Based Coordinate System. NeuroImage. 9 (2). p.pp. 195–207.

Fischl, B., Sereno, M.I., Tootell, R.B. & Dale, a M. (1999b). High-resolution intersubject

averaging and a coordinate system for the cortical surface. Human brain mapping. 8 (4). p.pp.

272–84.

Floris, D.L., Chura, L.R., Holt, R.J., Suckling, J., Bullmore, E.T., Baron-Cohen, S. & Spencer,

M.D. (2013). Psychological correlates of handedness and corpus callosum asymmetry in

autism: The left hemisphere dysfunction theory revisited. Journal of Autism and Developmental

Disorders. 43 (8). p.pp. 1758–1772.

Fombonne, E. (2009). Epidemiology of pervasive developmental disorders. Pediatric Research. 65 (6).

p.pp. 591–598.

Fombonne, E. (2005). The Changing Epidemiology of Autism. Journal of Applied Research in

Intellectual Disabilities. 18 (4). p.pp. 281–294.

Friedman, J.H. (2000). Greedy Function Approximation: A Gradient Boosting Machine. Annals of

Statistics. 29. p.pp. 1189--1232.

Frisoni, G.B., Fox, N.C., Jack, C.R., Scheltens, P. & Thompson, P.M. (2010). The clinical use of

structural MRI in Alzheimer disease. Nature Reviews. 6 (2). p.pp. 67–77.

Ganz, M.L. (2007). The lifetime distribution of the incremental societal costs of autism. Archives of

133
pediatrics & adolescent medicine. 161 (4). p.pp. 343–349.

Gargaro, B.A., Rinehart, N.J., Bradshaw, J.L., Tonge, B.J. & Sheppard, D.M. (2011). Autism and

ADHD: How far have we come in the comorbidity debate? Neuroscience and Biobehavioral

Reviews. 35 (5). p.pp. 1081–1088.

Gelman, A. & Hill, J. (2006). Data analysis using regression and multilevel/hierarchical models. Cambridge

University Press.

Geschwind, D.H. & Levitt, P. (2007). Autism spectrum disorders: developmental disconnection

syndromes. Current opinion in neurobiology. 17 (1). p.pp. 103–11.

Glenn, O.A. (2010). MR imaging of the fetal brain. Pediatric Radiology. 40 (1). p.pp. 68–81.

Gotham, K., Pickles, A. & Lord, C. (2009). Standardizing ADOS Scores for a Measure of Severity

in Autism Spectrum Disorders. J Autism Dev Disord. 39 (5). p.pp. 693–705.

Gray, R.M. & M., R. (1990). Entropy and information theory. Springer-Verlag.

Greve, D.N. (2011). An Absolute Beginner’s Guide to Surface- and Voxel-based Morphometric

Analysis. Proceedings of the International Society for Magnetic Resonance in Medicine. i. p.pp. 1–7.

Groen, W., Teluij, M., Buitelaar, J. & Tendolkar, I. (2010). Amygdala and Hippocampus

Enlargement During Adolescence in Autism. Journal of the American Academy of Child & Adolescent

Psychiatry. 49 (6). p.pp. 552–560.

Gronenschild, E.H.B.M., Habets, P., Jacobs, H.I.L., Mengelers, R., Rozendaal, N., van Os, J. &

Marcelis, M. (2012). The effects of FreeSurfer version, workstation type, and Macintosh

operating system version on anatomical volume and cortical thickness measurements. S.

Hayasaka (ed.). PloS one. 7 (6). p.p. e38234.

Gupta, A. (2015). Unsupervised Learning of Disease Subtypes from Continuous Time Hidden

Markov Models of Disease Progression. PhD Thesis. (December).

Ha, S., Sohn, I.-J., Kim, N., Sim, H.J. & Cheon, K.-A. (2015). Characteristics of Brains in Autism

134
Spectrum Disorder: Structure , Function and Connectivity across the Lifespan. Experimental

Neurobiology. 24 (4). p.pp. 273–284.

Haar, S., Berman, S., Behrmann, M. & Dinstein, I. (2014). Anatomical Abnormalities in Autism?

Cerebral cortex. p.pp. 1–13.

Hampel, H., Frank, R., Broich, K., Teipel, S.J., Katz, R.G., Hardy, J., Herholz, K., Bokde,

A.L.W., Jessen, F., Hoessler, Y.C., Sanhai, W.R., Zetterberg, H., Woodcock, J. & Blennow,

K. (2010). Biomarkers for Alzheimer’s disease: academic, industry and regulatory

perspectives. Nature reviews. Drug discovery. 9 (7). p.pp. 560–574.

Hansen, T.I., Brezova, V., Eikenes, L., Håberg, A. & Vangberg, X.T.R. (2015). How does the

accuracy of intracranial volume measurements affect normalized brain volumes? sample size

estimates based on 966 subjects from the HUNT MRI cohort. American Journal of Neuroradiology.

36 (8). p.pp. 1450–1456.

Happé, F., Ronald, A. & Plomin, R. (2006). Time to give up on a single explanation for autism.

Nature neuroscience. 9 (10). p.pp. 1218–1220.

Hazlett, H.C., Gu, H., Munsell, B.C., Kim, S.H., Styner, M., Wolff, J.J., Elison, J.T., Swanson,

M.R., Zhu, H., Botteron, K.N., Collins, D.L., Constantino, J.N., Dager, S.R., Estes, A.M.,

Evans, A.C., Fonov, V.S., Gerig, G., Kostopoulos, P., Mckinstry, R.C., Pandey, J., Paterson,

S., Pruett Jr, J.R., Schultz, R.T., Shaw, D.W., Zwaigenbaum, L. & Piven, J. (2017). Early

brain development in infants at high risk for autism spectrum disorder. Nature Publishing Group.

542 (7641). p.pp. 348–351.

Hazlett, H.C., Poe, M., Gerig, G., Styner, M., Chappell, C., Smith, R.G., Vachet, C. & Piven, J.

(2011). Early Brain Overgrowth in Autism Associated with an Increase in Cortical Surface

Area Before Age 2 years. Archives of General Psychiatry. 68 (5). p.pp. 467–476.

Hazlett, H.C., Poe, M.D., Lightbody, A.A., Styner, M., MacFall, J.R., Reiss, A.L. & Piven, J.

135
(2012). Trajectories of early brain volume development in fragile X syndrome and autism.

Journal of the American Academy of Child and Adolescent Psychiatry. 51 (9). p.pp. 921–933.

Hornak, J.P. (1996). The basics of MRI.

Im, K., Lee, J.-M., Lee, J., Shin, Y.-W., Kim, I.Y., Kwon, J.S. & Kim, S.I. (2006). Gender

difference analysis of cortical thickness in healthy young adults with surface-based methods.

NeuroImage. 31 (1). p.pp. 31–38.

Jenkinson, M. & Smith, S. (2001). A global optimisation method for robust affine registration of

brain images. Medical image analysis. 5 (2). p.pp. 143–56.

Jeste, S.S. & Geschwind, D.H. (2014). Disentangling the heterogeneity of autism spectrum disorder

through genetic findings. Nat Rev Neurol. 10 (2). p.pp. 74–81.

Jiao, Y. (2011). Predictive models of autism spectrum disorder based on brain regional cortical

thickness. Autism. 50 (2). p.pp. 589–599.

Jiao, Y., Chen, R., Ke, X., Chu, K., Lu, Z. & Herskovits, E.H. (2010). Predictive models of autism

spectrum disorder based on brain regional cortical thickness. NeuroImage. 50 (2). p.pp. 589–

599.

Jumah, F., Ghannam, M., Jaber, M., Adeeb, N. & Tubbs, R.S. (2016). Neuroanatomical variation

in autism spectrum disorder: A comprehensive review. Clinical Anatomy. 29 (4). p.pp. 454–465.

Kanungo, T., Mount, D.M., Netanyahu, N.S., Piatko, C.D., Silverman, R. & Wu, A.Y. (2002). An

efficient k-means clustering algorithm: analysis and implementation. IEEE Transactions on

Pattern Analysis and Machine Intelligence. 24 (7). p.pp. 881–892.

Katuwal, G., Baum, S. & Michael, A. (2016a). Zernike moments as shape descriptors of subcortical

structures for autism classification. In: Organization for Human Brain Mapping. 2016, Geneva.

Katuwal, G.J., Baum, S.A., Cahill, N.D., Dougherty, C.C., Evans, E., Evans, D.W., Moore, G.J.

& Michael, A.M. (2016b). Inter-method discrepancies in brain volume estimation may drive

136
inconsistent findings in autism. Frontiers in Neuroscience. 10 (SEP). p.p. 439.

Katuwal, G.J., Baum, S.A., Cahill, N.D. & Michael, A.M. (2016c). Divide and Conquer: Sub-

Grouping of ASD Improves ASD Detection Based on Brain Morphometry. PloS one. 11 (4).

p.p. e0153331.

Katuwal, G.J., Cahill, N.D., Baum, S.A., Dougherty, C.C., Evans, D.W., Moore, G.J. & Michael,

A.M. (2015a). Inter-method Inconsistencies and Inter-site Variability of Brain Volume in

Autism. In: Organization for Human Brain Mapping. 2015, p. 10.13140/RG.2.1.3449.2249.

Katuwal, G.J., Cahill, N.D., Baum, S.A. & Michael, A.M. (2015b). The Predictive Power of

Structural MRI in Autism Diagnosis. In: Engineering in Medicine and Biology Society (EMBC), 2015

37th Annual International Conference of the IEEE. 2015, pp. 4270–4273.

Katuwal, G.J. & Chen, R. (2016). Machine Learning Model Interpretability for Precision Medicine.

Kim, J., Kwon, J., Kim, M., Do, J., Lee, D. & Han, H. (2016). A small number of abnormal brain

connections predicts adult autism spectrum disorder. Nature Communications.

Kim, M., Wu, G. & Shen, D. (2013). Unsupervised deep learning for hippocampus segmentation in 7.0 Tesla

MR images.

Kinney, D.K., Munir, K.M., Crowleya, D.J. & Miller, A.M. (2009). Prenatal Stress and Risk for

Autism. Neuroscience Biobehavioral Reviews. 32 (8). p.pp. 1519–1532.

Klein, A. & Tourville, J. (2012). 101 Labeled Brain Images and a Consistent Human Cortical

Labeling Protocol. Frontiers in Neuroscience. 6 (DEC). p.pp. 1–12.

Kober, J., Bagnell, J.A. & Peters, J. (2013). Reinforcement learning in robotics: A survey. The

International Journal of Robotics Research. 32 (11). p.pp. 1238–1274.

Kolevzon, A., Gross, R. & Reichenberg, A. (2007). Prenatal and Perinatal Risk Factors for Autism.

Archives of Pediatrics & Adolescent Medicine. 161 (4). p.p. 326.

van Kooten, I.A.J., Palmen, S.J.M.C., von Cappeln, P., Steinbusch, H.W.M., Korr, H., Heinsen,

137
H., Hof, P.R., van Engeland, H. & Schmitz, C. (2008). Neurons in the fusiform gyrus are

fewer and smaller in autism. Brain. 131 (4). p.pp. 987–999.

Kucharsky Hiess, R., Alter, R., Sojoudi, S., Ardekani, B.A., Kuzniecky, R. & Pardoe, H.R. (2015).

Corpus Callosum Area and Brain Volume in Autism Spectrum Disorder: Quantitative

Analysis of Structural MRI from the ABIDE Database. Journal of Autism and Developmental

Disorders. 45 (10). p.pp. 3107–3114.

Kühn, S., Schubert, F. & Gallinat, J. (2010). Reduced Thickness of Medial Orbitofrontal Cortex

in Smokers. Biological Psychiatry. 68 (11). p.pp. 1061–1065.

Kuznetsova, A., Brockhoff, P.B. & Christensen, R.H.B. (2013). lmerTest: Tests for random and

fixed effects for linear mixed effect models (lmer objects of lme4 package). R package version. 2

(6).

Lai, M. (2015). Deep Learning for Medical Image Segmentation. arXiv preprint arXiv:1505.02000.

Lai, M.-C., Lombardo, M. V, Suckling, J., Ruigrok, A.N. V, Chakrabarti, B., Ecker, C., Deoni,

S.C.L., Craig, M.C., Murphy, D.G.M., Bullmore, E.T. & Baron-Cohen, S. (2013). Biological

sex affects the neurobiology of autism. Brain : a journal of neurology. 136 (Pt 9). p.pp. 2799–815.

Lazar, M., Miles, L.M., Babb, J.S. & Donaldson, J.B. (2014). Axonal deficits in young adults with

High Functioning Autism and their impact on processing speed. NeuroImage: Clinical. 4. p.pp.

417–425.

LeCun, Y., Bengio, Y. & Hinton, G. (2015). Deep learning. Nature. 521 (7553). p.pp. 436–444.

Leigh, J.P. & Du, J. (2015). Brief Report: Forecasting the Economic Burden of Autism in 2015 and

2025 in the United States. Journal of Autism and Developmental Disorders. 45 (12). p.pp. 4135–

4139.

Lenroot, R.K. & Yeung, P.K. (2013). Heterogeneity within Autism Spectrum Disorders: What

have We Learned from Neuroimaging Studies? Frontiers in human neuroscience. 7 (October). p.p.

138
733.

Lin, H.-Y., Ni, H.-C., Lai, M.-C., Tseng, W.-Y.I. & Gau, S.S.-F. (2015). Regional brain volume

differences between males with and without autism spectrum disorder are highly age-

dependent. Molecular Autism. 6 (1). p.p. 29.

Louppe, G. (2015). Understanding Random Forests from Theory to Practice. Univeristy of Liege.

Lyall, K., Munger, K.L., O’Reilly, É.J., Santangelo, S.L. & Ascherio, A. (2013). Maternal dietary

fat intake in association with autism spectrum disorders. American Journal of Epidemiology. 178

(2). p.pp. 209–220.

Mandal, P.K., Mahajan, R. & Dinov, I.D. (2012). Structural brain atlases: Design, rationale, and

applications in normal and pathological cohorts. Journal of Alzheimer’s Disease. 31 (Suppl 3).

p.pp. S169-88.

Di Martino, A., Yan, C.-G., Li, Q., Denio, E., Castellanos, F.X., Alaerts, K., Anderson, J.S., Assaf,

M., Bookheimer, S.Y., Dapretto, M., Deen, B., Delmonte, S., Dinstein, I., Ertl-Wagner, B.,

Fair, D.A., Gallagher, L., Kennedy, D.P., Keown, C.L., Keysers, C., Lainhart, J.E., Lord, C.,

Luna, B., Menon, V., Minshew, N.J., Monk, C.S., Mueller, S., Müller, R.-A., Nebel, M.B.,

Nigg, J.T., O’Hearn, K., Pelphrey, K.A., Peltier, S.J., Rudie, J.D., Sunaert, S., Thioux, M.,

Tyszka, J.M., Uddin, L.Q., Verhoeven, J.S., Wenderoth, N., Wiggins, J.L., Mostofsky, S.H.

& Milham, M.P. (2014). The autism brain imaging data exchange: towards a large-scale

evaluation of the intrinsic brain architecture in autism. Molecular psychiatry. 19 (6). p.pp. 659–

67.

Massoud, T.F. & Gambhir, S.S. (2003). Molecular imaging in living subjects: seeing fundamental

biological processes in a new light. Genes & development. 17 (5). p.pp. 545–80.

Matson, J.L., Rieske, R.D. & Williams, L.W. (2013). The relationship between autism spectrum

disorders and attention-deficit/hyperactivity disorder: An overview. Research in Developmental

139
Disabilities. 34 (9). p.pp. 2475–2484.

Matsuda, H., Mizumura, S., Nemoto, K., Yamashita, F., Imabayashi, E., Sato, N. & Asada, T.

(2012). Automatic voxel-based morphometry of structural MRI by SPM8 plus diffeomorphic

anatomic registration through exponentiated lie algebra improves the diagnosis of probable

Alzheimer Disease. AJNR. American journal of neuroradiology. 33 (6). p.pp. 1109–14.

May, A. (2011). Experience-dependent structural plasticity in the adult human brain. Trends in

Cognitive Sciences. 15 (10). p.pp. 475–482.

Mazziotta, J., Toga, A., Evans, A., Fox, P., Lancaster, J., Zilles, K., Woods, R., Paus, T., Simpson,

G., Pike, B., Holmes, C., Collins, L., Thompson, P., MacDonald, D., Iacoboni, M.,

Schormann, T., Amunts, K., Palomero-Gallagher, N., Geyer, S., Parsons, L., Narr, K.,

Kabani, N., Le Goualher, G., Feidler, J., Smith, K., Boomsma, D., Pol, H.H., Cannon, T.,

Kawashima, R. & Mazoyer, B. (2001a). A Four-Dimensional Probabilistic Atlas of the

Human Brain. Journal of the American Medical Informatics Association. 8 (5). p.pp. 401–430.

Mazziotta, J., Toga, a, Evans, a, Fox, P., Lancaster, J., Zilles, K., Woods, R., Paus, T., Simpson,

G., Pike, B., Holmes, C., Collins, L., Thompson, P., MacDonald, D., Iacoboni, M.,

Schormann, T., Amunts, K., Palomero-Gallagher, N., Geyer, S., Parsons, L., Narr, K.,

Kabani, N., Le Goualher, G., Boomsma, D., Cannon, T., Kawashima, R. & Mazoyer, B.

(2001b). A probabilistic atlas and reference system for the human brain: International

Consortium for Brain Mapping (ICBM). Philosophical transactions of the Royal Society of London.

Series B, Biological sciences. 356. p.pp. 1293–1322.

Mazziotta, J.C., Toga, a W., Evans, a, Fox, P. & Lancaster, J. (1995). A probabilistic atlas of the

human brain: theory and rationale for its development. The International Consortium for

Brain Mapping (ICBM). NeuroImage. 2 p.pp. 89–101.

Michael, A.M. (2009). Imaging schizophrenia: data fusion approaches to characterize and classify. Rochester

140
Institute of Technology.

Michael, A.M., Baum, S.A., White, T., Demirci, O., Andreasen, N.C., Segall, J.M., Jung, R.E.,

Pearlson, G., Clark, V.P., Gollub, R.L., Schulz, S.C., Roffman, J.L., Lim, K.O., Ho, B.C.,

Bockholt, H.J. & Calhoun, V.D. (2010). Does function follow form?: Methods to fuse

structural and functional brain images show decreased linkage in schizophrenia. NeuroImage.

49 (3). p.pp. 2626–2637.

Michael, A.M., King, M.D., Ehrlich, S., Pearlson, G., White, T., Holt, D.J., Andreasen, N.C.,

Sakoglu, U., Ho, B.-C., Schulz, S.C. & Calhoun, V.D. (2011). A Data-Driven Investigation

of Gray Matter-Function Correlations in Schizophrenia during a Working Memory Task.

Frontiers in human neuroscience. 5 (July). p.p. 71.

Mietchen, D. & Gaser, C. (2009). Computational morphometry for detecting changes in brain

structure due to development, aging, learning, disease and evolution. Frontiers in

neuroinformatics. 3 (August). p.p. 25.

Morey, R.A., Petty, C.M., Xu, Y., Hayes, J.P., Wagner, H.R., Lewis, D. V, LaBar, K.S., Styner,

M. & McCarthy, G. (2009). A comparison of automated segmentation and manual tracing

for quantifying hippocampal and amygdala volumes. NeuroImage. 45 (3). p.pp. 855–66.

Muhlert, N. & Ridgway, G.R. (2015). Failed replications, contributing factors and careful

interpretations: Commentary on “A purely confirmatory replication study of structural brain-

behaviour correlations” by Boekel etal., 2015. Cortex. 4. p.pp. 4–8.

Natekin, A. & Knoll, A. (2013). Gradient boosting machines, a tutorial. Frontiers in Neurorobotics. 7

(DEC).

Nelder, J.A., Baker, R.J., Nelder, J.A. & Baker, R.J. (2006). Generalized Linear Models. In:

Encyclopedia of Statistical Sciences. Hoboken, NJ, USA: John Wiley & Sons, Inc.

Niebles, J.C., Wang, H. & Fei-Fei, L. (2008). Unsupervised Learning of Human Action Categories

141
Using Spatial-Temporal Words. International Journal of Computer Vision. 79 (3). p.pp. 299–318.

Nishida, S., Narumoto, J., Sakai, Y., Matsuoka, T., Nakamae, T., Yamada, K., Nishimura, T. &

Fukui, K. (2011). Anterior insular volume is larger in patients with obsessive–compulsive

disorder. Progress in Neuro-Psychopharmacology and Biological Psychiatry. 35 (4). p.pp. 997–1001.

Nordahl, C.W., Dierker, D., Mostafavi, I., Schumann, C.M., Rivera, S.M., Amaral, D.G. & Essen,

D.C. Van (2007). Cortical Folding Abnormalities in Autism Revealed by Surface-Based

Morphometry. The Journal of Neuroscience. 27 (43). p.pp. 11725–11735.

Nordahl, C.W., Scholz, R., Yang, X., Buonocore, M.H., Simon, T., Rogers, S. & Amaral, D.G.

(2012). Increased Rate of Amygdala Growth in Children Aged 2 to 4 Years With Autism Spectrum Disorders.

69 (1). p.pp. 53–61.

Nordenskjöld, R., Malmberg, F., Larsson, E.-M., Simmons, A., Brooks, S.J., Lind, L., Ahlström,

H., Johansson, L. & Kullberg, J. (2013). Intracranial volume estimated with commonly used

methods could introduce bias in studies including brain volume measurements. NeuroImage.

83. p.pp. 355–60.

O’Brien, J.T. (2007). Role of imaging techniques in the diagnosis of dementia. The British Journal of

Radiology. 80 (special_issue_2). p.pp. S71–S77.

Padilla, N., Eklöf, E., Mårtensson, G.E., Bölte, S., Lagercrantz, H. & Ådén, U. (2015). Poor Brain

Growth in Extremely Preterm Neonates Long Before the Onset of Autism Spectrum Disorder

Symptoms. Cerebral Cortex. (February 2016). p.pp. 1–8.

Pedregosa, F. & Varoquaux, G. (2011). Scikit-learn: Machine Learning in Python. Journal of

Machine Learning Research. 12. p.pp. 2825–2830.

Peters, J.M., Sahin, M., Vogel-Farley, V.K., Jeste, S.S., Nelson, C.A., Gregas, M.C., Prabhu, S.P.,

Scherrer, B. & Warfield, S.K. (2012). Loss of White Matter Microstructural Integrity Is

Associated with Adverse Neurological Outcome in Tuberous Sclerosis Complex. Academic

142
Radiology. 19 (1). p.pp. 17–25.

Pichler, B.J., Kolb, A., Nägele, T. & Schlemmer, H.-P. (2010). PET/MRI: paving the way for the

next generation of clinical multimodality imaging applications. Journal of nuclear medicine : official

publication, Society of Nuclear Medicine. 51 (3). p.pp. 333–336.

Pickles, A., Le Couteur, A., Leadbitter, K., Salomone, E., Cole-Fletcher, R., Tobin, H., Gammer,

I., Lowry, J., Vamvakas, G., Byford, S., Aldred, C., Slonims, V., McConachie, H., Howlin,

P., Parr, J.R., Charman, T. & Green, J. (2016). Parent-mediated social communication

therapy for young children with autism (PACT): long-term follow-up of a randomised

controlled trial. The Lancet. 388 (10059). p.pp. 2501–2509.

Pooley, R.A. (2005). Fundamental Physics of MR Imaging. RadioGraphics. 25 (4). p.pp. 1087–1099.

Pringle, B., Colpe, L.J., Blumberg, S.J., Avila, R.M. & Kogan, M.D. (2012). Diagnostic history and

treatment of school-aged children with autism spectrum disorder and special health care

needs. NCHS Data Brief. (97). p.pp. 1–8.

Puterman, M.L. & L., M. (1994). Markov decision processes : discrete stochastic dynamic programming. Wiley.

Rabiner, L.R. & Rabiner, L.R. (1989). A tutorial on hidden Markov models and selected

applications in speech recognition. PROCEEDINGS OF THE IEEE. 77 (2). p.pp. 257--286.

Radua, J., Via, E., Catani, M. & Mataix-Cols, D. (2011). Voxel-based meta-analysis of regional

white-matter volume differences in autism spectrum disorder versus healthy controls.

Psychological medicine. 41 (7). p.pp. 1539–1550.

Raichle, M.E. (1998). Behind the scenes of functional brain imaging: A historical and physiological

perspective. Proceedings of the National Academy of Sciences. 95 (3). p.pp. 765–772.

Rajagopalan, V., Yue, G.H. & Pioro, E.P. (2014). Do preprocessing algorithms and statistical

models influence voxel-based morphometry (VBM) results in amyotrophic lateral sclerosis

patients? A systematic comparison of popular VBM analytical methods. Journal of Magnetic

143
Resonance Imaging. 40 (3). p.pp. 662–667.

Ranzato, M., Huang, F.J., Boureau, Y.-L. & LeCun, Y. (2007). Unsupervised Learning of

Invariant Feature Hierarchies with Applications to Object Recognition. In: 2007 IEEE

Conference on Computer Vision and Pattern Recognition. June 2007, IEEE, pp. 1–8.

Rather, A.M., Sastry, V.N. & Agarwal, A. (2017). Stock market prediction and Portfolio selection

models: a survey. OPSEARCH. p.pp. 1–22.

Raznahan, A., Wallace, G.L., Antezana, L., Greenstein, D., Lenroot, R., Thurm, A., Gozzi, M.,

Spence, S., Martin, A., Swedo, S.E. & Giedd, J.N. (2013). Compared to what? Early brain

overgrowth in autism and the perils of population norms. Biological Psychiatry. 74 (8). p.pp.

563–575.

Redcay, E. & Courchesne, E. (2005). When is the brain enlarged in autism? A meta-analysis of all

brain size reports. Biological Psychiatry. 58 (1). p.pp. 1–9.

Ren, J., Sneller, B., Rueckert, D., Hajnal, J., Heckemann, R., Smith, S., Vickers, J. & Hill, D.

(2005). A comparison of the tissue classification and the segmentation propagation techniques

in MRI brain image segmentation Jinsong J. M. Fitzpatrick & J. M. Reinhardt (eds.).

Proceedings of SPIE. 5747. p.pp. 1682–1691.

Retico, A., Gori, I., Giuliano, A., Muratori, F. & Calderoni, S. (2016). One-class support vector

machines identify the language and default mode regions as common patterns of structural

alterations in young children with autism spectrum disorders. Frontiers in Neuroscience. 10 (JUN).

Riddle, K., Cascio, C.J. & Woodward, N.D. (2016). Brain structure in autism: a voxel-based

morphometry analysis of the Autism Brain Imaging Database Exchange (ABIDE). Brain

Imaging and Behavior.

Ridgway, G., Barnes, J., Pepple, T. & Fox, N. (2011). Estimation of total intracranial volume; a

comparison of methods. Alzheimer’s & Dementia. 7 (4). p.pp. S62–S63.

144
Rogers, S.J. & Vismara, L. a (2010). Evidence-Based Comprehensive Treatments for Early Autism.

Journal of Clinical Child and Adolescent Psychology. 37 (1). p.pp. 8–38.

Ronald, A., Happe, F., Bolton, P., Butcher, L.M., Price, T.S., Wheelwright, S., Baron-Cohen, S.

& Plomin, R. (2006). Genetic heterogeneity between the three components of the autism

spectrum: a twin study. Journal of the American Academy of Child & Adolescent Psychiatry. 45 (6).

p.pp. 691–699.

Sabuncu, M.R. & Konukoglu, E. (2014). Clinical Prediction from Structural Brain MRI Scans: A

Large-Scale Empirical Study. Neuroinformatics. 13 (1). p.pp. 31–46.

Sandin, S., Lichtenstein, P., Larsson, H., Cm, H. & Reichenberg, A. (2014). The familial risk of

autism. 311 (17). p.p. 24794370.

Sandin, S., Schendel, D., Magnusson, P., Hultman, C., Surén, P., Susser, E., Grønborg, T.,

Gissler, M., Gunnes, N., Gross, R., Henning, M., Bresnahan, M., Sourander, a, Hornig, M.,

Carter, K., Francis, R., Parner, E., Leonard, H., Rosanoff, M., Stoltenberg, C. &

Reichenberg, a (2015). Autism risk associated with parental age and with increasing

difference in age between the parents. Molecular psychiatry. (April). p.pp. 1–8.

Satterthwaite, F.E. (1946). An approximate distribution of estimates of variance components.

Biometrics. 2 (6) p.pp. 110–114.

Schaer, M., Bach Cuadra, M., Tamarit, L., Lazeyras, F., Eliez, S. & Thiran, J.P. (2008). A Surface-

based approach to quantify local cortical gyrification. IEEE Transactions on Medical Imaging. 27

(2). p.pp. 161–170.

Schmidt, R.J., Tancredi, D.J., Krakowiak, P., Hansen, R.L. & Ozonoff, S. (2014). Maternal intake

of supplemental iron and risk of autism spectrum disorder. American Journal of Epidemiology. 180

(9). p.pp. 890–900.

Schumann, C.M., Bloss, C.S., Barnes, C.C., Wideman, G.M., Carper, R.A., Akshoomoff, N.,

145
Pierce, K., Hagler, D., Schork, N., Lord, C. & Courchesne, E. (2010). Longitudinal magnetic

resonance imaging study of cortical development through early childhood in autism. The

Journal of neuroscience : the official journal of the Society for Neuroscience. 30 (12). p.pp. 4419–27.

Ségonne, F., Dale, a M., Busa, E., Glessner, M., Salat, D., Hahn, H.K. & Fischl, B. (2004). A

hybrid approach to the skull stripping problem in MRI. NeuroImage. 22 (3). p.pp. 1060–75.

Shen, M.D., Nordahl, C.W., Young, G.S., Wootton-Gorges, S.L., Lee, A., Liston, S.E.,

Harrington, K.R., Ozonoff, S. & Amaral, D.G. (2013). Early brain enlargement and elevated

extra-axial fluid in infants who develop autism spectrum disorder. Brain. 136 (9). p.pp. 2825–

2835.

Silver, D., Huang, A., Maddison, C.J., Guez, A., Sifre, L., van den Driessche, G., Schrittwieser, J.,

Antonoglou, I., Panneershelvam, V., Lanctot, M., Dieleman, S., Grewe, D., Nham, J.,

Kalchbrenner, N., Sutskever, I., Lillicrap, T., Leach, M., Kavukcuoglu, K., Graepel, T. &

Hassabis, D. (2016). Mastering the game of Go with deep neural networks and tree search.

Nature. 529 (7587). p.pp. 484–489.

Smith, S.M. (2002). Fast robust automated brain extraction. Human brain mapping. 17 (3). p.pp. 143–

55.

Smith, S.M., Zhang, Y., Jenkinson, M., Chen, J., Matthews, P.M., Federico, A. & De Stefano, N.

(2002). Accurate, Robust, and Automated Longitudinal and Cross-Sectional Brain Change

Analysis. NeuroImage. 17 (1). p.pp. 479–489.

Sordo, M. & Zeng, Q. (2005). On Sample Size and Classification Accuracy: A Performance

Comparison. Lect Notes Comput Sc. 3745. p.pp. 193–201.

Stanfield, A.C., McIntosh, A.M., Spencer, M.D., Philip, R., Gaur, S. & Lawrie, S.M. (2008).

Towards a neuroanatomy of autism: a systematic review and meta-analysis of structural

magnetic resonance imaging studies. European psychiatry : the journal of the Association of European

146
Psychiatrists. 23 (4). p.pp. 289–99.

Strobl, C., Malley, J. & Gerhard, T. (2009). An Introduction to Recursive Partitioning: Rationale,

Application Psychol Methods. Psychological methods. 14 (4). p.pp. 323–348.

Styner, M.A., Charles, H.C., Park, J. & Gerig, G. (2002). Multi-site validation of image analysis

methods - Assessing intra and inter-site variability. In: SPIE. 2002, San Diego, pp. 1–9.

Takao, H., Hayashi, N. & Ohtomo, K. (2011). Effect of scanner in longitudinal studies of brain

volume changes. Journal of magnetic resonance imaging : JMRI. 34 (2). p.pp. 438–44.

Talairach, J. & Tournoux, P. (1988). Co-planar stereotaxic atlas of the human brain. 3-Dimensional

proportional system: an approach to cerebral imaging. New York: Thieme.

Tallinen, T., Chung, J.Y., Biggins, J.S. & Mahadevan, L. (2014). Gyrification from constrained

cortical expansion. Proceedings of the National Academy of Sciences of the United States of America. 111

(35). p.pp. 12667–72.

Tallinen, T., Chung, J.Y., Rousseau, F., Girard, N., Lefèvre, J. & Mahadevan, L. (2016). On the

growth and form of cortical convolutions. Nature Physics. 476 (7358). p.pp. 57–62.

Tanga, Y., Hojatkashanib, C., Dinovb, I.D., Suna, B., Fana, L., Lina, X., Qic, H., Huab, X., Liua,

S. & Toga, A.W. (2010). A morphometric comparison study between Chinese and Caucasian

cohorts. NeuroImage. 51 (1). p.pp. 33–41.

Tondelli, M., Wilcock, G.K., Nichelli, P., de Jager, C.A., Jenkinson, M. & Zamboni, G. (2012).

Structural MRI changes detectable up to ten years before clinical Alzheimer’s disease.

Neurobiology of Aging. 33 (4). p.p. 825.e25-825.e36.

Townsend, D.W. & Cherry, S.R. (2001). Combining anatomy and function: The path to true

image fusion. European Radiology. 11 (10). p.pp. 1968–1974.

Tsang, O., Gholipour, A., Kehtarnavaz, N., Member, S., Gopinath, K., Briggs, R. & Panahi, I.

(2008). Comparison of tissue segmentation algorithms in neuroimage analysis software tools.

147
In: IEEE-EMBS International Conference on Biomedical and Health Informatics (BHI). 2008, pp.

3924–3928.

Uddin, L.Q., Menon, V., Young, C.B., Ryali, S., Chen, T., Khouzam, A., Minshew, N.J. &

Hardan, A.Y. (2011). Multivariate searchlight classification of structural magnetic resonance

imaging in children and adolescents with autism. Biological Psychiatry. 70. p.pp. 833–841.

Valk, S.L., Di Martino, A., Milham, M.P. & Bernhardt, B.C. (2015). Multicenter mapping of

structural network alterations in autism. Human Brain Mapping. 36 (6). p.pp. 2364–2373.

VanEssen, D.C. & Drury, H. a (1997). Structural and functional analyses of human cerebral cortex

using a surface-based atlas. Journal of Neuroscience. 17 (18). p.pp. 7079–7102.

Via, E., Radua, J., Cardoner, N., Happé, F. & Mataix-Cols, D. (2011). Meta-analysis of gray

matter abnormalities in autism spectrum disorder: should Asperger disorder be subsumed

under a broader umbrella of autistic spectrum disorder? Archives of general psychiatry. 68 (4). p.pp.

409–418.

Vogelstein, J.T., Park, Y., Ohyama, T., Kerr, R.A., Truman, J.W., Priebe, C.E. & Zlatic, M.

(2014). Discovery of Brainwide Neural-Behavioral Maps via Multiscale Unsupervised

Structure Learning. Science. 344 (6182). p.pp. 386--392.

Volkmar, F.R., State, M. & Klin, A. (2009). Autism and autism spectrum disorders: Diagnostic

issues for the coming decade. Journal of Child Psychology and Psychiatry and Allied Disciplines. 50 (1–

2). p.pp. 108–115.

Walther, K., Birdsill, A.C., Glisky, E.L. & Ryan, L. (2009). Structural brain differences and

cognitive functioning related to body mass index in older females. Human Brain Mapping. 31

(7). p.pp. 1052–1064.

Wee, C.-Y., Wang, L., Shi, F., Yap, P.-T. & Shen, D. (2014). Diagnosis of autism spectrum

disorders using regional and interregional morphological features. Human brain mapping. 35 (7).

148
p.pp. 3414–30.

Wolff, J.J., Gerig, G., Lewis, J.D., Soda, T., Styner, M.A., Vachet, C., Botteron, K.N., Elison, J.T.,

Dager, S.R., Estes, A.M., Hazlett, H.C., Schultz, R.T., Zwaigenbaum, L. & Piven, J. (2015).

Altered corpus callosum morphology associated with autism over the first 2 years of life. Brain.

138 (7). p.pp. 2046–2058.

Xiao, X., Fang, H., Wu, J., Xiao, C., Xiao, T., Qian, L., Liang, F., Xiao, Z., Chu, K.K. & Ke, X.

(2016). Diagnostic model generated by MRI-derived brain features in toddlers with autism

spectrum disorder. Autism Research. p.pp. 1–11.

Zablotsky, B., Black, L.I., Maenner, M.J., Schieve, L.A. & Blumberg, S.J. (2015). Estimated

Prevalence of Autism and Other Developmental Disabilities Following Questionnaire

Changes in the 2014 National Health Interview Survey. National Health Statistics Reports. (87).

p.pp. 1–21.

Zarei, M., Patenaude, B., Damoiseaux, J., Morgese, C., Smith, S., Matthews, P.M., Barkhof, F.,

Rombouts, S., Sanz-Arigita, E. & Jenkinson, M. (2010). Combining shape and connectivity

analysis: An MRI study of thalamic degeneration in Alzheimer’s disease. NeuroImage. 49 (1).

p.pp. 1–8.

Zhang, Y., Brady, M. & Smith, S. (2001). Segmentation of brain MR images through a hidden

Markov random field model and the expectation-maximization algorithm. IEEE transactions

on medical imaging. 20 (1). p.pp. 45–57.

Zinkstok, J., Kolind, S., D’Almeida, V., Shahidiani, A., Williams, S.C., Murphy, D.G. & Deoni,

S.C. (2012). Is Myelin Content Altered In Young Adults with Autism? In: INSAR. 2012,

Toronto.

Zwaigenbaum, L., Young, G.S., Stone, W.L., Dobkins, K., Ozonoff, S., Brian, J., Bryson, S.E.,

Carver, L.J., Hutman, T., Iverson, J.M., Landa, R.J. & Messinger, D. (2014). Early Head

149
Growth in Infants at Risk of Autism: A Baby Siblings Research Consortium Study. Journal of

American Academy of Child & Adolescent Psychiatry. 53 (10). p.pp. 1053–62.

150
7 Appendix A

Table 7.1: Estimated brain volumes and inter-method differences in NYU


TIV GM WM CSF
SPM (L) 1.517 ± 0.16 0.730 ± 0.07 0.507 ± 0.06 0.280 ± 0.04
SPM – FSL mean diff. (ml) 197.4 65.0 15.22 126.6
SPM
Correlation Coefficient 0.885 0.698 0.896 0.710
vs.
Cohen’s d 1.27 0.85 0.26 3.51
FSL
Paired t-test p-value 4E-74* 3E-29* 4E-11* <E-100*
FSL (L) 1.320 ± 0.15 0.665 ± 0.08 0.492 ± 0.06 0.154 ± 0.03
FSL – FS mean diff. (ml) -180.8 -41.6 11.8 NA
FSL
Correlation Coefficient 0.872 0.786 0.960 NA
vs.
Cohen’s d -1.12 -0.53 0.18 NA
FS
Paired t-test p-value 7E-64* 2E-54* 5E-9* NA
FS (L) 1.501 ± 0.17 0.706 ± 0.08 0.480 ± 0.07 NA
SPM – FS mean diff. (ml) 16.6 23.4 27.0 NA
SPM
Correlation Coefficient 0.877 0.960 0.930 NA
vs.
Cohen’s d 0.1 0.31 0.44 NA
FS
Paired t-test p-value 0.013* 9E-9* 1E-39* NA
Mean and standard deviation of the brain volumes estimated by SPM, FSL, and FS are presented. Cells
corresponding to CSFFS are filled as ‘NA’ since FS does not output total CSF volume. Inter-method
differences and corresponding statistics are presented in shaded cells. Correlation coefficient is used to
measure the association between the brain volumes estimated by two different methods. Cohen’s d is used
to measure the effect size of the inter-method difference and paired t-test was used to check the statistical
significance. Statistically significant differences are denoted by * for p < 0.05. Correlation coefficients show
that methods agree on volume estimates but the Cohen’s d and paired t-test p-values indicate significant
inter-method biases.

151
Table 7.2: ASD vs. TDC brain volume differences in NYU

TIV GM WM CSF
diff diff diff diff diff diff diff diff
p-val p-val p-val p-val
(ml) % (ml) % (ml) % (ml) %
Raw Brain Volumes
SPM 5.17 0.34 0.84 9.3 10.28 0.42 -4.4 -0.87 0.63 0.29 0.11 0.96
FSL -15.4 -1.16 0.52 1.7 0.25 0.90 -12.6 -2.54 0.20 -7.1 -4.52 0.14
FS -44.1 -2.9 0.10 5.4 0.77 0.66 -10.8 -2.22 0.31 NA NA NA
Adjusted for age and sex
SPM -0.81 -0.05 0.97 2.4 0.33 0.81 -4.89 -0.96 0.52 1.69 0.60 0.76
FSL -22.5 -1.70 0.28 -15.0 -2.26 0.16 -14.4 -2.89 0.10 -2.7 -1.73 0.36
FS -51.9 -3.42 0.03* -10.1 -1.44 0.34 -10.3 -2.12 0.24 NA NA NA
Adjusted for age, sex, and FIQ
SPM 10.9 0.72 0.61 8.1 1.12 0.41 -0.8 -0.15 0.92 3.6 1.28 0.52
FSL -8.9 -0.67 0.66 -9.5 -0.53 0.37 -9.3 -1.43 0.28 -1.3 -1.87 0.66
FS -37.0 -2.44 0.12 -3.6 -0.52 0.72 -5.2 -1.08 0.55 NA NA NA

Mean (ASD – TDC) difference (diff) in brain volume estimates according to SPM, FSL, and FS.
Percentage ASD vs. TDC group difference was calculated as 100*(𝐴𝑆𝐷 − 𝑇𝐷𝐶)/𝑇𝐷𝐶, where
𝑇𝐷𝐶 is the group mean of the TDC subjects. Cells corresponding to CSFFS are filled as ‘NA’ since
FS does not output total CSF volume. Statistically significant differences are denoted by * for
p<0.05. ASD vs. TDC differences are dependent upon the method used and only in SPM TIV,
GM and CSF volumes in ASD were significantly larger than TDC.

152
Table 7.3: Differential bias for diagnostic group (ASD) in NYU
TIV GM WM CSF
bias bias beta bias bias beta bias bias beta bias bias beta
(ml) % p-val (ml) % p-val (ml) % p-val (ml) % p-val
S
SPM
2 1.64 0.13 17 2.6 0.09 10 1.9 0.09 4.4 2.90 0.36
vs.
FSL#
F
FSL
21 1.96 0.09 -5 0.70 0.23 -4 0.85 0.30 NA NA NA
vs.
FS#
S
PM
51 3.4 0.001* 12.5 1.8 0.09 5.4 1.1 0.13 NA NA NA
vs.
FS#
# reference method

bias (ml): brain volume (in ml) by which a method overestimates in ASD subjects than in TDCs.
% bias: the percentage of brain volume by which a method overestimates in ASD subjects than in TDCs.

153
8 Appendix B

A) ASD/TDC subjects numbers matched by upsampling the smaller group in each training fold
AS VIQ age
0.9

0.8

0.7
0.76
AUC=0.74 60/373 0.69
AUC

0.6 #ASD/TDC=20/373
76/373 0.65 0.67
146/142 0.62 116/120 0.66
0.5
0.71 93/133 121/138
57/15

0.4 0.51
114/108
0.3
4 ≤ AS ≤ 5 6 ≤ AS ≤ 7 8 ≤ AS ≤ 10 75 ≤ VIQ ≤ 90 90 < VIQ < 115 115 ≤ VIQ ≤ 150 6 ≤ Age < 13 13 ≤ Age < 18 18 ≤ Age ≤ 40

B) ASD/TDC subjects numbers matched by downsampling the larger group


AS VIQ
1.0

0.9 age, VIQ matched age matched

0.8
AUC=0.92
#ASD/TDC=20/20
0.7 0.8
AUC

60/60
0.6 0.68 0.8
16/16
76/76 0.63
0.5 142/142
0.54
0.4 93/93

0.3
4 ≤ AS ≤ 5 6 ≤ AS ≤ 7 8 ≤ AS ≤ 10 75 ≤ VIQ ≤ 90 90 < VIQ < 115 115 ≤ VIQ ≤ 150

Figure 8.1: Gradient Boosting Machine: Improvement in classification by sub-grouping


based on autism severity (AS), age and Verbal IQ (VIQ).
The results are similar to that of Random Forest which are presented in Figure 1. A point represents the
mean and an error bar represents the one standard deviation of the AUC scores from 10 test folds. A)
Smaller classes were up-sampled in each training fold to balance the number of ASD & TDC subjects. Sub-
grouping improved the classification with the most and least improvements from sub-grouping by AS and
age respectively. B) Larger classes were down-sampled matching the demographics of the smaller classes.
This scheme further improved the classification performance.

154
A) AS
4 ≤ AS ≤ 5 6 ≤ AS ≤ 7 8 ≤ AS ≤ 10
1.13**Optic.Chiasm_volume 0.2 Right.Accumbens.area_volume 0.63**Right.choroid.plexus_volume
1.31**rh_medialorbitofrontal_area 0.73**lh_parahippocampal_volume 0.21 SubCortGrayVol_volume
1.65**lh_lateralorbitofrontal_area 0.78**CSF_volume 0.48**X3rd.Ventricle_volume
1.75**lh_fusiform_thicknessstd 0.78**X3rd.Ventricle_volume 0.63**rh_medialorbitofrontal_thicknessstd
1.31**rh_parsopercularis_thicknessstd 0.59**rh_posteriorcingulate_volume 0.47**rh_entorhinal_thickness
1.38**lh_bankssts_thicknessstd 0.45* rh_rostralanteriorcingulate_thickness 0.42* lh_parsorbitalis_thickness
1.2** rh_pericalcarine_thicknessstd 0.8** lh_fusiform_thicknessstd 0.32* rh_insula_thicknessstd
1.07**rh_parstriangularis_thicknessstd 0.62**rh_lateralorbitofrontal_thicknessstd 0.42* rh_medialorbitofrontal_thickness
1.17**lh_inferiortemporal_thickness 0.92**rh_inferiorparietal_meancurv 0.37* lh_posteriorcingulate_thicknessstd
1.13**rh_lateraloccipital_gauscurv 0.09 rh_caudalmiddlefrontal_gauscurv 0.57**lh_frontalpole_meancurv

B) VIQ
75 ≤ VIQ ≤ 90 90 < VIQ < 115 115 ≤ VIQ ≤ 150
1.04* rh_temporalpole_volume 0.14 rh_parahippocampal_volume 0.36* Right.vessel_volume
1.66**rh_parsopercularis_area 0.26* lh_insula_volume 0.32* Right.Hippocampus_volume
0.92* rh_parahippocampal_area 0.01 rh_parsorbitalis_volume 0.09 lh_isthmuscingulate_volume
1.03* lh_rostralanteriorcingulate_area 0.3* CSF_volume 0.25 lhCorticalWhiteMatterVol_volume
1.44**rh_medialorbitofrontal_area 0.15 rh_caudalmiddlefrontal_volume 0.14 lh_transversetemporal_thicknessstd
1.43**rh_bankssts_area 0.37**Right.Lateral.Ventricle_volume 0.5** lh_frontalpole_thicknessstd
1.45**lh_precentral_area 0.26* rh_entorhinal_thicknessstd 0.36* lh_superiortemporal_thicknessstd
1.48**rh_WhiteSurfArea_area 0.01 rh_pericalcarine_foldind 0.32* lh_frontalpole_gauscurv
0.13 lh_rostralanteriorcingulate_foldind 0.13 lh_supramarginal_foldind 0.02 rh_lateralorbitofrontal_gauscurv
0.19 rh_parsopercularis_foldind 0.22 lh_frontalpole_gauscurv 0.11 lh_pericalcarine_gauscurv

C) Age (years)
6 ≤ Age < 13 13 ≤ Age < 18 18 ≤ Age ≤ 40
0.1 Left.Cerebellum.White.Matter_volume0.35* CC_Mid_Anterior_volume 0.5** lh_transversetemporal_volume
0.15 Right.Accumbens.area_volume 0.48**CC_Central_volume 0.45**lh_inferiortemporal_volume
0.01 lh_superiortemporal_volume 0.39**X3rd.Ventricle_volume 0.49**rh_bankssts_volume
0.14 lh_transversetemporal_area 0.18 rh_bankssts_volume 0.29* Left.Inf.Lat.Vent_volume
0.4** rh_rostralanteriorcingulate_area 0.42**lh_lingual_thickness 0.32* Right.Hippocampus_volume
0.23 rh_insula_foldind 0.34* rh_lingual_thicknessstd 0.42**rh_superiortemporal_volume
0.2 lh_lateralorbitofrontal_foldind 0.29* lh_pericalcarine_thickness 0.21 Right.vessel_volume
0.29* lh_fusiform_foldind 0.26 lh_caudalanteriorcingulate_thickness 0.12 rh_parahippocampal_volume
0.28* lh_frontalpole_gauscurv 0.22 rh_precuneus_thicknessstd 0.17 lh_medialorbitofrontal_thicknessstd
0.18 lh_lateralorbitofrontal_gauscurv 0.22 lh_bankssts_meancurv 0.38**lh_pericalcarine_meancurv

0 25 50 75 100 0 25 50 75 100 0 25 50 75 100


Feature Importance(%)
D) ALL Subjects
ALL
0.24**CSF_volume
Morphometric Attributes
0.14 Left.Amygdala_volume
volume
0.02 rh_parahippocampal_volume area
0.25**Left.Lateral.Ventricle_volume thickness
0.07 rh_bankssts_volume foldind
0.23**Left.Inf.Lat.Vent_volume meancurv
gauscurv
0.04 rh_medialorbitofrontal_area
0.04 rh_parstriangularis_area Group Difference
0.18* rh_transversetemporal_thicknessstd a TDC > ASD
a ASD > TDC
0.13 lh_frontalpole_gauscurv

0 25 50 75 100
Feature Importance(%)

Figure 8.2. Gradient Boosting Machine (GBM): Important features for classification are
variable across sub-groups.
Top 10 important features for autism spectrum disorder (ASD) vs. typically developing controls (TDC)
classification in each sub-group are presented. Each feature is represented by a colored bar; the length of
the bar represents the relative % importance for classification with respect to the top feature. The features
have been grouped and color-coded by volume, area, thickness mean, thickness standard deviation, folding
index, mean curvature and Gaussian curvature. Before each feature, Cohen’s d and two sample t-test
significance (P<0.005** and P<0.05*) of ASD vs. TDC group difference are presented. Important features
for classification were similar to that from random forest presented in Fig 2. The important features highly
varied across the sub-groups demonstrating the heterogeneity in ASD brain morphometry.

155
a inter-subgroup classification a intra-subgroup classification

Autism Severity (AS)


4 ≤ AS ≤ 5 6 ≤ AS ≤ 7 8 ≤ AS ≤ 10
1.0

0.9 0.92
AUC scores

0.8 0.81
0.7 0.71 0.69
0.69 0.68
0.6
0.63
0.52
0.5 0.49
4 ≤ AS ≤ 5 6 ≤ AS ≤ 7 8 ≤ AS ≤ 10 4 ≤ AS ≤ 5 6 ≤ AS ≤ 7 8 ≤ AS ≤ 10 4 ≤ AS ≤ 5 6 ≤ AS ≤ 7 8 ≤ AS ≤ 10
Training sub-groups
Verbal (VIQ)
75 ≤ VIQ ≤ 90 90 < VIQ < 115 115 ≤ VIQ ≤ 150
0.8
0.75
0.7
AUC scores

0.64 0.63 0.62


0.6
0.54 0.54 0.54
0.5
0.44
0.4
0.35
0.3
75 ≤ VIQ ≤ 90 90 < VIQ < 115 115 ≤ VIQ ≤ 150 75 ≤ VIQ ≤ 90 90 < VIQ < 115 115 ≤ VIQ ≤ 150 75 ≤ VIQ ≤ 90 90 < VIQ < 115 115 ≤ VIQ ≤ 150
Training sub-groups
Age (years)
6 ≤ Age < 13 13 ≤ Age < 18 18 ≤ Age ≤ 40

0.7
0.66 0.65
AUC scores

0.6
0.62
0.57
0.53 0.52
0.5 0.5
0.47
0.43
0.4
6 ≤ Age < 13 13 ≤ Age < 18 18 ≤ Age ≤ 40 6 ≤ Age < 13 13 ≤ Age < 18 18 ≤ Age ≤ 40 6 ≤ Age < 13 13 ≤ Age < 18 18 ≤ Age ≤ 40
Training sub-groups

Figure 8.3: Inter and intra sub-groups classification AUC scores


Mean and standard deviations of AUC scores when classification models trained in different sub-groups
were tested on each other. The title of each sub-plot represents the sub-group on which the testing was
performed. Red data points correspond to when training and testing were performed on the same sub-
group (intra-subgroup) and black data points correspond to when training and testing were performed on
different sub-groups (inter-subgroup).
Intra-subgroup classification: A random forest classification model was trained in each sub-group
under the 10-fold cross-validation framework. Inter-subgroup classification: A random forest
classification model trained in each sub-group was tested on 200 bootstrap replications of the test sub-group.
Intra sub-groups AUC scores were much larger than inter sub-groups AUC scores in 16 out of 18
comparisons. The AUC scores decreased with the increasing distance between training sub-group and test
sub-group.

156
A) AS B) VIQ C) Age (years) D) ALL Subjects
4 ≤ AS ≤ 5 6 ≤ AS ≤ 7 8 ≤ AS ≤ 10 75 ≤ VIQ ≤ 90 90 < VIQ < 115 115 ≤ VIQ ≤ 150 6 ≤ Age < 13 13 ≤ Age < 18 18 ≤ Age ≤ 40 ALL
1.65**lh_lateralorbitofrontal_area 0.78**X3rd.Ventricle_volume 0.63**Right.choroid.plexus_volume 1.66**rh_parsopercularis_area 0.14 rh_parahippocampal_volume 0.25 lhCorticalWhiteMatterVol_volume 0.1 Left.Cerebellum.White.Matter_volume
0.48**CC_Central_volume 0.5** lh_transversetemporal_volume 0.25**Left.Lateral.Ventricle_volume
1.48**rh_WhiteSurfArea_area 0.26* lh_insula_volume 0.24 CorticalWhiteMatterVol_volume 0.02 rh_parahippocampal_volume
0.4** rh_rostralanteriorcingulate_area 0.35* CC_Mid_Anterior_volume 0.49**rh_bankssts_volume
1.31**rh_medialorbitofrontal_area 0.73**lh_parahippocampal_volume 0.21 SubCortGrayVol_volume 0.24**CSF_volume
1.33**lh_WhiteSurfArea_area 0.01 rh_parsorbitalis_volume 0.32* Right.Hippocampus_volume 0.23 rh_insula_foldind 0.39**X3rd.Ventricle_volume 0.45**lh_inferiortemporal_volume 0.23**Left.Inf.Lat.Vent_volume
1.75**lh_fusiform_thicknessstd 0.77**rhSurfaceHoles_volume 0.48**X3rd.Ventricle_volume 1.44**rh_medialorbitofrontal_area 0.42**Left.Lateral.Ventricle_volume 0.3* Left.Hippocampus_volume 0.2 lh_lateralorbitofrontal_foldind 0.36* Left.Lateral.Ventricle_volume 0.42**rh_superiortemporal_volume 0.14 Left.Amygdala_volume
1.38**lh_bankssts_thicknessstd 0.87**lh_middletemporal_thickness 0.29 Right.Putamen_volume 0.92* rh_parahippocampal_area 0.37**Right.Lateral.Ventricle_volume 0.36* Right.vessel_volume 0.29* lh_fusiform_foldind 0.23 lhCorticalWhiteMatterVol_volume 0.34* lhCortexVol_volume 0.17* CorticalWhiteMatterVol_volume
0.13 lh_rostralanteriorcingulate_foldind 0.3* CSF_volume 0.22 rhCorticalWhiteMatterVol_volume 0.17* lhCorticalWhiteMatterVol_volume
1.17**lh_inferiortemporal_thickness 0.92**rh_inferiorparietal_meancurv 0.63**rh_medialorbitofrontal_thicknessstd 0.19 lh_superiortemporal_foldind 0.3* Right.Lateral.Ventricle_volume 0.27* Left.Putamen_volume
0.15 rh_caudalmiddlefrontal_volume 0.2 rh_parahippocampal_volume 0.07 rh_bankssts_volume
0.4** lh_lateraloccipital_foldind 0.18 rh_bankssts_volume 0.32* Right.Hippocampus_volume 0.23**X3rd.Ventricle_volume
1.22**lh_inferiorparietal_meancurv 0.83**lh_superiortemporal_meancurv 0.65**rh_middletemporal_thicknessstd 0.07 Left.Amygdala_volume 0.05 CC_Mid_Anterior_volume
0.18 lh_lateralorbitofrontal_gauscurv 0.21 rhCorticalWhiteMatterVol_volume 0.41**Left.Amygdala_volume 0.22**Right.Lateral.Ventricle_volume Morphometric Attributes
1.13**rh_lateraloccipital_gauscurv 0.8** lh_middletemporal_meancurv 0.61**rh_parsopercularis_thicknessstd 0.25* lh_parsopercularis_volume 0.23 lh_parstriangularis_area 0.16* rhCorticalWhiteMatterVol_volume volume
0.28* lh_frontalpole_gauscurv 0.1 rh_parahippocampal_volume 0.33* CortexVol_volume
0.29* Right.Inf.Lat.Vent_volume 0.21 lh_temporalpole_area 0.18* Right.choroid.plexus_volume area
0.5* rh_precentral_meancurv 0.42* lh_superiorparietal_thicknessstd 0.01 rh_medialorbitofrontal_gauscurv 0.18 Left.Thalamus.Proper_volume 0.31* rhCortexVol_volume
0.08 rh_parahippocampal_area 0.09 rh_precentral_area 0.08 CC_Mid_Anterior_volume thickness
0.02 rh_lingual_gauscurv 0.23 CorticalWhiteMatterVol_volume 0.33* Left.Hippocampus_volume 0.19* Right.Inf.Lat.Vent_volume foldind
0.89**rh_inferiortemporal_meancurv 0.44* rh_parsorbitalis_thickness 0.35**rh_parahippocampal_thickness 0.5** lh_frontalpole_thicknessstd
0.35* lh_fusiform_gauscurv 0.03 lh_caudalmiddlefrontal_area 0.41**rh_transversetemporal_volume 0.17* rh_pericalcarine_volume meancurv
0.92**lh_lateraloccipital_meancurv 0.54**rh_parstriangularis_thicknessstd 0.26* rh_entorhinal_thicknessstd 0.46**lh_inferiortemporal_thicknessstd
0.25 rh_parsopercularis_gauscurv 0.42**lh_lingual_thickness 0.29* Left.Inf.Lat.Vent_volume 0.04 rh_parstriangularis_area gauscurv
0.26* rh_precentral_thickness 0.36* lh_superiortemporal_thicknessstd 0.04 rh_medialorbitofrontal_area
0.63**lh_inferiortemporal_meancurv 0.42* rh_medialorbitofrontal_thickness 0.48**rh_paracentral_gauscurv 0.34* rh_lingual_thicknessstd 0.12 rh_parahippocampal_volume Group Difference
0.09 lh_parsopercularis_foldind 0.45**rh_paracentral_thickness 0.04 rh_isthmuscingulate_area
0.09 rh_caudalmiddlefrontal_gauscurv 0.39* rh_lateralorbitofrontal_thickness 0.23 rh_insula_gauscurv 0.29* lh_pericalcarine_thickness 0.33* TotalGrayVol_volume 0.14 lh_transversetemporal_area a TDC > ASD
0.13 lh_supramarginal_foldind 0.14 lh_transversetemporal_thicknessstd
0.09 rh_frontalpole_gauscurv 0.26 lh_caudalanteriorcingulate_thickness0.17 lh_medialorbitofrontal_thicknessstd 0.11 rh_transversetemporal_area a ASD > TDC
0.3 lh_supramarginal_gauscurv 0.58**rh_parsopercularis_thickness 0.07 rh_precuneus_foldind 0.37* lh_medialorbitofrontal_thicknessstd
0.22 rh_precuneus_thicknessstd 0.43**lh_inferiortemporal_thickness 0.25**rh_lingual_thicknessstd
0.23 rh_middletemporal_foldind 0.35* rh_superiorparietal_thicknessstd 0.14 lh_inferiortemporal_thicknessstd
0.47**rh_entorhinal_thickness 0.29* rh_paracentral_thickness 0.4** lh_superiortemporal_thickness
0.01 rh_pericalcarine_foldind 0.42* rh_fusiform_thicknessstd 0.2* lh_isthmuscingulate_thicknessstd
0.67**lh_middletemporal_meancurv 0.22 lh_frontalpole_gauscurv 0.3* rh_inferiortemporal_foldind 0.35* lh_lingual_thicknessstd 0.38**lh_pericalcarine_meancurv 0.19* rh_posteriorcingulate_thicknessstd
0.32* lh_frontalpole_gauscurv 0.29* lh_posteriorcingulate_thickness 0.32* rh_pericalcarine_gauscurv 0.17* lh_medialorbitofrontal_thicknessstd
0.57**lh_frontalpole_meancurv
0.21 rh_fusiform_foldind 0.18* rh_transversetemporal_thicknessstd
0.02 rh_lateralorbitofrontal_gauscurv
0.63**lh_superiortemporal_meancurv 0.08 rh_insula_foldind
0.11 lh_pericalcarine_gauscurv 0.22 lh_bankssts_meancurv
0.22**lh_frontalpole_meancurv
0.05 lh_frontalpole_gauscurv 0.17 lh_rostralmiddlefrontal_gauscurv 0.17 lh_entorhinal_gauscurv 0.13 lh_frontalpole_gauscurv
0 25 50 75 100 0 25 50 75 100 0 25 50 75 100 0 25 50 75 100 0 25 50 75 100 0 25 50 75 100 0 25 50 75 100 0 25 50 75 100 0 25 50 75 100 0 25 50 75 100
Feature Importance(%) Feature Importance(%) Feature Importance(%) Feature Importance(%)

Figure 8.4: Random Forest: Important features for classification in sub-groups (10% threshold).
Important features for autism spectrum disorder (ASD) vs. typically developing controls (TDC) classification
in each sub-group are presented. The features required to have 10 % of the total feature importance scores
of all the features were considered as important. Each feature is represented by a colored bar; the length of
the bar represents the relative % importance for classification with respect to the top feature. The features
have been grouped and color-coded by volume, area, thickness mean, thickness standard deviation, folding
index, mean curvature and Gaussian curvature. Before each feature, Cohen’s d and two sample t-test
significance (P<0.005** and P<0.05*) of ASD vs. TDC group difference are presented. The important
features for classification varied across the sub-groups demonstrating the heterogeneity in ASD brain
morphometry. Important features are similar to that from random forest presented in Fig 2 where top 10
features are presented. The important features were dissimilar across the sub-groups.

157
A) AS B) VIQ C) Age (years) D) ALL Subjects
4 ≤ AS ≤ 5 6 ≤ AS ≤ 7 8 ≤ AS ≤ 10 75 ≤ VIQ ≤ 90 90 < VIQ < 115 115 ≤ VIQ ≤ 150 6 ≤ Age < 13 13 ≤ Age < 18 18 ≤ Age ≤ 40 ALL
1.13**Optic.Chiasm_volume 0.78**X3rd.Ventricle_volume 0.63**Right.choroid.plexus_volume 1.04* rh_temporalpole_volume 0.14 rh_parahippocampal_volume 0.25 lhCorticalWhiteMatterVol_volume 0.1 Left.Cerebellum.White.Matter_volume0.48**CC_Central_volume 0.5** lh_transversetemporal_volume 0.25**Left.Lateral.Ventricle_volume
1.66**rh_parsopercularis_area 0.26* lh_insula_volume 0.24 CorticalWhiteMatterVol_volume 0.15 Right.Accumbens.area_volume 0.35* CC_Mid_Anterior_volume 0.49**rh_bankssts_volume 0.02 rh_parahippocampal_volume
1.17**lhSurfaceHoles_volume 0.73**lh_parahippocampal_volume 0.21 SubCortGrayVol_volume 0.24**CSF_volume
1.48**rh_WhiteSurfArea_area 0.01 rh_parsorbitalis_volume 0.32* Right.Hippocampus_volume 0.01 lh_superiortemporal_volume 0.39**X3rd.Ventricle_volume 0.45**lh_inferiortemporal_volume
1.65**lh_lateralorbitofrontal_area 0.77**rhSurfaceHoles_volume 0.48**X3rd.Ventricle_volume 0.23**Left.Inf.Lat.Vent_volume
1.33**lh_WhiteSurfArea_area 0.42**Left.Lateral.Ventricle_volume 0.3* Left.Hippocampus_volume 0.21 lh_frontalpole_volume 0.36* Left.Lateral.Ventricle_volume 0.42**rh_superiortemporal_volume 0.14 Left.Amygdala_volume
1.31**rh_medialorbitofrontal_area 0.78**CSF_volume 0.29 Right.Putamen_volume 1.44**rh_medialorbitofrontal_area 0.37**Right.Lateral.Ventricle_volume 0.36* Right.vessel_volume 0.2 CC_Mid_Anterior_volume 0.23 lhCorticalWhiteMatterVol_volume 0.34* lhCortexVol_volume 0.17* CorticalWhiteMatterVol_volume
0.65 rh_parsopercularis_area 0.79**SurfaceHoles_volume 0.3 Left.Putamen_volume 0.92* rh_parahippocampal_area 0.3* CSF_volume 0.22 rhCorticalWhiteMatterVol_volume 0.24 rh_pericalcarine_volume 0.3* Right.Lateral.Ventricle_volume 0.27* Left.Putamen_volume 0.17* lhCorticalWhiteMatterVol_volume
1.41**rh_middletemporal_area 0.15 rh_caudalmiddlefrontal_volume 0.2 rh_parahippocampal_volume 0.08 Right.Caudate_volume 0.18 rh_bankssts_volume 0.32* Right.Hippocampus_volume 0.07 rh_bankssts_volume
0.94* rh_inferiortemporal_area 0.2 Right.Accumbens.area_volume 0.31 Left.Accumbens.area_volume
1.03* lh_rostralanteriorcingulate_area 0.07 Left.Amygdala_volume 0.05 CC_Mid_Anterior_volume 0.18 CC_Central_volume 0.21 rhCorticalWhiteMatterVol_volume 0.41**Left.Amygdala_volume 0.23**X3rd.Ventricle_volume
1.75**lh_fusiform_thicknessstd 0.79**lhSurfaceHoles_volume 0.5** Optic.Chiasm_volume 1.45**lh_precentral_area 0.25* lh_parsopercularis_volume 0.32* lh_insula_volume 0.22**Right.Lateral.Ventricle_volume
0.4** rh_rostralanteriorcingulate_area 0.1 rh_parahippocampal_volume 0.33* CortexVol_volume
1.38**lh_bankssts_thicknessstd 0.59**rh_posteriorcingulate_volume 0.28 Right.VentralDC_volume 0.16* rhCorticalWhiteMatterVol_volume
1.14**rh_precuneus_area 0.29* Right.Inf.Lat.Vent_volume 0.27 rh_frontalpole_volume 0.14 lh_transversetemporal_area 0.18 Left.Thalamus.Proper_volume 0.31* rhCortexVol_volume 0.18* Right.choroid.plexus_volume
1.17**lh_inferiortemporal_thickness 0.71**rh_medialorbitofrontal_area 0.36* X4th.Ventricle_volume 1.43**rh_bankssts_area 0.33* Left.Inf.Lat.Vent_volume 0.09 lh_isthmuscingulate_volume 0.33* lh_temporalpole_area 0.23 CorticalWhiteMatterVol_volume 0.33* Left.Hippocampus_volume 0.08 CC_Mid_Anterior_volume
0.97* lh_parsorbitalis_area 0.24* rh_cuneus_volume 0.22 Left.Pallidum_volume 0.29* rh_fusiform_area 0.3* CSF_volume 0.41**rh_transversetemporal_volume 0.19* Right.Inf.Lat.Vent_volume
1.31**rh_parsopercularis_thicknessstd 0.79**lh_lateralorbitofrontal_area 0.12 rh_precuneus_volume
1.36**rh_lateraloccipital_area 0.18 rh_bankssts_volume 0.07 lh_parahippocampal_volume 0.13 lh_lateraloccipital_area 0.17 lh_superiorparietal_volume 0.29* Left.Inf.Lat.Vent_volume 0.17* rh_pericalcarine_volume
1.07**rh_parstriangularis_thicknessstd 0.5* lh_caudalanteriorcingulate_area 0.12 Right.Accumbens.area_volume 1.38**rh_superiorfrontal_area 0.14 rh_superiortemporal_volume 0.13 Left.Inf.Lat.Vent_volume 0.16* lh_transversetemporal_volume
0.43**rh_entorhinal_thickness 0.19 Optic.Chiasm_volume 0.12 rh_parahippocampal_volume
1.2** rh_pericalcarine_thicknessstd 0.7** lh_bankssts_area 0.14 MaskVol_volume 1.35**rh_inferiorparietal_area 0.11 rh_parstriangularis_volume 0.16 Left.vessel_volume 0 rh_parsorbitalis_volume
0.25 rh_insula_thicknessstd 0.15 rh_posteriorcingulate_volume 0.33* TotalGrayVol_volume
1.2** rh_paracentral_area 0.18 lh_parsorbitalis_volume 0.14 lh_temporalpole_volume 0.09 rh_superiortemporal_volume
1.29**rh_parahippocampal_thickness 0.87**lh_middletemporal_thickness 0.38* lh_inferiortemporal_area 0.29* rh_parstriangularis_thicknessstd 0.12 lh_insula_volume 0.32* Left.Lateral.Ventricle_volume 0.09 Right.Hippocampus_volume
0.76* rh_postcentral_area 0.22 rh_insula_volume 0.04 Left.Thalamus.Proper_volume 0.38**lh_parahippocampal_thickness 0.11 lh_parahippocampal_volume 0.23 lh_middletemporal_volume
1.29**lh_insula_thickness 0.8** lh_fusiform_thicknessstd 0.63**rh_medialorbitofrontal_thicknessstd 0.12 Right.Accumbens.area_volume
0.13 lh_rostralanteriorcingulate_foldind 0.22 Right.choroid.plexus_volume 0.23 lh_parstriangularis_area 0.36* rh_parahippocampal_thickness 0.17 Left.Amygdala_volume 0.21 Right.vessel_volume 0.13 Left.Cerebellum.Cortex_volume
1.1** lh_cuneus_thicknessstd 0.75**lh_insula_thickness 0.65**rh_middletemporal_thicknessstd 0.19 rh_parsopercularis_foldind 0.18 Right.Caudate_volume 0.21 lh_temporalpole_area 0.11 lh_lateraloccipital_thickness 0.03 rh_parsopercularis_volume 0.25* rh_middletemporal_volume 0.15* rh_transversetemporal_volume
1.21**lh_parsorbitalis_thicknessstd 0.91**lh_inferiortemporal_thickness 0.61**rh_parsopercularis_thicknessstd 0.58 rh_middletemporal_foldind 0.29* X3rd.Ventricle_volume 0.09 rh_precentral_area 0.18 lh_posteriorcingulate_thicknessstd 0.23 Brain.Stem_volume 0.26* rh_supramarginal_volume 0.11 lh_parsopercularis_volume
0.38 lh_lateralorbitofrontal_foldind 0.14 lh_inferiortemporal_volume 0.29 lh_caudalmiddlefrontal_area 0.12 Left.Hippocampus_volume
0.87* lh_rostralanteriorcingulate_foldind 0.62**rh_lateralorbitofrontal_thicknessstd 0.42* lh_superiorparietal_thicknessstd 0.24 rh_parsorbitalis_thicknessstd 0.03 rh_parsorbitalis_volume 0.35* lh_superiortemporal_volume
0.53 rh_temporalpole_gauscurv 0.1 Left.VentralDC_volume 0.25 rh_fusiform_area 0.15* Optic.Chiasm_volume
1.22**lh_inferiorparietal_meancurv 0.78**rh_transversetemporal_thicknessstd0.44* rh_parsorbitalis_thickness 0.25 rh_posteriorcingulate_thicknessstd 0.08 rh_fusiform_volume 0.31* lh_superiorfrontal_volume 0.11 X4th.Ventricle_volume
0.22 rh_transversetemporal_volume 0.23 rh_rostralanteriorcingulate_area 0.23 rh_insula_foldind 0.14 rh_medialorbitofrontal_volume 0.15 Right.Putamen_volume
1.25**lh_lateraloccipital_meancurv 0.69**rh_inferiortemporal_thickness 0.54**rh_parstriangularis_thicknessstd 0 MaskVol_volume
0.08 rh_parahippocampal_area 0.13 rh_precuneus_area 0.2 lh_lateralorbitofrontal_foldind 0.1 Right.Amygdala_volume 0.28* Left.Pallidum_volume 0.1 lh_superiortemporal_volume
1** lh_inferiortemporal_meancurv 0.79**rh_parstriangularis_thicknessstd 0.42* rh_medialorbitofrontal_thickness 0.25* lh_parsopercularis_area 0.14 rh_lateralorbitofrontal_area 0.29* lh_fusiform_foldind 0.03 lh_caudalmiddlefrontal_area 0.11 rh_fusiform_volume 0.01 lh_inferiortemporal_volume
1.46**rh_lateraloccipital_meancurv 0.45* rh_rostralanteriorcingulate_thickness0.39* rh_lateralorbitofrontal_thickness 0.2 rh_caudalmiddlefrontal_area 0.08 rh_transversetemporal_area 0.19 lh_superiortemporal_foldind 0.09 rh_posteriorcingulate_area 0.05 rh_cuneus_volume 0.08 Right.vessel_volume
0 rh_lateraloccipital_area 0.2 rh_lateraloccipital_area 0.09 Left.Putamen_volume
1.13**rh_lateraloccipital_gauscurv 0.84**lh_fusiform_thickness 0.58**rh_parsopercularis_thickness 0.4** lh_lateraloccipital_foldind 0.22 rh_bankssts_area 0.1 CSF_volume
0.11 lh_entorhinal_area 0.21 lh_insula_area 0.04 rh_parstriangularis_area
0.28* rh_medialorbitofrontal_foldind 0.13 lh_parahippocampal_area 0.16 lh_rostralmiddlefrontal_volume 0.04 rh_medialorbitofrontal_area
0.58 lh_supramarginal_gauscurv 0.78**lh_parahippocampal_thickness 0.47**rh_entorhinal_thickness 0.17 lh_parahippocampal_area 0.19 lh_postcentral_area 0.12 lh_parsopercularis_foldind 0.1 rh_parahippocampal_area 0.31* lh_paracentral_volume 0.04 rh_isthmuscingulate_area
0.92* rh_superiorparietal_gauscurv 0.64**rh_parahippocampal_thickness 0.57**rh_superiortemporal_thicknessstd 0.35**rh_parahippocampal_thickness 0.22 lh_rostralmiddlefrontal_area
0.22 rh_superiorparietal_foldind 0.07 rh_precuneus_area 0.26* rh_superiorparietal_volume 0.14 lh_transversetemporal_area
0.87* rh_pericalcarine_gauscurv 0.92**rh_inferiorparietal_meancurv 0.42* lh_parsorbitalis_thickness 0.26* rh_entorhinal_thicknessstd 0.04 rh_parstriangularis_area 0.11 rh_transversetemporal_area
0.15 rh_caudalmiddlefrontal_foldind 0.12 rh_parstriangularis_area 0.19 rh_lateralorbitofrontal_volume
0.26* rh_precentral_thickness 0.16 rh_frontalpole_area 0.08 rh_rostralanteriorcingulate_area
0.34 lh_superiortemporal_gauscurv 0.83**lh_superiortemporal_meancurv 0.46* rh_parstriangularis_thickness 0.06 rh_superiorfrontal_foldind 0.42**lh_lingual_thickness 0.28* SubCortGrayVol_volume
0.25* rh_parahippocampal_thicknessstd 0.26 lh_posteriorcingulate_area 0.1 lh_parsopercularis_area
0.16 lh_caudalanteriorcingulate_foldind 0.34* rh_lingual_thicknessstd 0.23 lh_temporalpole_volume
0.74* lh_inferiorparietal_gauscurv 0.8** lh_middletemporal_meancurv 0.58**lh_fusiform_thicknessstd 0.15 lh_lingual_thickness 0.5** lh_frontalpole_thicknessstd 0.1 lh_paracentral_area
0.45**rh_parsopercularis_meancurv 0.29* lh_pericalcarine_thickness 0.26* lh_paracentral_area 0.06 lh_insula_area Morphometric Attributes
1.05**rh_lingual_gauscurv 0.5* rh_precentral_meancurv 0.3 rh_lingual_thickness 0.29* lh_parahippocampal_thickness 0.46**lh_inferiortemporal_thicknessstd
0.34* lh_frontalpole_meancurv 0.26 lh_caudalanteriorcingulate_thickness0.29* rh_transversetemporal_area 0.03 rh_insula_area volume
0.89**rh_inferiortemporal_meancurv 0.6** lh_bankssts_thicknessstd 0.1 rh_cuneus_thicknessstd 0.36* lh_superiortemporal_thicknessstd
0.5** rh_paracentral_meancurv 0.22 rh_precuneus_thicknessstd 0.19 rh_pericalcarine_area 0.23**rh_pericalcarine_area area
0.08 lh_isthmuscingulate_thickness 0.45**rh_paracentral_thickness 0.12 lh_caudalanteriorcingulate_area
0.92**lh_lateraloccipital_meancurv 0.35* rh_frontalpole_thickness 0.5** lh_fusiform_meancurv 0.29* rh_paracentral_thickness 0.29* rh_bankssts_area thickness
0.03 rh_insula_thickness 0.14 lh_transversetemporal_thicknessstd 0.2* lh_cuneus_area
0.46**rh_caudalmiddlefrontal_meancurv 0.35* lh_lingual_thicknessstd 0.27* lh_transversetemporal_area foldind
0.63**lh_inferiortemporal_meancurv 0.61**rh_precuneus_thicknessstd 0.16 rh_parsorbitalis_thickness 0.37* lh_medialorbitofrontal_thicknessstd 0.02 rh_precentral_area
0.51**lh_superiorfrontal_meancurv 0.29* lh_posteriorcingulate_thickness 0.14 rh_lateraloccipital_area meancurv
0.66**lh_frontalpole_meancurv 0.42* rh_isthmuscingulate_thicknessstd 0.22 lh_lingual_thicknessstd 0.35* rh_superiorparietal_thicknessstd 0.13 rh_paracentral_area
0.52**rh_superiorfrontal_meancurv 0.07 lh_paracentral_thickness 0.04 rh_isthmuscingulate_area 0.05 rh_caudalmiddlefrontal_area gauscurv
0.03 rh_transversetemporal_thickness 0.42* rh_fusiform_thicknessstd
0.63**lh_temporalpole_meancurv 0.33* lh_lingual_thickness 0.32* lh_inferiortemporal_meancurv 0.19 lh_frontalpole_thickness 0.04 rh_insula_area 0.02 rh_precuneus_area
0.07 rh_insula_thicknessstd 0.37* lh_caudalmiddlefrontal_thicknessstd Group Difference
0.88**lh_inferiorparietal_meancurv 0.46* lh_precuneus_thicknessstd 0.46**lh_superiortemporal_meancurv 0.31* lh_pericalcarine_thicknessstd 0.14 lh_rostralanteriorcingulate_area 0.14 rh_frontalpole_area
0.19 rh_entorhinal_thickness 0.31* lh_postcentral_thickness a TDC > ASD
0.36* rh_parsorbitalis_meancurv 0.18 rh_posteriorcingulate_thicknessstd 0.19 rh_paracentral_area 0.09 rh_bankssts_area
0.63**rh_parsopercularis_meancurv 0.37* lh_posteriorcingulate_thicknessstd 0.18 rh_posteriorcingulate_thicknessstd 0.4* rh_precuneus_thicknessstd 0.25**rh_lingual_thicknessstd
0.39**lh_parsopercularis_meancurv 0.26 lh_transversetemporal_thicknessstd 0.17 lh_isthmuscingulate_area a ASD > TDC
0.75**lh_fusiform_meancurv 0.34* lh_isthmuscingulate_thicknessstd 0.2 lh_precentral_thickness 0.16 rh_parahippocampal_thickness 0.14 lh_inferiortemporal_thicknessstd
0.46**lh_lateralorbitofrontal_meancurv 0.26 rh_medialorbitofrontal_thicknessstd 0.17 lh_medialorbitofrontal_thicknessstd
0.11 lh_transversetemporal_thickness 0.38* lh_lateralorbitofrontal_thicknessstd 0.2* lh_isthmuscingulate_thicknessstd
0.09 rh_caudalmiddlefrontal_gauscurv 0.61**lh_temporalpole_thickness 0.18 lh_lateralorbitofrontal_gauscurv 0.19 rh_bankssts_thickness 0.43**lh_inferiortemporal_thickness
0.16 rh_isthmuscingulate_thickness 0.27 lh_middletemporal_thicknessstd 0.19* rh_posteriorcingulate_thicknessstd
0.3 lh_supramarginal_gauscurv 0.33* rh_superiorparietal_thicknessstd 0.28* lh_frontalpole_gauscurv 0.07 rh_postcentral_thicknessstd 0.4** lh_superiortemporal_thickness 0.17* lh_medialorbitofrontal_thicknessstd
0.09 lh_parsopercularis_foldind 0.29* lh_temporalpole_thicknessstd
0.47* lh_superiortemporal_gauscurv 0.41* rh_parahippocampal_thickness 0.01 rh_medialorbitofrontal_gauscurv 0.27* lh_isthmuscingulate_thickness 0.4** lh_lingual_thicknessstd 0.18* rh_transversetemporal_thicknessstd
0.13 lh_supramarginal_foldind 0.3* rh_precentral_thicknessstd
0.02 rh_lingual_gauscurv 0.02 rh_cuneus_thicknessstd 0.26* rh_lingual_thicknessstd 0.24**lh_lingual_thicknessstd
0.24 rh_parsopercularis_gauscurv 0.32* rh_insula_thicknessstd 0.07 rh_precuneus_foldind 0.33* rh_medialorbitofrontal_thicknessstd 0.14 lh_posteriorcingulate_thicknessstd
0.23 rh_middletemporal_foldind 0.33* rh_superiorfrontal_thicknessstd 0.35* lh_fusiform_gauscurv 0.02 rh_superiorparietal_thicknessstd 0.35* lh_isthmuscingulate_thicknessstd
0.68**rh_lingual_gauscurv 0.46* lh_inferiorparietal_thicknessstd 0.05 rh_parsorbitalis_thickness
0.01 rh_pericalcarine_foldind 0.27 lh_inferiorparietal_thicknessstd 0.25 rh_parsopercularis_gauscurv 0.22 rh_lingual_thickness 0.19 rh_lateraloccipital_thickness
0.16* lh_pericalcarine_thickness
0.07 lh_precentral_gauscurv 0.5** rh_lingual_thicknessstd 0.24* rh_fusiform_foldind 0.15 rh_lingual_thicknessstd 0.48**rh_paracentral_gauscurv 0.19 lh_temporalpole_thicknessstd 0.28* rh_bankssts_thicknessstd 0.12 lh_frontalpole_thickness
0.65**rh_pericalcarine_gauscurv 0.45* lh_precentral_thicknessstd 0.17 lh_lateralorbitofrontal_foldind 0.29 lh_precentral_thicknessstd 0.23 rh_insula_gauscurv 0.23 lh_medialorbitofrontal_thickness 0.26* lh_lingual_thickness 0.09 rh_transversetemporal_thickness
0.18 lh_frontalpole_foldind 0.31* lh_superiorfrontal_thicknessstd 0.09 rh_frontalpole_gauscurv 0.2 rh_superiortemporal_thicknessstd 0.17 lh_inferiortemporal_thicknessstd 0.16* lh_lingual_thickness
0.25 rh_inferiorparietal_gauscurv 0.36* rh_inferiortemporal_thickness 0.17* rh_middletemporal_thicknessstd
0.21 rh_rostralmiddlefrontal_foldind 0.25 rh_pericalcarine_thickness 0.38**rh_parsorbitalis_gauscurv 0.23 lh_lateraloccipital_thickness 0.05 rh_parsorbitalis_thickness
0.26 rh_frontalpole_gauscurv 0.51**lh_superiortemporal_thicknessstd 0.46**lh_lateraloccipital_gauscurv 0.12 lh_parahippocampal_thicknessstd 0.24 rh_caudalanteriorcingulate_thicknessstd 0.16* rh_entorhinal_thickness
0.08 rh_paracentral_foldind 0.14 rh_posteriorcingulate_thicknessstd 0.21**lh_bankssts_thicknessstd
0.53**rh_superiorfrontal_thicknessstd 0.25* rh_superiorparietal_foldind 0.06 rh_caudalanteriorcingulate_thickness 0.22 lh_parsopercularis_gauscurv 0.23 rh_lateraloccipital_thickness 0.33* lh_transversetemporal_thickness
0.08 lh_lateraloccipital_thicknessstd
0.52**lh_middletemporal_thickness 0.14 lh_paracentral_foldind 0.36* rh_middletemporal_thicknessstd 0.25 rh_caudalmiddlefrontal_gauscurv 0.17 lh_paracentral_thicknessstd 0.18 rh_posteriorcingulate_thicknessstd 0.02 rh_insula_thicknessstd
0.02 rh_insula_foldind 0.22 rh_postcentral_thickness 0.25 lh_parahippocampal_gauscurv 0.19 rh_middletemporal_thicknessstd 0.27* rh_bankssts_thickness 0.11 rh_lateraloccipital_thickness
0.46* lh_pericalcarine_thickness
0.39**lh_lateralorbitofrontal_meancurv 0.3* rh_inferiortemporal_foldind 0.18 rh_superiorfrontal_gauscurv 0.21 rh_fusiform_foldind 0.12 lh_parsorbitalis_thicknessstd 0.11 lh_parahippocampal_thickness
0.4* lh_medialorbitofrontal_thicknessstd 0.4** lh_inferiorparietal_gauscurv 0.13 rh_insula_foldind 0.04 lh_cuneus_foldind 0.13 rh_isthmuscingulate_thicknessstd
0.38**lh_parsopercularis_meancurv 0.01 lh_parsorbitalis_foldind
0.67**lh_middletemporal_meancurv 0.35* lh_parstriangularis_gauscurv 0.21 lh_parsorbitalis_foldind 0.19 lh_rostralanteriorcingulate_foldind 0.16* lh_middletemporal_thicknessstd
0.18 rh_frontalpole_meancurv 0.12 rh_superiortemporal_foldind 0.02 rh_isthmuscingulate_thickness
0.57**lh_frontalpole_meancurv 0.3* rh_rostralanteriorcingulate_meancurv0.26 lh_posteriorcingulate_foldind 0.16 lh_posteriorcingulate_gauscurv 0.19 rh_supramarginal_foldind 0.15 rh_bankssts_foldind
0.11 rh_superiortemporal_thicknessstd
0.15 lh_bankssts_meancurv 0.35* lh_caudalanteriorcingulate_foldind 0.28* rh_postcentral_gauscurv 0.04 rh_middletemporal_foldind 0.14 rh_parsorbitalis_foldind 0.09 rh_entorhinal_thicknessstd
0.63**lh_superiortemporal_meancurv
0.22 lh_frontalpole_gauscurv 0.18 lh_lateralorbitofrontal_foldind 0.07 lh_supramarginal_gauscurv 0.2 rh_parstriangularis_foldind 0.26* rh_superiortemporal_foldind 0.13 rh_paracentral_thickness
0.43* lh_inferiortemporal_meancurv 0.19 lh_postcentral_foldind 0.06 rh_caudalmiddlefrontal_foldind 0.15* rh_caudalanteriorcingulate_thicknessstd
0.06 lh_fusiform_gauscurv 0.15 lh_supramarginal_foldind
0.45* lh_parstriangularis_meancurv 0.19 lh_lateralorbitofrontal_gauscurv 0.17 rh_superiortemporal_meancurv 0.1 lh_superiortemporal_foldind 0.38**lh_pericalcarine_meancurv 0.08 rh_insula_foldind
0.22 lh_bankssts_meancurv 0.38**rh_pericalcarine_meancurv 0.13 lh_supramarginal_foldind
0.44* lh_bankssts_meancurv 0.03 lh_posteriorcingulate_gauscurv 0.13 lh_pericalcarine_meancurv
0.21**rh_fusiform_foldind
0.04 rh_pericalcarine_gauscurv 0.25 lh_bankssts_meancurv 0.2 lh_lateralorbitofrontal_meancurv 0.35* rh_supramarginal_meancurv 0.1 lh_inferiortemporal_foldind
0.05 lh_frontalpole_gauscurv
0.22 lh_rostralanteriorcingulate_gauscurv0.35* lh_frontalpole_meancurv 0.11 rh_parahippocampal_meancurv 0.32* rh_pericalcarine_gauscurv 0.09 lh_frontalpole_foldind
0.23 lh_superiortemporal_gauscurv 0.17 lh_entorhinal_gauscurv 0.31* lh_pericalcarine_gauscurv
0.24* lh_entorhinal_gauscurv 0.09 rh_isthmuscingulate_meancurv 0.06 rh_superiortemporal_foldind
0.01 lh_supramarginal_gauscurv 0.32* lh_frontalpole_gauscurv 0.01 lh_bankssts_gauscurv 0.29* lh_rostralmiddlefrontal_gauscurv 0.22**lh_frontalpole_meancurv
0.41* rh_pericalcarine_gauscurv 0.02 rh_lateralorbitofrontal_gauscurv 0.1 lh_lateralorbitofrontal_gauscurv 0.03 lh_precuneus_gauscurv 0.14 rh_frontalpole_meancurv
0.06 rh_caudalmiddlefrontal_gauscurv 0.12 rh_pericalcarine_meancurv
0.4* rh_superiortemporal_gauscurv 0.11 lh_pericalcarine_gauscurv
0.19* lh_bankssts_meancurv
0.17 lh_rostralmiddlefrontal_gauscurv 0.25 rh_insula_gauscurv 0.13 lh_frontalpole_gauscurv
0.37* rh_lingual_gauscurv 0.06 lh_frontalpole_gauscurv
0.15 lh_parstriangularis_gauscurv 0.04 rh_frontalpole_gauscurv
0.46* rh_middletemporal_gauscurv 0.41* rh_paracentral_gauscurv 0.03 rh_precentral_gauscurv 0.02 rh_middletemporal_gauscurv
0 25 50 75 100 0 25 50 75 100 0 25 50 75 100 0 25 50 75 100 0 25 50 75 100 0 25 50 75 100 0 25 50 75 100 0 25 50 75 100 0 25 50 75 100 0 25 50 75 100
Feature Importance(%) Feature Importance(%) Feature Importance(%) Feature Importance(%)

Figure 8.5: Random Forest: Important features for classification by in sub-groups (25% threshold).
Important features for autism spectrum disorder (ASD) vs. typically developing controls (TDC) classification
in each sub-group are presented. The features required to have 25 % of the total feature importance scores
across all the features were considered as important. Each feature is represented by a colored bar; the length
of the bar represents the relative % importance for classification with respect to the top feature. The features
have been grouped and color-coded by volume, area, thickness mean, thickness standard deviation, folding
index, mean curvature and Gaussian curvature. Before each feature, Cohen’s d and two sample t-test
significance (P<0.005** and P<0.05*) of ASD vs. TDC group difference are presented. The important
features for classification varied across the sub-groups demonstrating the heterogeneity in ASD brain
morphometry. Important features are similar to that from random forest presented in Fig 2 where top 10
features are presented. The important features were dissimilar across the sub-groups.

158
Figure 8.6: Classification performance degrades with Autism Severity (AS).
Separate classification models were trained for autism spectrum disorder (ASD) subjects with different AS
values. A point represents the mean and an error bar represents the one standard deviation of the AUC
scores from 10 test folds. AUC scores and number of ASD and TDC subjects are presented below the error
bar. Blue line represents the mean AUC vs. mean AS linear model and the shaded region represents the
95% confidence interval of the model. Classification performance decreased with AS according to both
random forest and gradient boosting machine classification techniques.

159
Table 8.1: Correlation between the feature importance scores from RF and GBM
Correlation
Sub-groups
Coefficient
mild 0.76
AS moderate 0.77
high 086
low 0.87
VIQ normal 0.90
high 0.88
young 0.86
Age
mid 0.94
old 0.88
All coefficients were statistically significant (p < E-16)

160
9 Appendix C

15

10

count
5

0
−1.0 −0.5 0.0 0.5 1.0
Effect Size (Cohen's d)

Figure 9.1: Histogram of the effect sizes of the statistically significant (at 0.05) differences in
brain features

161

You might also like