
MSC IN NEUROIMAGING STATE OF THE ART ESSAY

Multi-voxel Pattern Analysis (MVPA) in fMRI settings:
Fundamentals & Case Study

Mario B. Pérez
12/12/2013








Introduction


The rise of MVPA as an analysis technique for fMRI BOLD data is still under way. Authors like
Haxby (2012) have pointed out the initial complexity and uniqueness of the MVPA perspective
on brain responses, and the slow adaptation of the research community to this new way of
thinking, which draws on machine learning methods.
Unlike univariate techniques, which normally address where a cognitive process is
localized, MVPA (which has many synonyms: multivariate pattern analysis, information-pattern
analysis...) can give an additional answer about how it is coded. Another interesting feature
is its ability to clarify the common situations in which two different processes overlap in their use
of brain areas, sharing the same resources for divergent purposes (Peelen and Downing, 2006).
The main aspect that makes MVPA a qualitative leap in fMRI BOLD processing is that it
accounts for the interactions between individual voxels and, as its name announces, detects
and refines these interactions into patterns of activation. These activation patterns can be aroused
by any given process in the brain, and can then be labelled and recognised when they
appear again (Tong and Pratte, 2012). Although, as we will see, there are many possible flaws in
this process, on this basis many impressive and eccentric applications have flourished
gradually: from the famous brain reading or brain decoding (Reddy et al., 2010) to lie
detection (Davatzikos et al., 2005) or even the reconstruction of natural scenes (Nishimoto et al., 2011).
Given that the usage of MVPA normally entails the selection of regions of
interest (ROIs), the visual system has been the main target of the studies undertaken to date, due
to its relatively well-known functional structure.
While reviewing the literature on this topic, the pioneering work of Haxby et al. (2001) on
visual category recognition and Kamitani and Tong's (2005) prediction study of grating
orientation are quoted very often, and are considered largely responsible for the spread of
MVPA. The remarkable work of Kay et al. (2008) on image identification is also influential.
Although it has been applied to other sources of data, like EEG (Rosenberg et al., 2012),
MVPA has mainly been applied to data from the fMRI signal. Its novelty and distinction from other
conventional ways of analysis, together with the promising perspectives of its use and striking
preliminary applications, make MVPA the main topic of this essay. In the following sections we
try to set out the basis that underlies MVPA, glance at the most common flaws in its use,
and finally review a case study in which this methodology was used.



Characterizing Multi-voxel Pattern Analysis


As with everything else, probably the best way of introducing a new technique is to
compare it with the preexisting ones. Traditionally, when a researcher wants to know which
areas are involved in a particular task or setting, the analysis of the fMRI data considers each
voxel individually, although the signal is acquired at once from the whole brain (Haynes and
Rees, 2006). Thus, the activity course of each voxel is conceptualized as unrelated to the others,
and its analysis is carried out without considering the possibility that other active or inactive
voxels may be relevant to it. Other voxels' behaviour may provide a sense for that voxel's
activation, and therefore contribute to drawing the whole picture (Haxby, 2012). Although praised
and recognised as extremely useful (Norman et al., 2006), this mass univariate analysis has
shown its limitations (O'Toole et al., 2007), as examining voxels in isolation has inherent limits.
By contrast, MVPA performs a multivariate analysis that takes into account those
relations and differences in activity between voxels that arise from complex stimulation settings.
MVPA is therefore not exclusively aimed at determining which voxels are active, but at how the
activation of different voxels is related: the so-called activity patterns. These activity patterns
portray valuable information about how a stimulus or percept is coded, and give fMRI
analysis an enhanced sensitivity to cognitive processes (Tong and Pratte, 2012). MVPA is
dedicated to this pattern-recognition activity.
This enhanced sensitivity is extracted by avoiding certain steps that take part in
conventional fMRI analysis settings (e.g. block designs), such as spatial smoothing and
averaging to intensify the differences between experimental conditions (Norman et al., 2006).
Standard studies try to show that the average activity during one condition is significantly
different from that of another condition across all time points, and therefore the information about the brain
activity at a specific time point is lost (Haynes and Rees, 2006). This is especially relevant in
experiments that use complex stimulation, such as natural scenes, because averaging discards
the fine-grained activity that, along with a certain amount of noise, might carry valuable
information about how the stimulus is processed (Spiers and Maguire, 2007). Other processing
elements, such as spatial smoothing, are also responsible for this blurring effect, making fine-
grained activity unavailable (Mur et al., 2009). However, as Etzel et al. (2009) point out, spatial
smoothing can be useful in between-subjects analyses, where MVPA has shown difficulties due
to the great degree of specificity of the within-subject signal (Cox and Savoy, 2003), which entails
generalization issues.
Haynes and Rees (2006) argued that this signal loss due to MVPA-unfriendly processing
steps could account for important features, because voxels that show a weak or inconsistent
response may in fact carry vital information when analyzed together rather than separately. This
exposes an interesting idea about how some cognitive processes could work: the weak, choral
activation of many voxels might potentially be as useful as the strong (significant) activation of an
individual voxel.



As an example of this, Kamitani and Tong (2005), in their famous study on orientation
detection, revealed that many voxels show patterns of weak activation on a consistent
basis across repetitions of the same condition, providing reasonable support for this
argument. More importantly, the remarkable work on category recognition carried out by
Haxby et al. (2001) showed that weaker or submaximal voxels are representative of each category
(in this case shoes or bottles). That study also showed that even if the voxels with the
highest consistent activation are removed, the categorical fingerprint can be identified above
chance. This accuracy in recognition when category-key voxels were removed implies that
areas showing lower levels of activation can be used to discriminate between two categories.
Because these categories share some high-activity areas, the overlapping of two functions could be
resolved by selecting low-level features (Peelen and Downing, 2006). Downing et al. (2005) also
provide a list of overlapping category areas in which MVPA could be useful. Although
distinguishing between two activity patterns based on activation intensity is possible (Hanson et al.,
2004), research on overlapping issues needs to carry on.
MVPA has also been described as a major advance in information extraction from the
fMRI signal (Norman et al., 2006) and a necessary tool to avoid wasting neuroimaging
data, which is normally expensive and difficult to register (O'Toole et al., 2007). Thus, as we said
previously, pattern analysis does not use processing steps that rule out potentially crucial
information. Instead of using those strategies, MVPA tries to make the most of that fine-
grained activity[1] by defining what the activation pattern of a voxel ensemble is in a given
example. Examples are presentations of our stimuli that will provide activation patterns to our
classifier algorithm. Once our classifier has been trained with several examples, it will be
theoretically ready to recognise which example has been presented to it. In a sense, the
classifier holds a weighted model of the activation pattern characteristic of, for example, an
object category.
All this information might be a little unclear when compressed into such a brief description of
the procedure, but in the next sections we will address what a classifier is and how the process
takes place.


MVPA basic procedure

The graph below summarizes the procedure for carrying out an MVPA experiment.
There are a few remarks that have to be addressed before getting into detail, first and
foremost about training data, testing data and feature selection. Preprocessing steps, as well as
scanning details, are not taken into account in this essay.[2]

[1] The exploitation of this fine-grained activity represents one of the central features of MVPA, as low-
level activity, or activity that does not achieve significance, would be lost if this fine-grained activity
were disregarded.
[2] Brief note on data preprocessing: as Etzel et al. (2009) indicate, many steps used in
univariate analysis take part in MVPA as well. Correction for motion and normalization are typically
used, while voxel-wise detrending (to correct scanner drift, for example) may be controversial due to
the delicate nature of the data needed for MVPA. The reader will find a great review of how to undertake a
classification analysis of fMRI data in the quoted article.
The overall majority of the reviews and articles consulted on technical aspects of MVPA
make particularly clear the necessity of splitting training data and testing data as the first step to
be taken (Pereira et al., 2009; Kriegeskorte et al., 2009; O'Toole et al., 2007; Mur et al., 2009).
It is also important not to use the testing data as part of the feature selection. The reasons behind
these precautions will be addressed later on; they must be clarified here because the
illustration fails to make this particular aspect evident.


[Illustration of the MVPA procedure][3]



[3] Illustration from Norman et al. (2006). It includes a fourth step (b, or pattern assembly) that is rarely
mentioned. It does not appear in other reviews, such as Mur et al. (2009), and involves the labelling of activity
patterns. Since the patterns are caused by discrete stimuli that we have set ourselves, there is no need to
label them. It may be pertinent in exploratory analyses, where the source of the pattern may be
unknown. Believe it or not, this is the best illustration of the process available, and it is repeatedly used.
Overview

The process outlined in the graph above includes three main steps, which will be described
next (remember that data splitting has already taken place):

1. Feature selection (or voxel selection): This first step tries to delimit the set of
voxels that will be used further on. It can be done following different techniques. In
this case, bottles and shoes were shown in order to select the pertinent voxels.

2. Classifier training: In this stage the training data is used to train our classifier
algorithm, so that it establishes a successful function between our examples (or
stimuli) and their characteristic activity patterns. Once training has finished, the
classifier generates a decision plane.

3. Classifier or generalization testing: The classifier is exposed to new data (the testing
data set) that belongs to the same categories. In our example, these will be images of
shoes and bottles not presented previously. The activation patterns are
submitted to the classifier, which assigns them a position relative to the decision
plane. Based on where they fall and the identity of each example, the
classification is judged successful or not.

Feature Selection

Also called voxel selection, this is a capital step in MVPA, because it defines the
framework and extent of the analysis. It is also one of the steps that presents many pitfalls, as
we will see in the section devoted to that purpose.

First of all, Why is feature selection necessary?

Many articles disregard this fundamental question, which arises easily. The mere presence
of a voxel selection appears to contradict the foundation of MVPA. If taking into account
the interactions between voxels is the goal to achieve, narrowing down the number of voxels
included in our analysis seems nonsensical. Nonetheless, as Cox and Savoy
(2003) point out, many classifiers experience an inherent loss of accuracy when the number of
voxels included in the analysis is very high. While MVPA's power mainly resides in taking into
account voxels whose activity is not necessarily significant, adding irrelevant voxels whose
activity mainly reflects noise, or is very low, significantly affects the performance of the classifier.
In spite of all this, methods and applications that allow the usage of whole-brain activity have been
described by Tong and Pratte (2012).
Normally, these whole-brain studies deal with high-level cognitive processes, which are
not easy to narrow down to a specific set of ROIs. Therefore, these researchers use
independent component analysis (ICA) or principal component analysis (PCA) to reduce
the number of dimensions.
Put simply, each is a method that reduces the number of variables to deal with
by grouping them into linear combinations that are as unrelated as possible.
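As a rough sketch of what such dimensionality reduction does, PCA can be written in a few lines of NumPy; the data matrix below is synthetic and stands in for real multi-voxel patterns (all sizes are arbitrary assumptions):

```python
import numpy as np

# Synthetic "whole-brain" data: 20 examples x 500 voxels (assumed sizes).
rng = np.random.default_rng(0)
data = rng.standard_normal((20, 500))

# PCA by SVD: centre the data, then project onto the leading components,
# which are uncorrelated linear combinations of the original voxels.
centred = data - data.mean(axis=0)
U, s, Vt = np.linalg.svd(centred, full_matrices=False)
n_components = 5
reduced = centred @ Vt[:n_components].T  # 20 examples x 5 dimensions

print(reduced.shape)  # (20, 5)
```

A classifier would then be trained on `reduced` rather than on the full voxel matrix, sidestepping the accuracy loss that comes with very large voxel counts.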

So, given the necessity to do so, feature selection will provide us with the voxels we are
going to include in our analysis. First of all, it is very common to select a region of interest (ROI)
relevant to our study, within which our feature selection is going to take place (Haxby et al., 2001;
Mur et al., 2009; Chadwick et al., 2010). Following Pereira et al. (2008), we can distinguish
between filtering and wrapper feature-selection methods. Wrapper methods add and subtract
voxels taking into consideration the impact they have on the classifier's
performance; however, these methods involve combinatorial issues that make computation
complicated (Norman et al., 2006), so filtering methods are normally preferred. Filtering
involves creating a voxel ranking based on a specific criterion. We can then rank voxels based on
how active they are, how strong their discrimination power between conditions is, their
prediction accuracy, the consistency of their activation, and so on.
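A minimal sketch of such a filtering step, on synthetic patterns; the t-like separation score used below is one assumed choice among the several ranking criteria just listed:

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic patterns: 40 examples x 200 voxels, two conditions (labels 0/1).
X = rng.standard_normal((40, 200))
y = np.repeat([0, 1], 20)
X[y == 1, :10] += 3.0  # only the first 10 voxels carry a condition difference

# Filtering: score every voxel separately (a univariate step), then keep
# the k best-ranked voxels for the multivariate analysis.
mean_diff = X[y == 1].mean(axis=0) - X[y == 0].mean(axis=0)
pooled_sd = np.sqrt(0.5 * (X[y == 1].var(axis=0) + X[y == 0].var(axis=0)))
scores = np.abs(mean_diff) / (pooled_sd + 1e-12)
top_k = np.argsort(scores)[::-1][:10]  # indices of the 10 highest-scoring voxels

print(sorted(top_k))
```

With a strong simulated effect, the ranking recovers (nearly all of) the informative voxels; with weak effects and many noise voxels, the ranking becomes unreliable, which is exactly the regime where filtering choices matter.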
It is important nonetheless to realize that by filtering we are considering voxels as
separate entities again (so we are performing univariate analysis). A popular option is to
focus on voxels which show maximum activity and hold good discriminant power (Polyn et al.,
2005, as quoted in Norman et al., 2006). With certain classifiers, a multivariate feature selection
called searchlight accuracy can be used (Kriegeskorte et al., 2006). This method tries to add
the information from each voxel's environment (neighbouring voxels) by defining a spherical cluster,
a ball of voxels of a given radius. The data is used repeatedly so that the useful voxels
can be detected within the radius of the spherical cluster.
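A much-simplified searchlight can be sketched on a one-dimensional "cortex" of synthetic voxels; a leave-one-out nearest-centroid score stands in for a full classifier, and all sizes (radius, voxel counts) are assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
# Toy 1-D "cortex" of 60 voxels, 20 examples per condition.
X = rng.standard_normal((40, 60))
y = np.repeat([0, 1], 20)
X[y == 1, 25:35] += 1.5  # information confined to voxels 25..34

def sphere_accuracy(X, y, centre, radius=3):
    """Leave-one-out nearest-centroid accuracy inside one searchlight sphere."""
    idx = np.arange(max(0, centre - radius), min(X.shape[1], centre + radius + 1))
    correct = 0
    for i in range(len(y)):
        train = np.ones(len(y), bool)
        train[i] = False  # hold out example i
        c0 = X[np.ix_(train & (y == 0), idx)].mean(axis=0)
        c1 = X[np.ix_(train & (y == 1), idx)].mean(axis=0)
        pred = 0 if np.linalg.norm(X[i, idx] - c0) < np.linalg.norm(X[i, idx] - c1) else 1
        correct += pred == y[i]
    return correct / len(y)

# Slide the sphere across every centre voxel, yielding an accuracy map.
acc_map = np.array([sphere_accuracy(X, y, c) for c in range(60)])
print(acc_map[28:32].round(2))
```

Spheres centred on the informative voxels score far above the roughly 50% seen elsewhere, which is how the searchlight localizes informative regions without a prior ROI.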


Classifier Training

This step involves using the trials that we saved for training to supply examples of the
activation patterns (characteristic of our experimental conditions) to a multivariate algorithm.
This algorithm will learn the statistically representative features of our stimuli, and will generate
a decision function (1c in the previous graph) that will be used to make a call each time a stimulus
is presented in the testing phase.
The choice of algorithm is one of the decisive steps in MVPA. There is a
great variety of classifiers available, and even a not-so-exhaustive discussion of them would take
the whole length of this essay. However, we cannot proceed without at least a brief discussion of this point.

What is a classifier algorithm?

The easiest way to introduce this notion is to describe the task that classifiers perform.
Classifiers have to identify the relationship between the voxels' activity and the appearance
of a stimulus, and be able to recognize that relationship in the future with unpresented stimuli
of the same category.


Thus, classifiers obtain a parametric profile of the activity pattern elicited by the stimulus
or example. These parameters are acquired during the training phase with the data reserved for
that purpose. When training is finished, the classifier is supposed to be able to give us a
prediction (or identification, or discrimination; that lies in the researcher's assumption[4]) of which stimulus
has been presented to a given subject. This prediction must be based on a different set of
examples than the one used for training; otherwise there would be a problem of overfitting
(see the section on limitations).

Classifiers differ in the type of function they learn (Pereira et al., 2008). Primarily,
algorithms can be divided into linear and non-linear ones. The overall majority of MVPA
studies have used linear classifiers due to their success, according to Mur et al. (2009).
Additionally, non-linear classifiers have not consistently demonstrated superior performance in
any case to date, according to the same authors, who also consider the solutions
offered by these classifiers difficult to interpret. Sheng (2011) suggests that one of the keys to
linear classifiers is their simplicity and their ability to balance the influence of specific voxels
across examples or stimuli. All linear classifiers elaborate a weighted model that
reflects the importance of the different voxel activity values. In the illustration below, each voxel
(represented by x) is assigned a specific weight (w). In a hypothetical situation, the category
"shoe" could be defined by x·w > 0 and the class "bottle" by x·w < 0.
[Illustration of the weighted linear model][5]
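To make the weighted model concrete, here is a toy sketch; the weights, activity values and category names are invented for illustration, not taken from any real study (in practice the weights come out of training):

```python
import numpy as np

# A trained linear classifier is just one weight per voxel plus a threshold:
# decide "shoe" if x . w > 0, "bottle" otherwise (labels are illustrative).
w = np.array([0.8, -0.5, 0.3])  # hypothetical learned voxel weights
patterns = np.array([
    [1.0, 0.2, 0.5],  # strong voxel-1 activity -> positive side
    [0.1, 1.2, 0.0],  # strong voxel-2 activity -> negative side
])
decisions = patterns @ w                      # one weighted sum per pattern
labels = np.where(decisions > 0, "shoe", "bottle")
print(labels)  # ['shoe' 'bottle']
```

The entire "model" is the weight vector: classification is a single dot product per pattern, which is part of why linear classifiers are so easy to apply and interpret.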

Aspects to consider when choosing a classifier are the number of features or voxels
and the number of categories, among other factors not described here. Popular linear classifiers
are GNB, LR and SVM. Compared among them, GNB tends to perform poorly in
settings with many voxels, and LR obtains better results than SVM in situations with more than two
conditions or categories (Pereira et al., 2008). SVM requires additional modifications to
work with more than two categories (our case study uses one of these classifiers).

Decision-making threshold

Taking the simplest situation, in which we have two features or voxels, we can observe
how the decision-making process takes place.
These voxels (there is a slightly similar example in Mur et al., 2009) could work as
coordinates defining a point on a plane, and we would only need to build a line to separate
our conditions.

[4] Note that, despite being left to the researcher's intention, identifying and discriminating are not the same thing
whatsoever. Identifying means positively specifying which category an activity pattern belongs to without
having presented it to the classifier before (Kay et al., 2008), while discriminating may mean just
distinguishing between two examples (Chadwick et al., 2010).
[5] Illustration from Pereira et al. (2008).
This line is constructed with the feedback that the classifier is provided with while
being trained on the training-data examples. Thus, given an example during the testing phase, the
x·w model is submitted to the decision function that has already been constructed
throughout the training phase, and that works as our linear threshold to determine which
category has been presented.

[Illustration of linear and non-linear decision thresholds (a, b, c)][6]




The illustration above gives us a chance to explain part of the potential of MVPA.
In the first situation (a), we see how the two distributions are completely segregated in a rather
simple way, as voxel X1 (let's say blue) and voxel X2 (let's say red) have opposed
activations. When condition A is presented, voxel X1 shows activation while X2 is inactive.
In this situation the usage of univariate analysis would yield optimal results; there is no overlap
between conditions.
The panel on the right, however, displays a more complex situation (b). It can be
approached nonetheless by using MVPA with a linear classifier, which, by assigning weights to
each voxel, is able to code their influence. Then, given a specific point on the plane,
the decision threshold allows us to determine which condition or example was more likely
to have occurred.
The last of the three situations in our illustration (c) can be tackled only with the help of a
non-linear classifier. The idea is the same as with linear ones, but in this case the decision
threshold is more complex.

Although non-linear classifiers might be more powerful, most texts are not very
enthusiastic about their utilization in one way or another (Kamitani and Tong, 2005; Pereira et
al., 2008; O'Toole et al., 2007; Norman et al., 2006, and others). As we said, these methods
are considered to yield results that are very difficult to interpret, and the gain in performance
from using non-linear classifiers is unclear.

[6] Illustration from Cox and Savoy (2003).


Classification by Nearest-neighbour

This method is one of the simplest, as it does not properly involve the learning of a function.
The example presented is compared to the ones already seen in the training
stage, and a decision is made based on the likeness between the training and the testing
examples. Nearest-neighbour performance can be improved by averaging the patterns left by
the training examples, but again this removes variability that might be
valuable for making a decision. According to Pereira et al. (2008), nearest neighbour works well as
long as the number of voxels remains relatively low. This classification system was used in
Haxby et al. (2001).
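A minimal nearest-neighbour classifier in this spirit can be sketched as follows; the training patterns are synthetic and the category names purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)
# Synthetic training patterns for two categories, 15 examples x 50 voxels each.
train = {"shoe": rng.normal(1.0, 1.0, (15, 50)),
         "bottle": rng.normal(-1.0, 1.0, (15, 50))}

def nearest_neighbour(pattern):
    """Label a test pattern by its single closest training example."""
    best_label, best_dist = None, np.inf
    for label, examples in train.items():
        d = np.linalg.norm(examples - pattern, axis=1).min()
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label

test_pattern = rng.normal(1.0, 1.0, 50)  # drawn from the "shoe" distribution
print(nearest_neighbour(test_pattern))   # prints "shoe"
```

No function is learned at all: the training set itself is the model, which is why the method degrades when the number of voxels (and hence the distance computation's noise) grows large.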


Generalization Testing

Up to this point, the last step is just to test the classifier by exposing it to new,
unpresented data. The comparison between the presentation template and the predictions
yielded by our classifier gives an accuracy percentage. The classifier has thus made a
judgement in each case, saying which of the conditions has been presented. If it achieves values
beyond chance, training has been successful.
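The accuracy computation itself is trivial; a sketch with made-up predictions (the labels and guesses below are illustrative, not from any real experiment):

```python
# Compare the classifier's guesses with the presentation template; accuracy
# is the proportion of correct guesses, judged against chance level
# (1 / number of categories).
predictions = ["shoe", "bottle", "shoe", "shoe", "bottle", "bottle", "shoe", "bottle"]
truth       = ["shoe", "bottle", "shoe", "bottle", "bottle", "bottle", "shoe", "shoe"]

accuracy = sum(p == t for p, t in zip(predictions, truth)) / len(truth)
chance = 1 / 2  # two categories
print(accuracy, accuracy > chance)  # 0.75 True
```

In a real study one would also test whether the excess over chance is statistically significant given the number of test trials, rather than eyeballing the percentage.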

Limitations and Pitfalls using MVPA

Like every method, MVPA has several weaknesses, some more avoidable
than others. Technical limitations due to spatial or temporal resolution are difficult to avoid
(the temporal resolution of MVPA is inevitably limited by the dispersion of the hemodynamic
response; Norman et al., 2006), whereas others, like feature selection or classifier choice, can
be controlled with the help of a good decision-making process.

Capacity to deal with overlapping states

For example, as we have seen, one of the strengths of MVPA is disentangling the
activation patterns (spatial patterns) produced by two different stimuli or mental states that have
taken place (Cox and Savoy, 2003). By contrast, as Haynes and Rees (2006) point out, there is
currently no evidence that MVPA can distinguish between two stimuli that
occur at the same time and whose spatial representations share the same set of neurons.
It can be argued that this limitation might be solved with the arrival of higher spatial
resolution, but as Haxby (2012) states, there is a necessary limit in the number of modules that
can support only one kind of processing.


A limited number of categories for an unlimited world

In a logical extension of this reasoning, Haynes and Rees (2006) claim that while
percepts or forms of stimulation are virtually infinite, the number of training categories is
obviously limited. Hence, our classifier will always be limited to a certain number of
discriminations. Attempts to work on this issue could come from studies that deal with the
generalization problems of MVPA, like the one carried out by Kay et al. (2008). The classifier
in that report shows remarkable generalization when exposed to numerous unpresented
images, reaching high prediction rates based on a training set of 1,750 images.


The presence of previous knowledge

Although, as we said, whole-brain analysis is possible (Polyn et al., 2005, as
quoted in Haynes and Rees, 2006), it certainly involves many challenges that are difficult to resolve
(combinatorial limitations, overfitting...). A plausible alternative could be the usage of
searchlight feature analysis, which is supposed to alleviate the potential computational issues
(Tong and Pratte, 2012). Thus, using MVPA implies having a reasonable knowledge of the
features to study, and at least some guidance on where to find them. As Pereira et al.
(2008) mention, the definition of ROIs is a common step in the overwhelming majority of MVPA
studies. This particular issue is supposed to have a lower impact in systems whose functional
architecture is relatively well known (the visual system, according to Haynes and Rees, 2006), but stands
as a remarkable issue for other cognitive functions whose functional basis has not yet been
properly described. Feature selection stands as one of the biggest causes of issues in
MVPA studies. According to Tong and Pratte (2012), studies on higher-level cognition have
difficulties defining a coherent set of regions of interest, and therefore correctly targeting
relevant voxel arrays.

Generalization issues

This topic is related, at the same time, to the strengths of MVPA. Pattern recognition
involves the exploitation of the so-called fine-grained activity (Norman et al., 2006).
Consequently, response patterns are highly characteristic and difficult to extrapolate to other
subjects. The pattern aroused by stimulus X in subject 1 should be, in an ideal situation,
the same as the one aroused by the same stimulus in subject 2. Currently, and while some
extrapolations have been successful (Haxby, 2011, developed hyperalignment, which includes
tuning functions, as quoted in Haxby, 2012), this is an unresolved problem.


Studies normally conduct MVPA analyses on a within-individual basis. However, the
generalization problems do not end there. As Haynes and Rees (2006) indicate, even more
complicated is generalization across different contexts. That is, in our dummy example,
subjects 1 and 2 receive the same stimulation in a similar setting, but what would
happen if the contexts surrounding those presentations were not the same?
While classification accuracy does not drop uncontrollably, a severe worsening occurs even
when the setting is the same but scans are carried out on different days (Cox and Savoy, 2003).
Generalization across different stimuli was nonetheless achieved in a working memory study
(Harrison and Tong, 2009; as quoted in Mur et al., 2009) and between subjects in an auditory
perception one (Formisano et al., 2008, as quoted in Tong and Pratte, 2012). Once again, the
study of Kay et al. (2008) stands as an example, as it demonstrated successful generalization
across time. Haynes and Rees (2006) point out that improving generalization normally takes its
cut out of individual discriminatory power.
Finally, an important aspect is to interpret mere differential activity carefully. Poldrack
(2009) found that when subjects carry out a cognitive task that varies enough, almost all of the cortex
can show discriminative power. Differential activity rates can arise for a myriad of
reasons, like slight differences in memory-processing load, difficulty, time to process or language
requirements (Tong and Pratte, 2012). Future directions include developing
calibration and adjustment protocols between subjects and situations.

Circular Analysis or Overfitting

The danger of circular analysis is a common cause of concern in MVPA studies.
Reviewing the recent literature, it is evident that most researchers are aware of this issue in at
least its simplest form, which we explain now.
Most MVPA studies have as their main goal to train a classifier able to identify a
specific activity pattern and distinguish it from others of the same or other categories (examples
of this are Haxby et al., 2001; Kay et al., 2008; Chadwick et al., 2010). To achieve this, we have stated
that separate training and testing data have to exist: the first to train the classifier and the
latter to test the classifier's performance in terms of the proportion of correct guesses. The
independence of these two data sets is understood as crucial, and they must be split in two
before proceeding to the feature-selection phase, as the first step of the whole process (Pereira et al.,
2009).
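The required ordering can be sketched as follows; the data is synthetic, and the simple mean-difference ranking stands in for whatever feature-selection method is actually used:

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.standard_normal((60, 300))  # 60 examples x 300 voxels (assumed sizes)
y = np.tile([0, 1], 30)

# Step 1 (before anything else): split the examples into training and test sets.
order = rng.permutation(60)
train_idx, test_idx = order[:40], order[40:]

# Step 2: feature selection computed on the TRAINING examples only.
diff = np.abs(X[train_idx][y[train_idx] == 1].mean(axis=0)
              - X[train_idx][y[train_idx] == 0].mean(axis=0))
voxels = np.argsort(diff)[::-1][:20]  # keep the 20 most discriminative voxels

# Steps 3-4: train on (train_idx, voxels), test on (test_idx, voxels).
# The test examples never influenced which voxels were kept.
X_train, X_test = X[train_idx][:, voxels], X[test_idx][:, voxels]
print(X_train.shape, X_test.shape)  # (40, 20) (20, 20)
```

Swapping steps 1 and 2 (selecting voxels on all 60 examples) is precisely the subtle circularity discussed below: the test set would then have leaked into the analysis before testing ever began.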
O'Toole et al. (2007) explain the reason why the training data cannot be used to test
the classifier. In fMRI studies we normally have a large number of parameters (the
voxels we take into our analysis) compared to the number of examples (stimulus presentations).
That the voxels contained in our ROIs largely outnumber our examples is relevant
because the number of parameters that characterize each activation
pattern will be enormous, and some of these parameters will contain noise (probable systematic
and unsystematic sources of error).
This overfitting, or large-parameter characterization, leads to a situation in which perfect
classification of the training set is possible. By contrast, the same classifier will obtain
poor results when tested with new data, for the very same reason.
Overfitting leads to lower accuracy on the test set and therefore to lower generalization
skills for the classifier (Mur et al., 2009).
Overfitting can lead to overestimation of the classifier as well, if the training data is used to
assess the classifier's accuracy during the testing stage (Kriegeskorte et al., 2009). In this
situation, the classifier built with many parameters can fit and identify a significant part of the
testing material, regardless of the algorithm's skill in classifying other stimuli of the same category.
Put simplistically, classification will be successful in segregating the stimuli that have already been
presented to it, but there are no guarantees that it will be successful with other stimuli, even if
they are of the same category (Mitchell, 2010).
It has been stated, then, that overfitting is related to voxel selection, because the
more voxels we select for our analysis, the bigger the number of our parameters, and the
higher the chances that low-level parameters take a leading part in our classification
performance. There is a second variant of overfitting, described by Pereira et al. (2009), that
operates on a more subtle basis. It has been settled that feature selection can be carried out in
different ways, normally involving a set of data that will help us delimit our voxel selection,
probably within a beforehand-selected ROI. The illustration below (from Kriegeskorte et al., 2009)
shows the prediction results when only the training set, or all of the data, is used to delimit the
voxel selection (ignore the dark bars labelled "task"). When all the data is used,
classification was almost at 100%, whereas using only the training data lowers that
percentage to roughly 75%.
The same table provides the comparison with random data, when all the information is used to
delimit our voxel selection and when only the training data is used. Although still above chance,
the results change significantly. The reason for this is that using all the data is a subtle way of letting the
classifier learn about our testing data. Some of the voxel time series appear compatible
between the training and the testing sets due to their belonging to a particular common
category. In a way, it can be said that the training and the testing data sets are no longer
independent (Kriegeskorte et al., 2009).
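This selection bias can even be demonstrated on pure noise, where the true accuracy is 50%; the sketch below is synthetic, with a nearest-centroid classifier standing in for whatever algorithm a real study would use:

```python
import numpy as np

rng = np.random.default_rng(5)
# Pure noise: 40 examples x 5000 voxels, arbitrary labels. True accuracy = 50%.
X = rng.standard_normal((40, 5000))
y = np.tile([0, 1], 20)
train, test = np.arange(0, 20), np.arange(20, 40)

def top_voxels(rows):
    """Rank voxels by mean condition difference over the given examples."""
    d = np.abs(X[rows][y[rows] == 1].mean(0) - X[rows][y[rows] == 0].mean(0))
    return np.argsort(d)[::-1][:30]

def centroid_accuracy(voxels):
    """Train nearest-centroid on the training half, score on the test half."""
    c0 = X[np.ix_(train[y[train] == 0], voxels)].mean(0)
    c1 = X[np.ix_(train[y[train] == 1], voxels)].mean(0)
    preds = [0 if np.linalg.norm(X[i, voxels] - c0) <
                  np.linalg.norm(X[i, voxels] - c1) else 1 for i in test]
    return float(np.mean(np.array(preds) == y[test]))

biased = centroid_accuracy(top_voxels(np.arange(40)))  # selection saw the test data
unbiased = centroid_accuracy(top_voxels(train))        # selection on training only
print(round(biased, 2), round(unbiased, 2))
```

On data that contains no information at all, the circular version comes out well above chance while the properly split version hovers around it, mirroring the random-data comparison described above.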

[Illustration from Kriegeskorte et al. (2009): classification accuracy when voxel selection uses all the data vs. the training data only]

Case of study: Chadwick, M. J., Hassabis, D., Weiskopf, N., & Maguire, E. A.
(2010). Decoding individual episodic memory traces in the human hippocampus. Current
Biology, 20(6), 544-547.


Overview

The following is a brief summary intended to orient the reader. The article itself is easy
to find and I encourage reading it (including the Supplementary Information). Doing so may help
in following the comments and remarks below, as space limitations make a fully detailed
description impossible here, and such a description could have been biased anyway.

The theoretical background of the article rests on principles of how the hippocampus
(HC) stores representations of episodic memories. More specifically, the authors note that the
HC is thought to store an index (Marr et al., 1971, as cited) of the episodic memory, containing
the guidelines needed to reconstruct a complex, multimodal memory. From the claims made
throughout the article, it is evident that the authors identify the activity pattern aroused when the
memory is recalled with this index.
The experimental task consisted primarily of determining whether the representations
(that is, activation patterns) of memories from three different videos could be predicted, and
therefore distinguished, by a trained classifier. Each video shows a different woman performing
a basic action (posting a letter, throwing a can into a trash bin), and each was viewed 15 times
by every subject prior to scanning.

Basically, the subjects performed recalling tasks under two conditions, with
measurements taken from three regions of interest (ROIs): the HC, the entorhinal cortex (EC),
and the parahippocampal gyrus (PHG). In the first condition they were instructed which of the
three memories to recall, while in the second the recall was free, with subjects instructed to
randomize across the three memories.
In both conditions the multivariate analysis yielded significant decoding rates, with no
statistical differences between conditions, and the data from both conditions was eventually
collapsed for further analysis. The illustration below (taken from Chadwick et al., 2010)
summarizes the decoding rates for each area [mean hippocampus accuracy of 44% (p = 0.000001;
chance level = 33%), mean entorhinal cortex accuracy of 38.5% (p = 0.009), and mean
parahippocampal gyrus accuracy of 41% (p = 0.0004)].




Multivariate Classification Procedure


The graph above summarizes the multivariate classification procedure (taken from
Chadwick et al., 2010, Supplementary Information). In the illustration only two of the three
videos are shown, for the sake of simplicity, as the authors note. Panel A shows image captures
from two of the videos (each video lasting 7 s); B shows a stimulus template. It is important to
bear in mind that the stimuli are not the videos themselves, but the recall of them. Thus, a
sequence such as ABBBAA... denotes an evocation of video A, then of video B, and so on. C
describes the process of feature selection using the searchlight multivariate method for each
ROI (described in the feature selection section of this essay). The data was split into a training
set and a testing set, and only the training set was used for voxel selection. The authors used a
k-fold cross-validation strategy in which, in each fold, one example is left out for testing and the
rest are used for training, with the searchlight feature selection repeated each time on the
different training data. In D, once the voxel selection is complete, the linear SVM classifier is fed
with the training examples and afterwards tested on the example saved for testing purposes.
Finally, E shows the test data (under this k-fold regime every example is used as test data
exactly once), which is used to determine the classifier's accuracy. Predictions are then
compared with the video actually evoked to establish the accuracy percentages.


Criticism and Remarks

If I may, let me first point out that, as far as I know, this is the first critique written about
this article. I have tried my best to apply the knowledge obtained while writing this essay, which
I have enjoyed as much as I have suffered. The attempt, then, is to go a bit further than a mere
introduction of the technique, and also to glance at how it is applied outside the strictly
machine-learning environment. Finally, I must apologize in advance for any reckless criticisms I
may make.


Multivariate Classification Procedure & Claims of functional differentiation

This experiment is particularly difficult to picture, and it is important to keep a set of
assumptions in mind. Firstly, the examples used to train and test the classifier are activity
patterns arising from the recall of the three videos.
Therefore, when the data set was split into testing and training data in each fold, the
activity patterns from the three videos were separated into testing data (a single activity pattern)
and training data (all the rest). All of these activity patterns can be regarded as different, since
they are evoked at different times and because of the reconstructive nature of memory recall.
This characteristic of memory recall makes the training data (the illustration above showed only
9 examples; with three videos we might expect around 14 per fold) clearly insufficient, given the
high similarity between the videos.


The limited number of examples per stimulus, together with the similarity between them,
probably led the linear classifier to acquire a small number of strong, overlapping parameters,
while adding a great number of voxels whose activity was barely consistent from trial to trial.
That barely consistent activity contained the fine aspects that could have distinguished the
three stimuli, and it may have helped raise the classification accuracy by periodically saturating
a series of parameters. It is unclear why the authors selected three memories so similar to each
other if their intention was to distinguish memory traits (each video features a different woman
who performs a similar action and then walks away). For all the reasons above, a singular
variety of overfitting may have taken part, with accuracy ratings that barely surpassed chance
levels.
A possible additional reason for this relatively poor performance is the unequal number
of examples. When one example is reserved for testing, the other two categories each have
one example extra. While this might look trivial, Pereira et al. (2009) point out that the classifier
can tend to favour the category with more examples, and therefore tend to predict it more
frequently.
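This imbalance effect can be illustrated on pure noise, where leave-one-out cross-validation always under-represents the held-out example's class during training. The sketch below is hypothetical (synthetic data, arbitrary sizes, scikit-learn as the classifier library); it is not a reanalysis of the study's data.

```python
import numpy as np
from sklearn.model_selection import LeaveOneOut
from sklearn.svm import LinearSVC

rng = np.random.default_rng(2)
X = rng.standard_normal((30, 100))  # pure noise: no class information at all
y = np.repeat([0, 1, 2], 10)        # three balanced classes overall

hits = 0
for train, test in LeaveOneOut().split(X):
    # In every fold the held-out example's class has 9 training examples
    # while the other two classes have 10: a small but systematic imbalance.
    clf = LinearSVC().fit(X[train], y[train])
    hits += clf.predict(X[test])[0] == y[test][0]

# On noise this tends to land at or below the 0.33 chance level, because the
# classifier slightly favours the classes that are better represented.
print(f"LOO accuracy on noise: {hits / len(y):.2f}")
```

The bias is small in any single run, but it is systematic, which is why balanced designs or permutation-based chance estimates are commonly recommended.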

Our second issue concerns the following statement: "Our data provide further evidence
for functional differentiation within the medial temporal lobe, in that we show the hippocampus
contains significantly more episodic information than adjacent structures". The authors attach
functional claims to the classification percentages, which show slightly better performance in
the hippocampal area. There is no doubt that when memories are evoked some neurons in the
hippocampus show activation. Going further, there is weak evidence that they can discriminate
between memories when a trained classifier is used, but there is no evidence that those
neurons carry episodic information. This is an example of reverse inference (Poldrack, 2006).
Finally, a similar and more suggestive setting could have been the use of a classifier to
try to distinguish between memories never presented to it. Such an experiment would have
entailed setting aside the examples of those memories before feature selection, as they would
be used only for testing purposes. The feature-selection and training data could be kept similar
to ensure that the voxel selection contains the pertinent voxels.




References


Chadwick, M. J., Hassabis, D., Weiskopf, N., & Maguire, E. A. (2010). Decoding
individual episodic memory traces in the human hippocampus. Current Biology, 20(6),
544-547.

Cox, D. D., & Savoy, R. L. (2003). Functional magnetic resonance imaging (fMRI) "brain
reading": detecting and classifying distributed patterns of fMRI activity in human visual
cortex. Neuroimage, 19(2), 261-270.

Davatzikos, C., Ruparel, K., Fan, Y., Shen, D. G., Acharyya, M., Loughead, J. W., ... &
Langleben, D. D. (2005). Classifying spatial patterns of brain activity with machine
learning methods: application to lie detection. Neuroimage, 28(3), 663-668.


Downing, P. E., Wiggett, A. J., & Peelen, M. V. (2007). Functional magnetic resonance
imaging investigation of overlapping lateral occipitotemporal activations using multi-voxel
pattern analysis. The Journal of Neuroscience, 27(1), 226-233.

Downing, P. E., Chan, A. Y., Peelen, M. V., Dodds, C. M., & Kanwisher, N. (2006).
Domain specificity in visual cortex. Cerebral cortex, 16(10), 1453-1461.

Etzel, J. A., Gazzola, V., & Keysers, C. (2009). An introduction to anatomical ROI-based
fMRI classification analysis. Brain Research, 1282, 114-125.

Hanson, S. J., Matsuka, T., & Haxby, J. V. (2004). Combinatorial codes in ventral
temporal lobe for object recognition: Haxby (2001) revisited: is there a face
area?. Neuroimage, 23(1), 156-166.

Haxby, J. V., Gobbini, M. I., Furey, M. L., Ishai, A., Schouten, J. L., & Pietrini, P. (2001).
Distributed and overlapping representations of faces and objects in ventral temporal
cortex. Science, 293(5539), 2425-2430

Haxby, J. V. (2012). Multivariate pattern analysis of fMRI: The early
beginnings. NeuroImage, 62(2), 852-855.

Haynes, J. D., & Rees, G. (2006). Decoding mental states from brain activity in humans.
Nature Reviews Neuroscience, 7(7), 523-534.


Kamitani, Y., & Tong, F. (2005). Decoding the visual and subjective contents of the
human brain. Nature neuroscience, 8(5), 679-685.


Kay, K. N., Naselaris, T., Prenger, R. J., & Gallant, J. L. (2008). Identifying natural
images from human brain activity. Nature, 452(7185), 352-355.


Kriegeskorte, N., Simmons, W. K., Bellgowan, P. S., & Baker, C. I. (2009). Circular
analysis in systems neuroscience: the dangers of double dipping. Nature Neuroscience,
12(5), 535-540.

Kriegeskorte, N., Goebel, R., & Bandettini, P. (2006). Information-based functional brain
mapping. Proceedings of the National Academy of Sciences of the United States of
America, 103(10), 3863-3868.

Norman, K. A., Polyn, S. M., Detre, G. J., & Haxby, J. V. (2006). Beyond mind-reading:
multi-voxel pattern analysis of fMRI data. Trends in cognitive sciences, 10(9), 424-430.

Nishimoto, S., Vu, A. T., Naselaris, T., Benjamini, Y., Yu, B., & Gallant, J. L. (2011).
Reconstructing visual experiences from brain activity evoked by natural movies. Current
Biology, 21(19), 1641-1646.

Mur, M., Bandettini, P. A., & Kriegeskorte, N. (2009). Revealing representational content
with pattern-information fMRI: an introductory guide. Social Cognitive and Affective
Neuroscience, 4(1), 101-109.

Mitchell, T. M. (2008, January). Computational models of neural representations in the
human brain. In Discovery Science (pp. 26-27). Springer Berlin Heidelberg.

O'Toole, A. J., Jiang, F., Abdi, H., Pénard, N., Dunlop, J. P., & Parent, M. A. (2007).
Theoretical, statistical, and practical perspectives on pattern-based classification
approaches to the analysis of functional neuroimaging data. Journal of Cognitive
Neuroscience, 19(11), 1735-1752.

Pereira, F., Mitchell, T., & Botvinick, M. (2009). Machine learning classifiers and fMRI: a
tutorial overview. Neuroimage, 45(1), S199-S209

Reddy, L., Tsuchiya, N., & Serre, T. (2010). Reading the mind's eye: decoding category
information during mental imagery. Neuroimage, 50(2), 818-825.

Spiers, H. J., & Maguire, E. A. (2007). Decoding human brain activity during real-world
experiences. Trends in cognitive sciences, 11(8), 356-365.

Poldrack, R. A., Halchenko, Y. O., & Hanson, S. J. (2009). Decoding the large-scale
structure of brain function by classifying mental states across individuals. Psychological
Science, 20(11), 1364-1372.

Polyn, S. M., Kragel, J. E., Morton, N. W., McCluey, J. D., & Cohen, Z. D. (2012). The
neural dynamics of task context in free recall. Neuropsychologia, 50(4), 447-457.

Rosenberg, M., List, A., Sherman, A., Grabowecky, M., Suzuki, S., & Esterman, M.
(2012). Decoding EEG data reveals dynamic spatiotemporal patterns in perceptual
processing. Journal of Vision, 12(9), 1173-1173.


Sheng, L. I. (2011). Multivariate pattern analysis in functional brain imaging. Acta
Physiologica Sinica, 63(5), 472-476.

Tong, F., & Pratte, M. S. (2012). Decoding patterns of human brain activity. Annual
Review of Psychology, 63, 483-509.
