This work describes and analyses MVPA step by step. It also discusses the current literature on the topic, as well as future challenges. This essay was written for an MSc degree qualification in Neuroimaging techniques.
If you want to reproduce any of the materials in it, please ask for permission by sending an email to christof3.14@gmail.com
Original Title
Multivoxel Pattern Analysis (MVPA) in fMRI settings : Fundamentals & Case of study
Mario B. Perez, 12/12/2013
Multi-voxel Pattern Analysis (MVPA) in fMRI settings: Fundamentals & Case of study.
by Mario B. Perez
Introduction
The rise of MVPA as an analysis technique for fMRI BOLD data is yet to come. Authors like Haxby (2012) have pointed out the initial complexity and uniqueness of the MVPA perspective on brain responses, and the slow adaptation of the research community to this new way of thinking, which involves knowledge of machine learning methods. Unlike univariate techniques, which normally address where a cognitive process is localized, MVPA (which has many synonyms: multivariate pattern analysis, information-pattern analysis...) can give an additional answer about how it is coded. A further interesting feature is its ability to clarify the common situations in which two different processes overlap in their use of brain areas, sharing the same resources for divergent purposes (Peelen and Downing, 2006). The main aspect that makes MVPA a qualitative jump in fMRI BOLD processing is that it accounts for the interactions between individual voxels and, as its name announces, detects and refines these interactions into patterns of activation. These activation patterns can be elicited by any given process in the brain, and can then be labelled and recognised when they appear again (Tong and Pratte, 2012). Although, as we will see, there are many possible flaws in this process, many impressive and eccentric applications have gradually flourished on this basis, from the famous brain reading or brain decoding (Reddy et al., 2010) to lie detection (Davatzikos et al., 2005) or even natural scenes (Nishimoto et al., 2011). Given that the use of MVPA normally entails the selection of regions of interest (ROIs), the visual system has been the main target of the studies undertaken to date, owing to its relatively well-known functional structure.
In the literature on this topic, the pioneering work of Haxby (2001) on visual category recognition and Kamitani and Tong's (2005) prediction study on grating orientation are quoted very often, and are considered responsible for the spread of MVPA. The remarkable work of Kay et al. (2008) on image identification is also influential. Although it has been applied to other sources of data such as EEG (Rosenberg et al., 2012), MVPA has mainly been applied to fMRI data. Its novelty and distinctness from conventional methods of analysis, together with the promising prospects for its use and its striking preliminary applications, make MVPA the main topic of this essay. In the following sections we try to set out the basis that underlies MVPA, take a look at the most common flaws in its use, and finally review a case study in which this methodology was used.
Characterizing Multi-voxel Pattern Analysis
As with everything else, probably the best way of introducing a new technique is to compare it with the pre-existing ones. Traditionally, when a researcher wants to know which areas are involved in a particular task or setting, the analysis of the fMRI data considers each voxel individually, although the signal is acquired at once from the whole brain (Haynes and Rees, 2006). Thus, the time course of each voxel's activity is treated as unrelated to that of the others, and its analysis is carried out without considering the possibility that other active or inactive voxels may be relevant to it. The behaviour of other voxels may give meaning to that voxel's activation, and therefore contribute to drawing the whole picture (Haxby, 2012). Although praised and recognised as extremely useful (Norman et al., 2006), this mass univariate analysis has shown its limitations (O'Toole et al., 2007), as there are limits to what can be learned by examining voxels in isolation. By contrast, MVPA performs a multivariate analysis that takes into account the relations and differences in activity between voxels that arise from complex stimulation settings. MVPA is therefore not exclusively aimed at determining which voxels are active, but at how the activation of different voxels is related: the so-called activity patterns. These activity patterns carry valuable information about how a stimulus or percept is coded, and give fMRI analysis an enhanced sensitivity to cognitive processes (Tong and Pratte, 2012). MVPA is dedicated to this pattern-recognition activity.
This enhanced sensitivity is obtained by avoiding certain steps that are part of conventional fMRI analysis settings (e.g. block designs), such as spatial smoothing and averaging to intensify the differences between experimental conditions (Norman et al., 2006). Standard studies try to show that the average activity during one condition is significantly different from that during another condition across all time points, and therefore the information about brain activity at a specific time point is lost (Haynes and Rees, 2006). This is especially relevant in experiments that use complex stimulation, such as natural scenes, because averaging discards the fine-grained activity that, along with a certain amount of noise, might carry valuable information about how the stimulus is processed (Spiers and Maguire, 2007). Other processing steps such as spatial smoothing are also responsible for this blurring effect, making fine-grained activity unavailable (Mur et al., 2009). However, as Etzel et al. (2009) point out, spatial smoothing can be useful in between-subjects analysis, where MVPA has shown difficulties due to the high specificity of the within-subject signal (Cox and Savoy, 2003), which entails generalization issues. Haynes and Rees (2006) argued that the signal lost through MVPA-unfriendly processing steps could account for important features, because voxels that show a weak or inconsistent response might carry vital information when analysed together rather than separately. This suggests an interesting idea about how some cognitive processes could work: the weak choral activation of many voxels might be potentially as useful as a strong (significant) activation of an individual voxel.
As an example of this, Kamitani and Tong (2005), in their famous study on orientation detection, revealed that many voxels show patterns of weak activation on a consistent basis across repetitions of the same condition, providing a reasonable basis for this argument. Moreover, the remarkable work on category recognition carried out by Haxby (2001) showed that weaker or submaximal voxels are representative of each category (in this case shoes or bottles). More importantly, this study showed that even if the voxels with the highest consistent activation are removed, the categorical fingerprint can be identified above chance. This recognition accuracy when category-key voxels were set aside implies that areas showing lower levels of activation can be used to discriminate between two categories. Because these categories share some high-activity areas, the overlap of two functions could be resolved by selecting low-level features (Peelen and Downing, 2006). Downing et al. (2005) also provide a list of overlapping category areas in which MVPA could be useful. Although distinguishing between two activity patterns based on activation intensity is possible (Hanson et al., 2004), research on overlapping issues needs to continue. MVPA has also been described as a major advance in information extraction from the fMRI signal (Norman et al., 2006) and a necessary tool to avoid wasting neuroimaging data, which are normally expensive and difficult to acquire (O'Toole, 2007). Thus, as we said previously, pattern analysis does not use processing steps that rule out potentially crucial information. Instead of using those strategies, MVPA tries to make the most of that fine-grained activity (footnote 1) by defining the activation pattern of a voxel ensemble in a given example. Examples are presentations of our stimuli that will provide activation patterns to our classifier algorithm.
Once our classifier has been trained with several examples, it will in theory be ready to recognise which example has been presented to it. In a sense, the classifier holds a weighted model of the activation pattern characteristic of, for example, an object category. All this information might be a little unclear when compressed into such a brief description, but in the next sections we will address what a classifier is and how the process takes place.
MVPA basic procedure
The graph below summarizes the procedure for carrying out an MVPA experiment. There are a few remarks that have to be made before getting into detail, first and foremost about training data, testing data and feature selection. Preprocessing steps, as well as scanning details, are not taken into account in this essay (footnote 2).
The great majority of reviews and articles consulted on technical aspects of MVPA make it particularly clear that splitting the data into training data and testing data must be the first step (Pereira et al., 2009; Kriegeskorte et al., 2009; O'Toole et al., 2007; Mur et al., 2009). It is also important not to use the testing data as part of the feature selection. The reasons for these precautions will be addressed later on, but they must be made clear here because the illustration fails to reflect this particular aspect.
1 The exploitation of this fine-grained activity represents one of the central features of MVPA, as low-level activity or activity that does not reach significance might be lost if this fine-grained activity were disregarded.
2 Brief note on data preprocessing: As Etzel et al. (2009) indicate, many steps that are used in univariate analysis take part in MVPA as well. Motion correction and normalization are typically used, while voxel-wise detrending (to correct scanner drift, for example) might be controversial due to the delicate nature of the data needed for MVPA. The reader will find a great review of how to undertake a classification analysis of fMRI data in the cited article.
[Figure: summary of the MVPA procedure (see footnote 3)]
3 Illustration from Norman et al., 2006. It includes a fourth step (b, or pattern assembly) that is rarely mentioned. It does not appear in other reviews such as Mur et al., 2009, and involves the labelling of activity patterns. Since patterns are caused by discrete stimuli and we have set those stimuli, there is no need to label the patterns. It may be pertinent in exploratory analysis, where the source of the pattern may be unknown. Believe it or not, this is the best illustration of the process available, and it is used repeatedly.
Overview
The process outlined in the graph above includes three main steps that will be described next (remember that data splitting has already taken place):
1. Feature selection (or voxel selection): This first step tries to delimit the set of voxels that will be used further on. It can be done following different techniques. In this case, bottles and shoes were shown to select the pertinent voxels.
2. Classifier training: In this stage the training data are used to train our classifier algorithm, so that it establishes a successful function between our examples (or stimuli) and their characteristic activity patterns. Once training has finished, the classifier generates a decision plane.
3. Classifier or generalization testing: The classifier is exposed to new data (the testing data set) belonging to the same categories. In our example, these will be images of shoes and bottles not presented previously. The activation patterns are submitted to the classifier, which assigns them a position on the decision plane. Depending on where they fall and on the identity of the example, the classification is counted as successful or not.
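The data splitting that precedes these three steps can be sketched in a few lines. This is only an illustration under stated assumptions: holding out whole scanning runs is one common way of splitting and is not prescribed by the reviews cited here, and the function name `split_by_run`, the tuple layout and the toy patterns are all hypothetical.

```python
def split_by_run(examples, test_runs):
    """Split examples into training and testing sets by scanning run.

    Each example is a (run, label, pattern) tuple. Holding out whole runs
    is an assumption made for this sketch, not a rule from the essay.
    """
    train = [e for e in examples if e[0] not in test_runs]
    test = [e for e in examples if e[0] in test_runs]
    return train, test


# Hypothetical toy data: 4 runs, one shoe and one bottle example per run,
# each pattern holding the activity of 3 voxels.
examples = []
for run in (1, 2, 3, 4):
    examples.append((run, "shoe", [1.0 + 0.1 * run, 0.2, 0.5]))
    examples.append((run, "bottle", [0.2, 1.0 + 0.1 * run, 0.5]))

# Run 4 is set aside for testing BEFORE any feature selection or training.
train_set, test_set = split_by_run(examples, test_runs={4})
```

Everything that follows (voxel selection and classifier training) would then see only `train_set`, while `test_set` stays untouched until generalization testing.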
Feature Selection
Also called voxel selection, this is a crucial step in MVPA, because it defines the framework and extent of the analysis. It is also one of the steps that involves many pitfalls, as we will see in the section devoted to them.
First of all, why is feature selection necessary?
Many articles disregard this fundamental question, which arises easily. The mere presence of a voxel selection step appears to contradict the foundation of MVPA. If taking into account the interactions between voxels is the goal, narrowing down the number of voxels that we include in our analysis seems nonsensical. Nonetheless, as Cox and Savoy (2003) point out, many classifiers experience an inherent loss of accuracy when the number of voxels included in the analysis is very high. While MVPA's power mainly resides in taking into account voxels whose activity is not necessarily significant, adding irrelevant voxels whose activity is very low or mainly reflects noise significantly affects the performance of the classifier. In spite of this, methods and applications that allow the use of whole-brain activity have been described in Tong and Pratte (2012). Normally, these whole-brain studies deal with high-level cognitive processes, which are not easy to narrow down to a specific set of ROIs. Therefore, these researchers use Independent Component Analysis (ICA) or Principal Component Analysis (PCA) to reduce the number of dimensions. Put simply, these are methods that reduce the number of variables to deal with by grouping them into linear combinations that are as unrelated to each other as possible.
So, given the need to do so, feature selection provides us with the voxels we are going to include in our analysis. First of all, it is very common to select a region of interest (ROI) relevant to our study, within which feature selection takes place (Haxby et al., 2001; Mur et al., 2009; Chadwick et al., 2010). Following Pereira et al. (2008), we can distinguish between filtering and wrapper feature selection methods. Wrapper methods add and remove voxels according to the impact they have on the classifier's performance; however, these methods involve combinatorial issues that make computation complicated (Norman et al., 2006), and filtering methods are normally preferred. Filtering involves creating a voxel ranking based on a specific criterion. We can rank voxels by how active they are, how well they discriminate between conditions, their prediction accuracy, the consistency of their activation and so on. It is important nonetheless to realize that by filtering we are again considering voxels as separate entities (so we are performing a univariate analysis). A popular option is to focus on voxels that show maximum activity and hold good discriminant power (Polyn et al., 2005, as quoted in Norman et al., 2006). With certain classifiers, a multivariate feature selection method called searchlight accuracy can be used (Kriegeskorte et al., 2006). This method tries to add the information from each voxel's surroundings (its neighbouring voxels) by defining a spherical cluster, that is, a ball of voxels of radius x. The data are used repeatedly, once for each sphere, so that the useful voxels can be detected within the radius of the spherical cluster.
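A filtering method of the kind described above can be sketched as follows. The scoring rule (absolute difference between condition means divided by the overall spread) is an assumption chosen for simplicity, not the criterion of any cited study, and the function names and toy patterns are hypothetical. Note that only training examples are passed in, in keeping with the precaution of keeping testing data out of feature selection.

```python
import statistics


def discrimination_score(values_a, values_b):
    """Score one voxel: absolute mean difference scaled by overall spread."""
    spread = statistics.pstdev(values_a + values_b) or 1.0  # guard against zero spread
    return abs(statistics.mean(values_a) - statistics.mean(values_b)) / spread


def select_voxels(patterns, labels, n_keep):
    """Rank voxels one at a time (a univariate filter) and keep the best n_keep."""
    label_a, label_b = sorted(set(labels))  # assumes exactly two conditions
    n_voxels = len(patterns[0])
    scores = []
    for v in range(n_voxels):
        a = [p[v] for p, lab in zip(patterns, labels) if lab == label_a]
        b = [p[v] for p, lab in zip(patterns, labels) if lab == label_b]
        scores.append(discrimination_score(a, b))
    ranked = sorted(range(n_voxels), key=lambda v: scores[v], reverse=True)
    return sorted(ranked[:n_keep])


# Hypothetical training patterns over 6 voxels; voxels 1 and 4 were built
# to differ between conditions, the others barely change.
train_patterns = [
    [0.1, 2.0, 0.3, 0.2, -1.0, 0.0],
    [0.3, 1.8, 0.1, 0.4, -1.2, 0.2],
    [0.2, 2.2, 0.2, 0.1, -0.8, 0.1],
    [0.2, 0.1, 0.2, 0.3, 1.0, 0.1],
    [0.1, -0.1, 0.3, 0.2, 1.1, 0.0],
    [0.3, 0.0, 0.1, 0.1, 0.9, 0.2],
]
train_labels = ["shoe", "shoe", "shoe", "bottle", "bottle", "bottle"]

selected = select_voxels(train_patterns, train_labels, n_keep=2)  # [1, 4]
```

Each voxel is scored in isolation here, which is exactly why filtering counts as a univariate step inside an otherwise multivariate analysis.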
Classifier Training
This step involves using the trials that we saved for training to supply examples of the activation patterns (characteristic of our experimental conditions) to a multivariate algorithm. This algorithm will learn the statistically representative features of our stimuli, and will generate a decision function (1c in the previous graph) that will be used to make a call each time a stimulus is presented in our testing phase. The choice of algorithm is one of the decisive steps in MVPA. There is a great variety of classifiers available, and even a far from exhaustive discussion of them would take up the whole length of this essay. However, we cannot proceed without at least a brief discussion of this point.
What is a classifier algorithm?
The easiest way to introduce this notion is to describe the task that classifiers perform. Classifiers have to identify the relationship between voxel activity and the appearance of a stimulus, and be able to recognise that relationship in the future with unpresented stimuli of the same category.
Thus, classifiers obtain a parametric profile of the activity pattern elicited by the stimulus or example. These parameters are acquired during the training phase with the data reserved for that purpose. When training is finished, the classifier is supposed to be able to give us a prediction (or identification, or discrimination; that depends on the researcher's assumptions, see footnote 4) of which stimulus has been presented to a given subject. This prediction must be based on a different set of examples from the one used for training; otherwise, there would be a problem of overfitting (see the section on limitations).
Classifiers differ in the type of function they learn (Pereira et al., 2008). Primarily, algorithms can be divided into linear and non-linear ones. The great majority of MVPA studies have used linear classifiers owing to their success, according to Mur et al. (2009). The same authors note that non-linear classifiers have not consistently demonstrated superior performance to date, and consider the solutions offered by these classifiers difficult to interpret. Sheng (2011) suggests that one of the key strengths of linear classifiers is their simplicity and their ability to balance the influence of specific voxels across examples or stimuli. All linear classifiers build a weighted model that reflects the importance of the different voxel activity values. In the illustration below, each voxel (represented by x) is assigned a specific weight (w). In a hypothetical situation, the category shoe could be defined by xw>0 and the category bottle by xw<0 (footnote 5).
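The weighted model just described amounts to a weighted sum of voxel activities compared against zero. The sketch below assumes three voxels and a set of weights invented purely for illustration; in practice the weights would come out of training, and the optional `bias` term is an addition not mentioned in the text.

```python
def decision_value(pattern, weights, bias=0.0):
    """Weighted sum x.w of voxel activities (plus an optional bias term)."""
    return sum(x * w for x, w in zip(pattern, weights)) + bias


def classify(pattern, weights, bias=0.0):
    """Category 'shoe' when x.w > 0, category 'bottle' otherwise."""
    return "shoe" if decision_value(pattern, weights, bias) > 0 else "bottle"


# Hypothetical weights for three voxels (not taken from any study).
weights = [0.8, -0.5, 0.1]

shoe_like = [1.2, 0.3, 0.5]    # 0.96 - 0.15 + 0.05 = 0.86, above zero
bottle_like = [0.2, 1.5, 0.4]  # 0.16 - 0.75 + 0.04 = -0.55, below zero

first_call = classify(shoe_like, weights)     # "shoe"
second_call = classify(bottle_like, weights)  # "bottle"
```

A voxel with a large positive weight pushes the decision towards shoe when it is active, a voxel with a negative weight pushes it towards bottle, and a voxel with a weight near zero has little say either way.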
Aspects to consider when choosing a classifier include the number of features or voxels and the number of categories, among other factors not described here. Popular linear classifiers are Gaussian Naive Bayes (GNB), logistic regression (LR) and support vector machines (SVM). Comparing them, GNB tends to perform poorly in settings with many voxels, and LR gives better results than SVM in situations with more than two conditions or categories (Pereira et al., 2008). SVM requires additional modifications to work with more than two categories (our case study uses one of these classifiers).
Decision-making threshold
Taking the simplest situation, in which we have two features or voxels, we can observe how the decision-making process takes place. These voxels (there is a somewhat similar example in Mur et al., 2009) could work as coordinates defining a point in a plane, and we would only need to draw a line to separate our conditions. This line is constructed from the feedback that the classifier receives while being trained with the training examples. Thus, given an example during the testing phase, the xw model is submitted to the decision function that was constructed during the training phase, which works as our linear threshold to determine which category has been presented.
4 Note that, although it is left to the researcher's intention, identifying and discriminating are not the same thing at all. Identifying means positively specifying which category an activity pattern belongs to without having presented it to the classifier before (Kay et al., 2008), while discriminating could mean just distinguishing between two examples (Chadwick et al., 2010).
5 Illustration from Pereira et al. (2008).
[Figure: three classification situations (a), (b) and (c) for two voxels X1 and X2 (see footnote 6)]
The illustration above gives us a chance to explain part of the potential of MVPA. In the first situation (a), we see how the two distributions are completely segregated in a rather simple way, when voxel X1 (let's say blue) and voxel X2 (let's say red) have opposite activations. When condition A is presented, voxel X1 shows activation while X2 is inactive. In this situation univariate analysis would yield optimal results: there is no overlap between conditions. However, the panel on the right (b) displays a more complex situation. It can nonetheless be approached by using MVPA with a linear classifier which, by assigning a weight to each voxel, is able to encode their influence. Then, given a specific point on the plane, the decision threshold allows us to determine which condition or example was more likely to have occurred. The last of the three situations in our illustration (c) can be tackled only with the help of a non-linear classifier. The idea is the same as with linear ones, but in this case the decision threshold is more complex.
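How a linear threshold can be built from training feedback in a situation like (b) can be illustrated with a perceptron, one of the simplest linear learning rules. This is a swapped-in example chosen for brevity: it is not the classifier used in the cited studies, and the two-voxel data are invented so that each voxel alone overlaps between conditions while the pair is separable by a line.

```python
def train_perceptron(patterns, labels, max_epochs=100):
    """Learn weights and bias from feedback; labels are +1 (A) or -1 (B)."""
    weights = [0.0] * len(patterns[0])
    bias = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in zip(patterns, labels):
            activation = sum(xi * wi for xi, wi in zip(x, weights)) + bias
            if y * activation <= 0:  # wrong side of the line: adjust it
                weights = [wi + y * xi for wi, xi in zip(weights, x)]
                bias += y
                mistakes += 1
        if mistakes == 0:  # every training example is on the correct side
            break
    return weights, bias


def predict(x, weights, bias):
    """Return +1 (condition A) or -1 (condition B) for a two-voxel point."""
    return 1 if sum(xi * wi for xi, wi in zip(x, weights)) + bias > 0 else -1


# Hypothetical (X1, X2) activations. Condition A has X2 above X1 and
# condition B the reverse, so neither voxel separates them on its own.
points = [(1.0, 2.0), (2.0, 3.0), (3.0, 4.0), (0.5, 1.5),
          (2.0, 1.0), (3.0, 2.0), (4.0, 3.0), (1.5, 0.5)]
conditions = [1, 1, 1, 1, -1, -1, -1, -1]

w, b = train_perceptron(points, conditions)
training_calls = [predict(p, w, b) for p in points]
```

Because these toy points are linearly separable, the rule settles on a line that puts every training example on the correct side. A layout like situation (c) would never reach that state with this rule, which is where non-linear classifiers come in.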
Although non-linear classifiers might be more powerful, most texts are not very enthusiastic about their use in one way or another (Kamitani and Tong, 2005; Pereira et al., 2008; O'Toole et al., 2007; Norman et al., 2006; and others). As we said, these methods are considered to yield results that are very difficult to interpret, and the gain in performance from using non-linear classifiers is unclear.
6 Illustration from Cox and Savoy (2003).
Classification by Nearest-neighbour
This method is one of the simplest, as it does not involve learning a function properly speaking. The example presented is compared with the ones already seen in the training stage, and a decision is made based on the similarity between the training and the testing examples. The performance of nearest-neighbour can be improved by averaging the patterns of the testing examples, but again this removes variability that might be valuable for making a decision. According to Pereira (2008), nearest-neighbour works well as long as the number of voxels remains relatively low. This classification scheme was used in Haxby (2001).
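A minimal sketch of nearest-neighbour classification follows. Similarity is measured here with the Pearson correlation between patterns, which is an assumption made for the example (other similarity or distance measures are possible), and the four-voxel patterns are invented.

```python
import math


def pearson(u, v):
    """Pearson correlation between two activity patterns."""
    mean_u, mean_v = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    norm = math.sqrt(sum((a - mean_u) ** 2 for a in u)
                     * sum((b - mean_v) ** 2 for b in v))
    return cov / norm if norm else 0.0


def nearest_neighbour(test_pattern, train_patterns, train_labels):
    """Label a test pattern with the label of its most similar training pattern."""
    similarities = [pearson(test_pattern, p) for p in train_patterns]
    best = max(range(len(train_patterns)), key=lambda i: similarities[i])
    return train_labels[best]


# Hypothetical training patterns over four voxels.
stored_patterns = [
    [1.0, 0.2, 0.8, 0.1],
    [0.9, 0.3, 0.7, 0.2],
    [0.1, 0.9, 0.2, 1.0],
    [0.2, 0.8, 0.3, 0.9],
]
stored_labels = ["shoe", "shoe", "bottle", "bottle"]

call_one = nearest_neighbour([0.8, 0.1, 0.9, 0.3], stored_patterns, stored_labels)  # "shoe"
call_two = nearest_neighbour([0.3, 1.0, 0.1, 0.7], stored_patterns, stored_labels)  # "bottle"
```

No function is learned: the training patterns are simply stored, and every new example is compared against all of them.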
Generalization Testing
At this point, the last step is simply to test the classifier by exposing it to new, unpresented data. For each case the classifier makes a judgement about which of the conditions has been presented. Comparing these predictions with the actual presentation sequence yields an accuracy percentage. If accuracy is above chance, training has been successful.
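The accuracy computation, and one way of asking whether it is above chance, can be sketched as below. The one-sided binomial test is an assumption of this example (it treats test trials as independent guesses, and the essay does not prescribe a particular test), and the ten test trials are invented.

```python
from math import comb


def accuracy(predicted, presented):
    """Proportion of test trials on which the prediction matches the stimulus."""
    hits = sum(p == t for p, t in zip(predicted, presented))
    return hits / len(presented)


def binomial_p_value(n_correct, n_trials, chance):
    """Probability of at least n_correct hits if the classifier were guessing."""
    return sum(comb(n_trials, k) * chance ** k * (1 - chance) ** (n_trials - k)
               for k in range(n_correct, n_trials + 1))


# Hypothetical testing phase: 10 trials, two categories, so chance is 0.5.
presented = ["shoe", "bottle", "shoe", "bottle", "shoe",
             "bottle", "shoe", "bottle", "shoe", "bottle"]
predicted = ["shoe", "bottle", "shoe", "shoe", "shoe",
             "bottle", "bottle", "bottle", "shoe", "bottle"]

acc = accuracy(predicted, presented)           # 0.8
p_value = binomial_p_value(8, 10, chance=0.5)  # 56/1024 = 0.0546875
```

With only ten test trials, even 80% correct gives a probability of about 0.055 under guessing, which shows why the number of test examples matters when claiming above-chance performance.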
Limitations and Pitfalls using MVPA
Like every method, MVPA has several weaknesses, some of which are more avoidable than others. Technical limitations due to spatial or temporal resolution are difficult to avoid (the temporal resolution of MVPA is inevitably limited by the dispersion of the hemodynamic response; Norman et al., 2006), whereas others, such as feature selection or classifier choice, can be controlled with a sound decision-making process.
Capacity to deal with overlapping states
For example, as we have seen, one of the strengths of MVPA is its ability to disentangle the activation patterns (spatial patterns) produced by two different stimuli or mental states that have taken place (Cox and Savoy, 2003). By contrast, as Haynes and Rees (2006) point out, there is currently no evidence that MVPA can distinguish between two stimuli that occur at the same time and whose spatial representations share the same set of neurons. It can be argued that this limitation might be overcome with higher spatial resolution but, as Haxby (2012) states, there is a necessary limit to the number of modules that can support only one kind of processing.
A limited number of categories for an unlimited world
By a logical extension of this reasoning, Haynes and Rees (2006) claim that while percepts or forms of stimulation are virtually infinite, the number of training categories is necessarily limited. Hence, our classifier will always be limited to a certain number of discriminations. Attempts to address this issue could come from studies that deal with the generalization problems of MVPA, like the one carried out by Kay et al. (2008). The classifier in that report shows remarkable generalization skills when exposed to numerous unpresented images, reaching high prediction rates based on a training set of 1750 images.
The presence of previous knowledge
Although, as we said, whole-brain analysis is possible (Polyn et al., 2005, as quoted in Haynes and Rees, 2006), it certainly involves many challenges that are difficult to resolve (combinatorial limitations, overfitting). A plausible alternative could be the use of searchlight feature analysis, which is supposed to alleviate the potential computational issues (Tong and Pratte, 2012). Thus, using MVPA implies having reasonable knowledge of the features to be studied and at least some guidance about where to find them. As Pereira et al. (2008) mention, the definition of ROIs is a common step in the overwhelming majority of MVPA studies. This issue is supposed to have a lower impact in systems whose functional architecture is relatively well known (the visual system, according to Haynes and Rees, 2006), but it remains a significant problem for other cognitive functions whose functional basis has not yet been properly described. Feature selection is one of the biggest sources of problems in MVPA studies. According to Tong and Pratte (2012), studies on higher-level cognition have difficulty defining a coherent set of regions of interest, and therefore targeting the relevant voxel arrays correctly.
Generalization issues
This is a topic related to the very strengths of MVPA. Pattern recognition involves the exploitation of so-called fine-grained activity (Norman et al., 2006). Consequently, response patterns are highly characteristic of each individual and difficult to extrapolate to other subjects. The pattern elicited by stimulus X in subject 1 should, in an ideal situation, be the same as the one elicited by the same stimulus in subject 2. Currently, although some extrapolations have been successful (Haxby, 2011, developed hyperalignment, which includes tuning functions, as quoted in Haxby, 2012), this remains an unresolved problem.
Studies normally conduct MVPA on a within-individual basis. However, generalization problems do not end there. As Haynes and Rees (2006) indicate, generalization across different contexts is even more complicated. That is, in our dummy example, subjects 1 and 2 receive the same stimulation in a similar setting, but what would happen if the context surrounding those presentations were not the same? While classification accuracy does not drop uncontrollably, a severe worsening is observed even when the setting is the same but the scans are carried out on different days (Cox and Savoy, 2003). Generalization across different stimuli was nonetheless achieved in a working memory study (Harrison and Tong, 2009, as quoted in Mur et al., 2009) and between subjects in an auditory perception study (Formisano et al., 2008, as quoted in Tong and Pratte, 2012). Once again the study of Kay et al. (2008) stands as an example, as it demonstrated successful generalization across time. Haynes and Rees (2006) point out that improving generalization normally comes at the cost of individual discriminatory power. Finally, it is important to interpret mere differential activity carefully. Poldrack (2009) found that when subjects carry out cognitive tasks that vary enough, almost all of the cortex can show discriminatory power. Differences in activity can arise for a myriad of reasons, such as slight differences in memory processing load, difficulty, processing time or language requirements (Tong and Pratte, 2012). Future work will largely involve developing calibration and adjustment protocols between subjects and situations.
Circular Analysis or Overfitting
The danger of circular analysis is a common cause of concern in MVPA studies. Reviewing the recent literature, it is evident that most researchers are aware of this issue at least in its simplest form, which we explain now. Most MVPA studies have as their main goal to train a classifier able to identify a specific activity pattern and distinguish it from others of the same or other categories (examples of this are Haxby, 2001; Kay et al., 2008; Chadwick et al., 2010). To achieve this, we have stated that separate training and test data have to exist, the former to train the classifier and the latter to test the classifier's performance in terms of the proportion of correct guesses. The independence of these two data sets is understood to be crucial, and the data must be split in two before proceeding to the feature selection phase, as the first step of the whole process (Pereira et al., 2009). O'Toole et al. (2009) explain why the training data cannot be used to test the classifier. In fMRI studies, we normally have a large number of parameters (the voxels we take into our analysis) compared with the number of examples (stimulus presentations). The fact that the voxels contained in our ROIs greatly outnumber our examples is relevant because the number of parameters that characterize each activation pattern will be enormous, and some of these parameters will contain noise (probably from both systematic and unsystematic sources of error). This overfitting, or characterization with a large number of parameters, leads to a situation in which perfect classification of the training set is possible. By contrast, the same classifier will obtain poor results when tested with new data, for the same reason.
Overfitting leads to lower accuracy on the test set and therefore to lower generalization skills for the classifier (Mur et al., 2009). Overfitting can also lead to overestimation of the classifier if the training data are used to assess the classifier's accuracy during the testing stage (Kriegeskorte et al., 2009). In this situation, the classifier built with many parameters can fit and identify a significant part of the testing material, regardless of the algorithm's ability to classify other stimuli of the same category. Put simply, classification will be successful in segregating the stimuli that have already been presented to the classifier, but there is no guarantee that it will be successful with other stimuli, even if they are of the same category (Mitchell, 2010). Overfitting is therefore related to voxel selection, because the more voxels we select for our analysis, the larger the number of parameters, and the higher the chances that low-level parameters play a leading part in our classification performance. There is a second variant of overfitting, described by Pereira et al. (2009), that operates more subtly. We have established that feature selection can be carried out in different ways, normally involving a set of data that helps us to delimit our voxel selection, probably within a ROI selected beforehand. The illustration below (from Kriegeskorte, 2009) shows the prediction results when only the training set or all the data are used to delimit the voxel selection (ignore the dark bars labelled as task). When all the data are used, classification was almost at 100%, whereas using only the training data lowers that percentage to roughly 75%. The same illustration provides the comparison with random data, both when all the information is used to delimit our voxel selection and when only training data are used. Although still above chance, the results have changed significantly.
The reason for this is that selecting voxels with all the data is a subtle way of letting the classifier learn about our testing data. Some of the voxels' time series appear consistent between the training and the testing sets because they belong to a common category. In a way, the training and the testing data sets are no longer independent (Kriegeskorte et al., 2009).
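This loss of independence can be reproduced with a small simulation. The sketch below is only an illustration under stated assumptions: the data are pure noise, voxels are ranked by a simple difference of class means (not by the searchlight or any method from the cited studies), and a nearest-centroid rule stands in for the classifier. The function names and the sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

n_examples, n_voxels, n_selected = 200, 5000, 100

# Pure noise with balanced, arbitrary labels: the true accuracy is 50%.
X = rng.standard_normal((n_examples, n_voxels))
y = np.tile([0, 1], n_examples // 2)

train = np.arange(n_examples) < n_examples // 2   # first half trains
test = ~train                                     # second half tests


def select_voxels(X_sel, y_sel, k):
    """Indices of the k voxels with the largest class-mean difference."""
    diff = X_sel[y_sel == 1].mean(axis=0) - X_sel[y_sel == 0].mean(axis=0)
    return np.argsort(np.abs(diff))[-k:]


def nearest_centroid_accuracy(X_tr, y_tr, X_te, y_te):
    """Train class centroids on (X_tr, y_tr) and score them on (X_te, y_te)."""
    c0 = X_tr[y_tr == 0].mean(axis=0)
    c1 = X_tr[y_tr == 1].mean(axis=0)
    d0 = ((X_te - c0) ** 2).sum(axis=1)
    d1 = ((X_te - c1) ** 2).sum(axis=1)
    pred = (d1 < d0).astype(int)
    return float(np.mean(pred == y_te))


# Circular: voxels chosen with ALL the data, test examples included.
v_all = select_voxels(X, y, n_selected)
acc_circular = nearest_centroid_accuracy(
    X[train][:, v_all], y[train], X[test][:, v_all], y[test])

# Independent: voxels chosen with the training data only.
v_train = select_voxels(X[train], y[train], n_selected)
acc_independent = nearest_centroid_accuracy(
    X[train][:, v_train], y[train], X[test][:, v_train], y[test])

print(f"voxels selected with all data:      {acc_circular:.2f}")
print(f"voxels selected with training data: {acc_independent:.2f}")
```

In both cases the classifier is trained on the training half and scored on the test half; only the data used for voxel selection differ. Because the data contain no signal at all, any accuracy clearly above 50% in the first case is produced by the selection step alone.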
Case study: Chadwick, M. J., Hassabis, D., Weiskopf, N., & Maguire, E. A. (2010). Decoding individual episodic memory traces in the human hippocampus. Current Biology, 20(6), 544-547.
Overview
The following is just a brief summary intended to orient the reader. The article itself is easy to find and I encourage reading it (including the Supplementary Information). Reading it may help to understand the following comments and remarks, as space limitations make a fully detailed description impossible, and such a description could have been biased anyway.
The theoretical background of this article lies in the principles of how the hippocampus (HC) stores representations of episodic memories. More specifically, the authors mention that the HC is supposed to store an index (Marr et al., 1971, as quoted) of the episodic memory, which would contain the guidelines to reconstruct a complex, multimodal memory. Throughout the article, and from the claims made, it is evident that the authors identify the activity pattern evoked when the memory is recalled with the index previously mentioned. The experimental task consisted primarily in determining whether the representations (that is, the activation patterns) of the memories of three different videos can be guessed, and therefore distinguished, by a trained classifier. Each video shows a different woman performing a basic action (posting a letter, throwing a can into a trash bin), and the videos were viewed 15 times by each subject prior to scanning.
Basically, the subjects performed three recalling tasks while measurements were taken from three regions of interest (ROIs): the HC, the entorhinal cortex (EC) and the parahippocampal gyrus (PHG). In the first modality, they were told which of the three memories they were supposed to recall, while in the second the recall was free and subjects were instructed to randomize between the three memories. In both conditions the multivariate analysis yielded significant decoding rates, with no statistical differences between conditions, and the data from both conditions were eventually collapsed for further analysis. The illustration below (taken from Chadwick et al., 2010) summarizes the decoding rates for each area [hippocampus accuracy of 44% (p = 0.000001; chance level = 33%), mean entorhinal cortex accuracy of 38.5% (p = 0.009), and mean parahippocampal gyrus accuracy of 41% (p = 0.0004)].
Multivariate Classification Procedure
The above graph summarizes the multivariate classification procedure (taken from Chadwick et al., 2010, Supplementary Information). For the sake of simplicity, as the authors state, only two of the videos are represented. Panel A shows frame captures from two of the videos (each video lasted 7 s), and B shows a stimulus template. It is important to bear in mind that the stimuli are not the videos themselves, but the recall of them. Thus, ABBBAA... means video A evocation, video B evocation, and so on. C describes the process of feature selection using the multivariate searchlight method for each ROI (described in the feature selection methods section of this essay). The data were split into a training set and a testing set, and only the training set was used for voxel selection. The authors used a k-fold cross-validation strategy in which, in each fold, one example is left out for testing and the rest are used for training; features are therefore selected anew by the searchlight in each fold, with different training data, as the authors say. In D, once the voxel selection is completed, the linear SVM classifier is fed with the training examples, to be tested afterwards with the example saved for testing purposes. Finally, E shows the test data (every example is used as test data at least once according to the k-fold testing regime), which are used to determine the classifier's accuracy. Predictions are then compared with the video that was actually recalled to establish the accuracy percentages.
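The overall logic of this procedure (leave one example out, select voxels with the remaining examples only, train, then test on the held-out example) can be sketched as follows. This is not the authors' pipeline. As loudly stated assumptions, the data are synthetic, a univariate score replaces the searchlight, a nearest-centroid rule replaces the linear SVM, and every name and size is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)

n_classes, n_per_class, n_voxels, n_selected = 3, 10, 200, 30

# Synthetic "recall" patterns (an assumption, not the study's data):
# each memory raises activity in its own block of 10 voxels.
y = np.repeat(np.arange(n_classes), n_per_class)
X = rng.standard_normal((y.size, n_voxels))
for c in range(n_classes):
    X[y == c, 10 * c:10 * (c + 1)] += 2.0


def select_voxels(X_sel, y_sel, k):
    """Rank voxels by the variance of their class means; keep the top k."""
    means = np.stack([X_sel[y_sel == c].mean(axis=0) for c in np.unique(y_sel)])
    return np.argsort(means.var(axis=0))[-k:]


def nearest_centroid_predict(X_tr, y_tr, x):
    """Return the class whose training centroid is closest to pattern x."""
    classes = np.unique(y_tr)
    centroids = np.stack([X_tr[y_tr == c].mean(axis=0) for c in classes])
    return classes[np.argmin(((centroids - x) ** 2).sum(axis=1))]


predictions = np.empty_like(y)
for i in range(y.size):                       # one fold per example
    train = np.arange(y.size) != i            # everything except example i
    # Feature selection is repeated in every fold, on training data only.
    voxels = select_voxels(X[train], y[train], n_selected)
    predictions[i] = nearest_centroid_predict(
        X[train][:, voxels], y[train], X[i, voxels])

accuracy = float(np.mean(predictions == y))
print(f"leave-one-out accuracy: {accuracy:.2f} (chance = {1 / n_classes:.2f})")
```

The point of the design is that the held-out example never influences which voxels are selected or how the classifier is trained, so the accuracy obtained over all folds is an unbiased estimate under these assumptions.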
Criticism and Remarks
If I may, let me first point out that, as far as I know, this is the first critique of this article. I wanted to do my best to apply the knowledge that I have gained while writing this essay, which I have enjoyed as much as I have suffered. Thus, the attempt is to go a bit further than the mere introduction of the technique, and to take a glance at how it is applied outside the strict machine-learning environment. Finally, I apologize for any reckless criticism I might make.
Multivariate Classification Procedure & Claims of functional differentiation
This experiment is particularly difficult to picture, and it is important to keep a set of assumptions in mind. Firstly, the examples used to train and test the classifier are activity patterns that come from the recall of the three videos. Therefore, when the data set was split into testing and training data in each fold, the activity patterns from the three videos were separated into testing data (only one activity pattern) and training data (all the rest). All of the activity patterns can be regarded as different, because they are evoked at different times and because of the reconstructive nature that characterizes memory recall. This characteristic of memory recall makes the training data (which in the illustration above comprised only 9 examples, so that we could think of about 14 with three videos) clearly insufficient, given the high similarity between the videos.
The limited number of examples per stimulus, together with the similarity between them, probably led the linear classifier to acquire a small number of strong, overlapping parameters, while adding a great number of voxels whose activity was barely consistent from one recall to the next. That barely consistent activity contained the fine aspects that could have distinguished between the three stimuli, and that may have helped to raise the classification accuracy by periodically saturating a series of parameters. It is hard to see why the authors selected three memories so similar to each other if their intention was to distinguish memory traces (each shows a different woman who performs a similar action and walks away). For all the reasons above, a singular variety of overfitting took place, in which accuracy barely surpassed chance levels. A possible additional reason for this relatively poor performance is the unequal number of examples. When one example is reserved for testing, the other two categories have one extra example each. While this might look trivial, Pereira et al. (2009) point out that the classifier can tend to favour the category with more examples, and therefore predict it more frequently.
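The effect of the unequal number of examples can be made concrete with a deliberately extreme sketch. It assumes a caricature of a classifier that has learned nothing except how often each category appears in its training data; this is not the linear SVM of the study, and the 7 examples per memory are an arbitrary choice.

```python
from collections import Counter

# Three memories with the same number of examples each (7 is arbitrary).
labels = ["A", "B", "C"] * 7

correct = 0
for i, held_out in enumerate(labels):
    # Leave one example out: its category now has one example fewer
    # than the other two categories.
    training_labels = labels[:i] + labels[i + 1:]
    # A classifier that only learned the class frequencies always
    # predicts the most frequent category in its training data.
    prediction = Counter(training_labels).most_common(1)[0][0]
    correct += prediction == held_out

accuracy = correct / len(labels)
print(accuracy)  # 0.0, although chance level is 1/3
```

In every fold the held-out category is the least frequent one in the training data, so this frequency-driven classifier is always wrong. A real classifier is only partly driven by class frequencies, but the same imbalance pushes its predictions away from the correct category in each fold.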
Our second issue concerns the following statement: "Our data provide further evidence for functional differentiation within the medial temporal lobe, in that we show the hippocampus contains significantly more episodic information than adjacent structures". The authors attach functional claims to the classification percentages, which show a slightly better performance in the hippocampal area. There is no doubt that some neurons in the hippocampus show activation when memories are evoked. Going even further, there is weak evidence that a trained classifier can discriminate between memories from this activity, but there is no evidence that those neurons carry episodic information. This is an example of reverse inference (Poldrack, 2006). Finally, a similar and more suggestive setting would have been the use of a classifier to try to distinguish between unpresented memories. Such an experiment would have required the examples to be memorized before feature selection, as they would be used only for testing purposes. The feature selection and training data could be similar, to ensure that the voxel selection contains pertinent voxels.
References
Chadwick, M. J., Hassabis, D., Weiskopf, N., & Maguire, E. A. (2010). Decoding individual episodic memory traces in the human hippocampus. Current Biology, 20(6), 544-547.
Cox, D. D., & Savoy, R. L. (2003). Functional magnetic resonance imaging (fMRI) "brain reading": detecting and classifying distributed patterns of fMRI activity in human visual cortex. NeuroImage, 19(2), 261-270.
Davatzikos, C., Ruparel, K., Fan, Y., Shen, D. G., Acharyya, M., Loughead, J. W., ... & Langleben, D. D. (2005). Classifying spatial patterns of brain activity with machine learning methods: application to lie detection. NeuroImage, 28(3), 663-668.
Downing, P. E., Wiggett, A. J., & Peelen, M. V. (2007). Functional magnetic resonance imaging investigation of overlapping lateral occipitotemporal activations using multi-voxel pattern analysis. The Journal of Neuroscience, 27(1), 226-233.
Downing, P. E., Chan, A. Y., Peelen, M. V., Dodds, C. M., & Kanwisher, N. (2006). Domain specificity in visual cortex. Cerebral Cortex, 16(10), 1453-1461.
Etzel, J. A., Gazzola, V., & Keysers, C. (2009). An introduction to anatomical ROI-based fMRI classification analysis. Brain Research, 1282, 114-125.
Hanson, S. J., Matsuka, T., & Haxby, J. V. (2004). Combinatorial codes in ventral temporal lobe for object recognition: Haxby (2001) revisited: is there a "face" area? NeuroImage, 23(1), 156-166.
Haxby, J. V., Gobbini, M. I., Furey, M. L., Ishai, A., Schouten, J. L., & Pietrini, P. (2001). Distributed and overlapping representations of faces and objects in ventral temporal cortex. Science, 293(5539), 2425-2430.
Haxby, J. V. (2012). Multivariate pattern analysis of fMRI: The early beginnings. NeuroImage, 62(2), 852-855.
Haynes, J. D., & Rees, G. (2006). Decoding mental states from brain activity in humans. Nature Reviews Neuroscience, 7(7), 523-534.
Kamitani, Y., & Tong, F. (2005). Decoding the visual and subjective contents of the human brain. Nature Neuroscience, 8(5), 679-685.
Kay, K. N., Naselaris, T., Prenger, R. J., & Gallant, J. L. (2008). Identifying natural images from human brain activity. Nature, 452(7185), 352-355.
Kriegeskorte, N., Simmons, W. K., Bellgowan, P. S., & Baker, C. I. (2009). Circular analysis in systems neuroscience: the dangers of double dipping. Nature Neuroscience, 12(5), 535-540.
Kriegeskorte, N., Goebel, R., & Bandettini, P. (2006). Information-based functional brain mapping. Proceedings of the National Academy of Sciences of the United States of America, 103(10), 3863-3868.
Norman, K. A., Polyn, S. M., Detre, G. J., & Haxby, J. V. (2006). Beyond mind-reading: multi-voxel pattern analysis of fMRI data. Trends in Cognitive Sciences, 10(9), 424-430.
Nishimoto, S., Vu, A. T., Naselaris, T., Benjamini, Y., Yu, B., & Gallant, J. L. (2011). Reconstructing visual experiences from brain activity evoked by natural movies. Current Biology, 21(19), 1641-1646.
Mur, M., Bandettini, P. A., & Kriegeskorte, N. (2009). Revealing representational content with pattern-information fMRI: an introductory guide. Social Cognitive and Affective Neuroscience, 4(1), 101-109.
Mitchell, T. M. (2008, January). Computational models of neural representations in the human brain. In Discovery Science (pp. 26-27). Springer Berlin Heidelberg.
O'Toole, A. J., Jiang, F., Abdi, H., Pénard, N., Dunlop, J. P., & Parent, M. A. (2007). Theoretical, statistical, and practical perspectives on pattern-based classification approaches to the analysis of functional neuroimaging data. Journal of Cognitive Neuroscience, 19(11), 1735-1752.
Pereira, F., Mitchell, T., & Botvinick, M. (2009). Machine learning classifiers and fMRI: a tutorial overview. NeuroImage, 45(1), S199-S209.
Reddy, L., Tsuchiya, N., & Serre, T. (2010). Reading the mind's eye: decoding category information during mental imagery. NeuroImage, 50(2), 818-825.
Spiers, H. J., & Maguire, E. A. (2007). Decoding human brain activity during real-world experiences. Trends in Cognitive Sciences, 11(8), 356-365.
Poldrack, R. A., Halchenko, Y. O., & Hanson, S. J. (2009). Decoding the large-scale structure of brain function by classifying mental states across individuals. Psychological Science, 20(11), 1364-1372.
Polyn, S. M., Kragel, J. E., Morton, N. W., McCluey, J. D., & Cohen, Z. D. (2012). The neural dynamics of task context in free recall. Neuropsychologia, 50(4), 447-457.
Rosenberg, M., List, A., Sherman, A., Grabowecky, M., Suzuki, S., & Esterman, M. (2012). Decoding EEG data reveals dynamic spatiotemporal patterns in perceptual processing. Journal of Vision, 12(9), 1173-1173.
Sheng, L. I. (2011). Multivariate pattern analysis in functional brain imaging. Acta Physiologica Sinica, 63(5), 472-476.
Tong, F., & Pratte, M. S. (2012). Decoding patterns of human brain activity. Annual Review of Psychology, 63, 483-509.