
A New Method of EEG Classification for BCI with Feature Extraction Based on Higher Order Statistics of Wavelet Components and Selection with Genetic Algorithms


Marcin Kołodziej, Andrzej Majkowski, and Remigiusz J. Rak
Warsaw University of Technology, Pl. Politechniki 1, 00-662 Warsaw {kolodzim,amajk,rakrem}@iem.pw.edu.pl

Abstract. A new method of feature extraction and selection of the EEG signal for brain-computer interface design is presented. The proposed feature extraction method is based on higher order statistics (HOS) calculated for the details of the discrete wavelet transform (DWT) of the EEG signal. A genetic algorithm is then used for feature selection. In the experiment, classification is conducted on single trials of EEG signals. The proposed method of feature extraction using HOS and DWT gives more accurate results than an algorithm based on the discrete Fourier transform (DFT).

Keywords: feature extraction, feature selection, genetic algorithms (GA), higher order statistics (HOS), discrete wavelet transform (DWT), brain-computer interface (BCI), data-mining.

1 Introduction
Constructing an efficient brain-computer interface (BCI) is one of the most challenging scientific problems and focuses the attention of scientists from all over the world. In order to work out an efficient and accurate brain-machine interface, the cooperation of scientists from many disciplines, such as medicine, psychology and computer science, is necessary. The best known BCI interfaces are based on EEG signals recorded from the surface of the scalp, because this method of monitoring brain activity is easy to use and quite inexpensive. Besides, electroencephalography is widely used in medicine, for example for the diagnosis of certain neurological diseases. Brain-computer interfaces make use of several brain potentials, such as P300, SSVEP or ERD/ERS [1,2,3]. The most difficult case for implementation is a BCI based on brain potentials associated with movements (ERD/ERS). The ERD/ERS name originates from the phenomenon of a rise or fall of EEG signal power in the bands of about 8-12 Hz and 18-26 Hz when the subject spontaneously imagines a movement. For example, the imagination of left or right hand movement causes a rise or fall of the power of EEG signals collected from various locations on the scalp. In our experiment we tried to classify EEG signals for single asynchronous trials of movement imagination. There were three different classes associated with the different actions the subjects were asked to imagine. The aim of the proposed algorithm was to decide to which class a particular one-second window of the EEG signal belonged, that is, what the subject imagined at that time.

2 Dataset Description
In the experiment we used a dataset of EEG signals provided by the IDIAP Research Institute (Silvia Chiappa, José del R. Millán) [10]. The set contains data from 3 normal subjects acquired during 3 non-feedback sessions. The subjects were relaxed and sat in a normal chair with their arms resting on their legs. Each subject had three tasks to execute: imagination of repetitive self-paced left hand movements, imagination of repetitive self-paced right hand movements, and generation of words starting with the same random letter.

All 3 sessions were conducted on the same day. Each session lasted 4 minutes, with 5-10 minute breaks in between. The subject performed a given task for about 15 seconds and then switched randomly to another task at the operator's request. The EEG data was not split into trials. In the experiment we focused on the first session of only one subject. EEG signals were recorded with a Biosemi system using a cap with 32 integrated electrodes located at the standard positions of the conventional 10-20 system. The sampling rate was 512 Hz. No artifact rejection or correction was employed. The dataset contains raw EEG signals: 32 EEG potentials acquired in the following order: Fp1, AF3, F7, F3, FC1, FC5, T7, C3, CP1, CP5, P7, P3, Pz, PO3, O1, Oz, O2, PO4, P4, P8, CP6, CP2, C4, T8, FC6, FC2, F4, F8, AF4, Fp2, Fz, Cz. In the training files, each line has a 33rd component indicating the class label.
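As an illustration, the sketch below shows one possible way to load such a session and separate the EEG channels from the class labels. It assumes a plain-text file with one sample per line (32 channel values followed by the label), as described above; the file name and the Python/NumPy implementation are our own assumptions, not part of the original Matlab processing chain.

```python
import numpy as np

FS = 512          # sampling rate of the Biosemi recording [Hz]
N_CHANNELS = 32   # EEG channels in the order listed above; column 33 is the class label

def load_session(path):
    """Load one training session stored as a plain-text matrix:
    one sample per line, 32 channel values followed by the class label."""
    raw = np.loadtxt(path)                    # shape: (n_samples, 33)
    eeg = raw[:, :N_CHANNELS]                 # raw EEG potentials
    labels = raw[:, N_CHANNELS].astype(int)   # class codes as provided in the file
    return eeg, labels

# hypothetical file name, shown only to illustrate the call:
# eeg, labels = load_session('train_subject1_session1.asc')
```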

3 Proposed Feature Extraction Method Based on DWT and HOS


There exist many methods of feature extraction from a signal. The most widely used for EEG signals are based on frequency analysis, for example the discrete Fourier transform (DFT) or power spectral density (PSD). We propose a new method based on higher order statistics (HOS) of wavelet components (DWT). At first the EEG signal is divided into one-second windows overlapping by half a second. The half-second overlap makes it possible to generate a large enough set of data for efficient classifier learning. Each one-second window of EEG signal contains information from 32 channels, with 512 samples per channel. Further on we refer to that portion of data as a block. Next, for each block the wavelet transform is calculated. The continuous wavelet transform (CWT) of $x(t) \in L^2(\mathbb{R})$ (where $L^2(\mathbb{R})$ denotes a vector space of one-dimensional functions) for a certain wavelet $\psi(t)$ is defined as:


$$
W(\tau,\alpha) = \int_{-\infty}^{+\infty} x(t)\,\psi^{*}_{\tau,\alpha}(t)\,dt,
\qquad
\psi_{\tau,\alpha}(t) = \frac{1}{\sqrt{\alpha}}\,\psi\!\left(\frac{t-\tau}{\alpha}\right)
\tag{1}
$$

where $\alpha$ denotes the scale and $\tau$ the delay of the wavelet window $\psi_{\tau,\alpha}(t)$. For the discrete wavelet transform (DWT) we assume that:

$$
\alpha = 2^{s}, \qquad \tau = 2^{s} l
\tag{2}
$$

(where $l = 0, 1, 2, \dots$ describes the delay and $s = 0, 1, 2, \dots$ the scale), which, after discretization of the signal $x(t)$, leads to a new form of the wavelet transform:

$$
W(\tau,\alpha) = W(l\,2^{s}, 2^{s}) = W(l,s) = 2^{-s/2} \sum_{n} x(n)\,\psi(2^{-s} n - l)
\tag{3}
$$

It is also possible to define a wavelet series, as in the case of the Fourier transform, for any function $x(t) \in L^2(\mathbb{R})$:

$$
x(t) = \sum_{l,s} w_{l,s}\,\psi_{l,s}(t),
\qquad
\psi_{l,s}(t) = 2^{-s/2}\,\psi(2^{-s} t - l)
\tag{4}
$$

When $\{\psi_{l,s}(t)\}$ forms an orthonormal basis of the $L^2(\mathbb{R})$ space then, as in the case of the Fourier series, we have:

$$
w_{l,s} = \left\langle x(t), \psi_{l,s}(t) \right\rangle = 2^{-s/2}\, W_{\psi,x}(l,s)
\tag{5}
$$

A very useful and practical implementation of the DWT algorithm was proposed in 1998 by S. Mallat [11]. It relies on the decomposition of the signal into wavelet components with the help of digital filters.

Fig. 1. Filtering process of wavelet decomposition (first decomposition level)


We can distinguish approximations and details. The approximations are the low-frequency components of the signal. The details, in turn, are the high-frequency components. The filtering process of the DWT at its most basic level is presented in Figure 1. The process of decomposition can be continued, so that one signal is broken down into many lower-resolution components. This is called the wavelet decomposition tree (Fig. 2).
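The sketch below illustrates this splitting into an approximation and a detail using the PyWavelets package; it is only an illustration of the filtering scheme of Figs. 1 and 2, not the authors' Matlab implementation.

```python
import numpy as np
import pywt

x = np.random.randn(512)             # stand-in for one second of a single EEG channel

# First decomposition level (Fig. 1): low-pass and high-pass filtering
# followed by decimation by two.
a1, d1 = pywt.dwt(x, 'db5')          # a1 = approximation, d1 = detail

# Repeating the split on the approximation builds the decomposition tree (Fig. 2).
a2, d2 = pywt.dwt(a1, 'db5')
print(len(x), len(d1), len(d2))      # the component length roughly halves at each level
```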

Fig. 2. Wavelet decomposition tree

For decomposition we used the 5th order wavelet from the Daubechies family (db5) [7]. In our experiment a 7th level DWT decomposition was performed, so for one one-second window of EEG signal we received 7 details and an approximation (Fig. 3). All steps include decimation, so at the 7th level the detail signal consists of only 4 samples. In the next step, higher order statistics (HOS) are calculated. In this way the variance, skewness and kurtosis were computed on the successive details d1, d2, d3, ..., d7. As a result of that process, 21 features were obtained (7 details × 3 HOS) for each one-second window. Since the operation was performed on 32 channels, we had 672 features for one one-second block.
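A minimal sketch of this feature extraction step is shown below. It uses PyWavelets and SciPy as stand-ins for the authors' Matlab toolbox; note that SciPy's kurtosis returns the excess kurtosis by default, and PyWavelets may warn that 7 levels is deep for a 512-sample window, which reflects the boundary effects of such a decomposition.

```python
import numpy as np
import pywt
from scipy.stats import skew, kurtosis

def hos_dwt_features(block, wavelet='db5', levels=7):
    """HOS/DWT features for one block of EEG data.

    block : array of shape (32, 512) - one one-second window, 32 channels.
    Returns 32 * 7 * 3 = 672 features: variance, skewness and kurtosis
    of the detail signals of every channel."""
    features = []
    for channel in block:
        coeffs = pywt.wavedec(channel, wavelet, level=levels)
        details = coeffs[1:]          # coeffs[0] is the approximation a7, the rest are d7..d1
        for d in details:
            features.extend([np.var(d), skew(d), kurtosis(d)])
    return np.asarray(features)

# Building the feature matrix for a whole session (blocks: n_blocks x 32 x 512):
# X = np.vstack([hos_dwt_features(b) for b in blocks])   # shape (n_blocks, 672)
```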

Fig. 3. Details (d1, d2, d3, ..., d7) and the approximation (a7) obtained by using the wavelet transform (db5) of an EEG signal


For feature extraction we used our previously prepared Matlab FE_Toolbox [8]. For the HOS method, the toolbox makes it possible to select the type of decomposition wavelet, the number of decomposition levels and the statistics to be computed. Calculated feature vectors, together with their feature descriptions, are saved in the Matlab workspace.

4 Proposed Feature Selection Method Based on Genetic Algorithms


As we used a database that contained EEG signals collected from 32 electrodes, we obtained a large number of features (672). In that case a very important problem was to select the best features and, in consequence, the electrodes which carry the most important information for the classifier. To solve this problem a genetic algorithm (GA) was implemented in our experiment. The GA is a popular technique used in optimization tasks. It is used to select the best combination of features that minimizes the classification error. It was assumed that out of all 672 features only 30 would be selected, so a chromosome consisted of 30 genes. A randomly chosen set of 500 chromosomes formed the initial population (Fig. 4). The size of the initial population was chosen experimentally. In order to verify which chromosomes were the best adapted, a special fitness function was used. The main part of the fitness function was a classifier. For that purpose Linear Discriminant Analysis (Matlab function classify with the option 'quadratic') was used. The fitness function performed a 10-fold cross-validation test and returned the classification error as a percentage. Smaller errors indicated better adapted chromosomes (more relevant features).
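A sketch of such a fitness function is given below. Since the original uses Matlab's classify with the 'quadratic' option, scikit-learn's QuadraticDiscriminantAnalysis is used here as an approximate stand-in; the 10-fold cross-validation error of the 30 selected features is returned as the fitness value.

```python
import numpy as np
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

def fitness(chromosome, X, y):
    """Fitness of one chromosome: 10-fold cross-validation error of a
    discriminant classifier trained on the selected features.

    chromosome : array of 30 indices into the 672-dimensional feature vector
    X          : feature matrix of shape (n_blocks, 672)
    y          : class labels of the blocks"""
    clf = QuadraticDiscriminantAnalysis()          # stand-in for Matlab classify(...,'quadratic')
    accuracy = cross_val_score(clf, X[:, chromosome], y, cv=10).mean()
    return 1.0 - accuracy                          # smaller error = better adapted chromosome
```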

[Fig. 4 block diagram: Initial population → Fitness evaluation → Continue? (stop) → Parent selection → Crossover function → Mutation function]

Fig. 4. The phases of the proposed genetic algorithm


The selection operation determines which parent chromosomes are involved in producing the next generation of offspring. In our case the process was based on spinning a roulette wheel, in which parents were selected for mating with a probability proportional to their fitness values. The crossover function enables the exchange of chromosome information between highly fit individuals. In our experiment the probability of crossover was 80%. This means that 80% of the chromosomes were allowed to exchange genes. The crossover point was selected randomly and then, for two given parent chromosomes, the values to the left of the crossover point were swapped. As the population may not contain all the information needed for optimal classification of the EEG signal, a mutation function was introduced. The probability of mutation was 5%. A mutating chromosome was allowed to change one gene to any of the 672 available features. The GA was stopped after 100 generations. It is also worth noting that the calculations lasted about 30 minutes for one run of the GA on a PC with a quad-core processor, so the algorithm is rather unsuitable for real-time use. We ran the GA 10 times and checked which features were repeated most often. The achieved results are presented in Fig. 5. It is also interesting to see how the selected features were distributed across the channels. The distribution of features per channel is presented in Fig. 6.
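The sketch below puts the phases of Fig. 4 together under the parameters given above (population of 500, 30 genes, 80% crossover, 5% mutation, 100 generations). It reuses the fitness function sketched earlier; since the paper only states that selection probability is proportional to fitness, the weight 1 - error is one possible choice and is our own assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

N_FEATURES, N_GENES = 672, 30
POP_SIZE, N_GENERATIONS = 500, 100
P_CROSSOVER, P_MUTATION = 0.8, 0.05

def run_ga(X, y):
    # Initial population: each chromosome holds 30 randomly chosen feature indices.
    population = [rng.choice(N_FEATURES, size=N_GENES, replace=False)
                  for _ in range(POP_SIZE)]
    for _ in range(N_GENERATIONS):
        errors = np.array([fitness(c, X, y) for c in population])
        # Roulette-wheel selection: a lower error gives a larger selection probability.
        weights = 1.0 - errors
        parents = [population[i]
                   for i in rng.choice(POP_SIZE, size=POP_SIZE, p=weights / weights.sum())]
        # Single-point crossover applied to 80% of the parent pairs.
        offspring = []
        for a, b in zip(parents[0::2], parents[1::2]):
            a, b = a.copy(), b.copy()
            if rng.random() < P_CROSSOVER:
                point = rng.integers(1, N_GENES)
                a[:point], b[:point] = b[:point].copy(), a[:point].copy()
            offspring.extend([a, b])
        # Mutation: with 5% probability one gene is replaced by any of the 672 features.
        for c in offspring:
            if rng.random() < P_MUTATION:
                c[rng.integers(N_GENES)] = rng.integers(N_FEATURES)
        population = offspring
    errors = np.array([fitness(c, X, y) for c in population])
    return population[int(np.argmin(errors))], errors.min()
```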

Fig. 5. Features that were repeated in 10 runs of the genetic algorithm. Darker shades mean that the corresponding features occurred more frequently.


Fig. 6. The distribution of features per frequency

The classification error for a single run of the genetic algorithm ranged from 8.6% to 11.6%. Detailed results for 10 runs of the genetic algorithm are presented in Table 1. It can be noticed that the most often selected features were taken from channels 8, 9, 23 and 32 (that is, from the electrodes C3, CP1, C4 and Cz).
Table 1. Classification error for 10 runs of the genetic algorithm (BEST refers to the error achieved by the best adapted individual in the population, MEAN to the mean error over all individuals in the population)
Run    1      2      3      4      5      6      7      8      9      10
BEST   11.6%  9.50%  10.1%  8.10%  10.3%  9.10%  9.70%  9.90%  8.60%  11.8%
MEAN   13.1%  11.2%  11.5%  9.41%  11.9%  10.5%  11.4%  11.6%  9.90%  13.7%

As we knew which channels (electrodes) carry the most information, it was interesting to check whether information from only these channels (instead of all 32 electrodes) would be sufficient for constructing a brain-computer interface. Our further studies went in that direction. We examined the classification error for only four electrodes (C3, CP1, C4 and Cz). Then the genetic algorithm was started to select the new best features and determine the classification error.


Fig. 7. The fitness value for 4 channels as a function of the generation number for one run of the GA (upper curve: mean value, bottom curve: best value)

Fig. 8. The distribution of features. Feature numbers from 1 to 7 denote the variance of details d1-d7. Feature numbers from 8 to 14 denote the kurtosis of details d1-d7. Feature numbers from 15 to 21 denote the skewness of details d1-d7.


It appeared that in the best case the mean classification error for 10 runs of the GA with the 4 selected electrodes reached 10.5%. Figure 7 presents the fitness value for the 4 channels as a function of the generation number for one run of the GA.

5 Conclusion
The results show that it is possible to successfully apply DWT and HOS as a method of feature extraction for brain-computer interfaces. The selected features can be considered as statistical parameters describing the signals after applying filter banks. The variance can be interpreted as the power of the variable component of the signal. The skewness is most often interpreted as a measure of the asymmetry of the signal distribution. Kurtosis, in turn, is interpreted as a measure of the flatness of the signal distribution. In our experiment, feature number 3 was the most often selected feature during the ten runs of the genetic algorithm (Fig. 8). This feature is the variance of the d3 detail signal. An important disadvantage of using the genetic algorithm in practice is the long time it requires to complete the calculations. It is hard to imagine using a GA while carrying out an experiment on-line (for example while operating the BCI system in real time). In this case, it seems necessary to implement other methods of feature selection, for example ranking methods. We compared the results with our previous research, in which we used the discrete Fourier transform as the method of feature extraction [9]. In that case the mean classification error for the same subject was about 22%. With the new proposed method the same error was only 10.5%. The next step of our research showed that it is possible to reduce the number of channels to 4 without significant changes in the classification results. It is worth noting that the set of the most relevant electrodes selected with the HOS/DWT algorithm was different than in the case of pure FFT. This means that the set of selected electrodes depends on the implemented selection method. For final verification the new method should be tested on all subjects and all sessions. Such an experiment could select features that are the best and most universal. Furthermore, the implementation of biofeedback sessions could bring better results.

References
1. Vidal, J.J.: Direct brain-computer communication. Ann. Rev. Biophys. Bioeng. 2 (1973)
2. Molina, G.: Direct Brain-Computer Communication through scalp recorded EEG signals. PhD Thesis, École Polytechnique Fédérale de Lausanne (2004)
3. Wolpaw, J.R., Birbaumer, N., Heetderks, W.J., McFarland, D.J., Hunter Peckham, P., Schalk, G., Donchin, E., Quatrano, L.A., Robinson, C.J., Vaughan, T.M.: Brain-Computer Interface Technology: A Review of the First International Meeting. IEEE Transactions on Rehabilitation Engineering 8(2) (June 2000)
4. Kantardzic, M.: Data Mining: Concepts, Models, Methods, and Algorithms. IEEE Press & John Wiley (November 2002)
5. Documentation of Genetic Algorithm and Direct Search Toolbox, MATLAB


6. Kołodziej, M., Majkowski, A., Rak, R.J.: A new method of feature extraction from EEG signal for brain-computer interface design. Przegląd Elektrotechniczny 86(9), 35-38 (2010)
7. Kołodziej, M., Majkowski, A., Rak, R.J.: Matlab FE-Toolbox - An universal utility for feature extraction of EEG signals for BCI realization. Przegląd Elektrotechniczny 86(1), 44-46
8. Kołodziej, M., Majkowski, A., Rak, R.J.: Implementation of genetic algorithms to feature selection for the use of brain-computer interface. In: CPEE 2010 (2010)
9. del Millán, J.R.: On the need for on-line learning in brain-computer interfaces. In: Proc. Int. Joint Conf. on Neural Networks (2004)
10. Mallat, S.: A Wavelet Tour of Signal Processing. Academic Press, London (1998)
