(2019) 10:311–323
DOI 10.1007/s13042-017-0716-2
ORIGINAL ARTICLE
Received: 28 November 2016 / Accepted: 16 August 2017 / Published online: 2 September 2017
© Springer-Verlag GmbH Germany 2017
Abstract Automatic epileptic seizure detection based on the electroencephalogram (EEG) is crucial to epilepsy diagnosis and treatment. However, the large number of time series makes it quite challenging to establish a high-performance automatic detection method. Considering that different physiological states of the brain could be characterized by distinct combinations or interactions of similar discontinuous local temporal patterns, a novel framework based on biclustering for automatic epileptic seizure detection is proposed in this paper. First, the CC algorithm is used to identify similar discontinuous local temporal patterns. Then, the bicluster membership matrix is constructed using a new similarity measurement to reduce the dimensionality. Finally, the ELM classifier is adopted to discriminate between epileptic seizure and seizure-free EEG signals. Extensive comparative studies and evaluations on the publicly available Bonn epileptic EEG dataset indicate that the proposed framework could not only automatically detect or predict an epileptic seizure with high performance with respect to accuracy, robustness and efficiency, but also implicitly provide valuable knowledge for studying the mechanisms of epilepsy.

Keywords Electroencephalogram (EEG) time series data · Epilepsy seizures · Biclustering · Extreme learning machine (ELM) · Feature extraction · Unsupervised feature learning

* Huai-Ling Zhang
  huailing@163.com
* Yun Xue
  xueyun@scnu.edu.cn
1 School of Information Engineering, Guangdong Medical University, Dongguan 523808, China
2 School of Physics and Telecommunication Engineering, South China Normal University, Guangzhou 510006, China

1 Introduction

Epilepsy is the second highest-risk disease of the nervous system, behind only cardio-cerebrovascular disease [1]. According to incomplete statistics of the World Health Organization, about 0.5–1.5% of the world population suffers from epilepsy [2]. Owing to the side effects of treatment, the mental disorders or brain damage associated with seizures, and social phobia and discrimination [3], frequent epileptic seizures severely affect patients' physical and psychological health and their quality of life. Worse still, family members and caregivers of these patients also experience considerable financial and psychological difficulties. As a main biological signal of the brain's electrical activity, the electroencephalogram (EEG) provides better physiological and pathological information than other methods used for diagnosing brain disorders and studying brain function. Thus it plays an important role in epilepsy diagnosis and treatment [4]. The accurate detection of epilepsy based on EEG is an important step in assisting diagnosis and treatment, such as medication and surgery. However, conventional manual seizure detection is tedious and time-consuming. Furthermore, its prediction performance is badly influenced by the presence of myogenic artifacts. Therefore, developing a high-performance automatic detection method for epileptic seizures is clinically desirable and an important task.

However, a major challenge faced by automatic seizure detection methods is how to extract discriminative features
312 Int. J. Mach. Learn. & Cyber. (2019) 10:311–323
from epileptic EEG signals. This is because the dimension of epileptic EEG signals is increasingly high, owing to the rapidly increasing number of channels (electrodes) and sampling frequencies. Various feature extraction methods have been developed from different points of view. Most of these existing methods broadly fit into four categories: time domain analysis, frequency domain analysis, time–frequency domain analysis and non-linear methods. Among them, time domain analysis and frequency domain analysis are two simple and classical methods. Time domain analysis mainly examines the intuitive geometric properties of EEG signals, such as statistical features of amplitudes [5] and rhythmic, cyclical patterns [6], while frequency domain analysis is mainly based on the power spectrum and the coefficients of each frequency band in the EEG signals [7]. Nevertheless, these two methods assume that the signals are stationary or linear, which is inconsistent with the non-stationary and non-linear nature of EEG signals. In contrast, the discrete wavelet transform provides a more flexible representation of signals in the time–frequency domain and can capture and localize transient features such as epileptic spikes [8]. Recently, other methods based on nonlinear theory have also been successfully used for seizure detection, including the largest Lyapunov exponent (LLE) [9], correlation dimension (CD) [10], entropies [11] and empirical mode decomposition (EMD) [12].

Despite these great efforts, most of those methods seldom consider the interior relationship between EEG samples and time series. Since physiological states of the brain could be characterized by distinct combinations of the basic EEG rhythms [13], clustering algorithms such as k-means clustering [14], self-organizing maps (SOM) [15] and the fuzzy c-means (FCM) algorithm [16] have been used to characterize and detect different EEG states. These traditional clustering algorithms partition EEG samples into a set of clusters based on the similarity distribution among all the time series. However, an EEG state could very often be defined by only a portion of the time series dominated by one rhythm, a particular mixture or alternation of rhythms, or the frequent appearance of isolated events [13]. Meanwhile, there is considerable individual variability in the characteristics of different EEG signals [17]. In other words, conventional one-way clustering algorithms obtain global patterns rather than local ones, and so fail to finely identify similar discontinuous local temporal patterns. Fortunately, recent biclustering algorithms [18] provide an effective approach to overcome this limitation by clustering on both samples and attributes simultaneously. Most importantly, biclustering algorithms have also been extended to supervised biclustering-based classifiers by incorporating class labels [19–21]. The major application of these methods is to establish connections between disease symptoms and correlated local gene expression patterns. Apart from DNA microarray applications, Busygin et al. presented a pilot study applying consistent biclustering to the analysis of scalp EEG data [22]. However, a literature survey leaves us with the impression that biclustering algorithms have not been exhaustively investigated for automatic epileptic seizure detection.

Motivated by the above factors, a novel framework based on biclustering for automatic epileptic seizure detection is proposed in this paper. Firstly, we adopt the CC algorithm [18] to identify homogeneous subgroups of EEG signals that consistently fluctuate over a subset of discontinuous time series. Then we construct a bicluster membership matrix to perform dimensionality reduction. Finally, the extreme learning machine (ELM) [23] is used to discriminate between epileptic seizure and seizure-free EEG signals. The main contributions of our framework are threefold:

1. We propose a novel framework based on biclustering for automatic epileptic seizure detection. Unlike most state-of-the-art methods, our framework uses the bicluster membership matrix to characterize the interior relationship between EEG samples and time series, so that it could better characterize the dynamic changes of different brain bioelectrical states. This further enhances prediction accuracy.
2. A new similarity measurement based on the mean squared residue (MSR) is developed to construct the bicluster membership matrix. Compared with the binary similarity measurement, it could represent a more precise relationship between signals and biclusters rather than simply indicating whether a signal belongs to a bicluster or not.
3. Besides the general performance in terms of prediction accuracy, robustness and efficiency, our framework could not only implicitly discover a small number of discriminative EEG samples and time series but also characterize the local interaction patterns between them, which might be useful to reveal the mechanism of epilepsy.

To the best of our knowledge, only a few works have addressed the complete application of biclustering algorithms to EEG data for automatic epileptic seizure detection. In particular, by investigating the important parameters of the CC algorithm and a suitable classifier to achieve a high-performance framework, this paper may be an early feasibility study on applying biclustering to automatic epileptic seizure detection. Experimental results demonstrate that the proposed framework could not only automatically detect or predict an epileptic seizure with high performance with respect to accuracy, robustness and efficiency, but also implicitly provide valuable knowledge for studying the mechanisms of epilepsy.
The remainder of this paper is organized as follows. Section 2 elaborates the structure of the proposed framework and related methods. Experimental results and discussion of the proposed framework are presented in Sect. 3. Finally, Sect. 4 concludes the paper and introduces our future work.

2 Methodology

As depicted in Fig. 1, the procedure of using the framework for the detection of epileptic seizures includes three interrelated parts. Firstly, the CC algorithm is used to identify similar discontinuous local temporal patterns whose homogeneous EEG signals consistently fluctuate over a subset of discontinuous time series. Then, the bicluster membership matrix, which represents the degree of similarity between signals and biclusters, is constructed to reduce the dimensionality. Finally, the ELM classifier is adopted to discriminate between epileptic seizure and seizure-free EEG signals. In this step, all training samples with their labels (seizure or seizure-free) are fed into the ELM to train a classifier. Once the classifier is trained, unlabeled data samples can be predicted by it. Hence, we could take advantage of this framework to automatically detect and predict whether and when an epileptic seizure occurs.

[Fig. 1: flowchart of the proposed framework with four stages: data acquisition (samples S1 … Sm over time series t1 … tn), identify biclusters (B1 … Bk), construct bicluster membership, train and predict (labels + / -)]

2.1 CC algorithm

2.1.1 The basic idea of biclustering

Biclustering, also known as co-clustering, subspace clustering, or block clustering, is a new branch of clustering methods. The original idea of clustering rows and columns simultaneously was put forward by Hartigan [24] in 1972. However, it was not until 2000 that Cheng and Church [18] first applied it to gene expression data analysis. Instead of clustering on genes or conditions separately, biclustering clusters on genes and conditions simultaneously, so that it can discover groups of genes co-expressed or co-regulated under a small fraction of experimental conditions. Thus, it overcomes the disadvantage of conventional one-way clustering. Thereafter, a series of excellent biclustering algorithms were proposed, including OPSM [25] and Plaid Models [26]. In addition, biclustering has also been successfully introduced into other high-dimensional application fields, such as text mining [27], market segmentation [28], recommendation systems [29], financial forecasting [30], and collaborative filtering [31].

According to the type of biclusters the algorithm is able to find, Madeira and Oliveira [32] divided biclustering algorithms into four major classes: (1) biclusters with constant values; (2) biclusters with constant values in rows or columns; (3) biclusters with coherent values; (4) biclusters with coherent evolutions. Figure 2 presents typical examples of these four classes of biclusters. As can be seen in Fig. 2, different classes of biclustering algorithms have their own characteristics and are suitable for identifying different types of biclusters. Therefore, when finding or developing a proper biclustering algorithm for a specific problem, it is crucial to take the type of biclusters into serious consideration. For instance, the Bimax algorithm presented by Prelić et al. [33] can identify biclusters with constant values, while SAMBA (Statistical Algorithmic Method for Bicluster Analysis) proposed by Tanay et al. [34] is suited to finding biclusters with coherent evolutions.

[Fig. 2: examples of bicluster types with panel titles (a) Constant bicluster, (b) Constant rows, (c) Coherent values-additive, (d) Constant columns, (e) Coherent values-multiplicative, (f) Overall coherent evolution, (g) Coherent evolution-rows, (h) Coherent evolution-columns, (i) Coherent evolution on the columns, (j) Coherent sign changes on rows and columns]

2.1.2 CC algorithm

This paper focuses on developing a framework based on biclustering for automatic epileptic seizure detection. The first and foremost step is to identify a set of local temporal patterns which are sufficient to characterize different brain states. In general, a collection of EEG signals is stored as a numerical matrix where rows denote EEG samples and columns represent time series. Therefore, it is important that the algorithm selected or developed can deal with numerical data. Furthermore, we assume that different brain states could be characterized by distinct combinations or interactions of local temporal patterns whose homogeneous EEG signals consistently fluctuate over a subset of discontinuous time series. Hence, an algorithm
should be selected or developed on the condition that homogeneous subgroups of samples with consistent fluctuation over a subset of attributes can be identified from the original matrix. The CC algorithm introduced by Cheng and Church [18] is suitable for identifying this type of local temporal pattern, for it discovers submatrices from a numerical matrix in which samples fluctuate consistently over related attributes. More precisely, its mathematical model of the biclusters is defined by an MSR score function. The lower the MSR of a bicluster, the more consistently its samples fluctuate over its attributes. The specific model and procedure of the CC algorithm are described as follows.

Let a dataset with m rows and n columns be given as an $m \times n$ real data matrix $A = (a_{ij})_{m \times n}$, where $a_{ij}$ is the value of the ith row at the jth column. Define the row and column indices of $A_{m \times n}$ as $X = \{1, 2, \dots, m\}$ and $Y = \{1, 2, \dots, n\}$. A submatrix is an indexed set of entries $B = (I, J)$, where $I = \{i_1, i_2, \dots, i_k\} \subseteq X$ and $J = \{j_1, j_2, \dots, j_l\} \subseteq Y$. The MSR can be formulated by

$$H(I, J) = \frac{1}{|I||J|} \sum_{i \in I,\, j \in J} \left(a_{ij} - a_{iJ} - a_{Ij} + a_{IJ}\right)^2 \qquad (1)$$

where

$$a_{iJ} = \frac{1}{|J|} \sum_{j \in J} a_{ij}, \qquad a_{Ij} = \frac{1}{|I|} \sum_{i \in I} a_{ij} \qquad (2)$$

$$a_{IJ} = \frac{1}{|I||J|} \sum_{i \in I,\, j \in J} a_{ij} = \frac{1}{|I|} \sum_{i \in I} a_{iJ} = \frac{1}{|J|} \sum_{j \in J} a_{Ij} \qquad (3)$$

Here $a_{iJ}$, $a_{Ij}$ and $a_{IJ}$ represent the means of the row, the column and the submatrix, respectively. A submatrix $A_{IJ}$ is called a δ-bicluster when $H(I, J) \le \delta$ for some $\delta \ge 0$.

Since the problem of finding the best δ-bicluster is NP-hard, the CC algorithm proceeds in a heuristic greedy fashion by iteratively updating rows and columns [18]. The procedure mainly contains three phases. In the first phase, the algorithm starts with the whole matrix, for which it calculates the initial MSR $H$. Then it quickly deletes, in batches, the rows and columns whose MSR $H'$ satisfies $H' > \alpha H$, where α is a threshold for multiple node deletion. In the second phase, it deletes rows or columns one by one to achieve a finer decrease of the MSR. Finally, since some rows and columns may be added without increasing the MSR, a node addition phase is performed to obtain a better result. The algorithm terminates when no move can increase the size of the candidate bicluster without exceeding the MSR threshold δ. Once a bicluster is identified, the algorithm masks the elements of the identified bicluster with random values and then repeats the same procedure to search for other biclusters.

2.2 Constructing the bicluster membership matrix

Once a set of significant biclusters has been identified, most biclustering-based classifiers [19–21] employ a bicluster membership matrix to construct the relationship between each sample and these biclusters. This process reduces the dimensionality by mapping a large number of attributes to a small number of identified biclusters. During this process, a binary similarity measurement that indicates whether a sample belongs to a bicluster or not is frequently used to compute the bicluster membership matrix. However, because different
types of biclusters may differ significantly, a binary similarity measurement is insufficient and inflexible for reflecting the degree of similarity between each sample and different types of biclusters. In other words, the similarity measurement should be developed based on the type of the identified biclusters. In view of this problem, a new similarity measurement based on MSR is introduced to compute the bicluster membership matrix. To be more specific, let $Q = (q_{ij})_{m \times k}$ be a bicluster membership matrix, and denote the collection of identified biclusters as $B = \{b_1, b_2, \dots, b_k\}$ and the EEG samples as $S = \{s_1, s_2, \dots, s_m\}$. The similarity function can then be formulated as:

$$q_{ij} = \mathrm{Sim}(s_i, b_j) = \frac{\left|H(I_j, J_j) - H(I'_j, J'_j)\right| - \mathrm{Min}\left[\left|H(I_j, J_j) - H(I'_j, J'_j)\right|\right]}{\mathrm{Max}\left[\left|H(I_j, J_j) - H(I'_j, J'_j)\right|\right] - \mathrm{Min}\left[\left|H(I_j, J_j) - H(I'_j, J'_j)\right|\right]} \qquad (4)$$

where $I_j$ and $J_j$ are the sets of rows and columns of the jth bicluster $b_j$, respectively, and $I'_j$ and $J'_j$ are the sets of rows and columns of the jth bicluster $b_j$ after adding or deleting the ith sample $s_i$. The element $q_{ij}$ of $Q$ represents the similarity between the ith sample and the jth bicluster. The function expresses the normalized deviation between the MSR of the bicluster $b_j$ and the MSR of $b_j$ after adding or deleting the sample $s_i$. The greater the value of the function (namely the larger $q_{ij}$), the smaller the similarity between the sample $s_i$ and the bicluster $b_j$. An example of a bicluster membership matrix is given in Table 1.

Table 1 The bicluster membership matrix Q

      b1    b2    …    bj    …    bk
s1    q11   q12   …    q1j   …    q1k
s2    q21   q22   …    q2j   …    q2k
…     …     …     …    …     …    …
si    qi1   qi2   …    qij   …    qik
…     …     …     …    …     …    …
sm    qm1   qm2   …    qmj   …    qmk

2.3 Extreme learning machine

Classification is another challenge for automatic epileptic seizure detection, for there are still considerable overlaps between epileptic seizure and seizure-free EEG signals after dimensionality reduction. In the last two decades, many classification methods have been developed for the classification of epileptic EEG signals. The naive Bayesian (NB) classifier [35], a well-known linear classifier based on probability distributions, is frequently used to classify epileptic EEG signals. However, the main limitation of the NB classifier is that the posterior probabilities are usually difficult to determine directly [36]. In recent years, artificial neural networks (ANNs) [14] have been used as one of the most important nonlinear approaches for classifying epileptic EEG signals, owing to their powerful ability to discriminate complex nonlinear biological signals. Nevertheless, most traditional ANN learning algorithms, such as the back propagation (BP) algorithm, are prone to falling into a local optimum. Besides, their learning speed is too slow [23]. As another nonlinear method, the support vector machine (SVM) has a relatively strong generalization capability, especially for small EEG datasets. Thus, it has been successfully employed to differentiate between epileptic seizure and seizure-free EEG signals [37]. But the high computational complexity of SVM [23] makes it inefficient for handling ever-growing EEG datasets. Unlike ANN and SVM classifiers, k-Nearest Neighbour (kNN) [38] is a relatively fast nonlinear classifier for classifying EEG signals. However, the performance of the kNN classifier depends heavily on the parameter k. As an emerging technology, ELM [23, 39] is a learning algorithm for single-hidden-layer feedforward neural networks (SLFNs). The input weights and biases of its hidden layer can be chosen randomly, and its output weights can be determined analytically. Hence, it not only achieves better generalization performance but also greatly improves the learning speed. It has attracted increasing attention and become widely accepted in the detection of epileptic seizures [11, 40]. In this paper, ELM is adopted as the classifier in the presented framework.

Given N arbitrary training samples $\{(\mathbf{x}_j, \mathbf{t}_j)\}_{j=1}^{N}$, where $\mathbf{x}_j = [x_{j1}, x_{j2}, \cdots, x_{jn}]^T \in R^n$ is the jth sample and $\mathbf{t}_j = [t_{j1}, t_{j2}, \cdots, t_{jm}]^T \in R^m$ denotes the label of $\mathbf{x}_j$, a standard SLFN with L hidden nodes can be mathematically formulated as

$$\sum_{i=1}^{L} \boldsymbol{\beta}_i\, g(\mathbf{w}_i \cdot \mathbf{x}_j + b_i) = \mathbf{o}_j, \qquad j = 1, 2, \dots, N \qquad (5)$$

where $\mathbf{w}_i = [w_{i1}, w_{i2}, \dots, w_{in}]^T$ represents the input weights connecting the input neurons and the ith hidden neuron, $\boldsymbol{\beta}_i = [\beta_{i1}, \beta_{i2}, \dots, \beta_{im}]^T$ represents the output weights connecting the output neurons and the ith hidden neuron, $b_i$ is the bias of the ith hidden neuron, and $\mathbf{o}_j$ is the output for the jth sample. $g(\cdot)$ can be the sigmoid or RBF function, or even a non-differentiable activation function.

If the SLFN approximates these N arbitrary samples $\{(\mathbf{x}_j, \mathbf{t}_j)\}_{j=1}^{N}$ with zero error, that is, $\sum_{j=1}^{N} \|\mathbf{o}_j - \mathbf{t}_j\| = 0$, formula (5) can be written as

$$\mathbf{H}\boldsymbol{\beta} = \mathbf{T} \qquad (6)$$

where
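$$\mathbf{H}(\mathbf{w}_1, \dots, \mathbf{w}_L, b_1, \dots, b_L, \mathbf{x}_1, \dots, \mathbf{x}_N) = \begin{bmatrix} g(\mathbf{w}_1 \cdot \mathbf{x}_1 + b_1) & \cdots & g(\mathbf{w}_L \cdot \mathbf{x}_1 + b_L) \\ \vdots & \ddots & \vdots \\ g(\mathbf{w}_1 \cdot \mathbf{x}_N + b_1) & \cdots & g(\mathbf{w}_L \cdot \mathbf{x}_N + b_L) \end{bmatrix}_{N \times L} \qquad (7)$$

[Figure (flowchart): Start, Input EEG signals, Dimensionality reduction (Find biclusters, …)]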
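As an illustration of Eqs. (5)–(7), the following is a minimal NumPy sketch of ELM training, not the authors' implementation (the experiments in this paper were run in MATLAB). It assumes a sigmoid activation, weights drawn uniformly from [-1, 1], and the standard minimum-norm least-squares solution of Hβ = T via the Moore–Penrose pseudoinverse as described by Huang et al. [23]. The function names `train_elm` and `predict_elm` and the toy data are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_elm(X, T, n_hidden, rng):
    """Fit an SLFN the ELM way: random hidden layer, analytic output weights."""
    W = rng.uniform(-1.0, 1.0, size=(X.shape[1], n_hidden))  # column i holds w_i
    b = rng.uniform(-1.0, 1.0, size=n_hidden)                # bias b_i per hidden node
    H = sigmoid(X @ W + b)                                   # N x L matrix of Eq. (7)
    beta = np.linalg.pinv(H) @ T                             # assumed solution of Eq. (6)
    return W, b, beta


def predict_elm(X, W, b, beta):
    """Network outputs o_j of Eq. (5) for every row of X."""
    return sigmoid(X @ W + b) @ beta


# Toy stand-in for rows of a membership matrix: two separated point clouds.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2.0, 0.5, size=(20, 2)),
               rng.normal(+2.0, 0.5, size=(20, 2))])
y = np.repeat([0, 1], 20)          # 0 = seizure-free, 1 = seizure (made-up labels)
T = np.eye(2)[y]                   # one-hot targets t_j

W, b, beta = train_elm(X, T, n_hidden=20, rng=rng)
accuracy = np.mean(predict_elm(X, W, b, beta).argmax(axis=1) == y)
print(f"training accuracy: {accuracy:.2f}")
```

Because only `beta` is fitted, and in closed form, training requires no iterative weight updates, which is the source of the speed advantage over BP mentioned in Sect. 2.3.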
[…] algorithm as a dimensionality reduction method. Fifthly, we evaluate the overall performance of the proposed framework and compare it with other state-of-the-art detection methods in the literature. Finally, we further analyze the characteristics of the identified biclusters to explore the finer relationship between the EEG samples and related time series.

The epileptic EEG time series used in the following experiments are from the Department of Epileptology, Bonn University, Germany. The dataset is publicly available online [41] and widely used as a benchmark. The whole dataset consists of five subsets (A–E). Each subset contains 100 single-channel EEG signals, each with a duration of 23.6 s. All the signals were selected and cut from continuous multi-channel EEG recordings after visual inspection for artifacts caused by muscle activity, eye movements, etc. Sets A and B were recorded from five healthy volunteers in a relaxed and awake state with eyes open (Set A) and eyes closed (Set B) using the standard international 10–20 system for surface EEG recording. For Sets C, D and E, five epileptic patients were chosen who underwent pre-surgical evaluation of epilepsy using intracranial electrodes. Electrodes were implanted symmetrically to record EEG signals from the epileptogenic zone (Set D) and the hippocampal formation of the opposite hemisphere of the brain (Set C). Set E was taken from all recording sites exhibiting ictal activity. That is, Sets C and D contain interictal intervals whereas Set E contains ictal activity. All EEG signals were recorded with a 128-channel amplifier system using an average common reference, at a sampling rate of 173.6 Hz. Therefore, each signal has a length of 4097 time series (sampling points). In order to assess the performance, five binary classification tasks are performed to discriminate between Sets A and E, Sets B and E, Sets C and E, Sets D and E, and Sets A and D, respectively. In each binary classification task, 95% of the data are selected randomly as the training set, while the remainder serve as the testing set. In addition, all the experiments are performed on the Matlab R2013a platform.

3.1 Experiments for important parameters

[…] investigated to achieve desirable performance of the proposed framework in terms of accuracy and efficiency.

3.1.1 Experiments for the optimal value of δ

When the value of δ is larger than 3 or less than 0.5, the recognition performance of the proposed framework decreases sharply. Hence, we run the proposed framework on Sets A and E with the value of δ varying from 0.5 to 3 to pick the optimal value of δ. To compare the discriminating capability of the proposed framework with different values of δ, the receiver operating characteristic (ROC) curve is used. Figure 4 displays six ROC curves of the averaged results of ten detections with the value of δ varying from 0.5 to 3. From Fig. 4, it is obvious that the area under the ROC curve (AUC) obtained at δ = 1.5 is the largest, with a value of 0.9200. Therefore, the optimal value of δ could be set to 1.5.

3.1.2 Experiments for the optimal value of α

In order to find the optimal value of α, two sets of experiments are carried out on Sets A and E using the proposed framework with the value of α varying from 1.0 to 2.5. Because the value of α is mainly related to the efficiency of the proposed framework, we first study the relationship between the value of α and the executing time to determine a preliminary, relatively optimal interval of α. Then, we pick the value of α with the highest recognition accuracy in the following experiment.

As shown in Fig. 5, with the increase of the value of α, the executing time goes up gradually. More precisely, in the […]

[Fig. 4: ROC curves of the proposed framework for δ varying from 0.5 to 3]
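The AUC criterion used in Sect. 3.1.1 to compare values of δ can be illustrated with a short sketch. It computes the AUC as the fraction of (positive, negative) pairs that the classifier ranks correctly, which equals the area under the ROC curve. The function name `roc_auc` and the scores below are made up for illustration and are not results from this paper.

```python
import numpy as np


def roc_auc(scores, labels):
    """AUC as the probability that a positive sample outscores a negative one.

    Ties between a positive and a negative score count as one half.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    diff = pos[:, None] - neg[None, :]          # all positive/negative pairs
    wins = np.sum(diff > 0) + 0.5 * np.sum(diff == 0)
    return wins / (pos.size * neg.size)


# Hypothetical classifier outputs for five test signals (1 = seizure).
scores = [0.9, 0.8, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0]
auc = roc_auc(scores, labels)
print(f"AUC = {auc:.4f}")   # 5 of the 6 pairs are ranked correctly
```

Repeating such a computation for each candidate δ and keeping the value with the largest AUC mirrors the selection rule described in the text.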
[Fig. 5: accuracy (%) and executing time (s) of the proposed framework versus α (α from 1 to 2.5)]

[Figure: bar chart over the average and Tasks I–V for the ELM, KNN and SVM classifiers]

In order to select a classifier with high accuracy and good efficiency, we run the proposed framework with ELM, NB, BP, SVM and kNN to classify EEG signals in the five binary classification tasks. The value of k in kNN is set to 10, and the SVM type is nu-support vector classification (nu-SVC). Figure 6 shows the ten-fold cross-validation error rates obtained by the proposed framework with the five classifiers. As seen from Fig. 6, the validation error rates obtained by the proposed framework with ELM are lower than the others in all the classification tasks, especially for Task V. Furthermore, we also compare the average executing times of the proposed framework with the five classifiers in all classification tasks. Results are shown in […]

To verify the effectiveness of the proposed similarity measurement, we compare its performance with that of the binary similarity measurement in the five classification tasks. Figure 7 presents a histogram of the averaged accuracies of ten detections using these two types of similarity measurement. As expected, the results obtained with the proposed similarity measurement have an obvious advantage over those obtained with the binary similarity measurement. The reason might be that the proposed similarity measurement better characterizes the degree of the relationship between EEG signals and identified biclusters than the binary similarity measurement does.
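The difference between the two similarity measurements compared here can be made concrete with a small sketch of Eqs. (1) and (4). It rests on assumptions not fixed by the text: adding or deleting sample s_i changes only the row set (J'_j = J_j), each bicluster has at least two rows, and Min and Max are taken over all samples for each bicluster. The names `msr`, `membership_matrix` and `binary_membership` and the toy matrix are hypothetical.

```python
import numpy as np


def msr(A, rows, cols):
    """Mean squared residue H(I, J) of Eq. (1)."""
    sub = A[np.ix_(rows, cols)]
    row_mean = sub.mean(axis=1, keepdims=True)   # a_iJ
    col_mean = sub.mean(axis=0, keepdims=True)   # a_Ij
    all_mean = sub.mean()                        # a_IJ
    return float(np.mean((sub - row_mean - col_mean + all_mean) ** 2))


def membership_matrix(A, biclusters):
    """MSR-based membership Q of Eq. (4); smaller q_ij means more similar."""
    dev = np.zeros((A.shape[0], len(biclusters)))
    for j, (rows, cols) in enumerate(biclusters):
        base = msr(A, rows, cols)
        for i in range(A.shape[0]):
            # Delete s_i if it is a member of b_j, otherwise add it.
            toggled = [r for r in rows if r != i] if i in rows else sorted(rows + [i])
            dev[i, j] = abs(base - msr(A, toggled, cols))
    lo = dev.min(axis=0, keepdims=True)          # assumed: Min over samples
    hi = dev.max(axis=0, keepdims=True)          # assumed: Max over samples
    span = np.where(hi > lo, hi - lo, 1.0)       # guard against division by zero
    return (dev - lo) / span


def binary_membership(n_samples, biclusters):
    """Binary measurement: 1 if the sample is a row of the bicluster, else 0."""
    B = np.zeros((n_samples, len(biclusters)), dtype=int)
    for j, (rows, _) in enumerate(biclusters):
        B[rows, j] = 1
    return B


# Toy data: rows 0-2 share an additive pattern, row 3 does not.
A = np.array([[1.0, 2.0, 3.0],
              [2.0, 3.0, 4.0],
              [3.0, 4.0, 5.0],
              [5.0, 0.0, 9.0]])
biclusters = [([0, 1], [0, 1, 2])]               # rows and columns as Python lists

Q = membership_matrix(A, biclusters)
B = binary_membership(A.shape[0], biclusters)
print(Q[:, 0], B[:, 0])
```

In this toy case the binary measurement separates rows 0 and 1 from rows 2 and 3, whereas the MSR-based value also marks row 2 as close to the bicluster, since adding it leaves the MSR unchanged. This is the kind of graded relationship the proposed measurement is intended to capture.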
Table 2 Average accuracy of the proposed framework for different values of α when δ = 1.5

α              1.0    1.1    1.2    1.3    1.4    1.5    1.6    1.7    1.8    1.9    2.0    2.5
Accuracy (%)   95.00  87.00  85.00  88.00  89.00  84.00  85.00  89.00  83.00  87.00  83.00  87.00
[Figures: two bar charts of accuracy (%) over Tasks I–V]
[Figure: number (y-axis) versus bicluster ID (x-axis, 0–21)]

[Figure: amplitude (microvolt) versus time series for the EEG samples of four identified biclusters; recoverable panel titles: (c) Bicluster ID 3, (d) Bicluster ID 5]
17. […] mixture model of zero-crossing intervals. IEEE Trans Biomed Eng 60(5):1401–1413
18. Cheng Y, Church GM (2000) Biclustering of expression data. In: Proceedings of the 8th International Conference on Intelligent Systems for Molecular Biology, San Diego, La Jolla, California, USA, pp 93–103
19. Chen HC, Zou W, Lu TP et al (2014) A composite model for subgroup identification and prediction via bicluster analysis. Plos One 9(10):e111318
20. Carreiro AV, Anunciação O, Carriço JA et al (2011) Prognostic prediction through biclustering-based classification of clinical gene expression time series. J Integr Bioinform 8(3):175–175
21. Asgarian N, Greiner R (2006) Using rank-1 biclusters to classify microarray data. Department of Computing Science, University of Alberta, Edmonton, pp 1–10
22. Busygin S, Boyko N, Pardalos PM et al (2007) Biclustering EEG data from epileptic patients treated with vagus nerve stimulation. Data Min Syst Anal Optim Biomed 953(1):220–231
23. Huang GB, Zhu QY, Siew CK (2006) Extreme learning machine: theory and applications. Neurocomputing 70(1–3):489–501
24. Hartigan JA (1972) Direct clustering of a data matrix. J Am Stat Assoc 67(337):123–129
25. Bendor A, Chor B, Karp R et al (2003) Discovering local structure in gene expression data: the order-preserving submatrix problem. In: Proceedings of the 6th Annual International Conference on Computational Biology, Washington, DC, USA, vol 10, no 3–4, pp 49–57
26. Lazzeroni L, Owen A (2002) Plaid models for gene expression data. Stat Sin 12(1):61–86
27. de Castro PAD, de França FO, Ferreira HM et al (2007) Applying biclustering to text mining: an immune-inspired approach. Artif Immune Syst 83–94
28. Wang B, Miao Y, Zhao H et al (2016) A biclustering-based method for market segmentation using customer pain points. Eng Appl Artif Intell 47:101–109
29. Inbarani H, Thangavel K (2010) A robust biclustering approach for effective web personalization. Visual analytics and interactive technologies: data, text and web mining applications, pp 186–202
30. Xue Y, Liu Z, Luo J et al (2015) Stock market trading rules discovery based on biclustering method. Math Probl Eng 2015(1):1–13
31. Symeonidis P, Nanopoulos A, Papadopoulos AN et al (2008) Nearest-biclusters collaborative filtering based on constant and coherent values. Inf Retr 11(1):51–75
32. Madeira SC, Oliveira AL (2004) Biclustering algorithms for biological data analysis: a survey. IEEE/ACM Trans Comput Biol Bioinform 1(1):24–45
33. Prelić A, Bleuler S, Zimmermann P et al (2006) A systematic comparison and evaluation of biclustering methods for gene expression data. Bioinformatics 22(9):1122–1129
34. Tanay A, Sharan R, Shamir R (2002) Discovering statistically significant biclusters in gene expression data. Bioinformatics 18(Suppl 1):S136–S144
35. Kumar U, Raja SK, Mukhopadhyay C et al (2011) Hybrid Bayesian classifier for improved classification accuracy. IEEE Geosci Remote Sens Lett 8(3):474–477
36. Zhang GP (2000) Neural networks for classification: a survey. IEEE Trans Syst Man Cybern Part C (Appl Rev) 30(4):451–462
37. Kumar Y, Dewal ML, Anand RS (2014) Epileptic seizure detection using DWT based fuzzy approximate entropy and support vector machine. Neurocomputing 133(8):271–279
38. Kotsiantis SB (2007) Supervised machine learning: a review of classification techniques. Informatica 31(3):249–268
39. Wang XZ, Chen AX, Feng HM (2011) Upper integral network with extreme learning mechanism. Neurocomputing 74(16):2520–2525
40. Yuan Q, Zhou W, Li S et al (2011) Epileptic EEG classification based on extreme learning machine and nonlinear features. Epilepsy Res 96(1–2):29–38
41. Andrzejak RG, Lehnertz K, Mormann F et al (2001) Indications of nonlinear deterministic and finite-dimensional structures in time series of brain electrical activity: dependence on recording region and brain state. Phys Rev E 64(6):0619071–0619078
42. Jahankhani P, Kodogiannis V, Revett K (2006) EEG signal classification using wavelet feature extraction and neural networks. IEEE Symposium on John Vincent Atanasoff International Modern Computing, Sofia, pp 120–124
43. Polat K, Güneş S (2007) Classification of epileptiform EEG using a hybrid system based on decision tree classifier and fast Fourier transform. Appl Math Comput 187(2):1017–1026
44. Guo L, Rivero D, Seoane JA et al (2009) Classification of EEG signals using relative wavelet energy and artificial neural networks. In: Proceedings of the 2nd Genetic and Evolutionary Computation Conference, Shanghai, China, pp 177–184
45. Chandaka S, Chatterjee A, Munshi S (2009) Cross-correlation aided support vector machine classifier for classification of EEG signals. Expert Syst Appl Int J 36(2):1329–1336
46. Übeyli ED (2010) Least squares support vector machine employing model-based methods coefficients for analysis of EEG signals. Expert Syst Appl 37(1):233–239
47. Pachori RB, Patidar S (2014) Epileptic seizure classification in EEG signals using second-order difference plot of intrinsic mode functions. Comput Methods Programs Biomed 113(2):494–502
48. Jie X, Li C, Li H et al (2015) The detection of epileptic seizure signals based on fuzzy entropy. J Neurosci Methods 243:18–25
49. Song JL, Hu W, Zhang R (2015) Automated detection of epileptic EEGs using a novel fusion feature and extreme learning machine. Neurocomputing 175(PA):383–391
50. Yang J, Wang H, Wang W et al (2003) Enhanced biclustering on expression data. In: Proceedings of the 3rd IEEE Symposium on Bioinformatics and Bioengineering, pp 321–327
51. Akhtar MT, Mitsuhashi W, James CJ (2012) Employing spatially constrained ICA and wavelet denoising, for automatic removal of artifacts from multichannel EEG data. Signal Process 92(2):401–416