
Computers in Biology and Medicine 102 (2018) 168–179

Contents lists available at ScienceDirect

Computers in Biology and Medicine


journal homepage: www.elsevier.com/locate/compbiomed

ECG authentication system design incorporating a convolutional neural network and generalized S-Transformation
Zhidong Zhao a,c,∗, Yefei Zhang b,∗∗, Yanjun Deng c, Xiaohong Zhang c

a Hangdian Smart City Research Center of Zhejiang Province, Hangzhou Dianzi University, Hangzhou, 311300, PR China
b College of Communication Engineering, Hangzhou Dianzi University, Hangzhou, 311300, PR China
c College of Electronics and Information, Hangzhou Dianzi University, Hangzhou, 311300, PR China

ARTICLE INFO

Keywords:
Electrocardiogram (ECG)
Authentication system
Generalized S-transformation
ECG trajectory
Convolutional neural network (CNN)

ABSTRACT

Electrocardiogram (ECG) is gaining increased attention as a biometric method in a wide range of applications, such as access control and security/privacy requirements. Most reported investigations using the ECG biometric method are based on either fiducial or nonfiducial methods, both of which suffer from a series of issues: fiducial points are difficult to locate accurately, feature selection is subjective, and classifiers are limited by the quantity and structure of data. This paper proposes a new biometric authentication system for human identification that uses ECG signals as a biometric trait and integrates a generalized S-transformation and a convolutional neural network (CNN). Specifically, we first introduce a blind segmentation strategy that effectively avoids difficult data-specific heartbeat recognition and segmentation techniques. Then, a generalized S-transformation is performed on the blindly segmented ECG signal, capturing the ECG trajectory at each time point in the frequency domain. Next, the MATLAB getframe function is used to capture an image of the ECG trajectories and convert the one-dimensional signal into a two-dimensional image, which serves as the input layer of the CNN and thus fully reflects the changing trend in the ECG signal spectrum characteristics over a continuous period. Finally, the CNN is used for automatic discriminative feature learning and representation, which avoids a tedious feature extraction algorithm. In addition, considering the possible impact of ECG signals with different signal behaviors on identification, experiments are performed on three ECG databases with diverse features, comprising normal individuals, atrial fibrillation patients, and a noisy database, to evaluate the effectiveness of the proposed algorithm. Promising identification rates of 99%, 98%, and 99% were achieved, respectively. Thus, our proposed ECG authentication system can be effectively used for identity recognition under various conditions.

∗ Corresponding author. Hangdian Smart City Research Center of Zhejiang Province, Hangzhou Dianzi University, Hangzhou, 311300, PR China.
∗∗ Corresponding author.
E-mail addresses: zhaozd@hdu.edu.cn (Z. Zhao), Valora.Zhang@gmali.com (Y. Zhang).

https://doi.org/10.1016/j.compbiomed.2018.09.027
Received 14 July 2018; Received in revised form 25 September 2018; Accepted 25 September 2018
0010-4825/ © 2018 Elsevier Ltd. All rights reserved.

1. Introduction

In modern society, different identification and verification methods are being used in information systems to provide a high level of security. In critical secure access control, international border crossing and law enforcement applications in particular, there is a great need for reliable and robust human recognition techniques. Among these methods, biometric technologies play an important role in security applications [1]. Different physiological data, such as the face, fingerprints, iris, DNA, and genes, can be used for biometric recognition [2]. However, many types of external physiological data, such as the face, fingerprints, and iris, are vulnerable to spoofing attacks. Electrocardiogram (ECG) recognition technology is a new type of anti-spoofing biometric identification technology, which has opened up new fields of biological recognition.

Compared with the above-mentioned biometric identification systems, ECG-based biometric authentication has many attractive characteristics that have been commercially proven [3]: 1. high security level, 2. liveness detection, 3. internal features, and 4. simple acquisition. Therefore, ECG biometrics is a highly promising, robust technology from a security point of view.

Many feature extraction and recognition techniques have been proposed for ECG recognition; they are usually based on one of two strategies: fiducial or nonfiducial methods.

1) Fiducial methods. The features are usually extracted from ECG heartbeat waveforms, such as the amplitude, width, angle, or slope of the waves. Several subsets of these features have been used in the
Z. Zhao et al. Computers in Biology and Medicine 102 (2018) 168–179

literature based on fiducial points (P, Q, R, S, T, and U) [4,5,18,19,21,23,27,28]. The accuracy of these methods depends on how accurately the fiducial points can be identified. Features based on the QRS complex have been widely used for biometric tasks because they are less sensitive to physical and emotional variations than the other portions of ECG signals [29]. The Pan–Tompkins algorithm is the most frequent choice for fiducial detection [30]; it was specifically developed for real-time QRS detection in ECG signals [31] and reliably detects QRS complexes using slope, amplitude, and width information. However, such fiducial approaches also require extensive manual feature engineering; therefore, their generalization ability may be limited. It is also difficult to determine when noise and variability distort the heartbeat waveforms and render the measurements unreliable [32]. Juan Arteaga-Falconi [26] reported the extraction of only one feature point, but the accuracy was relatively poor. In contrast, Liu [19] obtained a high accuracy rate based on four feature points, but the applied algorithm is much more complex and less generalizable. Ruggero Donida Labati [33] presented a convolutional neural network (CNN)-based biometric approach for ECG signals and obtained satisfactory identification accuracy. However, this approach still relies on the QRS complex, extracting a set of m QRS complexes from ECG samples of short duration.

2) Nonfiducial methods. To improve the generalization ability and reduce the feature engineering effort, nonfiducial methods have been introduced. They do not use characteristic points to generate the feature set. Generally, the entire ECG curve is divided into overlapping or nonoverlapping windows, and features are extracted from these windows [6,20,25]. However, these methods are subject to similar restrictions: they may be effective for one dataset or sensor placement method but may not generalize well to other datasets or placements. Ting [17] reported using an extended Kalman filter for ECG-based personal identification, but the classifier is limited by the quantity and structure of the data, and its ability to handle abnormal situations such as negative T waves is weak.

The automatic generation of features has been extensively addressed in the literature, especially with cutting-edge machine learning and deep neural network techniques [10–14]. Such techniques have played an important role in artificial intelligence applications such as identity and speech recognition. Among these approaches, the CNN greatly reduces the subjectiveness of feature extraction because it does not require complicated image preprocessing and feature extraction steps. A CNN consists of multiple layers, and each layer has a small set of neurons to process a portion of the input image. These sets are tiled to introduce overlapping regions, and the process is repeated layer by layer to achieve a high-level abstraction of the original image. Inspired by this approach, ECG signals can be viewed as a one-dimensional image. Therefore, we explore how to effectively apply the CNN method to ECG authentication to avoid labor-intensive feature engineering and to capture more hidden patterns from the ECG data.

This paper proposes a novel ECG biometric authentication system that incorporates generalized S-transformation (GST) and CNN techniques. The proposed system achieves accurate identification results using simple data entry and sophisticated internal feature acquisition techniques, as shown in Fig. 1. It avoids the complicated heartbeat detection and segmentation techniques associated with ECG signals and the cumbersome, subjective manual processing, both of which are time consuming and have limited generalization capabilities. The remainder of this paper is organized as follows. Section 2 presents the detailed design of the ECG authentication system, and Section 3 describes the experimental analysis of the parameters and the performance optimization of each algorithm. Section 4 evaluates the ECG recognition performance on three databases. Finally, Section 5 concludes this paper.

2. Methods and materials

2.1. Methodological outline

The system diagram of the proposed approach is shown in Fig. 1. This section provides a detailed description of the approach following the signal processing flow.

1. Preprocessing. Cyclic translation ECG denoising based on wavelet hard thresholding is first performed to remove noise. Then, the filtered ECG is blindly segmented into signal segments with an equal length of 3 s, without leveraging any heartbeat location information, which makes the segmentation immune to diverse morphological variability.
2. Generalized S-transformation and getframe technology. The signal is converted from the time domain to the time-frequency domain, which is expected to reveal more detailed time and frequency characteristics at multiple resolutions than the original time domain. Drawing the current ECG trajectory at each time point in the frequency domain yields all the ECG trajectory images (M points are used to draw M ECG trajectories). This step also converts the one-dimensional signal into two-dimensional images that serve as the input layer of the CNN, fully reflecting the changing trend of the ECG signal spectrum characteristics over the continuous period.
3. CNN. Based on the enriched data representation, the CNN is applied to the spectral characteristics at each time point to learn the intrinsic patterns automatically. This allows parallel feature self-learning for various spectral characteristics, avoiding time-consuming manual feature engineering. The learned features, reflected by the internal parameters of the CNN, are then used to enable one-to-one identity authentication.

2.2. Databases

In this work, ECG signals were obtained from PhysioNet ECG databases. We adopt two such databases, and Table 1 presents the characteristics of the ECG signals obtained from them.

The D1 database [7] measurements were obtained with ECG lead I. The records were obtained from 90 volunteers (44 men and 46 women, aged 13–75 years), from whom we randomly selected 50. Importantly, we chose this database because the number of records for each person varies from 2 (collected during one day) to 20 (collected periodically over 6 months). Each record is independent and contains a complete ECG signal with no detached electrodes.

The D2 and D3 databases [8] consist of single-lead ECG records supplied by AliveCor, which were annotated by clinical experts and categorized into one of four groups (i.e., normal rhythm, atrial fibrillation (AF), other rhythm, and noisy recordings). We randomly selected 50 groups of AF signals, marked as database D2, and 50 groups of noisy signals (O), marked as database D3. We chose these databases because their measurement methods coincide with the ever-evolving mobile and wearable measurement methods. Additionally, the AF patients and noisy recordings were selected to verify the robustness of the algorithm for unhealthy people and noisy environments. The biometric authentication system proposed in this paper is expected to increase the availability of authentication in the emerging wearable device industry.

Through the selection of the above three databases, we comprehensively consider the different signal behaviors that exist in real-world measurements.

To obtain a balanced and stable evaluation, 10-fold cross-validation repeated 10 times was used to estimate the generalization error; Fig. 2 shows the schematic diagram of this procedure. Each of the databases D1, D2 and D3 is first divided into 10 equally sized, mutually exclusive subsets: $D = d_1 \cup d_2 \cup \cdots \cup d_{10}$, where $D$ denotes D1, D2, or D3 and $d_i \cap d_j = \emptyset$ for $i \neq j$. Each


subset maintains the consistency of the data distribution because it is obtained through stratified sampling from D. Then, the union of nine subsets is used as the training set, and the remaining subset serves as the test set. Thus, 10 pairs of training and test sets are obtained, yielding 10 training and testing assessments, and the mean of the 10 test results is used as the cross-validation result to evaluate the performance of the algorithm. Because there are numerous ways to divide database D into 10 subsets, we reduce the difference caused by different sample divisions by randomly repeating the division 10 times. The final performance indicator is the mean over the 10 replicates of the 10-fold cross-validation. This considerably improves the stability and fidelity of the evaluation results compared with the commonly used single-division leave-one-out method.

Fig. 1. Flowchart of the proposed ECG authentication system, incorporating convolutional neural network and generalized S-transformation (Note: M = 3fs; fs = sampling frequency).

Table 1
Characteristics of ECGs obtained from PhysioNet ECG databases.
| Data | Source | Type of abnormalities | Amount | Sampling frequency | Data length [data/time] |
|------|--------|-----------------------|--------|--------------------|-------------------------|
| D1 | ECG-ID Database | Normal (N) | 50 | fs = 500 Hz | 1500/3 s |
| D2 | PhysioNet/CinC Challenge 2017 | Atrial fibrillation (AF) | 50 | fs = 300 Hz | 900/3 s |
| D3 | PhysioNet/CinC Challenge 2017 | Noisy (O) | 50 | fs = 300 Hz | 900/3 s |

Fig. 2. Schematic of 10-fold cross-validation method repeated 10 times.

2.3. Preprocessing

The preprocessing operation includes two steps, namely filtering and blind segmentation, as shown in Fig. 1. For filtering, we used the cyclic translation ECG denoising method shown in Fig. 3, which we proposed in previous studies [9].

Thereafter, the filtered ECG was blindly segmented into windows of equal length. The fixed window length was set to 3 s to ensure that each window contains at least one complete heartbeat. For each subject, fixed-length ECG windows were randomly selected and used for the subsequent time-frequency conversion. An example is shown in Fig. 4, in which ECG windows are randomly chosen. These windows generally include different numbers of heartbeats as well as different amplitudes and signal patterns (normal, AF, and noisy). The blind segmentation strategy can effectively avoid the complexities of data-specific heartbeat recognition and segmentation techniques.

Fig. 3. Flowchart of ECG denoising.

Fig. 4. Blindly chosen ECG segments with diverse behaviors from three databases.

2.4. Generalized S-transformation

The instantaneous frequency characteristics of signals reflect how their frequency varies over time. Compared with conventional time-frequency analysis methods, GST has the advantages of high frequency resolution, strong noise immunity, no cross-term interference, and an adjustable

window function. Therefore, we first used GST to transform the ECG signal into the joint time-frequency domain, converting the one-dimensional signal into a two-dimensional representation and thus fully reflecting the joint time-frequency characteristics of the ECG signal. The spectral feature map at each time point (a total of n frequency spectra at n time points) served as the input layer of the CNN, providing sufficient joint time-frequency distribution information for identity authentication.

The S-transformation can be viewed as a "phase-corrected" continuous wavelet transform (i.e., an extension of the wavelet transform at multiple resolutions). GST improves on the S-transformation by generalizing its Gaussian window function.

First, the S-transformation [10] of the signal x(t) is expressed as follows:

$$S(\tau, f) = \int_{-\infty}^{+\infty} x(t)\, \omega(\tau - t)\, e^{-j2\pi f t}\, dt \tag{1}$$

where $\omega(t) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{t^2}{2\sigma^2}}$, $\sigma(f) = \frac{1}{|f|}$, and $W(\alpha) = e^{-2\pi^2 \sigma^2 \alpha^2}$ is the Fourier transform of $\omega(t)$.

Therefore, the standard form of the S-transformation can be expressed as

$$S(\tau, f) = \int_{-\infty}^{+\infty} x(t)\, \frac{|f|}{\sqrt{2\pi}}\, e^{-\frac{(\tau - t)^2 f^2}{2}}\, e^{-j2\pi f t}\, dt \tag{2}$$

where τ is a time shift factor. Thus, the S-transformation can be regarded as a continuous wavelet transform of the signal multiplied by a phase correction factor, with a translatable and scalable local Gaussian window function used as the mother wavelet.

Second, because the time and frequency widths cannot be adjusted in the S-transformation, Mansinha et al. [11] used a generalized Gaussian window function instead of a Gaussian window. They introduced two parameters (λ and p) to change the time and frequency resolution of the signal and thereby achieved a multiresolution analysis. The modified window width factor is as follows:

$$\sigma(f) = \frac{\lambda}{|f|^{p}} \tag{3}$$

Equation (3) is substituted into equation (1) and then into equation (2) to obtain the GST of the signal. When both λ and p equal 1, the standard S-transformation is recovered.

$$S(\tau, f) = \int_{-\infty}^{+\infty} x(t)\, \frac{|f|^{p}}{\lambda\sqrt{2\pi}}\, e^{-\frac{(t - \tau)^2 f^{2p}}{2\lambda^2}}\, e^{-j2\pi f t}\, dt \tag{4}$$

Here, the window function and basic wavelet of the generalized S-transformation are as follows:

$$g(t) = \frac{|f|^{p}}{\lambda\sqrt{2\pi}}\, e^{-\frac{f^{2p} t^2}{2\lambda^2}} \tag{5a}$$

$$\psi_{GS}(\tau - t, f, \{\lambda_{GS}\}) = \frac{|f|}{\sqrt{2\pi}\,\lambda_{GS}}\, e^{-\frac{f^2 (\tau - t)^2}{2\lambda_{GS}^2}} \tag{5b}$$

In this study, the values of the parameters λ and p are adjusted based on the recognition rate.

2.5. ECG trajectory image presentation as input of CNN

Based on the GST described in the previous section, we convert the ECG signal from the time domain to the time-frequency domain for analysis and construct a new feature vector as the input layer of the CNN. The current ECG trajectory at each time point (M points are used to draw M ECG trajectories) is drawn in the frequency domain. The specific structure is shown in the flowchart in Fig. 5.

An example of ECG trajectory generation is illustrated in Fig. 6. Each map is the ECG trajectory at the current time point, capturing short-term system features. A 3-s signal has a length of 1500 points (D1) or 900 points (D2 and D3), and the obtained time-frequency matrix has a size of 450 × 1500 (D1) or 450 × 900 (D2 and D3).

Step 1: Because the GST is complex-valued, it contains both real and imaginary parts, and the resulting complex matrix can be expressed as equation (6), where N = 1.5fs, M = 3fs, fs is the sampling frequency, and each column is the "local spectrum" for that point in time.

$$\begin{bmatrix} a_{1,1} + b_{1,1} i & \cdots & a_{1,M} + b_{1,M} i \\ a_{2,1} + b_{2,1} i & \cdots & a_{2,M} + b_{2,M} i \\ \vdots & \ddots & \vdots \\ a_{N,1} + b_{N,1} i & \cdots & a_{N,M} + b_{N,M} i \end{bmatrix}_{N \times M} \tag{6}$$

Step 2: We apply the same procedure to each column ("local spectrum"), taking the real part as the horizontal axis and the imaginary part as the vertical axis. The ECG trajectory at each time point is plotted as shown in the third row of Fig. 6.

Step 3: The generated ECG trajectory still requires further processing to obtain the final ECG image. In this study, we use the getframe function in MATLAB to capture the current axes as they appear on the screen as a movie frame, obtaining a structure that contains the image data. Generating the image pixels involves two steps, defined as equations (7) and (8):


$$\mathrm{Img} = \{\, I(r, c) \mid 1 \le r \le N,\ 1 \le c \le M \,\} \tag{7}$$

$$z(i) = \{\, z(j) \mid 1 \le j \le 2 \,\} \tag{8}$$

Thus, an image Img of N rows and M columns is obtained, corresponding to the N × M pixels I(r, c) in equation (7). The most critical problem is the size of the captured area (the window, coordinate area, or area specified by rect), because the size of the captured image data often depends on the screen resolution and operating system settings. We use getframe to capture the graph data:

1. In Windows systems, a pixel is 1/96th of an inch.
2. In Macintosh systems, a pixel is 1/72nd of an inch.
3. In Linux® systems, the size of a pixel is determined by the system resolution.

In this study, all our algorithms were run on a Windows system. With a screen resolution of 3840 × 2160, we capture an image Img with 1120 rows and 840 columns and a fixed pixel size p(r, c) of 1/96th of an inch. Thus, a picture contains 1120 × 840 = 940,800 pixels.

Step 4: The more pixels an image has, the slower the identification. Therefore, we reduce the pixels of the obtained trajectory vector, as indicated in equation (9). In this study, we set Shift1 = 1/28 and Shift2 = 1/24 to reduce the image size to 40 × 35, so that only 1400 pixels are required, as shown in the final row of Fig. 6.

$$z_j(i) = \mathrm{Shift1}_j \, r(i) + \mathrm{Shift2}_j \, c(i) \tag{9}$$

Fig. 5. Construction of the input layer of CNN.

Fig. 6. ECG signal of subject T2. Second row: each column of the complex matrix obtained by generalized S-transformation ("local spectrum"); third row: ECG trajectory of the "local spectrum"; fourth row: corresponding image.

Fig. 7 shows the overlap of ECG trajectory images in the frequency domain. The results were derived from subject T1 of database D1, subject T2 of database D2, and subject T3 of database D3. For each subject, the three ECG signals in the same row (e.g., row 1, 3, or 5) were randomly selected with the fixed window to verify the characteristics of the same subject, i.e., the same ECG trajectory.

The ECG trajectory images of the same subject at different times are compared horizontally, e.g., the ECG trajectories of T1 at three time points, i.e., the three figures in the 2nd row. These three figures are basically the same, which means that in the frequency domain, the ECG trajectory images of the same individual remain highly similar across different periods: despite being collected at different times, the overlaps are essentially the same.

Additionally, the ECG trajectory images of different subjects are compared vertically, e.g., the overlap of ECG trajectory images in the 1st column. These three images show significant differences, particularly in the edge portion, which are caused by the different ECG trajectories of different subjects.

These observations can be understood as follows: as a quasi-periodic signal, the ECG signal itself is unique and permanent, and converting it from the time domain to the frequency domain does not change this nature. ECG trajectory images of the same individual are similar, whereas those of different individuals differ.

The above analysis shows that the ECG trajectory images at M time points in the frequency domain obtained by GST reflect unique and permanent spectral feature trends over continuous time. In summary, the ECG trajectory images produced by GST can be used to represent different individuals, with a total of n trajectories at M time points serving as the input layer of the CNN for ECG authentication.

2.6. Convolutional neural networks

As a powerful feed-forward neural network, the CNN has become a heavily researched topic in the field of image recognition [12]. A CNN essentially learns higher-level features spontaneously through multiple hidden layers, convolutional operations, and training data to achieve more accurate classification or prediction. Therefore, CNNs can greatly reduce the subjectivity of feature extraction found in conventional learning methods. Moreover, compared with manual feature engineering, CNNs are more capable of discovering intricate patterns in high-dimensional data.

The standard architecture of a CNN consists of the following four parts: convolution, rectified linear activation function, pooling, and fully connected layer. The architecture is shown in Fig. 1 above [13].

1. Input: The input of the CNN model is the one-dimensional spectral


feature map (a total of n spectral feature maps at n time points) obtained through GST. Each spectral feature map is formatted to a size of 35 × 41.

Fig. 7. Overlap effect of ECG trajectories in the frequency domain for subject T1, T2, and T3 in different periods.

2. Convolution layer: The convolution layer is the main building block of the CNN. Features are extracted by applying a convolution filtering operation to the input data. The layer is composed of multiple feature maps, each formed by convolving a convolution kernel with the feature maps of the previous layer, as expressed in equation (10). The convolution kernel is what the network learns, including the weight matrix W (i.e., the m-dimensional filter) and the bias term b.

$$X^{(l)} = f\left(W^{(l)} * X^{(l-1)} + b^{(l)}\right) \tag{10}$$

Equation (10) shows that the weight matrix W is the same for all neurons in layer $X^{(l)}$, which embodies an important feature of the convolution layer: weight sharing. In an actual convolution operation, we first need to set the number of filters and the bias term (as well as the stride and padding) before the convolution step.
3. Activation layer: Nonlinear transform operation. Nonlinearities are introduced into the convolutional network by mapping the data through rectified linear activation functions. Compared with conventional artificial neural networks, CNNs are much deeper. Thus, the forward propagation computation is relatively large, and gradient-related problems (such as vanishing gradients) easily arise during back-propagation. Therefore, in this study, we use the rectified linear unit (ReLU) as the activation function instead of the conventional sigmoid and tanh functions. The form of the function is shown in equation (11).

$$f(X) = \max(0,\ WX + b) \tag{11}$$

4. Pooling layer: Each feature map is downsampled, and the dimensions of each feature map and the amount of data are reduced, but the


most important information is retained. Therefore, this procedure is also referred to as a downsampling operation, which is expressed as follows:

$$X^{(l)} = f\left(Z^{(l)}\right) = f\left(W^{(l)}\, \mathrm{down}\left(X^{(l-1)}\right) + b^{(l)}\right) \tag{12}$$

The subsampling function down(·) has a variety of forms, among which max pooling is the most commonly used strategy: the input data are divided into a set of nonoverlapping rectangles, and the maximum value of each rectangle is output [14].
5. Fully connected layer: The outputs of the convolutional and pooling layers described above represent the high-level features of the input image, and the fully connected layer uses these features for classification. Each neuron is connected to all neurons of the previous layer. The softmax layer is the final layer of the fully connected network; its output is an N-dimensional vector, where N is the number of classes required. In this study, N is set to the number of testers, and all neurons are fully connected and trained through the feed-forward and back-propagation algorithms [15].
6. Hardware implementation: The CNN model presented here was originally developed in Keras with a TensorFlow backend [16]. The model described in this paper was written in MATLAB and executed on a graphics processing unit (GPU). Specifically, the model was built on a DELL Tower workstation with an Intel® Xeon® E5-2690 v4 processor, 32 GB (2 × 16 GB) 2400 MHz DDR4 RDIMM ECC memory, and an NVIDIA® Quadro® P5000 16 GB graphics card in a Windows environment. Compared with a central processing unit, a GPU typically achieves at least 5× to 10× acceleration, which can significantly speed up the training process.

2.7. Performance analysis

Several accepted formulas and measures were used as indices to evaluate the performance of the ECG authentication system:

1) Testing time (s)

The authentication time for each subject, measured with the hardware setup and ECG authentication system described in this paper.

2) Accuracy (Acc)

$$\mathrm{Acc} = \frac{T}{T + F} \tag{13}$$

where T and F represent the number of true acceptances and the number of false acceptances, respectively.

3) Equal error rate (EER)

The decision threshold is adjusted, and the value at which the false rejection rate (FRR) equals the false acceptance rate (FAR) is called the EER. Here,

$$\mathrm{FRR} = \frac{\text{Number of false rejections}}{\text{Total number of intraclass tests}} \tag{14}$$

$$\mathrm{FAR} = \frac{\text{Number of false acceptances}}{\text{Total number of interclass tests}} \tag{15}$$

3. Experiments

3.1. Optimization of generalized S-transformation

GST provides a rich data representation in the frequency domain. In this study, we use the tester's one-dimensional spectral feature map (a total of n spectral feature maps at n time points) as the input layer of the CNN for ECG authentication.

We adjust the window function through λ and p to change the time-frequency resolution of the signal, using the tester's identity verification matching accuracy, Acc, to tune the parameters.

1) With λ = 0.4 fixed and p varied, we observe the trend of changes in Acc. As shown in the left panel of Fig. 8, Acc gradually increases with p and reaches its maximum value at p = 5.
2) With p = 5 fixed and λ varied, we observe the trend of changes in Acc. As shown in the right panel of Fig. 8, Acc gradually increases with λ and reaches its maximum value at λ = 0.4.

Therefore, the parameters are set to λ = 0.4 and p = 5 in this study to obtain the best accuracy.

Fig. 8. Acc fluctuations at different time/frequency resolutions.
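The parameter study above tunes λ and p in the generalized window of Eqs. (3) and (4). As a rough illustration only, the sketch below computes a discrete GST through the frequency-domain form of Eq. (4), which follows from writing the integral as a convolution: for each analysis frequency f_n, the spectrum is shifted by n bins, multiplied by the Fourier transform of the generalized Gaussian window, and inverse-transformed. The function name `generalized_s_transform`, the use of NumPy, expressing f in hertz and t in seconds, and dropping the zero-frequency row are our assumptions; the paper does not state how its MATLAB implementation discretizes Eq. (4). Because the units of f in Eq. (3) are not specified, results for p ≠ 1 depend on this choice. Each column of the returned matrix is a "local spectrum" as in Eq. (6), with L/2 rows and L columns for an L-sample window.

```python
import numpy as np


def generalized_s_transform(x, fs, lam=1.0, p=1.0):
    """Hypothetical sketch of a discrete generalized S-transform (GST).

    Window width follows Eq. (3): sigma(f) = lam / |f|**p, with f in Hz and
    sigma in seconds (an assumed unit choice). lam = p = 1 gives the standard
    S-transform. Returns (S, freqs): S has shape (L // 2, L), one row per
    positive analysis frequency and one column ("local spectrum") per sample.
    """
    x = np.asarray(x, dtype=float)
    L = x.size
    X = np.fft.fft(x)
    # Signed frequency offset (Hz) of every FFT bin: 0, fs/L, ..., -fs/L.
    alpha = np.fft.fftfreq(L, d=1.0 / fs)
    n_bins = np.arange(1, L // 2 + 1)  # positive frequencies, zero frequency excluded
    S = np.empty((n_bins.size, L), dtype=complex)
    for row, n in enumerate(n_bins):
        f_n = n * fs / L             # analysis frequency in Hz
        sigma = lam / f_n ** p       # generalized window width, Eq. (3)
        # Fourier transform of a unit-area Gaussian window with std sigma.
        G = np.exp(-2.0 * np.pi ** 2 * sigma ** 2 * alpha ** 2)
        # Convolution theorem: S(tau, f_n) = IFFT_m{ X[m + n] * G[m] }.
        S[row] = np.fft.ifft(np.roll(X, -n) * G)
    return S, n_bins * fs / L
```

For a 3-s window from D2 or D3 (fs = 300 Hz, 900 samples), `S, freqs = generalized_s_transform(x, 300, lam=0.4, p=5)` returns a 450 × 900 complex matrix; plotting `S[:, k].real` against `S[:, k].imag` gives the ECG trajectory of Step 2 for time index k. A convenient sanity check is that summing any row of S over time returns the corresponding DFT coefficient of x, because the window transform equals 1 at zero offset.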


3.2. Optimization of CNN model

To test the proposed ECG authentication system, we conducted training and testing with database D1 described in section 2.2. An important goal of these experiments was to determine the architecture that provides the highest performance at the lowest computational cost, so that the system can run in real time.
There are many parameters to consider when choosing the best architecture for a CNN model. For the construction of the CNN model, we need to consider the parameters of the convolutional and pooling layers and the overall architecture of the CNN. For the test data, the data length must be considered.

1. Parameter settings: The ReLU activation function was used in the excitation layers, max-pooling was used as the subsampling function in the pooling layers, the architecture was fixed at 7 layers, and the data length was 3 s (1000 data points for database D1). Under these conditions, the number and size of the convolution and pooling kernels, the stride, and the zero-padding were adjusted in turn. In our proposed ECG authentication system, Acc and the test time remained essentially stable under these adjustments. The parameters were therefore set as presented in Table 2.

Table 2
Details of the CNN structure for the ECG authentication system.
Layer | Type | Kernel size | Kernel amount | Stride | Padding | Output size
1 | Input | – | – | – | – | 35 × 41
2 | Convolution | 3 × 3 | 8 | 1 | 1 × 1 | 33 × 39
3 | Excitation | – | – | – | – | 33 × 39
4 | Max-pooling | 3 × 3 | – | 2 | 0 × 0 | 11 × 13
5 | Convolution | 3 × 2 | 8 | 2 | 1 × 1 | 9 × 12
6 | Excitation | – | – | – | – | 9 × 12
7 | Max-pooling | 1 × 2 | 8 | 2 | 1 × 1 | 9 × 6
8 | Fully connected | 300 | – | – | – | 300
9 | Output | – | – | – | – | n
Notes. fs: sampling frequency; n: number of testers.

2. Architecture of CNN settings: When the parameters of the convolution and pooling layers are set according to Table 2 and the depth of the network is adjusted, the results are as presented in Table 3. Architectures with more than 12 layers showed over-fitting or under-fitting and were therefore not considered. As the depth increases, both the test time and the accuracy tend to increase. The results show that the 9-layer and 12-layer architectures provide the best performance; however, the 9-layer architecture has a shorter test time than the 12-layer architecture.

Table 3
Comparison of different architectures of the CNN model for the ECG authentication system.
Layers | Type | Training time (s) | Training loss (%) | Acc (%)
5 | I-C-M-Fc-O | 4.4352 | 0.6377 | 94.93
6 | I-C-E-M-Fc-O | 4.5153 | 0.9541 | 95.30
7 | I-C-M-C-M-Fc-O | 5.0071 | 0.1367 | 95.73
8 | I-C-E-M-C-M-Fc-O | 5.0201 | 0.0526 | 95.03
9 | I-C-E-M-C-E-M-Fc-O | 5.2094 | 0.0447 | 96.63
10 | I-C-E-M-C-E-M-Fc-Fc-O | 5.2196 | 0.9289 | 95.98
11 | I-C-E-M-C-E-M-C-M-Fc-O | 5.2485 | 0.1678 | 96.37
12 | I-C-E-M-C-E-M-C-E-M-Fc-O | 6.1586 | 0.2597 | 96.86
Notes. I: input, C: convolution, E: excitation, M: max-pooling, Fc: fully connected, O: output. The best performance is indicated in bold.

Fig. 9 shows the learning results in terms of the accuracy obtained as the number of epochs was varied. On the test set, the accuracy reached a stable value after many iterations, and the 9-layer architecture achieved a higher accuracy on the training and test sets than the other architectures. From these results, we conclude that the CNN model designed and optimized for ECG authentication is reliable. The optimal CNN architecture for ECG authentication was therefore selected on the basis of the above analysis, as shown in Fig. 10 and Table 2.

Fig. 9. Accuracy of the proposed CNN model for the ECG authentication system (database D1).

3. Tester data length settings: Based on Fig. 10 and Table 1, we obtained the best CNN model for identity authentication. In the current system, the tester's identity authentication time is a major factor affecting the user experience. As the tester's data collection time increases, the one-dimensional spectrum feature map obtained after GST (a total of M ECG trajectory images at M time points) grows, so the input layer of the CNN model becomes longer. This directly increases the authentication time and ultimately worsens the user experience. Table 4 shows that longer recordings generally improve accuracy (94.37% at 2 s versus 97.14% at 9 s) but also lengthen the time required. Weighing both factors, we stipulate that each identity authentication uses 3 s of collected data.

4. Results and discussion

4.1. Performance of CNN model

In section 3.2, we described the optimization of the CNN architecture for ECG authentication, as shown in Fig. 10 and Table 2. Table 5 shows the performance of the ECG authentication system on databases D1, D2, and D3. On the training set, an accuracy of 99% and a short time of 5.2094 s were obtained for the normal group D1. The corresponding values were 98% and 5.0274 s for the AF group D2, and 99% and 5.4513 s for the noisy group D3. On the test set, satisfactory results were again achieved for all three databases. The accuracy of the
noisy group, D3, was not markedly different from that of the normal group, D1, which may be due to the adoption of cyclic-translation ECG denoising based on a wavelet hard threshold. The FAR and FRR curves for the three databases are shown in Fig. 11; the EER is the value at which the FAR and FRR curves intersect.
Similarly, these parameters performed well on the test sets. For both healthy persons and persons with AF, the identity authentication system combining GST and CNN presented in this study shows a strong learning ability and is highly robust in a noisy environment. The results also indicate that our proposed system can capture the basic structure of noisy ECG signals. Therefore, we might be able to use it to accurately classify unknown noisy ECG signals.
In addition, the confusion matrices of the test performance on all three databases are shown in Fig. 12. The visualization in Fig. 12 shows that our identity authentication system can effectively identify testers (yellow diagonal entries) with very few false positives or false negatives (non-zero off-diagonal entries).

Fig. 10. Architecture of the proposed CNN model for the ECG authentication system (Note: M = 3 × fs; fs = sampling frequency).

Table 4
Comparison of different data lengths in the CNN model for the ECG authentication system.
Data length (s) | Training time (s) | Training loss (%) | Acc (%)
2 | 4.8965 | 0.1181 | 94.37
3 | 5.2094 | 0.0447 | 96.63
4 | 5.6795 | 0.0626 | 96.89
5 | 6.0536 | 0.0267 | 96.47
6 | 6.2513 | 0.0988 | 96.61
7 | 6.9759 | 0.0248 | 97.13
8 | 7.5783 | 0.1222 | 97.08
9 | 8.4269 | 0.0210 | 97.14
Note. The best performance is indicated in bold.

Table 5
Performance of the proposed ECG authentication system.
Set | Behavior | Acc (%) | Time (s) | EER (%)
Training set | Normal | 99.00 | 5.2094 | 5.68
Training set | Atrial fibrillation | 98.00 | 5.0274 | 5.96
Training set | Noisy | 99.00 | 5.4513 | 6.45
Test set | Normal | 96.63 | 1.0420 | –
Test set | Atrial fibrillation | 96.23 | 1.0055 | –
Test set | Noisy | 96.18 | 1.0903 | –
Note. The best performance is indicated in bold.

4.2. Comparison of performance with state-of-the-art methods

To carry out a more extensive and accurate comparative performance evaluation, we compared the baseline performance of the proposed system with existing advanced algorithms. The comparison covers whether each algorithm applies GST for time-frequency conversion in the signal analysis, whether it includes blind signal processing (BSP), and which feature extraction method and classifier it uses. The experimental results are presented in Table 6.

In response to the existing challenges, we first introduce a blind segmentation strategy that avoids the complexities of data-specific heartbeat recognition and segmentation techniques. Then, GST is performed on the blind signal-processed ECG signal, and the ECG trajectory at each time point in the frequency domain is captured. Thereafter, we use the getframe technology to capture images of the ECG trajectories and convert the one-dimensional signal into two-dimensional images. Each ECG trajectory image reflects the "local spectrum" at the current time point. All of the images (a total of n ECG trajectory images at n time points) are used as the input layer of the CNN to fully reflect the joint time-frequency characteristics of the ECG signal. Finally, the CNN is used for automatic feature learning, which effectively reduces the computational complexity and difficulty of the algorithm and greatly improves its generalization ability. This is an advantage of deep learning over conventional machine learning algorithms.
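The first two stages described above can be sketched in Python: blind segmentation into fixed-length windows with no R-peak or QRS detection, followed by a time-frequency transform whose column at each time point is one "local spectrum". This is a minimal sketch under stated assumptions. It uses the frequency-domain S-transform with an extra Gaussian width factor `k` (σ(f) = k/|f|). That is one simple generalization of the S-transform, not necessarily the λ–p form used by the authors. The 3-s window follows the text; the sampling rate, the non-overlapping windows, and the synthetic test tone are illustrative choices. The image-capture step (presumably MATLAB's `getframe`) and the CNN are omitted.

```python
import numpy as np


def blind_segments(ecg, fs, seconds=3.0):
    """Blind segmentation: consecutive, non-overlapping fixed-length windows.

    No R-peak or QRS detection is performed; trailing samples that do not
    fill a whole window are dropped.
    """
    length = int(round(seconds * fs))
    ecg = np.asarray(ecg, dtype=float)
    n_windows = ecg.size // length
    return ecg[: n_windows * length].reshape(n_windows, length)


def s_transform(x, k=1.0, n_freqs=None):
    """Frequency-domain S-transform with Gaussian width sigma(f) = k / |f|.

    k = 1 gives the standard S-transform. Returns a complex array of shape
    (n_freqs, len(x)): column j is the "local spectrum" at sample j, and
    row n corresponds to n cycles per window.
    """
    x = np.asarray(x, dtype=float)
    N = x.size
    if n_freqs is None:
        n_freqs = N // 2 + 1
    X = np.fft.fft(x)
    m = np.fft.fftfreq(N) * N            # integer frequency offsets in FFT order
    S = np.empty((n_freqs, N), dtype=complex)
    S[0, :] = x.mean()                   # zero-frequency row: signal mean
    for n in range(1, n_freqs):
        gauss = np.exp(-2.0 * np.pi ** 2 * k ** 2 * m ** 2 / n ** 2)
        S[n, :] = np.fft.ifft(np.roll(X, -n) * gauss)  # spectrum shifted by n bins
    return S


if __name__ == "__main__":
    fs = 250.0                            # assumed sampling rate (Hz)
    t = np.arange(int(10 * fs)) / fs
    tone = np.cos(2 * np.pi * 5.0 * t)    # synthetic 5 Hz stand-in for an ECG
    windows = blind_segments(tone, fs)    # shape (3, 750)
    tf_map = np.abs(s_transform(windows[0], n_freqs=40))
    print(windows.shape, tf_map.shape)
```

Computing the transform in the frequency domain costs one inverse FFT per frequency row, which is cheap for 3-s windows. For a pure 5 Hz tone in a 3-s window, the energy concentrates in row 15 (15 cycles per window), as expected.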

Fig. 11. FAR and FRR test performance curves for human identification based on training data from three ECG databases.
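As described in Section 4.1, the EER is read off Fig. 11 at the point where the FAR and FRR curves cross. The sketch below shows one common way to obtain both curves and the EER from genuine and impostor match scores by sweeping a decision threshold. It assumes that higher scores indicate a better match and approximates the EER at the threshold where |FAR - FRR| is smallest. How the authors produced their scores and curves is not specified in this section, and the scores in the example are synthetic.

```python
import numpy as np


def far_frr_eer(genuine, impostor):
    """Return thresholds, FAR, FRR and an EER estimate.

    Assumes higher score = more likely the claimed identity.
    FAR(t): fraction of impostor scores accepted (score >= t).
    FRR(t): fraction of genuine scores rejected (score < t).
    """
    genuine = np.asarray(genuine, dtype=float)
    impostor = np.asarray(impostor, dtype=float)
    thresholds = np.unique(np.concatenate([genuine, impostor]))  # sorted
    far = np.array([(impostor >= t).mean() for t in thresholds])
    frr = np.array([(genuine < t).mean() for t in thresholds])
    # EER: where the curves cross; take the threshold with the smallest
    # |FAR - FRR| gap and average the two rates there.
    i = np.argmin(np.abs(far - frr))
    eer = (far[i] + frr[i]) / 2.0
    return thresholds, far, frr, eer


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    genuine = rng.normal(0.8, 0.1, 500)   # synthetic match scores
    impostor = rng.normal(0.4, 0.1, 500)
    *_, eer = far_frr_eer(genuine, impostor)
    print(f"EER ~ {eer:.3f}")
```

Sweeping only the observed scores keeps the curves exact for the given data. With few scores, the EER estimate is coarse, and interpolating between the two thresholds that bracket the crossing gives a smoother value.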


Fig. 12. Confusion matrix for human identification based on testing data from three ECG databases.
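The confusion matrices in Fig. 12 summarize identification per tester, with correct identifications on the diagonal and false positives and false negatives off the diagonal. The following minimal NumPy sketch shows how such a matrix and the derived counts can be computed from true and predicted tester labels. It assumes integer labels 0 to n-1 for the n testers. The labels below are made up, and the code is not the authors' evaluation script.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, n_classes):
    """Rows = true tester, columns = predicted tester (labels 0..n_classes-1)."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (np.asarray(y_true), np.asarray(y_pred)), 1)  # count each pair
    return cm


def summarize(cm):
    """Accuracy plus per-tester false positives and false negatives."""
    diag = np.diag(cm)
    acc = diag.sum() / cm.sum()
    false_pos = cm.sum(axis=0) - diag   # others predicted as tester i
    false_neg = cm.sum(axis=1) - diag   # tester i predicted as someone else
    return acc, false_pos, false_neg


if __name__ == "__main__":
    y_true = [0, 0, 1, 1, 2, 2]          # made-up labels for three testers
    y_pred = [0, 0, 1, 2, 2, 2]
    cm = confusion_matrix(y_true, y_pred, 3)
    print(cm)
    print(summarize(cm))
```

`np.add.at` is used instead of `cm[y_true, y_pred] += 1` because the latter does not accumulate repeated (true, predicted) pairs.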

Table 6
Comparative tabulation of experimental results for different identification algorithms.
Authors | GST + BSP | Feature extraction | Classifier | Performance
Singh et al., 2012 [24] | No | P and T wave delineation; QRS delineation; interval, amplitude and angle features | Generation of scores from feature vectors | Acc = 94% (with face); Acc = 96% (with fingerprint); EER = 10.8%
Singh et al., 2013 [25] | No | Heartbeat interval features; waveform morphological features | Linear projection of the sample space to a lower-dimensional feature space | Acc = 95.55%; EER = 4.45%
Tang et al., 2015 [18] | No | QRS detection; time-domain features | Quantum neural networks | Acc = 91.50%
Liu et al., 2015 [19] | No | R-peak detection; ECG polynomial fitting; polyfit-based ECG parameterization; Akaike information criterion | Decision tree | Acc = 94.40%
Sharma et al., 2015 [20] | No | Wavelet transform; multiscale energy; multiscale eigenspace analysis | Support vector machine | Acc = 96.00%
Juan Arteaga-Falconi et al., 2015 [26] | No | Fiducial point detection; hierarchical validation scheme | Threshold | Acc = 84.93%; FAR (EER) = 1.29%
Acharya et al., 2016 [21] | No | R-peak detection; detection with 47 features; localization with 25 features | k-nearest neighbor | Acc = 98.80%
Luca Mesin et al., 2016 [23] | No | R-peak detection; time-domain features | Support vector machine | Acc = 95.60%
Zhang et al., 2017 [22] | BSP | Not needed | CNN | Acc = 93.50%
Md Saiful Islam et al., 2017 [27] | No | Heart vector (HV); wavelet distance measure; short-time frequency (STF); fiducial feature-based template (FFT); self-aligned morphology; heartbeat shape (HBS) | Unimodal fusion of information | EER = 9.58%
Ruggero Donida Labati et al., 2018 [33] | No | Complex QRS detection | CNN | EER = 1.36%–5.95%
Proposed | GST + BSP | Not required | CNN | D1: Acc = 96.63%, EER = 5.68%; D2: Acc = 96.23%, EER = 5.96%; D3: Acc = 96.18%, EER = 6.45%
Notes. GST: generalized S-transformation. BSP: blind signal processing. T: test time (s). The best results are highlighted. The results of this study are underlined.

In addition, considering the possible impact of different ECG signal behaviors on identity recognition, we used the normal-individual, AF-patient, and noisy databases for performance assessment. This makes our evaluation more comprehensive than that of many other research reports. The performance of our identity authentication system improves as the amount of data increases; however, a large dataset is required to train the system to its best generalization ability.
Notably, when Tang et al. [18] evaluated their algorithm, the recognition rate (91.50%) was much lower than that of our method (detailed algorithms and performance are given in the 3rd row of Table 6). This result may be because we used deep learning in our research to learn the basic feature patterns from the data more effectively. Ruggero Donida Labati et al. [33] presented a CNN-based biometric approach for ECG signals that extracts significant features (a set of m QRS complexes) from one or more leads to obtain a biometric template. The biometric templates were then compared by computing simple and fast distance functions, achieving remarkable identification accuracy. However, this high accuracy relies on complex QRS extraction, artificial signal stitching, and a small database. It is therefore difficult to determine whether the results remain valid when the database becomes more complicated. Moreover, although Zhang et al. [22] also adopted a CNN, our algorithm achieved a better accuracy (detailed algorithms and performance are given in the Zhang et al.
row of Table 6). This is because, at the input layer of the CNN, we first used the generalized S-transformation to perform a time-frequency conversion, which reveals more detailed time and frequency characteristics than the original time-domain signal. Another aspect worth noting is that the work of Acharya et al. [21] shows a high accuracy. However, those authors first need to detect each heartbeat, which requires more engineering than our BSP. They also performed three feature extractions, which entails a substantial engineering effort.

5. Conclusion

In this paper, we propose a new biometric authentication system for human identification that uses ECG signals as data sources and integrates GST and a CNN.
1) A blind segmentation technique is applied to the signal to avoid the complexities of ECG waveform recognition and segmentation techniques, such as R-peak detection and QRS wave identification.
2) GST, combined with the getframe technology, is introduced to convert the signal from the time domain to the time-frequency domain and to turn one-dimensional signals into two-dimensional images. The ECG trajectory images (a total of n ECG trajectory images at n time points) reflect how the spectral characteristics of the ECG signal evolve over a continuous period; that is, the quasi-periodicity and uniqueness of the ECG signal in the time domain are reflected in the frequency domain.
3) A CNN, a deep learning technique, is introduced, which greatly reduces the feature-engineering workload and captures intrinsic feature patterns more effectively.
4) The proposed system is extensively evaluated on three databases (normal individuals, AF patients, and noisy recordings), which cover the different signal behaviors that occur in real-world measurements.
The proposed algorithm could also be extended to other quasi-periodic, bio-signal-based user recognition applications, such as ballistocardiography and the evaluation of body movements (walking, jumping, etc.).

However, compared with mature fingerprint-recognition technology, more effort is needed in several respects:
1) Database maturity: although an increasing number of ECG databases benefit from tremendous advances in data acquisition systems, the amount of data is still limited. It cannot compare with fingerprint data, which typically include hundreds of thousands of fingerprint records.
2) Technology: fingerprint recognition has achieved high accuracy and rapid identification in mining and testing; we therefore need to keep improving the recognition speed.
3) It remains unknown whether different sensor placements and motion states introduce variability into ECG signals.

Author contributions

ZZ, YZ, YD and XZ developed the concept of this study and wrote the manuscript.

Acknowledgments

This research was supported by the National Natural Science Foundation of China (Grant No. 61571173), the Welfare Project of the Science and Technology Department of Zhejiang Province (Grant No. 2015C31086) and the Smart City Collaborative Innovation Center of Zhejiang Province.

Appendix A. Supplementary data

Supplementary data to this article can be found online at https://doi.org/10.1016/j.compbiomed.2018.09.027.

References

[1] A.K. Jain, et al., An introduction to biometric, International Conference on Pattern Recognition 11 (4) (2009) 1–1.
[2] L. Hong, A. Jain, Integrating faces and fingerprints for personal identification, Asian Conference on Computer Vision 20 (12) (1998) 16–23.
[3] M. Bassiouni, et al., A study on the intelligent techniques of the ECG-based biometric systems, Recent Advances in Electrical Engineering (2015) 26–31.
[4] Safie, et al., ECG Biometric Authentication Using Pulse Active Width (PAW), Biometric Measurements & Systems for Security & Medical Applications, (2011), pp. 1–6.
[5] N. Laplante, et al., Caring: An undiscovered "super-ility" of smart healthcare, IEEE Software 33 (6) (2016) 16–19.
[6] M. Tantawi, et al., ECG based biometric recognition using wavelets and RBF neural network, Proc. 7th Eur. Comput. Conf. (ECC), 2013, pp. 100–105.
[7] Physionet, The ECG-ID database, https://physionet.org/physiobank/database/ecgiddb/, (2017).
[8] Physionet, AF classification from a short single lead ECG recording: the PhysioNet/computing in cardiology challenge 2017, https://physionet.org/challenge/2017/, (2017).
[9] Y. Zhang, Z. Zhao, Evaluation of single-lead ECG signal quality with different states of motion, International Congress on Image & Signal Processing, 2017, pp. 1–7.
[10] Kaifeng Guo, et al., Signal instantaneous frequency feature extraction based on S transform, Electronic world 15 (2013) 97–97.
[11] Yang Yang, The Application of Time-frequency Analysis of Generalized S Transform, Harbin Engineering University, 2011.
[12] M. Oquab, et al., Learning and transferring mid-level image representations using convolutional neural networks, IEEE Conference on Computer Vision & Pattern Recognition, 2014, pp. 1717–1724.
[13] Zhihua Zhou, Machine Learning, Tsinghua University Press, Beijing, 2016.
[14] I. Sutskever, et al., Sequence to Sequence Learning with Neural Networks, (2014), pp. 3104–3112, eprint arXiv:1409.3215.
[15] X. Yu, et al., A general backpropagation algorithm for feedforward neural networks learning, IEEE Trans. Neural Network. 13 (1) (2002) 251–254.
[16] M. Abadi, et al., TensorFlow: a system for large-scale machine learning, OSDI 16 (2016) 265–283.
[17] C.M. Ting, et al., ECG based personal identification using extended Kalman filter, 10th International Conference on Information Science, Signal Processing and Their Applications, 2010, pp. 774–777.
[18] X. Tang, L. Shu, Classification of electrocardiogram signals with RS and quantum networks neural, Int. J. Multimedia Ubiquitous Eng. 9 (2) (2014) 363–372.
[19] B. Liu, et al., A novel electrocardiogram parameterization algorithm and its application in myocardial infarction detection, Comput. Biol. Med. 61 (C) (2015) 178–184.
[20] L.N. Sharma, et al., Multiscale energy and eigenspace approach to detection and localization of myocardial infarction, IEEE Trans. Biomed. Eng. 62 (7) (2015) 1827–1837.
[21] U.R. Acharya, et al., Automated detection and localization of myocardial infarction using electrocardiogram: A comparative study of different leads, Knowl. Base Syst. 99 (2016) 146–156.
[22] Qingxue Zhang, et al., HeartID: a multiresolution convolutional neural network for ECG-based biometric human identification in smart health applications, IEEE Access 99 (2017) 1–1.
[23] Luca Mesin, et al., A Low Cost ECG Biometry System Based on an Ensemble of Support Vector Machine Classifiers, Springer International Publishing Switzerland, 2016.
[24] Y.N. Singh, S.K. Singh, Evaluation of electrocardiogram for biometric authentication, J. Inf. Secur. 3 (1) (2011) 39–48.
[25] Y.N. Singh, S.K. Singh, Identifying individuals using eigenbeat features of electrocardiogram, J. Eng. 3 (13) (2013).
[26] J.S. Arteaga-Falconi, et al., ECG authentication for mobile devices, IEEE Transactions on instrumentation and measurement 65 (3) (2016) 591–600.
[27] M.S. Islam, et al., Selection of Heart-Biometric Templates for Fusion 99 (2017) 1–1.
[28] I. Odinaka, et al., ECG biometric recognition: a Comparative Analysis, IEEE Trans. Inf. Forensics Secur. 7 (6) (2012) 1812–1824.
[29] Y. Wang, et al., Analysis of human electrocardiogram for biometric recognition, EURASIP J. Appl. Signal Process. 2008 (1) (2007) 1–11.
[30] W. Louis, M. Komeili, D. Hatzinakos, Continuous authentication using one-dimensional multi-resolution local binary patterns (1DMRLBP) in ECG biometrics, IEEE Trans. Inf. Forensics Secur. 11 (12) (2016) 2818–2832.
[31] J. Pan, W.J. Tompkins, A real-time QRS detection algorithm, IEEE Trans. Biomed. Eng. BME-32 (3) (1985) 230–236.
[32] J.R. Pinto, J.S. Cardoso, A. Lourenço, Evolution, current challenges, and future possibilities in ECG biometrics, IEEE Access 1 (1) (2018) 99–129.
[33] Ruggero Donida Labati, et al., Deep-ECG: convolutional neural networks for ECG biometric recognition, Pattern Recogn. Lett. (2018), https://doi.org/10.1016/j.patrec.2018.03.028.

Zhidong Zhao received the BS and MS degrees in mechanical engineering from Nanjing University of Science and Technology, China, in 1998 and 2001, respectively, and the PhD degree in biomedical engineering from Zhejiang University, China, in 2004. He is currently a full professor at Hangzhou Dianzi University, China. His research interests include biomedical signal processing, biometrics and machine learning.

Yefei Zhang received the BS degree in biomedical engineering from China Jiliang University, Hangzhou, China, in 2015. She is currently working toward the MS degree in electronic and communication engineering at Hangzhou Dianzi University, China. Her research interests include new biometric identification instrument systems and signal processing.

Yanjun Deng received the BS and MS degrees in electronic and communication engineering from Hangzhou Dianzi University in 2015 and 2018, respectively. He will begin studying for the PhD degree in the College of Electronics and Information Engineering, Hangzhou Dianzi University, in September 2019. His research interests include circuits and systems and smart medical instrument development.

Xiaohong Zhang received the BS and MS degrees in electronics and information from Hangzhou Dianzi University, China, in 2002 and 2009, respectively. She is currently a lecturer at Hangzhou Dianzi University, China. Her research interests include circuits and systems and biomedical signal processing.
