www.elsevier.com/locate/patrec
Department of Computer Science and Engineering, University of Connecticut, 191 Auditorium Road, U-155, Storrs, CT 06269-3155, USA
Communicated by T.K. Ho
Abstract
This paper attempts to characterize a medical time series by quantifying its ruggedness. The presence of two close
data points on the time axis implies that these points are similar along the time axis, which creates fuzzy similarity.
Following the principle "similar causes create similar effects", we expect that the magnitudes corresponding to those two
data points should also be similar. Frequently, this is not observed in a time series. One of the reasons is as follows:
if other features had been considered along with the time information, then those two close data points would have
looked different. Consequently, the magnitudes corresponding to those two apparently similar points become different.
Therefore, if we consider the closeness along the time axis as a cause, then the effect, i.e., the corresponding
magnitudes, could be the same, similar, or completely dissimilar. This phenomenon makes the cause-effect relationship,
i.e., the time versus magnitude relationship, one-to-many. Specifically, the closeness creates fuzziness, the one-to-many
relationship creates roughness, and together they form fuzzy-roughness. If the ruggedness is expressed as the
fuzzy-roughness, then in some time series it is observed that the fuzzy-roughness of a part of the time series is similar
to that of the whole time series. Specifically, the scaling up of the fuzzy-roughness follows the power law of fractal
theory. Experiments on ICU data sets show that the ruggedness measure using the fuzzy-rough set based fractal dimension
is more robust than some popular measures of ruggedness like the Hurst exponent.
© 2005 Elsevier B.V. All rights reserved.
Keywords: Characterization; Time series; Fuzzy; Rough; Fuzzy-rough; Hurst exponent and fractal
0167-8655/$ - see front matter © 2005 Elsevier B.V. All rights reserved.
doi:10.1016/j.patrec.2005.09.007
M. Sarkar / Pattern Recognition Letters 27 (2006) 447–454
1.3. Scope
rather fuzzy. It is because all the points (say $\langle x_i, y_i \rangle$) of the time series are treated as similar (with varying degrees) to the point $\langle x_j, y_j \rangle$, around which the Gaussian is built. If the width of the Gaussian is changed, then the similarity does not change abruptly. The closer $\langle x_i, y_i \rangle$ and $\langle x_j, y_j \rangle$ are, the greater the similarity. The proposed method exploits this trick to derive a fractal dimension that can be used to quantify the ruggedness of a time series. This fractal dimension can also be used as a feature while comparing more than one time series.

2. Backgrounds of Hurst exponent and rough uncertainty

2.1. Hurst exponent

Usual methods of estimating Hurst exponents can be structured into the following three steps:

- Sequence of partitions: The whole time axis is partitioned into equal intervals. Each interval isolates some scale of observations.
- Single scale statistics: It is computed using the following two statistics:
  - Local statistics: A statistic based on the values of the ordinates within a single interval is extracted. For example, the local statistics in the $i$th interval can be the mean of the ordinates of all the data points whose abscissa values lie in the $i$th interval.
  - Partition based statistics: A measure or measures summarizing the local statistics from all the intervals. For instance, it can be the variance of the means in each interval.
- Transscale statistics: Estimates of the Hurst exponent are derived from the partition based statistics over a range of interval lengths. Generally, the transscale statistics is the ratio of the logarithm of the partition-based statistics and the logarithm of the length of the interval.

Two popular approaches to estimate the Hurst exponent are as follows:

Rescaled range or R/S method: It partitions the time series $\{\langle x_1, y_1 \rangle, \langle x_2, y_2 \rangle, \ldots, \langle x_n, y_n \rangle\}$ into equal intervals each of length $\delta_l$. Let us call the $i$th interval $W(i, \delta_l)$. The local statistics in the interval $W(i, \delta_l)$ is defined as $R(i, \delta_l)/S(i, \delta_l)$, where

\[ R(i, \delta_l) = \max_{Y_j} \{\langle x_j, W_j \rangle \mid x_j \in W(i, \delta_l)\} - \min_{Y_j} \{\langle x_j, W_j \rangle \mid x_j \in W(i, \delta_l)\}, \tag{1} \]

\[ S(i, \delta_l) = \sqrt{\frac{1}{\|\{\langle x_j, y_j \rangle \mid x_j \in W(i, \delta_l)\}\|} \sum_{\{\langle x_j, y_j \rangle \mid x_j \in W(i, \delta_l)\}} (y_j - \bar{y})^2}, \tag{2} \]

where $W_j = \sum_{k=1}^{j} (y_k - \bar{y})$, $\bar{y} = \frac{1}{n} \sum_{k=1}^{n} y_k$, and $\|A\|$ indicates the cardinality of the set $A$.

The partition-based statistics $I_{R/S}(\delta_l)$ is the arithmetic mean of $R(i, \delta_l)/S(i, \delta_l)$ for all the intervals. The whole process is repeated for several lengths of the interval $\delta_l$. The transscale statistics is the slope of the linear regression that fits a plot of $\log(I_{R/S}(\delta_l))$ vs. $\log(\delta_l)$ for all $l$.

Intuitively, $R(i, \delta_l)$ measures the variation of the cumulative value of the ordinate (i.e., $W$) in the $i$th interval. Instead of the ordinate ($y$), this method specifically uses $W$ because the effect of noise and outliers is usually less in $W$. If we compute the average of $R$ across all the intervals of length $\delta_l$, then the average value would provide a measure of roughness. However, if the $i$th interval has many jumps, $R(i, \delta_l)$ would be large, and its large value would dominate in the calculation of the average. Consequently, the computed average value would be very different from the actual average value. To reduce this problem, $R(i, \delta_l)$ is normalized using $S(i, \delta_l)$, which represents the spread of the ordinate values within the $i$th partition.

Dispersional analysis: It is similar to the R/S method. But the differences are

- the local statistics is the mean of the ordinate values in each interval,
- the partition based statistics is the standard deviation of the means, and
- the transscale statistics is 1.0 plus the slope of the linear regression obtained from the fit of log(partition based statistics) vs. $\log(\delta_l)$.

The dispersional analysis is just opposite to the R/S method. In each interval, the R/S method determines the amount of variation of the cumulative values of the ordinate. Then it computes on an average how much the difference is across all the intervals. In contrast, the dispersional analysis finds the average value of the ordinate in each interval, and then it computes how much this average value differs when all the intervals are considered.

2.2. Rough sets

Let $R$ be an equivalence relation on a universal set $X$. Moreover, let $X/R$ denote the family of all equivalence classes induced on $X$ by $R$ (Klir and Yuan, 1995; Pawlak, 1982, 1991; Pawlak et al., 1995). One such equivalence class in $X/R$ that contains $x \in X$ is designated by $[x]_R$. For any output class $C \subseteq X$, we can define the lower $\underline{R}(C)$ and upper $\overline{R}(C)$ approximations, which approach $C$ as closely as possible from the inside and outside, respectively (Pawlak, 1991). Here,

\[ \underline{R}(C) = \bigcup \{[x]_R \mid [x]_R \subseteq C \text{ and } x \in X\} \tag{3} \]

is the union of all equivalence classes in $X/R$ that are contained in $C$ and

\[ \overline{R}(C) = \bigcup \{[x]_R \mid [x]_R \cap C \neq \emptyset \text{ and } x \in X\} \tag{4} \]

is the union of all equivalence classes in $X/R$ that overlap with $C$. The rough set $R(C) = \langle \underline{R}(C), \overline{R}(C) \rangle$ is a representation
of the given set $C$ by $\underline{R}(C)$ and $\overline{R}(C)$. The set $\overline{R}(C) - \underline{R}(C)$ is a rough description of the boundary of $C$ by the equivalence classes of $X/R$. The approximation is rough uncertainty free if $\underline{R}(C) = \overline{R}(C)$. Thus, when all the patterns from an equivalence class do not carry the same output class label, the rough uncertainty is generated as a manifestation of the one-to-many relationship between that equivalence class and the output class labels.

2.3. Fuzzy-rough sets

A rough-fuzzy set (Dubois and Prade, 1990, 1992) is a generalization of the rough set in the sense that here the output class is fuzzy (Zadeh, 1965). Let $X$ be a set, $R$ be an equivalence relation defined on $X$, and the output class $C \subseteq X$ be a fuzzy set. The rough-fuzzy set is a tuple $\langle \underline{R}(C), \overline{R}(C) \rangle$, where the lower approximation $\underline{R}(C)$ and the upper approximation $\overline{R}(C)$ are fuzzy sets of $X/R$, with membership functions defined by Dubois and Prade (1990, 1992)

\[ \mu_{\underline{R}(C)}([x]_R) = \inf \{\mu_C(x) \mid x \in [x]_R\} \quad \forall x \in X \tag{5} \]

and

\[ \mu_{\overline{R}(C)}([x]_R) = \sup \{\mu_C(x) \mid x \in [x]_R\} \quad \forall x \in X. \tag{6} \]

Here, $\mu_{\underline{R}(C)}([x]_R)$ and $\mu_{\overline{R}(C)}([x]_R)$ are the membership values of $[x]_R$ in $\underline{R}(C)$ and $\overline{R}(C)$, respectively.

A fuzzy-rough set is a further generalization of the rough-fuzzy set. When the equivalence classes are not crisp, they are in the form of fuzzy clusters $F_1, F_2, \ldots, F_H$ generated by a fuzzy weak partition (Dubois and Prade, 1990, 1992) of the input set $X$. Here, $H$ is the number of clusters. The term fuzzy weak partition means that each $F_j$ is a normal fuzzy set, i.e.,

\[ \sup_x \{\mu_{F_j}(x)\} = 1 \quad \text{and} \quad \inf_x \max_j \{\mu_{F_j}(x)\} > 0 \tag{7} \]

while

\[ \sup_x \min \{\mu_{F_i}(x), \mu_{F_j}(x)\} < 1 \quad \forall i, j \in \{1, 2, \ldots, H\},\ i \neq j. \tag{8} \]

Here, $\mu_{F_j}(x)$ is the fuzzy membership function of the pattern $x$ in the cluster $F_j$. In addition, the output class $C$ may be fuzzy too. Given a weak fuzzy partition $F_1, F_2, \ldots, F_H$ on $X$, the description of any fuzzy set $C$ by means of the fuzzy partitions under the form of an upper and a lower approximation $\overline{C}$ and $\underline{C}$ is as follows:

\[ \mu_{\underline{C}}(F_j) = \inf_x \{\max\{1 - \mu_{F_j}(x), \mu_C(x)\}\} \quad \forall x \tag{9} \]

\[ \mu_{\overline{C}}(F_j) = \sup_x \{\min\{\mu_{F_j}(x), \mu_C(x)\}\} \quad \forall x. \tag{10} \]

The tuple $\langle \underline{C}, \overline{C} \rangle$ is called fuzzy-rough set. Here, $\mu_C(x) \in [0, 1]$ is the fuzzy membership of the input $x$ to the class $C$. The fuzzy-roughness appears when a fuzzy cluster contains patterns that belong to different classes.

3. Proposed method

3.1. Sources of uncertainty

We can identify the following two uncertainties that can influence the ruggedness of the time series:

Fuzzy uncertainty: It may arise due to the following two factors:

1. How similar any two data points are along the time axis: The similarity decreases as the distance between the data points increases along the time axis. This similarity can be quantified in the form of fuzzy membership functions.
2. How similar the data points are along the ordinate: This similarity can also be quantified in the form of fuzzy membership functions.

Rough uncertainty: It may appear due to the following reason:

Lack of features makes two originally dissimilar points neighbors: When the spatial representations of all the neighbors along the time axis are similar, it is expected that the corresponding ordinate values should also be similar. Due to the incomplete knowledge about the process generating the data, generally the input representation is not perfect. As a result, $\langle x_i, y_i \rangle$ and its neighbors appear similar based on the time information, although they may not be similar when the other features are augmented. It makes the input-output relationship one-to-many, and the rough uncertainty appears. In some contexts, these kinds of data points are called noisy.

3.2. Measures of fuzzy-roughness

Let us assume that along the time axis, the neighborhood region of each data point (say $\langle x_i, y_i \rangle$, where $x_i \in X$, $y_i \in Y$) is crisp (called $W$). Typically, the neighborhood region is an interval in which $x_i$ lies. If all the neighbors have the same magnitudes, then there is no roughness in the neighborhood. However, if any neighbor has ordinate different from $y_i$, then the rough uncertainty arises in $W$. Although the neighbors are similar from the features perspective, they are not similar from the magnitude perspective. It makes the input-output relationship one-to-many. This uncertainty can be captured using rough ownership function $r : X \times Y \to [0, 1]$. The rough ownership function for the data point $\langle x_i, y_i \rangle$ with the neighborhood $W$ is defined by Sarkar (2002)

\[ r_{i,W} = \frac{\|W \cap S\|}{\|W\|}, \tag{11} \]

where $S$ is the set of data points in $W$ with magnitudes $y_i$, and $\|W\|$ denotes the cardinality of the set $W$. If all the neighbors have magnitudes $y_i$, then $r_{i,W}$ is equal to one indicating that the neighborhood is smooth. In contrast, if $r_{i,W}$ is equal to zero, then possibly $\langle x_i, y_i \rangle$ is an outlier.
The confusion is maximum when half of the neighbors have the magnitude $y_i$, and the remaining half of the neighbors have different magnitudes. Thus $r_{i,W} = 0.5$ indicates the maximum roughness. Note that a similar kind of formulation is also used in the literature to measure rough inclusion (Polkowski and Skowron, 1996).

Next we make the situation more complex, but closer to the reality. Till now we have assumed that each neighbor resides in the structure $W$ equally and completely. Now every training pattern belongs to $W$ with different degrees, i.e., the training pattern closer to $\langle x_i, y_i \rangle$ along the time axis belongs to the neighborhood $W$ to a high degree, and the training pattern far away from $W$ supports the neighborhood by a negligible amount. Therefore, $W$ spans all the training patterns. The similarity along the time axis can be measured using the value of a Gaussian at the point $\langle x_i, y_i \rangle$. One possible way to define the Gaussian is

\[ \mu_x(i, j, \delta_l) = \exp\left(-\frac{(x_i - x_j)^2}{2\delta_l^2}\right), \tag{12} \]

where $\delta_l$ is the width of the Gaussian. Hence the amount of the total roughness at the point $\langle x_i, y_i \rangle$ is

\[ \tau_{i,\delta_l} = \frac{1}{n-1} \sum_{\substack{j=1 \\ j \neq i}}^{n} \mu_x(i, j, \delta_l)\, \mu_S(j), \tag{13} \]

where $\mu_S(j) = 1$ if $y_j = y_i$, otherwise $\mu_S(j) = 0$.

We can still fine-tune Eq. (13). Till now we have considered only the neighbors that have the same magnitudes. We relax it to know whether the magnitudes of the neighbors are similar or not, i.e., we modify the characteristic function $\mu_S$ to the fuzzy membership function $\mu_y$. Specifically,

\[ \mu_y(i, j, \sigma_{y,i,\delta_l}) = \exp\left(-\frac{(y_i - y_j)^2}{2\sigma_{y,i,\delta_l}^2}\right) \tag{14} \]

represents the fuzzy similarity between $\langle x_i, y_i \rangle$ and $\langle x_j, y_j \rangle$ along the ordinate, where

\[ \sigma_{y,i,\delta_l} = \sqrt{\frac{\sum_{j=1}^{n} \mu_x(i, j, \delta_l)\,(y_i - y_j)^2}{\sum_{j=1}^{n} \mu_x(i, j, \delta_l)}} \tag{15} \]

indicates the spread of the Gaussian along the ordinate and around the $i$th data point $\langle x_i, y_i \rangle$. Thus, we incorporate the concept of fuzzy similarity in the rough ownership function to obtain the following fuzzy-rough ownership function:

\[ \iota_{i,\delta_l} = \frac{1}{n-1} \sum_{\substack{j=1 \\ j \neq i}}^{n} \mu_x(i, j, \delta_l)\, \mu_y(i, j, \sigma_{y,i,\delta_l}). \tag{16} \]

Note that when $\sigma_{y,i,\delta_l} = 0$, Eq. (14) becomes undefined. To avoid it, in this case $\sigma_{y,i,\delta_l}$ is made equal to one.

For an absolutely smooth and horizontal time series, the fuzzy-rough ownership value at each data point would be close to one. In contrast, around a sudden jump or around a discontinuity, the fuzzy-rough ownership value would be close to zero. The confusion about the smoothness of the time series is high when the fuzzy-rough ownership value is in between zero and one.

3.3. Scaling of fuzzy-roughness

The ruggedness in terms of fuzzy-roughness at the data point $\langle x_i, y_i \rangle$ is quantified by $\iota_{i,\delta_l}$, and this value varies from point to point. We would like to find the average fuzzy-roughness at any data point. Then we would examine at what rate the average fuzzy-roughness changes when the resolution of the time series is changed.

We define a term called partition function that measures the average fuzzy-roughness at any data point of the time series. It is

\[ I_{FR}(\delta_l) = \frac{1}{n} \sum_{i=1}^{n} \iota_{i,\delta_l}. \tag{17} \]

We next investigate at what rate the average fuzzy-roughness scales down/up when the resolution of the time series is increased/decreased. We are particularly interested to know whether any power law relationship $I_{FR}(\delta_l) \propto \delta_l^{D_{FR}}$ holds. If there is any particular power law, then it can be calculated from

\[ D_{FR} = \frac{\log(I_{FR}(\delta_l))}{\log(\delta_l)}. \tag{18} \]

Ideally, $D_{FR}$ should be measured when the lengths of the neighborhood intervals are very small, i.e., when $\delta_l \to 0$. Thus,

\[ D_{FR} = \lim_{\delta_l \to 0} \frac{\log(I_{FR}(\delta_l))}{\log(\delta_l)}. \tag{19} \]

In reality, at the limits of the resolution, $D_{FR}$ does not remain constant. Moreover, $D_{FR}$ fluctuates due to the noise and outliers. Therefore, a more refined estimate can be obtained from the slope of the best-fit line that passes through $\langle \log(\delta_l), \log(I_{FR}(\delta_l)) \rangle$ $\forall l = 1, 2, \ldots, L$, where $L$ is an integer such that the Gaussian with the width $\delta_l$ is sufficient to cover the time series with a high degree. If we cannot fit a straight line due to the randomness of the data, then most likely the power law does not hold for the time series, and in that case, the fractal dimension cannot characterize the time series. Thus, using the power law, the fractal dimension $D_{FR}$ measures how the fuzzy-roughness scales down/up when the resolution is increased/decreased. The proposed fractal dimension acts as a feature, which reflects the self-similar property of the time series. The algorithm is shown in Fig. 2.

3.4. Salient aspects of the proposed method

The use of Gaussians enables us to interpret the basic philosophy of the fractal in a different manner. The Hurst exponent is proposed presuming that there exists a set of generators. It is assumed that the part and the whole time series are constructed by the same generator (Mandelbrot,
Fig. 2. The algorithm to compute the fuzzy-rough set-based fractal dimension $D_{FR}$. Note that we need to compute $\mu_x(i, j, \delta_l)$ for all $i$ and $j$ before computing $\mu_y(i, j, \sigma_{y,i,\delta_l})$.
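The body of Fig. 2 is not reproduced in this text, so the following is only a minimal sketch of how Eqs. (12) and (14)-(17) and the best-fit slope might be combined. The function names, the pure-Python loops, the ordinary least-squares fit, and the sample series are assumptions of this sketch rather than the author's implementation.

```python
import math


def fuzzy_rough_partition(xs, ys, delta):
    """Average fuzzy-rough ownership I_FR(delta) over all points (Eq. 17)."""
    n = len(xs)
    total = 0.0
    for i in range(n):
        # Eq. (12): Gaussian similarity of every point j to point i along time.
        mu_x = [math.exp(-((xs[i] - xs[j]) ** 2) / (2.0 * delta ** 2))
                for j in range(n)]
        # Eq. (15), squared: mu_x-weighted spread of the ordinate around point i.
        var_y = sum(m * (ys[i] - ys[j]) ** 2
                    for j, m in enumerate(mu_x)) / sum(mu_x)
        if var_y == 0.0:
            var_y = 1.0  # a zero spread is replaced by one, as stated for Eq. (14)
        # Eqs. (14) and (16): combine time and ordinate similarity over j != i.
        iota = sum(mu_x[j] * math.exp(-((ys[i] - ys[j]) ** 2) / (2.0 * var_y))
                   for j in range(n) if j != i) / (n - 1)
        total += iota
    return total / n


def fractal_dimension(xs, ys, deltas):
    """Slope of the least-squares line through (log delta, log I_FR(delta))."""
    u = [math.log(d) for d in deltas]
    v = [math.log(fuzzy_rough_partition(xs, ys, d)) for d in deltas]
    u_mean = sum(u) / len(u)
    v_mean = sum(v) / len(v)
    covariance = sum((a - u_mean) * (b - v_mean) for a, b in zip(u, v))
    variance = sum((a - u_mean) ** 2 for a in u)
    return covariance / variance


# Hypothetical sample series, used only to exercise the functions.
xs = list(range(30))
ys = [math.sin(0.3 * t) + 0.1 * (t % 3) for t in xs]
deltas = [0.5, 1.0, 2.0, 4.0]
d_fr = fractal_dimension(xs, ys, deltas)
```

The widths in `deltas` are chosen to be comparable to the sampling interval. For widths much smaller than the spacing between points, the Gaussian weights underflow to zero and the logarithm of the partition function is undefined.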
1983). However, the part and the whole time series look different since in these two cases the occurrences of the generators are different. The Hurst exponent aims to capture how the probability of occurrence changes when the resolution is changed. In reality the number of generators may be infinite, and hence, we may not have any access to know how many of them are present and how they are, and hence it is difficult to estimate their occurrences. Instead of trying to know the generators, it is more attractive to observe the pattern formed in a local region, and how the fuzzy-roughness evolves when the resolution is increased. The proposed dimension is directed along this direction.

The advantages of the proposed method are as follows:

1. The fractal nature of the time series is viewed as how the fuzzy-roughness changes with decreasing resolution. Unlike the Hurst exponent, the proposed method does not need to assume any particular generator and the occurrence of the generator.
2. The change of the length of the interval does not change the fractal dimension abruptly.
3. When the dimension is computed from the slope of the best-fit line, we do not face any stair-casing effect.
4. The Hurst exponent can be derived from the proposed dimension.
5. Depending on the domain, other results of the fuzzy set theory can be incorporated into the proposed dimension.

3.5. Proposed technique and generalized fuzzy-rough sets

The rough set framework proposed in (Pawlak, 1982) is based on the equivalence relation. In other words, this rough set framework relies on the reflexivity, symmetry and transitivity properties of a relation. Various modifications of this framework have been proposed by relaxing some of those three properties (Inuiguchi et al., 2003; Intan and Mukaidono, 2002). For example, (Slowinski and Vanderpooten, 2000) proposes tolerance relation where the reflexivity and symmetry should hold, but the transitivity may or may not hold. Using the tolerance relation, the concept of rough sets has been generalized.

All the three properties reflexivity, symmetry and transitivity have been relaxed in (Wu et al., 2004), and the concept of rough set is generalized for any kind of binary relation. Let $U$ and $W$ be two finite universes. Suppose that
$R$ is an arbitrary relation from $U$ to $W$. We can define a set-valued function $f : U \to \mathcal{P}(W)$ by

\[ f(x) = \{y \in W : (x, y) \in R\}, \quad x \in U. \tag{20} \]

Obviously, any set-valued function from $U$ to $W$ defines a binary relation from $U$ to $W$ by setting $R = \{(x, y) \in U \times W : y \in f(x)\}$.

For any set $C \subseteq W$, a pair of lower and upper approximations can be defined as

\[ \underline{R}(C) = \{x \in U : f(x) \subseteq C\}, \tag{21} \]

\[ \overline{R}(C) = \{x \in U : f(x) \cap C \neq \emptyset\}. \tag{22} \]

$R(C) = \langle \underline{R}(C), \overline{R}(C) \rangle$ is referred to as a generalized rough set.

The above definition of rough sets can be used to generalize fuzzy-rough sets. Let $R$ be an arbitrary fuzzy relation from $U$ to $W$. Define the mapping $\phi : U \to \mathcal{P}(W)$ by

\[ \phi(x, y) = R(x, y), \quad (x, y) \in U \times W. \tag{23} \]

For any $\alpha \in [0, 1]$, $\phi_\alpha : U \to \mathcal{P}(W)$ is defined as

\[ \phi_\alpha(x) = \{y \in W : \phi(x, y) \geq \alpha\}, \quad x \in U. \tag{24} \]

For a given $\alpha \in [0, 1]$ and any set $C \subseteq W$, a pair of lower and upper approximations can be defined as

\[ \underline{\phi}_\alpha(C) = \{x \in U : \phi_\alpha(x) \subseteq C\}, \tag{25} \]

\[ \overline{\phi}_\alpha(C) = \{x \in U : \phi_\alpha(x) \cap C \neq \emptyset\}. \tag{26} \]

$\langle \underline{\phi}(C), \overline{\phi}(C) \rangle$ is called generalized fuzzy-rough set, where

\[ \underline{\phi}(C) = \bigvee_{\alpha \in [0,1]} \left[\alpha \wedge \underline{\phi}_{1-\alpha}(C_\alpha)\right], \tag{27} \]

\[ \overline{\phi}(C) = \bigvee_{\alpha \in [0,1]} \left[\alpha \wedge \overline{\phi}_\alpha(C_\alpha)\right]. \tag{28} \]

because of the existence of inconsistent data. The remaining 1979 data points were used for the experiments.

We have compared the errors in estimating the proposed dimension and the Hurst exponent. Using Eq. (17), we have calculated $\log(I_{FR}(\delta_l))$ and $\log(\delta_l)$ 100 times (i.e., $L = 100$) for different values of $\delta_l$. The minimum and maximum values of $\delta$ are the half of the minimum and maximum distances between any two data points. Assuming q = 1, $\log(I_{FR}(\delta_l))$ is plotted against $\log(\delta_l)$ (Fig. 3). After fitting the best-fit line through these points, we have obtained $D_{FR}$ as 0.9010. The smoothness of the best-fit line indicates that the time series follows the power law of fractal theory. By plotting $\log(I_{R/S}(\delta_l))$ vs. $\log(\delta_l)$ for different values of $l$, we have obtained the Hurst exponent as 0.5487 (Fig. 4). We can observe that the plot for the Hurst exponent is not as smooth as that of the proposed