A Modified Chi-Squares Test for Improved Bad Data Detection

Murat Göl, Member, IEEE
EEE Department
Middle East Technical University
Ankara, Turkey
mgol@metu.edu.tr

Ali Abur, Fellow, IEEE
ECE Department
Northeastern University
Boston, MA, U.S.A.
abur@ece.neu.edu

Abstract--Current state estimators employ the Weighted Least Squares (WLS) estimator to solve the state estimation problem. Once the state estimates are obtained, the Chi-Squares test is commonly used to detect the presence of bad data in the measurement set. Regrettably, this test is not entirely reliable; that is, bad data existing in the measurement set can be missed in certain cases. One reason for this is the approximation used to compute the bad data suspicion threshold, which is set based on an assumed chi-squared distribution for the objective function. In this paper, a modified metric is proposed in order to improve the bad data detection accuracy of the commonly used Chi-Squares test. The bad data detection performance of the proposed test is compared with that of the conventional Chi-Squares test.
Index Terms-- Bad-data detection, state estimation, Chi-squared
distribution, measurement residuals, weighted least squares.

I. INTRODUCTION

Power system state estimation is one of the key tools of an Energy Management System (EMS) [1]. State estimators provide the best estimates of the system voltage magnitudes and phase angles using the system model and a sufficiently redundant measurement set. Those estimates are then used by the economic and control applications of the EMS.
The most common state estimation technique employed in present systems is the weighted least squares (WLS) method [1]. WLS is a well-developed and fast method. When applied to the first-order approximation of the measurement equations, it provides the best linear unbiased estimator (BLUE) given normally distributed measurement errors [2]. In the presence of Gaussian errors, WLS provides unbiased state estimates. Unfortunately, the WLS estimator is not robust against bad data, and even a single measurement with a gross error may significantly bias the estimation results. Therefore, almost all WLS estimators carry out a post-estimation bad data detection test, which is commonly accomplished by the so-called Chi-Squares test [3]-[4]. Although the Chi-Squares
test is the most common bad data detection method used in several commercial state estimators, it may not always yield correct results. There are cases in which the Chi-Squares test can be shown to fail to detect existing bad data in the measurement set.
Missing a bad measurement that is present in the measurement set has dire consequences, such as biased estimates, which in turn affect the decisions based on those estimates. Therefore, this paper proposes a simple modification that improves the bad data detection capability of existing state estimators. The proposed modification requires calculation of the residual covariance matrix, which uses only a subset of the elements in the inverse of the sparse gain matrix. Matrix inversion is known to be a computationally expensive operation and is therefore avoided in power system analysis. However, thanks to efficient sparse inverse methods [5]-[7], this computation can be performed at little cost. In this paper, the proposed method is compared with the conventional Chi-Squares method in terms of computational performance and bad measurement detection accuracy.
The rest of the paper is organized as follows: Section II reviews the conventional Chi-Squares test, Section III explains the proposed method in detail, Section IV presents the simulations and numerical results, and Section V concludes the paper.
II. CONVENTIONAL CHI-SQUARE TEST

Consider a random variable Y, which has a chi-squared (χ²) distribution with N degrees of freedom, given by the following expression:

Y = \sum_{i=1}^{N} X_i^2    (1)

where the random variables X_1, X_2, ..., X_N are independent and distributed according to the standard normal distribution.

In the power system state estimation problem formulation, measurement errors are commonly assumed to have a normal distribution with zero mean and known variance. Under the same assumption, a function f(x) can be defined as given in (2), where f(x) has a chi-squared distribution with at most (m - n) degrees of freedom, m being the number of measurements and n the number of states. Note that in a power system with m measurements and n system states, at most (m - n) errors can be linearly independent, since at least n measurements are required to obtain a solution. Thus, the degrees of freedom will be at most (m - n).
f(x) = \sum_{i=1}^{m} R_{ii}^{-1} e_i^2 = \sum_{i=1}^{m} \frac{e_i^2}{R_{ii}} = \sum_{i=1}^{m} \left(e_i^N\right)^2    (2)
In (2), e_i is the measurement error, which has a normal distribution, and R_ii is the variance of the ith measurement error, where R is the diagonal error covariance matrix. e_i^N is the normalized error, which has a standard normal distribution.
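As a quick numerical illustration of this property (not part of the original paper), the following short NumPy sketch draws zero-mean Gaussian errors with arbitrary variances R_ii, normalizes them as in (2), and checks that the resulting sum of squares has the mean and variance (m and 2m) expected of a chi-squared variable with m degrees of freedom; all names and values are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

m = 10                                     # number of independent measurement errors
R_diag = rng.uniform(1e-4, 1e-2, size=m)   # assumed error variances R_ii

# Draw many realizations of e_i ~ N(0, R_ii), normalize, and sum the squares.
n_trials = 100_000
e = rng.normal(0.0, np.sqrt(R_diag), size=(n_trials, m))
f = np.sum(e**2 / R_diag, axis=1)          # f(x) in (2), one value per realization

# A chi-squared variable with m degrees of freedom has mean m and variance 2m.
print(f"sample mean     = {f.mean():.2f}  (expected {m})")
print(f"sample variance = {f.var():.2f}  (expected {2 * m})")
```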
Consider the chi-squared probability density function plot given in Fig. 1 [1]. The area below the p.d.f. represents the probability of finding X in the given region, as shown below:

P\{X \geq x_t\} = \int_{x_t}^{\infty} \chi^2(u)\,du    (3)

Eq. (3) represents the probability of X being larger than x_t. This probability decreases as x_t increases, since the tail of the distribution decays. According to Fig. 1, x_t is approximately 25, as shown by the dotted line, for the chosen probability of 0.05.

Fig. 1. Chi-squared probability density function [1].

Here, x_t represents the largest value that will not be identified as indicating bad data; if the value exceeds this threshold, the presence of a bad measurement will be suspected.
In order to detect bad data, most commercial state estimators that employ the WLS estimation method use the following metric:

J(\hat{x}) = \sum_{i=1}^{m} \frac{(z_i - h_i(\hat{x}))^2}{\sigma_i^2} = \sum_{i=1}^{m} \frac{(r_i)^2}{\sigma_i^2}    (4)

where m is the number of measurements, x̂ is the (n x 1) estimated state vector, h_i(x̂), z_i and r_i are the estimated value, the measured value and the residual of the ith measurement, respectively, and σ_i² is the corresponding measurement error variance, which is the same as R_ii. The conventional Chi-Squares test will suspect the existence of bad data if the computed metric J(x̂) is larger than the bad data suspicion threshold χ²_{(m-n),p}, i.e. the value of a chi-squared distribution with (m - n) degrees of freedom at a given probability p.
Note that a sum of squared normally distributed random variables follows a chi-squared distribution only if each variable is normalized by its own variance, as done in (2). In (4), however, the measurement residuals are normalized with respect to the variances of the measurement errors rather than of the residuals themselves; therefore (4) is only an approximation of f(x) as defined in (2).
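A minimal sketch of this conventional test is given below, using Python with NumPy and SciPy; the function name, variable names and numbers are illustrative assumptions rather than the authors' implementation. It computes J(x̂) from the residuals and the measurement error variances and compares it against the chi-squared suspicion threshold for a chosen detection confidence p.

```python
import numpy as np
from scipy.stats import chi2

def conventional_chi2_test(residuals, sigma2, n_states, p=0.95):
    """Conventional chi-squares bad data suspicion test, per (4).

    residuals : r_i = z_i - h_i(x_hat) for each of the m measurements
    sigma2    : measurement error variances (the diagonal entries R_ii of R)
    n_states  : number of estimated states n
    p         : detection confidence level
    """
    residuals = np.asarray(residuals, dtype=float)
    sigma2 = np.asarray(sigma2, dtype=float)
    m = residuals.size

    J = np.sum(residuals**2 / sigma2)         # objective value J(x_hat)
    threshold = chi2.ppf(p, df=m - n_states)  # chi-squared suspicion threshold
    return J, threshold, J > threshold        # True -> bad data suspected

# Example with made-up numbers, for illustration only:
J, thr, suspect = conventional_chi2_test(
    residuals=[0.012, -0.004, 0.020, -0.015, 0.008],
    sigma2=[1e-4] * 5,
    n_states=2,
)
print(f"J = {J:.1f}, threshold = {thr:.1f}, bad data suspected: {suspect}")
```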


III. PROPOSED APPROACH

The conventional Chi-Squares test assumes that the metric J(x̂) shown in (4) is distributed according to a chi-squared distribution. However, the denominator in (4) is not the variance of the corresponding residual appearing in the numerator. This introduces an approximation, which may lead to incorrect results, i.e. existing bad data may not be detected.
According to [2], the key to the analysis of bad data is the residual sensitivity matrix S, which is obtained by linearizing the relation between the measurement vector z, the system state vector x and the measurement error vector e, as follows:
z = H x + e
\hat{x} = (H^T R^{-1} H)^{-1} H^T R^{-1} z
r = z - H\hat{x}
r = (H x + e) - H (H^T R^{-1} H)^{-1} H^T R^{-1} (H x + e)
r = e - H (H^T R^{-1} H)^{-1} H^T R^{-1} e
r = [I - H (H^T R^{-1} H)^{-1} H^T R^{-1}] e    (5)

S = I - H (H^T R^{-1} H)^{-1} H^T R^{-1}    (6)

S is the residual sensitivity matrix, R is the measurement error covariance matrix, H is the measurement Jacobian matrix and I is the m x m identity matrix, m being the number of measurements [1]. Note that the derivation is based on the linear measurement model. The details of the derivation of S can be found in [1]. The residual sensitivity matrix S has the following properties [1]:

S \cdot S \cdot S \cdots S = S
S R S^T = S R    (7)

Once the linearized measurement model is assumed, the residual sensitivity matrix S represents the relation between the measurement errors and the measurement residuals [1], as shown below:

r = S e    (8)

where r is the measurement residual vector and e is the measurement error vector.
Using (7) and (8), and the known covariance matrix R of the measurement errors, one can easily derive the expected value and the covariance matrix of the measurement residuals as given below:

E\{r\} = E\{S e\} = S\,E\{e\} = 0
Cov(r) = \Omega = E[r r^T] = S\,E[e e^T]\,S^T = S R S^T = S R    (9)

where r = z - h(x̂) and Ω is the residual covariance matrix. Note that, due to the zero-mean Gaussian measurement error assumption, the expected value of the measurement errors is 0.
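As an illustration of (6)-(9), the sketch below builds S and Ω = SR for a small, randomly generated linear model using dense NumPy linear algebra (names and sizes are assumptions, not from the paper) and numerically verifies the properties in (7), as well as the rank deficiency of Ω noted below.

```python
import numpy as np

rng = np.random.default_rng(1)

m, n = 8, 3                                           # toy numbers of measurements and states
H = rng.normal(size=(m, n))                           # linearized measurement Jacobian
R = np.diag(rng.uniform(1e-4, 1e-2, size=m))          # diagonal measurement error covariance

R_inv = np.linalg.inv(R)
G = H.T @ R_inv @ H                                   # gain matrix
S = np.eye(m) - H @ np.linalg.solve(G, H.T @ R_inv)   # residual sensitivity matrix, (6)
Omega = S @ R                                         # residual covariance matrix, (9)

print(np.allclose(S @ S, S))                          # idempotency property in (7)
print(np.allclose(S @ R @ S.T, S @ R))                # S R S^T = S R, also in (7)
print(np.linalg.matrix_rank(Omega), "=", m - n)       # Omega is rank deficient (rank m - n)
```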
As seen in (9), Ω differs significantly from R, the measurement error covariance matrix. Therefore, in this paper it is proposed to use a modified bad data detection metric, J_m(x̂), as defined below, where Ω_ii is the variance of the ith measurement residual:

J_m(\hat{x}) = \sum_{i=1}^{m} \frac{(z_i - h_i(\hat{x}))^2}{\Omega_{ii}}    (10)
Note that Ω is a rank-deficient matrix, so it is not invertible. Therefore, instead of using the inverse of Ω, only its diagonal entries, which are the measurement residual variances, are employed. In this formulation, the off-diagonal entries of Ω, which represent the correlations among the measurement residuals, are neglected and only the diagonal elements are considered. Thus, this metric is still an approximation, albeit a more reliable one than (4), since the residuals are normalized using the square roots of the diagonal entries of the residual covariance matrix, i.e. the measurement residual standard deviations, instead of those of the measurement errors.
The main computational cost of this approach is the computation of Ω, since a matrix inversion must be performed. However, thanks to the extremely sparse structure of the measurement Jacobian H, efficient sparse inverse methods [4]-[7] can be employed, and the computational burden is not significant even for large-scale systems. Note also that Ω does not strongly depend on the operating point. Therefore, as long as the topology and measurement configuration remain the same, Ω does not have to be updated.
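A minimal sketch of the proposed test, under the same illustrative assumptions as the previous snippets, is given below; it forms Ω densely and uses only its diagonal entries, as described above, whereas a production implementation would obtain those entries through the sparse inverse techniques cited above without ever forming S explicitly.

```python
import numpy as np
from scipy.stats import chi2

def modified_chi2_test(H, R_diag, residuals, p=0.95):
    """Proposed modified chi-squares test, per (10): normalize each squared
    residual by the residual variance Omega_ii instead of R_ii."""
    H = np.asarray(H, dtype=float)
    R_diag = np.asarray(R_diag, dtype=float)
    r = np.asarray(residuals, dtype=float)
    m, n = H.shape

    R_inv = np.diag(1.0 / R_diag)
    G = H.T @ R_inv @ H                                   # gain matrix (dense in this sketch)
    S = np.eye(m) - H @ np.linalg.solve(G, H.T @ R_inv)   # residual sensitivity matrix, (6)
    omega_diag = np.diag(S) * R_diag                      # Omega_ii = S_ii * R_ii since R is diagonal

    J_m = np.sum(r**2 / omega_diag)                       # modified metric J_m(x_hat), (10)
    threshold = chi2.ppf(p, df=m - n)                     # same chi-squared suspicion threshold
    return J_m, threshold, J_m > threshold
```

As the paper notes, only a subset of the elements of the inverse gain matrix is actually needed to obtain diag(Ω), which is what makes the sparse inverse approach attractive for large systems.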
IV. SIMULATION RESULTS
In this section, a real utility system with 265 buses and 340 branches is used to illustrate the benefits of the proposed bad data detection test. The system is monitored by 362 measurements, which provide high enough measurement redundancy to detect the presence of bad data. Simulations are carried out in the MATLAB R2014a environment using a PC with 4 GB of RAM and the Windows operating system.
The first study shows the additional computational burden required for the computation of the residual covariance matrix. The second study compares the bad data detection performance of the proposed modified method and the conventional Chi-Squares test.

Case 1: In this study, the solution time of the WLS estimation is compared with the CPU times required for the proposed bad data detection approach and the conventional one. 1500 Monte Carlo simulations are carried out and the mean value of the results is reported. In these simulations, random Gaussian errors are added to the measurement set, and one randomly selected measurement is intentionally corrupted to emulate bad data by changing its sign. Table I shows the CPU times for the WLS state estimation solution as well as for the modified and conventional Chi-Squares tests. The increase in computation time when using the proposed modified test is expected and is primarily caused by the computation of the residual covariance matrix Ω.
TABLE I. MEAN COMPUTATION TIME (MILLISECONDS)

WLS Estimation | Proposed Modified Chi-Squares | Conventional Chi-Squares
               |              3.4              |            0.1

Case 2: The bad data detection performance of the proposed approach is compared to that of the conventional method. Four different single bad data scenarios are studied. Each scenario is repeated 1500 times, each time introducing a randomly selected bad measurement. In these four cases, a certain amount of error, proportional to the standard deviation σ of the considered measurement, is added to the original measurement in order to emulate a bad measurement; the amount of error introduced in each case is listed below, and a minimal sketch of this corruption scheme is given after the list. In order to make the simulations realistic, Gaussian errors are also added to all measurements.

Case 2.a: No bad measurement.
Case 2.b: 3σ.
Case 2.c: 40σ.
Case 2.d: 100σ.
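The snippet below sketches how such a bad measurement could be emulated in a single Monte Carlo run (variable names, sizes and standard deviations are illustrative assumptions, not the authors' simulation code): Gaussian noise is added to every measurement and one randomly selected measurement receives an additional gross error of k standard deviations.

```python
import numpy as np

def emulate_bad_measurement(z_true, sigma, k, rng):
    """Add Gaussian noise to all measurements; if k > 0, also add a gross
    error of k*sigma to one randomly selected measurement."""
    z = z_true + rng.normal(0.0, sigma)      # realistic noise on every measurement
    bad_idx = None
    if k > 0:
        bad_idx = int(rng.integers(z.size))  # measurement chosen to carry the gross error
        z[bad_idx] += k * sigma[bad_idx]     # bad data proportional to that channel's sigma
    return z, bad_idx

rng = np.random.default_rng(2)
sigma = np.full(362, 0.01)                   # assumed uniform measurement standard deviations
z_true = rng.normal(size=362)                # placeholder "true" measurement values
z_meas, bad_idx = emulate_bad_measurement(z_true, sigma, k=40, rng=rng)   # Case 2.c
```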


Table II shows the bad data detection performance of the proposed method and the conventional approach. The values given in Table II are percentages, which indicate the bad data detection probability of the proposed and the conventional methods. As evident in Table II, both methods give correct results for very large and very small error values. However, for intermediate error values such as in Case 2.c, which can still significantly bias the estimation results, the proposed approach detects bad data that is missed by the conventional Chi-Squares test.


TABLE II. BAD DATA DETECTION PERFORMANCE

Case | Proposed Modified Chi-Squares (%) | Conventional Chi-Squares (%) | Bad Data Present
2.a  |                 0                 |               0              |        No
2.b  |                 0                 |               0              |        No
2.c  |                100                |             68.9             |        Yes
2.d  |                100                |              100             |        Yes


According to Table II, the estimation results of Case 2.b are unbiased, while the estimation results of Case 2.c are biased. Fig. 2.a presents the difference between the true states and the estimation results of one randomly selected Monte Carlo run for Case 2.b. Similarly, Fig. 2.b presents the difference between the true states and the estimation results of the same randomly selected Monte Carlo run for Case 2.c, such that both figures consider the same measurement but with different errors. As seen in Fig. 2.b, although the estimation results are biased, the conventional method was not capable of identifying the presence of the gross error. On the other hand, the proposed metric successfully detected the presence of the bad measurement.

Fig. 2. Mismatch between estimated and true states: (a) Case 2.b, (b) Case 2.c.

Finally, it is quite informative to take a look at the covariance values for the errors and the residuals. Fig. 3 presents the variation of the Ω_ii and R_ii values. As seen in Fig. 3, compared to the constant R_ii values, the Ω_ii values are in general much smaller. Therefore, the proposed bad data suspicion threshold will always be smaller than that of the conventional Chi-Squares test.
Fig. 3. Variation of Ω_ii and R_ii values.

V. CONCLUSIONS

In this paper, a modified Chi-Squares test is proposed to improve the bad data detection accuracy of the WLS method used in state estimation. As seen in the simulations, the proposed metric performs better than the conventional test in detecting the presence of bad data in a given measurement set. Although the proposed test is successful in detecting bad data, identification and removal of the bad measurements still have to be carried out by methods such as the normalized residuals test [8].

Most commercial programs use the Chi-Squares test as a computationally cheap filter to decide whether or not to conduct an identification test. In that sense, this modification may serve a useful purpose by increasing the reliability of this initial filter so that bad data will not be missed.

REFERENCES
[1] A. Abur and A. Gomez-Exposito, Power System State Estimation: Theory and Implementation. Marcel Dekker, 2004.
[2] A. C. Aitken, "On Least Squares and Linear Combinations of Observations," Proc. Royal Society of Edinburgh, vol. 35, pp. 42-48, 1935.
[3] E. Handschin, F. C. Schweppe, J. Kohlas, and A. Fiechter, "Bad data analysis for power systems state estimation," IEEE Trans. Power App. Syst., vol. 94, pp. 329-337, Mar./Apr. 1975.
[4] A. Monticelli, "Electric Power System State Estimation," Proceedings of the IEEE, vol. 88, no. 2, February 2000.
[5] K. Takahashi, J. Fagan, and M. Chen, "Formation of a Sparse Bus Impedance Matrix and Its Application to Short Circuit Study," PICA Proceedings, May 1973, pp. 63-69.
[6] Y. E. Campbell and T. A. Davis, "Computing the Sparse Inverse Subset: An Inverse Multi-frontal Approach," University of Florida, Technical Report TR-95-021.
[7] B. Bilir and A. Abur, "Bad Data Processing When Using the Coupled Measurement Model and Takahashi's Sparse Inverse Method," IEEE Innovative Smart Grid Technologies Conference - Europe, Istanbul, Turkey, 12-15 Oct. 2014.
[8] A. Monticelli and A. Garcia, "Reliable Bad Data Processing for Real-Time State Estimation," IEEE Trans. Power Apparatus and Systems, vol. PAS-102, no. 5, pp. 1126-1139, May 1983.
