A Modified Chi-Squares Test for Improved Bad Data Detection

Murat Göl, Member, IEEE
EEE Department
Middle East Technical University
Ankara, Turkey
mgol@metu.edu.tr

Ali Abur, Fellow, IEEE
ECE Department
Northeastern University
Boston, MA, U.S.A.
abur@ece.neu.edu

Abstract--Current state estimators employ the Weighted Least Squares (WLS) estimator to solve the state estimation problem. Once the state estimates are obtained, the Chi-Squares test is commonly used to detect the presence of bad data in the measurement set. Regrettably, this test is not entirely reliable; that is, bad data existing in the measurement set can be missed in certain cases. One reason for this is the approximation used to compute the bad data suspicion threshold, which is set based on an assumed chi-squared distribution for the objective function. In this paper, a modified metric is proposed in order to improve the bad data detection accuracy of the commonly used Chi-Squares test. The bad data detection performance of the proposed test is compared with that of the conventional Chi-Squares test.
Index Terms-- Bad-data detection, state estimation, Chi-squared
distribution, measurement residuals, weighted least squares.

I. INTRODUCTION

Power system state estimation is one of the key tools of an Energy Management System (EMS) [1]. State estimators provide the best estimates of the system voltage magnitudes and phase angles using the system model and a sufficiently redundant measurement set. Those estimates are then used by the economic and control applications of the EMS.
The most common state estimation technique employed in present systems is the weighted least squares (WLS) method [1]. WLS is a well-developed and fast method. When applied to the first-order approximation of the measurement equations, it provides the best linear unbiased estimator (BLUE) given normally distributed measurement errors [2]. In the presence of Gaussian errors, WLS provides unbiased state estimates. Unfortunately, the WLS estimator is not robust against bad data, and even a single measurement with a gross error may significantly bias the estimation results. Therefore, almost all WLS estimators carry out a post-estimation bad data detection test, which is commonly accomplished by the so-called Chi-Squares test [3]-[4]. Although the Chi-Squares
test is the most common bad data detection method used in several commercial state estimators, it may not always yield correct results. There are cases in which the Chi-Squares test can be shown to fail to detect existing bad data in the measurement set.
Missing a bad measurement that is present in the measurement set has dire consequences, such as biased estimates, which in turn affect the decisions based on those estimates. Therefore, this paper proposes a simple modification that improves the bad data detection capability of existing state estimators. The proposed modification requires calculation of the residual covariance matrix, which uses only a subset of the elements in the inverse of the sparse gain matrix. Matrix inversion is known to be a computationally expensive operation and is therefore avoided in power system analysis. However, thanks to efficient sparse inverse methods [5]-[7], this computation can be performed at little cost. In this paper, the proposed method is compared with the conventional Chi-Squares method in terms of computational performance and bad measurement detection accuracy.
The rest of the paper is organized as follows: Section II reviews the conventional Chi-Squares test, Section III explains the proposed method in detail, Section IV presents the simulations and numerical results, and Section V concludes the paper.
II. CONVENTIONAL CHI-SQUARE TEST

Consider a random variable Y, which has a chi-squared (χ²) distribution with N degrees of freedom, given by the following expression:

Y = \sum_{i=1}^{N} X_i^2    (1)

where the random variables X_1, X_2, ..., X_N are independent and distributed according to the standard normal distribution.

In the power system state estimation problem formulation, measurement errors are commonly assumed to have a normal distribution with zero mean and known variance. Under the same assumption, a function f(x) can be defined as given in (2), where f(x) has a chi-squared distribution with at most (m - n) degrees of freedom, m being the number of measurements and n the number of states. Note that in a power system with m measurements and n system states, at most (m - n) errors can be linearly independent, since at least n measurements are required to obtain a solution. Thus, the degrees of freedom will be at most (m - n).
f(x) = \sum_{i=1}^{m} R_{ii}^{-1} e_i^2 = \sum_{i=1}^{m} \frac{e_i^2}{R_{ii}} = \sum_{i=1}^{m} \left(e_i^N\right)^2    (2)
In (2), e_i is the measurement error, which has a normal distribution, and R_ii is the variance of the ith measurement error, where R is the diagonal error covariance matrix. e_i^N is the normalized error, which has a standard normal distribution.
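As a quick numerical illustration of this property (not part of the original paper), the following short NumPy sketch draws zero-mean Gaussian errors with arbitrary variances R_ii, normalizes them as in (2), and checks that the resulting sum of squares has the mean and variance (m and 2m) expected of a chi-squared variable with m degrees of freedom; all names and values are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

m = 10                                     # number of independent measurement errors
R_diag = rng.uniform(1e-4, 1e-2, size=m)   # assumed error variances R_ii

# Draw many realizations of e_i ~ N(0, R_ii), normalize, and sum the squares.
n_trials = 100_000
e = rng.normal(0.0, np.sqrt(R_diag), size=(n_trials, m))
f = np.sum(e**2 / R_diag, axis=1)          # f(x) in (2), one value per realization

# A chi-squared variable with m degrees of freedom has mean m and variance 2m.
print(f"sample mean     = {f.mean():.2f}  (expected {m})")
print(f"sample variance = {f.var():.2f}  (expected {2 * m})")
```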
Consider the chi-squared probability density function plot given in Fig. 1 [1]. The area below the p.d.f. represents the probability of finding X in the given region, as shown below:

P\{X \geq x_t\} = \int_{x_t}^{\infty} \chi^2(u)\,du    (3)

Eq. (3) represents the probability of X being larger than x_t. This probability decreases as x_t increases, since the tail of the distribution decays. According to Fig. 1, x_t is approximately 25, as shown by the dotted line, for the chosen probability of 0.05.

Fig. 1. Chi-squared probability density function [1].

Here, x_t represents the largest value that will not be identified as indicating bad data; if the value exceeds this threshold, the presence of a bad measurement will be suspected.
In order to detect bad data, most commercial state estimators that employ the WLS estimation method use the following metric:

J(\hat{x}) = \sum_{i=1}^{m} \frac{(z_i - h_i(\hat{x}))^2}{\sigma_i^2} = \sum_{i=1}^{m} \frac{(r_i)^2}{\sigma_i^2}    (4)

where m is the number of measurements, x̂ is the (n x 1) estimated state vector, h_i(x̂), z_i and r_i are the estimated value, the measured value and the residual of the ith measurement, respectively, and σ_i² is the corresponding measurement error variance, which is the same as R_ii. The conventional Chi-Squares test will suspect the existence of bad data if the computed metric J(x̂) is larger than the bad data suspicion threshold χ²_{(m-n),p}, i.e. the value of a chi-squared distribution with (m - n) degrees of freedom at a given probability p.
Note that a sum of squared normally distributed random variables follows a chi-squared distribution only if each variable is normalized by its own variance, as done in (2). In (4), however, the measurement residuals are normalized with respect to the variances of the measurement errors rather than of the residuals themselves; therefore (4) is only an approximation of f(x) as defined in (2).
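A minimal sketch of this conventional test is given below, using Python with NumPy and SciPy; the function name, variable names and numbers are illustrative assumptions rather than the authors' implementation. It computes J(x̂) from the residuals and the measurement error variances and compares it against the chi-squared suspicion threshold for a chosen detection confidence p.

```python
import numpy as np
from scipy.stats import chi2

def conventional_chi2_test(residuals, sigma2, n_states, p=0.95):
    """Conventional chi-squares bad data suspicion test, per (4).

    residuals : r_i = z_i - h_i(x_hat) for each of the m measurements
    sigma2    : measurement error variances (the diagonal entries R_ii of R)
    n_states  : number of estimated states n
    p         : detection confidence level
    """
    residuals = np.asarray(residuals, dtype=float)
    sigma2 = np.asarray(sigma2, dtype=float)
    m = residuals.size

    J = np.sum(residuals**2 / sigma2)         # objective value J(x_hat)
    threshold = chi2.ppf(p, df=m - n_states)  # chi-squared suspicion threshold
    return J, threshold, J > threshold        # True -> bad data suspected

# Example with made-up numbers, for illustration only:
J, thr, suspect = conventional_chi2_test(
    residuals=[0.012, -0.004, 0.020, -0.015, 0.008],
    sigma2=[1e-4] * 5,
    n_states=2,
)
print(f"J = {J:.1f}, threshold = {thr:.1f}, bad data suspected: {suspect}")
```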


III. PROPOSED APPROACH

The conventional Chi-Squares test assumes that the metric J(x̂) shown in (4) is distributed according to a chi-squared distribution. However, the denominator in (4) is not the variance of the corresponding residual appearing in the numerator. This introduces an approximation, which may lead to incorrect results, i.e. existing bad data may not be detected.
According to [2], the key to the analysis of bad data is the residual sensitivity matrix S, which is obtained by linearizing the relation between the measurement vector z, the system state vector x and the measurement error vector e, as follows:
z = H x + e
\hat{x} = (H^T R^{-1} H)^{-1} H^T R^{-1} z
r = z - H\hat{x}
r = (H x + e) - H (H^T R^{-1} H)^{-1} H^T R^{-1} (H x + e)
r = e - H (H^T R^{-1} H)^{-1} H^T R^{-1} e
r = [I - H (H^T R^{-1} H)^{-1} H^T R^{-1}] e    (5)

S = I - H (H^T R^{-1} H)^{-1} H^T R^{-1}    (6)

S is the residual sensitivity matrix, R is the measurement error covariance matrix, H is the measurement Jacobian matrix and I is the m x m identity matrix, m being the number of measurements [1]. Note that the derivation is based on the linear measurement model. The details of the derivation of S can be found in [1]. The residual sensitivity matrix S has the following properties [1]:

S \cdot S \cdot S \cdots S = S
S R S^T = S R    (7)

Once the linearized measurement model is assumed, the residual sensitivity matrix S represents the relation between the measurement errors and the measurement residuals [1], as shown below:

r = S e    (8)

where r is the measurement residual vector and e is the measurement error vector.
Using (7) and (8), and the known covariance matrix R of the measurement errors, one can easily derive the expected value and the covariance matrix of the measurement residuals as given below:

E\{r\} = E\{S e\} = S\,E\{e\} = 0
Cov(r) = \Omega = E[r r^T] = S\,E[e e^T]\,S^T = S R S^T = S R    (9)

where r = z - h(x̂) and Ω is the residual covariance matrix. Note that, due to the zero-mean Gaussian measurement error assumption, the expected value of the measurement errors is 0.
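As an illustration of (6)-(9), the sketch below builds S and Ω = SR for a small, randomly generated linear model using dense NumPy linear algebra (names and sizes are assumptions, not from the paper) and numerically verifies the properties in (7), as well as the rank deficiency of Ω noted below.

```python
import numpy as np

rng = np.random.default_rng(1)

m, n = 8, 3                                           # toy numbers of measurements and states
H = rng.normal(size=(m, n))                           # linearized measurement Jacobian
R = np.diag(rng.uniform(1e-4, 1e-2, size=m))          # diagonal measurement error covariance

R_inv = np.linalg.inv(R)
G = H.T @ R_inv @ H                                   # gain matrix
S = np.eye(m) - H @ np.linalg.solve(G, H.T @ R_inv)   # residual sensitivity matrix, (6)
Omega = S @ R                                         # residual covariance matrix, (9)

print(np.allclose(S @ S, S))                          # idempotency property in (7)
print(np.allclose(S @ R @ S.T, S @ R))                # S R S^T = S R, also in (7)
print(np.linalg.matrix_rank(Omega), "=", m - n)       # Omega is rank deficient (rank m - n)
```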
As seen in (9), Ω differs significantly from R, the measurement error covariance matrix. Therefore, in this paper it is proposed to use a modified bad data detection metric, J_m(x̂), as defined below, where Ω_ii is the variance of the ith measurement residual:

J_m(\hat{x}) = \sum_{i=1}^{m} \frac{(z_i - h_i(\hat{x}))^2}{\Omega_{ii}}    (10)
Note that Ω is a rank-deficient matrix, so it is not invertible. Therefore, instead of using the inverse of Ω, only its diagonal entries, which are the measurement residual variances, are employed. In this formulation, the off-diagonal entries of Ω, which represent the correlations among the measurement residuals, are neglected and only the diagonal elements are considered. Thus, this metric is still an approximation, albeit a more reliable one than (4), since the residuals are normalized using the square roots of the diagonal entries of the residual covariance matrix, i.e. the measurement residual standard deviations, instead of those of the measurement errors.
The main computational cost of this approach is the computation of Ω, since a matrix inversion must be performed. However, thanks to the extremely sparse structure of the measurement Jacobian H, efficient sparse inverse methods [4]-[7] can be employed, and the computational burden is not significant even for large-scale systems. Note also that Ω does not strongly depend on the operating point. Therefore, as long as the topology and measurement configuration remain the same, Ω does not have to be updated.
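A minimal sketch of the proposed test, under the same illustrative assumptions as the previous snippets, is given below; it forms Ω densely and uses only its diagonal entries, as described above, whereas a production implementation would obtain those entries through the sparse inverse techniques cited above without ever forming S explicitly.

```python
import numpy as np
from scipy.stats import chi2

def modified_chi2_test(H, R_diag, residuals, p=0.95):
    """Proposed modified chi-squares test, per (10): normalize each squared
    residual by the residual variance Omega_ii instead of R_ii."""
    H = np.asarray(H, dtype=float)
    R_diag = np.asarray(R_diag, dtype=float)
    r = np.asarray(residuals, dtype=float)
    m, n = H.shape

    R_inv = np.diag(1.0 / R_diag)
    G = H.T @ R_inv @ H                                   # gain matrix (dense in this sketch)
    S = np.eye(m) - H @ np.linalg.solve(G, H.T @ R_inv)   # residual sensitivity matrix, (6)
    omega_diag = np.diag(S) * R_diag                      # Omega_ii = S_ii * R_ii since R is diagonal

    J_m = np.sum(r**2 / omega_diag)                       # modified metric J_m(x_hat), (10)
    threshold = chi2.ppf(p, df=m - n)                     # same chi-squared suspicion threshold
    return J_m, threshold, J_m > threshold
```

As the paper notes, only a subset of the elements of the inverse gain matrix is actually needed to obtain diag(Ω), which is what makes the sparse inverse approach attractive for large systems.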
IV. SIMULATION RESULTS
In this section, a real utility system with 265 buses and 340 branches is used to illustrate the benefits of the proposed bad data detection test. The system is monitored by 362 measurements, which provide high enough measurement redundancy to detect the presence of bad data. Simulations are carried out in the MATLAB R2014a environment using a PC with 4 GB of RAM and the Windows operating system.
The first study shows the additional computational burden required for the computation of the residual covariance matrix. The second study compares the bad data detection performance of the proposed modified method and the conventional Chi-Squares test.

Case 1: In this study, the solution time of the WLS estimation is compared with the CPU times required for the proposed bad data detection approach and the conventional one. 1500 Monte Carlo simulations are carried out and the mean value of the results is reported. In these simulations, random Gaussian errors are added to the measurement set, and one randomly selected measurement is intentionally corrupted to emulate bad data by changing its sign. Table I shows the CPU times for the WLS state estimation solution as well as for the modified and conventional Chi-Squares tests. The increase in computation time when using the proposed modified test is expected and is primarily caused by the computation of the residual covariance matrix Ω.
TABLE I. MEAN COMPUTATION TIME (MILLISECONDS)

WLS Estimation | Proposed Modified Chi-Squares | Conventional Chi-Squares
               |              3.4              |            0.1

Case 2: The bad data detection performance of the proposed approach is compared to that of the conventional method. Four different single bad data scenarios are studied. Each scenario is repeated 1500 times, each time introducing a randomly selected bad measurement. In these four cases, a certain amount of error, proportional to the standard deviation σ of the considered measurement, is added to the original measurement in order to emulate a bad measurement; the amount of error introduced in each case is listed below, and a minimal sketch of this corruption scheme is given after the list. In order to make the simulations realistic, Gaussian errors are also added to all measurements.

Case 2.a: No bad measurement.
Case 2.b: 3σ.
Case 2.c: 40σ.
Case 2.d: 100σ.
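The snippet below sketches how such a bad measurement could be emulated in a single Monte Carlo run (variable names, sizes and standard deviations are illustrative assumptions, not the authors' simulation code): Gaussian noise is added to every measurement and one randomly selected measurement receives an additional gross error of k standard deviations.

```python
import numpy as np

def emulate_bad_measurement(z_true, sigma, k, rng):
    """Add Gaussian noise to all measurements; if k > 0, also add a gross
    error of k*sigma to one randomly selected measurement."""
    z = z_true + rng.normal(0.0, sigma)      # realistic noise on every measurement
    bad_idx = None
    if k > 0:
        bad_idx = int(rng.integers(z.size))  # measurement chosen to carry the gross error
        z[bad_idx] += k * sigma[bad_idx]     # bad data proportional to that channel's sigma
    return z, bad_idx

rng = np.random.default_rng(2)
sigma = np.full(362, 0.01)                   # assumed uniform measurement standard deviations
z_true = rng.normal(size=362)                # placeholder "true" measurement values
z_meas, bad_idx = emulate_bad_measurement(z_true, sigma, k=40, rng=rng)   # Case 2.c
```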


Table II shows the bad data detection performance of the proposed method and the conventional approach. The values given in Table II are percentages, which indicate the bad data detection probability of the proposed and the conventional methods. As evident in Table II, both methods give correct results for very large and very small error values. However, for intermediate error values such as in Case 2.c, which can still significantly bias the estimation results, the proposed approach detects bad data that is missed by the conventional Chi-Squares test.


TABLE II. BAD DATA DETECTION PERFORMANCE

Case | Proposed Modified Chi-Squares (%) | Conventional Chi-Squares (%) | Bad Data Present
2.a  |                 0                 |               0              |        No
2.b  |                 0                 |               0              |        No
2.c  |                100                |             68.9             |        Yes
2.d  |                100                |              100             |        Yes


According to Table II, the estimation results of Case 2.b are unbiased, while the estimation results of Case 2.c are biased. Fig. 2.a presents the difference between the true states and the estimation results of one randomly selected Monte Carlo run for Case 2.b. Similarly, Fig. 2.b presents the difference between the true states and the estimation results of the same randomly selected Monte Carlo run for Case 2.c, such that both figures consider the same measurement but with different errors. As seen in Fig. 2.b, although the estimation results are biased, the conventional method was not capable of identifying the presence of the gross error. On the other hand, the proposed metric successfully detected the presence of the bad measurement.

Fig. 2. Mismatch between estimated and true states: (a) Case 2.b, (b) Case 2.c.

Finally, it is quite informative to take a look at the covariance values for the errors and the residuals. Fig. 3 presents the variation of the Ω_ii and R_ii values. As seen in Fig. 3, compared to the constant R_ii values, the Ω_ii values are in general much smaller. Therefore, the proposed bad data suspicion threshold will always be smaller than that of the conventional Chi-Squares test.
Fig. 3. Variation of Ω_ii and R_ii values.

V. CONCLUSIONS

In this paper, a modified Chi-Squares test is proposed to improve the bad data detection accuracy of the WLS method used in state estimation. As seen in the simulations, the proposed metric performs better than the conventional test in detecting the presence of bad data in a given measurement set. Although the proposed test is successful in detecting bad data, identification and removal of the bad measurements still have to be carried out by methods such as the normalized residuals test [8].

Most commercial programs use the Chi-Squares test as a computationally cheap filter to decide whether or not to conduct an identification test. In that sense, this modification may serve a useful purpose by increasing the reliability of this initial filter so that bad data will not be missed.

REFERENCES
[1] A. Abur and A. Gomez-Exposito, Power System State Estimation: Theory and Implementation. Marcel Dekker, 2004.
[2] A. C. Aitken, "On Least Squares and Linear Combinations of Observations," Proc. Royal Society of Edinburgh, vol. 35, pp. 42-48, 1935.
[3] E. Handschin, F. C. Schweppe, J. Kohlas, and A. Fiechter, "Bad data analysis for power systems state estimation," IEEE Trans. Power App. Syst., vol. 94, pp. 329-337, Mar./Apr. 1975.
[4] A. Monticelli, "Electric Power System State Estimation," Proceedings of the IEEE, vol. 88, no. 2, February 2000.
[5] K. Takahashi, J. Fagan, and M. Chen, "Formation of a Sparse Bus Impedance Matrix and Its Application to Short Circuit Study," PICA Proceedings, May 1973, pp. 63-69.
[6] Y. E. Campbell and T. A. Davis, "Computing the Sparse Inverse Subset: An Inverse Multi-frontal Approach," University of Florida, Technical Report TR-95-021.
[7] B. Bilir and A. Abur, "Bad Data Processing When Using the Coupled Measurement Model and Takahashi's Sparse Inverse Method," IEEE Innovative Smart Grid Technologies Conference - Europe, Istanbul, Turkey, 12-15 Oct. 2014.
[8] A. Monticelli and A. Garcia, "Reliable Bad Data Processing for Real-Time State Estimation," IEEE Trans. Power Apparatus and Systems, vol. PAS-102, no. 5, pp. 1126-1139, May 1983.
