To determine the effect of the nonresponse rate on the results of each imputation method
(IM), the different IMs were evaluated, and the results of each IM are discussed
independently. For each IM, the discussion of results proceeds as follows: (1) bias of the
mean of the imputed data, (2) distribution of the imputed data using the
Kolmogorov-Smirnov Goodness of Fit Test, and (3) other measures of variability using the
mean deviation (MD), mean absolute deviation (MAD), and root mean square deviation (RMSD).
The table of results contains the following columns: (a) variable of interest (VI), (b)
nonresponse rate (NRR), (c) the bias of the mean of the imputed data, Bias(y'), (d)
percentage of correct distribution (PCD) of the imputed data to the actual data set out of
1000 simulated data sets, (e) MD, (f) MAD, and (g) RMSD.
Table 8 shows the results of the different criteria in evaluating the imputed data using the
OMI method.
(a)      (b)   (c)          (d)       (e)          (f)          (g)
VI       NRR   BIAS(y')     PCD       MD           MAD          RMSD
TOTEX2   10%     640.66     0.00%     -6406.60     56929.61     108547.82
         20%     499.43     0.00%     -2497.14     59555.36     119193.32
         30%    -222.76     0.00%     20310.91     90396.26     271775.35
TOTIN2   10%    -597.84     0.00%      5978.39     77502.27     167206.24
         20%   -2855.49     0.00%     14277.43     87469.87     244758.00
         30%   -6093.27     0.00%       742.53     62388.11     151740.94
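The formulas behind the bias, MD, MAD and RMSD columns are not shown in this section. The sketch below assumes the common definitions, with differences taken as imputed minus true over the imputed cases only; the function name, the argument names and the sign convention are illustrative assumptions, not the authors' code.

```python
from math import sqrt

def evaluation_criteria(true_values, imputed_values, n_total):
    """Compare imputed values with the true (deleted) values they replace.

    n_total is the size of the full data set (respondents + nonrespondents).
    Assumed sign convention: difference = imputed - true.
    """
    diffs = [imp - true for imp, true in zip(imputed_values, true_values)]
    m = len(diffs)
    return {
        # Respondent values are identical in both data sets, so only the
        # imputed cases contribute to the difference between the two means.
        "bias": sum(diffs) / n_total,
        "MD": sum(diffs) / m,
        "MAD": sum(abs(d) for d in diffs) / m,
        "RMSD": sqrt(sum(d * d for d in diffs) / m),
    }

# Hypothetical toy data: 3 imputed cases in a data set of 30 (10% NRR).
true_vals = [100.0, 200.0, 300.0]
imputed_vals = [110.0, 190.0, 330.0]
result = evaluation_criteria(true_vals, imputed_vals, n_total=30)
```

Under these assumed definitions the bias equals the MD multiplied by the nonresponse rate, which is why positive and negative errors can cancel in both measures while MAD and RMSD stay large.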
Generated by Foxit PDF Creator © Foxit Software
http://www.foxitsoftware.com For evaluation only.
Column (c) of Table 8 shows that, as the NRR increases, the bias of the mean of the
imputed data for TOTEX2 slowly decreases in magnitude. The
behind the decrease of the bias of the mean of the imputed data. As the magnitude
(i.e. the mean of TOTEX1, the total expenditure of the first visit data, which is
equal to 105566.9) that is higher than the mean of the actual data set also
decreases.
On the other hand, the results for TOTIN2 are the opposite of those for TOTEX2: the bias
of the mean of the imputed data for TOTIN2 rapidly increases in magnitude as NRR
increases. The rationale for this is the decrease in
TOTEX2, the imputed values (i.e. the mean of TOTIN1, the total income for the
first visit data, which is equal to 121820.7) are much lower than the actual mean
Results in column (d) of Table 8 show that, for all NRRs and VIs, the OMI
method failed to maintain the distribution of the actual data. This was expected,
primarily because each missing observation of a VI was replaced by a single
value, namely the overall mean of the corresponding first-visit variable.
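A minimal sketch of the overall mean imputation described above, assuming missing second-visit values are marked with None and that the fill value is the plain mean of the first-visit variable. The function and variable names are hypothetical.

```python
def overall_mean_impute(second_visit, first_visit):
    """Replace missing second-visit values (None) with the first-visit mean."""
    fill = sum(first_visit) / len(first_visit)
    return [fill if y is None else y for y in second_visit]

first_visit = [90.0, 100.0, 110.0, 120.0]   # hypothetical first-visit values
second_visit = [95.0, None, 130.0, None]    # None marks a nonrespondent
completed = overall_mean_impute(second_visit, first_visit)

# Every imputed case receives the same value, so the imputed values have
# no spread at all.
distinct_imputed = {completed[i] for i, y in enumerate(second_visit) if y is None}
```

Because all imputed cases share one value, they pile up in a single frequency class, which is consistent with the 0.00% in the PCD column for OMI.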
Related studies that performed OMI reported that this method is one of
the worst among all IMs since it distorts the distribution of the data. The
distribution of the data becomes too peaked, which makes this method unsuitable
The three criteria in Table 8 under columns (e), (f) and (g) show the other
measures of variability of the imputed data. The values of the MAD and RMSD
increase in magnitude as NRR increases for TOTEX2. The data which have
the highest percentage of imputed values have the highest values for the three
magnitude is seen in all three criteria from the twenty to thirty percent NRR
for TOTEX2.
For TOTIN2, the data which have twenty percent imputed observations have the
highest values in all the three measures of variability. Unlike for TOTEX2,
surprisingly, values from the three measures of variability under the highest NRR
Table 9 shows the results of the different criteria in evaluating imputed data using the hot
deck imputation method with three imputation classes (HDI3).
(a)      (b)   (c)          (d)       (e)          (f)          (g)
VI       NRR   BIAS(y')     PCD       MD           MAD          RMSD
TOTEX2   10%     491.91   100.00%      4919.40     78071.61      79251.22
         20%     179.42    96.90%       897.18     78292.63      67149.16
         30%    -606.37     0.00%     -2021.19     81395.79      71390.65
TOTIN2   10%    -717.52   100.00%     -7175.25    105369.15     242022.99
         20%   -3095.41   100.00%    -15477.09    111748.04     297151.50
         30%   -6508.65     1.00%    -21695.52    115087.13     313814.92
Similar to the OMI results for the TOTIN2 variable, the bias of the mean of the
imputed data increases rapidly in magnitude as the NRR increases. For the
TOTEX2 variable, the biases fluctuate as the NRR increases. For both TOTEX2
and TOTIN2, the data with the highest NRR have the largest bias in magnitude.
For the TOTEX2 variable, the data with twenty percent NRR provided the least
bias. On the other hand, the data with the lowest NRR yielded the smallest bias
for TOTIN2.
Results in column (d) show that for TOTIN2, the data which contained ten and
twenty percent imputed observations maintained the distribution of the actual
data. For TOTEX2, only the data which contained ten percent imputed
observations maintained the distribution of the actual data for all the one
thousand data sets. In the data with twenty percent imputed observations, 969
out of the 1000 data sets maintained the distribution of the actual data.
For TOTEX2 and TOTIN2, the data with the highest number of imputed
observations fared worse: none of the simulated data sets for TOTEX2 registered
the same distribution as the actual. On the other hand, only a lone data set
for TOTIN2 maintained the same distribution as the actual. The researchers look
into the possibility that more than one recipient had the same donor.
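The donor-reuse possibility raised above can be checked directly in a simulation. The following is a rough sketch of a random hot deck within imputation classes, assuming donors are drawn with replacement from respondents in the same class; the source does not state the donor selection rule, so the function name, the class labels and the data are all hypothetical.

```python
import random
from collections import Counter

def hot_deck_impute(values, classes, rng):
    """Fill each None with a random respondent value from the same class.

    Donors are drawn with replacement (an assumption), so one donor can
    serve several recipients. Returns the completed list and a Counter of
    how many times each donor index was used. A class with no respondents
    raises KeyError.
    """
    donors = {}
    for i, (y, c) in enumerate(zip(values, classes)):
        if y is not None:
            donors.setdefault(c, []).append(i)
    completed, usage = list(values), Counter()
    for i, (y, c) in enumerate(zip(values, classes)):
        if y is None:
            d = rng.choice(donors[c])
            completed[i] = values[d]
            usage[d] += 1
    return completed, usage

values = [10.0, None, 12.0, None, 50.0, None, None, 55.0]
classes = ["low", "low", "low", "low", "high", "high", "high", "high"]
completed, usage = hot_deck_impute(values, classes, random.Random(0))

# Donor indices that served more than one recipient.
reused = [d for d, k in usage.items() if k > 1]
```

As the NRR grows, fewer donors serve more recipients, so repeated donors become more likely and the imputed values can cluster on a few observed values.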
The three criteria in Table 9 under columns (e), (f) and (g) show the other
measures of variability of the imputed data. For the variable TOTEX2, the
following results were obtained: (i) the data that contain twenty percent imputed
values yielded the smallest values for the MD and RMSD, (ii) the data with the
lowest number of imputations yielded the largest values for MD and RMSD,
and (iii) MAD is the only criterion for which the values increase as NRR
increases.
For the variable TOTIN2, the following results were obtained: (i) all three
criteria increase in magnitude as NRR increases, (ii) results for the three
criteria were larger than for TOTEX2, and (iii) the data with the largest number of
Table 10 shows the results of the different criteria in evaluating the imputed data using
the deterministic regression imputation method with three imputation classes (DRI3).
(a)      (b)   (c)          (d)       (e)          (f)          (g)
VI       NRR   BIAS(y')     PCD       MD           MAD          RMSD
TOTEX2   10%     536.32   100.00%      5363.47     33683.48      70553.64
         20%    1080.12    98.40%      5400.71     33782.60      72487.39
         30%     398.39   100.00%      1328.06     32449.49      72803.60
TOTIN2   10%     897.11   100.00%      9043.98     51363.17     106374.39
         20%   -1815.39   100.00%     -9076.98     57429.24     148278.49
         30%     356.50   100.00%      1188.31     51886.73     131429.61
Looking at Table 10, column (c), the bias of the VI is increasing in magnitude as
the NRR increases for TOTEX2 and TOTIN2. Compared to OMI and HDI3
where the bias increases tremendously as NRR increases, the increase in bias for
DRI3 is much slower. The bias of the data with twenty percent NRR is just twice
the bias of the data set with ten percent NRR. For TOTEX2, this method produces
larger bias for the mean of the imputed data in all NRR than the OMI and HDI3.
Contrary to the results of the OMI method under this criterion, results in column
(d) show that the imputed data maintained the distribution of the actual data in all
NRRs and VIs. It is even much better than HDI3 since all of the imputed data sets
under all the NRRs and VIs preserved the same distribution as the actual data. It is
interesting to note that the regression models that were used in this study did not
show the expected results that were mentioned in the related literature and
provided a distinct result. Earlier studies that made use of categorical auxiliary
variables, the matching variables that were transformed into dummy variables,
concluded that DRI is just the same as the mean imputation. However, in this
study, the independent variable was the first visit VIs and for each imputation
The three criteria in Table 10 under columns (e), (f) and (g) show the other
measures of variability of the imputed data. For these criteria, the following
results were obtained: First, results from the three criteria are almost stable as
NRR increases for TOTEX2 and TOTIN2. The rate of change of the values for
MD, MAD and RMSD is minimal compared to OMI and HDI3. Second, the
MAD and RMSD have smaller values than for OMI and HDI3 for TOTEX2 and
TOTIN2. Fitting models with high R² was the key factor that made this method
Table 11 shows the results of the different criteria in evaluating the imputed data using
the stochastic regression imputation method with three imputation classes (SRI3).
(a)      (b)   (c)          (d)       (e)          (f)          (g)
VI       NRR   BIAS(y')     PCD       MD           MAD          RMSD
TOTEX2   10%     536.32   100.00%      5363.47     33683.48      70553.64
         20%    1080.12    98.40%      5400.71     33782.60      72487.39
         30%     398.39   100.00%      1328.06     32449.49      72803.60
TOTIN2   10%     897.11   100.00%      9043.98     51363.17     106374.39
         20%   -1815.39   100.00%     -9076.98     57429.24     148278.49
         30%     356.50   100.00%      1188.31     51886.73     131429.61
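The difference between the deterministic and stochastic regression methods can be illustrated with a small sketch. It assumes a simple linear regression of the second-visit value on the first-visit value, fitted on respondents, and assumes the random residual is drawn from the respondents' observed residuals; the source does not say how the residual was generated. The imputation classes are omitted here; in a class-based version the same function would be applied separately within each class. All names and data are hypothetical.

```python
import random

def fit_line(x, y):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b

def regression_impute(first, second, rng=None):
    """Impute None entries of `second` using `first` as the predictor.

    rng=None gives deterministic imputation (predicted value only);
    passing a random.Random adds a residual drawn from the respondents.
    """
    resp = [(x, y) for x, y in zip(first, second) if y is not None]
    a, b = fit_line([x for x, _ in resp], [y for _, y in resp])
    residuals = [y - (a + b * x) for x, y in resp]
    out = []
    for x, y in zip(first, second):
        if y is None:
            y = a + b * x
            if rng is not None:
                y += rng.choice(residuals)
        out.append(y)
    return out

first = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]      # hypothetical first-visit values
second = [2.1, 3.9, 6.2, None, 9.8, None]   # None marks a nonrespondent
dri = regression_impute(first, second)
sri = regression_impute(first, second, random.Random(1))
```

The deterministic version places every imputed value exactly on the fitted line, while the stochastic version scatters imputed values around it.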
Looking at Table 11, column (c), for TOTEX2 and TOTIN2, this method yielded
much better results than DRI3. The biases for TOTEX2 and TOTIN2 do not
follow the pattern of the previous three methods, in which the bias increases as
the NRR increases. Instead, the biases fluctuate from one NRR to another.
Compared to the three methods previously evaluated, this method provided the least
bias in the highest NRR for both TOTEX2 and TOTIN2. While the other methods
reached a four-digit bias, SRI3 generated only a three-digit bias. Moreover, there
is a huge disparity in the third NRR where it only produced less than twenty
SRI3 performed better than HDI3, which also simulated the data
1000 times. Unlike HDI3, SRI3 maintained the same distribution for all
imputed data sets for the first and third nonresponse rates. SRI3 also
outperformed HDI3 for the twenty percent NRR. In earlier studies, stochastic
regression imputation performed better than any of the three other methods used here.
The random residual was added to the deterministic predicted value to preserve
The three criteria in Table 11 under columns (e), (f) and (g) show the other
measures of variability of the imputed data. For these criteria, the following results
were obtained: First, similar to the results in measuring the bias of the mean of the
imputed data, results in TOTIN2 for all the criteria fluctuate from one NRR to
increases while the MAD and MD fluctuate from one NRR to another. Third, the
data with the highest NRR yielded the lowest results for the MD criterion.
Fourth, for TOTIN2, the data with twenty percent NRR yielded the largest values
To provide additional information on the distribution of the imputed data that was
discussed previously, the distributions of the true (deleted) values (TVs) and the
imputed values (IVs) from each of the IMs for all the VIs and NRRs were
obtained. Tables 12, 13, and 14 show the frequency distributions of the methods
with their corresponding relative frequencies (RFs) for the first, second, and third
NRR respectively. The RFs for the 1000 simulated data sets from HDI3 and SRI3
were averaged. The first column represents the VIs' frequency classes (FCs). These
were the same classes that were used in the Kolmogorov-Smirnov Goodness of
imputed data. For each NRR, the table containing the distribution of the actual
and imputed values is laid out as follows: (a) VIs, (b) FCs, (c) RFs of the TVs (TV),
(d) RFs of the OMI (OMI), (e) RFs of the HDI3 (HDI3), (f) RFs of the DRI3
(DRI3), and (g) RFs of the SRI3 (SRI3).
10% NRR
(a)      (b)                    (c)       (d)       (e)       (f)       (g)
VI       FCs                    TV        OMI       HDI3*     DRI3      SRI3*
TOTEX2   <37869.5               10.90%     0.00%    13.90%     7.70%     9.50%
         37869.5 - 47056.5       9.70%     0.00%    10.20%     8.70%     8.70%
         47056.5 - 54922.0       9.70%     0.00%     9.70%    11.40%     6.10%
         54922.0 - 62365.0      11.40%     0.00%     8.90%    12.30%     9.50%
         63265.0 - 73868.0       8.70%     0.00%     9.10%    11.10%    11.40%
         73868.0 - 86103.0       9.70%     0.00%     9.40%    12.60%    11.10%
         86103.0 - 101947.0     10.90%     0.00%     9.40%     8.00%    11.10%
         101947.0 - 126254.5    11.10%   100.00%     8.90%    11.40%     8.50%
         126254.5 - 169964.0     9.00%     0.00%     8.90%     9.00%    12.20%
         >169964                 8.90%     0.00%    11.60%     7.70%    12.10%
TOTIN2   <40570                  9.70%     0.00%    15.10%     6.10%     9.10%
         40570.0 - 51564.0      10.20%     0.00%    11.90%     8.70%     7.90%
         51564.0 - 62006.5       9.40%     0.00%    10.10%    14.50%     8.30%
         62006.5 - 73900.5      10.20%     0.00%     9.50%    10.70%    10.00%
         73900.5 - 88127.0       9.00%     0.00%     9.60%    12.80%    12.40%
         88127.0 - 104801.0     10.90%     0.00%     9.30%     9.20%     9.00%
         104801.0 - 128000.0    11.90%   100.00%     9.80%     9.90%    10.50%
         128000.0 - 161669.0    11.40%     0.00%     7.80%    11.10%     9.30%
         161669.0 - 233907.0     7.70%     0.00%     8.00%    10.70%    11.20%
         >233907                 9.90%     0.00%     8.90%     6.30%    12.30%
* RF for each class was obtained by taking the average of the 1000 simulated data sets.
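The relative frequencies in a table like the one above, including the averaging over simulated data sets noted in the footnote, could be computed along these lines. This is only a sketch: the function names are hypothetical, and placing a value that falls exactly on a boundary into the upper class is an assumption.

```python
from bisect import bisect_right

# Upper class boundaries for TOTEX2 as printed in the table. The fourth
# boundary is printed as 62365.0 in one row and 63265.0 in the next; the
# first spelling is used here.
TOTEX2_EDGES = [37869.5, 47056.5, 54922.0, 62365.0, 73868.0,
                86103.0, 101947.0, 126254.5, 169964.0]

def relative_frequencies(values, edges):
    """Share of values falling in each of the len(edges) + 1 classes."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[bisect_right(edges, v)] += 1
    return [c / len(values) for c in counts]

def average_rf(simulations, edges):
    """Average the class-wise relative frequencies over simulated data sets."""
    rfs = [relative_frequencies(sim, edges) for sim in simulations]
    return [sum(col) / len(rfs) for col in zip(*rfs)]

# Under OMI every imputed value equals the TOTEX1 mean quoted in the text,
# so all imputed values land in one class (the count of 50 is arbitrary).
omi_rf = relative_frequencies([105566.9] * 50, TOTEX2_EDGES)
```

With these boundaries the OMI fill value falls in the class 101947.0 - 126254.5, matching the single 100.00% entry in the OMI column for TOTEX2.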
20% NRR
(a)      (b)                    (c)       (d)       (e)       (f)       (g)
VI       FCs                    TV        OMI       HDI3*     DRI3      SRI3*
TOTEX2   <37869.5                9.40%     0.00%    14.30%     7.40%     8.20%
         37869.5 - 47056.5       9.70%     0.00%    10.40%     9.60%     7.60%
         47056.5 - 54922.0      11.60%     0.00%     9.70%     9.00%     8.20%
         54922.0 - 62365.0      10.00%     0.00%     9.00%    11.00%     7.90%
         63265.0 - 73868.0       9.60%     0.00%     9.20%    12.30%    10.30%
         73868.0 - 86103.0       8.40%     0.00%     9.40%    12.50%    11.90%
         86103.0 - 101947.0      9.60%     0.00%     9.30%     9.90%    10.30%
         101947.0 - 126254.5    11.30%   100.00%     8.70%    10.80%    11.80%
         126254.5 - 169964.0     9.70%     0.00%     8.70%     8.80%    11.70%
         >169964                10.70%     0.00%    11.30%     8.70%    12.10%
TOTIN2   <40570                 10.00%     0.00%    15.70%     4.80%    11.80%
         40570.0 - 51564.0      10.30%     0.00%    12.10%    11.90%    12.20%
         51564.0 - 62006.5      11.70%     0.00%    10.10%    10.20%    11.30%
         62006.5 - 73900.5      10.20%     0.00%     9.60%    11.70%     9.90%
         73900.5 - 88127.0       8.60%     0.00%     9.50%    11.90%     8.50%
         88127.0 - 104801.0      9.40%     0.00%     9.30%     9.60%    10.10%
         104801.0 - 128000.0     9.10%   100.00%     9.70%    11.70%     9.00%
         128000.0 - 161669.0     9.20%     0.00%     7.60%     9.80%     8.30%
         161669.0 - 233907.0    11.30%     0.00%     7.80%     9.70%     8.90%
         >233907                10.20%     0.00%     8.70%     8.70%    10.10%
* RF for each class was obtained by taking the average of the 1000 simulated data sets.
30% NRR
(a)      (b)                    (c)       (d)       (e)       (f)       (g)
VI       FCs                    TV        OMI       HDI3*     DRI3      SRI3*
TOTEX2   <37869.5                9.80%     0.00%    14.30%     7.80%    10.30%
         37869.5 - 47056.5       8.80%     0.00%    10.40%     9.00%     9.60%
         47056.5 - 54922.0       9.60%     0.00%     9.70%     9.40%     8.30%
         54922.0 - 62365.0       9.50%     0.00%     8.90%    10.80%     9.30%
         63265.0 - 73868.0      11.00%     0.00%     9.20%    12.70%    10.10%
         73868.0 - 86103.0      10.70%     0.00%     9.40%    11.50%    10.60%
         86103.0 - 101947.0     10.70%     0.00%     9.40%    12.10%     9.80%
         101947.0 - 126254.5     9.40%   100.00%     8.70%     8.80%    10.10%
         126254.5 - 169964.0    11.00%     0.00%     8.70%     9.00%     8.10%
         >169964                 9.50%     0.00%    11.30%     9.00%    13.70%
TOTIN2   <40570                  9.40%     0.00%    15.60%     6.50%     8.90%
         40570.0 - 51564.0       9.00%     0.00%    12.10%    10.40%     8.20%
         51564.0 - 62006.5       9.90%     0.00%    10.10%    10.80%     8.80%
         62006.5 - 73900.5      10.70%     0.00%     9.60%    11.50%    10.10%
         73900.5 - 88127.0      10.20%     0.00%     9.50%    12.20%    11.00%
         88127.0 - 104801.0     10.30%     0.00%     9.30%    10.70%    10.20%
         104801.0 - 128000.0    10.30%   100.00%     9.70%    10.50%    10.40%
         128000.0 - 161669.0     9.80%     0.00%     7.60%    11.20%    10.80%
         161669.0 - 233907.0    10.70%     0.00%     7.70%     8.20%    10.30%
         >233907                 9.90%     0.00%     8.70%     8.00%    11.30%
* RF for each class was obtained by taking the average of the 1000 simulated data sets.
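A Kolmogorov-Smirnov-type comparison on grouped data like the table above reduces to the largest gap between two cumulative class distributions. The sketch below applies that statistic to the true-value and OMI columns for TOTEX2 at 30% NRR. It compares only the deleted and imputed values, which is not necessarily how the PCD column was computed, and the critical value is left as a caller-supplied, hypothetical input because the significance level is not given in this section.

```python
from itertools import accumulate

def ks_statistic(rf_a, rf_b):
    """Largest absolute gap between two cumulative class distributions."""
    cum_a = accumulate(rf_a)
    cum_b = accumulate(rf_b)
    return max(abs(a - b) for a, b in zip(cum_a, cum_b))

def percent_correct(statistics, critical_value):
    """Percentage of simulated data sets whose statistic is below the cutoff."""
    ok = sum(1 for d in statistics if d < critical_value)
    return 100.0 * ok / len(statistics)

# TOTEX2 at 30% NRR: relative frequencies of the true values and of OMI.
tv = [0.098, 0.088, 0.096, 0.095, 0.110, 0.107, 0.107, 0.094, 0.110, 0.095]
omi = [0.0] * 7 + [1.0] + [0.0] * 2
d_omi = ks_statistic(tv, omi)   # about 0.70: the gap just below the OMI class
```

The gap is largest just before the class holding the OMI fill value, where about 70% of the true values have accumulated but none of the imputed ones.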
For all NRRs, the results clearly illustrate the distortion of the distribution under OMI.
Since the OMI method assigns the mean of the first-visit VI to all the missing cases, all
the imputed values are concentrated in one particular frequency class. The three other
methods, which implemented imputation classes, gave a better outcome than OMI by spreading the
For the HDI method, at all nonresponse rates, most of the imputed observations clustered
in the first frequency class, that is, less than 37869.5 for TOTEX2 and 40570.0 for
TOTIN2. Clustering also formed in the last frequency class for TOTEX2 at the first and
third nonresponse rates, and in the second frequency class for TOTIN2 at all nonresponse
rates. The percentage of the data in the lowest class for TOTEX2 and TOTIN2, across all
nonresponse rates, ranges from 14-16% as compared to the actual percentage which
While there is an overrepresentation of the data for HDI3, an underrepresentation was
observed in the interval 86103-126254.5 for the 10% and 20% nonresponse imputed
data sets and in the interval 63265-101947 for the 30% nonresponse
imputed data sets. The percentage in the indicated interval for the 10% and 20% NRR under
the actual data totaled about 30% while the imputed data totaled less than 30%.
The two regression imputation methods, unlike hot deck and OMI which had major
clusters, produced a more spread-out distribution, although there are some areas that are under
resulted in a severe underrepresentation of the data, in particular in the first frequency
class. On the other hand, the SRI, which considered a random residual, provided better
results than DRI. However, there are some areas that the added random produced
In this section, the rankings from all the tests are the basis for determining which of
the IMs is chosen as the best IM for this particular study and data. The selection of the
best method is made independently for each VI and NRR. The rankings are
based on a four-point system wherein a rank value of 4 denotes the worst IM for that
specific criterion and 1 denotes the best IM for that criterion. In case of ties, the average
ranks are substituted. The IM with the smallest rank total is declared the best IM
for the particular VI and NRR. The ranking of IMs covers the following criteria: (a)
bias of the mean of the imputed data (N.B.), (b) percentage of correct distributions
(PCD), and (c) other measures of variability, namely, MD, MAD and RMSD. All in all,
Tables 15, 16 and 17 show the ranking of the different imputation methods for the 10%,
20% and 30% NRR respectively. For each NRR, the table containing the rankings of the
IMs will go as follows: (a) VIs, (b) Criteria, (c) OMI, (d) HDI3, (e) DRI3, and (f) SRI3.
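The ranking scheme described above (rank 1 for the best method on each criterion, average ranks for ties, smallest rank total wins) can be sketched as follows. The scores are hypothetical and are expressed so that smaller is better; the function names are illustrative. The tie rule used here is the standard midrank, under which a three-way tie for first place gives rank 2 to each method, so it may not match every tied entry in the ranking tables, some of which show 1.3.

```python
def average_ranks(scores):
    """Rank scores from 1 (smallest) upward, averaging the ranks of ties."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        shared = (pos + 1 + end + 1) / 2   # mean of positions pos+1 .. end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks

def rank_totals(criteria):
    """Sum the per-criterion ranks for each method; smallest total is best."""
    per_criterion = [average_ranks(scores) for scores in criteria.values()]
    return [sum(col) for col in zip(*per_criterion)]

methods = ["OMI", "HDI3", "DRI3", "SRI3"]
# Hypothetical scores where smaller is better (e.g. |bias| and 100 - PCD).
criteria = {
    "abs_bias": [6.0, 4.0, 9.0, 5.0],
    "100_minus_pcd": [100.0, 3.0, 0.0, 0.0],
}
totals = rank_totals(criteria)
best = methods[totals.index(min(totals))]
```

In this toy case the two methods tied on the second criterion each receive rank 1.5, and the method with the smallest total is reported as best.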
10% NRR
VI       CRITERIA        OMI    HDI3   DRI3   SRI3
TOTEX2   N.B.            3      1      4      2
         PCD             4      1.3    1.3    1.3
         MD              3      1      4      2
         MAD             3      4      1      2
         RMSD            4      3      1      2
         TOTAL           17     10.3   11.3   9.3
         Category Rank   4th    2nd    3rd    1st
TOTIN2   N.B.            1      2      4      3
         PCD             4      1.3    1.3    1.3
         MD              1      2      4      3
         MAD             3      4      1      2
         RMSD            3      4      1      2
         TOTAL           12     13.3   11.3   11.3
         Category Rank   3rd    4th    1st    1st
20% NRR
VI       CRITERIA        OMI    HDI3   DRI3   SRI3
TOTEX2   N.B.            2      1      4      3
         PCD             4      3      1      2
         MD              2      1      4      3
         MAD             3      4      1      2
         RMSD            4      2      1      3
         TOTAL           15     11     11     13
         Category Rank   4th    1st    1st    3rd
TOTIN2   N.B.            3      4      2      1
         PCD             4      1.3    1.3    1.3
         MD              3      4      2      1
         MAD             3      4      1      2
         RMSD            3      4      1      2
         TOTAL           16     17.3   7.3    7.3
         Category Rank   3rd    4th    1st    1st
30% NRR
VI       CRITERIA        OMI    HDI3   DRI3   SRI3
TOTEX2   N.B.            1      3      4      2
         PCD             4      3      1.5    1.5
         MD              1      3      4      2
         MAD             3      4      1      2
         RMSD            4      2      1      3
         TOTAL           13     15     11.5   10.5
         Category Rank   3rd    4th    2nd    1st
TOTIN2   N.B.            3      4      2      1
         PCD             4      3      1.5    1.5
         MD              3      4      2      1
         MAD             3      4      1      2
         RMSD            3      4      1      2
         TOTAL           16     19     7.5    7.5
         Category Rank   3rd    4th    1st    1st
The rankings show that the two regression IMs provided better results than their model-free
counterparts. For all the nonresponse rates under the TOTIN2 variable, the two regression
imputation methods tied as the best IM and, surprisingly, the HDI finished as the worst IM,
behind OMI. Under the TOTEX2 variable, mixed rankings were seen across the nonresponse
rates, although the regression methods still provided good results. The SRI method finished
first in the 10% and 30% NRR and ranked third in the 20% NRR, while the DRI method
finished third, first and second in the 10%, 20% and 30% NRR respectively. While the
HDI was seen as the worst IM for TOTIN2, the OMI was concluded to be the worst IM for
TOTEX2, ranking last for both the 10% and 20% NRR and third for the 30% NRR.
In conclusion, the best imputation method for this study, using the 1997 FIES data, is
SRI3. It is very closely followed by the DRI3 method. In no case did the SRI3 method rank
last in any criterion, NRR or VI, unlike DRI3, which was the worst IM in the bias of the
mean of the imputed data and MD criteria. The
researchers selected the HDI3 as the worst IM in this study. The HDI3 method fared the
worst in most of the criteria, in particular the other measures of variability in the 20%