Evaluation of Different Imputation Methods

5.4 Evaluation of Different Imputation Methods
To determine the effect of the nonresponse rate on the results of each imputation method
(IM), the different IMs were evaluated. The results of each IM are discussed
independently, in the following order: (1) bias of the mean of the imputed data, (2)
distribution of the imputed data, assessed with the Kolmogorov-Smirnov Goodness of Fit
Test, and (3) other measures of variability, namely the mean deviation (MD), mean
absolute deviation (MAD), and root mean square deviation (RMSD).
Each table of results contains the following columns: (a) variable of interest (VI), (b)
nonresponse rate (NRR), (c) the bias of the mean of the imputed data, Bias(y'), (d)
percentage of the 1000 trials in which the imputed data kept the distribution of the
actual data set (PCD), (e) MD, (f) MAD, and (g) RMSD.
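As a minimal sketch of criteria (c), (e), (f), and (g), the snippet below assumes the conventional definitions, since this section does not state the formulas: the bias is the mean of the imputed data set minus the mean of the actual data set, and the MD, MAD, and RMSD are the mean, mean absolute, and root mean square of the deviations between imputed and true values over the imputed cases only. The function name and the toy numbers are hypothetical.

```python
import math

def evaluation_criteria(actual, imputed, missing_idx):
    """Return bias, MD, MAD and RMSD under assumed textbook definitions."""
    n = len(actual)
    # Bias of the mean: mean of the imputed data set minus the actual mean.
    bias = sum(imputed) / n - sum(actual) / n
    # Deviations are taken over the imputed (originally missing) cases only.
    devs = [imputed[i] - actual[i] for i in missing_idx]
    m = len(devs)
    md = sum(devs) / m
    mad = sum(abs(d) for d in devs) / m
    rmsd = math.sqrt(sum(d * d for d in devs) / m)
    return {"bias": bias, "MD": md, "MAD": mad, "RMSD": rmsd}

# Toy data: five records, the last two were deleted and then imputed.
actual = [10.0, 20.0, 30.0, 40.0, 50.0]
imputed = [10.0, 20.0, 30.0, 44.0, 48.0]
result = evaluation_criteria(actual, imputed, [3, 4])
```

Under these assumed definitions the bias equals the MD multiplied by the share of imputed cases, which agrees in magnitude with the reported figures (for example, 640.66 against 6406.60 at the 10% NRR in Table 8).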
5.4.1 Overall Mean Imputation
Table 8 shows the results of the different criteria in evaluating the imputed data using the
OMI method.
Table 8: Criteria results for the OMI method
(a)      (b)          (c)       (d)         (e)         (f)         (g)
VI       NRR     Bias(y')       PCD          MD         MAD        RMSD
TOTEX2   10%       640.66     0.00%    -6406.60    56929.61   108547.82
         20%       499.43     0.00%    -2497.14    59555.36   119193.32
         30%      -222.76     0.00%    20310.91    90396.26   271775.35
TOTIN2   10%      -597.84     0.00%     5978.39    77502.27   167206.24
         20%     -2855.49     0.00%    14277.43    87469.87   244758.00
         30%     -6093.27     0.00%      742.53    62388.11   151740.94
1. Bias of the mean of the imputed data
Column (c) of Table 8 shows that the bias of the mean of the imputed data for TOTEX2
slowly decreases in magnitude as the NRR increases. The rationale is that the
respondents' mean decreases in magnitude as the NRR increases. As the respondents'
mean decreases, so does the variability caused by imputing a single value (the mean
of TOTEX1, the total expenditure from the first visit data, equal to 105566.9) that
is higher than the mean of the actual data set.
For TOTIN2, the results are the opposite: the bias of the mean of the imputed data
rapidly increases in magnitude as the NRR increases. The rationale is again the
decrease in magnitude of the respondents' mean as the NRR increases. However, unlike
in TOTEX2, the imputed value (the mean of TOTIN1, the total income from the first
visit data, equal to 121820.7) is much lower than the actual mean of the data set.
2. Distribution of the Imputed Data
Column (d) of Table 8 shows that, for every NRR and VI, the OMI method failed to
maintain the distribution of the actual data. This was expected, primarily because
every missing observation of a VI was replaced by a single value, the overall mean
of the first visit VI. Related studies that performed OMI describe it as one of the
worst IMs because it distorts the distribution of the data: the distribution becomes
too peaked, which makes the method unsuitable for many post-analyses (Cheng & Sy,
1999).
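The peaking effect can be illustrated with a short sketch. It bins imputed values into the TOTEX2 frequency classes whose boundaries are listed in Table 12 and uses the first-visit mean quoted above (105566.9) as the single imputed value. The helper name and the count of 50 missing cases are hypothetical choices for illustration.

```python
from bisect import bisect_right

# Upper boundaries of the first nine TOTEX2 frequency classes (Table 12);
# the tenth class is open-ended (>169964).
EDGES = [37869.5, 47056.5, 54922.0, 62365.0, 73868.0,
         86103.0, 101947.0, 126254.5, 169964.0]

def relative_frequencies(values, edges):
    """Share of values falling in each class defined by the sorted edges."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[bisect_right(edges, v)] += 1
    return [c / len(values) for c in counts]

OMI_VALUE = 105566.9          # mean of TOTEX1 quoted in the text
imputed = [OMI_VALUE] * 50    # 50 missing cases is an arbitrary choice
rf = relative_frequencies(imputed, EDGES)
# Every imputed value lands in the eighth class (101947.0 - 126254.5).
```

Because all imputed values are identical, one class receives a relative frequency of 1 and the other nine receive 0, whatever the number of missing cases.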
3. Other measures of Variability
Columns (e), (f), and (g) of Table 8 show the other measures of variability of the
imputed data. For TOTEX2, the MAD and RMSD increase in magnitude as the NRR
increases, and the data with the highest percentage of imputed values have the
highest values for all three measures. It is worth noting that all three criteria
increase sharply in magnitude from the twenty to the thirty percent NRR for TOTEX2.
For TOTIN2, the data with twenty percent imputed observations have the highest
values in all three measures of variability. Surprisingly, and unlike for TOTEX2,
the data under the highest NRR have the lowest values in all three measures.
5.4.2 Hot Deck Imputation
Table 9 shows the results of the different criteria in evaluating imputed data using the hot
deck imputation (HDI3) method with three imputation classes.

Table 9: Criteria results for the HDI3 method
(a)      (b)          (c)       (d)         (e)         (f)         (g)
VI       NRR     Bias(y')       PCD          MD         MAD        RMSD
TOTEX2   10%       491.91   100.00%     4919.40    78071.61    79251.22
         20%       179.42    96.90%      897.18    78292.63    67149.16
         30%      -606.37     0.00%    -2021.19    81395.79    71390.65
TOTIN2   10%      -717.52   100.00%    -7175.25   105369.15   242022.99
         20%     -3095.41   100.00%   -15477.09   111748.04   297151.50
         30%     -6508.65     1.00%   -21695.52   115087.13   313814.92
1. Bias of the mean of the imputed data
As in the OMI results for TOTIN2, the bias of the mean of the imputed data for
TOTIN2 increases rapidly as the NRR increases. For TOTEX2, the bias fluctuates as
the NRR increases. For both TOTEX2 and TOTIN2, the data with the highest NRR have
the largest bias. The data with twenty percent NRR gave the smallest bias for
TOTEX2, whereas the data with the lowest NRR gave the smallest bias for TOTIN2.
2. Distribution of the Imputed Data
Column (d) of Table 9 shows that for TOTIN2, the data sets with ten and twenty
percent imputed observations maintained the distribution of the actual data. For
TOTEX2, only the data with ten percent imputed observations maintained the
distribution of the actual data in all one thousand data sets. With twenty percent
imputed observations, 969 of the 1000 data sets maintained the distribution of the
actual data set.
For both TOTEX2 and TOTIN2, the data with the highest number of imputed
observations failed to maintain the distribution of the actual data. Worse, none of
the simulated data sets for TOTEX2 had the same distribution as the actual data,
and for TOTIN2 only a lone data set did. The researchers are looking into the
possibility that more than one recipient shares the same donor.
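The donor-sharing possibility raised above can be sketched as follows. This is an illustration only: it assumes a random hot deck in which each missing value takes the value of a randomly chosen respondent from the same imputation class, drawn with replacement. The source does not state how donors were selected, and the function name, class labels, and values are hypothetical.

```python
import random
from collections import Counter

def hot_deck_impute(values, classes, rng):
    """Fill None entries with a random respondent value from the same class.

    Assumes every class has at least one respondent (donor).
    """
    donors = {}
    for i, (v, c) in enumerate(zip(values, classes)):
        if v is not None:
            donors.setdefault(c, []).append(i)
    imputed = list(values)
    donor_use = Counter()  # how many recipients each donor index served
    for i, (v, c) in enumerate(zip(values, classes)):
        if v is None:
            d = rng.choice(donors[c])  # with replacement: donors can repeat
            imputed[i] = values[d]
            donor_use[d] += 1
    return imputed, donor_use

values = [52000.0, None, 61000.0, None, None, 98000.0, 120000.0, None]
classes = [1, 1, 1, 1, 2, 2, 2, 2]
imputed, donor_use = hot_deck_impute(values, classes, random.Random(0))
```

Inspecting `donor_use` shows whether any donor served several recipients; as the NRR grows, the donor pool shrinks while the number of recipients rises, so repeated donors become more likely under this assumed scheme.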
3. Other measures of variability
Columns (e), (f), and (g) of Table 9 show the other measures of variability of the
imputed data. For TOTEX2, the following results were obtained: (i) the data with
twenty percent imputed values yielded the smallest values for the MD and RMSD, (ii)
the data with the fewest imputations yielded the largest values for the MD and
RMSD, and (iii) the MAD is the only criterion whose values increase as the NRR
increases.
For TOTIN2, the following results were obtained: (i) all three criteria increase in
magnitude as the NRR increases, (ii) the values of the three criteria are larger
than those for TOTEX2, and (iii) the data with the most imputations generated the
highest values in all three criteria.

5.4.3 Deterministic Regression Imputation
Table 10 shows the results of the different criteria in evaluating the imputed data using
the deterministic regression imputation method with three imputation classes (DRI3).
Table 10: Criteria results for the DRI3 method
(a)      (b)          (c)       (d)         (e)         (f)         (g)
VI       NRR     Bias(y')       PCD          MD         MAD        RMSD
TOTEX2   10%       536.32   100.00%     5363.47    33683.48    70553.64
         20%      1080.12    98.40%     5400.71    33782.60    72487.39
         30%       398.39   100.00%     1328.06    32449.49    72803.60
TOTIN2   10%       897.11   100.00%     9043.98    51363.17   106374.39
         20%     -1815.39   100.00%    -9076.98    57429.24   148278.49
         30%       356.50   100.00%     1188.31    51886.73   131429.61
1. Bias of the mean of the imputed data
In column (c) of Table 10, the bias increases in magnitude as the NRR increases for
both TOTEX2 and TOTIN2. Compared with OMI and HDI3, where the bias increases
tremendously as the NRR increases, the increase in bias for DRI3 is much slower:
the bias of the data with twenty percent NRR is just twice the bias of the data
with ten percent NRR. For TOTEX2, this method produces a larger bias of the mean of
the imputed data than OMI and HDI3 at every NRR.
2. Distribution of the Imputed Data
Contrary to the OMI results under this criterion, column (d) shows that the imputed
data maintained the distribution of the actual data for all NRRs and VIs. DRI3 also
did better than HDI3, since all of the imputed data sets under all NRRs and VIs
preserved the distribution of the actual data. It is interesting to note that the
regression models used in this study did not behave as reported in the related
literature. Earlier studies that used categorical auxiliary variables (matching
variables transformed into dummy variables) concluded that DRI is equivalent to
mean imputation. In this study, however, the independent variable was the first
visit VI, and the model fitted for each imputation class registered a good R².
3. Other measures of variability
Columns (e), (f), and (g) of Table 10 show the other measures of variability of the
imputed data. For these criteria, the following results were obtained. First, the
values of the three criteria are almost stable as the NRR increases for both TOTEX2
and TOTIN2; the rate of change of the MD, MAD, and RMSD is minimal compared with
OMI and HDI3. Second, the MAD and RMSD are smaller than those of OMI and HDI3 for
both TOTEX2 and TOTIN2. Fitting models with high R² was the key factor that made
this method better than the two IMs previously evaluated.
5.4.4 Stochastic Regression Imputation
Table 11 shows the results of the different criteria in evaluating the imputed data using
the stochastic regression imputation method with three imputation classes (SRI3).
Table 11: Criteria results for the SRI3 method
(a)      (b)          (c)       (d)         (e)         (f)         (g)
VI       NRR     Bias(y')       PCD          MD         MAD        RMSD
TOTEX2   10%       536.32   100.00%     5363.47    33683.48    70553.64
         20%      1080.12    98.40%     5400.71    33782.60    72487.39
         30%       398.39   100.00%     1328.06    32449.49    72803.60
TOTIN2   10%       897.11   100.00%     9043.98    51363.17   106374.39
         20%     -1815.39   100.00%    -9076.98    57429.24   148278.49
         30%       356.50   100.00%     1188.31    51886.73   131429.61
1. Bias of the mean of the imputed data
In column (c) of Table 11, this method yielded much better results than DRI3 for
both TOTEX2 and TOTIN2. The bias for TOTEX2 and TOTIN2 does not follow the pattern
of the previous three methods, in which the bias increases as the NRR increases;
instead, it fluctuates from one NRR to another. Compared with the three methods
previously evaluated, this method gave the smallest bias at the highest NRR for
both TOTEX2 and TOTIN2: while the other methods reached a four-digit bias, SRI3
generated only a three-digit bias. Moreover, the disparity at the third NRR is
large: SRI3 produced less than twenty percent of the bias produced by its
deterministic counterpart.
2. Distribution of the imputed data
SRI3 performed better than HDI3, which also simulated the data 1000 times. Unlike
HDI3, SRI3 maintained the distribution of the actual data in all imputed data sets
for the first and third nonresponse rates. SRI3 also outperformed HDI3 at the
twenty percent NRR. Earlier studies likewise found that stochastic regression
imputation performs better than the other three methods used here. The random
residual is added to the deterministic predicted value in order to preserve the
distribution of the data.
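The difference between the two regression methods can be sketched as below. The sketch fits one simple linear regression per imputation class, with the first-visit value as the independent variable as described in the text, and optionally adds a random residual. Drawing the residual from a normal distribution with the residual standard deviation is an assumption made here for illustration; the source does not say how the residual was generated. All names and toy values are hypothetical.

```python
import math
import random

def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b, residual_sd).

    Needs at least three respondents so that n - 2 > 0.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    return a, b, math.sqrt(sse / (n - 2))

def regression_impute(first, second, classes, rng=None):
    """Impute None entries of `second` from `first`, one model per class.

    rng=None gives the deterministic version; passing an rng adds a
    normally distributed residual (an assumed residual model).
    """
    models = {}
    for c in set(classes):
        pairs = [(f, s) for f, s, k in zip(first, second, classes)
                 if k == c and s is not None]
        models[c] = fit_line([p[0] for p in pairs], [p[1] for p in pairs])
    out = []
    for f, s, c in zip(first, second, classes):
        if s is None:
            a, b, sd = models[c]
            s = a + b * f                 # deterministic prediction
            if rng is not None:
                s += rng.gauss(0.0, sd)   # stochastic version adds a residual
        out.append(s)
    return out

first = [1.0, 2.0, 3.0, 4.0, 5.0]
second = [2.1, 3.9, 6.2, 7.8, None]
classes = [1, 1, 1, 1, 1]
det = regression_impute(first, second, classes)
sto = regression_impute(first, second, classes, rng=random.Random(1))
```

In the deterministic version every imputed value lies exactly on the fitted line, so the imputed values vary less than real observations would; the added residual in the stochastic version restores some of that spread.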
3. Other measures of variability
Columns (e), (f), and (g) of Table 11 show the other measures of variability of the
imputed data. For these criteria, the following results were obtained. First, as
with the bias of the mean of the imputed data, the values of all the criteria for
TOTIN2 fluctuate from one NRR to another. Second, for TOTEX2, only the RMSD
increases as the NRR increases, while the MAD and MD fluctuate from one NRR to
another. Third, the data with the highest NRR yielded the lowest values for the MD.
Fourth, for TOTIN2, the data with twenty percent NRR yielded the largest values for
the three criteria.
5.5 Distribution of the True vs. Imputed Values
To provide additional information on the distribution of the imputed data discussed
previously, the distributions of the true (deleted) values (TVs) and of the imputed
values (IVs) from each IM were obtained for all VIs and NRRs. Tables 12, 13, and 14
show the frequency distributions of the methods with their corresponding relative
frequencies (RFs) for the first, second, and third NRR, respectively. The RFs for
the 1000 simulated data sets from HDI3 and SRI3 were averaged. Column (b) lists the
frequency classes (FCs) of the VIs, the same classes that were used in the
Kolmogorov-Smirnov Goodness of Fit Test to estimate the percentage of imputed data
sets with a distribution similar to the actual data. For each NRR, the table of the
distribution of the actual and imputed values contains the following columns: (a)
VIs, (b) FCs, (c) RFs of the TVs (TV), (d) RFs of the OMI (OMI), (e) RFs of the
HDI3 (HDI3), (f) RFs of the DRI3 (DRI3), and (g) RFs of the SRI3 (SRI3).
Table 12: Distribution of the TVs and IVs: 10% NRR
                                                           IMs
(a)      (b)                          (c)       (d)       (e)       (f)       (g)
VI       FCs                           TV       OMI     HDI3*      DRI3     SRI3*
TOTEX2   <37869.5                  10.90%     0.00%    13.90%     7.70%     9.50%
         37869.5 - 47056.5          9.70%     0.00%    10.20%     8.70%     8.70%
         47056.5 - 54922.0          9.70%     0.00%     9.70%    11.40%     6.10%
         54922.0 - 62365.0         11.40%     0.00%     8.90%    12.30%     9.50%
         63265.0 - 73868.0          8.70%     0.00%     9.10%    11.10%    11.40%
         73868.0 - 86103.0          9.70%     0.00%     9.40%    12.60%    11.10%
         86103.0 - 101947.0        10.90%     0.00%     9.40%     8.00%    11.10%
         101947.0 - 126254.5       11.10%   100.00%     8.90%    11.40%     8.50%
         126254.5 - 169964.0        9.00%     0.00%     8.90%     9.00%    12.20%
         >169964                    8.90%     0.00%    11.60%     7.70%    12.10%
TOTIN2   <40570                     9.70%     0.00%    15.10%     6.10%     9.10%
         40570.0 - 51564.0         10.20%     0.00%    11.90%     8.70%     7.90%
         51564.0 - 62006.5          9.40%     0.00%    10.10%    14.50%     8.30%
         62006.5 - 73900.5         10.20%     0.00%     9.50%    10.70%    10.00%
         73900.5 - 88127.0          9.00%     0.00%     9.60%    12.80%    12.40%
         88127.0 - 104801.0        10.90%     0.00%     9.30%     9.20%     9.00%
         104801.0 - 128000.0       11.90%   100.00%     9.80%     9.90%    10.50%
         128000.0 - 161669.0       11.40%     0.00%     7.80%    11.10%     9.30%
         161669.0 - 233907.0        7.70%     0.00%     8.00%    10.70%    11.20%
         >233907                    9.90%     0.00%     8.90%     6.30%    12.30%
* RF for each class was obtained by taking the average of the 1000 simulated data sets.
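As a rough way to read Table 12, the sketch below computes the largest gap between the cumulative relative frequencies of the true values and of each IM for TOTEX2 at the 10% NRR. This binned gap is the quantity a Kolmogorov-Smirnov type comparison is built on, but it is only an illustration: it is not necessarily the exact test procedure used in the study, no critical value is applied, and the function name is hypothetical.

```python
from itertools import accumulate

# Relative frequencies (%) for TOTEX2 at 10% NRR, copied from Table 12.
RF = {
    "TV":   [10.9, 9.7, 9.7, 11.4, 8.7, 9.7, 10.9, 11.1, 9.0, 8.9],
    "OMI":  [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 100.0, 0.0, 0.0],
    "HDI3": [13.9, 10.2, 9.7, 8.9, 9.1, 9.4, 9.4, 8.9, 8.9, 11.6],
    "DRI3": [7.7, 8.7, 11.4, 12.3, 11.1, 12.6, 8.0, 11.4, 9.0, 7.7],
    "SRI3": [9.5, 8.7, 6.1, 9.5, 11.4, 11.1, 11.1, 8.5, 12.2, 12.1],
}

def max_cdf_gap(rf_a, rf_b):
    """Largest absolute gap between two cumulative distributions (as a proportion)."""
    cdf_a = accumulate(rf_a)
    cdf_b = accumulate(rf_b)
    return max(abs(a - b) for a, b in zip(cdf_a, cdf_b)) / 100.0

gaps = {m: max_cdf_gap(RF["TV"], rf) for m, rf in RF.items() if m != "TV"}
```

The OMI gap is far larger than that of the other three methods because its whole mass sits in a single class.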
Table 13: Distribution of the TVs and IVs: 20% NRR
                                                           IMs
(a)      (b)                          (c)       (d)       (e)       (f)       (g)
VI       FCs                           TV       OMI     HDI3*      DRI3     SRI3*
TOTEX2   <37869.5                   9.40%     0.00%    14.30%     7.40%     8.20%
         37869.5 - 47056.5          9.70%     0.00%    10.40%     9.60%     7.60%
         47056.5 - 54922.0         11.60%     0.00%     9.70%     9.00%     8.20%
         54922.0 - 62365.0         10.00%     0.00%     9.00%    11.00%     7.90%
         63265.0 - 73868.0          9.60%     0.00%     9.20%    12.30%    10.30%
         73868.0 - 86103.0          8.40%     0.00%     9.40%    12.50%    11.90%
         86103.0 - 101947.0         9.60%     0.00%     9.30%     9.90%    10.30%
         101947.0 - 126254.5       11.30%   100.00%     8.70%    10.80%    11.80%
         126254.5 - 169964.0        9.70%     0.00%     8.70%     8.80%    11.70%
         >169964                   10.70%     0.00%    11.30%     8.70%    12.10%
TOTIN2   <40570                    10.00%     0.00%    15.70%     4.80%    11.80%
         40570.0 - 51564.0         10.30%     0.00%    12.10%    11.90%    12.20%
         51564.0 - 62006.5         11.70%     0.00%    10.10%    10.20%    11.30%
         62006.5 - 73900.5         10.20%     0.00%     9.60%    11.70%     9.90%
         73900.5 - 88127.0          8.60%     0.00%     9.50%    11.90%     8.50%
         88127.0 - 104801.0         9.40%     0.00%     9.30%     9.60%    10.10%
         104801.0 - 128000.0        9.10%   100.00%     9.70%    11.70%     9.00%
         128000.0 - 161669.0        9.20%     0.00%     7.60%     9.80%     8.30%
         161669.0 - 233907.0       11.30%     0.00%     7.80%     9.70%     8.90%
         >233907                   10.20%     0.00%     8.70%     8.70%    10.10%
* RF for each class was obtained by taking the average of the 1000 simulated data sets.
Table 14: Distribution of the TVs and IVs: 30% NRR
                                                           IMs
(a)      (b)                          (c)       (d)       (e)       (f)       (g)
VI       FCs                           TV       OMI     HDI3*      DRI3     SRI3*
TOTEX2   <37869.5                   9.80%     0.00%    14.30%     7.80%    10.30%
         37869.5 - 47056.5          8.80%     0.00%    10.40%     9.00%     9.60%
         47056.5 - 54922.0          9.60%     0.00%     9.70%     9.40%     8.30%
         54922.0 - 62365.0          9.50%     0.00%     8.90%    10.80%     9.30%
         63265.0 - 73868.0         11.00%     0.00%     9.20%    12.70%    10.10%
         73868.0 - 86103.0         10.70%     0.00%     9.40%    11.50%    10.60%
         86103.0 - 101947.0        10.70%     0.00%     9.40%    12.10%     9.80%
         101947.0 - 126254.5        9.40%   100.00%     8.70%     8.80%    10.10%
         126254.5 - 169964.0       11.00%     0.00%     8.70%     9.00%     8.10%
         >169964                    9.50%     0.00%    11.30%     9.00%    13.70%
TOTIN2   <40570                     9.40%     0.00%    15.60%     6.50%     8.90%
         40570.0 - 51564.0          9.00%     0.00%    12.10%    10.40%     8.20%
         51564.0 - 62006.5          9.90%     0.00%    10.10%    10.80%     8.80%
         62006.5 - 73900.5         10.70%     0.00%     9.60%    11.50%    10.10%
         73900.5 - 88127.0         10.20%     0.00%     9.50%    12.20%    11.00%
         88127.0 - 104801.0        10.30%     0.00%     9.30%    10.70%    10.20%
         104801.0 - 128000.0       10.30%   100.00%     9.70%    10.50%    10.40%
         128000.0 - 161669.0        9.80%     0.00%     7.60%    11.20%    10.80%
         161669.0 - 233907.0       10.70%     0.00%     7.70%     8.20%    10.30%
         >233907                    9.90%     0.00%     8.70%     8.00%    11.30%
* RF for each class was obtained by taking the average of the 1000 simulated data sets.
At every NRR, the results clearly illustrate the distortion of the distribution. Since the
OMI method assigns the mean of the first visit VI to all missing cases, all the imputed
values are concentrated in one frequency class. The three other methods, which used
imputation classes, gave a better outcome than OMI by spreading the distribution of the
imputed data.
For the HDI3 method, at all nonresponse rates, the imputed observations clustered most
heavily in the first frequency class, that is, below 37869.5 for TOTEX2 and below 40570.0
for TOTIN2. Clusters also formed in the last frequency class for TOTEX2 at the first and
third nonresponse rates, and in the second frequency class for TOTIN2 at all nonresponse
rates. Across all nonresponse rates, the lowest class holds 14-16% of the imputed data for
TOTEX2 and TOTIN2, compared with only 9-11% of the actual data.
While HDI3 over-represents these classes, it under-represents the interval 86103-126254.5
in the 10% and 20% nonresponse imputed data sets and the interval 63265-101947 in the 30%
nonresponse imputed data sets. For the 10% and 20% data sets, the actual data in the
indicated interval totaled about 30%, while the imputed data totaled less than 30%.
The two regression imputation methods, unlike hot deck and OMI, which had major clusters,
produced a more spread-out distribution, although some areas are under-represented. The
failure to include a random residual term in deterministic regression resulted in a severe
under-representation of the data, in particular in the first frequency class. SRI, which
does include a random residual, gave better results than DRI. However, in some areas the
added random residual produced a significant excess, mostly in the last frequency class.
5.6 Choosing the best imputation method
In this section, the rankings on all the criteria are the basis for determining which of
the IMs is the best for this particular study and data. The best method is selected
independently for each VI and NRR. The ranking is based on a four-point system in which a
rank of 1 denotes the best IM for a specific criterion and a rank of 4 denotes the worst.
In case of ties, the average rank is substituted. The IM with the smallest rank total is
declared the best IM for the particular VI and NRR. The ranking covers the following
criteria: (a) bias of the mean of the imputed data (N.B.), (b) percentage of correct
distributions (PCD), and (c) the other measures of variability, namely MD, MAD, and RMSD.
All in all, each IM is ranked on five criteria.
Tables 15, 16, and 17 show the rankings of the different imputation methods for the 10%,
20%, and 30% NRR, respectively. For each NRR, the table of rankings contains the following
columns: (a) VIs, (b) Criteria, (c) OMI, (d) HDI3, (e) DRI3, and (f) SRI3.
Table 15: Ranking of the Different IMs: 10% NRR
                                    IMs
VI       Criteria           OMI    HDI3    DRI3    SRI3
TOTEX2   N.B.                 3       1       4       2
         PCD                  4     1.3     1.3     1.3
         MD                   3       1       4       2
         MAD                  3       4       1       2
         RMSD                 4       3       1       2
         TOTAL               17    10.3    11.3     9.3
         Category Rank      4th     2nd     3rd     1st
TOTIN2   N.B.                 1       2       4       3
         PCD                  4     1.3     1.3     1.3
         MD                   1       2       4       3
         MAD                  3       4       1       2
         RMSD                 3       4       1       2
         TOTAL               12    13.3    11.3    11.3
         Category Rank      3rd     4th     1st     1st
Table 16: Ranking of the Different IMs: 20% NRR
                                    IMs
VI       Criteria           OMI    HDI3    DRI3    SRI3
TOTEX2   N.B.                 2       1       4       3
         PCD                  4       3       1       2
         MD                   2       1       4       3
         MAD                  3       4       1       2
         RMSD                 4       2       1       3
         TOTAL               15      11      11      13
         Category Rank      4th     1st     1st     3rd
TOTIN2   N.B.                 3       4       2       1
         PCD                  4     1.3     1.3     1.3
         MD                   3       4       2       1
         MAD                  3       4       1       2
         RMSD                 3       4       1       2
         TOTAL               16    17.3     7.3     7.3
Table 17: Ranking of the Different IMs: 30% NRR
                                    IMs
VI       Criteria           OMI    HDI3    DRI3    SRI3
TOTEX2   N.B.                 1       3       4       2
         PCD                  4       3     1.5     1.5
         MD                   1       3       4       2
         MAD                  3       4       1       2
         RMSD                 4       2       1       3
         TOTAL               13      15    11.5    10.5
         Category Rank      3rd     4th     2nd     1st
TOTIN2   N.B.                 3       4       2       1
         PCD                  4       3     1.5     1.5
         MD                   3       4       2       1
         MAD                  3       4       1       2
         RMSD                 3       4       1       2
         TOTAL               16      19     7.5     7.5
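The rank totals can be reproduced with a few lines of code. The sketch below copies the TOTEX2 ranks at the 10% NRR from Table 15, sums them for each IM, and picks the IM with the smallest total. The variable names are hypothetical, and the tied PCD ranks are taken as reported in the table rather than recomputed.

```python
METHODS = ["OMI", "HDI3", "DRI3", "SRI3"]

# Ranks for TOTEX2 at 10% NRR, copied from Table 15 (1 = best, 4 = worst).
RANKS = {
    "N.B.": [3, 1, 4, 2],
    "PCD":  [4, 1.3, 1.3, 1.3],
    "MD":   [3, 1, 4, 2],
    "MAD":  [3, 4, 1, 2],
    "RMSD": [4, 3, 1, 2],
}

# Sum the five criterion ranks per method; round to one decimal to avoid
# floating-point noise from the 1.3 entries.
totals = {
    m: round(sum(row[i] for row in RANKS.values()), 1)
    for i, m in enumerate(METHODS)
}
best = min(totals, key=totals.get)  # smallest rank total wins
```

The totals match the TOTAL row of Table 15 for TOTEX2, and the smallest total identifies SRI3 as the first-ranked IM for that VI and NRR.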
The rankings show that the two regression IMs gave better results than their model-free
counterparts. For TOTIN2, the two regression imputation methods tied as the best IM at
every nonresponse rate, and, surprisingly, HDI3 finished as the worst IM, behind OMI. For
TOTEX2, the rankings were mixed across nonresponse rates, but the regression methods still
gave good results: SRI3 finished first at the 10% and 30% NRR and third at the 20% NRR,
while DRI3 finished third, first, and second at the 10%, 20%, and 30% NRR, respectively.
While HDI3 was the worst IM for TOTIN2, OMI was the worst IM for TOTEX2, ranking last at
both the 10% and 20% NRR and third at the 30% NRR.
In conclusion, the best imputation method for this study, which used the 1997 FIES data,
is SRI3, very closely followed by DRI3. SRI3 never ranked last on any criterion for any
NRR or VI, unlike DRI3, which ranked last on the bias of the mean of the imputed data and
on the MD. The researchers selected HDI3 as the worst IM in this study. HDI3 fared the
worst on most of the criteria, in particular on the other measures of variability at the
20% and 30% NRR.
