PARTIAL NONRESPONSE
The Case of the Family Income and Expenditure Survey (FIES)
Nonresponse Bias
• The proportion of respondents and nonrespondents in
the sample is given by:
• The population total and mean of the population are given
by:
• The corresponding sample total and mean
are given by:
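For reference, one common way to write these quantities is sketched here. The symbols are assumptions, not taken from the slides: R and M are the numbers of respondents and nonrespondents among the N population units, r and m are their counterparts among the n sampled units, and the subscripts r and m mark totals and means over respondents and nonrespondents. The last line is the usual expression for the bias of the respondent mean, assuming its expectation is the respondent population mean.

```latex
% Assumed notation: R respondents and M nonrespondents among N population units;
% r respondents and m nonrespondents among n sampled units.
\begin{align*}
N &= R + M, & \bar{R} &= \frac{R}{N}, & \bar{M} &= \frac{M}{N} = 1 - \bar{R} \\
Y &= Y_r + Y_m, & \bar{Y} &= \bar{R}\,\bar{Y}_r + \bar{M}\,\bar{Y}_m \\
n &= r + m, & \bar{r} &= \frac{r}{n}, & \bar{m} &= \frac{m}{n} \\
y &= y_r + y_m, & \bar{y} &= \bar{r}\,\bar{y}_r + \bar{m}\,\bar{y}_m \\
B(\bar{y}_r) &= \bar{Y}_r - \bar{Y} = \bar{M}\,(\bar{Y}_r - \bar{Y}_m)
\end{align*}
```

Under this notation the bias vanishes only when there are no nonrespondents or when respondents and nonrespondents have the same mean.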
Nonresponse Bias
• Reasons include:
Advantages Disadvantages
• Four imputation methods (IMs) are applied in this study: Overall
(Grand) Mean Imputation (OMI), Hot Deck Imputation (HDI),
Deterministic Regression Imputation (DRI), and Stochastic
Regression Imputation (SRI).
Imputation Procedures
• An imputation class (IC) is a stratum: the data are divided into
ICs before imputation takes place.
• Forming ICs is most useful when the resulting groups are
homogeneous.
• The variables used to define the ICs are called matching
variables (MVs).
• Observations with a response are called donors.
• Observations whose missing response will be replaced by an
imputed value are called recipients.
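The terms above can be illustrated with a minimal sketch. The records, the field names `region` and `income`, and the helper `form_imputation_classes` are hypothetical and chosen only for illustration; the study's actual matching variables are not shown here.

```python
from collections import defaultdict

# Hypothetical records: "region" plays the role of a matching variable (MV),
# "income" is the variable of interest, and None marks a missing response.
records = [
    {"id": 1, "region": "urban", "income": 120.0},
    {"id": 2, "region": "urban", "income": None},
    {"id": 3, "region": "rural", "income": 80.0},
    {"id": 4, "region": "rural", "income": 95.0},
    {"id": 5, "region": "rural", "income": None},
]


def form_imputation_classes(rows, matching_vars, target):
    """Group rows into imputation classes, split into donors and recipients."""
    classes = defaultdict(lambda: {"donors": [], "recipients": []})
    for row in rows:
        # The class key is the combination of matching-variable values.
        key = tuple(row[v] for v in matching_vars)
        role = "recipients" if row[target] is None else "donors"
        classes[key][role].append(row)
    return dict(classes)


classes = form_imputation_classes(records, ["region"], "income")
for key, groups in classes.items():
    print(key, len(groups["donors"]), "donors,",
          len(groups["recipients"]), "recipients")
```

Each key corresponds to one imputation class; a class with recipients but no donors would signal that the classes are too fine.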
• Problems can arise if ICs are formed without care; one of them
is deciding how many ICs to use.
Advantages Disadvantages
3. If a case within a particular cell of the table has a missing value,
select a case from the set of units with an available Y value and
impute the chosen Y value for the missing value.
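A minimal sketch of this donor-selection step, assuming the donor is drawn at random from the same cell as the recipient. The function name `hot_deck_impute`, the cell labels, and the data are hypothetical.

```python
import random


def hot_deck_impute(values, cells, rng):
    """Replace each missing value (None) with a random donor value from its cell."""
    donors = {}
    for value, cell in zip(values, cells):
        if value is not None:
            donors.setdefault(cell, []).append(value)
    imputed = []
    for value, cell in zip(values, cells):
        if value is None:
            # Assumes every cell with a missing value has at least one donor;
            # otherwise this lookup raises KeyError.
            value = rng.choice(donors[cell])
        imputed.append(value)
    return imputed


y = [10.0, None, 12.0, 30.0, None, 34.0]
cell = ["A", "A", "A", "B", "B", "B"]
filled = hot_deck_impute(y, cell, random.Random(0))
print(filled)
```

Because the imputed value is an actually observed response, repeating the procedure with a different seed can give a different completed data set.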
2. A categorical variable from the first-visit data must meet
the following criteria to be selected as a candidate
variable:
Comparing the Distribution of the
Imputed vs. Actual Data
• A goodness-of-fit test was used to compare the
distributions.
• The p-values for all candidate variables were less than 0.0001,
indicating a highly significant association.
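The slides do not say which goodness-of-fit test was used. As one possibility, a Pearson chi-square statistic comparing the frequency-class counts of the imputed data against those of the actual data could be computed as sketched here. The counts and the function name `chi_square_gof` are hypothetical.

```python
def chi_square_gof(observed, expected):
    """Pearson chi-square statistic for observed vs. expected class counts."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts per frequency class (same total in both data sets).
actual_counts = [40, 30, 20, 10]
imputed_counts = [38, 33, 19, 10]

stat = chi_square_gof(imputed_counts, actual_counts)
# 7.815 is the tabulated 5% critical value for 3 degrees of freedom.
maintained = stat < 7.815
print(round(stat, 2), maintained)
```

A statistic below the critical value means the test does not reject the hypothesis that the imputed data follow the distribution of the actual data.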
Formation of Imputation Classes
• The third imputation class produced the highest r² and the first
imputation class the lowest r², for all variables of interest and all
nonresponse rates.
Results for the Overall Mean Imputation
• For the nonresponse bias and variance:
– For TOTIN2, the data with a 20% nonresponse rate (NRR) have
the highest values for all three criteria.
• For the other measures of variability (i.e. Mean Deviation, Mean
Absolute Deviation and Root Mean Square Deviation):
– For TOTEX2, the Mean Deviation shows that OMI underestimates
the actual data at 10% and 20% NRR, in contrast to the bias,
which indicates overestimation; at 30% NRR the reverse
holds.
– For TOTIN2, the values show that OMI overestimates the actual
data at 10% and 20% NRR, in contrast to the bias, which
indicates underestimation.
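The three measures can be sketched as follows, assuming they are computed from the differences between each imputed value and the actual value it replaces. These definitions, the function name `deviation_measures`, and the data are assumptions for illustration; the slides do not give the formulas.

```python
import math


def deviation_measures(imputed, actual):
    """Return (MD, MAD, RMSD) of imputed values relative to the actual values."""
    diffs = [i - a for i, a in zip(imputed, actual)]
    n = len(diffs)
    md = sum(diffs) / n                               # signed: > 0 means overestimation
    mad = sum(abs(d) for d in diffs) / n              # average size of the error
    rmsd = math.sqrt(sum(d * d for d in diffs) / n)   # penalizes large errors more
    return md, mad, rmsd


actual = [100.0, 150.0, 200.0]
imputed = [110.0, 140.0, 215.0]
md, mad, rmsd = deviation_measures(imputed, actual)
print(md, round(mad, 3), round(rmsd, 3))
```

Under these definitions the MD can be close to zero even when individual errors are large, because positive and negative differences cancel; the MAD and RMSD do not have this property.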
Results for the Hot Deck Imputation
• For the nonresponse bias and variance:
– In terms of bias, HDI performed better than OMI at 10% and
20% NRR.
– The data with 10% NRR gave the smallest spread of the
estimated population means, while the data with 30% NRR,
which contain the largest number of imputations, gave the
largest spread.
• For the distribution of the imputed data:
– For the variable TOTIN2 with 10% and 20% nonresponse, the
HDI was able to maintain the distribution of the actual data.
– For the variable TOTEX2, the HDI was able to maintain the
distribution of the actual data for the 10% NRR.
– For both TOTIN2 and TOTEX2 at 30% NRR, HDI failed to
maintain the distribution of the actual data, with 1% and 0%,
respectively.
• For the other measures of variability (i.e. Mean Deviation,
Mean Absolute Deviation and Root Mean Square
Deviation):
– For TOTIN2, the MD shows that the values were underestimated at
all NRRs, and the MD decreases as the NRR increases. The MAD and
RMSD were unusually large compared with those for OMI.
– For TOTEX2, the MD shows that HDI overestimates the actual data
at 10% and 20% NRR, consistent with the bias, while at 30% NRR
the reverse holds. The MAD and RMSD show that HDI performed
better than OMI.
Results for the Deterministic
Regression Imputation (DRI)
• For the nonresponse bias and variance:
– Unlike OMI and HDI, where the bias increases sharply as the
nonresponse rate increases, the bias under DRI grows much
more slowly.
– For TOTEX2, DRI gave more biased estimates than OMI and HDI
at all NRRs.
– As with OMI, the variance for this method is zero: the missing
observations are simulated only once, so the estimated
population mean is constant.
• For the distribution of the imputed data:
– The MAD and RMSD for both TOTIN2 and TOTEX2 were smaller
at all NRRs, which shows that this method performs better
than OMI and HDI.
Results for the Stochastic Regression
Imputation (SRI)
• For the nonresponse bias and variance:
– For both TOTIN2 and TOTEX2, SRI showed no relationship
between the nonresponse rate and the nonresponse bias of the
estimated population mean. The biases fluctuate from one
nonresponse rate to another. SRI also has the smallest bias
at 30% NRR.
– For both TOTIN2 and TOTEX2, SRI was able to maintain
the distribution of the actual data at 10% and 30% NRR.
– For TOTEX2 at 20% NRR, SRI was better than HDI at
retaining the distribution of the actual data.
• For the other measures of variability (i.e. Mean Deviation, Mean
Absolute Deviation and Root Mean Square Deviation):
– The MAD and RMSD gave the same result: SRI ranked second
to DRI but outperformed OMI and HDI.
Distribution of the True Values vs.
Imputed Values
• For OMI, at all nonresponse rates and for all variables of
interest, the tables show a distorted distribution: because every
missing value is replaced by the same single value, the imputed
values are concentrated in a single frequency class.
• For HDI, at all nonresponse rates, most of the imputed
observations clustered in the first frequency class for both
variables, TOTIN and TOTEX.
• Under HDI, clustering also occurred in the last frequency class
for TOTEX2 at 10% and 30% NRR, and in the second frequency
class for TOTIN2 at all nonresponse rates.
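The concentration effect described for OMI can be reproduced with a small sketch. The data, the class boundaries, and the helper names `overall_mean_impute` and `class_counts` are hypothetical.

```python
def overall_mean_impute(values):
    """Replace every missing value (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]


def class_counts(values, edges):
    """Count the values falling in [edges[k], edges[k + 1]) for each class k."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for k in range(len(counts)):
            if edges[k] <= v < edges[k + 1]:
                counts[k] += 1
                break
    return counts


y = [5.0, None, 15.0, None, 25.0, None, 35.0]
edges = [0, 10, 20, 30, 40]

filled = overall_mean_impute(y)
counts = class_counts(filled, edges)
print(filled)
print(counts)
```

In this toy example all three imputed values equal the observed mean, so they all land in the same frequency class and inflate its count relative to the others.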
Distribution of the True Values vs.
Imputed Values
– The worst imputation method for this study is the Hot Deck
Imputation.
CONCLUSION
In Summary…
• After comparing and ranking the four methods, the SRI procedure
is considered the best imputation method for this study. This can
be attributed to the random residuals added to the deterministic
imputation, which made the estimates less biased than those of
its deterministic counterpart.
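The difference between the deterministic and stochastic variants can be sketched with a simple linear regression on one auxiliary variable. The data, the helper names, and the choice to draw the random residual from the observed residuals are assumptions for illustration; the study's actual model and residual distribution are not given in the slides.

```python
import random


def fit_line(x, y):
    """Ordinary least squares fit of y = a + b * x on complete cases."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def regression_impute(x, y, rng=None):
    """Impute missing y (None) from x; pass rng to add a random residual."""
    pairs = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    xs, ys = zip(*pairs)
    a, b = fit_line(xs, ys)
    residuals = [yi - (a + b * xi) for xi, yi in pairs]
    out = []
    for xi, yi in zip(x, y):
        if yi is None:
            yi = a + b * xi  # deterministic prediction
            if rng is not None:
                # Stochastic step (assumed form): add one observed residual
                # drawn at random.
                yi += rng.choice(residuals)
        out.append(yi)
    return out


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, None, 8.2, None, 11.8]

dri = regression_impute(x, y)                    # deterministic variant
sri = regression_impute(x, y, random.Random(1))  # stochastic variant
print(dri)
print(sri)
```

Running the deterministic variant twice gives identical imputations, which is why its estimated mean does not vary across simulations, while the stochastic variant changes with the random draw.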