1. Which imputation technique is the most appropriate in handling partial nonresponse for
2. How do varying nonresponse rates affect the results for each imputation method?
1. To compare the imputation techniques namely overall mean imputation, hot deck
2. To investigate the effect of the varying rates of missing observations, particularly the
effect of 10%, 20% and 30% nonresponse rates on the precision of the estimates.
in surveys creates incomplete data, which can pose serious problems during
data analysis, particularly in the generation of statistically reliable estimates. For this
reason, imputation techniques are used to account for the difference between
respondents and nonrespondents, which helps reduce nonresponse bias in the survey
estimates.
Since most statistical packages require complete data before any data analysis
procedure can be run, the use of imputation techniques can ensure consistency of
results across analyses, something that an incomplete data set cannot fully provide.
A news article by Obanil (2006), entitled Topmost Floor of the NSO Building gutted by
Fire and posted at Manila Bulletin Online, reported that on October 3, 2006 documents
worth around 1 million pesos were destroyed by the fire. Among the documents
gutted by the fire was the first-visit questionnaire of the FIES for the NCR which at the
In terms of statistical research, most countries in the developed world such as the United
States, Canada, the UK and the Netherlands already employ imputation techniques in their
respective national statistical offices. In a country such as the Philippines, where data
collection is very difficult, especially for some regions like the National Capital Region
(NCR), imputation can ease the problem of data collection and nonresponse.
More importantly, given the great impact of this survey to the country, employing
nonresponse, which could lead to a more meaningful generalization about our country’s
income distribution, spending patterns and poverty incidence. Hence, estimates
with less bias and more consistent results can help our policymakers
and economists provide better solutions for improving the lives of Filipinos.
1.5 Scope and Limitations
Throughout this paper, only the data from the 1997 Family Income and Expenditure
Survey (FIES) will be used to tackle the problem of nonresponse and to examine the
impact of the different imputation methods applied to the dataset. With regard to the
extent to which these imputation methods will be applied and evaluated, this paper will
only cover the partial nonresponse occurring in the National Capital Region (NCR), since
NCR is noted as the region with the highest nonresponse rate. Also, the variables that will be
imputed for this study are the Total Income (TOTIN2) and Total Expenditures
The researchers will only focus on using the 1997 FIES data from the first visit to impute
the partial nonresponse that is present in the second visit. This paper also assumes that
the first visit data is complete and that the pattern of nonresponse follows the Missing
Completely at Random (MCAR) case. The MCAR case holds if the probability of response to Y is
unrelated to the value of Y itself or to any other variables, making the missing data
randomly distributed across all cases (Musil et al., 2002). If the pattern of nonresponse
does not satisfy the MCAR assumption, imputation methods may not achieve their purpose.
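The MCAR assumption can be illustrated with a minimal simulation sketch. The function name `make_mcar_mask`, the synthetic lognormal values and the random seed are all illustrative assumptions and are not taken from the FIES data or from the procedure used in this study. The sketch only shows that, under MCAR, the units set to missing are chosen without reference to any data values, here at the 10%, 20% and 30% nonresponse rates.

```python
import numpy as np

def make_mcar_mask(n, rate, rng):
    """Return a boolean mask with round(n * rate) True entries.

    The missing units are drawn uniformly at random, independently of
    any data values, which is what the MCAR assumption requires.
    """
    n_missing = int(round(n * rate))
    mask = np.zeros(n, dtype=bool)
    mask[rng.choice(n, size=n_missing, replace=False)] = True
    return mask

rng = np.random.default_rng(0)

# Hypothetical second-visit values (synthetic, NOT the FIES data).
y = rng.lognormal(mean=11.0, sigma=0.8, size=1000)

for rate in (0.10, 0.20, 0.30):
    mask = make_mcar_mask(len(y), rate, rng)
    y_obs = np.where(mask, np.nan, y)  # NaN marks a nonrespondent
    print(f"{rate:.0%}: {mask.sum()} missing, "
          f"respondent mean = {np.nanmean(y_obs):.2f}")
```

Because the mask ignores `y`, the respondent mean should stay close to the full-sample mean at every rate. A mask that depended on `y` (for example, dropping high incomes more often) would violate MCAR.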
As for the imputation techniques, only four methods will be applied in this
paper, namely: Overall Mean Imputation (OMI), Hot Deck Imputation (HDI),
methods, this will only be limited to the following: (a) Bias of the mean of the Imputed
Data, (b) Assessment of the Distributions of the Imputed vs. the Actual Data and (c) the
criteria mentioned in the report entitled Compensating for Missing Data (Kalton, 1983)
namely the Mean Deviation, Mean Absolute Deviation and the Root Mean Square
Deviation.
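The evaluation criteria listed above can be sketched in code. The formulas below are an assumed reading of the criteria, since this section only names them: the bias is taken as the mean of the imputed data set minus the mean of the actual data set, and the Mean Deviation (MD), Mean Absolute Deviation (MAD) and Root Mean Square Deviation (RMSD) are computed over the imputed cases only. The function names and the five-value toy data are hypothetical, and overall mean imputation is used only because it is the simplest of the methods named.

```python
import numpy as np

def overall_mean_impute(y_obs):
    """Replace every NaN with the mean of the observed (respondent) values."""
    filled = y_obs.copy()
    filled[np.isnan(y_obs)] = np.nanmean(y_obs)
    return filled

def evaluate(y_true, y_imputed, missing):
    """Compare imputed values with the actual values.

    bias          : mean of the imputed data set minus mean of the actual
                    data set (all n cases)
    md, mad, rmsd : computed over the m imputed cases only
    (assumed definitions; the source only names the criteria)
    """
    d = y_imputed[missing] - y_true[missing]
    return {
        "bias": y_imputed.mean() - y_true.mean(),
        "md": d.mean(),
        "mad": np.abs(d).mean(),
        "rmsd": np.sqrt((d ** 2).mean()),
    }

# Toy data (hypothetical): 2 of 5 values are treated as nonresponse.
y_true = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
missing = np.array([False, True, False, False, True])
y_obs = np.where(missing, np.nan, y_true)

y_imp = overall_mean_impute(y_obs)
result = evaluate(y_true, y_imp, missing)
print(result)
```

Under these assumed definitions, the bias of the mean equals the nonresponse rate multiplied by the MD, because only the imputed cases differ from the actual data.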
PROV (Provincial Area Codes)
Classes  Scope
39       Manila
74       Quezon City, Mandaluyong City, San Juan, Marikina, Pasig City
75       Caloocan, Malabon, Navotas, Valenzuela
76       Makati, Las Pinas, Muntinlupa, Paranaque, Pasay, Taguig, Pateros
CODEP1 (Recoded Total Employed Household Members)
Classes  Scope
0        No employed members
1        One to two employed members
2        Three to four employed members
3        At least five employed members
Table 3:
Note: ….. CODIN stands for coded income for the first visit while CODEX1 stands for
Descriptive Statistics
VI      IC   Mean      Minimum   Maximum   Valid n  Std. Dev
TOTIN2  IC1  93588.32  9067.000  1340900   2635     75619.52
TOTIN2  IC2  186940.9  14490.00  4215480   1434     281852.3
TOTIN2  IC3  643191.2  54790.00  4357180   61       829409.3
TOTEX2  IC1  74866.68  9025.000  731937.0  2635     47517.69
TOTEX2  IC2  135510.8  13575.00  3203978   1434     151984.3
TOTEX2  IC3  413184.0  40505.00  2726603   61       532577.1
(a) VI  (b) NRR  (c) BIAS(y')  (d) PCD  (e) MD     (f) MAD    (g) RMSD
TOTEX2  10%      491.91        100.00%  4919.40    78071.61   79251.22
TOTEX2  20%      179.42        96.90%   897.18     78292.63   67149.16
TOTEX2  30%      -606.37       0.00%    -2021.19   81395.79   71390.65
TOTIN2  10%      -717.52       100.00%  -7175.25   105369.15  242022.99
TOTIN2  20%      -3095.41      100.00%  -15477.09  111748.04  297151.50
TOTIN2  30%      -6508.65      1.00%    -21695.52  115087.13  313814.92
Table 10:
(a) VI  (b) NRR  (c) BIAS(y')  (d) PCD  (e) MD     (f) MAD   (g) RMSD
TOTEX2  10%      -720.46       100.00%  -7204.56   23839.82  57726.62
TOTEX2  20%      -1469.57      100.00%  -7347.86   23231.65  53180.02
TOTEX2  30%      -2266.38      100.00%  -7554.61   24082.88  59795.67
TOTIN2  10%      -1128.45      100.00%  -11284.46  32115.80  77228.48
TOTIN2  20%      -2211.82      100.00%  -11059.09  35274.03  114957.43
TOTIN2  30%      -4137.78      100.00%  -13792.60  34537.36  103253.12
Table 11:
(a) VI  (b) NRR  (c) BIAS(y')  (d) PCD  (e) MD    (f) MAD   (g) RMSD
TOTEX2  10%      536.32        100.00%  5363.47   33683.48  70553.64
TOTEX2  20%      1080.12       98.40%   5400.71   33782.60  72487.39
TOTEX2  30%      398.39        100.00%  1328.06   32449.49  72803.60
TOTIN2  10%      897.11        100.00%  9043.98   51363.17  106374.39
TOTIN2  20%      -1815.39      100.00%  -9076.98  57429.24  148278.49
TOTIN2  30%      356.50        100.00%  1188.31   51886.73  131429.61
Figure 5:
[Image not reproduced. Vertical axis: percentage, 0.00% to 100.00% in steps of 10%. Horizontal axis categories: TV, OMI, HDI3*, DRI3, SRI3*. Legend classes: <37869.5; 37869.5 - 47056.5; 47056.5 - 54922.0; 54922.0 - 62365.0; 63265.0 - 73868.0; 73868.0 - 86103.0; 86103.0 - 101947.0; 101947.0 - 126254.5; 126254.5 - 169964.0; >169964]