
1.2 Statement of the Problem

This paper attempts to answer the following questions:

1. Which imputation technique is the most appropriate for handling partial nonresponse in the FIES data?

2. How do varying nonresponse rates affect the results for each imputation method?

1.3 Objectives of the Study

The paper will attempt to achieve the following objectives:

1. To compare four imputation techniques, namely overall mean imputation, hot deck imputation, deterministic regression imputation, and stochastic regression imputation, in compensating for partial nonresponse in the FIES.

2. To investigate the effect of varying rates of missing observations, particularly nonresponse rates of 10%, 20%, and 30%, on the precision of the estimates.

1.4 Significance of the Study

Nonresponse is a common problem in conducting surveys. It produces incomplete data, which can pose serious problems during data analysis, particularly in the generation of statistically reliable estimates. Imputation techniques make it possible to account for the difference between respondents and nonrespondents, which in turn helps reduce nonresponse bias in the survey estimates.

Since most statistical packages require complete data before any analysis procedure can be run, imputation also helps ensure consistent results across analyses, something that an incomplete data set cannot fully provide.

A news article by Obanil (2006) entitled "Topmost Floor of the NSO Building Gutted by Fire," posted on Manila Bulletin Online, reported that on October 3, 2006, documents worth around 1 million pesos were destroyed by the fire. Among the documents destroyed were the first-visit questionnaires of the FIES for the NCR, which had not yet been encoded at the time of the fire.

In terms of statistical research, most countries in the developed world, such as the United States, Canada, the UK, and the Netherlands, already employ imputation techniques in their respective national statistical offices. In a country such as the Philippines, where data collection is very difficult, especially in regions like the National Capital Region (NCR), imputation can ease the problems of data collection and nonresponse.

More importantly, given the great impact of this survey on the country, imputation techniques give statisticians a method for handling nonresponse, which could lead to more meaningful generalizations about the country’s income distribution, spending patterns, and poverty incidence. Estimates with less bias and more consistent results can, in turn, help policymakers and economists provide better solutions for improving the lives of Filipinos.

1.5 Scope and Limitations

Throughout this paper, only the data from the 1997 Family Income and Expenditure Survey (FIES) will be used to tackle the problem of nonresponse and to examine the impact of the different imputation methods applied to the dataset. The imputation methods will be applied and evaluated only for the partial nonresponse occurring in the National Capital Region (NCR), since the NCR is noted as the region with the highest nonresponse rate. The variables to be imputed are the Total Income (TOTIN2) and Total Expenditures (TOTEX2) from the second visit of the FIES.

The researchers will use only the first-visit data of the 1997 FIES to impute the partial nonresponse present in the second visit. This paper also assumes that the first-visit data are complete and that the pattern of nonresponse is Missing Completely at Random (MCAR). Data are MCAR if the probability of response to Y is unrelated to the value of Y itself or to any other variable, so that the missing data are randomly distributed across all cases (Musil et al., 2002). If the pattern of nonresponse does not satisfy the MCAR assumption, the imputation methods may not achieve their purpose.
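The MCAR assumption can be illustrated with a small simulation. The sketch below is not the procedure used in this paper; it only assumes that nonresponse is created by drawing a simple random sample of units to blank out, so that selection cannot depend on the value itself. The function name, the seed, and the income figures are hypothetical.

```python
import random

def set_mcar_nonresponse(values, rate, seed=0):
    """Return a copy of `values` with round(rate * n) entries set to None.

    Units are chosen by simple random sampling without replacement, so
    every unit has the same chance of being missing regardless of its
    value, which is what MCAR requires.
    """
    rng = random.Random(seed)
    n_missing = round(rate * len(values))
    missing_idx = set(rng.sample(range(len(values)), n_missing))
    return [None if i in missing_idx else v for i, v in enumerate(values)]

# Hypothetical second-visit incomes for ten households (illustrative only).
incomes = [52000, 61000, 75000, 88000, 93000,
           101000, 120000, 134000, 150000, 410000]

for rate in (0.10, 0.20, 0.30):
    masked = set_mcar_nonresponse(incomes, rate, seed=1)
    print(rate, sum(v is None for v in masked), "values set to nonresponse")
```

Because the selection ignores the values, the retained and deleted groups differ only by chance, which is the property the imputation comparison relies on.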

As for the imputation techniques, only four methods will be applied in this paper: Overall Mean Imputation (OMI), Hot Deck Imputation (HDI), Deterministic Regression Imputation (DRI), and Stochastic Regression Imputation (SRI). Other methods of handling nonresponse will not be covered in this paper.
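To make the difference between the four methods concrete, the following is a minimal sketch under simplifying assumptions that are not taken from this paper: a single complete auxiliary variable x (standing in for a first-visit value), a simple linear regression, one imputation class for the hot deck, and normally distributed residuals for SRI. The function names and the data are hypothetical.

```python
import random
import statistics

def fit_line(x, y):
    """Least-squares fit of y = a + b*x; returns (a, b, residual_sd)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    sd = (sum(r * r for r in resid) / (len(x) - 2)) ** 0.5
    return a, b, sd

def impute(x, y, method, seed=0):
    """Fill the None entries of y using the complete auxiliary variable x."""
    rng = random.Random(seed)
    obs_x = [xi for xi, yi in zip(x, y) if yi is not None]
    obs_y = [yi for yi in y if yi is not None]
    if method in ("DRI", "SRI"):
        a, b, sd = fit_line(obs_x, obs_y)
    out = []
    for xi, yi in zip(x, y):
        if yi is not None:
            out.append(yi)                      # respondent: keep reported value
        elif method == "OMI":
            out.append(statistics.fmean(obs_y)) # overall respondent mean
        elif method == "HDI":
            out.append(rng.choice(obs_y))       # value of a random donor
        elif method == "DRI":
            out.append(a + b * xi)              # regression prediction
        elif method == "SRI":
            # prediction plus a random residual (normality is an assumption)
            out.append(a + b * xi + rng.gauss(0.0, sd))
        else:
            raise ValueError(f"unknown method: {method}")
    return out

# Hypothetical data: x is complete, y has two nonrespondents.
x = [50, 60, 70, 80, 90, 100]
y = [55, None, 72, 85, None, 104]
for method in ("OMI", "HDI", "DRI", "SRI"):
    print(method, [round(v, 1) for v in impute(x, y, method)])
```

OMI and DRI give every nonrespondent a fixed value (the mean or a point on the fitted line), which tends to shrink the spread of the completed data, while HDI and SRI add variability through the random donor or the random residual.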


The evaluation of the efficacy and appropriateness of the four imputation methods will be limited to the following: (a) the bias of the mean of the imputed data, (b) an assessment of the distributions of the imputed versus the actual data, and (c) the criteria given in the report Compensating for Missing Data (Kalton, 1983), namely the Mean Deviation, the Mean Absolute Deviation, and the Root Mean Square Deviation.
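The three deviation criteria can be computed as in the sketch below. It assumes the usual definitions, with deviations taken as imputed minus actual value and averaged over the imputed cases only; the sign convention, the function name, and the numbers are assumptions for illustration.

```python
import math

def deviation_criteria(imputed, actual):
    """Return (MD, MAD, RMSD) over the imputed cases.

    imputed[i] is the value assigned to a deleted observation and
    actual[i] is the true value that was set to nonresponse.
    """
    diffs = [yhat - y for yhat, y in zip(imputed, actual)]
    m = len(diffs)
    md = sum(diffs) / m                              # signed average error
    mad = sum(abs(d) for d in diffs) / m             # average error size
    rmsd = math.sqrt(sum(d * d for d in diffs) / m)  # penalizes large errors
    return md, mad, rmsd

# Hypothetical imputed and actual values for three nonrespondents.
md, mad, rmsd = deviation_criteria([100.0, 120.0, 90.0], [110.0, 100.0, 95.0])
print(md, mad, rmsd)
```

MD measures whether the imputed values are too high or too low on average, while MAD and RMSD measure how close individual imputed values are to the actual ones. Under these definitions, |MD| ≤ MAD ≤ RMSD always holds, which is a useful sanity check on computed results.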

5.2 Formation of Imputation Classes

PROV (Provincial Area Codes)

Classes   Scope
39        Manila
74        Quezon City, Mandaluyong City, San Juan, Marikina, Pasig City
75        Caloocan, Malabon, Navotas, Valenzuela
76        Makati, Las Pinas, Muntinlupa, Paranaque, Pasay, Taguig, Pateros

CODEP1 (Recoded Total Employed Household Members)

Classes   Scope
0         No employed members
1         One to two employed members
2         Three to four employed members
3         At least five employed members

CODES1 (Recoded Education Status)

Classes   Scope
1         No grade completed until High School Graduate
2         College undergraduate or college graduate
3         Educational attainment higher than a bachelor's degree

Table 3:

Note: ….. CODIN1 stands for coded income for the first visit, while CODEX1 stands for coded expenditure for the first visit.

Candidate MV   Phi Coefficient     Cramer's V          Contingency Coefficient
               CODIN1   CODEX1     CODIN1   CODEX1     CODIN1   CODEX1
PROV           0.192    0.183      0.111    0.105      0.188    0.18
CODES1         0.386    0.408      0.273    0.288      0.36     0.378
CODEP1         0.295    0.216      0.17     0.125      0.283    0.211
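The three measures of association can all be derived from the chi-square statistic of a contingency table. The sketch below uses the standard formulas (phi = sqrt(chi2/n), V = sqrt(chi2/(n·min(r−1, c−1))), C = sqrt(chi2/(chi2+n))); the function name and the 2×2 table are hypothetical and are not FIES data.

```python
import math

def association_measures(table):
    """Return (phi, cramers_v, contingency_c) for a table of counts.

    `table` is a list of rows, e.g. candidate matching variable classes
    by coded income classes.
    """
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    chi2 = 0.0
    for row, rt in zip(table, row_tot):
        for observed, ct in zip(row, col_tot):
            expected = rt * ct / n
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(row_tot), len(col_tot)) - 1
    phi = math.sqrt(chi2 / n)
    cramers_v = math.sqrt(chi2 / (n * k))
    contingency_c = math.sqrt(chi2 / (chi2 + n))
    return phi, cramers_v, contingency_c

# Hypothetical 2x2 table of counts.
phi, v, c = association_measures([[30, 10], [10, 30]])
print(round(phi, 3), round(v, 3), round(c, 3))
```

Under these formulas V equals phi divided by sqrt(min(r−1, c−1)). The reported values are consistent with this: for PROV and CODIN1, 0.192/sqrt(3) ≈ 0.111, and for CODES1 and CODIN1, 0.386/sqrt(2) ≈ 0.273.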

Descriptive Statistics

VI       IC    Mean        Minimum     Maximum    Std. Dev    Valid n
TOTIN2   IC1   93588.32    9067.000    1340900    75619.52    2635
         IC2   186940.9    14490.00    4215480    281852.3    1434
         IC3   643191.2    54790.00    4357180    829409.3    61
TOTEX2   IC1   74866.68    9025.000    731937.0   47517.69    2635
         IC2   135510.8    13575.00    3203978    151984.3    1434
         IC3   413184.0    40505.00    2726603    532577.1    61


                Observations retained    Observations set to nonresponse (deleted)
VI       NRR    n       Mean             n       Mean
TOTEX2   10%    3717    102748.610       413     99160.235
         20%    3304    102219.791       826     103069.697
         30%    2891    100709.947       1239    106309.365
TOTIN2   10%    3717    134821.662       413     127799.121
         20%    3304    133624.722       826     136098.155
         30%    2891    130685.596       1239    142131.636
Table 9:

(a)      (b)    (c)         (d)        (e)         (f)          (g)
VI       NRR    BIAS(y')    PCD        MD          MAD          RMSD
TOTEX2   10%    491.91      100.00%    4919.40     78071.61     79251.22
         20%    179.42      96.90%     897.18      78292.63     67149.16
         30%    -606.37     0.00%      -2021.19    81395.79     71390.65
TOTIN2   10%    -717.52     100.00%    -7175.25    105369.15    242022.99
         20%    -3095.41    100.00%    -15477.09   111748.04    297151.50
         30%    -6508.65    1.00%      -21695.52   115087.13    313814.92
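The BIAS and MD columns of Table 9 appear to be linked by a simple identity. If BIAS is taken as the mean of the completed data minus the mean of the actual data, and MD as the average of (imputed − actual) over the imputed cases only, then BIAS = NRR × MD. For example, for TOTEX2 at 10%, 0.10 × 4919.40 ≈ 491.94, close to the reported 491.91. Both definitions are assumptions here, and the sketch below only demonstrates the identity on hypothetical numbers.

```python
import statistics

# Hypothetical actual values for ten households.
actual = [80.0, 95.0, 110.0, 120.0, 150.0, 60.0, 70.0, 200.0, 90.0, 125.0]

# Hypothetical imputed values for the two units set to nonresponse (20%).
imputed_for = {2: 104.0, 7: 170.0}

# Completed data: imputed value where available, actual value otherwise.
completed = [imputed_for.get(i, y) for i, y in enumerate(actual)]

bias = statistics.fmean(completed) - statistics.fmean(actual)
md = statistics.fmean([imputed_for[i] - actual[i] for i in imputed_for])
nrr = len(imputed_for) / len(actual)

print(bias, nrr * md)  # the two quantities agree up to rounding error
```

The identity holds because only the imputed cases contribute to the difference between the two means, so the bias of the mean shrinks with the share of imputed cases even when MD stays the same.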

Table 10:

(a)      (b)    (c)         (d)        (e)         (f)         (g)
VI       NRR    BIAS(y')    PCD        MD          MAD         RMSD
TOTEX2   10%    -720.46     100.00%    -7204.56    23839.82    57726.62
         20%    -1469.57    100.00%    -7347.86    23231.65    53180.02
         30%    -2266.38    100.00%    -7554.61    24082.88    59795.67
TOTIN2   10%    -1128.45    100.00%    -11284.46   32115.80    77228.48
         20%    -2211.82    100.00%    -11059.09   35274.03    114957.43
         30%    -4137.78    100.00%    -13792.60   34537.36    103253.12
Table 11:

(a)      (b)    (c)         (d)        (e)         (f)         (g)
VI       NRR    BIAS(y')    PCD        MD          MAD         RMSD
TOTEX2   10%    536.32      100.00%    5363.47     33683.48    70553.64
         20%    1080.12     98.40%     5400.71     33782.60    72487.39
         30%    398.39      100.00%    1328.06     32449.49    72803.60
TOTIN2   10%    897.11      100.00%    9043.98     51363.17    106374.39
         20%    -1815.39    100.00%    -9076.98    57429.24    148278.49
         30%    356.50      100.00%    1188.31     51886.73    131429.61

Figure 5:

[Chart: vertical axis from 0.00% to 100.00% in steps of 10%; horizontal categories TV, OMI, HDI3*, DRI3, SRI3*; legend classes: <37869.5; 37869.5 - 47056.5; 47056.5 - 54922.0; 54922.0 - 62365.0; 63265.0 - 73868.0; 73868.0 - 86103.0; 86103.0 - 101947.0; 101947.0 - 126254.5; 126254.5 - 169964.0; >169964]
