Corrections to the Thesis, as Well as to the Tables, Graphs, Etc.

1.2 Statement of the Problem
This paper attempts to answer the following questions:
1. Which imputation technique is the most appropriate for handling partial nonresponse in the FIES data?
2. How do varying nonresponse rates affect the results of each imputation method?
1.3 Objectives of the Study
The paper will attempt to achieve the following objectives:
1. To compare four imputation techniques, namely overall mean imputation, hot deck imputation, and deterministic and stochastic regression imputation, in compensating for partial nonresponse in the FIES.
2. To investigate the effect of varying rates of missing observations, particularly 10%, 20% and 30% nonresponse rates, on the precision of the estimates.
1.4 Significance of the Study
Nonresponse is a common problem in conducting surveys. It produces incomplete data, which can pose serious problems during data analysis, particularly in generating statistically reliable estimates. Imputation techniques make it possible to account for the differences between respondents and nonrespondents, which in turn helps reduce nonresponse bias in the survey estimates.
Since most statistical packages require complete data before any data-analysis procedure can be run, imputation can also ensure consistency of results across analyses, something that an incomplete data set cannot fully provide.
In a news article entitled "Topmost Floor of the NSO Building gutted by Fire," posted on Manila Bulletin Online, Obanil (2006) reported that on October 3, 2006, around 1 million pesos' worth of documents were destroyed by fire. Among these documents were the first-visit FIES questionnaires for the NCR, which had not yet been encoded at the time of the fire.
In terms of statistical practice, most countries in the developed world, such as the United States, Canada, the United Kingdom and the Netherlands, already employ imputation techniques in their national statistical offices. In a country such as the Philippines, where data collection is very difficult, especially in regions like the National Capital Region (NCR), imputation can ease the problems of data collection and nonresponse.
More importantly, given the great impact of this survey on the country, imputation techniques will give statisticians a method for handling nonresponse, which could lead to more meaningful generalizations about the country's income distribution, spending patterns and poverty incidence. Estimates with less bias and more consistent results can, in turn, help policymakers and economists provide better solutions for improving the lives of Filipinos.
1.5 Scope and Limitations
Throughout this paper, only data from the 1997 Family Income and Expenditure Survey (FIES) will be used to tackle the problem of nonresponse and to examine the impact of the different imputation methods applied to the dataset. With regard to the extent to which these imputation methods will be applied and evaluated, this paper will cover only the partial nonresponse occurring in the National Capital Region (NCR), since the NCR is noted as the region with the highest nonresponse rate. The variables to be imputed in this study are Total Income (TOTIN2) and Total Expenditures (TOTEX2) from the second visit of the FIES.
The researchers will focus only on using the 1997 FIES first-visit data to impute the partial nonresponse present in the second visit. This paper also assumes that the first-visit data are complete and that the nonresponse pattern is Missing Completely at Random (MCAR). MCAR holds if the probability of response to Y is unrelated to the value of Y itself or to any other variable, so that the missing data are randomly distributed across all cases (Musil et al., 2002). If the nonresponse pattern does not satisfy the MCAR assumption, the imputation methods may not achieve their purpose.
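Under the MCAR assumption, the second-visit records treated as nonrespondents can be produced by deleting a simple random sample of records, which is one way to create the 10%, 20% and 30% nonresponse rates named in the objectives. The following is a minimal Python sketch, assuming the values sit in a plain list; the function name `set_mcar_nonresponse`, the seed and the placeholder data are illustrative and not the study's actual procedure.

```python
import random

def set_mcar_nonresponse(values, nrr, seed=None):
    """Simulate MCAR item nonresponse (illustrative helper, not the study's code).

    A simple random sample of round(nrr * n) positions is drawn without
    replacement and set to None, so whether a case is deleted does not
    depend on its own value or on any other variable.
    """
    rng = random.Random(seed)
    n = len(values)
    m = round(nrr * n)                         # number of cases set to nonresponse
    deleted = set(rng.sample(range(n), m))     # every m-subset is equally likely
    masked = [None if i in deleted else v for i, v in enumerate(values)]
    return masked, sorted(deleted)

# Placeholder data with the same number of records as the NCR file (not real FIES values)
placeholder = [float(i) for i in range(4130)]
for nrr in (0.10, 0.20, 0.30):
    masked, deleted = set_mcar_nonresponse(placeholder, nrr, seed=2024)
    print(f"NRR {nrr:.0%}: {len(deleted)} cases set to nonresponse")
```

With 4130 records the three rates give 413, 826 and 1239 deleted cases. Because the draw ignores the values themselves, it satisfies MCAR by construction; a real nonresponse mechanism need not.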
As for the imputation techniques, only four methods will be applied in this paper, namely Overall Mean Imputation (OMI), Hot Deck Imputation (HDI), Deterministic Regression Imputation (DRI) and Stochastic Regression Imputation (SRI). Other methods of handling nonresponse will not be covered.
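The four methods can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the study's implementation: hot deck is shown as a random draw of a donor within imputation classes, the regression methods use a single first-visit covariate `x` (which covariate the study used is not stated here), and SRI adds a normal residual with the respondents' residual standard deviation, one common choice among several. All function names are hypothetical.

```python
import random
import statistics

def overall_mean_impute(y):
    """OMI: replace each missing value (None) with the mean of the respondents."""
    ybar = statistics.fmean(v for v in y if v is not None)
    return [ybar if v is None else v for v in y]

def hot_deck_impute(y, classes, seed=None):
    """HDI (random hot deck within classes): each missing value is copied from
    a respondent drawn at random from the same imputation class.
    Raises KeyError if a class has no respondents to act as donors."""
    rng = random.Random(seed)
    donors = {}
    for v, c in zip(y, classes):
        if v is not None:
            donors.setdefault(c, []).append(v)
    return [rng.choice(donors[c]) if v is None else v for v, c in zip(y, classes)]

def fit_line(y, x):
    """Least-squares fit y = a + b*x on respondents; returns a, b and residual SD."""
    pairs = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    xbar = statistics.fmean(p[0] for p in pairs)
    ybar = statistics.fmean(p[1] for p in pairs)
    sxx = sum((xi - xbar) ** 2 for xi, _ in pairs)
    b = sum((xi - xbar) * (yi - ybar) for xi, yi in pairs) / sxx
    a = ybar - b * xbar
    sse = sum((yi - a - b * xi) ** 2 for xi, yi in pairs)
    s = (sse / (len(pairs) - 2)) ** 0.5        # assumes at least 3 respondents
    return a, b, s

def deterministic_regression_impute(y, x):
    """DRI: replace each missing value with its predicted value a + b*x."""
    a, b, _ = fit_line(y, x)
    return [a + b * xi if v is None else v for v, xi in zip(y, x)]

def stochastic_regression_impute(y, x, seed=None):
    """SRI: predicted value plus a random normal residual (assumed error model)."""
    rng = random.Random(seed)
    a, b, s = fit_line(y, x)
    return [a + b * xi + rng.gauss(0.0, s) if v is None else v
            for v, xi in zip(y, x)]
```

OMI gives every nonrespondent the same value and DRI places every imputed value on the regression line, so both tend to shrink the spread of the completed data; the random components of HDI and SRI are meant to preserve more of that variability.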

The evaluation of the efficacy and appropriateness of the four imputation methods will be limited to the following: (a) the bias of the mean of the imputed data, (b) an assessment of the distributions of the imputed versus the actual data, and (c) the criteria given in the report entitled Compensating for Missing Data (Kalton, 1983), namely the Mean Deviation, Mean Absolute Deviation and Root Mean Square Deviation.
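The deviation criteria in (c) compare each imputed value ŷ_i with the actual value y_i it replaced. The sketch below assumes the usual definitions (MD is the mean of ŷ_i - y_i, MAD the mean of |ŷ_i - y_i|, RMSD the square root of the mean of (ŷ_i - y_i)², each over the m imputed cases) and takes the bias of the mean as the completed-data mean minus the actual-data mean; the exact forms in Kalton (1983) should be checked before reuse. The function names are hypothetical.

```python
import math

def deviation_criteria(imputed, actual):
    """MD, MAD and RMSD between imputed and actual values of the m imputed cases."""
    d = [yhat - y for yhat, y in zip(imputed, actual)]
    m = len(d)
    md = sum(d) / m                               # signed: over- vs under-imputation
    mad = sum(abs(e) for e in d) / m              # average size of the error
    rmsd = math.sqrt(sum(e * e for e in d) / m)   # weights large errors more heavily
    return md, mad, rmsd

def bias_of_mean(completed, actual_full):
    """Mean of the completed (observed + imputed) data minus mean of the actual data."""
    return sum(completed) / len(completed) - sum(actual_full) / len(actual_full)

# Toy example: the last two of four cases were imputed
actual = [100.0, 200.0, 300.0, 400.0]
completed = [100.0, 200.0, 310.0, 380.0]
print(deviation_criteria(completed[2:], actual[2:]))
print(bias_of_mean(completed, actual))
```

MD keeps the sign of the errors, so positive and negative errors can cancel; MAD and RMSD do not, and RMSD is never smaller than MAD for the same set of deviations.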
5.2 Formation of Imputation Classes
PROV (Provincial Area Codes)
Classes   Scope
39        Manila
74        Quezon City, Mandaluyong City, San Juan, Marikina, Pasig City
75        Caloocan, Malabon, Navotas, Valenzuela
76        Makati, Las Pinas, Muntinlupa, Paranaque, Pasay, Taguig, Pateros
CODEP1 (Recoded Total Employed Household Members)
Classes   Scope
0         No employed members
1         One to two employed members
2         Three to four employed members
3         At least five employed members
CODES1 (Recoded Education Status)
Classes   Scope
1         No grade completed up to high school graduate
2         College undergraduate or college graduate
3         Educational attainment higher than a bachelor's degree
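The CODEP1 recode is a simple binning of the number of employed household members. A minimal sketch, assuming that count is available as a non-negative integer per household; the function name is hypothetical and the source variable's name is not given in this section.

```python
def recode_employed(total_employed):
    """Map a household's count of employed members to its CODEP1 class (hypothetical helper)."""
    if total_employed < 0:
        raise ValueError("number of employed members cannot be negative")
    if total_employed == 0:
        return 0      # no employed members
    if total_employed <= 2:
        return 1      # one to two employed members
    if total_employed <= 4:
        return 2      # three to four employed members
    return 3          # at least five employed members

print([recode_employed(k) for k in range(7)])
```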
Table 3:
Note: ….. CODIN1 stands for coded income for the first visit, while CODEX1 stands for coded expenditure for the first visit.

Candidate MV   Phi-Coefficient     Cramer's V          Contingency Coefficient
               CODIN1   CODEX1     CODIN1   CODEX1     CODIN1   CODEX1
PROV           0.192    0.183      0.111    0.105      0.188    0.18
CODES1         0.386    0.408      0.273    0.288      0.36     0.378
CODEP1         0.295    0.216      0.17     0.125      0.283    0.211
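The three measures in Table 3 are all derived from the Pearson chi-square statistic χ² of a cross-tabulation with n cases: φ = √(χ²/n), Cramer's V = φ/√(min(r-1, c-1)) for an r × c table, and the contingency coefficient C = √(χ²/(χ² + n)). A pure-Python sketch under these textbook definitions; the function names are hypothetical and the inputs are counts, not the study's data.

```python
import math

def chi_square(table):
    """Pearson chi-square statistic and total count for an r x c table of counts."""
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_tot[i] * col_tot[j] / n     # expected count under independence
            chi2 += (obs - exp) ** 2 / exp
    return chi2, n

def association_measures(table):
    """Return (phi, Cramer's V, contingency coefficient) for a table of counts."""
    chi2, n = chi_square(table)
    r, c = len(table), len(table[0])
    phi = math.sqrt(chi2 / n)
    cramers_v = phi / math.sqrt(min(r - 1, c - 1))
    contingency = math.sqrt(chi2 / (chi2 + n))
    return phi, cramers_v, contingency

print(association_measures([[20, 10, 5], [10, 20, 15], [5, 10, 25], [3, 7, 30]]))
```

The PROV and CODES1 rows of Table 3 fit these relations: V ≈ φ/√3 for PROV (0.192/√3 ≈ 0.111) and V ≈ φ/√2 for CODES1 (0.386/√2 ≈ 0.273), one less than their numbers of classes (four and three).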
p.56 (changed font and font size)
Descriptive Statistics
VI      IC    Mean        Minimum     Maximum     Std. Dev.   Valid n
TOTIN2  IC1   93588.32    9067.000    1340900     75619.52    2635
        IC2   186940.9    14490.00    4215480     281852.3    1434
        IC3   643191.2    54790.00    4357180     829409.3    61
TOTEX2  IC1   74866.68    9025.000    731937.0    47517.69    2635
        IC2   135510.8    13575.00    3203978     151984.3    1434
        IC3   413184.0    40505.00    2726603     532577.1    61
p.57 (edited, changed font and font size)
VI      NRR   Observations retained     Observations set to nonresponse (deleted)
              n       Mean              n       Mean
TOTEX2  10%   3717    102748.610        413     99160.235
        20%   3304    102219.791        826     103069.697
        30%   2891    100709.947        1239    106309.365
TOTIN2  10%   3717    134821.662        413     127799.121
        20%   3304    133624.722        826     136098.155
        30%   2891    130685.596        1239    142131.636
Table 9:
(a)     (b)   (c)        (d)       (e)         (f)         (g)
VI      NRR   BIAS(ȳ')   PCD       MD          MAD         RMSD
TOTEX2  10%   491.91     100.00%   4919.40     78071.61    79251.22
        20%   179.42     96.90%    897.18      78292.63    67149.16
        30%   -606.37    0.00%     -2021.19    81395.79    71390.65
TOTIN2  10%   -717.52    100.00%   -7175.25    105369.15   242022.99
        20%   -3095.41   100.00%   -15477.09   111748.04   297151.50
        30%   -6508.65   1.00%     -21695.52   115087.13   313814.92
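Tables 9 to 11 report both BIAS(ȳ') and MD. Assuming BIAS(ȳ') is the mean of the completed data set minus the mean of the actual data, and MD is the average of ŷ_i - y_i over the m imputed cases out of n, the two are linked by the nonresponse rate m/n, because the respondents' values appear in both means and cancel. The symbols R (respondents) and M (imputed cases) are notation introduced for this derivation.

```latex
\bar{y}' - \bar{y}
  = \frac{1}{n}\Big(\sum_{i \in R} y_i + \sum_{i \in M} \hat{y}_i\Big)
  - \frac{1}{n}\Big(\sum_{i \in R} y_i + \sum_{i \in M} y_i\Big)
  = \frac{m}{n} \cdot \frac{1}{m}\sum_{i \in M} \left(\hat{y}_i - y_i\right)
  = \frac{m}{n}\,\mathrm{MD}
```

For example, for TOTIN2 at 30% NRR in Table 9, 0.30 × (-21695.52) ≈ -6508.66, against the tabulated -6508.65.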
Table 10:
(a)     (b)   (c)        (d)       (e)         (f)         (g)
VI      NRR   BIAS(ȳ')   PCD       MD          MAD         RMSD
TOTEX2  10%   -720.46    100.00%   -7204.56    23839.82    57726.62
        20%   -1469.57   100.00%   -7347.86    23231.65    53180.02
        30%   -2266.38   100.00%   -7554.61    24082.88    59795.67
TOTIN2  10%   -1128.45   100.00%   -11284.46   32115.80    77228.48
        20%   -2211.82   100.00%   -11059.09   35274.03    114957.43
        30%   -4137.78   100.00%   -13792.60   34537.36    103253.12
Table 11:
(a)     (b)   (c)        (d)       (e)         (f)         (g)
VI      NRR   BIAS(ȳ')   PCD       MD          MAD         RMSD
TOTEX2  10%   536.32     100.00%   5363.47     33683.48    70553.64
        20%   1080.12    98.40%    5400.71     33782.60    72487.39
        30%   398.39     100.00%   1328.06     32449.49    72803.60
TOTIN2  10%   897.11     100.00%   9043.98     51363.17    106374.39
        20%   -1815.39   100.00%   -9076.98    57429.24    148278.49
        30%   356.50     100.00%   1188.31     51886.73    131429.61
Figure 5: [chart not reproduced; y-axis 0.00% to 100.00%; ten classes from <37869.5 to >169964; series TV, OMI, HDI3*, DRI3 and SRI3*]
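Figure 5 places the true values (TV) and the completed data from each method into the same ten classes. A sketch of that tabulation in Python, assuming the class boundaries are supplied as a sorted list of cut points and that a value equal to a boundary belongs to the upper class (the figure does not say how ties are handled); the function names are hypothetical.

```python
from bisect import bisect_right

def class_percentages(values, cuts):
    """Percentage of `values` in each class defined by sorted cut points.

    Class 0 is v < cuts[0], class k is cuts[k-1] <= v < cuts[k],
    and the last class is v >= cuts[-1] (len(cuts) + 1 classes in all).
    """
    counts = [0] * (len(cuts) + 1)
    for v in values:
        counts[bisect_right(cuts, v)] += 1   # index of the class containing v
    n = len(values)
    return [100.0 * c / n for c in counts]

def compare_distributions(series, cuts):
    """Class percentages for several named series, e.g. TV and each method."""
    return {name: class_percentages(vals, cuts) for name, vals in series.items()}
```

With nine cut points this yields ten classes, as in the figure. A method such as OMI, which assigns one value to every nonrespondent, places all of its imputed cases in a single class.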
