Revised - Again Chapter 5

Chapter 5
Results and Discussion
5.1 Descriptive Statistics of Second Visit Data Variables
Table shows the descriptive statistics of the second visit variables of interest (VI),
TOTEX2 and TOTIN2. These were computed to give a brief idea of how much a household
spends and earns in a period of time, to measure the differences between the statistics
of the two variables, and to serve as a basis for comparison with later tests. The
descriptive statistics will also be used to evaluate the imputation classes (IC), that is,
how well the observations are grouped.
Table : Descriptive Statistics using the complete data set
The average total spending of a household in the National Capital Region (NCR) is about
Php 102,389.80, while the average total earnings amounted to Php 134,119.40, a difference
of more than thirty thousand pesos. Observations of TOTIN2 are larger and more spread out
than those of TOTEX2, as shown by its larger mean and standard deviation, respectively.
The dispersion can also be seen from the minimum and maximum of the two variables. The
range of TOTIN2 is more than four million pesos, while the range of TOTEX2 is one million
lower, which may also be a sign of the extreme variability of the observations.
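The summary measures discussed above (mean, standard deviation, minimum, maximum, and
range) can be sketched as follows. This is a minimal illustration only: the function name
`describe` is hypothetical, the values are made up and are not the survey data, and the
sample (n - 1) standard deviation is an assumption, since the text does not say which
divisor was used.

```python
import statistics

def describe(values):
    """Return the summary statistics discussed in the text for one variable."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "std": statistics.stdev(values),  # sample standard deviation (n - 1 divisor)
        "min": min(values),
        "max": max(values),
        "range": max(values) - min(values),
    }

# Made-up household values in pesos, for illustration only (not the survey data).
totex2 = [50_000, 80_000, 120_000, 150_000]
totin2 = [60_000, 90_000, 160_000, 250_000]

summary_ex = describe(totex2)
summary_in = describe(totin2)
print(summary_ex)
print(summary_in)
```

Comparing the "std" and "range" entries of the two summaries mirrors the comparison of
spread between TOTEX2 and TOTIN2 made in the text.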
5.2 Formation of Imputation Classes
Table shows the results of the chi-square test, which was done to determine whether the
candidate matching variables (MV) are associated with the VIs. The MV stated in the
methodology must be highly correlated with the variables of interest. The first visit
variables of interest, TOTIN1 and TOTEX1, were grouped into four categories in order to
satisfy the assumptions of the association tests. TOTIN1 and TOTEX1 were tested for
association instead of the second visit variables of interest because the second visit
VIs already contained missing data.
The candidate matching variables tested were the provincial area codes (PROV), recoded
education status (CODES1), and recoded total employed household members (CODEP1). PROV
has four categories, CODES1 has three, and CODEP1 also has four. The original CODES1 and
CODEP1 had more than 60 and 7 categories, respectively. Because these categories were too
numerous, the matching variables were recoded into smaller groups.
The final categories for CODES1 are as follows:
IC1 = At most a high school graduate
IC2 = At most a college graduate, and other courses that were not specified
IC3 = Taking Masters and Doctoral Degrees

Table : Tests of Association for Matching Variable:
The Chi-Square Test of Independence
Note: Values below the χ2 statistics are the p-values.
The chi-square test of association showed that PROV, CODES1, and CODEP1 are each
associated with CODIN1 and CODEX1. The p-values for all the candidates were highly
significant. The results of the succeeding tests of association will determine which of
the three candidates will be chosen as the MV of the study.
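As a rough sketch of how a chi-square test of independence is computed from a
cross-tabulation of a candidate MV against a categorized VI, the Pearson statistic and its
degrees of freedom can be written as below. The function name `chi_square_statistic` and
the counts are hypothetical and are not taken from the study. The p-value is not computed
here, since it requires the chi-square distribution function.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of the row and column variables.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, dof

# Made-up 2 x 2 counts for illustration; the tables in the study are larger.
example_table = [[30, 10],
                 [10, 30]]
chi2, dof = chi_square_statistic(example_table)
print(chi2, dof)
```

A large statistic relative to its degrees of freedom corresponds to a small p-value, which
is the sense in which the candidates are reported as significantly associated with the
VIs.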
Table shows the other tests of association, namely the phi coefficient, Cramér's V, and
the contingency coefficient. These measures were computed to assess the degree of
association of the candidates with CODIN1 and CODEX1.
Table : Tests of Association for Matching Variable: Degree of Association

The table above displays the degree of association between the candidates and the
variables of interest. All of the tests showed weak association. In real, complex data,
the association between variables tends to be small or even absent. Across these tests,
only CODES1 reached the minimum of 20% required for a variable to be used in dividing
the data into imputation classes. CODES1 is therefore the matching variable for this
study.
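The three measures of degree of association named above are all functions of the
chi-square statistic. The sketch below uses the common textbook definitions. These are
assumed here, since the chapter does not state the formulas it used, and the function name
`association_measures` and the input values are hypothetical.

```python
import math

def association_measures(chi2, n, n_rows, n_cols):
    """Phi coefficient, Cramer's V, and contingency coefficient from a chi-square statistic.

    chi2   : Pearson chi-square statistic of the r x c table
    n      : total number of observations in the table
    n_rows : number of row categories (r)
    n_cols : number of column categories (c)
    """
    phi = math.sqrt(chi2 / n)
    cramers_v = math.sqrt(chi2 / (n * min(n_rows - 1, n_cols - 1)))
    contingency = math.sqrt(chi2 / (chi2 + n))
    return phi, cramers_v, contingency

# Made-up inputs for illustration: chi-square of 20 on 80 observations in a 2 x 2 table.
phi, cramers_v, contingency = association_measures(20.0, 80, 2, 2)
print(phi, cramers_v, contingency)
```

Under these definitions, a value near 0 indicates weak association, so a 20% threshold
like the one applied to CODES1 would correspond to a measure of at least 0.20.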
To describe the CODES1 imputation classes in detail, descriptive statistics were computed
for each imputation class. Table 5 shows the descriptive statistics of each imputation
class of the data. These statistics indicate whether the chosen MV decreases the
variability of the observations. To check the variability of each imputation class, its
standard deviation is compared with the overall standard deviation of the variables of
interest.
Table : Descriptive Statistics of the Data Grouped into Imputation Classes
The table above shows that, for both VIs, IC1, which has the largest number of
observations, produced a smaller spread than the other two ICs. IC2 and IC3 produced
large standard deviations; however, these are offset by the low value from IC1, which
holds the largest proportion of the data. The standard deviation and the mean of IC3 may
be large because the majority of the extreme values fall in that class.
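The comparison of within-class spread against the overall spread can be sketched as
follows. The function name `spread_by_class`, the class labels attached to each value,
and the values themselves are made up for illustration, and the sample standard deviation
is assumed.

```python
import statistics
from collections import defaultdict

def spread_by_class(values, classes):
    """Sample standard deviation of the values within each imputation class."""
    groups = defaultdict(list)
    for value, label in zip(values, classes):
        groups[label].append(value)
    return {label: statistics.stdev(group) for label, group in groups.items()}

# Made-up values and class labels, for illustration only (not the survey data).
values = [10, 12, 14, 50, 90, 200, 400]
classes = ["IC1", "IC1", "IC1", "IC2", "IC2", "IC3", "IC3"]

overall_std = statistics.stdev(values)
by_class = spread_by_class(values, classes)
for label, std in by_class.items():
    # A class reduces variability when its spread is below the overall spread.
    print(label, std, std < overall_std)
```

A class whose standard deviation falls below the overall value groups similar
observations together, which is the property the imputation classes are meant to have.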
5.2.1 Mean of the Simulated Data by Nonresponse Rate for Each Variable of Interest
Table shows the means of both VIs under varying rates of nonresponse. These were
generated to briefly describe the effect of the nonresponse rate on the population mean
when the missing values are ignored. More importantly, the results below will serve as
input in comparing the estimates from the imputed data for each imputation method (IM).
Table : Means of the Retained and Deleted Observations
The means of the observations set to nonresponse and of the observations retained showed
contrasting results. As the nonresponse rate gets larger, the mean of the observations
set to nonresponse increases. Conversely, the mean of the observations retained decreases
as the nonresponse rate increases. It is possible that large values were set to
nonresponse, which increased the means of the sets containing nonresponse across the
varying rates. Comparing the means for the varying nonresponse rates under each VI, the
results showed little difference between the population mean ignoring the missing data
and the population mean of the actual data. However, as noted above, as the number of
missing values increases, the deviation between the means of the actual and retained data
slowly increases.
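The comparison of retained and deleted observations can be illustrated with a small
simulation. This sketch assumes that observations are set to nonresponse by simple random
selection, which the section does not state. The function name `split_by_nonresponse`,
the nonresponse rates, and the data are also hypothetical.

```python
import random
import statistics

def split_by_nonresponse(values, rate, seed=0):
    """Randomly set a fraction `rate` of the values to nonresponse.

    Returns (retained, deleted). Simple random selection is an assumption
    made for illustration, not the mechanism documented in the study.
    """
    rng = random.Random(seed)
    n_missing = round(len(values) * rate)
    missing_idx = set(rng.sample(range(len(values)), n_missing))
    retained = [v for i, v in enumerate(values) if i not in missing_idx]
    deleted = [v for i, v in enumerate(values) if i in missing_idx]
    return retained, deleted

# Made-up data and rates, for illustration only.
data = list(range(1, 101))
actual_mean = statistics.mean(data)
for rate in (0.1, 0.2, 0.3):
    retained, deleted = split_by_nonresponse(data, rate)
    print(rate, statistics.mean(retained), statistics.mean(deleted), actual_mean)
```

Comparing the mean of the retained values with the mean of the full data at each rate
corresponds to the deviation between the retained and actual means described above.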
