
Chapter 5

Results and Discussion

5.1 Descriptive Statistics of Second Visit Data Variables

The table below shows the descriptive statistics of the second visit variables of interest (VI), TOTEX2 and TOTIN2. These were computed to give a brief idea of how much a household spends and earns over a period of time, to measure the differences between the statistics of the two variables, and to compare the results with other tests later on. The descriptive statistics will also be used in assessing the imputation classes (IC), that is, how well the observations are grouped.

Table : Descriptive Statistics using the complete data set

The average total spending of a household in the National Capital Region (NCR) is about Php 102,389.80, while the average total earnings amount to Php 134,119.40, a difference of more than thirty thousand pesos. Observations of TOTIN2 are larger and more spread out than those of TOTEX2, as shown by its larger mean and standard deviation, respectively. The dispersion can also be seen from the minimum and maximum of the two variables. The range of TOTIN2 is more than four million, and the range of TOTEX2 is about one million lower, which is also a sign of the extreme variability of the observations.
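As an illustration of how such a summary could be produced, the following is a minimal sketch using only the Python standard library. The function name `describe` and the toy values are assumptions made for this example; they are not the survey data or the software used in the study.

```python
import statistics


def describe(values):
    """Return basic descriptive statistics for a list of numbers."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),  # sample standard deviation (n - 1 divisor)
        "min": min(values),
        "max": max(values),
        "range": max(values) - min(values),
    }


# Hypothetical toy values in pesos, NOT the actual TOTEX2 / TOTIN2 observations.
totex2 = [50_000, 80_000, 100_000, 120_000, 150_000]
totin2 = [40_000, 90_000, 130_000, 160_000, 250_000]

summary = {"TOTEX2": describe(totex2), "TOTIN2": describe(totin2)}
for name, stats in summary.items():
    print(name, stats)
```

Comparing the `sd` and `range` entries of the two variables mirrors the comparison of spread made in the paragraph above.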
5.2 Formation of Imputation Classes

The table below shows the results of the chi-square test, which was done to determine whether the candidate matching variables (MV) are associated with the VIs. As stated in the methodology, the MV must be highly correlated with the variables of interest. The first visit variables of interest, TOTIN1 and TOTEX1, were grouped into four categories in order to satisfy the assumptions of the association tests. TOTIN1 and TOTEX1 were tested for association rather than the second visit variables of interest because the second visit VIs already contained missing data.

The candidate matching variables tested are the provincial area codes (PROV), recoded education status (CODES1), and recoded total employed household members (CODEP1). PROV has four categories, CODES1 has three, and CODEP1 also has four. The original versions of CODES1 and CODEP1 had far more categories (more than 60 and 7, respectively), so these matching variables were recoded into a smaller number of groups.

The final categories for CODES1 are as follows:

IC1 = At most a high school graduate

IC2 = At most a college graduate, and other courses that were not specified

IC3 = Taking master's or doctoral degrees


Table : Tests of Association for Matching Variable:

The Chi-Square Test of Independence

Note: Values below the χ² statistics are the p-values.

The chi-square test of association between the candidates and the variables of interest showed that PROV, CODES1, and CODEP1 are associated with CODIN1 and CODEX1. The p-values for all the candidates were highly significant. The results of the succeeding tests of association will determine which of the three candidates will be chosen as the MV of the study.
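The chi-square test of independence used above can be sketched as follows. This is a minimal standard-library illustration of the Pearson statistic, not the procedure actually run for the study; the function name `chi_square_statistic` and the counts are hypothetical. The p-value would come from the chi-square distribution with the returned degrees of freedom, a step that is left to a statistics package and omitted here.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected

    dof = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, dof


# Hypothetical counts: rows = three classes of a candidate matching variable,
# columns = four categories of a grouped variable of interest.
counts = [
    [40, 30, 20, 10],
    [20, 25, 30, 25],
    [5, 10, 20, 40],
]
chi2, dof = chi_square_statistic(counts)
print(f"chi-square = {chi2:.2f}, df = {dof}")
```

A large statistic relative to its degrees of freedom corresponds to a small p-value, which is the sense in which the candidates were found to be associated with the grouped variables.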

The table below shows the other tests of association, namely the phi coefficient, Cramér's V, and the contingency coefficient. These were computed to assess the degree of association of the candidates with CODIN1 and CODEX1.

Table : Tests of Association for Matching Variable: Degree of Association


The table above displays the degree of association between the candidates and the variables of interest. All of the measures showed weak association. In real, complex data, the association between variables tends to be small or even absent. Across these measures, only CODES1 reached at least 20%, the level needed for a variable to be used in dividing the data into imputation classes. The matching variable for this study is therefore CODES1.
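The three degree-of-association measures are all functions of the chi-square statistic and the sample size. The sketch below uses their standard textbook definitions; the function name `association_measures` and the input numbers are hypothetical and are not taken from the study's tables.

```python
import math


def association_measures(chi2, n, n_rows, n_cols):
    """Phi coefficient, Cramer's V, and contingency coefficient from a chi-square statistic."""
    phi = math.sqrt(chi2 / n)
    # Cramer's V rescales by the smaller table dimension minus one, so it lies in [0, 1].
    cramers_v = math.sqrt(chi2 / (n * min(n_rows - 1, n_cols - 1)))
    contingency = math.sqrt(chi2 / (chi2 + n))
    return {"phi": phi, "cramers_v": cramers_v, "contingency": contingency}


# Hypothetical inputs: chi-square of 50 from a 3 x 4 table with 1000 observations.
measures = association_measures(chi2=50.0, n=1000, n_rows=3, n_cols=4)
for name, value in measures.items():
    print(f"{name}: {value:.3f}")
```

Values near 0 indicate weak association and values near 1 indicate strong association, which is how a threshold such as 20% (0.20) can be applied to these measures.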

To describe the CODES1 imputation classes in detail, descriptive statistics were computed for each imputation class. Table 5 shows the descriptive statistics of each imputation class of the data. These statistics indicate whether the best MV decreases the variability of the observations. To check the variability of each imputation class, its standard deviation is compared with the overall standard deviation of the variables of interest.

Table : Descriptive Statistics of the Data Grouped into Imputation Classes

The table above shows that, for both VIs, IC1, which has the largest number of observations, produced a smaller spread than the other two ICs. IC2 and IC3 produced large standard deviations; however, these are offset by the low value from IC1, which has the largest proportion of the data. The standard deviation and the mean of IC3 may be large because the majority of the extreme values were contained in that class.
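The comparison of each class's standard deviation with the overall standard deviation can be sketched as below. The records are hypothetical toy values chosen only to show the mechanics, and the function name `spread_by_class` is an assumption for this example.

```python
import statistics
from collections import defaultdict

# Hypothetical (imputation class, value) pairs, NOT the survey data.
records = [
    ("IC1", 60_000), ("IC1", 70_000), ("IC1", 80_000), ("IC1", 90_000),
    ("IC2", 100_000), ("IC2", 200_000), ("IC2", 300_000),
    ("IC3", 200_000), ("IC3", 900_000),
]


def spread_by_class(records):
    """Sample standard deviation of the values within each imputation class."""
    groups = defaultdict(list)
    for label, value in records:
        groups[label].append(value)
    return {label: statistics.stdev(values) for label, values in groups.items()}


overall_sd = statistics.stdev([value for _, value in records])
class_sd = spread_by_class(records)

for label, sd in class_sd.items():
    relation = "below" if sd < overall_sd else "above"
    print(f"{label}: sd = {sd:,.0f} ({relation} overall sd of {overall_sd:,.0f})")
```

A class whose standard deviation falls below the overall value is more homogeneous than the full data set, which is the property the matching variable is meant to deliver.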

5.2.1 Mean of the Simulated Data by Nonresponse Rate for Each Variable of Interest

The table below shows the means of both VIs under the varying rates of nonresponse. These were generated to briefly describe the effect of the nonresponse rate on the population mean when the missing values are ignored. More importantly, the results below will serve as input in comparing the estimates from the imputed data for each imputation method (IM).

Table : Means of the Retained and Deleted Observations

The means of the observations set to nonresponse and of the observations retained showed contrasting results. For both sets, as the nonresponse rate gets larger, the mean of the observations set to nonresponse increases. Conversely, the mean of the observations retained decreases as the nonresponse rate increases. It is possible that large values were set to nonresponse, which increased the means of the observations set to nonresponse across the varying rates. Comparing the means across the varying nonresponse rates for each VI, the results showed little difference between the population mean ignoring the missing data and the population mean of the actual data. However, consistent with the description above, as the number of missing values increases, the deviation between the means of the actual and retained data slowly increases.
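The split into retained and deleted observations can be illustrated with a small simulation. This is only a sketch under stated assumptions: observations are set to nonresponse completely at random, the rates of 10%, 20%, and 30% are placeholders, and the skewed toy population is generated rather than taken from the survey. The function name `split_by_nonresponse` is hypothetical, and the study's actual nonresponse mechanism may differ.

```python
import random
import statistics


def split_by_nonresponse(values, rate, seed=0):
    """Randomly set a share of observations to nonresponse; return (retained, deleted)."""
    rng = random.Random(seed)
    n_missing = round(len(values) * rate)
    missing_idx = set(rng.sample(range(len(values)), n_missing))
    retained = [v for i, v in enumerate(values) if i not in missing_idx]
    deleted = [v for i, v in enumerate(values) if i in missing_idx]
    return retained, deleted


# Hypothetical right-skewed population of 1000 values (assumption, not survey data).
pop_rng = random.Random(42)
population = [pop_rng.lognormvariate(11.5, 0.7) for _ in range(1000)]
population_mean = statistics.mean(population)

results = {}
for rate in (0.10, 0.20, 0.30):  # placeholder nonresponse rates
    retained, deleted = split_by_nonresponse(population, rate, seed=1)
    results[rate] = {
        "n_retained": len(retained),
        "n_deleted": len(deleted),
        "mean_retained": statistics.mean(retained),
        "mean_deleted": statistics.mean(deleted),
    }
    print(rate, results[rate])
```

Because the retained and deleted means average back to the population mean when weighted by their counts, a deleted mean above the population mean forces the retained mean below it, which is the contrasting pattern described above.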
