Table shows the descriptive statistics of the second visit variables of interest (VI),
TOTEX2 and TOTIN2. These were computed to give a brief idea of how much a household
spends and earns over a period of time, to measure the differences between the statistics
of the two variables, and to compare the results with other tests later on. These
descriptive statistics will also be used in comparing the results of the imputation classes (IC).
The average total spending of a household in the National Capital Region (NCR) is about
Php 102,389.80, while the average total earnings amount to Php 134,119.40, a difference of
more than thirty thousand pesos. Observations of TOTIN2 are larger and more spread out
than those of TOTEX2, as shown by its larger mean and standard deviation, respectively.
The dispersion can also be seen from the minimum and maximum of the two variables. The
range of TOTIN2 (maximum minus minimum) exceeds four million pesos, while the range of
TOTEX2 is about one million pesos lower, which is also a sign of the extreme variability
of the observations.
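As an illustration only, the summary measures discussed above (mean, standard deviation, minimum, maximum and range) can be sketched in Python. The values below are hypothetical pesos amounts, not the survey data, and the helper name `describe` is an assumption introduced for this sketch.

```python
import statistics


def describe(values):
    """Return the mean, sample standard deviation, minimum, maximum and range."""
    lo, hi = min(values), max(values)
    return {
        "mean": statistics.mean(values),
        "std": statistics.stdev(values),  # sample (n - 1) standard deviation
        "min": lo,
        "max": hi,
        "range": hi - lo,
    }


# Hypothetical household values in pesos, NOT the actual survey observations.
totex2 = [52000.0, 88000.0, 97000.0, 110000.0, 165000.0]
totin2 = [60000.0, 95000.0, 120000.0, 150000.0, 420000.0]

for name, values in (("TOTEX2", totex2), ("TOTIN2", totin2)):
    summary = describe(values)
    print(name, {key: round(val, 2) for key, val in summary.items()})
```

A variable with both a larger mean and a larger standard deviation, as TOTIN2 has in the text, would show up here as larger `mean`, `std` and `range` entries.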
5.2 Formation of Imputation Classes
Table shows the results of the chi-square test, which was done to determine whether the
candidate matching variables (MV) are associated with the VIs. The MV stated in the
methodology must be highly correlated with the variables of interest. The first visit
variables of interest, TOTIN1 and TOTEX1, were grouped into four categories in order to
satisfy the assumptions of the association tests. TOTIN1 and TOTEX1 were used as the
variables to be tested for association rather than second visit variables of interest since the
The candidate matching variables tested were the provincial area codes
(PROV), recoded education status (CODES1) and recoded total employed household
members (CODEP1). PROV has four categories, CODES1 has three, and CODEP1
also has four. Originally, CODES1 and CODEP1 had more categories than they have now.
Since the original matching variables had numerous categories (in CODES1 and
CODEP1, there were more than 60 and 7 categories, respectively), the matching variables
IC2 = At most a college graduate, and other courses that were not specified
The chi-square test of association between the candidates and the variables of interest
showed that PROV, CODES1 and CODEP1 are associated with CODIN1 and CODEX1.
The p-values for all the candidates were highly significant. The results of the succeeding
tests of association will determine which of the three candidates will be chosen as the MV
of the study.
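The chi-square statistic, together with the strength-of-association measures that are commonly derived from it, can be sketched as follows. This is a minimal illustration under stated assumptions: the contingency table is hypothetical (three categories of a matching variable against four groups of a variable of interest), the function names are introduced only for this sketch, and the p-value is omitted because it requires the chi-square distribution from a statistics library.

```python
import math


def chi_square(table):
    """Pearson chi-square statistic, degrees of freedom and total count."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof, n


def association_measures(table):
    """Phi, Cramer's V and the contingency coefficient from chi-square."""
    stat, dof, n = chi_square(table)
    k = min(len(table), len(table[0]))  # smaller table dimension
    return {
        "chi2": stat,
        "dof": dof,
        "phi": math.sqrt(stat / n),
        "cramers_v": math.sqrt(stat / (n * (k - 1))),
        "contingency": math.sqrt(stat / (stat + n)),
    }


# Hypothetical counts, NOT the study's data.
table = [
    [40, 30, 20, 10],
    [25, 25, 25, 25],
    [10, 20, 30, 40],
]
for name, value in association_measures(table).items():
    print(f"{name}: {value:.4f}")
```

A large chi-square statistic can coexist with a small Cramér's V when the sample is large, which is one reason a significant test is usually followed by a measure of the degree of association.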
Table shows the other tests of association, namely the phi coefficient, Cramér's V and
the contingency coefficient. These tests were done in order to assess the degree of
association between the candidates and the variables of interest. All of the tests showed
weak association. In real, complex data, the association between variables tends to be
small or even absent. Across these other tests of association, only CODES1 reached the
minimum of 20% needed for it to be used in dividing the data into imputation classes. The matching
statistics for each imputation class were computed. Table 5 shows the descriptive statistics
of each imputation class of the data. These statistics indicate whether the best MV
decreases the variability of the observations. In checking the variability of each
imputation class, the standard deviation will be used and compared with the value from
The table above shows that, for both VIs, IC1, which has the largest number of
observations, produced a smaller spread than the other two ICs. IC2 and IC3 produced
large standard deviations; however, these are offset by the low value from IC1, which
holds the largest proportion of the data. The standard deviation and the mean of IC3
may be large because the majority of the extreme values
Variable of Interest
Table shows the means of both VIs under the varying rates of nonresponse.
This was generated to give a brief description of the effect of the nonresponse rate on the
population mean when the missing values are ignored. More importantly, the results below
will serve as input in the comparison of the estimates from the imputed data for each
The means of the observations set to nonresponse and of the observations retained showed
contrasting results. As the nonresponse rate gets larger for both sets, the mean of
values were set to nonresponse that increased the means of the data sets containing
nonresponse for the varying rates of nonresponse. Comparing the means across the varying
nonresponse rates under each VI, the results showed little difference between
the population mean ignoring the missing data and the population mean of the actual
data. However, similar to the description above, as the number of missing values
increases, the deviation between the means of the actual and retained data slowly
increases.
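The comparison described above, between the mean of the actual data and the mean of the data retained after some observations are set to nonresponse, can be sketched as follows. This is only an illustration under assumptions that are not taken from the study: observations are set to nonresponse by simple random selection, the rates of 10%, 20% and 30% are placeholders, the data are hypothetical, and `split_by_nonresponse` is a name introduced for this sketch.

```python
import random
import statistics


def split_by_nonresponse(values, rate, seed=0):
    """Randomly set a fraction `rate` of the values to nonresponse.

    Returns (retained, missing). Random selection is an assumption of this
    sketch; the study may have generated nonresponse differently.
    """
    rng = random.Random(seed)
    n_missing = round(len(values) * rate)
    missing_idx = set(rng.sample(range(len(values)), n_missing))
    retained = [v for i, v in enumerate(values) if i not in missing_idx]
    missing = [v for i, v in enumerate(values) if i in missing_idx]
    return retained, missing


# Hypothetical values in pesos, NOT the actual survey observations.
actual = [float(50000 + 1500 * i) for i in range(100)]
actual_mean = statistics.mean(actual)

for rate in (0.10, 0.20, 0.30):  # placeholder nonresponse rates
    retained, missing = split_by_nonresponse(actual, rate)
    deviation = statistics.mean(retained) - actual_mean
    print(
        f"rate={rate:.0%} retained mean={statistics.mean(retained):.2f} "
        f"missing mean={statistics.mean(missing):.2f} deviation={deviation:+.2f}"
    )
```

Because the retained and missing sets together make up the actual data, a missing set with a mean above the actual mean necessarily pulls the retained mean below it, and the reverse also holds.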