Estimating Permanent Income Using Indicator Variables

B.D. Ferguson, A. Tandon, E. Gakidou, and C.J.L. Murray
Evidence and Information for Policy Cluster, World Health Organization, Geneva
February 19, 2003

Abstract

Household surveys in developing countries often lack modules on income and expenditure. When included, the resulting estimates show substantial measurement error and are subject to systematic reporting biases. Indicator-based indices proposed by several analysts show much promise in circumventing these difficulties but nonetheless exhibit certain limitations. We present an alternative based on a variant of the probit model, which produces a series of indicator-specific cut-points on a latent scale (permanent income). These cut-points represent values above which respondents are more likely to respond affirmatively than not, and allow estimation of household permanent income when combined with an individual household's responses. This analysis compares estimates of permanent income using the above approach with estimates resulting from principal components analysis using household survey data from Greece, Peru and Pakistan. This new approach yields estimates of permanent income that are comparable with those of other methods in terms of rank correlation with reported income or expenditure, and offers the potential for substantially enhanced comparability across populations and greater precision and efficiency through item reduction methods.

Introduction

The empirical examination of the impact of economic and social policy on the objective of poverty alleviation, especially at the micro level, requires appropriate instruments and other mechanisms to measure poverty. This has become especially relevant in recent years given the increasing use of household survey data in research. Although economists have traditionally relied on reported income and expenditure as the preferred indicators of poverty and living standards, the use of such indicators is problematic. Not only does their measurement require lengthy modules and detailed questions which are not practical for household surveys with other priorities such as health, but the data resulting from such modules are fraught with substantial measurement error and are subject to systematic reporting biases [23]. Recent research into these reporting biases reveals that respondents often have differing interpretations of income and expenditure questions and demonstrates the cognitive complexity of such questions for the typical respondent [7],[30],[2].

For these reasons, a number of analysts have developed methods to estimate household wealth or permanent income using information on the ownership of selected assets or on the use of certain services that correlate with permanent income. In addition to being consistent with a broader definition of poverty which has become increasingly prominent, such indices have enabled the analysis of poverty and inequality using otherwise rich household surveys that do not include income or expenditure modules. Such methods have been applied using the Demographic and Health Surveys (DHS), which provide consistent instruments and sampling frames as well as information on durable goods and dwelling characteristics [10],[15],[28].

Often developed by means of principal components or factor analysis, these asset and permanent income indices have a number of limitations. First, if the principal components or factor analysis is performed on a country-by-country basis using data from different survey instruments, it is not possible to compare the results across countries. An index of household wealth estimated in such a manner can neither be compared across populations nor over a period of time in the same population. Even when the same survey instrument is used, the tendency to acquire an asset such as a boat or air conditioning unit is certain to differ among households of different cultural backgrounds living in different environments. Similarly, supply and demand for assets such as electronic devices can change rapidly in the same setting over even a few years' time, rendering inter-temporal comparisons invalid. Second, principal components and factor analysis do not provide information on the level of income at which different assets or goods and services will be purchased.
Finally, these two approaches do not provide prospective guidance on the best assets or goods and services to include in future surveys to obtain more refined estimates of household permanent income.

In this paper, we use a variant of the hierarchical ordered probit (DIHOPIT) model to develop an indicator of permanent income using household survey data from Greece, Peru and Pakistan. The HOPIT model was originally developed to enhance the cross-population comparability of self-report survey data [32]. We apply the model in order to estimate the cut-points for different indicator variables for each of the three surveys, which are combined with the household's responses to each question to calculate an estimate of permanent income for that household. We then validate these estimates against reported household income and expenditure. Further analysis will demonstrate that the permanent income for each household can be estimated using different subsets of indicators and that systematic analysis of the indicator variable cut-points will enable more parsimonious design of future questionnaires. Only those indicators that are relevant for mapping the range of permanent income for a given country need then be included in the survey questionnaire.

Background

Modelling unobservables, such as permanent income and permanent consumption, is a longstanding issue in economics and econometrics. Friedman's (1957) permanent income hypothesis states that consumption is a function of permanent income. The central argument is that consumption decisions are made in a forward-looking manner and that current (measured) income is a poor determinant of consumption patterns. This, combined with the fact that observed income shows considerable measurement error and is a poor proxy of permanent income since it does not incorporate expectations, has spawned a large literature on the measurement of permanent income.

Though permanent income is not directly observable, there is general agreement that it is determined by physical and human resources, such as property, education, or experience, which enable income generation. Standard economic theory would argue for specifications in which permanent income ($Y$) is a function of household characteristics, education, the stock of physical assets, and community and environmental characteristics [29]. For a variety of reasons, such definitions cannot be used to derive estimates of permanent income in cross-country settings. Arguably, one problem is that the stock of physical assets is not simply a causal determinant of permanent income, but rather also an observed indicator of permanent income. This is especially true in less-developed economies characterized by poorly developed financial sectors, which makes household physical asset ownership more of a correlate of permanent income than a determinant. A second problem is that the same bundle of physical assets may map to different levels of permanent income in different countries. Due to norms, expectations, price distortion, and other environmental factors, the same level of permanent income in two countries may imply a different probability of ownership of any given physical asset. Hence, the use of physical asset ownership as a determinant of permanent income may lead to estimates that are simply not comparable across populations.

Due to the abundance of household survey data on asset ownership and the considerable biases and measurement error associated with reported income, a substantial literature has developed on asset-based measures of income.
Several approaches, ranging from very simple to fairly complex, have been employed to approximate permanent income using asset, housing quality and other indicators from household surveys that do not include information on income or expenditure. One of the simpler approaches is that utilized by Townsend [33], who proposed a set of five simple indicators to distinguish among households: the ratio of household rooms to persons; car ownership; the number of economically active persons seeking work; children aged 5 to 15 who receive free school meals; and the number of times the household experienced disconnection of electricity in the previous 12 months. Townsend finds a high consistency of ranking across these five indicators.

Another approach, by Montgomery et al. [22], aims to control for the effect of permanent income by including a series of separate indicator variables for durable goods and housing quality measures in a multivariate regression. While this method allows the researcher to test whether consumption's effect on the dependent variable is statistically different from zero, it is not possible to isolate the direct effect of each indicator variable on the dependent variable from its indirect effect through household income.

Adams et al. [1] and Takasaki et al. [31] adopt and validate a qualitative approach for stratifying households into wealth groups. Their method, consistent with the general approach known as Rapid Rural Appraisal (RRA), involves training interviewers in wealth ranking, who then assign households to a wealth group according to pre-identified criteria. The key informant interviewers must reach consensus on the wealth group assigned to each household. The studies conclude that key informants can accurately differentiate households according to an array of culturally appropriate criteria of wealth. However, it is difficult to establish the content validity of the wealth ranking technique, as it is unknown to what extent one criterion might have predominated over others in the process of decision making (i.e., implicit weighting of criteria) and to what extent these wealth groups might be comparable across populations.

A more common approach in the literature is to construct an index using the indicator variables available in a particular survey. The indices that have been proposed range from the seemingly simple to the computationally more complex. Muhuri [24] uses an indicator of whether the household owns at least one of five durable goods or receives remittances. Jensen [17] and Havanon et al. [16] construct indices by equally weighting items such as durable goods and housing quality variables. Several researchers have constructed an index based on the sum of the number of consumer durable goods and other indicator variables for land ownership, quality of drinking water and sanitation facilities ([14],[5],[13],[27]).

Additional approaches involve weighting the indicator variables used in the estimation of the index. Some researchers have tried to approximate household consumption using indices where each item owned by the household is weighted by its value [21]; however, a significant limitation of this approach is that information on the value of indicator variables is not widely available from surveys or other sources of information. Layte et al. [18] have constructed a relative deprivation index in which each individual item is weighted by the proportion of households possessing that item in each country. As a consequence, not possessing an item is considered a more substantial deprivation in a country where a higher proportion of the population own one. As pointed out by the authors, this relative deprivation index is not suitable for comparisons of absolute levels of deprivation across countries.
Similarly, Morris et al. [20] propose an index in which each item in the list of assets ($g$) is assigned a weight equal to the reciprocal of the proportion of households that own one or more of that item ($w_g$). That weight is multiplied by the number of units of asset $g$ owned by household $j$ ($f_{gj}$), and the products are summed over all possible assets. The resulting index proposed by Morris et al. for a household $j$ would then be:

$$\text{score}_j = \sum_{g=1}^{G} f_{gj}\, w_g$$
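To make the weighting concrete, the following is a minimal sketch of the score defined above. The ownership counts and the function name `morris_asset_score` are hypothetical and chosen only for illustration; they are not taken from Morris et al.

```python
def morris_asset_score(counts):
    """Inverse-prevalence weighted asset score for each household.

    counts[j][g] is the number of units of asset g owned by household j
    (f_gj in the text).
    """
    n_households = len(counts)
    n_assets = len(counts[0])
    weights = []
    for g in range(n_assets):
        # Number of households owning one or more units of asset g.
        owners = sum(1 for row in counts if row[g] > 0)
        if owners == 0:
            raise ValueError("asset %d is owned by no household" % g)
        # w_g: reciprocal of the proportion of households owning asset g.
        weights.append(n_households / owners)
    # score_j: sum over assets of f_gj * w_g.
    return [sum(f * w for f, w in zip(row, weights)) for row in counts]


# Hypothetical data: 4 households, 3 assets (radio, bicycle, car).
counts = [
    [1, 0, 0],
    [2, 1, 0],
    [1, 1, 1],
    [0, 0, 0],
]
scores = morris_asset_score(counts)
```

In this illustrative sample the car, owned by one household in four, receives a weight of 4, whereas the radio, owned by three in four, receives a weight of 4/3, so rarer assets contribute more to the score.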

In addition to the asset score, the total value of household assets owned can be calculated by summing, over all assets owned, the reported current values of those assets ($V_g$). This approach is based on the assumption that households with greater resources will purchase and own a greater number of durable goods. This weighting of the household assets assumes that households are progressively less likely to own a particular item the higher its monetary value, as pointed out by Morris et al. The authors also found that the household asset score is correlated highly with household asset values, indicating that the two measures classify households in a similar manner.

Principal components analysis and principal factors analysis are two methods which have been used to derive individual weights for items in the construction of a wealth index. The principal components analysis approach to deriving weights employed by Filmer and Pritchett has been widely used by the World Bank in their analyses of socioeconomic inequalities in health based on the DHS surveys [34]. Gakidou and King apply a similar approach based on principal factors analysis to the DHS in order to analyze the components of inequality in child survival [12]. Sahn and Stifel also use factor analysis in their multi-country study of poverty in Africa, and note that there is a high rank correlation between the index created from this method and that resulting from principal components analysis [28]. It is interesting to note that Bollen et al. [3] find that simple proxies, such as the sum of durable goods and housing quality indicators, perform almost as well as these more complex data reduction methods, and that indices incorporating information on asset values seem to perform worse. They also find that adding more consumer durable questions to the core set available in most surveys does not substantially improve the estimates of permanent income. Unfortunately, these methods provide little guidance with respect to the number of questions which should be used as well as how questions appropriate for a specific country might be selected.

In a subsequent section, we elaborate a model which combines information from indicator variables such as assets with other determinants to derive estimates of permanent income. The model assumes permanent income to be a function of household composition (such as household size and number of dependents), household characteristics (such as age and education of the household head), environmental factors (such as urban or rural residence), plus an unobserved component (or random effect), the magnitude of which is derived from the multiplicity of indicator variables available per household. Hence, the model uses information on asset ownership or access to services in order to estimate the magnitude of other unobserved factors that may help determine permanent income.
Subsequently, using Bayes' theorem, this information on the magnitude of unobserved determinants is incorporated to yield posterior estimates of permanent income. This approach builds on several of the existing measures mentioned earlier, such as the asset index proposed by Filmer and Pritchett. Analysis will show that the approach performs comparably with existing approaches while offering the potential for substantially enhanced comparability across populations. A further advantage is the capability of achieving more parsimonious survey instruments and more refined estimates of permanent income through item reduction methods.

Methods

The statistical model utilized in this analysis is developed in terms of a latent variable, $y_i^*$, which denotes the permanent income of household $i$. This variable is, by definition, unobserved. What are observed are a series of asset and other indicator variables for each household $i$. These dichotomous variables take the value of 0 if the household does not possess or have access to the good or service, and 1 if it does. Examples of these indicators include whether the household has a separate kitchen, hot running water, a television, an automobile, and so on. In addition, we utilize a series of socio-demographic covariates that are expected to be correlated with permanent income, such as education, age, sex, household size, or the number of adults in the household. The model can be formulated in terms of the latent variable, along with an observation mechanism for each of the assets and indicator variables.

In mathematical terms, we assume that the latent variable $y_i^*$ is a function of a vector of covariates $X_i$, a household-level random effect $\nu_i$ with mean 0 and variance $\sigma_v^2$, which captures other systematic unobserved factors that affect permanent income for a given household, plus an error term $\varepsilon_i$ with mean 0 and variance set to 1 (see footnote 1).

$$y_i^* = X_i'\beta + \nu_i + \varepsilon_i, \qquad i = 1, \ldots, N$$

$$\nu_i \sim N(0, \sigma_v^2), \qquad \varepsilon_i \sim N(0, 1)$$

The observation mechanism is specified for each indicator variable $a = 1, \ldots, A$ such that the indicator variable $y_i^a$ is:

$$y_i^a = \begin{cases} 0 & \text{if } -\infty < y_i^* \le \tau^a \\ 1 & \text{if } \tau^a < y_i^* < +\infty \end{cases}$$

where $\tau^a$ is an indicator-specific cut-point. The model specifies that there is some indicator-specific threshold $\tau^a$ such that a household is more likely to respond affirmatively than not when its permanent income exceeds this threshold. Figure 1 visualizes the model. The solid line on the left of the graph represents the latent variable, while the line to the right shows the estimated cut-points for certain indicators such as ownership of a car, television or bicycle, or having electricity in the household. These indicator cut-points represent ownership thresholds on the underlying latent variable of permanent income. Given this set-up, we can derive the probability of an affirmative response conditional on covariates as follows:

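$$\Pr(y_i = 0 \mid X_i, \nu_i) = \prod_{a=1}^{A} \Pr(-\infty < y_i^* \le \tau^a) = \prod_{a=1}^{A} \Pr(-\infty < X_i'\beta + \nu_i + \varepsilon_i \le \tau^a)$$

$$\Pr(y_i = 1 \mid X_i, \nu_i) = \prod_{a=1}^{A} \Pr(\tau^a < y_i^* < +\infty) = \prod_{a=1}^{A} \Pr(\tau^a < X_i'\beta + \nu_i + \varepsilon_i < +\infty)$$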

Given the normal distribution assumption for the error term $\varepsilon_i$,

$$\Pr(y_i = 0 \mid X_i, \nu_i) = \prod_{a=1}^{A} \left[ \Phi(\tau^a - X_i'\beta - \nu_i) \right]$$

$$\Pr(y_i = 1 \mid X_i, \nu_i) = \prod_{a=1}^{A} \left[ 1 - \Phi(\tau^a - X_i'\beta - \nu_i) \right]$$

Footnote 1: Since this is a latent variable model, the variance is unobserved. The assumption of variance set to 1 is one of mathematical convenience. The coefficients on the covariates adjust in response to differences in variance of the error term in the underlying data generating mechanism.

Figure 1: Hypothetical Indicator Cut-Points on the Permanent Income Latent Variable
[Figure not reproduced. Axis: Permanent Income (latent). Indicator cut-points shown: Car, Television, Electricity, Bicycle.]

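where $\Phi(\cdot)$ is the cumulative normal distribution. Conditioning out the random effect $\nu_i$, the probabilities can be written as:

$$\Pr(y_i = 0 \mid X_i) = \int \varphi(\nu_i) \prod_{a=1}^{A} \left[ \Phi(\tau^a - X_i'\beta - \nu_i) \right] d\nu_i$$

$$\Pr(y_i = 1 \mid X_i) = \int \varphi(\nu_i) \prod_{a=1}^{A} \left[ 1 - \Phi(\tau^a - X_i'\beta - \nu_i) \right] d\nu_i$$

where $\varphi(\cdot)$ is the normal probability density function. Rewriting, we have

$$\Pr(y_i = 0 \mid X_i) = \int_{-\infty}^{+\infty} \frac{e^{-\nu_i^2 / 2\sigma_v^2}}{\sqrt{2\pi\sigma_v^2}} \left\{ \prod_{a=1}^{A} \left[ \Phi(\tau^a - X_i'\beta - \nu_i) \right] \right\} d\nu_i$$

$$\Pr(y_i = 1 \mid X_i) = \int_{-\infty}^{+\infty} \frac{e^{-\nu_i^2 / 2\sigma_v^2}}{\sqrt{2\pi\sigma_v^2}} \left\{ \prod_{a=1}^{A} \left[ 1 - \Phi(\tau^a - X_i'\beta - \nu_i) \right] \right\} d\nu_i$$

The integral can be approximated using M-point Gauss-Hermite quadrature,

$$\int_{-\infty}^{+\infty} e^{-x^2} f(x)\, dx \approx \sum_{m=1}^{M} w_m f(a_m)$$

where the $w_m$ denote the quadrature weights and the $a_m$ denote the quadrature abscissas. Estimation of parameters can be done using standard maximum likelihood methods.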
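As an illustration of the quadrature step, the sketch below approximates the marginal probability of one household's response vector. It is not the authors' estimation code: the function names, the default of 20 quadrature points and the example parameter values are assumptions. The likelihood is also written for an arbitrary pattern of 0/1 responses (a factor $\Phi(\cdot)$ for each 0 and $1 - \Phi(\cdot)$ for each 1), of which the all-zero and all-one cases given above are special cases. The substitution $\nu_i = \sqrt{2}\,\sigma_v x$ turns the normal density into the $e^{-x^2}$ kernel that Gauss-Hermite quadrature expects.

```python
import math

import numpy as np
from numpy.polynomial.hermite import hermgauss

_erf = np.vectorize(math.erf, otypes=[float])


def norm_cdf(z):
    """Standard normal distribution function Phi(z)."""
    return 0.5 * (1.0 + _erf(np.asarray(z, dtype=float) / math.sqrt(2.0)))


def marginal_likelihood(y, xb, tau, sigma_v, n_points=20):
    """Approximate Pr(y | X) for one household by Gauss-Hermite quadrature.

    y       : 0/1 responses on the A indicators
    xb      : scalar linear predictor X_i' beta
    tau     : the A indicator-specific cut-points
    sigma_v : standard deviation of the household random effect
    """
    y = np.asarray(y)
    tau = np.asarray(tau, dtype=float)
    nodes, weights = hermgauss(n_points)       # abscissas a_m and weights w_m
    nu = math.sqrt(2.0) * sigma_v * nodes      # change of variable
    # Pr(indicator a = 0 | nu) = Phi(tau_a - xb - nu); one row per node.
    p0 = norm_cdf(tau[None, :] - xb - nu[:, None])
    per_item = np.where(y[None, :] == 1, 1.0 - p0, p0)
    conditional = per_item.prod(axis=1)        # product over the A indicators
    # The factor 1/sqrt(pi) comes from the change of variable.
    return float(np.sum(weights * conditional) / math.sqrt(math.pi))


# Hypothetical household: owns the first two of three indicators.
p = marginal_likelihood([1, 1, 0], xb=0.3, tau=[-0.5, 0.4, 1.5], sigma_v=0.8)
```

A maximum likelihood routine would sum the logarithm of this quantity over households and maximise it over the cut-points, the coefficients and $\sigma_v$.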

The likelihood is simply the product of all the individual probabilities, since these are independent after conditioning on the covariates and the random effect.

If there is a household-level random effect in the data (i.e., when covariates in the model do not capture all the systematic variation in the latent variable permanent income), then there remains information content in the set of responses across indicators for each household that has not been fully exploited. In order to exploit this information, we can make use of Bayes' theorem to obtain estimates of the mean level of permanent income conditional on the observed set of responses for a given household. Let $\mu_i = X_i'\beta + \nu_i$ be the mean level of permanent income predicted by the model. Then $\Pr(\mu_i \mid y_i)$ can be estimated using Bayes' formula:

$$\Pr(\mu_i \mid y_i) = \frac{\Pr(y_i \mid \mu_i)\,\Pr(\mu_i)}{\int \Pr(y_i \mid \mu_i)\,\Pr(\mu_i)\, d\mu_i} \qquad (1)$$

where $y_i$ represents the vector of categorical responses on all indicator questions for household $i$. This is implemented as follows. First, using the model with the random effect, all parameters are estimated, including the variance of the random effect, $\sigma_v^2$. This estimate of the variance of the random effect is then used to simulate one hundred different values of $\mu_i$ around the predicted $X_i'\beta$ of the latent variable for each individual in the sample. Hence, for each simulated value of $\mu_i$, $\Pr(\mu_i)$ can be calculated. $\Pr(y_i \mid \mu_i)$ can be derived using the probability specifications as elaborated earlier. Integrating over all simulated values of $\mu_i$ for each individual yields the denominator of equation (1).

We contrast the results obtained using the above model with those obtained by calculating a weighted index using principal components analysis (PCA). PCA is an exploratory multivariate statistical technique for simplifying complex datasets [19],[4]. The defining characteristic that distinguishes PCA from principal factors analysis is that in PCA it is assumed that all of the variability in an item should be used in the analysis, while in principal factors analysis the concern is only the variability in an item which is shared with other items. While the two methods generally yield similar results, PCA is often preferred as a method for data reduction, while principal factors analysis is preferred where there is a need to detect structure.

Given $m$ observations on $n$ variables, the goal of PCA is to reduce the dimensionality of the data matrix by finding $r$ new variables, where $r$ is less than $n$. Termed principal components, these $r$ new variables together account for as much of the variance in the original $n$ variables as possible while remaining mutually uncorrelated and orthogonal. Each principal component is a linear combination of the original variables, such that researchers often ascribe meaning to what the variables represent.
PCA has been applied to asset questions in household surveys under the assumption that it is long-run wealth or permanent income that is the phenomenon attributable to the linear index of variables with the largest amount of information common to all of the variables. The result of such application of PCA is an asset index for each household according to the formula:

$$A_i = f_1 \frac{a_{i1} - \bar{a}_1}{s_1} + \cdots + f_A \frac{a_{iA} - \bar{a}_A}{s_A}$$

where $f_1$ is the scoring factor for the first asset as determined by the procedure, $a_{i1}$ is the $i$-th household's value for the first asset, and $\bar{a}_1$ and $s_1$ are the mean and standard deviation of the first asset variable over all households. In a subsequent section, we provide the results of our assessment of the degree of correlation of the PCA and latent variable approaches with reported household income and expenditure.
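Returning to the posterior step in equation (1), the sketch below evaluates the posterior mean of $\mu_i$ numerically for one household. It is an illustration under stated assumptions rather than the authors' implementation. In particular, it swaps the one hundred simulated values for an evenly spaced grid of one hundred points within four standard deviations of $X_i'\beta$, and the function names and example values are hypothetical.

```python
import math

import numpy as np

_erf = np.vectorize(math.erf, otypes=[float])


def norm_cdf(z):
    """Standard normal distribution function Phi(z)."""
    return 0.5 * (1.0 + _erf(np.asarray(z, dtype=float) / math.sqrt(2.0)))


def posterior_mean_income(y, xb, tau, sigma_v, n_grid=100, width=4.0):
    """Posterior mean of mu_i = X_i' beta + nu_i given the 0/1 responses y."""
    y = np.asarray(y)
    tau = np.asarray(tau, dtype=float)
    # Candidate values of mu_i: an even grid around the prediction xb
    # (assumption: a grid replaces the simulated draws described in the text).
    mu = np.linspace(xb - width * sigma_v, xb + width * sigma_v, n_grid)
    # Prior Pr(mu): normal around xb, up to a constant that cancels below.
    prior = np.exp(-0.5 * ((mu - xb) / sigma_v) ** 2)
    # Likelihood Pr(y | mu): product of probit probabilities over indicators.
    p0 = norm_cdf(tau[None, :] - mu[:, None])
    likelihood = np.where(y[None, :] == 1, 1.0 - p0, p0).prod(axis=1)
    posterior = prior * likelihood
    posterior /= posterior.sum()   # grid version of the denominator of (1)
    return float(np.sum(mu * posterior))


# Hypothetical cut-points for three indicators and two response patterns.
tau = [-0.5, 0.4, 1.5]
poor = posterior_mean_income([0, 0, 0], xb=0.0, tau=tau, sigma_v=0.8)
rich = posterior_mean_income([1, 1, 1], xb=0.0, tau=tau, sigma_v=0.8)
```

Two households with the same covariates but different response patterns receive different posterior estimates, which is the sense in which the set of responses carries information beyond $X_i'\beta$.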

Data

Data for this analysis come from nationally representative surveys carried out in three countries with considerably different socioeconomic characteristics: Greece, Peru and Pakistan. The surveys were selected based on their inclusion of questions or modules on either income or expenditure, or both, as well as a number of indicator variables covering items such as household ownership of durable goods, characteristics of the neighborhood and dwelling, and access to services such as water, sanitation and electricity.

The data for Greece form part of the European Community Household Panel Survey (ECHP). In 1991, Eurostat, the Statistical Office of the European Communities, completed a comprehensive review of existing data on income at the household and individual levels among EU Member States. One of the outcomes of this review was the decision to launch the ECHP survey, which was intended to allow flexibility for adaptation to national specificities despite being designed centrally at Eurostat [8],[9]. The ECHP contains a wide range of comparable social statistics on income including social transfers, labour, poverty and social exclusion, housing and health, as well as several other indicators of living conditions. The longitudinal design of the ECHP (a total of three waves were carried out in 1994, 1995, and 1996) makes it possible to study relationships and transitions in these indicators over time at the micro level. A total of 16 countries participated in the ECHP, from which we have selected Greece for analysis. In addition to information on living conditions and durable goods, the ECHP data for Greece contain reported household income, which can be analyzed for a particular household over the three-year period for approximately 4,400 households.
The data for Peru come from the National Living Standards Measurement Survey (LSMS) carried out in 2000, based on the methodology developed by the World Bank to measure the well-being and quality of life of households in developing countries. Six such surveys have been carried out in Peru: in 1985-86, 1991, 1994, 1996, 1997 and 2000. The most recent national LSMS collected data on the levels of education, health, labor activity and migration for approximately 4,000 households, from which estimates of total household income and expenditure can be derived [26].

Data for Pakistan were collected through the 1991 Pakistan Integrated Household Survey (PIHS), which was conducted jointly by the Federal Bureau of Statistics (FBS), Government of Pakistan, and the World Bank [25]. This nationwide survey gathered individual and household level data on topics including housing conditions, education, employment characteristics, health, consumption, and household energy consumption for approximately 4,800 households. Community level and price data were also collected during the course of the survey, and estimates of total monthly income and expenditure have been calculated for each household.

Empirical Assessment

In this section, we validate the use of the DIHOPIT model to calculate an estimate of household permanent income in three different economic contexts: a high-income country (Greece), a low-income country (Pakistan) and a middle-income country (Peru). Household surveys from these countries were selected because they included items on a range of consumer durables and household services or physical attributes plus full-scale modules on income, as well as modules on expenditure in the case of Pakistan and Peru (see Table 1 for a complete list of variables used). For each country, we have analyzed the validity of the estimates of household permanent income through comparisons to reported income or expenditure of the household. For comparison, we have also examined the results of principal components analysis. As part of the analysis of the Peru household survey data, we demonstrate that reasonably comparable results for household permanent income can be obtained using two completely different sets of consumer durables or household services.

Greece (ECHP, 1994-96)

Household permanent income has been estimated for the ECHP sample for 1995 in Greece using responses for 23 different consumer durables, household services or household attributes. The ECHP dataset includes three waves for 1994, 1995 and 1996. Income between waves is highly correlated, reflecting the combination of small measurement error and relatively stable income for most households. Reported income for 1995 has a correlation coefficient of 0.90 with the average for households over the three waves. In the analysis below, we make use of the average reported income for the three waves of the panel as an indicator that is likely to be more highly correlated with permanent income than income reported in any one year. Table 2 shows the output of the DIHOPIT model applied to the data for 4,413 households in Greece.
For this initial assessment, we have omitted the covariates on the latent variable and used only the random effect outlined above. More specifically, the model we estimate is:

$$y_i^* = \nu_i + \varepsilon_i, \qquad i = 1, \ldots, N$$

$$\nu_i \sim N(0, \sigma_v^2), \qquad \varepsilon_i \sim N(0, 1)$$

The observation mechanism remains as described earlier. The cutpoint on the latent variable of permanent income for each indicator variable was statistically signicant for all except indicators 3 (indoor ushing toilet) and 12 (telephone). ln( ) is the log of the estimated standard deviation of the household-level random eect. Figure 2 shows for Greece the name of each indicator variable on the latent variable at its estimated cut-point. The vertical line represents the permanent income latent variable while the horizontal dashes are the estimated cut-points. These cut-points represent points on the underlying scale above which the household is more likely than not to respond armatively to the question 10

Table 1: Variables Used in the Estimation and Validation of Permanent Income Using DIHOPIT Greece ECHP, 1994-96 Pakistan IHS, 1991 Peru LSMS, 2000
Predictors
A g e o f h o u seh o ld h ea d E m p loy m ent o f h o u se h o ld h e ad M ed iu m ed u ca tio n a ttain m ent H igh er ed u c atio n a tta in m ent H o u s eh old size N u m b e r o f h o u s eh o ld ad u lts N u m b er o f h o u s eh o ld a d u lts N u m b er o f h o u s eh o ld ch ild ren L itera cy o f h o u s eh o ld h e a d N u m era cy o f h o u s eh o ld h ea d A g e of h ou seh old h ea d R eligion of h ou seh old h ea d E th n icity o f h o u se h o ld h e a d C iv il s tatu s of h ou seh old h ea d L a n g u a g e o f h o u s eh o ld h e ad

Indicators
S e p a ra te k itch en B a th or sh ow er In d o o r u sh in g to ilet H o t ru n n in g w ater H e atin g o r s to ra ge h eaters P lac e to sit o u ts id e A u tom o b ile C olor telev isio n V id eo reco rd e r M icrowave oven D ishw as h e r Telep h o n e S e con d h om e C an a o rd kee p in g h o m e wa rm C an a ord an nu al h olid ay C an a ord rep la cin g fu rn itu re C an a ord n e w clo th es C an a ord to ea t m ea t often N u m b e r o f ro om s (d ich o tom ized ) Wa lls m ad e of co n c rete m aterial F in is h e d o ors C overe d w in d ow s P riva te ta p w a ter S o a k p it o r b e tte r s a n ita tio n O p e n d rain s o r b etter sa n itation U n d e rg ro u n d d ra in s Tru ck -co llected g a rb a ge C o m m u n a l la trin e o r b etter to ile t P riva te la trin e o r b e tter to ilet P rivate u sh to ilet Telep h o n e H o u s eh o ld m em b er w o rked a b ro a d E lectricity R efrig era to r Freeze r A ir c on d itio n e r R o o m h ea ter Wa ter h e a ter Telev isio n S ew in g m a ch in e G a s stove C y lin d er g as stove D o es n o t ow n a ke rose n e lam p N u m b er o f ro o m s (d ich o to m iz ed ) R a d io C olor telev isio n B len d er or fo o d p ro c esso r R efrig era to r S ew in g m a ch in e G a s stove R ecord p layer B icy cle E lectric fan Telep h o n e ( x e d -lin e) Telep h o n e (m o b ile ) Wash in g m a ch in e C lo th in g d ryer Va cu u m clean er V id e o cas sette rec ord er A u tom o b ile T h erm a Perso n a l co m p u ter M ic row ave oven K n ittin g m a ch in e Iron C a b le telev isio n C o m p a ny o r b u sin es s U rb a n p ro p erty

Validation
To tal h ou se h o ld in com e, 19 94 To tal h ou se h o ld in com e, 19 95 To tal h ou se h o ld in com e, 19 96 A v g . to ta l h o u s eh o ld in co m e (1 99 4-96 ) Tota l h ou seh old in com e To ta l h o u seh o ld e x p en d itu re To ta l h o u seh o ld in co m e To tal h ou seh o ld ex p en d itu re

11

Table 2: Results of Application of Random-Eect DIHOPIT to Greece ECHP (1995) Variable Coecient Std. Error Cut-Points
Separate kitchen Bath or shower Indoor ushing toilet Hot running water Heating or storage heaters Place to sit outside Automobile Color television Video recorder Microwave oven Dishwasher Telephone Second home Aord keeping home warm Aord annual holiday Aord replacing furniture Aord new clothes Aord meat often Home has 2+ rooms Home has 3+ rooms Home has 4+ rooms Home has 5+ rooms Home has 6+ rooms ln( ) rho -1.800 -0.263 -0.078 2.489 1.752 -0.188 1.541 0.113 2.182 3.949 3.036 0.071 3.112 1.675 2.018 2.823 1.277 1.371 -0.394 1.010 2.482 3.601 4.522 -0.450 0.389 0.034 0.049 0.046 0.039 0.038 0.047 0.038 0.044 0.039 0.051 0.042 0.044 0.042 0.038 0.039 0.041 0.039 0.038 0.051 0.039 0.039 0.046 0.065 0.027 0.007

regarding ownership of a good or access to a service. In other terms, if the predicted permanent income is greater than the cut-point for a given asset, then the probability that the household responds affirmatively is greater than 0.5. In Figure 2, we see that the cut-points for the room-count indicators rise with the number of rooms, so that larger homes correspond to higher permanent income. Ownership of a dishwasher or microwave occurs at a higher level of household permanent income than ownership of a television or telephone. Living in a home with two or more rooms, having an indoor flushing toilet, having a bath or shower, and having a separate kitchen are relatively low on the indicator ladder.

The next step in this analysis is to validate the estimation of permanent income from the model. Using the ECHP data for Greece, this can be done by comparing the correlations of the estimated permanent income (using indicator responses from 1995) with household income for the individual years 1994-96, as well as with the average household income over this period. Table 3 shows these correlations as well as the correlation of estimated permanent income with total household income per adult consumption equivalent and total


Table 3: Correlation of Estimated Permanent Income with Reported Income Measures, Greece ECHP (1995)

Variable                                                    Pearson's r   Spearman's rho
Household income (1994)                                            0.50             0.61
Household income (1995)                                            0.57             0.65
Household income (1996)                                            0.55             0.62
Average household income (1994-96)                                 0.60             0.67
Average household income per capita (1994-96)                      0.41             0.47
Average household income per adult equivalent (1994-96)            0.56             0.63

household income per capita. As can be seen from the table, the highest correlation of estimated permanent income with household income for any of the three individual years of data is 0.57, in 1995. If we instead compare the permanent income estimate with the average reported household income over the three-year period, the correlation improves to 0.60. In all cases, the rank (Spearman) correlation is considerably higher than the Pearson correlation, suggesting that the relationship between estimated permanent income and reported household income is somewhat non-linear.

The rather high degree of correlation between the permanent income estimate and reported household income, and the observation that this correlation increases as income is averaged over a period of time, support the assumption that it is permanent income or long-term wealth that is being measured. The higher correlation of estimated permanent income with total household income than with household income per capita or per adult consumption equivalent confirms the theoretical premise of the model that consumer durables and household services are a function of household permanent income rather than attributes of particular individuals in the household.

It is also worth noting that the estimated cut-points are highly stable over the three years of data. The correlation of the indicator cut-points using data from 1994 with those using data from 1995 is 1.00, as is the correlation for estimates using data from the years 1995 and 1996. The correlation between estimates from 1994 and 1996 is 0.99.

Another comparison of interest is how well the DIHOPIT model performs relative to a similar and commonly used approach based on principal components analysis. The Pearson correlation coefficient and Spearman's rho between average reported income and the principal components estimate are nearly identical to those obtained with DIHOPIT: 0.61 and 0.68, respectively.
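The gap between the Pearson and Spearman correlations reported above can be illustrated with a small sketch. The data below are hypothetical (a latent score and an income that grows exponentially in it), and the helper names `pearson`, `ranks` and `spearman` are introduced here for illustration only. When the relationship is monotone but non-linear, Spearman's rho stays at 1 while Pearson's r falls below it.

```python
import math

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """Average ranks (1-based); tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman's rho, computed as the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical data: latent scores and an income that is exponential in them.
latent = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
income = [1000 * math.exp(1.2 * z) for z in latent]

r = pearson(latent, income)      # below 1: the relationship is not linear
rho = spearman(latent, income)   # equal to 1: the relationship is monotone
print(round(r, 2), round(rho, 2))
```

A Spearman value well above the Pearson value, as in Table 3, is therefore what one would expect if reported income is a monotone but curved function of the latent scale.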
The results above were obtained from application of DIHOPIT without including any covariates on the latent variable, thereby allowing the random effect to capture as much of the systematic variance as possible. When additional variables such as age, employment status, educational attainment and household size (see Table 1) are included as predictors, the resulting correlations of the estimated permanent income with reported income in 1995 or average income for 1994-96 are only slightly improved. While the addition of such information presumably increases validity, the increase is so slight that the results can be


Figure 2: Indicator Variable Ladder for 23 Indicators, Greece ECHP (1995).

interpreted as being basically robust to the specification of covariates on the latent variable.

Pakistan Integrated Household Survey, 1991

For the second validation study, we examine how estimates of permanent income based on the application of the DIHOPIT model function in a low-income setting. Household surveys in populations with lower levels of education, less formal sector employment and, in some cases, less interaction with the monetized market often have much higher levels of measurement error, especially for reported income [6]. The challenges of income and expenditure surveys are illustrated by the 1991 Pakistan Integrated Household Survey, in which the correlation of reported income and expenditure was only 0.15. The Spearman's rho was 0.46, reflecting the non-linear relationship in the data between reported income and expenditure. Not surprisingly, average income was 94% of average reported expenditure. Given the presumed high level of measurement error in both reported income and expenditure, we would expect estimates of permanent income to have lower correlations with these two variables than in Greece or Peru.

Table 4 provides the results from the application of random-effect DIHOPIT without covariates on the latent variable for 4,752 households. We see that the cut-points are statistically significant for all 30 indicator variables on the latent variable, permanent income. Figure 3 shows the location of each consumer durable, household service or household attribute on the latent variable. In addition to the statistical significance of the cut-points, their ordering has face validity: households with low levels of permanent income are more likely to have a home with a soak pit or communal latrine than a private flush toilet. At the other end of the spectrum, only those households with the highest levels of permanent income in this survey are likely to have an air conditioner, freezer or telephone.
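The reading of a cut-point as the income level at which an affirmative response becomes more likely than not can be sketched numerically. The sketch below is illustrative only and rests on two assumptions that the text does not state in this form: that the Coefficient column of Table 4 gives the cut-points on the latent scale, and that the response probability follows a unit-variance probit link. The function names and the latent income values are hypothetical.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def prob_affirmative(income, cut_point):
    """P(yes) under an assumed unit-variance probit link: Phi(income - cut_point)."""
    return normal_cdf(income - cut_point)

# Selected values from the Coefficient column of Table 4, treated here
# (by assumption) as cut-points on the latent permanent income scale.
cut_points = {
    "Electricity": -1.179,
    "Communal latrine": -0.708,
    "Private flush toilet": 0.354,
    "Telephone": 2.036,
    "Freezer": 2.812,
}

# Hypothetical latent incomes: low, exactly at the flush-toilet cut-point, high.
for income in (-1.0, 0.354, 2.5):
    probs = {name: round(prob_affirmative(income, tau), 2)
             for name, tau in cut_points.items()}
    print(income, probs)
```

At a latent income equal to an indicator's cut-point the probability is exactly 0.5; above it the probability exceeds 0.5, and below it the probability falls short of 0.5, which is the ordering the ladder is meant to convey.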
Table 5 provides a summary of the validation of the estimates of permanent income for


Table 4: Results of Application of Random-Effect DIHOPIT to Pakistan IHS (1991)

Variable                               Coefficient   Std. Error
Walls made of concrete material              0.133        0.025
Finished floors                             -0.072        0.030
Covered windows                             -0.220        0.030
Home has 2+ rooms                           -0.994        0.031
Home has 3+ rooms                            0.289        0.030
Home has 4+ rooms                            1.104        0.033
Home has 5+ rooms                            1.773        0.038
Home has 6+ rooms                            2.270        0.045
Home has 7+ rooms                            2.701        0.055
Private tap water                            0.388        0.030
Soak pit or better sanitation               -0.808        0.031
Open drains or better sanitation            -0.543        0.031
Underground drains                           1.114        0.033
Garbage collected by truck                   1.562        0.036
Private flush toilet                         0.354        0.031
Private latrine                             -0.585        0.031
Communal latrine                            -0.708        0.031
Telephone                                    2.036        0.042
Household member has worked abroad           2.200        0.043
Electricity                                 -1.179        0.039
Refrigerator                                 1.201        0.034
Freezer                                      2.812        0.059
Air conditioner                              2.630        0.054
Room heater                                  2.670        0.054
Water heater                                 3.346        0.084
Television                                   0.430        0.031
Sewing machine                               2.451        0.049
Gas stove                                    1.210        0.034
Cylinder gas stove                           1.866        0.038
Kerosene lamp (inverted)                     0.400        0.030
ln( )                                       -0.027        0.021
rho                                          0.493        0.005

[Cut-Points column: not recoverable from the source text]


Figure 3: Indicator Variable Ladder for 30 Indicators, Pakistan IHS (1991).

Pakistan based on the application of the DIHOPIT model. The correlation with reported household income is 0.17, which is much lower than in Greece but consistent with the low correlation between reported income and expenditure. The relationship is quite non-linear, so that the Spearman's rho for reported income and estimated permanent income is 0.53. Notably, estimated permanent income is more closely related to reported income than reported total household expenditure is. At the same time, estimated permanent income has a correlation coefficient with total household expenditure of 0.33 and a Spearman's rho of 0.53. In other words, estimated household permanent income is more closely related to both reported income and reported expenditure than they are to each other. This is consistent with the hypothesis that both are in truth related to permanent income but are measured in the survey with substantial error. As for Greece, the results in Table 5 illustrate that the latent variable appears to be measuring household permanent income rather than permanent income per capita or per adult consumption equivalent.

As before, we have rerun the DIHOPIT model to determine the effect of including certain predictors of household permanent income as covariates on the latent variable (see Table 1). As expected, the addition of covariates leads to a negligible increase in the Pearson correlations of estimated permanent income with reported income and expenditure (to 0.17 and 0.34, respectively). In general, we believe that adding covariates that are related to permanent income to the model will improve estimation of household permanent income, but the improvement is relatively small in the cases we have investigated.

Estimates of household permanent income or wealth using principal components analysis yield very similar correlation coefficients (0.16 for reported household income and 0.34 for total household expenditure). Both the DIHOPIT model and the PCA model in this case are capturing similar information about household permanent income or wealth.

Peru Living Standards Measurement Survey, 2000


Table 5: Correlation of Permanent Income Estimates with Reported Household Income and Expenditure, Pakistan IHS (1991)

Variable                                         Pearson's r   Spearman's rho
Household income                                        0.17             0.53
Household expenditure                                   0.33             0.53
Household income per capita                             0.18             0.47
Household income per adult equivalent                   0.18             0.52
Household expenditure per capita                        0.30             0.43
Household expenditure per adult equivalent              0.34             0.52

Figure 4: Indicator Variable Ladder for 24 Indicators, Peru LSMS (2000).

The final validation study in this paper is for Peru. Income and expenditure in this LSMS survey are strongly related (correlation coefficient of 0.79), suggesting a combination of more stable income for many households and lower levels of measurement error. Reported household income is 121% of total household expenditure on average.

Table 6 shows the output of the model when applied without covariates to the Peru LSMS data using 24 indicator variables. The full list of indicator variables can also be found in Table 1 as well as in Figure 4, which shows the indicator variable ladder resulting from the cut-points predicted by the model. Estimates of permanent income from the DIHOPIT model show a strong relationship to reported income (correlation coefficient of 0.59) and to reported expenditure (correlation coefficient of 0.61). The corresponding Spearman's rho values are 0.72 and 0.73 for income and expenditure, respectively. Results from application of principal components analysis to the Peru dataset are again similar to the results using the DIHOPIT approach, with Spearman correlation coefficients of 0.72 for household income and 0.73 for household expenditure.

The analysis for Peru has been rerun with covariates on the latent variable in addition to the household random effect. Results from this application of the model made no difference

Table 6: Results of Application of Random-Effect DIHOPIT to Peru LSMS (2000)

Variable                       Coefficient   Std. Error
Radio                               -1.500        0.033
Color television                     1.660        0.037
Blender or food processor            1.601        0.037
Refrigerator                         1.877        0.038
Sewing machine                       2.197        0.038
Gas stove                            1.603        0.037
Record player                        2.590        0.040
Bicycle                              2.315        0.038
Electric fan                         2.939        0.042
Telephone (fixed line)               2.614        0.040
Telephone (mobile)                   3.883        0.055
Washing machine                      3.501        0.049
Clothing dryer                       4.596        0.078
Vacuum cleaner                       3.687        0.051
Video cassette recorder              3.274        0.046
Automobile                           3.545        0.049
Therma                               3.931        0.056
Personal computer                    3.916        0.056
Microwave oven                       4.076        0.060
Knitting machine                     4.539        0.075
Iron                                 1.164        0.037
Cable television                     4.128        0.060
Company or business                  2.073        0.037
Urban property                       0.847        0.037
ln( )                               -0.133        0.0028
rho                                  0.467        0.007

[Cut-Points column: not recoverable from the source text]


Table 7: Correlation of Permanent Income Estimates with Reported Household Income and Expenditure, Peru LSMS (2000)

Variable                                         Pearson's r   Spearman's rho
Household income                                        0.59             0.72
Household expenditure                                   0.61             0.73
Household income per capita                             0.52             0.69
Household income per adult equivalent                   0.58             0.73
Household expenditure per capita                        0.48             0.66
Household expenditure per adult equivalent              0.59             0.73

to the correlation of estimated permanent income with reported income or total household expenditure.

Using the Peru survey, we can illustrate one of the main advantages of this approach to the estimation of permanent income using indicator variables on ownership of consumer durables, household services and household attributes. From the 24 original indicator variables used, we have created two non-overlapping sets of 12. The two sets, shown in Table 8, have been created by assigning each indicator in an alternating fashion, as one moves up the indicator ladder in Figure 4, to one group or the other. The DIHOPIT model has been rerun for each set of indicator variables separately, as if only that set of variables were available. The resulting estimates of household permanent income can be compared both to the original estimation using 24 indicator variables and to reported income and total household expenditure. Remarkably, both estimates based on only 12 indicator variables are highly correlated with the estimation based on 24 variables, with an average correlation coefficient of 0.94 for the two subsets. This shows the potential to undertake item reduction in surveys and obtain similar estimates of household permanent income or wealth using many fewer variables.

Table 9 shows that both sets of 12 indicator variables yield estimates of permanent income that are equally highly correlated with reported income and expenditure. In other words, estimates of household permanent income do not seem to be biased by the particular set of indicator variables used in the analysis. The combination of the potential for item reduction, illustrated in moving from 24 indicators to 12 with minimal loss of information, and the robustness of the estimation of permanent income to changing the set of indicator variables used provides substantial flexibility in both survey design and analysis over time.
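The alternating assignment described above can be sketched in a few lines. The sketch assumes that "moving up the indicator ladder" means ordering the indicators by the Coefficient column of Table 6; the variable names are introduced here for illustration. Under that assumption the split reproduces the two item subsets listed in Table 8.

```python
# Coefficient column of Table 6 (Peru LSMS, 2000), assumed here to give each
# indicator's position on the ladder.
cut_points = {
    "Radio": -1.500, "Color television": 1.660,
    "Blender or food processor": 1.601, "Refrigerator": 1.877,
    "Sewing machine": 2.197, "Gas stove": 1.603, "Record player": 2.590,
    "Bicycle": 2.315, "Electric fan": 2.939, "Telephone (fixed line)": 2.614,
    "Telephone (mobile)": 3.883, "Washing machine": 3.501,
    "Clothing dryer": 4.596, "Vacuum cleaner": 3.687,
    "Video cassette recorder": 3.274, "Automobile": 3.545, "Therma": 3.931,
    "Personal computer": 3.916, "Microwave oven": 4.076,
    "Knitting machine": 4.539, "Iron": 1.164, "Cable television": 4.128,
    "Company or business": 2.073, "Urban property": 0.847,
}

# Order the indicators from the lowest to the highest cut-point.
ladder = sorted(cut_points, key=cut_points.get)

# Alternate up the ladder: 1st, 3rd, 5th, ... go to subset 1; the rest to subset 2.
subset_1 = ladder[0::2]
subset_2 = ladder[1::2]

print(subset_1)
print(subset_2)
```

Because the two subsets interleave along the ladder, each one spans the latent scale from low to high permanent income, which may help explain why both track the full-set estimate so closely.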

Discussion

This paper has demonstrated the use of the DIHOPIT model to estimate permanent income from household surveys where information on dwelling characteristics, durable goods and other indicator variables is routinely collected. The implications of this analysis are that, with appropriate data, the indicator variables for a particular country can be mapped onto a latent variable which is a measure of permanent income. The model is able to identify

Table 8: Item Reduction Subsets, Peru LSMS (2000)

Item Subset #1          Item Subset #2
Radio                   Urban property
Iron                    Blender or food processor
Gas stove               Color television
Refrigerator            Company or business
Sewing machine          Bicycle
Record player           Telephone (fixed line)
Electric fan            Video cassette recorder
Washing machine         Automobile
Vacuum cleaner          Telephone (mobile)
Personal computer       Therma
Microwave oven          Cable television
Knitting machine        Clothing dryer

Table 9: Spearman Rank Correlation of Permanent Income Estimated from Indicator Subsets with Full-Set Permanent Income, Household Income and Household Expenditure, Peru LSMS (2000)

Variable                                             Subset #1   Subset #2
Household income                                          0.67        0.66
Household expenditure                                     0.68        0.67
Estimated household permanent income (full set)           0.93        0.93


indicator-specific points on the latent variable scale that mark the transition such that, at values of the latent variable higher than the cut-point, the household is more likely to have access to the good or service than not.

Given that we let the data tell us the extent to which any given indicator variable maps to the latent variable, one major advantage of this approach is that the set of indicator variables need not be the same across countries. Designers of future surveys can choose the most appropriate set of indicator variables based on this analysis to better understand the role of each indicator and its relation to permanent income in any given country. In this sense, the approach is analogous to adaptive testing in educational surveys, where items are allowed to vary by specific criteria such as respondent ability.

In addition, our analysis shows that there is significant potential for item reduction, in that similar results are obtained using fewer questions. This also has implications for questionnaire design: if preliminary analysis suggests that certain items are redundant, in that they do not make a significant marginal contribution to the (posterior) estimation of permanent income, then these items may be removed from future rounds of the survey. In the case of Peru, reducing the number of indicator variables by half yields unbiased estimates of permanent income which show a high degree of correlation with those of the full set. It is likely that discrimination between the permanent income of different households depends on the location on the latent variable of the various indicator variables used. The high correlation achieved with two distinct sets of indicator variables may in part be due to the fact that each set of 12 was spaced from low levels of permanent income to high levels. This type of consideration will be important in the prospective design of surveys that are to include a short list of these indicator variables.
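How a household's yes/no responses combine with the cut-points to give a (posterior) estimate of permanent income can be sketched as follows. This is not the estimation procedure of the paper: it is a simplified illustration that assumes a unit-variance probit link, a standard normal prior on latent income, and conditionally independent items, and it evaluates the posterior on a grid. The function names are hypothetical, and the three cut-points are taken from the Coefficient column of Table 6 purely as example values.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def posterior_mean_income(responses, cut_points, lo=-6.0, hi=6.0, step=0.01):
    """Posterior mean of latent income, evaluated on a regular grid.

    Assumptions (illustrative, not from the paper): unit-variance probit
    link, standard normal prior, conditionally independent items.
    """
    n_points = int(round((hi - lo) / step)) + 1
    weighted_sum = 0.0
    total_weight = 0.0
    for i in range(n_points):
        y = lo + i * step
        log_w = -0.5 * y * y                      # log prior, up to a constant
        for item, answer in responses.items():
            p = normal_cdf(y - cut_points[item])
            p = min(max(p, 1e-12), 1.0 - 1e-12)   # guard against log(0)
            log_w += math.log(p if answer else 1.0 - p)
        w = math.exp(log_w)
        weighted_sum += y * w
        total_weight += w
    return weighted_sum / total_weight

# Example cut-points (Coefficient column of Table 6, used as assumed cut-points).
cut_points = {"Radio": -1.500, "Refrigerator": 1.877, "Automobile": 3.545}

owns_nothing = {"Radio": False, "Refrigerator": False, "Automobile": False}
radio_only = {"Radio": True, "Refrigerator": False, "Automobile": False}
owns_all = {"Radio": True, "Refrigerator": True, "Automobile": True}

for household in (owns_nothing, radio_only, owns_all):
    print(round(posterior_mean_income(household, cut_points), 2))
```

Under these assumptions the estimate rises as a household answers affirmatively to indicators higher on the ladder, and an item whose removal barely changes the estimates would be a candidate for the kind of item reduction discussed above.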
Furthermore, the comparison with estimates produced using principal components analysis shows that our approach is at least as good as this method in terms of estimating permanent income. Due to norms, price distortion and other factors, however, the same level of permanent income in two countries is likely to imply a different probability of ownership of any given physical asset. Hence, one of the key limitations of the principal components analysis approach is that use of physical asset ownership as a determinant of permanent income may lead to estimates that are not comparable across populations. This analysis has not explicitly addressed the problem of cross-population comparability; however, the DIHOPIT model used to estimate permanent income has the potential to be modified so that estimates of permanent income can be directly compared across countries. There are three potential paths that could be pursued to enhance the comparability of permanent income estimates: 1) an exogenous estimate of the average level (mean) and variance of the permanent income distribution by country can be used to adjust the latent scale of permanent income to a comparable scale across countries, in the units of income; 2) the level on the permanent income scale for two or more of the indicator variables can be fixed across countries; or 3) the cut-points on the permanent income scale for the indicator variables can be allowed to vary across socioeconomic variables but not across countries. Any of these three methods would place the permanent income estimates on the same scale across countries, thus enhancing the cross-population comparability of the estimates.

The approach proposed in this paper is similar to previously proposed asset indices in that it has the potential to provide a more accurate measurement of permanent income

than values of reported current income from survey questionnaires, as it is likely that the measurement error in these indicator variables is much less than the error associated with reported income. More research is required to further validate this approach in a larger number of countries, to enhance the item-reduction analysis so as to arrive at the optimal set of indicator questions to ask in each country, and finally to make estimates of permanent income directly comparable across countries.
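The first of the three comparability paths mentioned above, adjusting the latent scale with an exogenous mean and variance, can be sketched as a simple linear rescaling. The paper does not specify the form of the adjustment, so the linear map, the function name and all numbers below are assumptions made for illustration.

```python
import math

def rescale_to_income_units(latent, target_mean, target_variance):
    """Linearly map latent scores so they have the given mean and variance.

    Assumption (not from the paper): a linear transformation is adequate.
    """
    n = len(latent)
    mean = sum(latent) / n
    variance = sum((v - mean) ** 2 for v in latent) / n   # population variance
    scale = math.sqrt(target_variance / variance)
    return [target_mean + scale * (v - mean) for v in latent]

# Hypothetical latent estimates for five households in one country, and a
# hypothetical exogenous mean (5000) and standard deviation (2000) of income.
latent = [-1.2, -0.3, 0.0, 0.4, 1.6]
rescaled = rescale_to_income_units(latent, target_mean=5000.0,
                                   target_variance=2000.0 ** 2)
print([round(v) for v in rescaled])
```

A linear map of this kind preserves the ranking of households within a country while placing each country's estimates on a common scale in income units.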

References
[1] Adams, A., Evans, T., Mohammed, R., & Farnsworth, J. (1997). Socioeconomic stratification by wealth ranking: is it valid? World Development, 25 (7):1165-1172.
[2] Bogen, K. (1995). Results of the Third Round of SIPP CAPI Cognitive Interviews. Unpublished U.S. Bureau of the Census memorandum, June 13, 1994.
[3] Bollen, K., Glanville, J., & Stecklov, G. (2001). Economic status proxies in studies of fertility in developing countries: does the measure matter? MEASURE Evaluation Working Paper. WP-01-38. Carolina Population Center, Chapel Hill, NC.
[4] Basilevsky, A. (1994). Statistical Factor Analysis and Related Methods: Theory and Applications, New York, NY: John Wiley & Sons.
[5] Bollen, K., Guilkey, D., & Mroz, T. (1995). Binary outcomes and endogenous explanatory variables: tests and solutions with an application to the demand for contraceptive use in Tunisia. Demography, 32 (1):111-131.
[6] Deaton, A. (1997). The Analysis of Household Surveys: A Microeconometric Approach to Development Policy, Baltimore, MD: Johns Hopkins University Press.
[7] Dippo, C. & Norwood, J. (1994). A Review of Research at the Bureau of Labor Statistics. In Questions about Questions (J. Tanur, ed.), New York: Russell Sage Foundation.
[8] European Community Household Panel (ECHP) (1996). Volume 1 Survey Methodology and Implementation, Theme 3, Series E, Eurostat, OPOCE, Luxembourg.
[9] European Community Household Panel (ECHP) (1996). Volume 1 Survey Questionnaires: Waves 1-3, Theme 3, Series E, Eurostat, OPOCE, Luxembourg.
[10] Filmer, D., & Pritchett, L. (2001). Estimating wealth effects without expenditure data or tears: an application to educational enrollments in states of India. Demography, 38 :115-132.
[11] Friedman, M. (1957). A Theory of the Consumption Function, Princeton: Princeton University Press.
[12] Gakidou, E., & King, G. (2002). Measuring total health inequality: adding individual variation to group-level differences. International Journal for Equity in Health, 1 :3.


[13] Gorbach, P., Hoa, D., Nhan, V., & Tsui, A. (1998). Contraception and abortion in two Vietnamese communes. American Journal of Public Health, 88 (4):660-663.
[14] Guilkey, D., & Jayne, S. (1997). Fertility transition in Zimbabwe: Determinants of contraceptive use and method choice. Population Studies, 51 (2):173-190.
[15] Hammer, J. (1998). Health Outcomes Across Wealth Groups in Brazil and India. DECRG, The World Bank. Washington, DC.
[16] Havanon, N., Knodel, J., & Werasit, S. (1992). The impact of family size on wealth accumulation in rural Thailand. Population Studies, 46 :37-51.
[17] Jensen, E. (1996). The fertility impact of alternative family planning distribution channels in Indonesia. Demography, 33 (2):153-165.
[18] Layte, R., Maitre, B., Nolan, B., & Whelan, C. (2001). Persistent and consistent poverty in the 1994 and 1995 waves of the European Community Household Panel. Review of Income and Wealth, 47 (4):427-450.
[19] Lawley, D., & Maxwell, A. (1971). Factor Analysis as a Statistical Method, London: Butterworth.
[20] Morris, S., Carletto, C., Hoddinott, J., & Christiaensen, L. (2000). Validity of rapid estimates of household wealth and income for health surveys in rural Africa. Journal of Epidemiology and Community Health, 54 (5):381-387.
[21] Dargent-Molina, P., James, S., Strogatz, D., & Savitz, D. (1994). Association between maternal education and infant diarrhea in different household and community environments. Social Science and Medicine, 38 (2):343-350.
[22] Montgomery, M., Gragnolati, M., Burke, K., & Paredes, E. (2000). Measuring living standards with proxy variables. Demography, 37 (2):155-174.
[23] Moore, J., Stinson, L., & Welniak, E. (2000). Income measurement error in surveys: a review. Journal of Official Statistics, 16 (4):331-361.
[24] Muhuri, P. (1996). Estimating seasonality effects on child mortality in Bangladesh. Demography, 33 (1):98-110.
[25] Pakistan Integrated Household Survey (PIHS) (1991). PIHS Section, Federal Bureau of Statistics, G - 8 Markaz, Islamabad, Pakistan.
[26] Peru National Living Standards Measurement Survey (LSMS) (2000). LSMS Data Manager, DECRG, The World Bank, Washington DC, USA.
[27] Razzaque, A., Alam, N., Wai, L., & Foster, A. (1990). Sustained effects of the 1974-75 famine on infant and child mortality in rural area of Bangladesh. Population Studies, 44 (1):145-54.
[28] Sahn, D., & Stifel, D. (2000). Poverty comparisons over time and across countries in Africa. World Development, 28 (1):2123-2155.

[29] Singh, I., Squire, L., & Strauss, J. (1986). Agricultural Household Models: Extensions, Applications and Policy, Baltimore: Johns Hopkins University Press.
[30] Stinson, L. (1997). The Subjective Assessment of Income and Expenses: Cognitive Test Results. Unpublished U.S. Bureau of Labor Statistics report. Washington, DC.
[31] Takasaki, Y., Barham, B., & Coomes, O. (2000). Rapid-rural appraisal in humid tropical forests: an asset possession-based approach and validation methods for wealth assessment among forest peasant households. World Development, 28 (11):1961-1977.
[32] Tandon, A., Murray, C., Salomon, J., & King, G. (2001). Statistical Models for Enhancing Cross-Population Comparability. Global Programme on Evidence for Health Policy Discussion Paper No. 42, Geneva: World Health Organization.
[33] Townsend, P., Simpson, D., & Tibbs, N. (1985). Inequalities in health in the city of Bristol: a preliminary review of statistical evidence. International Journal of Health Services, 15 (4):637-663.
[34] World Bank. (2000). Country Reports on Health, Nutrition, Population and Poverty. Available: http://www.worldbank.org/poverty/health/data/intro.htm.

