Conceptual and Empirical Approaches to Developing Family-Based Assessment Procedures: Resolving the Case of the Family Environment Scale

RUDOLF H. MOOS, Ph.D.†

This article focuses on the reliability and validity of the Family Environment Scale (FES). The FES subscales generally show adequate internal consistency reliability and stability over time when applied in samples that are diverse; the items also have good content and face validity. An extensive body of research supports the construct, concurrent, and predictive validity of the FES. More generally, reliability and validity are a joint function of scale items and response formats and of the characteristics and diversity of specific samples. To contribute to further advances in family assessment, researchers need to use both conceptual and psychometric criteria rather than rely too heavily on the pursuit of internal consistency reliability and factor-analytic approaches to scale construction and validation.

Fam Proc 29:199-208, 1990

* Preparation of this manuscript was supported by NIAAA grants AA02863 and AA06699 and by Veterans Administration Medical and Health Services Research and Development Service research funds. Karen McMahon coordinated the content validity project; Bernice Moos conducted the data analyses; and Jim Breckenridge, Lee Cronbach, Ruth Cronkite, John Finney, and Ralph Swindle made helpful comments in the development of the main ideas.

† Social Ecology Laboratory and Far West Health Services Field Program, Stanford University and Department of Veterans Affairs Medical Centers, Palo Alto CA 94904.

Over the last decade, researchers have developed an impressive array of assessment procedures that focus on family characteristics. As informed scholars and clinicians, how should we evaluate the growing body of information about the reliability, validity, and clinical utility of these assessment procedures?
How can we develop realistic standards for reliability and validity that allow for a diversity of conceptual and empirical approaches to test construction and help to advance the field of family assessment? I focus on these issues here primarily with reference to the Family Environment Scale (FES). However, the issues are general ones that apply to other family-focused assessment procedures, as well as to many procedures that assess individual personality, attitudinal, and behavioral characteristics.

In pursuing these issues, I comment on Roosa and Beals' (27) discussion of measurement issues in family assessment and correct some apparent misconceptions about the development of the FES, the meaning and value of internal consistency reliability, and the use of factor analysis in scale construction and validation. More specifically, I describe the approach we used to develop the FES and present new information on the internal consistency and stability of the five FES subscales on which Roosa and Beals focus. Following that, I discuss the content, construct, concurrent, and predictive validity of the FES. Finally, I consider some general issues and trade-offs involved in different approaches to test construction, emphasize the importance of conceptual analyses and broadband measures in family assessment, and describe some guidelines researchers and journal editors can adopt to promote the development and appropriate use of family-based assessment procedures.

0014-7370/90/2804-0199/$02.00/0 © 1990 Family Process, Inc.

CONCEPTUAL AND EMPIRICAL DEVELOPMENT OF THE FES

I have described my approach to scale development both in relation to the FES (18, 19, 23, 24) and, more generally, with respect to the Social Climate Scales (20). These publications present a detailed picture of the conceptual and empirical steps involved in the development of the FES.
Conceptual Item Formulation Procedures

In brief, the initial choice and wording of FES items were guided by information obtained from observations and interviews with families, and by a conceptual formulation of three general areas and a number of specific dimensions that might differentiate among families. Each of the 200 items chosen for pilot testing identified an aspect of the family environment that reflects the quality of interpersonal relationships (such as the degree of cohesion), the focus on an area of personal growth (such as the degree of achievement or moral-religious emphasis), or the emphasis on system maintenance (such as family organization). For example, family cohesion is inferred from items such as "Family members really help and support one another" and "There is a feeling of togetherness in our family." Family organization is inferred from items such as "Activities in our family are carefully planned" and "We are generally very neat and orderly."

Thus, we built content and face validity into the FES subscales at the outset by defining constructs such as cohesion and organization, preparing items to fit the construct definitions, and selecting items that were conceptually related to a dimension.

Empirical Scale Construction Procedures

Our initial assignment of items to a set of 12 hypothesized dimensions was based on conceptual considerations and was preliminary. Next, we developed empirical information on a diverse sample of more than 1,000 individuals in 285 families. We recruited families from newspaper advertisements, church groups, local high school students, and people in crisis situations who were undergoing treatment in psychiatric and correctional facilities. We included adolescents as well as adults in these families. To broaden the sample, our Black and Hispanic research assistants recruited a group of Black and Hispanic families.
We then applied several empirical criteria to select the final set of items and develop the ten FES subscales. Specifically, we tried to select items that (a) had a reasonable response distribution, that is, were not answered in one direction (true or false) by more than 80% of the respondents; (b) discriminated significantly among families; (c) were positively correlated with other items on their subscale; and (d) correlated more highly with their subscale than with any other subscale. Thus, the selected items met empirical criteria in addition to a conceptual criterion of "fit" with the dimension to which they were assigned. The resulting FES subscales had acceptable internal consistencies in this sample.

INTERNAL CONSISTENCY AND STABILITY OF THE FES SUBSCALES

Roosa and Beals (27) correctly point out that it is appropriate to cross-validate the psychometric characteristics of a new assessment procedure and that I should have followed this approach in the development of the FES. Given the need for a diverse and heterogeneous sample, however, this is not such an easy task. The psychometric characteristics of a scale should not be "cross-validated" on a narrow sample with restricted item or subscale variability.

Although we have not obtained information on other samples as varied as the FES normative sample, my colleagues and I are using the FES in three ongoing projects. One project focuses on depressed patients at treatment intake and at 1- and 4-year followups. These patients and their spouses are compared with matched case controls and their spouses (for descriptions of the samples and measures, see 3, 4, 13, 14, and 17). The second project focuses on alcoholic patients and their spouses as compared to matched case controls and their spouses (21, 22). The third project involves a study of family-based psychosocial factors linked to the course and outcome of juvenile rheumatic disorders (9).
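The empirical item-selection criteria listed under Empirical Scale Construction Procedures can be sketched in code. The following is only an illustrative sketch, not the procedure actually used for the FES: the function names, the item labels (`c1`, `o1`, and so on), and the toy data are hypothetical; criterion (c) is approximated by the correlation between an item and the sum of the other items on its subscale; and criterion (b), discrimination among families, is omitted because it requires family-level grouping.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either variable is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def screen_items(responses, assignment, max_split=0.80):
    """Flag items that fail criteria (a), (c), or (d).

    responses:  dict mapping item name -> list of 0/1 answers (one per respondent)
    assignment: dict mapping item name -> hypothesized subscale name
    Returns a dict mapping item name -> list of failed criteria.
    """
    n = len(next(iter(responses.values())))
    subscales = sorted(set(assignment.values()))
    failures = {}
    for item, answers in responses.items():
        failed = []
        # (a) not answered in one direction by more than 80% of respondents
        p_true = sum(answers) / n
        if max(p_true, 1 - p_true) > max_split:
            failed.append("a")
        # Correlate the item with each subscale total, leaving the item itself out
        r = {}
        for s in subscales:
            members = [i for i in responses if assignment[i] == s and i != item]
            totals = [sum(responses[i][k] for i in members) for k in range(n)]
            r[s] = pearson(answers, totals)
        own = assignment[item]
        # (c) positively related to the other items on its own subscale
        if r[own] <= 0:
            failed.append("c")
        # (d) more highly related to its own subscale than to any other
        if any(r[s] >= r[own] for s in subscales if s != own):
            failed.append("d")
        failures[item] = failed
    return failures


# Hypothetical toy data: eight respondents, two three-item subscales.
responses = {
    "c1": [1, 1, 1, 1, 0, 0, 0, 0],
    "c2": [1, 1, 1, 0, 1, 0, 0, 0],
    "c3": [1, 1, 0, 1, 0, 0, 1, 0],
    "o1": [1, 0, 1, 0, 1, 0, 1, 0],
    "o2": [1, 0, 1, 0, 0, 1, 1, 0],
    "o3": [1, 1, 1, 1, 1, 1, 1, 0],  # endorsed by 7 of 8 respondents
}
assignment = {"c1": "cohesion", "c2": "cohesion", "c3": "cohesion",
              "o1": "organization", "o2": "organization", "o3": "organization"}
result = screen_items(responses, assignment)
```

In this toy example, item `c1` passes all three checks, whereas `o3` is flagged under criterion (a) because it is answered "true" by more than 80% of respondents.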
Representative FES Subscale Internal Consistencies

Roosa and Beals focus on five of the FES subscales. Table 1 shows the internal consistencies of these five FES subscales in these samples. The average alphas for cohesion and conflict are essentially identical to those reported in the FES Manual (23, 24). The average alphas for expressiveness, organization, and control are somewhat lower than those reported in the Manual. Overall, there is little "shrinkage" in the alphas, as one would expect with scales whose development is conceptually based (1). Moreover, these samples are not as diverse as the original normative sample; specifically, the new samples include fewer individuals of low socioeconomic status, fewer minority individuals, fewer adolescents, and fewer currently distressed families. Given the reduced variability in these new samples, the FES subscale internal consistencies are quite acceptable.¹,² In addition, they are all higher than those Roosa and Beals obtained in their samples.

Table 1
Family Environment Scale Subscale Internal Consistencies for Different Samples

Sample                                                         Cohesion        Expressiveness  Conflict        Organization    Control         FRI (a)
Depressed and control families (Time 1; N = 904)               .79             .63             .78             .70             .58             .84
Depressed and control families (Time 2; N = 791)               [illegible]     [illegible]     [illegible]     .69             .60             [illegible]
Depressed and control families (Time 3; N = 740)               [illegible]     .62             [illegible]     [illegible]     .63             .83
Alcoholic and control families (N = 356)                       [illegible]     .60             [illegible]     [illegible]     - (b)           [illegible]
Families of children with rheumatic disease (Time 1; N = 886)  [illegible]     [illegible]     [illegible]     .60             .58             .83
Average alpha                                                  [illegible]     [illegible]     [illegible]     .68             .60             .83
Alpha reported in FES Manual                                   [illegible]     [illegible]     [illegible]     [illegible]     [illegible]     [illegible] (c)

Note: The number of cases for each subscale varies somewhat due to missing data.
(a) Family Relationships Index.
(b) The FES Control subscale was not used in this sample.
(c) The alpha for the Family Relationships Index is based on a representative community sample as originally reported by Holahan and Moos (see 12).

Fam. Proc., Vol. 29, June, 1990
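The link between sample variability and coefficient alpha noted above can be illustrated with a small simulation. This is a sketch under stated assumptions rather than a reanalysis of any FES data: the responses are simulated from a hypothetical single-trait model with nine true/false items, the function names are illustrative, and the split at a total score of four versus five simply mimics a comparison of low-scoring and high-scoring respondents.

```python
import random
from statistics import pvariance


def cronbach_alpha(rows):
    """Coefficient alpha for a list of respondents, each a list of item scores."""
    n_items = len(rows[0])
    item_vars = [pvariance([row[i] for row in rows]) for i in range(n_items)]
    total_var = pvariance([sum(row) for row in rows])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return n_items / (n_items - 1) * (1 - sum(item_vars) / total_var)


def simulate(n_people=2000, n_items=9, seed=0):
    """Simulated true/false answers driven by one latent trait (hypothetical model)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_people):
        trait = rng.gauss(0, 1)
        # Each item is answered "true" (1) when trait plus item-specific noise is positive
        rows.append([1 if trait + rng.gauss(0, 1) > 0 else 0 for _ in range(n_items)])
    return rows


rows = simulate()
low = [r for r in rows if sum(r) <= 4]    # restricted range: low scorers only
high = [r for r in rows if sum(r) >= 5]   # restricted range: high scorers only

alpha_all = cronbach_alpha(rows)
alpha_low = cronbach_alpha(low)
alpha_high = cronbach_alpha(high)
print(f"full sample: {alpha_all:.2f}  low group: {alpha_low:.2f}  high group: {alpha_high:.2f}")
```

Because the item variances change little while the variance of the total score shrinks sharply within each restricted group, alpha in the two subgroups falls well below alpha in the full simulated sample, even though the items themselves are identical.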
¹ Roosa and Beals state that internal consistency reliability may be inflated when more than one person per family is included in a sample. This is not correct. If each person's response were used twice in a reliability analysis, independence would be violated but the alpha coefficient would remain the same. In any case, the FES subscale alphas for subsamples in which only one family member is included (for example, either depressed patients or their spouses) are closely comparable to those shown in Table 1 for the overall samples.

² FES subscale means and standard deviations for the samples shown in Table 1 are available on request. As an aside, the internal consistencies of four of the other five FES subscales in these samples are also comparable but slightly lower than those reported in the FES Manual. The Independence subscale sometimes shows relatively low internal consistency.

Sample Characteristics and Internal Consistency

Roosa and Beals point out that subscale internal consistencies may vary from sample to sample, but they do not consider why such variations occur or apply potential explanations to their samples. In general, the internal consistency of a subscale is higher in more diverse samples, that is, samples in which there is more item and subscale variance (28).³ The mean and standard deviation of a subscale are a joint property of the items and response formats and of the specific sample in which they are applied. The same conclusion holds for internal consistency and test-retest reliability. Thus, to interpret reliability statistics on a sample, we need specific information about the item and subscale means and variances in that sample.

Roosa and Beals may have obtained relatively low internal consistencies for the five FES subscales because of the restricted nature of their samples. They state that their combined sample is comparable to the initial FES normative sample, but this is not correct. Their sample is much less diverse; it is composed wholly of adults (we included a substantial number of adolescents), 86% women (we included many more men), 87% Caucasian (we included more ethnic minority respondents), and primarily well-educated and middle- to upper-middle-class individuals.

³ The formula for Cronbach's alpha (7) is

$$\alpha = \frac{N}{N-1}\left(1 - \frac{\sum V_i}{V_t}\right)$$

where N = the number of items, ΣV_i = the sum of the item variances, and V_t = the variance of the subscale. As the formula shows, the primary components of Cronbach's alpha are the item and subscale variances. Thus, alphas necessarily vary in line with the characteristics of specific samples.

Group Comparisons and the Pursuit of Internal Consistency

Roosa and Beals state that a measure cannot be used to make meaningful comparisons between samples when the measure has either low or different internal consistencies in those samples. This is not correct. Consider a researcher who wants to find a way to discriminate reliably between the families of depressed patients and those of normal case controls. After reviewing available measures, the researcher selects the FES and administers it to a large group of families. Assume that respondents in families with a depressed family member all obtain a score between zero and four on the FES Cohesion subscale, whereas respondents in the normal control families all obtain scores ranging between five and nine. Due to the restricted range of within-group scores, the internal consistency of the Cohesion subscale is likely to be quite low in both groups, but the subscale discriminates perfectly between the groups! Moreover, the alpha for the Cohesion subscale in the combined group is likely to be high.

We ran an example on the Cohesion subscale to illustrate this point.
We used the depressed patients and case controls at Time 1 (N = 904; see Table 1) and compared subsamples of respondents who scored between zero and four and those who scored between five and nine. The low-cohesion group had a mean cohesion score of 2.43 (SD = 1.87); the subscale alpha was .10. The high-cohesion group had a mean cohesion score of 7.57 (SD = 1.27); the alpha was .34. As shown in Table 1, the alpha is .79 for the overall group. Thus, researchers must be careful not to overlook validity (reliable group discrimination in this example) in an unrealistic pursuit of reliability.

Stability of the FES Subscales and Profile

The test-retest reliability and longer-term stability of an assessment procedure in different samples are also important psychometric characteristics. In this respect, the FES subscales have good 8-week test-retest reliabilities (varying from .73 to .86 for the five subscales we focus on here) and 4-month stabilities (varying from .66 to .78 for these five subscales). The 12-month subscale stabilities for averaged family subscale means vary from .63 for cohesion to .81 for organization. The overall FES profiles are also reasonably stable over these time intervals (for more details, see 24).

We examined the subscale stabilities over 12-month, 36-month, and 48-month intervals for the 676 individuals assessed at all three intervals in the sample of depressed and control families shown in Table 1. The 12-month subscale stabilities varied from .59 for cohesion to .67 for conflict and control (mean for the five subscales = .63). The stabilities ranged from .47 to .58 over the 36-month interval (mean = .56) and from .45 to .54 over the 48-month interval (mean = .53). Thus, FES subscale scores may be quite stable over intervals of as long as 4 years.

Just as with internal consistency, however, these stabilities are likely to vary in different groups of families, depending on whether families are in crisis situations, are experiencing changes in family composition or structure, or are in counseling or treatment. In this respect, the FES subscales reflect changes that occur in family environments over time.

CONTENT VALIDITY OF THE FES

As noted earlier, the FES items were initially assigned to potential dimensions on the basis of their item content and conceptual connection to specific family constructs. These decisions were supported empirically by selecting items that were more highly correlated with their FES subscale than with any other subscale.

Roosa and Beals asked twelve psychology graduate students to assign 45 of the FES items to the correct five subscales on the basis of the very brief subscale descriptions given in the FES Manual (23). These descriptions provide an overall idea about each dimension; they are not intended to provide sufficient information for raters to accurately place items on specific dimensions. Moreover, Roosa and Beals instructed raters to place items in a discard pile unless they were certain about the correct dimension. Using this procedure, the majority (67%) of the panelists were certain (and correct) about placement of 24 of the 45 items. Thus, these raters did moderately well in placing the FES items on their correct dimensions even when they had little information about the dimension and were asked to make judgments at a high level of certainty.

We have found that untrained raters can place most of the FES items on the correct dimension when they are given reasonably adequate information about the conceptual content of the dimension and are allowed to provide a "probable" judgment.
To address this point, we asked nine raters to assign each of the 45 FES items to one of the five subscales if they were (a) reasonably certain or (b) thought it probable that the item belonged to that subscale.⁴ A total of 39 of the 45 items were categorized correctly by at least six of the nine (67%) raters; more specifically, 67% or more of the raters categorized 30 items as "certainly" belonging on the correct dimension and 9 more items as "probably" belonging on the correct dimension. Thus, Roosa and Beals's relatively modest results probably were due to the paucity of information they provided and the high level of certainty they required. Our findings show that the FES items have good content and face validity.

⁴ Copies of the subscale descriptions used in the rating task are available upon request.

CONSTRUCT, CONCURRENT, AND PREDICTIVE VALIDITY OF THE FES

Space constraints permit only a brief comment on the extensive evidence of the construct, concurrent, and predictive validity of the FES subscales. With respect to construct validity, for example, FES cohesion is positively related to measures of dyadic and marital adjustment as well as to reports of support from other family members. FES conflict is positively associated with family arguments, and FES organization and control are linked to reliance on predictable and regular family routines.

The FES dimensions tend to be predictably related to external criteria in both concurrent and predictive studies.
For example, aspects of the family environment, as measured by the FES subscales, are associated with adaptation to pregnancy and parenthood, childhood and adolescent adjustment to parental divorce, adaptation to chronic childhood illness and other life stressors, children's cognitive and social development, adjustment among families of psychiatric and medical patients, and the outcome of treatment for alcoholism, depression, and other psychiatric and medical disorders (for an integrated overview of more than 150 studies that focus on these validity issues, see 24, pp. 24-48).

CONCEPTUAL AND PSYCHOMETRIC CRITERIA: SOME CHOICES AND TRADE-OFFS

Roosa and Beals raise some important test construction and psychometric issues. Their conclusions about the FES are incorrect; a considerable body of evidence shows that the FES is reasonably internally consistent, stable, and valid when applied to moderately diverse samples. Nevertheless, researchers can and should improve on the FES, as well as on other currently available family assessment procedures. The problem is to specify the test construction criteria that are likely to maximize advances in the field.

I advocate the use of conceptual considerations as the primary initial guidelines for test construction. I also believe that a focus on relatively broad constructs (such as cohesion and organization) is consistent with the current state of conceptualization in the field of family assessment. As knowledge about broad constructs accumulates, we may be able to accurately define and assess more specific constructs (aspects of cohesion and organization, for example) and to develop more internally reliable subscales. Of course, such scales may or may not be as stable and valid as scales that assess broadband constructs. To examine some of the choices and trade-offs that may arise in developing an assessment procedure, I briefly describe three sets of decisions involved in constructing the FES.
Item Content and Internal Consistency

Because we wanted to measure relatively broad family constructs (such as cohesion, expressiveness, and conflict), we selected items with some diversity in content. Thus, for example, if two items were highly intercorrelated, we dropped one in order to reduce item redundancy and broaden the content of the final item set. If we had selected less diverse items, we could have developed more internally consistent subscales. But we would have measured a narrower construct and probably sacrificed some generalizability and construct validity.

Subscale Length and Internal Consistency

The length of a subscale also affects its internal consistency. We wanted to develop a family-based screening procedure that could provide a quick "snapshot" of some of the major dimensions that differentiate families. In prior work we found that such a screening procedure needs to be limited to between 80 and 100 items. Given ten dimensions, we thought each dimension should have a maximum of nine or ten items. However, when the breadth of item content (or, more specifically, the average item intercorrelations) is held constant, longer subscales are likely to be more internally consistent.

To illustrate the effect of subscale length (number of items), we obtained internal consistencies for the Family Relationships Index (FRI), which assesses the overall quality of family relationships. The FRI is a 27-item subscale that is a combination of the three FES Relationship dimensions: cohesion, expressiveness, and conflict, which is reverse scored (11, 12). As shown in Table 1, the alpha for the 27-item FRI is higher than the alphas for the separate 9-item Cohesion, Expressiveness, and Conflict subscales.

Response Format and Internal Consistency

A third decision that affects internal consistency is whether to use dichotomous (true/false) or 3- or 4-point response options.
We opted for a dichotomous response format because of its simplicity and to make the FES more applicable in clinical situations for cognitively impaired individuals (for the rationale, see 20). However, a multipoint response format is likely to result in a more internally consistent subscale. In this respect, Roosa and Beals cite Bloom (5), who used a 4-point response format, as providing data on the internal consistency of the FES subscales, but they fail to emphasize that he obtained alphas varying from .65 to .85 for the 9-item version of the five FES subscales in question here, and alphas varying from .64 to .84 for 5-item versions of these subscales.

THE PROMISE AND PERILS OF FACTOR ANALYSIS

Roosa and Beals conducted two confirmatory factor analyses (CFA) of their FES data. One analysis used 45 items, while the other focused on the 24 items their raters placed correctly on one of the five subscales. The five-subscale model did not quite fit the data adequately in the analysis on all 45 items, although the Goodness of Fit Index (GFI) indicates that the model accounted for a considerable proportion of the variation in the data. The model did fit the data quite well in the analysis on the reduced set of 24 items.

Diagnostic Indicators and Improving Factor Models

More important, the five-factor model may fit the 45-item data set quite well when modifications are made to the model other than by dropping items or reassigning items to different dimensions. In this respect, the computer output generated by CFA provides diagnostic indicators (modification indices) that suggest how the fit between the model and the data could be improved. Roosa and Beals might have been able to improve the fit by changing their assumptions about fixed parameters, adjusting the allowance for correlated dimensions and error terms, and so on.
Decisions about the adequacy of factor models also may vary depending on the specific form of the input data (raw data, correlation matrix, covariance matrix), the size of the sample, the choice of which of more than 25 alternative goodness-of-fit indicators should be used, and what value of an index constitutes an acceptable fit (2, 6, 16). A related point involves the question of whether the LISREL model is appropriate for dichotomous data and whether the resulting standard errors and GFIs can be judged against standards developed under quite different distributional assumptions (15).

Factorial Fidelity and Conceptual Breadth

The fundamental approach typically used in factor analysis presents a more serious problem. Researchers typically use factor-analytic procedures to try to create relatively circumscribed and highly internally consistent dimensions. Our intent, however, was to create conceptually broad subscales composed of a diverse set of items. A broadband measure is likely to have more stability over time and greater validity, although it may be somewhat less internally consistent (for a discussion of the bandwidth versus fidelity issue, see 8).

A related point is that factor-analytic solutions depend on conceptual considerations, aspects of the specific sample (such as its diversity), and a range of decisions about statistical procedures, factoring criteria, goodness-of-fit indices, and so on. Thus, for example, investigators who have examined the subscale factor structure of the FES in different samples have obtained widely varying solutions ranging from two through six factors (for the references, see 24, p. 22). Three item-level factor analyses have each identified eight factors in the FES, but the specific item composition of the eight factors has varied (10, 25, 26).
It is reasonable to try to identify a consistent set of constructs to characterize family social environments, but there are serious problems in looking for "the" factor structure of an assessment procedure. Like the mean, standard deviation, and internal consistency of specific subscales, the factor structure of an assessment procedure depends largely on the sample used in the analysis. In general, more factors are likely to emerge in more heterogeneous samples.

PROMOTING ADVANCES IN FAMILY ASSESSMENT

Roosa and Beals recommend that family journals require researchers to report the internal consistency of the assessment procedures they use in their specific samples. Although I agree with this recommendation, there is danger in a narrow focus on specific psychometric criteria. To interpret the meaning of an internal consistency statistic in a particular sample, researchers need to report more specific information about that sample, such as the context in which the sample was recruited, family composition and demographic characteristics, and the subscale means and standard deviations.

More important, researchers and journal editors need to understand that any assessment procedure may have relatively low internal consistency and test-retest reliability, and a uni- or bidimensional factor structure, when it is applied in homogeneous samples. As we saw earlier, a measure that discriminates reliably between two subgroups may show low within-group variation and internal consistency reliability in each of these subgroups. Thus, we need to temper our emphasis on reliability in light of related information on the validity of an assessment procedure in different samples and the context in which the procedure will be used.

From a broader perspective, we should find better ways in which new and promising assessment procedures can develop and evolve over time.
Family assessment procedures, such as the FES, need to be adapted and updated to keep abreast of changing times, family compositions, and cultural and value contexts. A lively debate on how to achieve this aim can contribute to advances in the field of family assessment and, ultimately, to the effective planning and evaluation of prevention and intervention programs.

REFERENCES

1. Anastasi, A. Evolving concepts of test validation. In M.R. Rosenzweig & L.W. Porter (eds.), Annual review of psychology (Vol. 37). Palo Alto CA: Annual Reviews, Inc., 1986.
2. Bentler, P., & Bonett, D.G. Significance tests and goodness of fit in the analysis of covariance structures. Psychological Bulletin 88: 588-606, 1980.
3. Billings, A., Cronkite, R., & Moos, R. Social environmental factors in unipolar depression: Comparisons of depressed patients and nondepressed controls. Journal of Abnormal Psychology 92: 119-133, 1983.
4. ____, & Moos, R. Psychosocial processes of remission in unipolar depression: Comparing depressed patients with matched community controls. Journal of Consulting and Clinical Psychology 53: 314-325, 1985.
5. Bloom, B.L. A factor analysis of self-report measures of family functioning. Family Process 24: 225-239, 1985.
6. Breckenridge, J.N. Structural equation models for depression prevention research. In R.F. Munoz (ed.), Depression prevention: Research directions. New York: Hemisphere Publishing, 1987.
7. Cronbach, L. Coefficient alpha and the internal structure of tests. Psychometrika 16: 297-334, 1951.
8. ____. Essentials of psychological testing (6th ed.). New York: Harper & Row, 1990.
9. Daniels, D., Moos, R., Billings, A., & Miller, J. Psychosocial risk and resistance factors among children with chronic illness, healthy siblings, and healthy controls. Journal of Abnormal Child Psychology 16: 295-308, 1987.
10. Garfinkle, A.S. Genetic and environmental influences on the development of Piagetian logico-mathematical concepts and other specific cognitive abilities: A twin study. Acta Geneticae Medicae Gemellologiae 31: 10-61, 1982.
11. Holahan, C.J., & Moos, R. Social support and adjustment: Predictive benefits of social climate indices. American Journal of Community Psychology 10: 403-415, 1982.
12. ____, & Moos, R. The quality of social support: Measures of family and work relationships. British Journal of Clinical Psychology 22: 157-162, 1983.
13. ____, & Moos, R. The personal and contextual determinants of coping strategies. Journal of Personality and Social Psychology 52: 946-955, 1987.
14. ____, & Moos, R. Risk, resistance, and psychological distress: A longitudinal analysis with adults and children. Journal of Abnormal Psychology 96: 3-18, 1987.
15. Joreskog, K.G., & Sorbom, D. LISREL VII: A guide to the program and applications. Chicago: SPSS Inc., 1988.
16. Marsh, H.W., Balla, J.R., & McDonald, R.P. Goodness-of-fit indexes in confirmatory factor analysis: The effect of sample size. Psychological Bulletin 103: 391-410, 1988.
17. Mitchell, R., Cronkite, R., & Moos, R. Stress, coping, and depression among married couples. Journal of Abnormal Psychology 92: 433-448, 1983.
18. Moos, R. Family Environment Scale preliminary manual. Palo Alto CA: Consulting Psychologists Press, 1974.
19. ____. Evaluating correctional and community settings. New York: John Wiley & Sons, 1975.
20. ____. The Social Climate Scales: A user's guide. Palo Alto CA: Consulting Psychologists Press, 1987.
21. ____, Finney, J., & Chan, D. The process of recovery from alcoholism: I. Comparing alcoholic patients and matched community controls. Journal of Studies on Alcohol 42: 383-402, 1981.
22. ____, Finney, J., & Gamble, W. The process of recovery from alcoholism: II. Comparing spouses of alcoholic patients and spouses of matched community controls. Journal of Studies on Alcohol 43: 888-909, 1982.
23. ____, & Moos, B. Family Environment Scale manual. Palo Alto CA: Consulting Psychologists Press, 1981.
24. ____, & Moos, B. Family Environment Scale manual: Second edition. Palo Alto CA: Consulting Psychologists Press, 1986.
25. Oliver, J.M., Handal, P.J., Enos, D.M., & May, M.J. Factor structure of the Family Environment Scale: Factors based on items and subscales. Educational and Psychological Measurement 48: 469-471, 1988.
26. Robertson, D., & Hyde, J. The factorial validity of the Family Environment Scale. Educational and Psychological Measurement 42: 1233-1241, 1982.
27. Roosa, M.W., & Beals, J. Measurement issues in family assessment: The case of the Family Environment Scale. Family Process 29: 191-198, 1990.
28. Zeller, R.A., & Carmines, E.G. Measurement in the social sciences: The link between theory and data. New York: Cambridge University Press, 1980.

Manuscript received and accepted June 6, 1989.

A FINAL COMMENT ON THE CASE OF THE FAMILY ENVIRONMENT SCALE

Mark W. Roosa, Ph.D.†
Janette Beals, Ph.D.‡

† Department of Family Resources and Human Development, Program for Prevention Research, Arizona State University, Tempe AZ 85287-2502.
‡ National Center for American Indian and Alaska Native Mental Health Research, University of Colorado Health Sciences Center, Denver CO.

Although disagreeing with some specific arguments put forward by Moos (3), we agree with his general argument for achieving a balance in conceptual and empirical issues in the development of measures of family characteristics (or individual or group characteristics, for that matter). Furthermore, we applaud the meticulous and painstaking steps taken by Moos (2) in the development of the Family Environment Scale (FES). As we acknowledged earlier (6), more than the average amount of care and effort seems to have been used in the development of this instrument.

However, one major question remains unanswered after reviewing Moos's (3) comments: What does it mean conceptually when the reliability on a subscale of a popular and supposedly psychometrically sound instrument varies from .10 to .79 across samples and subsamples? A related question is: How does one interpret a correlation (or path) coefficient when one of the variables had an internal consistency coefficient of .10? According to the standard interpretation of reliability coefficients, an alpha of .10 indicates that responses to the individual items were basically haphazard or random, and that responses to an individual item had little or no relationship to responses to other items on the same subscale. In short, reliabilities of this magnitude indicate that the instrument did a poor job of measuring the construct of interest. As a result, correlation coefficients based, in part, on scores from scales with coefficients such as .10 (see 3) or .34 (see 6) will be largely determined by error variance, not true variance (7).
The resulting correlation coefficients would be lower than the true correlation between the constructs of interest. Similarly, in a multiple regression model using one variable measured with high reliability and another with low reliability, the importance of the first variable will be exaggerated.

On a separate issue, Moos (3) presents internal consistencies from three ongoing projects in his Table 1. Once again, however, it seems that each sample is composed of multiple family members. As we pointed out in the case of the validation study (6), this leads to probable nonindependence of observations; in other words, one would expect higher internal consistencies when a study has over 1,000 people rating 285 family environments.

Rather than addressing the above criticisms, Moos (3) implies that the low reliabilities reported by us (6) were due to lack of diversity in the sample. In support of this, he notes that the sample was composed primarily of Caucasian women, and that the largest proportion were middle- to upper-middle-class. How much diversity must a sample have, and on which characteristics, to obtain sufficient reliabilities on the FES? Why did this same sample generate quite acceptable reliability coefficients on a wide variety of other measures? Our sample included participants from intact married families, divorced families, families that had experienced the death of a parent, and alcoholic families. One would think that variation on marital status would produce variability in responses to questions about cohesiveness and control within the family. Similarly, the differences between alcoholic families and non-alcoholic families have been reported to be relatively dramatic in studies that have used the FES (4) and other measures (1).
Furthermore, if the reliability of the FES depends solely on variation in age, gender, ethnicity, and social class, we might ask if it is measuring family characteristics or individual characteristics.

Moos (3) presents a case in which the reliability of the Cohesion subscale is extremely low when a sample is split between those scoring below 4 on the Cohesion subscale and those scoring above 5. The internal consistencies of the subscale for the two groups dropped from .79 overall to .10 and .34, respectively. He uses this example to argue that a scale can have low internal consistency and still perfectly discriminate between groups. Yet it is unclear what groups he is referring to: those scoring high/low on the Cohesion subscale (a tautology), or whether the depressed and control groups were actually discriminated by this split.

More importantly, this example well illustrates the problems that we raised in our article (6). The internal consistency reliabilities for the FES subscales vary widely across samples and methods (multiple family members versus one respondent per family); see 2, 3, 6.

Moos (3) argues that the emphasis on internal consistency is probably misplaced because the FES was designed to measure broadband constructs with relatively few items. Conventionally, however, internal consistency is considered to be very important; for example, a classic text (5) on psychometrics states: "If it [coefficient alpha] proves to be very low, either the test is too short or the items have very little in common" (p. 230). In other words, two people with the identical score on an FES subscale may well have endorsed completely different items, items that have little empirical relation to one another, which makes interpretation of their scores difficult. Even if all subscale items had demonstrable face validity, lack of internal consistency hampers the use of these subscales in multivariate analyses.
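The dependence of coefficient alpha on score variability, which underlies the split-sample example discussed above, can be checked numerically. The following is a minimal sketch and not a reanalysis of any FES data: it assumes a hypothetical nine-item true/false subscale in which every item reflects a single latent trait, with the sample size, item slope, and random seed chosen arbitrarily. The cut points (scores below 4 and above 5) are the ones mentioned in the text; the function names are illustrative.

```python
import math
import random
from statistics import pvariance


def cronbach_alpha(rows):
    """Coefficient alpha for a list of respondents, each a list of k item scores."""
    k = len(rows[0])
    item_vars = [pvariance([r[i] for r in rows]) for i in range(k)]
    total_var = pvariance([sum(r) for r in rows])
    if total_var == 0:
        return float("nan")  # alpha is undefined when total scores do not vary
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def simulate_subscale(n=4000, k=9, slope=2.0, seed=0):
    """Hypothetical data: k true/false items that all reflect one latent trait."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        theta = rng.gauss(0, 1)                      # latent trait level (assumed normal)
        p_true = 1 / (1 + math.exp(-slope * theta))  # chance of endorsing each item
        rows.append([1 if rng.random() < p_true else 0 for _ in range(k)])
    return rows


rows = simulate_subscale()
low = [r for r in rows if sum(r) < 4]    # respondents scoring below 4
high = [r for r in rows if sum(r) > 5]   # respondents scoring above 5

alpha_full = cronbach_alpha(rows)
alpha_low = cronbach_alpha(low)
alpha_high = cronbach_alpha(high)

print(f"full sample: alpha = {alpha_full:.2f} (n = {len(rows)})")
print(f"below 4:     alpha = {alpha_low:.2f} (n = {len(low)})")
print(f"above 5:     alpha = {alpha_high:.2f} (n = {len(high)})")
```

Under these assumptions, alpha within each subgroup falls well below the full-sample value, because selecting on the total score removes most of the between-person variance on which alpha depends. The sketch does not show whether restricted variance accounts for the low reliabilities observed in any particular study.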
Nunnally (5) also states: "In prediction problems, the reliability of a predictor places a limit on its ability to forecast a criterion" (p. 238). Correlations between FES subscales and other measures will be attenuated in proportion to their respective reliabilities.

Researchers and clinicians who are seeking measures of the constructs contained in the Family Environment Scale are left with a dilemma. Although Moos (2) provides impressive documentation of the strengths of the FES, both he and we provide evidence that the instrument's performance, as judged by reliability coefficients, can be extremely low for some groups for largely unknown reasons. A few years ago we chose to use the FES for a large-scale study of highly stressed families based on the documentation of the FES characteristics (see 2) and its common and apparently successful use in a wide variety of studies. Only after the expense of collecting data from 385 families and examining the reliability coefficients for the five subscales we used did we become concerned. Because of the error variance that is present in scores of scales with low reliability coefficients, we have made the decision not to conduct analyses to test hypotheses based on the FES subscales: Interpretation of any results would be tentative and risky. Ethically, we feel it is safer to make the decision to abandon these data rather than publish potentially misleading results.

Given the low reliability of the five FES subscales that we reported (6), how could we manage to publish the results? Easily. As we reported earlier, only a small portion of those who have used the FES have reported the reliabilities on the FES subscales generated by their samples. Did some of these studies unknowingly have very low reliabilities on the FES? If so, are their results interpretable?
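How far low reliability can distort an observed correlation can be illustrated with the classical attenuation relation from test theory, in which the observed correlation equals the true correlation multiplied by the square root of the product of the two reliabilities. The sketch below is an illustration only: the reliabilities .10, .34, and .79 are the values discussed above, whereas the true correlation of .50 and the reliability of .80 for the second measure are hypothetical numbers chosen for the example, and the function name is illustrative.

```python
import math


def attenuated_r(true_r, rel_x, rel_y):
    """Observed correlation implied by classical test theory:
    r_observed = r_true * sqrt(rel_x * rel_y)."""
    if not (0 <= rel_x <= 1 and 0 <= rel_y <= 1):
        raise ValueError("reliabilities must lie between 0 and 1")
    return true_r * math.sqrt(rel_x * rel_y)


TRUE_R = 0.50      # hypothetical true correlation between the two constructs
REL_OTHER = 0.80   # hypothetical reliability of the second measure

for alpha in (0.10, 0.34, 0.79):  # subscale reliabilities cited in the text
    observed = attenuated_r(TRUE_R, alpha, REL_OTHER)
    print(f"alpha = {alpha:.2f} -> expected observed r = {observed:.2f}")
```

With these inputs, a true correlation of .50 would be expected to appear as roughly .14 when the subscale alpha is .10 and roughly .40 when it is .79. The relation assumes uncorrelated measurement errors, so it is a guide to the size of the problem and not a correction to be applied mechanically.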
It is for this reason that we recommend that journal editors and reviewers begin requiring authors to report the reliability for the scales they used for their sample. We would like to go a step further and suggest that journals provide space for brief (one- or two-page) research notes so that researchers and clinicians can report instances in which their reliabilities were considerably below those generally reported, and note the circumstances of this discovery. Under current conditions, researchers in such circumstances have the option of not mentioning their reliability coefficients in an article or of not publishing. In either case, valuable information is lost to the community of clinicians and scholars. Undoubtedly, some reports criticizing a measure will be aberrations or the result of poor procedures. However, those interested in family measures would benefit greatly from the chance to accumulate evidence regarding the performance of our most commonly used measures.

REFERENCES

1. Callan, V.J., & Jackson, D. Children of alcoholic fathers and recovered alcoholic fathers: Personal and family functioning. Journal of Studies on Alcohol 47: 180-182, 1986.
2. Moos, R.H. Family Environment Scale preliminary manual. Palo Alto CA: Consulting Psychologists Press, 1974.
3. ____. Conceptual and empirical approaches to developing family-based assessment procedures: Resolving the case of the Family Environment Scale. Family Process 29: 199-208, 1990.
4. Moos, R.H., & Billings, A.G. Children of alcoholics during the recovery process: Alcoholic and matched control families. Addictive Behaviors 7: 155-163, 1982.
5. Nunnally, J.C. Psychometric theory. New York: McGraw-Hill, 1978.
6. Roosa, M.W., & Beals, J. Measurement issues in family assessment: The case of the Family Environment Scale. Family Process 29: 191-198, 1990.
7. West, S.G., & Finch, J.F. Measurement, analysis, and design issues in personality psychology: An introduction. In S. Briggs, R. Hogan, & W. Jones (eds.), Handbook of personality psychology. New York: Academic Press, in press.

Manuscript received and accepted October 18, 1989.
