Factor Analysis

Marketing Letters 11:3 (2000): 261-275. © 2000 Kluwer Academic Publishers. Manufactured in The Netherlands.
A Meta-Analysis of Variance Accounted for and Factor Loadings in Exploratory Factor Analysis
ROBERT A. PETERSON Department of Marketing Administration University of Texas, Austin, Texas 78712, Email: rap@mail.utexas.edu
Abstract
A meta-analysis of two factor analysis outcome measures, the percentage of variance accounted for and the average (absolute) factor loading, in 803 substantive factor analyses was undertaken. The average percentage of variance accounted for was 56.6%, and the average (absolute) factor loading was 0.32. Number of variables factor analyzed, nature of the sample from which data were collected, sample size, number of factors extracted, and (minimal) number of scale categories employed influenced the percentage of variance accounted for in a factor analysis. Number of factors extracted, analytical approach, and number of variables analyzed influenced the average factor loading obtained in a factor analysis. Factor analysis of synthetic (random) data possessing the same general structure as the observed data in the meta-analysis accounted for 50.2% of the variance in the data and produced an average factor loading of 0.21. The latter figures imply that many factor analyses have produced outcome measures of questionable meaningfulness.

Key words: Factor analysis, factor loading, meta-analysis, variance accounted for
1. Introduction

Since its introduction to the scientific community nearly a century ago by Charles Spearman (1904) in the context of his search for a general intelligence factor, factor analysis has become one of the most widely used multivariate statistical techniques in behavioral research (Fabrigar, Wegener, MacCallum, and Strahan 1999). Indeed, perusal of the behavioral literature reveals that factor analysis has become an indispensable part of a researcher's statistical repertoire, second in application only to regression analysis among multivariate techniques. Recent illustrative applications of factor analysis in marketing include those of Aaker (1997), Brown, Cron, and Slocum (1997), Frazier and Lassar (1996), Hulland and Vandenbosch (1996), Ittner and Larcker (1997), Luce, Payne, and Bettman (1999), Lynn, Simpson, and Souder (1997), Palan, Areni, and Kiecker (1999), and Rose (1999). Because of its versatility and robustness, factor analysis has been applied in countless research endeavors and has been the subject of literally hundreds of methodological inquiries. A search of commonly used literature data bases revealed that more than 3,000
articles have been published in the last five years alone with the term ``factor analysis'' in the title or abstract or used as a keyword. The ubiquity of factor analysis is indisputable.

The general objective of factor analysis is data reduction: expressing a large number of (observed) variables or indicators by means of a smaller set of linear composites known variously as components, variates, underlying or latent dimensions, or, most commonly, factors. This objective is said to be accomplished when a set of factors, smaller in number than the number of variables or indicators, has been extracted that conveys all, or at least what is deemed to be an acceptable amount of, the information (variance) contained in the variables or indicators. Simply stated, the objective of factor analysis is analytical parsimony tempered with interpretative plausibility.

Conceptually, factor analysis consists of a family of multivariate statistical techniques linked together through a series of analytical decisions. In recent years a distinction has been made between confirmatory factor analysis and exploratory factor analysis. Confirmatory factor analysis can be considered a constrained version of factor analysis in which prespecified hypotheses are tested. Exploratory factor analysis is unrestricted factor analysis in which relationships are described or hypotheses generated. Exploratory factor analysis (hereafter referred to simply as factor analysis for expository ease) tends to be used more frequently than confirmatory factor analysis; it is the focus of this paper.

When undertaking a factor analysis, a researcher must make several interrelated methodological decisions. One decision is determining the appropriate variable variance to factor analyze. Should the diagonal in the correlation matrix (i.e., the trace) being analyzed consist of unities or common variance (communality) estimates? (If the diagonal consists of unities, the analytical approach is termed principal component analysis.
If the diagonal consists of communality estimates, the analytical approach is termed common factor analysis.) Another decision is choosing the factoring method to be used: maximum likelihood, least squares, or some other method. The third decision a researcher must make is determining the number of factors to extract and retain. The fourth relates to factor rotation: Should factors be rotated and, if so, should the rotation be orthogonal (e.g., varimax, equimax) or oblique (e.g., oblimax, oblimin)?

These decisions have been widely discussed and debated in the literature (see, for example, Nunnally and Bernstein (1994) and the January 1990 issue of Multivariate Behavioral Research for overviews and discussions of selected issues). Oft-cited articles that have addressed possible analytical or interpretational implications of the decisions include Comrey (1978), Ford, MacCallum, and Tait (1986), Stewart (1981), and Tinsley and Tinsley (1987). In general, most discussions of factor analysis decisions and their implications are based on critical reviews of previously conducted factor analyses, intuition, ``experience,'' selective (i.e., limited) factor analyses of substantive data, reanalyses of prior factor analyses, or factor analyses of simulated data. For example, Fabrigar et al. (1999) evaluated the ``major design and analytical decisions that must be made when conducting a factor analysis'' (p. 272). Their evaluation consisted of a qualitative review of past research, a reanalysis of three existing data sets, and a description of current factor analysis practices as reported in two prominent behavioral journals.

Interestingly enough, although very extensive, the extant methodological literature on factor analysis has yet to empirically examine the effect of research design characteristics
or analytical decisions on the outcomes of a factor analysis. In particular, there is a lacuna in the literature as to whether and how research design characteristics and analytical decisions affect or influence (1) the percentage of variance accounted for and (2) the magnitude of the factor loadings obtained in a factor analysis.

Intuitively it would seem that knowledge of the percentage of variance typically accounted for in a factor analysis and the typical factor loadings obtained could prove insightful when interpreting or evaluating the results of that factor analysis. Even though these output measures depend in part on the purpose of the analysis as well as on the nature of the variables and entities being studied, information about their typical values in a variety of research conditions should facilitate their interpretation and evaluation. Such information may not only complement that typically provided by tests of statistical significance, it may actually be superior. This is because the variables selected for a factor analysis have been chosen a priori by a researcher because of a belief (due to theory, prior research, or even a hunch) that they are related. Therefore, although it is technically possible to statistically test whether the percentage of variance accounted for by a factor analysis is significantly different from zero or whether a given factor loading is significantly different from zero (e.g., Cudeck and O'Dell 1994), such tests do not possess much substantive meaning. The percentage of variance accounted for or a factor loading believed to be salient will nearly always be different from zero because of the manner in which the variables are selected and analyzed and the relatively large samples typically employed.

Consider the issue of placing a particular variance accounted for percentage in perspective.
Recently Brown, Cron, and Slocum (1997) reasonably concluded that 92% of the variance accounted for in a factor analysis they conducted was ``very high.'' What percentage of variance accounted for in a factor analysis should be considered ``high'' or ``low'' or even ``acceptable'' or ``unacceptable''? Stated somewhat differently, what percentage of variance accounted for in a factor analysis should be ``expected,'' given specific research design conditions and analytical decisions?

Tinsley and Tinsley (1987, p. 421) reported that often ``less than 50% of the total variance is explained by a factor solution.'' They also noted (p. 420) that ``an analysis in which the factors explain only 30 to 40% of the estimated common variance obviously leaves an alarming [emphasis added] amount of common variance unexplained.'' Unfortunately, they did not indicate what percentage might be ``acceptable.'' Nunnally and Bernstein (1994, pp. 450-51) stated that ``the goal [of a factor analysis] is to explain the most variance (or related property) with the smallest number of factors. For example, five factors might explain 80% of the variance among 20 tests. This suggests that these factors described the relations among the initial 20 variables well'' [emphasis added]. Thus, according to Nunnally and Bernstein, in this situation the amount of variance accounted for (80%) would be ``acceptable.'' Merenda (1997, p. 158) simply stated that, as a rule of thumb, ``for the number of `real' factors and components, the proportion [of variance accounted for] should be at least 0.50.'' Unfortunately, none of the cited sources provided any generalizations regarding acceptability thresholds, or even a rationale for their conclusions.

Similarly, there is no consensus as to what constitutes a ``high'' or ``low'' factor loading, both in general and in specific research conditions. Apart from calculating whether a factor
loading is significantly different from zero, researchers appear to use a heuristic based on expertise or intuition when interpreting or evaluating the significance of a factor loading. According to Merenda (1997, p. 160), ``[I]t seems from the general literature in the social and behavioral sciences that [a threshold factor loading of] 0.30'' is the minimum that is traditionally used when deciding to ``accept an item or variable as belonging to a factor or component.'' As Hair, Anderson, Tatham and Black (1998, p. 111) have noted, ``factor loadings greater than 0.30 are considered to meet the minimal level; loadings of 0.40 are considered more important; and if the loadings are 0.50 or greater, they are considered practically significant.'' Consequently, different researchers apply different ``cutoff values'' when determining whether a given factor loading is ``salient.''

Given the lack of empirical evidence regarding the general magnitudes of outcome measures derived from substantive factor analyses, the present study had two parallel objectives. The first objective was to empirically ascertain and document the percentage of variance accounted for in a wide variety of factor analyses and, relatedly, determine whether systematic relationships exist between the percentage of variance accounted for in a factor analysis and selected research design characteristics and analytical decisions. The second objective differed from the first only in that the dependent variable of interest was the average factor loading. Thus, the second study objective was to empirically ascertain and document the average factor loading obtained in a factor analysis and determine whether systematic relationships exist between it and selected research design characteristics and analytical decisions. (A related objective was to document the threshold or cutoff values used when interpreting and evaluating factor loadings in factor analyses.)
A meta-analysis of a large representative sample of substantive factor analyses was undertaken to accomplish the study objectives. In addition, to provide a benchmark against which to evaluate the results of the meta-analysis, a Monte Carlo-type simulation was also undertaken. Factor analyses were conducted on sets of synthetic (random) data that were configured to mirror the general, collective structure of the data sets investigated in the meta-analysis. The outcome measures produced by the simulation study constitute ``expected'' outcome measures against which the typical percentage of variance accounted for and average factor loading found in the meta-analysis can be compared. Accomplishing the study objectives should result in comparison standards for interpreting and evaluating the two outcome measures, as well as yield insights into research design characteristics and analytical decisions that moderate or influence the outcome measures in substantive factor analyses.

2. Method

To obtain the requisite data for the meta-analysis, selected academic journals that marketing and consumer behavior researchers publish in and use as references were searched for reported factor analyses. These journals primarily included the Journal of Applied Psychology, Journal of the Academy of Marketing Science, Journal of Marketing, Journal of Marketing Research, Journal of Retailing, and Marketing Letters. In addition, several other journals (e.g., Journal of Personality and Social Psychology, Personality and
Individual Differences) and proceedings (e.g., AMA and ACR) were selectively searched, as was a sample of manuscripts submitted to, but rejected for, publication in numerous journals or not accepted for conference presentation. Every article published in the sampled journals and proceedings from 1964 to 1999, as well as every available rejected manuscript, was thoroughly searched for instances of factor analyses. Overall, nearly 19,000 articles and proceedings papers were searched for factor analyses. The time period searched was deemed to be sufficiently long to provide a relatively large, representative sample of published and unpublished factor analyses yet sufficiently recent to provide relevant data (1964 was chosen as the initial year because, beginning in 1964, factor analyses could be produced using computers with standard, off-the-shelf software packages).

Two research assistants and the author were involved in data collection. Because the data collection task was straightforward (minimal judgment was required), few data collection checks were performed. Checks that were performed indicated there was little disagreement among the three individuals with respect to data collection decisions. Data were obtained for 803 different factor analyses reported in 568 articles, proceedings papers, or unpublished manuscripts.

Only factor analyses meeting four criteria were eligible for inclusion in the meta-analysis. First, to be included in the meta-analysis, a factor analysis had to have been conducted in the context of a substantive investigation. Methodologically oriented factor analyses (e.g., factor analyses employing artificial data) were excluded from the study. Second, data in a factor analysis must have been obtained from, and pertain to, people, as opposed to companies or other entities. Third, data had to be psychological or behavioral in nature (i.e., attitudes, personality, product usage) and had to be measured using rating scales.
Finally, the discussion of the factor analysis had to contain sufficient information to permit coding or calculating the variables of interest. Factor analyses not based on correlation coefficients, or in which only one factor was extracted or in which 100% of the variance in the variables studied was reported as being extracted, were treated as outliers and excluded from the meta-analysis.

Four dependent variables were coded or calculated for the factor analyses examined. These were the total percentage of variance accounted for by the factor analysis, the percentage of variance accounted for by the first factor, both with respect to the extractable variance and relative to the total variance accounted for by the factor analysis, and the average (absolute) factor loading. A final, ``informational'' dependent variable investigated was the threshold or cutoff value used when determining the significance or saliency of factor loadings.

In addition, the following were coded and used as independent variables in the investigation: year of article publication/year of manuscript rejection; publication outlet/submission outlet; number of variables in the factor analysis; number of observations (study participants) in the factor analysis; number of factors extracted; whether the analysis was a principal component or common factor analysis;
nature of the study participants (whether they were students or nonstudents); whether factors were rotated and, if so, how (either varimax or another rotation method); and minimal number of categories in any of the scaled variables employed in the factor analysis.

An attempt was made to classify and code the variables contained in the factor analyses, but because of reporting ambiguities, inconsistencies, and a general lack of information, it was not possible to do so reliably. Similar to the typical meta-analysis, this investigation was limited to studying those variables commonly reported in the factor analyses of interest.

Seventy-eight percent of the factor analyses examined were obtained from journal articles, 10% came from proceedings papers, and 12% were obtained from unpublished manuscripts. Comparison of the average percentage of variance accounted for in each of the sources revealed that the mean percentage of variance accounted for in unpublished manuscripts was higher than that in journal articles and proceedings papers. There was no difference in the average factor loading across sources. In general, these findings suggest that no significant exclusionary publication bias (Rosenthal 1979) existed in the data. Furthermore, analyses (see, for example, Kayande and Bhargava 1994) of the percentage of variance accounted for and the average factor loading over time revealed no systematic trend (either in general or across data sources). Finally, there was no significant difference between marketing-related journal articles or proceedings papers and psychology journal articles in terms of the percentage of variance accounted for or the average factor loading. As a consequence of these preliminary findings, all data were combined into a single database for analysis purposes.

2.1. Data description

Principal component factor analysis with varimax rotation was the most frequently reported variant of factor analysis applied across the analyses examined.
A majority of the factor analyses investigated, 67%, used a principal component approach. Ninety-two percent of the factor analyses examined rotated the initial factor solution and, of these, 82% used a varimax rotation strategy. More specifically, 56% of the factor analyses examined consisted of a principal component analysis with varimax rotation. This utilization figure generally corroborates the conclusion of Ford, MacCallum, and Tait (1986) that principal component factor analysis with varimax rotation is typically the factor analysis approach of choice.

Nonstudents were used more often than students were in the factor analyses examined (63% of the factor analyses were conducted on nonstudent samples). The median number of variables analyzed in a factor analysis was 18. The most common number of rating scale categories used was five. (Only 61% of the factor analyses reported the number of scale categories used. Of those so reporting, 43% used 5-category scales. The second-most frequently used number of scale categories was seven, with 29% of those factor analyses reporting the number of scale categories using a 7-category rating scale.)
The average sample size employed in a factor analysis was 398 individuals (the median was 207 individuals), and the average ratio of study participants to variables analyzed was 14 to 1. (See Guadagnoli and Velicer (1988) and MacCallum et al. (1999), among others, for discussions of the ``proper'' sample size or ratio of study participants to variables analyzed.) The number of factors extracted ranged from 2 to 30. The modal number of factors extracted was three (26% of the factor analyses reported extracting this number); 20% of the factor analyses reported extracting two factors, 31% extracted four or five factors, and 23% extracted six or more factors.

The average (absolute) factor loading for a factor analysis was computed only for the 402 factor analyses reporting complete sets of loadings. Even so, nearly 27,000 individual factor loadings were used in computing the averages. Because factor loadings can be positive or negative, the averages computed and analyzed were the averages of the absolute values of the loadings reported. Comparison of the research design characteristics of studies reporting complete sets of factor loadings and those not reporting complete sets showed that the two differed only with respect to the number of variables analyzed and the number of factors extracted. As might be expected, studies reporting complete sets of factor loadings tended to incorporate fewer variables (17 vs. 39 on average) and extract fewer factors (3.6 vs. 5.5 on average) than studies not reporting complete sets of factor loadings. Visual inspection of a sample of studies not reporting complete sets of factor loadings revealed that they usually reported only those loadings that were significant or salient. Hence, based on the findings and the inspection of studies, one could speculate that the reporting of factor loadings might be at least partially a function of publication space constraints.
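
The two outcome measures can be illustrated with a minimal sketch. It assumes an unrotated principal component solution computed from a correlation matrix with unities on the diagonal; the function name `pca_outcomes` and its parameters are hypothetical and are not taken from the study, which coded the measures from published results rather than computing them from raw data.

```python
import numpy as np


def pca_outcomes(data, n_factors):
    """Return (percentage of variance accounted for, average absolute loading).

    Assumption: unrotated principal components of the correlation matrix.
    `data` is an (observations x variables) array.
    """
    corr = np.corrcoef(data, rowvar=False)            # unities on the diagonal
    eigvals, eigvecs = np.linalg.eigh(corr)           # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_factors]     # indices of the largest roots
    eigvals = np.clip(eigvals[order], 0.0, None)      # guard against tiny negatives
    loadings = eigvecs[:, order] * np.sqrt(eigvals)   # variables x factors
    pct_variance = 100.0 * eigvals.sum() / corr.shape[0]
    avg_abs_loading = float(np.abs(loadings).mean())  # mean of absolute values
    return float(pct_variance), avg_abs_loading


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ratings = rng.integers(1, 6, size=(200, 18))      # hypothetical 5-category ratings
    print(pca_outcomes(ratings, n_factors=3))
```

Averaging absolute values, as in the study, keeps positive and negative loadings from cancelling. Note that an orthogonal rotation leaves the percentage of variance unchanged but generally changes the average absolute loading, so this unrotated figure is only indicative.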

Because of the absence of substantive differences in the design characteristics respectively associated with the two types of analyses (e.g., reported factor loading cutoff values and the percentage of factor analyses employing principal component analysis were virtually identical for factor analyses reporting or not reporting complete sets of factor loadings), findings relating to factor loadings are deemed generalizable to all of the factor analyses examined. Finally, factor loading threshold or cutoff values were reported in 401 of the factor analyses examined. Because the relationship between reporting threshold values and complete sets of factor loadings was not isomorphic, meaningfully analyzing the relationship between threshold value and average factor loading was not possible.

2.2. Synthetic data

A Monte Carlo-type simulation was employed to create synthetic data that were factor analyzed to provide insights into the observed percentages of variance accounted for and average factor loadings in the meta-analysis. Eight cells were configured to represent the general structure of the data in the substantive factor analyses examined. These cells respectively reflected combinations of number of variables (15 or 30), sample size (100 or 300), and number of rating scale categories (5 or 7) drawn from the distributional characteristics of the factor analyses examined. Two hundred data sets were randomly generated for each of the cells and subjected to a principal component analysis and a
common factor analysis. Nonconverging solutions were treated as accounting for no variance, and average factor loadings were only computed for converging solutions. Weighted output measures (i.e., percentage of variance accounted for and average factor loading) were calculated and summarized for comparison with their observed counterparts from the meta-analysis. (Weighting was required to ensure that the simulation outcome measures were comparable to the average outcome measures derived from the meta-analysis.)

Because of the nature of the data and one of the objectives of the investigation (to provide standards or benchmarks for evaluating variance accounted for and typical factor loading), the primary data analysis technique employed was analysis of variance (ANOVA). This technique was supplemented with elementary frequency distributions and correlation and regression analyses.

3. Results

3.1. Percentage of variance accounted for

On average, the factor analyses examined accounted for 56.6% of the variance in the respective sets of variables analyzed; the median was 58.0%. The range was 8% to 99%. Table 1 presents the percentage of total variance accounted for in the factor analyses examined by decile. The table reveals, for instance, that 30% of the factor analyses examined accounted for less than 50% of the variance in the underlying variables. Thus, the table can serve as a benchmark against which variance accounted for percentages obtained in substantive factor analyses can be compared.

3.2. ANOVA results for total percentage of variance accounted for

For a variety of reasons, including the desire for analytical parsimony and the need to present the results in a form amenable to benchmarking, continuous research design
Table 1. Percentage of total variance accounted for by all factors

Decile       10th  20th  30th  40th  50th  60th  70th  80th  90th
Percentage     34    42    49    53    58    61    65    69    76
characteristics were collapsed to provide only a few analysis categories. Once this was done, the reconfigured research design characteristics were used in a series of ANOVAs as independent variables to determine whether they systematically influenced the percentage of variance accounted for in a factor analysis.

Table 2 reports the results of the ANOVAs conducted. The table reveals that there are statistically significant relationships between the percentage of variance accounted for in a factor analysis and five of the six research design characteristics investigated. Only the relationship involving type of factor analysis (principal component or common factor analysis) was not statistically significant. The absence of a relationship confirms what previous simulation studies have demonstrated. Although principal component analysis and common factor analysis are conceptually distinct, in practice the variable variance analyzed does not significantly influence the percentage of variance accounted for.
Table 2. ANOVA results for percentage of variance

Characteristic                  N   Mean percentage       F      p
Analytical approach                                      1.6   0.21
  Principal components        530        56.0
  Common factor analysis      257        57.5
Nature of sample                                        19.2   0.00
  College students            299        53.4
  Non-students                502        58.5
Number of variables                                     34.4   0.00
  1-10                        139        63.2
  11-20                       339        59.6
  21-30                       139        54.0
  31 or more                  186        48.1
Sample size                                             12.0   0.00
  100 or fewer                158        63.7
  101-175                     189        56.0
  176-300                     174        55.6
  301-500                     136        55.3
  501 or more                 146        52.0
Number of factors                                        4.4   0.00
  2                           162        57.7
  3                           212        53.6
  4                           148        55.0
  5                           104        60.5
  6 or more                   177        58.2
Number of scale categories                               8.6   0.00
  2-4                          65        47.5
  5                           210        59.2
  6                            38        54.9
  7                           144        58.5
  8 or more                    35        63.5
Of the five statistically significant relationships, that involving the number of variables analyzed appears to be the strongest (based on an examination of F ratios). The larger the number of variables analyzed, the smaller the percentage of variance accounted for (r = -0.20). Similarly, the larger the sample size, the smaller the percentage of variance accounted for (r = -0.12). On average, the percentage of variance accounted for in a factor analysis using nonstudent data was 5% greater than one using student data (r = 0.15). The relationship between number of factors extracted and percentage of variance accounted for, although statistically significant, is difficult to interpret because it appears to be nonlinear. The statistically significant relationship between the (minimal) number of scale categories and the percentage of variance accounted for is due primarily to scales with four or fewer categories (r = 0.20); very coarse data appear to result in relatively less variance accounted for in a factor analysis. Finally, when the six independent variables were used in a linear multiple regression analysis to predict the percentage of variance accounted for, an R² of 0.07 resulted.
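
For readers less familiar with the F ratios used to compare the design characteristics, the following is a minimal sketch of a one-way ANOVA F computation. The function name `one_way_f` and the sample values are hypothetical illustrations; they are not data from the meta-analysis.

```python
import numpy as np


def one_way_f(groups):
    """One-way ANOVA F ratio: between-group over within-group mean square."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    all_values = np.concatenate(groups)
    grand_mean = all_values.mean()
    k, n = len(groups), all_values.size               # number of groups, total observations
    ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


if __name__ == "__main__":
    # Hypothetical percentages of variance for three categories of one characteristic.
    print(one_way_f([[62, 65, 60], [57, 55, 59], [48, 50, 47]]))
```

A larger F indicates that the category means differ more relative to the spread within categories, which is the sense in which the number of variables analyzed is described above as the strongest relationship.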
3.3. Variance accounted for by first factor

In general, the first factor accounted for 28.2% of the total variance and 49.3% of the explained variance (i.e., that which was accounted for by the factor analysis). As might be expected, the percentage of variance accounted for by the first factor was significantly larger when factor structures were not rotated (34.7% of the variance) than when they were rotated (27.4% of the variance). There was no difference, however, in the relative percentage of variance accounted for (53.9% of the explained variance when factor structures were not rotated vs. 48.7% when they were rotated). Similarly, there was no relationship between the absolute or relative percentage of variance accounted for by the first factor as a function of whether a varimax or other rotational method was applied.

With two expected exceptions, the relationships between percentage of variance accounted for by the first factor and the research design characteristics studied mirrored those found when analyzing the total percentage of variance accounted for in a factor analysis. The exceptions occurred for the number of variables analyzed and the number of factors extracted. Considering only those factor analyses in which rotation took place, as the number of variables and the number of factors increased, both the absolute and the relative percentages of variance accounted for by the first factor decreased monotonically. For example, the relative variance accounted for by the first factor decreased from 69% when two factors were extracted to 35% when six or more factors were extracted.
3.4. Average factor loading

The average (absolute) factor loading across the factor analyses reporting complete sets of factor loadings was 0.32. Twenty-five percent of the reported factor loadings were less than 0.23; 25% were greater than 0.37. Table 3 contains the results of conducting a series of
ANOVAs with the average factor loading as the dependent variable and the six research design characteristics as independent variables. Three of the six research design characteristics produced statistically significant differences in the average factor loading: analytical approach, number of variables analyzed, and number of factors extracted. Principal component analysis produced larger factor loadings than common factor analysis (r = 0.15). Number of variables analyzed and number of factors extracted were monotonically related to the magnitude of an average factor loading. The more variables analyzed and the greater the number of factors extracted, the smaller the average factor loading (r = -0.52 and r = -0.67, respectively), findings that have not previously been reported in the literature. Furthermore, as might be intuitively expected, there was a significant relationship between the average factor loading and the percentage of variance accounted for. The larger the average factor loading, the larger the percentage of variance accounted for (r = 0.28). Finally, a
Table 3. ANOVA results for factor loadings

Characteristic                  N   Mean loading       F      p
Analytical approach                                   8.9   0.00
  Principal components        275       0.33
  Common factor analysis      126       0.29
Nature of sample                                      1.1   0.28
  College students            153       0.31
  Non-students                247       0.32
Number of variables                                  56.3   0.00
  1-10                        102       0.39
  11-20                       194       0.32
  21-30                        73       0.24
  31 or more                   33       0.23
Sample size                                           0.7   0.62
  100 or fewer                 99       0.32
  101-175                      88       0.32
  176-300                      83       0.30
  301-500                      58       0.32
  501 or more                  74       0.32
Number of factors                                   146.5   0.00
  2                           112       0.43
  3                           120       0.32
  4                            77       0.25
  5                            45       0.24
  6 or more                    48       0.22
Number of scale categories                            1.2   0.29
  2-4                          24       0.29
  5                            91       0.31
  6                            17       0.28
  7                            79       0.32
regression analysis with the average factor loading as the dependent variable and the six research design characteristics as independent variables produced an R² of 0.47.

The reported factor-loading threshold or cutoff values used in the analyses to establish saliency tend to correspond to the heuristics cited in the literature. The most common cutoff value was 0.40; a third of the factor analyses reported using this value (0.40 was also the average (absolute) cutoff value). Twenty-six percent of the factor analyses incorporated a cutoff value of 0.30, and an additional 19% used a cutoff value of 0.50. The remaining factor analyses reported using cutoff values ranging from 0.20 to 0.70.
3.5. Synthetic data analysis
Factor analyses of the synthetic data resulted in a weighted average percentage of variance accounted for of 50.2%. Comparison of this ``expected'' percentage with the average percentage found in the meta-analysis, 56.6%, leads to the somewhat disquieting inference that either a substantial proportion of the factor analyses conducted on behavioral data are of questionable meaningfulness or the percentage of variance accounted for is not a useful or interpretable outcome measure. The synthetic data factor analyses also resulted in an average (absolute) factor loading of 0.22 when principal component analysis was applied and 0.18 when common factor analysis was applied (with the overall average being 0.21). Comparing these ``expected'' values with the average factor loadings respectively found in the meta-analysis suggests that caution be exercised when using a threshold or cutoff value of 0.30 or lower when judging the saliency of factor loadings.
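To make the notion of a chance baseline concrete, the following Python sketch factor-analyzes purely random data and reports the two outcome measures. It is not the procedure used in this study. The sample size, number of variables, number of factors retained, the use of uncorrelated standard normal variates, and the use of unrotated principal components are all assumptions chosen for illustration, and the function name random_data_baseline is hypothetical.

```python
import numpy as np


def random_data_baseline(n_obs, n_vars, n_factors, seed=0):
    """Principal component analysis of uncorrelated random data.

    Returns the percentage of variance accounted for by the retained
    components and the average absolute (unrotated) loading.
    """
    rng = np.random.default_rng(seed)
    data = rng.standard_normal((n_obs, n_vars))  # assumption: no true structure

    # Eigen-decomposition of the correlation matrix.
    corr = np.corrcoef(data, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)  # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    # Loadings are eigenvectors scaled by the square root of their eigenvalues.
    top_vals = eigvals[:n_factors]
    loadings = eigvecs[:, :n_factors] * np.sqrt(top_vals)

    # The eigenvalues of a correlation matrix sum to the number of variables.
    pct_variance = 100.0 * top_vals.sum() / n_vars
    mean_abs_loading = np.abs(loadings).mean()
    return pct_variance, mean_abs_loading


pct, avg_loading = random_data_baseline(n_obs=200, n_vars=20, n_factors=5)
print(f"variance accounted for: {pct:.1f}%")
print(f"average absolute loading: {avg_loading:.2f}")
```

Because the largest eigenvalues are retained, the percentage of variance accounted for in random data always exceeds the share that would be expected if every component contributed equally (here 5 of 20, or 25%). This is one reason a nonzero chance benchmark is needed when interpreting the measure.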
4. Discussion and conclusions
This investigation represents the first attempt to systematically and quantitatively document two common outcome measures in factor analyses of behavioral data. The percentage of variance accounted for and the average factor loading were the primary dependent variables of interest, whereas the independent variables consisted of common research design characteristics or analytical decisions. In general, the results of the investigation serve as a ``reality check'' for methodological conclusions about factor analysis based on analyses of limited data sets or artificial data, experience, or intuition. More specifically, the results collectively constitute a frame of reference for interpreting and evaluating factor analysis outcome measures. Consequently, no longer will a researcher have to rely solely on intuition or personal experience when interpreting an obtained percentage of variance accounted for in a factor analysis or when determining a reasonable threshold to use when evaluating the saliency of factor loadings. The results of the meta-analysis empirically document that the average percentage of variance accounted for in substantive factor analyses of behavioral data is 56.6%, and the
average (absolute) factor loading is 0.32. The results further document that the average percentage of variance accounted for and the average factor loading vary systematically as a function of certain research design characteristics. Thus, for example, the larger the number of variables analyzed, the smaller the percentage of variance accounted for and the smaller the average factor loading.
Additionally, factor analyses of synthetic data having a general structure similar to that of the collective data in the substantive factor analyses examined provide important insights into the interpretation and evaluation of a factor analysis. Consider the finding that 50.2% of the variance in random data was ``accounted for'' by a factor analysis. Given the meta-analysis finding that 30% of the factor analyses examined accounted for less than this ``chance'' percentage, one must question the meaningfulness of factor analyses accounting for less than the chance percentage or the usefulness of the outcome measure per se. Likewise, given that the average (absolute) factor loading derived from analyses of random data was 0.21, the use of threshold factor loadings of less than 0.30 when determining loading saliency is probably not warranted.
Moreover, the results provide insights into the effects of certain factor analysis decisions. Consider, for example, the decision on whether to employ principal component analysis or common factor analysis. The meta-analysis found no difference between the average percentage of variance accounted for in principal component factor analyses and that accounted for in common factor analyses. This finding corroborates conclusions of studies based on synthetic data (e.g., Fabrigar et al. 1999; Velicer and Fava 1998; Velicer, Peacock, and Jackson 1982) that the factor analysis approach used does not practically affect the outcome of an analysis.
Thus, although there may be theoretical reasons for employing principal component or common factor analysis, in practice the factor analysis approach does not seem to matter, at least with respect to the percentage of variance accounted for. At the same time, however, the meta-analysis and the simulation study corroborated previous conclusions (e.g., Stewart 1981; Widaman 1993) that the factor loadings produced by a principal component analysis tend to be slightly larger than those produced by a common factor analysis.
Even though a wide range of constructs, modes of data collection, and so forth were reflected in the data analyzed, systematic and intuitively logical relationships were uncovered between selected research design characteristics or analytical decisions and the two factor analysis outcome measures investigated. This suggests that researchers need to take into account the characteristics of a research design when assessing the results of a factor analysis. If, for example, the percentage of variance accounted for is being used as a heuristic when determining the number of factors to retain or rotate (cf. Coovert and McNelis 1988; Lehmann 1989, p. 602), knowledge of the underlying research design characteristics might affect the particular value chosen. It must be noted, however, that with few exceptions the statistically significant relationships uncovered in the meta-analysis did not appear to be substantively significant. This suggests that although vigilance is required when determining research design characteristics and interpreting the results of a factor analysis, especially in light of the simulation study findings, factor analysis is still a useful multivariate technique.
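One way the reported figures might serve as a frame of reference is sketched below in Python. The four constants are the averages reported in this article. The function name compare_to_benchmarks and the example inputs are hypothetical, and the comparison deliberately ignores the research design characteristics that, as noted above, should also inform any such judgment.

```python
# Averages reported in the meta-analysis of substantive factor analyses.
META_AVG_PCT_VARIANCE = 56.6
META_AVG_ABS_LOADING = 0.32

# Averages obtained from factor analyses of synthetic (random) data.
CHANCE_PCT_VARIANCE = 50.2
CHANCE_AVG_ABS_LOADING = 0.21


def compare_to_benchmarks(pct_variance, mean_abs_loading):
    """Compare one factor analysis with the chance and meta-analytic averages."""
    return {
        "variance_above_chance": pct_variance > CHANCE_PCT_VARIANCE,
        "variance_above_meta_average": pct_variance > META_AVG_PCT_VARIANCE,
        "loading_above_chance": mean_abs_loading > CHANCE_AVG_ABS_LOADING,
        "loading_above_meta_average": mean_abs_loading > META_AVG_ABS_LOADING,
    }


# Hypothetical result: 48.0% of variance accounted for, average |loading| of 0.35.
report = compare_to_benchmarks(pct_variance=48.0, mean_abs_loading=0.35)
for check, passed in report.items():
    print(f"{check}: {passed}")
```

A result that falls below the chance percentage, as in this hypothetical case, would call for closer scrutiny rather than automatic rejection, since the benchmarks are averages across many different research designs.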
References
Aaker, Jennifer (1997). ``Dimensions of Brand Personality,'' Journal of Marketing Research 34, 347–356.
Brown, Steven P., William L. Cron, and John W. Slocum, Jr. (1997). ``Effects of Goal-Directed Emotions on Salesperson Volitions, Behavior, and Performance: A Longitudinal Study,'' Journal of Marketing 61, 39–50.
Comrey, Andrew L. (1978). ``Common Methodological Problems in Factor Analysis Studies,'' Journal of Consulting and Clinical Psychology 46, 648–659.
Coovert, Michael D. and Kathleen McNelis (1988). ``Determining the Number of Common Factors in Factor Analysis: A Review and Program,'' Educational and Psychological Measurement 48, 687–692.
Cudeck, Robert and Lisa L. O'Dell (1994). ``Applications of Standard Error Estimates in Unrestricted Factor Analysis: Significance Tests for Factor Loadings and Correlations,'' Psychological Bulletin 115, 475–487.
Fabrigar, Leandre R., Duane T. Wegener, Robert C. MacCallum, and Erin J. Strahan (1999). ``Evaluating the Use of Exploratory Factor Analysis in Psychological Research,'' Psychological Methods 4, 272–299.
Ford, J. Kevin, Robert C. MacCallum, and Marianne Tait (1986). ``The Application of Exploratory Factor Analysis in Applied Psychology: A Critical Review and Analysis,'' Personnel Psychology 39, 291–314.
Frazier, Gary L. and Walfried M. Lassar (1996). ``Determinants of Distribution Intensity,'' Journal of Marketing 60, 39–51.
Guadagnoli, Edward and Wayne F. Velicer (1988). ``Relation of Sample Size to the Stability of Component Patterns,'' Psychological Bulletin 103, 265–275.
Hair, Joseph F., Jr., Rolph E. Anderson, Ronald L. Tatham, and William C. Black (1998). Multivariate Data Analysis, 5th ed. Upper Saddle River, NJ: Prentice Hall.
Hulland, John and Mark Vandenbosch (1996). ``Estimating Choice Models in Data-Sparse Environments: Taking Advantage of Perceived Similarity,'' Marketing Letters 7, 329–339.
Ittner, Christopher D. and David F. Larcker (1997). ``Product Development Cycle Time and Organizational Performance,'' Journal of Marketing Research 34, 13–23.
Kayande, Ujwal and Mukesh Bhargava (1994). ``An Examination of Temporal Patterns in Meta-Analysis,'' Marketing Letters 5, 141–151.
Lehmann, Donald R. (1989). Market Research and Analysis, 3rd ed. Homewood, IL: Irwin.
Luce, Mary Frances, John W. Payne, and James R. Bettman (1999). ``Emotional Trade-Off Difficulty and Choice,'' Journal of Marketing Research 36, 143–159.
Lynn, Gary S., James T. Simpson, and William E. Souder (1997). ``Effects of Organizational Learning and Information-Processing Behaviors on New Product Success,'' Marketing Letters 8, 33–39.
MacCallum, Robert C., Keith F. Widaman, Shaobo Zhang, and Sehee Hong (1999). ``Sample Size in Factor Analysis,'' Psychological Methods 4, 84–99.
Merenda, Peter F. (1997). ``A Guide to the Proper Use of Factor Analysis in the Conduct and Reporting of Research: Pitfalls to Avoid,'' Measurement and Evaluation in Counseling and Development 30, 156–164.
Nunnally, Jum C. and Ira H. Bernstein (1994). Psychometric Theory, 3rd ed. New York: McGraw-Hill, Inc.
Palan, Kay M., Charles S. Areni, and Pamela Kiecker (1999). ``Reexamining Masculinity, Femininity, and Gender Identity Scales,'' Marketing Letters 10, 363–377.
Rose, Gregory M. (1999). ``Consumer Socialization, Parental Style, and Developmental Timetables in the United States and Japan,'' Journal of Marketing 63, 105–119.
Rosenthal, Robert (1979). ``The `File Drawer Problem,' and Tolerance for Null Results,'' Psychological Bulletin 30, 185–193.
Spearman, Charles (1904). ``General Intelligence, Objectively Determined and Measured,'' American Journal of Psychology 15, 201–293.
Stewart, David W. (1981). ``The Application and Misapplication of Factor Analysis in Marketing Research,'' Journal of Marketing Research 18, 51–62.
Tinsley, Howard E.A. and Diane J. Tinsley (1987). ``Uses of Factor Analysis in Counseling Psychology Research,'' Journal of Counseling Psychology 34, 414–424.
Velicer, Wayne F. and Joseph L. Fava (1998). ``Effects of Variable and Subject Sampling on Factor Pattern Recovery,'' Psychological Methods 3, 231–251.
Velicer, Wayne F., Andrew C. Peacock, and Douglas N. Jackson (1982). ``A Comparison of Component and Factor Patterns: A Monte Carlo Approach,'' Multivariate Behavioral Research 17, 371–388.
Widaman, Keith F. (1993). ``Common Factor Analysis Versus Principal Component Analysis: Differential Bias in Representing Model Parameters?'' Multivariate Behavioral Research 28, 263–311.
