The Minimum Sample Size in Factor Analysis

Why I Studied This Issue

I did an apprentice project studying the reasons why students withdrew from their online courses. In this project, I got a dataset that had 36 variables indicating various withdrawal reasons. I wanted to use factor analysis to reduce the 36 variables to a few categories of withdrawal reasons. However, I only have 47 cases in the dataset. Many people suggested that the number of cases was too small for performing a factor analysis. But I really did not want to waste the time and energy I had spent and just throw away this dataset. Yes, I want to "explain the most with the least" (Henson & Roberts, 2006, p. 393). Thus, I decided to find out what the minimum sample size is (i.e., the minimum number of cases; some researchers call them subjects) for performing factor analysis. Here is the related information I found.

The General Recommendations

There are two categories of general recommendations in terms of minimum sample size in factor analysis. One category says that the absolute number of cases (N) is important, while the other says that the subject-to-variable ratio (p) is important. Arrindell and van der Ende (1985), Velicer and Fava (1998), and MacCallum, Widaman, Zhang and Hong (1999) have reviewed many of these recommendations.

Rules of sample size

+ Rule of 100: Gorsuch (1983) and Kline (1979, p. 40) recommended at least 100 (MacCallum, Widaman, Zhang & Hong, 1999). No sample should be less than 100 even though the number of variables is less than 20 (Gorsuch, 1974, p. 333; in Arrindell & van der Ende, 1985, p. 166). Hatcher (1994) recommended that the number of subjects should be the larger of 5 times the number of variables, or 100. Even more subjects are needed when communalities are low and/or few variables load on each factor (in David Garson, 2008).
+ Rule of 150: Hutcheson and Sofroniou (1999) recommend at least 150 - 300 cases, more toward the 150 end when there are a few highly correlated variables, as would be the case when collapsing highly multicollinear variables (in David Garson, 2008).
+ Rule of 200: Guilford (1954, p. 523) suggested that N should be at least 200 cases (in MacCallum, Widaman, Zhang & Hong, 1999, p. 84; in Arrindell & van der Ende, 1985, p. 166).
+ Rule of 250: Cattell (1978) claimed the minimum desirable N to be 250 (in MacCallum, Widaman, Zhang & Hong, 1999).
+ Rule of 300: There should be at least 300 cases (Norušis, 2005, p. 400, in David Garson, 2008).
+ Significance rule: Lawley and Maxwell (1971) suggested 51 more cases than the number of variables, to support chi-square testing (in David Garson, 2008).
+ Rule of 500: Comrey and Lee (1992) thought that 100 = poor, 200 = fair, 300 = good, 500 = very good, 1,000 or more = excellent. They urged researchers to obtain samples of 500 or more observations whenever possible (in MacCallum, Widaman, Zhang & Hong, 1999, p. 84).

Rules of subjects-to-variables (STV) ratio

+ A ratio of 20:1: Hair, Anderson, Tatham, and Black (1995, in Hogarty, Hines, Kromrey, Ferron, & Mumford, 2005).
+ Rule of 10: There should be at least 10 cases for each item in the instrument being used (David Garson, 2008; Everitt, 1975; Nunnally, 1978, p. 276, in Arrindell & van der Ende, 1985, p. 166; Kunce, Cook, & Miller, 1975; Marascuilo & Levin, 1983, in Velicer & Fava, 1998, p. 232).
+ Rule of 5: The subjects-to-variables ratio should be no lower than 5 (Bryant and Yarnold, 1995, in David Garson, 2008; Gorsuch, 1983, in MacCallum, Widaman, Zhang & Hong, 1999; Everitt, 1975, in Arrindell & van der Ende, 1985; Gorsuch, 1974, in Arrindell & van der Ende, 1985, p. 166).
+ A ratio of 3(:1) to 6(:1) of STV is acceptable if the lower limit of the variables-to-factors ratio is 3 to 6. But the absolute minimum sample size should not be less than 250 (Cattell, 1978, p. 508, in Arrindell & van der Ende, 1985, p. 165).
+ Ratio of 2: "There should be at least twice as many subjects as variables in factor-analytic investigations. This means that in any large study on this account alone, one should have to use more than the minimum 100 subjects" (Kline, 1979, p. 40).

Statistical Research Findings on Minimum Sample Size

Little statistical research in the fields of education and behavioral science has shed light on the issue of establishing a minimum desirable level of sample size (MacCallum, Widaman, Zhang & Hong, 1999). These studies used either artificial or empirical data to investigate the minimum sample size or STV ratio that is required in order to recover the population factor structure. In this section, I will summarize the minimum sample sizes and STV ratios that these studies examined.

+ Barrett and Kline (1981, in MacCallum, Widaman, Zhang & Hong, 1999) used two large empirical data sets to investigate this issue. They drew sub-samples of various sizes from the original full samples and performed factor analysis with each sub-sample to compare the results of the sub-samples with the results of the full samples. They obtained good recovery
  + from a sub-sample of N = 48 [1] for one data set that has 16 variables, which represents an STV ratio of 3.0;
  + and from a sub-sample of N = 112 for another data set that has 90 variables, for which the STV ratio is 1.2.
  [1] This number was reported as 50 "to be the minimum to yield a clear, recognizable factor pattern" (p. 167) in Arrindell and van der Ende's paper (1985).
+ Arrindell and van der Ende (1985) used two large empirical data sets that have 1104 cases and 960 cases respectively to examine the minimum sample sizes and STV ratios that can produce a stable factor structure.
  By drawing sub-samples from the two large datasets, the authors found that:
  + for the first dataset, which had 76 variables, the minimum STV ratio (p) required to produce a clear, recognizable factor solution was 1.3, and the corresponding sample size (N) was 100;
  + for the second dataset, which had 20 variables, the minimum STV ratio (p) was 3.9 and the corresponding sample size (N) was 78.
+ MacCallum, Widaman, Zhang & Hong (1999) conducted a Monte Carlo study on sample size effects. They obtained an excellent recovery (100% convergence) of the population factor structure with a sample size (N) of 60 and 20 variables. However, this result was obtained only when the levels of communality (over .7 on average) and overdetermination (3 loaded factors) were high (Table 1 on page 93).
+ Preacher & MacCallum (2002) conducted a Monte Carlo study. Their conclusion is:
  + N had by far the largest effect on factor recovery, which exhibited a sharp drop-off below Ns of 20 or so (p. 157).

The Minimum Sample Size or STV Ratio Used in Practical Studies

+ Henson and Roberts (2006) reported a review of 60 exploratory factor analyses in four journals: Educational and Psychological Measurement, Journal of Educational Psychology, Personality and Individual Differences, and Psychological Assessment.
  + Minimum sample size reported: 42.
  + Minimum STV ratio reported: 3.25:1; 11.86% of reviewed studies used a ratio less than 5:1.
+ Fabrigar, Wegener, MacCallum, and Strahan (1999) reported a review of articles that used EFA in two journals: Journal of Personality and Social Psychology (JPSP) and Journal of Applied Psychology (JAP).
  + Sample size: 20 (18.9%) articles in JPSP and 8 (13.8%) in JAP were 100 or less.
  + Ratio of variables to factors: 56 (24.8%) papers in JPSP and 20 (34.4%) in JAP were 4:1 or less.
+ Costello and Osborne (2005) surveyed two years' worth of PsycINFO articles that reported principal components or exploratory factor analysis.

  STV ratio            % of studies    Cumulative %
  2:1 or less          14.7%           14.7%
  > 2:1, ≤ 5:1         25.8%           40.5%
  > 5:1, ≤ 10:1        22.7%           63.2%
  > 10:1, ≤ 20:1       15.4%           78.6%
  > 20:1, ≤ 100:1      18.4%           97.0%
  > 100:1              3.0%            100.0%

+ Ford, MacCallum, and Tait (1986) examined articles published in Journal of Applied Psychology, Personnel Psychology, and Organizational Behavior and Human Performance during the period of 1974 - 1984.
  + STV ratio: 27.3% of the studies were less than 5:1, 56% were less than 10:1.

Factors Related to Sample Size

Research has demonstrated that the general rules of thumb for the minimum sample size are not valid and useful (MacCallum, Widaman, Zhang, & Hong, 1999; Preacher & MacCallum, 2002). It is hard, and overly simplistic, to say whether absolute sample size or the STV ratio is what matters in factor analysis. The minimum level of N (sample size) depends on other aspects of design, such as:

+ Communality of the variables
  + The communality measures the percent of variance in a given variable explained by all the factors jointly and may be interpreted as the reliability of the indicator (Garson, 2008).
  + If communalities are high, recovery of population factors in sample data is normally very good, almost regardless of sample size, level of overdetermination, or the presence of model error (MacCallum, Widaman, Preacher, and Hong, 2001, p. 636).
  + MacCallum, Widaman, Zhang, and Hong (1999) suggested that communalities should all be greater than .6, or the mean level of communality should be at least .7 (p. 86).
  + Item communalities are considered "high" if they are all .8 or greater, but this is unlikely to occur in real data (Costello & Osborne, 2005, p. 4).
+ Degree of overdetermination of the factor (or number of factors/number of variables)
  + Overdetermination is the factor-to-variable ratio (Preacher & MacCallum, 2002).
  + Six or seven indicators per factor and a rather small number of factors is considered as high overdetermination of factors if many or all communalities are under .50 (MacCallum, Widaman, Zhang, & Hong, 1999).
  + A minimum of 3 variables per factor is critical. This confirms the theoretical results of T. W. Anderson and Rubin (1956; also see McDonald & Krane, 1977, 1979, and Rindskopf, 1984) (Velicer & Fava, 1998, p. 243).
  + At least four measured variables for each common factor and perhaps as many as six (Fabrigar, Wegener, MacCallum, & Strahan, 1999, p. 282).
  + A factor with fewer than three items is generally weak and unstable (Costello & Osborne, 2005, p. 5).
+ Size of loading
  + Item loading magnitude accounted for significant unique variance in the expected direction in all but one case, and in most cases was the strongest unique predictor of congruence between sample and population (Osborne & Costello, 2004).
  + The sample-to-population pattern fit was very good for the high (.80) loading condition, moderate for the middle (.60) loading condition, and very poor for the low (.40) loading condition (Velicer & Fava, 1998).
  + 5 or more strongly loading items (.50 or better) are desirable and indicate a solid factor (Costello & Osborne, 2005, p. 5).
  + If components possess four or more variables with loadings above .60, the pattern may be interpreted whatever the sample size used. Similarly, a pattern composed of many variables per component (10 to 12) but low loadings (= .40) should be an accurate solution at all but the lowest sample sizes (N < 150). If a solution possesses components with only a few variables per component and low component loadings, the pattern should not be interpreted unless a sample size of 300 or more observations has been used (Guadagnoli & Velicer, 1988, p. 274).
+ Model fit (f)
  + It is defined in terms of the population root mean squared residual (RMSR) (Preacher & MacCallum, 2002).
  + RMSR = .00, .03, .06 respectively correspond to perfect, good, and fair model fit in the population (Preacher & MacCallum, 2002).
  + Lack of fit of the model in the population will not, on the average, influence recovery of population factors in analysis of sample data, regardless of degree of model error and regardless of sample size (MacCallum, Widaman, Preacher, & Hong, 2001, p. 611).
  + Model fit has little effect on factor recovery. It is probably very rare in practice to find factor models exhibiting simultaneously high communalities and poor fit (Preacher & MacCallum, 2002, p. 157).
+ The differences between (extraction) methods with respect to ability to reproduce the population pattern were generally minor (Velicer & Fava, 1998, p. 248).

Conclusion

+ The general rules of thumb for the minimum sample size are not valid and useful.
+ What I did with the data I have:
  1. Repeat the method Garson (http://www2.chass.ncsu.edu/garson/pa765/factor.htm#kmo) proposed until the overall KMO is over .60.
  2. Check the communality of each variable. Drop the variable that has the smallest communality, until the communalities of all variables are above .60.
  3. Check the mean value of all communalities to ensure that it is over .7. If not, repeat step 2.
  4. Use the Kaiser criterion (dropping all components with eigenvalues under 1.0) and the scree plot to determine the number of factors.
  5. Set the loading size cutoff value at .60, and drop the factors that have fewer than 3 variables.

Finally, with principal component analysis, I got 4 factors with 32 variables, representing an STV ratio of 1.47:1 (47/32). The minimum value of all communalities is .62, the maximum value is .879, and the mean value is .770 with a standard deviation of .074. There is no cross loading among the 4 factors. Two of the 4 factors each have 5 loaded variables, one has 4 loaded variables, and one has 3 loaded variables.
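The ratios for this solution can be recomputed from the counts given in the text (47 cases, 32 variables, 4 factors) and set against several of the rules of thumb listed under The General Recommendations. The following is only a minimal sketch: the function names are hypothetical, only the rules that reduce to simple arithmetic are included, and the significance rule is read here as N ≥ p + 51, which is an assumption about how "51 more cases than the number of variables" should be applied.

```python
def stv_ratio(n_cases, n_variables):
    """Subjects-to-variables (STV) ratio: cases per analyzed variable."""
    return n_cases / n_variables


def rule_of_thumb_minimums(n_variables):
    """Minimum N implied by some of the rules of thumb, for p variables.

    Hypothetical helper; rule labels follow the list in the text.
    """
    p = n_variables
    return {
        "Rule of 100": 100,
        "Hatcher (1994), larger of 5p or 100": max(5 * p, 100),
        "Significance rule, p + 51": p + 51,  # assumed reading of the rule
        "Ratio of 20:1": 20 * p,
        "Rule of 10": 10 * p,
        "Rule of 5": 5 * p,
        "Ratio of 2": 2 * p,
    }


# Counts reported for the final solution.
n_cases, n_variables, n_factors = 47, 32, 4

print(f"STV ratio: {stv_ratio(n_cases, n_variables):.2f}:1")  # 1.47:1
print(f"Variables per factor: {n_variables / n_factors:g}")    # 8

# How many more cases each rule would have required.
shortfalls = {
    rule: needed - n_cases
    for rule, needed in rule_of_thumb_minimums(n_variables).items()
    if needed > n_cases
}
for rule, gap in shortfalls.items():
    print(f"{rule}: {gap} more cases needed")
```

With 47 cases, every one of these rules is missed, even the lenient ratio of 2, so the case for this analysis has to rest on the communality, loading, and overdetermination criteria instead.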
The variable-to-factor ratio is 8 (32/4). I think this can be considered a moderate to high degree of overdetermination.

"As long as communalities are high, the number of expected factors is relatively small, and model error is low (a condition which often goes hand-in-hand with high communalities), researchers and reviewers should not be overly concerned about small sample sizes." (Preacher & MacCallum, 2002, p. 160)

"Strong data in factor analysis means uniformly high communalities without cross loadings, plus several variables loading strongly on each factor." (Costello and Osborne, 2005, p. 4)

References

+ Anderson, T. W., & Rubin, H. (1956). Statistical inference in factor analysis. In J. Neyman (Ed.), Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability (pp. 111-150). Berkeley: University of California Press.
+ Arrindell, W. A., & van der Ende, J. (1985). An empirical test of the utility of the observations-to-variables ratio in factor and components analysis. Applied Psychological Measurement, 9, 165-178.
+ Barrett, P. T., & Kline, P. (1981). The observation to variable ratio in factor analysis. Personality Study and Group Behavior, 1, 23-33.
+ Bryant, F. B., & Yarnold, P. R. (1995). Principal components analysis and exploratory and confirmatory factor analysis. In L. G. Grimm & P. R. Yarnold (Eds.), Reading and understanding multivariate statistics (pp. 99-136). Washington, DC: American Psychological Association.
+ Cattell, R. B. (1978). The Scientific Use of Factor Analysis. New York: Plenum.
+ Comrey, A. L., & Lee, H. B. (1992). A First Course in Factor Analysis. Hillsdale, NJ: Erlbaum.
+ Costello, A. B., & Osborne, J. W. (2005). Best practices in exploratory factor analysis: Four recommendations for getting the most from your analysis. Practical Assessment Research & Evaluation, 10(7). Retrieved July 3, 2008 from http://pareonline.net/pdf/v10n7a.pdf
+ Everitt, B. S. (1975). Multivariate analysis: The need for data, and other problems. British Journal of Psychiatry, 126, 237-240.
+ Fabrigar, L. R., Wegener, D. T., MacCallum, R. C., & Strahan, E. J. (1999). Evaluating the use of exploratory factor analysis in psychological research. Psychological Methods, 4, 272-299.
+ Ford, J. K., MacCallum, R. C., & Tait, M. (1986). The application of exploratory factor analysis in applied psychology: A critical review and analysis. Personnel Psychology, 39, 291-314.
+ Garson, D. G. (2008). Factor Analysis: Statnotes. Retrieved March 22, 2008, from North Carolina State University Public Administration Program, http://www2.chass.ncsu.edu/garson/pa765/factor.htm
+ Gorsuch, R. L. (1983). Factor analysis (2nd ed.). Hillsdale, NJ: Erlbaum.
+ Guadagnoli, E., & Velicer, W. F. (1988). Relation of sample size to the stability of component patterns. Psychological Bulletin, 103, 265-275.
+ Guilford, J. P. (1954). Psychometric methods (2nd ed.). New York: McGraw-Hill.
+ Hair, J. F., Anderson, R. E., Tatham, R. L., & Black, W. C. (1995). Multivariate data analysis (4th ed.). Saddle River, NJ: Prentice Hall.
+ Hatcher, L. (1994). A Step-by-Step Approach to Using the SAS® System for Factor Analysis and Structural Equation Modeling. Cary, NC: SAS Institute, Inc.
+ Hogarty, K. Y., Hines, C. V., Kromrey, J. D., Ferron, J. M., & Mumford, K. R. (2005). The quality of factor solutions in exploratory factor analysis: The influence of sample size, communality, and overdetermination. Educational and Psychological Measurement, 65, 202-226.
+ Henson, R. K., & Roberts, J. K. (2006). Use of exploratory factor analysis in published research: Common errors and some comment on improved practice. Educational and Psychological Measurement, 66, 393-416.
+ Hutcheson, G., & Sofroniou, N. (1999). The multivariate social scientist: Introductory statistics using generalized linear models. Thousand Oaks, CA: Sage Publications.
+ Kline, P. (1979). Psychometrics and psychology. London: Academic Press.
+ Kunce, J. T., Cook, W. D., & Miller, D. E. (1975). Random variables and correlational overkill. Educational and Psychological Measurement, 35, 529-534.
+ Lawley, D. N., & Maxwell, A. E. (1971). Factor analysis as a statistical method. London: Butterworth and Co.
+ McDonald, R. P., & Krane, W. R. (1977). A note on local identifiability and degrees of freedom in the asymptotic likelihood ratio test. British Journal of Mathematical and Statistical Psychology, 30, 198-203.
+ McDonald, R. P., & Krane, W. R. (1979). A Monte Carlo study of local identifiability and degrees of freedom in the asymptotic likelihood ratio test. British Journal of Mathematical and Statistical Psychology, 32, 121-132.
+ Marascuilo, L. A., & Levin, J. R. (1983). Multivariate statistics in the social sciences. Monterey, CA: Brooks/Cole.
+ MacCallum, R. C., Widaman, K. F., Preacher, K. J., & Hong, S. (2001). Sample size in factor analysis: The role of model error. Multivariate Behavioral Research, 36, 611-637.
+ MacCallum, R. C., Widaman, K. F., Zhang, S., & Hong, S. (1999). Sample size in factor analysis. Psychological Methods, 4, 84-99.
+ Norušis, M. J. (2005). SPSS 13.0 Statistical Procedures Companion. Chicago: SPSS, Inc.
+ Nunnally, J. C. (1978). Psychometric theory (2nd ed.). New York: McGraw-Hill.
+ Osborne, J. W., & Costello, A. B. (2004). Sample size and subject to item ratio in principal components analysis. Practical Assessment, Research & Evaluation, 9(11). Retrieved July 1, 2008 from http://PAREonline.net/getvn.asp?v=9&n=11
+ Preacher, K. J., & MacCallum, R. C. (2002). Exploratory Factor Analysis in Behavior Genetics Research: Factor Recovery with Small Sample Sizes. Behavior Genetics, 32, 153-161.
+ Rindskopf, D. (1984). Structural equation models: Empirical identification, Heywood cases, and related problems. Sociological Methods and Research, 13, 109-119.
+ Velicer, W. F., & Fava, J. L. (1998). Effects of variable and subject sampling on factor pattern recovery. Psychological Methods, 3, 231-251.
