
Factor Analysis

(Adapted from Hare et al 1998)

1. Factor Analysis: An Introduction

Factor analysis is the name given to a group of statistical techniques used to analyze the interrelationships among a large number of variables and to explain these variables in terms of their common underlying dimensions (factors). The approach condenses the information contained in a number of original variables into a smaller set of dimensions (factors) with a minimum loss of information. In more technical terms, factor analysis addresses the problem of analyzing the structure of the interrelationships (correlations) among a large number of variables (e.g., test scores, test items, questionnaire responses) by defining a set of common underlying dimensions, known as factors. Factor analysis differs from the dependence techniques discussed in the next section (e.g., multiple regression), in which one or more variables are explicitly considered the criterion or dependent variables and all others are the predictor or independent variables. Factor analysis is an interdependence technique in which all variables are considered simultaneously, each related to all the others. Typical uses include:

- To reduce a large number of variables to a smaller number of factors for modelling purposes, where the large number of variables precludes modelling all the measures individually. As such, factor analysis is integrated in structural equation modeling (SEM), helping create the latent variables modelled by SEM. However, factor analysis can be, and often is, used on a stand-alone basis for similar purposes.
- To select a subset of variables from a larger set, based on which original variables have the highest correlations with the principal component factors.
- To create a set of factors to be treated as uncorrelated variables, as one approach to handling multicollinearity in procedures such as multiple regression.
- To validate a scale or index by demonstrating that its constituent items load on the same factor, and to drop proposed scale items which cross-load on more than one factor.
- To establish that multiple tests measure the same factor, thereby giving justification for administering fewer tests.

Factor analysis can be used for exploratory or confirmatory purposes. Exploratory factor analysis seeks to uncover the underlying structure of a relatively large set of variables. The researcher's a priori assumption is that any indicator may be associated with any factor. This is the most common form of factor analysis and the one covered in this note.

Confirmatory factor analysis seeks to determine whether the number of factors and the loadings of measured (indicator) variables on them conform to what is expected on the basis of pre-established theory.

2. An Example of Factor Analysis

Assume that through qualitative research a retailer has identified 80 different characteristics of retail shops and their service that consumers have mentioned as affecting their choice among stores. The retailer wants to understand how consumers make decisions but feels that it cannot evaluate 80 separate characteristics or develop action plans for this many variables, because they are too specific. Instead, it would like to know whether consumers think in more general terms rather than in specific items. To identify these dimensions, the retailer could commission a survey asking for consumer evaluations of each of these specific items. Factor analysis would then be used to identify the underlying dimensions. Specific items that correlate highly are assumed to belong to a particular dimension. These dimensions become composites of specific variables, which in turn allow the dimensions to be interpreted and described. In this example, the factor analysis might identify dimensions such as product assortment, product quality, prices, store personnel, service, and store atmosphere as dimensions used by the respondents.

Table One presents an imaginary correlation matrix for nine store image elements. Included in this set are measures of the product offering, store personnel, price levels, and in-store service and experiences. The question a researcher may wish to address is: are all of these elements separate in the minds of customers, or do they "group" in some more general way? For example, do all of the product elements group together? Where does price level fit, or is it separate? How do the in-store features (e.g., store personnel, service, and atmosphere) relate to one another? Visual inspection of the original correlation matrix (Table One, Part 1) does not easily reveal any specific pattern. There are scattered high correlations, but variable groupings are not apparent.
The application of factor analysis results in the grouping of variables as reflected in part 2 of Table One . Here some interesting patterns emerge. First, four variables all relating to the in-store experience of shoppers are grouped together. Then, three variables describing the product assortment and availability are grouped together. Finally, product quality and price levels are grouped. Each group represents a set of highly interrelated variables that may reflect a more general dimension. In this case, we might label the three variable groupings by the labels in-store experience, product offerings, and value.

PART 1: ORIGINAL CORRELATION MATRIX

                          V1     V2     V3     V4     V5     V6     V7     V8     V9
V1 Price level          1.000
V2 Store personnel       .427  1.000
V3 Return policy         .302   .771  1.000
V4 Product availability  .470   .497   .427  1.000
V5 Product quality       .765   .406   .307   .472  1.000
V6 Assortment depth      .281   .445   .423   .713   .325  1.000
V7 Assortment width      .354   .490   .471   .719   .378   .724  1.000
V8 In-store service      .242   .719   .733   .428   .240   .311   .435  1.000
V9 Store atmosphere      .372   .737   .774   .479   .326   .429   .466   .710  1.000

PART 2: CORRELATION MATRIX OF VARIABLES AFTER GROUPING ACCORDING TO FACTOR ANALYSIS

                          V3     V8     V9     V2     V6     V7     V4     V1     V5
V3 Return policy        1.000
V8 In-store service      .733  1.000
V9 Store atmosphere      .774   .710  1.000
V2 Store personnel       .741   .719   .787  1.000
V6 Assortment depth      .423   .311   .429   .445  1.000
V7 Assortment width      .471   .435   .468   .490   .724  1.000
V4 Product availability  .427   .428   .479   .497   .713   .719  1.000
V1 Price level           .302   .242   .372   .427   .281   .354   .470  1.000
V5 Product quality       .307   .240   .326   .406   .325   .378   .472   .765  1.000

Shaded areas represent variables grouped together by factor analysis.

Table One: Example of the Use of Factor Analysis to Identify Structure within a Group of Variables
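The regrouping shown in Part 2 of Table One is just a permutation of the rows and columns of Part 1. A minimal numpy sketch, using four of the Table One variables (V1 price level, V6 assortment depth, V5 product quality, V7 assortment width):

```python
import numpy as np

# Correlations taken from Table One, ordered V1, V6, V5, V7
R = np.array([
    [1.000, 0.281, 0.765, 0.354],   # V1 price level
    [0.281, 1.000, 0.325, 0.724],   # V6 assortment depth
    [0.765, 0.325, 1.000, 0.378],   # V5 product quality
    [0.354, 0.724, 0.378, 1.000],   # V7 assortment width
])
# Reorder to V1, V5, V6, V7 so the two pairs of highly correlated
# variables (price/quality and depth/width) form visible blocks
order = [0, 2, 1, 3]
R_grouped = R[np.ix_(order, order)]
print(R_grouped)
```

After reordering, the high correlations (.765 and .724) sit in two diagonal blocks, which is exactly what the Part 2 layout of Table One makes visible for all nine variables.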

3. Undertaking Factor Analysis

3.1 Objectives of Factor Analysis

The starting point in factor analysis, as with other statistical techniques, is the research problem. In this note we assume that the question involves exploratory research and a requirement to condense the data from a number of variables into a smaller set of dimensions with a minimum loss of information. Factor analysis can identify the structure of relationships among either variables or respondents by examining either the correlations between the variables or the correlations between the respondents. For example, suppose we have data on 100 respondents in terms of 10 characteristics. If the objective of the research is to summarize the characteristics, the factor analysis is applied to a correlation matrix of the variables. This is the most common type of factor analysis, referred to as R factor analysis; it analyzes a set of variables to identify the underlying dimensions. Factor analysis may also be applied to a correlation matrix of the individual respondents based on their characteristics. This is referred to as Q factor analysis, a method of combining or condensing large numbers of people into distinctly different groups within a larger population. The Q factor analysis approach is not utilized very frequently. This note will focus on R factor analysis.

3.1.1 Variable Selection

Once the purpose of factor analysis is specified, the researcher must then define the set of variables to be examined. The researcher implicitly specifies the potential dimensions that can be identified through the character and nature of the variables submitted to factor analysis. For example, in assessing the dimensions of store image, if no questions on store personnel were included, factor analysis would not be able to identify this dimension. The use of factor analysis as a data summarization technique does not exclude the need for a conceptual basis for any variables analyzed.

3.2 Designing a Factor Analysis

The design of a factor analysis involves three basic decisions: (1) Choice of the input data (a correlation matrix) to meet the specified objectives of grouping variables or respondents; (2) the design of the study in terms of number of variables, measurement properties of variables, and the types of allowable variables; and (3) the sample size necessary, both in absolute terms and as a function of the number of variables in the analysis.

3.2.1 Correlations among Variables or Respondents

The first decision in the design of a factor analysis is the choice of the correlation matrix to be used. The researcher could derive the input data matrix from the computation of correlations between the variables; this would be an R-type factor analysis. Alternatively, the researcher could elect to derive the correlation matrix from the correlations between the individual respondents. In this Q-type factor analysis, the result would be a factor matrix that identifies similar individuals. In this note it is assumed that R-type analysis will be used, and that cluster analysis would be used if relationships between individuals are the focus of the research.

3.2.2 Variable Selection and Measurement Issues

Two specific questions must be answered at this point: (1) How are the variables measured? and (2) How many variables should be included? Variables for factor analysis are generally assumed to be of metric measurement. In some cases, dummy variables (coded 0-1), although considered non-metric, can be used. The researcher should also attempt to minimize the number of variables included but still maintain a reasonable number of variables per factor. If a study is being designed to assess a proposed structure, the researcher should be sure to include several variables (five or more) that may represent each proposed factor. The strength of factor analysis lies in finding patterns among groups of variables, and it is of little use in identifying factors composed of only a single variable. Finally, when designing a study to be factor analyzed, the researcher should, if possible, identify several key variables (sometimes referred to as key indicants or marker variables) that closely reflect the hypothesized underlying factors. This will aid in validating the derived factors and assessing whether the results have practical significance.

3.2.3 Sample Size

Hare et al recommend that factor analysis should not be undertaken on a sample of fewer than 50 observations, and preferably the sample size should be 100 or larger. As a general rule, the minimum is to have at least five times as many observations as there are variables to be analyzed, and a more acceptable size would have a ten-to-one ratio. Some researchers even propose a minimum of 20 cases for each variable. One must remember, however, that with 30 variables, for example, there are 435 correlations in the factor analysis. At a .05 significance level, perhaps even 20 of those correlations would be deemed significant and appear in the factor analysis just by chance!
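The arithmetic behind this caution is straightforward:

```python
# With p variables there are p*(p-1)/2 distinct correlations, and at the
# .05 level roughly 5 percent of them can appear significant by chance.
p = 30
n_correlations = p * (p - 1) // 2            # 435 correlations for p = 30
expected_by_chance = 0.05 * n_correlations   # about 21.75 of them
print(n_correlations, expected_by_chance)
```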

3.3 Assumptions in Factor Analysis

In addition to the statistical bases for the correlations of the data matrix, the researcher must ensure that the data matrix has sufficient correlations to justify the application of factor analysis. If visual inspection reveals no substantial number of correlations greater than .30, then factor analysis is probably inappropriate. The correlations among variables can also be analyzed by computing the partial correlations among variables, that is, the correlations between variables when the effects of the other variables are taken into account. If "true" factors exist in the data, the partial correlations should be small, because each variable can be explained by the factors. If the partial correlations are high, then there are no "true" underlying factors, and factor analysis is inappropriate.

Another way of determining the appropriateness of factor analysis examines the entire correlation matrix. The Bartlett test of sphericity, a statistical test for the presence of correlations among the variables, is one such measure. It provides the statistical probability that the correlation matrix has significant correlations among at least some of the variables.

A basic assumption of factor analysis is that some underlying structure does exist in the set of selected variables. It is the responsibility of the researcher to ensure that the observed patterns are conceptually valid and appropriate to study with factor analysis, because the technique has no means of determining appropriateness other than the correlations among variables. The researcher must also ensure that the sample is homogeneous with respect to the underlying factor structure. It is inappropriate, for example, to apply factor analysis to a combined sample of males and females for a set of items known to differ because of gender: when the two subsamples are combined, the resulting correlations and factor structure will be a poor representation of the unique structure of each group. Thus, whenever differing groups are expected in the sample, separate factor analyses should be performed, and the results should be compared to identify differences not reflected in the results of the combined sample.
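As a sketch of the Bartlett test of sphericity, the usual chi-square statistic is -(n - 1 - (2p + 5)/6) ln|R| with p(p - 1)/2 degrees of freedom. The implementation and the 3-variable correlation matrix below are illustrative assumptions, not output from any particular package:

```python
import numpy as np
from scipy.stats import chi2

def bartlett_sphericity(R, n):
    """Bartlett's test of sphericity on correlation matrix R, n observations."""
    p = R.shape[0]
    statistic = -(n - 1 - (2 * p + 5) / 6) * np.log(np.linalg.det(R))
    df = p * (p - 1) // 2
    p_value = chi2.sf(statistic, df)   # survival function: P(chi2 > statistic)
    return statistic, p_value

# Hypothetical 3-variable correlation matrix, n = 100 respondents
R = np.array([[1.0, 0.6, 0.5],
              [0.6, 1.0, 0.4],
              [0.5, 0.4, 1.0]])
stat, pval = bartlett_sphericity(R, n=100)
print(stat, pval)
```

A very small p-value rejects the hypothesis that the matrix is an identity matrix, i.e., that the variables are uncorrelated, so factoring may be justified.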

3.4 Deriving Factors and Assessing Overall Fit

Once the variables are specified and the correlation matrix is prepared, the researcher is ready to apply factor analysis to identify the underlying structure of relationships (see Table One). In doing so, decisions must be made concerning (1) the method of extracting the factors (common factor analysis versus components analysis) and (2) the number of factors selected to represent the underlying structure in the data. Component analysis is used when the objective is to summarize most of the original information (variance) in a minimum number of factors for prediction purposes. In contrast, common factor analysis is used primarily to identify underlying factors or dimensions that reflect what the variables share in common.

3.4.1 Common Factor Analysis versus Component Analysis

There are two basic models for obtaining factor solutions, known as common factor analysis and component analysis. To select the appropriate model, the researcher must first understand the differences between the three types of variance in factor analysis:

(1) Common variance is that variance in a variable that is shared with all other variables in the analysis.
(2) Specific variance (sometimes called unique variance) is that variance associated with only a specific variable.
(3) Error variance is that variance due to unreliability in the data-gathering process, measurement error, or a random component in the measured phenomenon.

Component analysis, also known as principal components analysis, considers the total variance and derives factors that contain small proportions of unique variance and, in some instances, error variance. Specifically, with component analysis, unities (1s) are inserted in the diagonal of the correlation matrix, so that the full variance is brought into the factor matrix. Conversely, with common factor analysis, communalities are inserted in the diagonal. Communalities are estimates of the shared, or common, variance among the variables; factors resulting from common factor analysis are based only on the common variance.

The common factor and component analysis models are both widely used. The selection of one model over the other is based on two criteria: (1) the objectives of the factor analysis and (2) the amount of prior knowledge about the variance in the variables. The component model is appropriate when the primary concern is prediction, or the minimum number of factors needed to account for the maximum portion of the variance represented in the original set of variables, and when prior knowledge suggests that specific and error variance represent a relatively small proportion of the total variance. In contrast, when the primary objective is to identify the latent dimensions or constructs represented in the original variables, and the researcher has little knowledge about the amount of specific and error variance and therefore wishes to eliminate this variance, the common factor model is most appropriate. Common factor analysis suffers from the problem that several different factor scores can be calculated from the factor model results; in other words, there is no single unique solution, as found in component analysis.

When a decision has been made on the factor model, the researcher is ready to extract the initial unrotated factors.
By examining the unrotated factor matrix, the researcher can explore the data reduction possibilities for a set of variables and obtain a preliminary estimate of the number of factors to extract. Final determination of the number of factors must wait, however, until the results are rotated and the factors are interpreted.
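The role of the unities in the diagonal can be seen directly in a small component analysis sketch (numpy only; the 3-variable correlation matrix is hypothetical):

```python
import numpy as np

# Component analysis: eigendecomposition of the correlation matrix with
# unities (1s) on the diagonal, so the full variance enters the analysis.
R = np.array([[1.0, 0.7, 0.2],
              [0.7, 1.0, 0.3],
              [0.2, 0.3, 1.0]])
eigenvalues, eigenvectors = np.linalg.eigh(R)   # eigh returns ascending order
eigenvalues = eigenvalues[::-1]                  # largest factor first
eigenvectors = eigenvectors[:, ::-1]
# With 1s on the diagonal, the total variance analyzed equals the number
# of variables (the eigenvalues sum to 3 here); common factor analysis
# would replace the 1s with communality estimates and analyze only the
# shared variance.
print(eigenvalues, eigenvalues.sum())
```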

3.4.2 Criteria for the Number of Factors to Extract

Hare et al compare choosing the number of factors to be interpreted to focusing a microscope: too high or too low an adjustment will obscure a structure that is obvious when the adjustment is just right. Therefore, by examining a number of different factor structures derived from several trial solutions, the researcher can compare and contrast to arrive at the best representation of the data. They suggest the following stopping criteria for the number of factors to be extracted:

Latent Root Criterion. The most commonly used technique is the latent root criterion. It is simple to apply to either components analysis or common factor analysis. The rationale is that any individual factor should account for the variance of at least a single variable if it is to be retained for interpretation. Each variable contributes a value of 1 to the total eigenvalue. Thus, only factors having latent roots (eigenvalues) greater than 1 are considered significant; all factors with latent roots less than 1 are considered insignificant and are disregarded.

A Priori Criterion. This can be useful when the researcher already knows how many factors to extract before undertaking the factor analysis. The researcher simply instructs the computer to stop the analysis when the desired number of factors has been extracted. This approach is useful when testing a theory or hypothesis about the number of factors to be extracted. It can also be justified when attempting to replicate another researcher's work by extracting the same number of factors as was previously found.

Percentage of Variance Criterion. This approach is based on achieving a specified cumulative percentage of total variance extracted by successive factors. The purpose is to ensure practical significance for the derived factors by ensuring that they explain at least a specified amount of variance. No absolute threshold has been adopted for all applications, although it is common to regard a solution that accounts for 60 percent of the total variance (and in some instances even less) as satisfactory.

Scree Test Criterion. With the component analysis factor model, the factors extracted contain both common and unique variance. Although all factors contain at least some unique variance, the proportion of unique variance is substantially higher in later factors than in earlier ones. The scree test is used to identify the optimum number of factors that can be extracted before the amount of unique variance begins to dominate the common variance structure. The scree test is derived by plotting the latent roots against the number of factors in their order of extraction, and the shape of the resulting curve is used to evaluate the cutoff point. The figure below shows an example. Starting with the first factor, the plot slopes steeply downward at first and then gradually approaches a horizontal line. The point at which the curve first begins to straighten out is taken to indicate the maximum number of factors to extract. In the present case, the first 10 factors would qualify. Beyond 10, too large a proportion of unique variance would be included; thus these factors would not be acceptable.

[Scree plot: eigenvalues on the vertical axis plotted against component number on the horizontal axis; the curve drops steeply over the early components and then levels off.]

Concluding points on selection criteria: first, in practice most researchers seldom use a single criterion in determining how many factors to extract. Second, some words of caution are in order about selecting the final set of factors. If too few factors are used, the correct structure is not revealed, and important dimensions may be omitted. If too many factors are retained, the interpretation becomes more difficult when the results are rotated.
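The latent-root and percentage-of-variance stopping rules are easy to apply once the eigenvalues are known. A small sketch (the eigenvalues are hypothetical):

```python
import numpy as np

def n_factors(eigenvalues, variance_target=0.60):
    """Apply the latent-root (eigenvalue > 1) and the
    percentage-of-variance criteria to a set of eigenvalues."""
    ev = np.sort(np.asarray(eigenvalues))[::-1]
    latent_root = int((ev > 1.0).sum())
    cumulative = np.cumsum(ev) / ev.sum()
    pct_variance = int(np.argmax(cumulative >= variance_target)) + 1
    return latent_root, pct_variance

# Hypothetical eigenvalues from a 6-variable component analysis
ev = [2.8, 1.6, 0.7, 0.4, 0.3, 0.2]
print(n_factors(ev))  # both criteria suggest 2 factors here
```

When the two rules disagree, the note's advice applies: examine several trial solutions rather than trusting a single criterion.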

3.5 Interpreting the Factors

Three steps are involved in the interpretation of the factors and the selection of the final factor solution.

(1) The initial unrotated factor matrix is computed to give a preliminary indication of the number of factors to extract. The factor matrix contains factor loadings (discussed below) for each variable on each factor. In computing the unrotated factor matrix, the researcher is simply interested in the best linear combination of variables, best in the sense that the particular combination of original variables accounts for more of the variance in the data as a whole than any other linear combination of variables. Therefore, the first factor may be viewed as the single best summary of the linear relationships exhibited in the data. The second factor is defined as the second-best linear combination of the variables, subject to the constraint that it is orthogonal to the first factor. To be orthogonal to the first factor, the second factor must be derived from the variance remaining after the first factor has been extracted. Thus the second factor may be defined as the linear combination of variables that accounts for the most residual variance after the effect of the first factor has been removed from the data. Subsequent factors are defined similarly, until all the variance in the data is exhausted.

(2) The second step employs a rotational method to achieve simpler and theoretically more meaningful factor solutions. In most cases, rotation improves the interpretation by reducing some of the ambiguities that often accompany initial unrotated factor solutions. Although unrotated solutions achieve the objective of data reduction, in most instances they will not provide information that offers the most adequate interpretation of the variables under examination. Factor loadings are the means of interpreting the role each variable plays in defining each factor: a factor loading is the correlation between a variable and a factor. Loadings indicate the degree of correspondence between the variable and the factor, with higher loadings making the variable more representative of the factor. The unrotated factor solution may not provide a meaningful pattern of variable loadings. If the unrotated factors are expected to be meaningful, the user may specify that no rotation be performed; generally, however, rotation will be desirable because it simplifies the factor structure, and it is usually difficult to determine whether unrotated factors will be meaningful.

(3) The researcher assesses the need to respecify the factor model owing to (a) the deletion of a variable(s) from the analysis, (b) the desire to employ a different rotational method for interpretation, (c) the need to extract a different number of factors, or (d) the desire to change from one extraction method to another.
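Step (1) can be sketched numerically: for a component solution, the unrotated loading of each variable on each factor is the eigenvector entry scaled by the square root of the eigenvalue (numpy only; the correlation matrix is hypothetical):

```python
import numpy as np

R = np.array([[1.0, 0.7, 0.2],
              [0.7, 1.0, 0.3],
              [0.2, 0.3, 1.0]])
vals, vecs = np.linalg.eigh(R)
vals, vecs = vals[::-1], vecs[:, ::-1]   # order factors by variance explained
loadings = vecs * np.sqrt(vals)          # loading = variable-factor correlation
# The first column is the single best linear summary of the data; each
# later factor is orthogonal to the earlier ones and explains the largest
# share of the remaining variance. The squared loadings in each row sum
# to the variable's total variance (1.0 for a correlation matrix).
print(np.round((loadings ** 2).sum(axis=1), 6))
```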

3.5.1 Factor Rotation

An important tool in interpreting factors is factor rotation. The term rotation means exactly what it implies: the reference axes of the factors are turned about the origin until some other position has been reached. As indicated earlier, unrotated factor solutions extract factors in the order of their importance. The first factor tends to be a general factor with almost every variable loading significantly, and it accounts for the largest amount of variance. The second and subsequent factors are then based on the residual amount of variance, each accounting for successively smaller portions of variance. The ultimate effect of rotating the factor matrix is to redistribute the variance from earlier factors to later ones to achieve a simpler, more meaningful factor pattern.

The simplest case of rotation is an orthogonal rotation, in which the axes are maintained at 90 degrees. It is also possible to rotate the axes without retaining the 90-degree angle between the reference axes; when not constrained to being orthogonal, the rotational procedure is called an oblique rotation.

An orthogonal factor rotation is demonstrated in Figure One, in which five variables are depicted in a two-dimensional factor diagram. The vertical axis represents unrotated factor II, and the horizontal axis represents unrotated factor I. The axes are labeled with 0 at the origin and extend outward to +1.0 or -1.0; the numbers on the axes represent factor loadings. The five variables are labeled V1, V2, V3, V4, and V5. The factor loading for variable 2 (V2) on unrotated factor II is determined by drawing a dashed line horizontally from the data point to the vertical axis for factor II. Similarly, a vertical line is drawn from variable 2 to the horizontal axis of unrotated factor I to determine the loading of variable 2 on factor I. On the unrotated first factor, all the variables load fairly high. On the unrotated second factor, variables 1 and 2 are very high in the positive direction; variable 5 is moderately high in the negative direction, and variables 3 and 4 have considerably lower loadings in the negative direction. Inspection of Figure One shows that there are two clusters of variables: variables 1 and 2 go together, as do variables 3, 4, and 5. However, such patterning of variables is not so obvious from the unrotated factor loadings. By rotating the original axes clockwise, as indicated in Figure One, we obtain a completely different factor loading pattern. Note that in rotating the factors, the axes are maintained at 90 degrees. This procedure signifies that the factors are mathematically independent and that the rotation has been orthogonal. After rotating the factor axes, variables 3, 4, and 5 load very high on factor I, and variables 1 and 2 load very high on factor II. Thus the clustering or patterning of these variables into two groups is more obvious after the rotation than before, even though the relative position or configuration of the variables remains unchanged.

Figure One: Orthogonal Factor Rotation (Source: Hare 1998)


Comparison Between Rotated and Unrotated Factor Loadings

            Unrotated Factor Loadings    Rotated Factor Loadings
Variable         I        II                  I        II
V1              .50      .80                 .03      .94
V2              .60      .70                 .16      .90
V3              .90     -.25                 .95      .24
V4              .80     -.30                 .84      .15
V5              .60     -.50                 .76     -.13

The same general principles of orthogonal rotations pertain to oblique rotations, but the oblique rotational method is more flexible because the factor axes need not be orthogonal. The major option available is the choice between an orthogonal and an oblique rotation method. The ultimate goal of any rotation is to obtain some theoretically meaningful factors and, if possible, the simplest factor structure. Orthogonal rotational approaches are more widely used because all computer packages with factor analysis contain orthogonal rotation options, whereas the oblique methods are not as widespread and are the subject of considerable controversy.

Orthogonal Rotation Methods. In practice, the objective of all methods of rotation is to simplify the rows and columns of the factor matrix to facilitate interpretation. In a factor matrix, columns represent factors, and each row corresponds to a variable's loadings across the factors. Simplifying the rows means making as many values in each row as close to zero as possible (i.e., maximizing a variable's loading on a single factor); simplifying the columns means making as many values in each column as close to zero as possible (i.e., making the number of "high" loadings as few as possible). Three major orthogonal approaches have been developed:

- QUARTIMAX rotates the initial factors so that a variable loads high on one factor and as low as possible on all other factors. Because the technique centers on simplifying the rows, many variables can load high or near high on the same factor.
- VARIMAX centers on simplifying the columns of the factor matrix by maximizing the sum of variances of the squared loadings within each column.
- EQUIMAX is a compromise between the QUARTIMAX and VARIMAX approaches and is used infrequently.
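VARIMAX can be sketched with the classic Kaiser SVD-based iteration. This is a textbook-style implementation sketch, not any particular package's routine; the example loadings are the unrotated values from the comparison table above:

```python
import numpy as np

def varimax(loadings, max_iter=100, tol=1e-6):
    """Orthogonal varimax rotation of a loading matrix
    (classic SVD-based iteration; simplifies the columns)."""
    L = np.asarray(loadings, dtype=float)
    p, k = L.shape
    R = np.eye(k)             # accumulated rotation matrix
    criterion = 0.0
    for _ in range(max_iter):
        rotated = L @ R
        # SVD of the gradient of the varimax criterion
        u, s, vt = np.linalg.svd(
            L.T @ (rotated ** 3
                   - rotated @ np.diag((rotated ** 2).sum(axis=0)) / p)
        )
        R = u @ vt            # R stays orthogonal
        if s.sum() - criterion < tol:
            break
        criterion = s.sum()
    return L @ R

# Unrotated loadings for five variables on two factors (from the table)
L = np.array([[0.50, 0.80], [0.60, 0.70], [0.90, -0.25],
              [0.80, -0.30], [0.60, -0.50]])
rotated = varimax(L)
print(np.round(rotated, 2))
```

Because the rotation is orthogonal, each variable's communality (the row sum of squared loadings) is unchanged; only the distribution of variance across the factors is redistributed.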
Oblique Rotation Methods. Oblique rotations are similar to orthogonal rotations, except that oblique rotations allow correlated factors instead of maintaining independence between the rotated factors. The objectives are comparable to those of the orthogonal methods, with the added feature of correlated factors. Because the factors may be correlated, the researcher must take additional care to validate obliquely rotated factors. SPSS provides OBLIMIN for oblique rotation.

Selection of Rotational Method. VARIMAX is the most popular orthogonal method. The choice of an orthogonal or oblique rotation should be made on the basis of the particular needs of a given research problem. If the goal of the research is to reduce the number of original variables, regardless of how meaningful the resulting factors may be, the appropriate solution is an orthogonal one. Also, if the researcher wants to reduce a larger number of variables to a smaller set of uncorrelated variables for subsequent use in regression or other prediction techniques, an orthogonal solution is best. However, if the ultimate goal of the factor analysis is to obtain several theoretically meaningful factors or constructs, an oblique solution is appropriate because, in reality, very few factors can be expected to be uncorrelated, as in an orthogonal rotation.

3.5.2 Criteria for the Significance of Factor Loadings

In interpreting factors, a decision must be made regarding which factor loadings are worth considering. The following discussion details issues regarding practical and statistical significance, as well as the number of variables, that affect the interpretation of factor loadings. Ensuring Practical Significance The first suggestion is not based on any mathematical proposition but relates more to practical significance. It is a rule of thumb used frequently as a means of making a preliminary examination of the factor matrix. In short, factor loadings greater than .30 are considered to meet the minimal level; loadings of .40 are considered more important; and if the loadings are .50 or greater, they are considered practically significant. Thus the larger the absolute size of the factor loading, the more important the loading in interpreting the factor matrix. Because factor loading is the correlation of the variable and the factor, the squared loading is the amount of the variable's total variance accounted for by the factor. Thus, a .30 loading translates to approximately 10 percent explanation, and a .50 loading denotes that 25 percent of the variance is accounted for by the factor. The loading must exceed .70 for the factor to account for 50 percent of the variance. The researcher should realize that extremely high loadings (.80 and above) are not typical and that the practical significance of the loadings is an important criterion. These guidelines are applicable when the sample size is 100 or larger. The emphasis in this approach is practical, not statistical, significance. Assessing Statistical Significance A factor loading represents the correlation between an original variable and its factor. In determining a significance level for the interpretation of loadings, an approach similar to determining the statistical significance of correlation coefficients could be used. 
However, it has been shown that factor loadings have substantially larger standard errors than typical correlations; thus, factor loadings should be evaluated at considerably stricter levels. Table 3.2 contains the sample size necessary for each factor loading value to be considered significant. For example, in a sample of 100 respondents, factor loadings of .55 and above are significant; in a sample of 50, however, a factor loading of .75 is required for significance. In comparison with the prior rule of thumb, which denoted all loadings of .30 as having practical significance, this approach would consider loadings of .30 significant only for sample sizes of 350 or greater. These are quite conservative guidelines when compared with the guidelines of the previous section or even the statistical levels associated with conventional correlation coefficients. Thus, these guidelines should be used as a starting point in factor loading interpretation, with lower loadings considered significant and added to the interpretation based on other considerations.
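The squared-loading arithmetic in the practical-significance rule of thumb can be checked directly. A minimal Python illustration (the specific loading values are just the benchmarks quoted above):

```python
# A factor loading is the correlation between a variable and a factor, so the
# squared loading is the share of the variable's variance the factor explains.
variance_explained = {loading: round(loading ** 2, 2)
                      for loading in (0.30, 0.40, 0.50, 0.70, 0.80)}

for loading, share in variance_explained.items():
    print(f"loading {loading:.2f} -> {share:.0%} of variance explained")
```

This reproduces the benchmarks in the text: a .30 loading explains roughly 10 percent (.09), a .50 loading 25 percent, and only above .70 does a factor account for half a variable's variance.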


TABLE 3.2 Guidelines for Identifying Significant Factor Loadings Based on Sample Size

Factor Loading    Sample Size Needed for Significance1
.30               350
.35               250
.40               200
.45               150
.50               120
.55               100
.60                85
.65                70
.70                60
.75                50

1 Significance is based on a .05 significance level (α), a power level of 80 percent, and standard errors assumed to be twice those of conventional correlation coefficients. Source: Computations made with SOLO Power Analysis, BMDP Statistical Software, Inc., 1993.
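The table above can be applied programmatically when screening a factor matrix. A small sketch (the function name is ours, not from the text):

```python
# Table 3.2 as a lookup: minimum sample size needed for each loading to be
# considered significant (alpha = .05, power = .80, doubled standard errors).
SAMPLE_SIZE_NEEDED = {
    0.30: 350, 0.35: 250, 0.40: 200, 0.45: 150, 0.50: 120,
    0.55: 100, 0.60: 85, 0.65: 70, 0.70: 60, 0.75: 50,
}

def min_significant_loading(n: int) -> float:
    """Smallest tabled loading that is significant for a sample of size n."""
    candidates = [l for l, needed in SAMPLE_SIZE_NEEDED.items() if n >= needed]
    if not candidates:
        raise ValueError("sample too small for any tabled loading")
    return min(candidates)

print(min_significant_loading(100))  # 0.55, matching the text's example
print(min_significant_loading(50))   # 0.75
```

With 350 or more respondents, the function returns .30, consistent with the observation that only very large samples justify treating the practical-significance threshold as statistically significant.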

Interpreting a Factor Matrix
Interpreting the complex interrelationships represented in a factor matrix can seem complicated. By following the procedure outlined in the following paragraphs, however, one can considerably simplify the factor interpretation procedure.

Factor Matrix

                        Factor 1   Factor 2   Factor 3   Factor 4
100 m (sec)              -.781       .275       .184       .107
Long Jump (m)             .778      -.261       .099       .077
Shot Put (m)              .736       .578       .110      -.186
High Jump (m)             .554      -.012       .072       .455
400 m (sec)              -.654       .601       .053       .191
110 m hurdles (sec)      -.830       .187       .311      -.076
Discus (m)                .677       .568       .043      -.047
Pole Vault (m)            .846       .078      -.043       .023
Javelin (m)               .620       .366       .276      -.014
1500 m (sec)             -.180       .722      -.415       .046

Extraction Method: Principal Axis Factoring.

Examine the Factor Matrix of Loadings
Each column of numbers in the factor matrix represents a separate factor; the numbers are the factor loadings of each variable on each factor. For identification purposes, the computer printout identifies the factors from left to right by the numbers 1, 2, 3, 4, and so forth. It also identifies the variables by name from top to bottom.

Identify the Highest Loading for Each Variable
The interpretation should start with the first variable, moving horizontally from left to right to find the highest loading for that variable on any factor. When the highest loading (the largest absolute factor loading) is identified, it should be underlined if significant. Attention then focuses on the second variable, again moving horizontally from left to right to find its highest loading on any factor and underlining it. This procedure continues until every variable has been underlined once for its highest loading on a factor. Recall that for sample sizes of less than 100, the lowest factor loading to be considered significant would in most instances be .30.

The process of underlining only the single highest loading as significant for each variable is an ideal that should be sought but seldom can be achieved. When each variable has only one loading on one factor that is considered significant, the interpretation of the meaning of each factor is simplified considerably. In practice, however, many variables may have several moderate-size loadings, all of which are significant, and the job of interpreting the factors is much more difficult. The difficulty arises because a variable with several significant loadings must be considered in interpreting (labeling) all the factors on which it has a significant loading. Most factor solutions do not result in a simple structure solution (a single high loading for each variable on only one factor). Thus, the researcher will, after underlining the highest loading for a variable, continue to evaluate the factor matrix by underlining all significant loadings for that variable on all the factors. Ultimately, the objective is to minimize the number of significant loadings on each row of the factor matrix (that is, to make each variable associate with only one factor). A variable with several high loadings is a candidate for deletion.

Label the Factors
When a factor solution has been obtained in which all variables have a significant loading on a factor, the researcher attempts to assign some meaning to the pattern of factor loadings. Variables with higher loadings are considered more important and have greater influence on the name or label selected to represent a factor.
Thus the researcher will examine all the underlined variables for a particular factor and, placing greater emphasis on those variables with higher loadings, will attempt to assign a name or label that accurately reflects the variables loading on that factor. The signs are interpreted just as with any other correlation coefficients: on each factor, like signs mean the variables are positively related, and opposite signs mean they are negatively related. In orthogonal solutions the factors are independent of one another; therefore, the signs of factor loadings relate only to the factor on which they appear, not to other factors in the solution. The label is not derived or assigned by the factor analysis computer program; rather, it is developed intuitively by the researcher based on its appropriateness for representing the underlying dimensions of a particular factor. This procedure is followed for each extracted factor. The final result will be a name or label that represents each of the derived factors as accurately as possible.
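The row-by-row scan described above is mechanical enough to automate. A minimal NumPy sketch using the decathlon factor matrix from this section (variable names abbreviated; the ≥ .30 threshold is the practical-significance rule of thumb):

```python
import numpy as np

# "Underline the highest loading per variable" scan from the text; loadings
# with absolute value >= .30 are flagged as significant.
variables = ["100m", "Long Jump", "Shot Put", "High Jump", "400m",
             "110m hurdles", "Discus", "Pole Vault", "Javelin", "1500m"]
loadings = np.array([
    [-.781,  .275,  .184,  .107],
    [ .778, -.261,  .099,  .077],
    [ .736,  .578,  .110, -.186],
    [ .554, -.012,  .072,  .455],
    [-.654,  .601,  .053,  .191],
    [-.830,  .187,  .311, -.076],
    [ .677,  .568,  .043, -.047],
    [ .846,  .078, -.043,  .023],
    [ .620,  .366,  .276, -.014],
    [-.180,  .722, -.415,  .046],
])

for name, row in zip(variables, loadings):
    highest = int(np.argmax(np.abs(row))) + 1              # factor with largest |loading|
    significant = [i + 1 for i, l in enumerate(row) if abs(l) >= .30]
    cross = " (cross-loads)" if len(significant) > 1 else ""
    print(f"{name:13s} highest on factor {highest}, significant on {significant}{cross}")
```

Running the scan shows what the text warns about: several events (e.g., Shot Put, Discus, 1500 m) load significantly on more than one factor, so the solution is not simple structure and those variables complicate labeling.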

3.6 Validation of Factor Analysis

The issue of generalizability is critical for each of the multivariate methods, but it is especially relevant for the interdependence methods because they describe a data structure that should be representative of the population as well. The most direct method of validating the results is to move to a confirmatory perspective and assess the replicability of the results, either with a split sample of the original data set (if sample size permits) or with a separate sample.

3.7 Factor Scores

Depending upon the objectives for applying factor analysis, the researcher may stop with factor interpretation or proceed further with one of the methods for data reduction. If the objective is simply to identify logical combinations of variables and better understand the interrelationships among variables, then factor interpretation will suffice. It provides an empirical basis for judging the structure of the variables and the impact of this structure when interpreting the results from other multivariate techniques. If the objective, however, is to identify appropriate variables for subsequent application to other statistical techniques, then one option for creating a smaller set of variables to replace the original set is the computation of factor scores. Factor scores are composite measures of each factor computed for each subject. Conceptually, the factor score represents the degree to which each individual scores high on the group of items that have high loadings on a factor. Thus, higher values on the variables with high loadings on a factor will result in a higher factor score. Most statistical programs, including SPSS, can easily compute factor scores for each respondent; by selecting the factor score option, these scores are saved for use in subsequent analyses. The one disadvantage of factor scores is that they are not easily replicated across studies, because they are based on the factor matrix, which is derived separately in each study.
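As a sketch of what the "save factor scores" option produces, the following uses scikit-learn's FactorAnalysis as a stand-in for SPSS, on simulated data; the result is one composite score per respondent per extracted factor:

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5))        # stand-in for 100 respondents x 5 items

fa = FactorAnalysis(n_components=2).fit(X)
scores = fa.transform(X)             # factor scores, ready for subsequent analyses

print(scores.shape)                  # (100, 2): 100 respondents, 2 factor scores each
print(scores.mean(axis=0).round(2))  # columns are centered on 0
```

The `scores` array can then replace the original five items in, say, a regression, which is exactly the data-reduction use the text describes. (Note that scoring conventions differ across programs, so scores computed in different packages, or from different samples, will not match exactly.)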

References
Hair, Joseph F., Ronald L. Tatham, and Rolph Anderson. Multivariate Data Analysis. Prentice-Hall. ISBN 0130329290.


Glossary of Terms used in Factor Analysis


Anti-image correlation matrix: Matrix of the partial correlations among variables after factor analysis, representing the degree to which the factors "explain" each other in the results. The diagonal contains the measures of sampling adequacy for each variable, and the off-diagonal values are partial correlations among variables.
Bartlett test of sphericity: Statistical test for the overall significance of all correlations within a correlation matrix.
Common factor analysis: Factor model in which the factors are based on a reduced correlation matrix. That is, communalities are inserted in the diagonal of the correlation matrix, and the extracted factors are based only on the common variance, with specific and error variance excluded.
Common variance: Variance shared with other variables in the factor analysis.
Communality: Total amount of variance an original variable shares with all other variables included in the analysis.
Component analysis: Factor model in which the factors are based on the total variance. With component analysis, unities (1s) are used in the diagonal of the correlation matrix; this procedure computationally implies that all the variance is common or shared.
Composite measure: See summated scale.
Conceptual definition: Specification of the theoretical basis for a concept that is represented by a factor.
Content validity: Assessment of the degree of correspondence between the items selected to constitute a summated scale and its conceptual definition.
Correlation matrix: Table showing the intercorrelations among all variables.
Cronbach's alpha: Measure of reliability that ranges from 0 to 1, with values of .60 to .70 often judged to be the lower limit of acceptability.
Dummy variable: Binary metric variable used to represent a single category of a nonmetric variable.
Eigenvalue: Column sum of squared loadings for a factor; also referred to as the latent root. It represents the amount of variance accounted for by a factor.
Error variance: Variance of a variable due to errors in data collection or measurement.
Factor: Linear combination (variate) of the original variables. Factors also represent the underlying dimensions (constructs) that summarize or account for the original set of observed variables.
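The definitions of communality and eigenvalue are simple arithmetic on the loading matrix, which a short NumPy example can make concrete (the loading values here are hypothetical):

```python
import numpy as np

# Communality = row sum of squared loadings (per variable);
# eigenvalue   = column sum of squared loadings (per factor).
L = np.array([[.8, .1],
              [.7, .2],
              [.2, .9]])          # 3 variables, 2 factors (made-up loadings)

communalities = (L ** 2).sum(axis=1)   # shared variance of each variable
eigenvalues = (L ** 2).sum(axis=0)     # variance accounted for by each factor

print(communalities.round(2))  # [0.65 0.53 0.85]
print(eigenvalues.round(2))    # [1.17 0.86]
```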


Factor indeterminacy: Characteristic of common factor analysis such that several different factor scores can be calculated for a respondent, each fitting the estimated factor model. This means the factor scores are not unique for each individual.
Factor loadings: Correlation between the original variables and the factors, and the key to understanding the nature of a particular factor. Squared factor loadings indicate what percentage of the variance in an original variable is explained by a factor.
Factor matrix: Table displaying the factor loadings of all variables on each factor.
Factor pattern matrix: One of two factor matrices found in an oblique rotation; the one most comparable to the factor matrix in an orthogonal rotation.
Factor rotation: Process of manipulating or adjusting the factor axes to achieve a simpler and pragmatically more meaningful factor solution.
Factor score: Composite measure created for each observation on each factor extracted in the factor analysis. The factor weights are used in conjunction with the original variable values to calculate each observation's score. The factor score then can be used to represent the factor(s) in subsequent analyses. Factor scores are standardized to have a mean of 0 and a standard deviation of 1.
Factor structure matrix: A factor matrix found in an oblique rotation that represents the simple correlations between variables and factors, incorporating the unique variance and the correlations between factors. Most researchers prefer to use the factor pattern matrix when interpreting an oblique solution.
Indicator: Single variable used in conjunction with one or more other variables to form a composite measure.
Latent root: See eigenvalue.
Measure of sampling adequacy (MSA): Measure calculated both for the entire correlation matrix and for each individual variable, evaluating the appropriateness of applying factor analysis. Values above .50, for either the entire matrix or an individual variable, indicate appropriateness.
Measurement error: Inaccuracies in measuring the "true" variable values due to the fallibility of the measurement instrument (i.e., inappropriate response scales), data entry errors, or respondent errors.
Multicollinearity: Extent to which a variable can be explained by the other variables in the analysis.
Oblique factor rotation: Factor rotation computed so that the extracted factors are correlated. Rather than arbitrarily constraining the factor rotation to an orthogonal solution, the oblique rotation identifies the extent to which each of the factors is correlated.


Orthogonal: Mathematical independence (no correlation) of factor axes to each other (i.e., at right angles, or 90 degrees).
Orthogonal factor rotation: Factor rotation in which the factors are extracted so that their axes are maintained at 90 degrees. Each factor is independent of, or orthogonal to, all other factors; the correlation between factors is determined to be 0.
Q factor analysis: Forms groups of respondents or cases based on their similarity on a set of characteristics (see also the discussion of cluster analysis in chapter 9).
R factor analysis: Analyzes relationships among variables to identify groups of variables forming latent dimensions (factors).
Reliability: Extent to which a variable or set of variables is consistent in what it is intended to measure. If multiple measurements are taken, reliable measures will all be very consistent in their values. It differs from validity in that it relates not to what should be measured but to how it is measured.
Reverse scoring: Process of reversing the scores of a variable, while retaining the distributional characteristics, to change the relationships (correlations) between two variables. Used in summated scale construction to avoid a "canceling out" between variables with positive and negative factor loadings on the same factor.
Specific variance: Variance of each variable unique to that variable and not explained or associated with other variables in the factor analysis.
Summated scale: Method of combining several variables that measure the same concept into a single variable in an attempt to increase the reliability of the measurement. In most instances, the separate variables are summed and then their total or average score is used in the analysis.
Surrogate variable: Single variable with the highest factor loading selected to represent a factor in the data reduction stage, instead of using a summated scale or factor score.
Trace: Represents the total amount of variance on which the factor solution is based. The trace is equal to the number of variables, based on the assumption that the variance in each variable is equal to 1.
Validity: Extent to which a measure or set of measures correctly represents the concept of study; the degree to which it is free from any systematic or nonrandom error. Validity is concerned with how well the concept is defined by the measure(s), whereas reliability relates to the consistency of the measure(s).
VARIMAX: One of the most popular orthogonal factor rotation methods.
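To make the VARIMAX and orthogonal-rotation entries concrete, here is a sketch of the classic varimax algorithm in NumPy (a common textbook formulation, not tied to any particular package; the loading matrix is hypothetical). It also demonstrates a defining property of orthogonal rotation: communalities are unchanged by the rotation.

```python
import numpy as np

def varimax(loadings, tol=1e-6, max_iter=100):
    """Rotate a loading matrix under the varimax criterion (orthogonal rotation)."""
    p, k = loadings.shape
    R = np.eye(k)                      # rotation matrix, starts as identity
    d = 0.0
    for _ in range(max_iter):
        Lr = loadings @ R
        u, s, vt = np.linalg.svd(
            loadings.T @ (Lr ** 3 - Lr @ np.diag((Lr ** 2).sum(axis=0)) / p))
        R = u @ vt                     # product of orthogonal matrices stays orthogonal
        d_new = s.sum()
        if d_new < d * (1 + tol):      # criterion has stopped improving
            break
        d = d_new
    return loadings @ R

# Hypothetical unrotated loadings: 4 variables, 2 factors.
L = np.array([[.6, .6], [.7, .5], [-.4, .5], [-.3, .6]])
Lr = varimax(L)

# Rotation redistributes variance across factors but leaves each variable's
# communality (row sum of squared loadings) unchanged.
print(np.allclose((L ** 2).sum(axis=1), (Lr ** 2).sum(axis=1)))  # True
```

After rotation, each variable tends to load highly on one factor and near zero on the others, which is what makes rotated solutions easier to label.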


