
Business Research Methods

Reliability & Reliability Analysis

Introduction
The scores are the subject's responses to items on an instrument (e.g., a mail questionnaire). An observed score can be broken down into two components: the true score plus the error score.
The error score, in turn, can be broken down into systematic error (non-random error that reflects some systematic bias, due, for instance, to the methodology used; thus also called method error) and random error (due to random traits of the subjects; thus also called trait error).

The greater the error component relative to the true-score component, the lower the reliability, which is the ratio of true-score variance to total (true + error) variance.
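The true-score-plus-error decomposition can be illustrated with a small simulation. This is a sketch with made-up parameter values (a true-score SD of 10 and an error SD of 5 are assumptions, not figures from the text): observed scores are generated as true score plus random error, and reliability is estimated as the ratio of true-score variance to observed-score variance.

```python
import random

# Illustrative simulation (assumed values, not from the text):
# observed = true + error, reliability = var(true) / var(observed).
random.seed(1)
n = 10_000
true_scores = [random.gauss(50, 10) for _ in range(n)]   # true-score SD = 10
errors      = [random.gauss(0, 5)  for _ in range(n)]    # error SD = 5
observed    = [t + e for t, e in zip(true_scores, errors)]

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

reliability = variance(true_scores) / variance(observed)
print(round(reliability, 2))  # theoretical value: 100 / (100 + 25) = 0.80
```

With these assumed SDs, the theoretical reliability is 100 / 125 = .80, and the simulated estimate lands close to it.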

Reliability measures the extent to which an item, scale, or instrument will yield the same score when administered at different times or locations, or to different populations, provided the administrations do not differ in relevant variables.
Reliability coefficients are forms of correlation coefficients. The forms of reliability below measure different dimensions of reliability and thus any or all might be used in a particular research project.

Multiple-item or multiple-observation scales are often developed to assess characteristics of individuals. One important element in judging the value of such a scale is its reliability and validity. A number of methods can establish a scale's reliability, including test-retest, equivalent-forms, and internal consistency estimates of reliability.
With test-retest reliability, individuals are administered a measure on two occasions with some time interval between them. Equivalent-forms estimates are based on a similar methodology, except an equivalent form is administered on the second occasion rather than the same measure.
For either of these methods, the easiest way to compute a reliability coefficient is the Bivariate Correlation procedure: the reliability estimate is the correlation between the scores obtained on the two occasions.
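The test-retest computation amounts to a Pearson correlation between the two administrations. A minimal sketch with hypothetical scores (the eight respondents and their values below are illustrative, not from the chapter):

```python
# Hypothetical test-retest scores for 8 respondents (illustrative data);
# the reliability estimate is the Pearson correlation between occasions.
time1 = [12, 15, 11, 18, 14, 16, 10, 17]
time2 = [13, 14, 12, 17, 15, 16, 11, 18]

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

print(round(pearson_r(time1, time2), 3))
```

The same computation serves for equivalent-forms reliability, with the second form's scores in place of the retest scores.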

With an internal consistency estimate of reliability, individuals are measured on a single occasion using a scale with multiple parts.
The parts may be items on a paper-and-pencil measure, responses to questions from a structured interview, multiple observations on an observational measure, or some other units of a measure that are summed to yield scale scores. For ease of discussion, we will frequently refer to items rather than describing the analyses in terms of all types of parts. The Reliability procedure computes estimates of reliability based on the consistency among the items (parts). Here, we'll look at two internal consistency estimates,
split-half and coefficient alpha.

Split-half estimates and coefficient alpha may be used to estimate the reliability of the total score if a scale has multiple items and the multiple items are summed to obtain a total score.
If a measure consists of multiple scales, separate internal consistency estimates should be computed for each scale score. In some instances, you may need to transform one or more items (or whatever the parts are) on a measure prior to conducting the analyses so that the total score computed by the Reliability procedure is meaningful.

We'll look at two types of applications, which vary depending on whether or how items are transformed:
No transformation of items. If the responses to these items are in the same metric, and if high scores on them represent high scores on the underlying construct, no transformations are required.
The Reliability Analysis procedure uses the untransformed item scores.

Reverse-scoring of some item scores. This is the case when all items on a measure use the same response scale, but high item scores represent either high or low scores on the underlying construct.
Low item scores that represent high scores on the construct need to be reverse-scaled. Such items are commonly found on attitude scales.

Applying the Reliability Procedure


No Transformation of Items
Sarah is interested in whether a measure she developed has good reliability. She has 83 students complete the 20-item Emotional Expressiveness Measure (EEM). Ten of the items are summed to yield a Negative Emotions scale, and the other 10 items are summed to produce a Positive Emotions scale. Sarah's SPSS data file contains 83 cases and 20 items as variables. These 20 items are the variables analyzed using the Reliability program. She computes an internal consistency estimate of reliability (split-half or coefficient alpha) for the Negative Emotions scale and another internal consistency estimate for the Positive Emotions scale.

Reverse-Scoring of Some Items


Janet has developed a 10-item measure called the Emotional Control Scale. She asks 50 individuals to respond to these items on a 0 to 4 scale, with 0 being completely disagree and 4 being completely agree. Half the items are phrased so that agreement indicates a desire to keep emotions under control (under-control items), while the other half are written so that agreement indicates a desire to express emotions openly (expression items). Janet's SPSS data file contains 50 cases and 10 item scores for each. The expression items need to be reverse-scaled so that a response of 0 is transformed to a 4, a 1 becomes a 3, a 2 stays a 2, a 3 becomes a 1, and a 4 is transformed to a 0. The scores used by the Reliability Analysis procedure contain the scores for the five under-control items and the transformed item scores for the five expression items. She computes an internal consistency estimate of reliability for the 10-item Emotional Control scale.
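The reverse-scaling rule Janet applies can be sketched in one line: on a 0-to-4 response scale, the transformed score is the scale maximum minus the original score.

```python
# Reverse-scaling on a 0-4 response scale, as in Janet's transformation:
# new_score = max_scale_value - old_score, so 0<->4, 1<->3, and 2 stays 2.
MAX_SCALE = 4

def reverse_scale(score, max_value=MAX_SCALE):
    return max_value - score

responses = [0, 1, 2, 3, 4]
print([reverse_scale(s) for s in responses])  # -> [4, 3, 2, 1, 0]
```

For a scale that starts at 1 rather than 0 (e.g., 1 to 5), the same idea becomes (minimum + maximum) - score.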

Understanding Internal Consistency Estimates


The coefficients for split-half reliability and alpha assess reliability based on different types of consistency. The split-half coefficient is obtained by computing scores for two halves of a scale.
With SPSS, scores are computed for the first and second halves of the scale. The value of the reliability coefficient is a function of the consistency between the two halves.

In contrast, consistency with coefficient alpha is assessed among items.


The greater the consistency in responses among items, the higher coefficient alpha will be. If items on a scale are ambiguous and require individuals to guess a lot or make unreliable responses, there will be a lack of consistency between halves or among items, and internal consistency estimates of reliability will be small.

Both the split-half coefficient and coefficient alpha should range in value between 0 and 1.
Values close to 0 indicate that a measure has poor reliability, while values close to 1 suggest that the measure is reliable.
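As a sketch of how coefficient alpha is computed from item scores, the standard formula is alpha = k/(k-1) * (1 - sum of item variances / variance of the total score). The toy data below (three items, five respondents) are illustrative assumptions, not the chapter's data set.

```python
# Minimal sketch of coefficient alpha (Cronbach's alpha) from item scores.
def variance(xs):
    """Sample variance (n - 1 denominator)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def cronbach_alpha(items):
    """items: one list of scores per item; respondents aligned across lists."""
    k = len(items)
    totals = [sum(vals) for vals in zip(*items)]      # scale total per person
    item_var_sum = sum(variance(col) for col in items)
    return (k / (k - 1)) * (1 - item_var_sum / variance(totals))

# Toy data: three fairly consistent items for five respondents
items = [
    [1, 2, 3, 4, 5],
    [2, 2, 3, 5, 5],
    [1, 3, 3, 4, 4],
]
print(round(cronbach_alpha(items), 2))
```

Because the three toy items move together across respondents, alpha comes out high; shuffling any one item's scores would drive it down, mirroring the consistency argument above.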

Assumptions Underlying Internal Consistency Reliability Procedures


Assumption 1: The parts of the measure must be equivalent
For split-half coefficients, the parts (the two halves of the measure) must be equivalent.
With equivalent halves, individuals who score high on one half of the scale should score high on the other half of the scale, and individuals who score low on one half of the scale should also score low on the other half of the scale if the halves of the scale contain no measurement error. You can add the odd-numbered items together to create one half and add the even-numbered items to create the other half. You can then use these two halves to compute split-half coefficients.
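The odd/even split described above can be sketched directly: sum the odd-numbered items into one half score and the even-numbered items into the other, then correlate the halves. The four-item data below are illustrative assumptions, not the chapter's data.

```python
# Odd/even split-half sketch: rows = respondents, columns = items 1..4
# (toy data, illustrative only).
data = [
    [3, 2, 3, 2],
    [1, 1, 0, 1],
    [4, 3, 4, 4],
    [2, 2, 1, 2],
    [0, 1, 1, 0],
]

half_odd  = [row[0] + row[2] for row in data]  # items 1 and 3
half_even = [row[1] + row[3] for row in data]  # items 2 and 4

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

print(round(pearson_r(half_odd, half_even), 2))
```

The resulting correlation is the consistency between halves; as discussed later, it still needs the Spearman-Brown correction before it estimates the full scale's reliability.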

For coefficient alpha, every item is assumed to be equivalent to every other item.
All items should measure the same underlying dimension. Differences in responses should occur only as a function of measurement error. It is unlikely that this assumption is ever met completely, although with some measures it may be met approximately. To the extent that the equivalency assumption is violated, internal consistency estimates tend to underestimate reliability.

Assumption 2: Errors in measurement between parts are unrelated


A respondent's ability to guess well on one item or one part of a test should not influence how well he or she guesses on another part. The unrelated-errors assumption can be violated a number of ways.
Internal consistency estimates (split-half or coefficient alpha) should not be used if respondents' scores depend on whether they can complete the scale in an allotted time. For example, coefficient alpha should not be used to assess the reliability of a 100-item math test to be completed in 10 minutes, because the scores are partly a function of how far respondents get through the test. Second, sets of items on a scale are sometimes linked together. Neither coefficient alpha nor split-half coefficients should be used as reliability estimates for these scales, since items within a set are likely to have correlated errors and yield overestimates of reliability.

Assumption 3: An item or half test score is a sum of its true and its error scores
This assumption is necessary for an internal consistency estimate to reflect accurately a scale's reliability. It is difficult to know whether this assumption has been violated or not.

The Example Data Set


The data set used here contains the results of a survey of 50 respondents. Half the items are phrased so that agreement indicates a desire to keep emotions under control (under-control items), and the other half are written so that agreement indicates a desire to express emotions openly (expression items).

Variable Definition
Item 1: I keep my emotions under control.
Item 2: Under stress I remain calm.
Item 3: I like to let people know how I am feeling.
Item 4: I express my emotions openly.
Item 5: It is a sign of weakness to show how one feels.
Item 6: Well-adjusted individuals are ones who are confident enough to express their true emotions.
Item 7: Emotions get in the way of clear thinking.
Item 8: I let people see my emotions so that they know who I am.
Item 9: If I am angry with people, I tell them in no uncertain terms that I am unhappy with them.
Item 10: I try to get along with people and not create a big fuss.

The Research Question


The research question can be phrased, "How reliable is our 10-item measure of emotional control?"

Conducting a Reliability Analysis


Before conducting any internal consistency estimates of reliability, we must determine whether all items use the same metric and whether any items have to be reverse-scaled. All items share the same metric, since the response scale for all items is 0 to 4 (completely disagree to completely agree). However, the five items on which high scores indicate a willingness to express emotion must be reverse-scaled so that high scores on the total scale reflect a high level of emotional control. These items are 3, 4, 6, 8, and 9. You may want to review how to reverse-scale items for a Likert scale. Here, I reverse-scale items 3, 4, 6, 8, and 9 before going through the steps to compute coefficient alpha and split-half internal consistency estimates.

Computing Coefficient Alpha


(1) Click Scale, then click Reliability Analysis. You'll see the Reliability Analysis dialog box. (2) Hold down the Shift key, click item1, and then click item10 to select all 10 items. (3) Click the arrow button to move them to the Items box. (4) Click Statistics. You'll see the Reliability Analysis: Statistics dialog box. (5) Click Item, click Scale in the Descriptives for area, then click Correlations in the Inter-Item area. (6) Click Continue. In the Reliability Analysis dialog box, make sure that Alpha is chosen in the box labeled Model. (7) Click OK.

Selected SPSS Output for Coefficient Alpha


As with any analysis, the descriptive statistics need to be checked to confirm that the data have no major anomalies.
For example, are all the means within the range of possible values (0 to 4)? Are there any unusually large values of variances that might indicate that a value has been mistyped? In general, are the correlations among the variables positive? If not, should you have reversed-scaled that item? Once it appears that data have been entered and scaled appropriately, the reliability estimate of alpha can be interpreted.

The output reports two alphas, alpha and standardized item alpha.
In this example, we are interested in the alpha. The only time we would be interested in the standardized alpha is if the scale score is computed by summing item scores that have been standardized to have a uniform mean and standard deviation (such as z-scores).
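For intuition about the standardized item alpha, it can be computed from just the number of items k and the mean inter-item correlation. The values below (a 10-item scale with an assumed average inter-item correlation of .40) are illustrative, not taken from the output.

```python
# Standardized item alpha from k items and mean inter-item correlation r_bar
# (illustrative values; this is the Spearman-Brown prophecy form of alpha).
def standardized_alpha(k, r_bar):
    return (k * r_bar) / (1 + (k - 1) * r_bar)

# e.g., a 10-item scale whose items intercorrelate about .40 on average
print(round(standardized_alpha(10, 0.40), 2))
```

This also shows why longer scales tend to be more reliable: holding the mean inter-item correlation fixed, increasing k pushes the coefficient toward 1.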

Computing Split-Half Coefficient Estimates


SPSS computes a split-half coefficient by evaluating the consistency in responding between the first half and the second half of a measure. It is important to carefully choose which items to include in each half of a measure so that the two halves are as equivalent as possible. Different item splits may produce dramatically different results. The best split of the items is the one that produces equivalent halves (see Assumption 1). For our example, we chose to split the test into two halves in the following fashion:
Half 1: Item 1, Item 3, Item 5, Item 8, and Item 10 Half 2: Item 2, Item 4, Item 6, Item 7, and Item 9

We chose this split to take into account the ordering of items (with one exception, no two adjacent items are included in the same half) as well as the two types of items, under-control and expression items (two items of one type and three of the other in each half). To compute a split-half coefficient, follow these steps: (1) Click Statistics, click Scale, then click Reliability Analysis. (2) Click Reset to clear the dialog box. (3) Hold down the Ctrl key, and click the variables that are in the first half: item1, item3, item5, item8, and item10. (4) Click the arrow button to move them to the Items box. (5) Hold down the Ctrl key, and click the variables that are in the second half: item2, item4, item6, item7, and item9. (6) Click the arrow button to move them to the Items box in the Reliability Analysis dialog box. (7) Click Statistics. (8) Click Item and Scale in the Descriptives for area. (9) Click Correlations in the Inter-Item area. (10) Click Continue. (11) Choose Split-half in the Model drop-down menu in the Reliability Analysis dialog box. (12) Click OK.

Selected SPSS Output for Split-Half Reliability


The descriptive statistics need to be checked to confirm that the data have no anomalies as described in our earlier discussion of coefficient alpha. The descriptive statistics associated with the split-half coefficient are identical to the descriptives for coefficient alpha. The most frequently reported split-half reliability estimate is the one based on the correlation between forms.
The correlation between forms is .78, but it is not the reliability estimate. At best, it is the reliability of half the measure (because it is the correlation between two half-measures).

The Spearman-Brown corrected correlation, r = .87, is the reliability estimate.

If there were an odd number of items, a split would produce an unequal number of items in each half. Under these conditions, the value for the Unequal-length Spearman-Brown should be reported because it will likely differ from the Equal-length Spearman-Brown value.
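The equal-length Spearman-Brown correction applied above is simple enough to sketch directly: given the correlation r between two equal halves, the full-test reliability is 2r / (1 + r), a special case (k = 2) of the general Spearman-Brown prophecy formula.

```python
# Spearman-Brown correction: scale a reliability to k times the test length.
# k = 2 recovers the equal-length split-half correction, 2r / (1 + r).
def spearman_brown(r, k=2.0):
    return (k * r) / (1 + (k - 1) * r)

r_halves = 0.78                      # correlation between forms, from the text
print(round(spearman_brown(r_halves), 2))
```

Applied to the rounded correlation of .78, this gives about .88; the small difference from the reported .87 presumably reflects SPSS working from the unrounded correlation.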