
Scaling

Scaling is the branch of measurement that involves the construction of an instrument that associates qualitative constructs with quantitative metric units. Scaling evolved out of efforts in psychology and education to measure "unmeasurable" constructs like authoritarianism and self-esteem. In many ways, scaling remains one of the most arcane and misunderstood aspects of social research measurement, and it attempts one of the most difficult research tasks: measuring abstract concepts.

Scaling is the assignment of objects to numbers according to a rule. In scaling, the objects are text statements, usually statements of attitude or belief. It is a set of procedures for obtaining numbers that can be meaningfully assigned to objects.

Scales are generally divided into two broad categories: unidimensional and multidimensional.

Ex.
Unidimensional: Satisfaction, Motivation, Attitude (measured as High-Low, Positive-Negative, Yes-No)
Multidimensional: Intelligence (9 factors), Personality

The unidimensional scaling methods were developed in the first half of the twentieth century and are generally named after their inventor.

Purposes of Scaling

Why do we do scaling? Why not just create text statements or questions and use response formats to collect the answers? First, sometimes we do scaling to test a hypothesis. We might want to know whether the construct or concept is a single dimensional or multidimensional one (more about dimensionality later). Sometimes, we do scaling as part of exploratory research. We want to know what dimensions underlie a set of ratings. For instance, if you create a set of questions, you can use scaling to determine how well they "hang together" and whether they measure one concept or multiple concepts. But probably the most common reason for doing scaling is for scoring purposes. When a participant gives their responses to a set of items, we often would like to assign a single number that represents that person's overall attitude or belief.

People often confuse the idea of a scale and a response scale. A response scale is the way you collect responses from people on an instrument. You might use a dichotomous response scale like Agree/Disagree, True/False, or Yes/No. Or, you might use an interval response scale like a 1-to-5 or 1-to-7 rating. But, if all you are doing is attaching a response scale to an object or statement, you can't call that scaling. As you will see, scaling involves procedures that you do independent of the respondent so that you can come up with a numerical value for the object. In true scaling research, you use a scaling procedure to develop your instrument (scale) and you also use a response scale to collect the responses from participants. But just assigning a 1-to-5 response scale for an item is not scaling! The differences are illustrated in the table below.

Scale                                 | Response Scale
results from a process                | is used to collect the response for an item
each item on scale has a scale value  | item not associated with a scale value
refers to a set of items              | used for a single item

Comparative scaling techniques

Comparative scales involve the direct comparison of stimulus objects. Comparative scale data must be interpreted in relative terms and have only ordinal or rank order properties.

Pairwise comparison scale: a respondent is presented with two items at a time and asked to select one (example: Do you prefer Pepsi or Coke?). This is an ordinal level technique when a measurement model is not applied. Krus and Kennedy (1977) elaborated the paired comparison scaling within their domain-referenced model. The Bradley-Terry-Luce (BTL) model (Bradley and Terry, 1952; Luce, 1959) can be applied in order to derive measurements, provided the data derived from paired comparisons possess an appropriate structure. Thurstone's Law of comparative judgment can also be applied in such contexts.
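As an illustration of how a measurement model turns paired comparisons into scale values, the sketch below fits the BTL model in its usual form, P(i preferred over j) = p_i / (p_i + p_j), with a simple iterative update. The function name fit_btl, the item labels, and the preference counts are all hypothetical, and the sketch assumes every item is preferred at least once.

```python
from collections import defaultdict

def fit_btl(wins, n_iter=200):
    """Estimate BTL strengths from wins[(a, b)] = times a was preferred over b.

    Assumes every item wins at least once; strengths are normalised to sum to 1.
    """
    items = sorted({name for pair in wins for name in pair})
    total_wins = defaultdict(float)
    n_pair = defaultdict(float)  # number of comparisons per unordered pair
    for (winner, loser), count in wins.items():
        total_wins[winner] += count
        n_pair[frozenset((winner, loser))] += count

    strength = {i: 1.0 / len(items) for i in items}
    for _ in range(n_iter):
        updated = {}
        for i in items:
            denom = sum(
                n_pair[frozenset((i, j))] / (strength[i] + strength[j])
                for j in items
                if j != i
            )
            updated[i] = total_wins[i] / denom if denom else strength[i]
        norm = sum(updated.values())
        strength = {i: v / norm for i, v in updated.items()}
    return strength

# Hypothetical data: ten respondents compared each pair of brands A, B, C.
wins = {
    ("A", "B"): 7, ("B", "A"): 3,
    ("B", "C"): 6, ("C", "B"): 4,
    ("A", "C"): 8, ("C", "A"): 2,
}
strengths = fit_btl(wins)
```

The resulting strengths place the items on a common scale, whereas the raw choices alone only support an ordinal ranking.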

Rasch model scaling: respondents interact with items, and comparisons are inferred between items from the responses to obtain scale values. Respondents are subsequently also scaled based on their responses to items given the item scale values. The Rasch model has a close relation to the BTL model.

Rank-ordering: a respondent is presented with several items simultaneously and asked to rank them (example: Rank the following advertisements from 1 to 10.). This is an ordinal level technique.

Bogardus social distance scale: measures the degree to which a person is willing to associate with a class or type of people. It asks how willing the respondent is to make various associations. The results are reduced to a single score on a scale. There are also non-comparative versions of this scale.

Q-Sort: Up to 140 items are sorted into groups based on a rank-order procedure.

Guttman scale: This is a procedure to determine whether a set of items can be rank-ordered on a unidimensional scale. It utilizes the intensity structure among several indicators of a given variable. Statements are listed in order of importance. The rating is scaled by summing all responses until the first negative response in the list. The Guttman scale is related to Rasch measurement; specifically, Rasch models bring the Guttman approach within a probabilistic framework.

Constant sum scale: a respondent is given a constant sum of money, scrip, credits, or points and asked to allocate these to various items (example: If you had 100 Yen to spend on food products, how much would you spend on product A, on product B, on product C, etc.). This is an ordinal level technique.

Magnitude estimation scale: In a psychophysics procedure invented by S. S. Stevens, people simply assign numbers to the dimension of judgment. The geometric mean of those numbers usually produces a power law with a characteristic exponent. In cross-modality matching, instead of assigning numbers, people manipulate another dimension, such as loudness or brightness, to match the items. Typically the exponent of the psychometric function can be predicted from the magnitude estimation exponents of each dimension.
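A minimal sketch of the magnitude estimation arithmetic: pool each stimulus's estimates with the geometric mean, then read the power-law exponent off the slope of a log-log fit. The stimulus intensities and the estimates are made-up numbers constructed to follow an exponent of 0.5, and loglog_slope is a hypothetical helper name.

```python
from math import log
from statistics import geometric_mean

# Hypothetical estimates: stimulus intensity -> numbers assigned by three judges.
estimates = {
    1: [1, 2, 4],
    4: [2, 4, 8],
    16: [4, 8, 16],
    64: [8, 16, 32],
}

# Pool the judges for each stimulus with the geometric mean.
pooled = {intensity: geometric_mean(vals) for intensity, vals in estimates.items()}

def loglog_slope(points):
    """Least-squares slope of log(response) on log(intensity)."""
    xs = [log(x) for x in points]
    ys = [log(y) for y in points.values()]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

exponent = loglog_slope(pooled)  # slope of the fitted power law
```

Because a power law is a straight line in log-log coordinates, the slope is the characteristic exponent.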

Non-comparative scaling techniques

In non-comparative scales, each object is scaled independently of the others in the stimulus set. The resulting data are generally assumed to be interval or ratio scaled.

Continuous rating scale (also called the graphic rating scale): respondents rate items by placing a mark on a line. The line is usually labeled at each end. There are sometimes a series of numbers, called scale points (say, from zero to 100), under the line. Scoring and codification is difficult.

Likert scale: Respondents are asked to indicate the amount of agreement or disagreement (from strongly agree to strongly disagree) on a five- to nine-point scale. The same format is used for multiple questions. This categorical scaling procedure can easily be extended to a magnitude estimation procedure that uses the full scale of numbers rather than verbal categories.

Phrase completion scales: Respondents are asked to complete a phrase on an 11-point response scale in which 0 represents the absence of the theoretical construct and 10 represents the theorized maximum amount of the construct being measured. The same basic format is used for multiple questions.

Semantic differential scale: Respondents are asked to rate an item on various attributes on a 7-point scale. Each attribute requires a scale with bipolar terminal labels.

Stapel scale: This is a unipolar ten-point rating scale. It ranges from +5 to -5 and has no neutral zero point.

Thurstone scale: This is a scaling technique that incorporates the intensity structure among indicators.

Mathematically derived scale: Researchers infer respondents' evaluations mathematically. Examples include multidimensional scaling.

The Major Unidimensional Scale Types

There are three major types of unidimensional scaling methods. They are similar in that they each measure the concept of interest on a number line. But they differ considerably in how they arrive at scale values for different items. The three methods are Thurstone or Equal-Appearing Interval Scaling, Likert or "Summative" Scaling, and Guttman or "Cumulative" Scaling.

A. Thurstone Scale

It was developed by Louis Leon Thurstone in 1928 as a means of measuring attitudes towards religion. It is made up of statements about a particular issue, and each statement has a numerical value indicating how favorable or unfavorable it is judged to be. People check each of the statements with which they agree, and a mean score is computed, indicating their attitude.

Steps:
1. Develop the focus of measurement.
2. Generate scale items.
3. Ask the judges to rate the items on a scale of 1 to 11, in terms of how much each statement indicates a favorable attitude towards the focus (1 = least favourable, 11 = most favourable).
4. Compute scale score values for each item. For each statement, compute the median and the interquartile range.
5. Select the final scale items. Select statements that are at equal intervals across the range of medians: one statement for each of the eleven median values. Within each value, try to select the statement that has the smallest interquartile range, since this is the statement with the least variability across judges. Items with higher scale values should, in general, indicate a more favorable attitude towards the focus.
6. Administer the scale.
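Steps 4 to 6 can be sketched as follows. The statement labels and judge ratings are invented for illustration, the function names are hypothetical, and the interquartile range uses Python's "inclusive" quantile method, which is only one of several conventions.

```python
from statistics import median, quantiles

def item_stats(ratings):
    """Return (median, interquartile range) of the judges' ratings for one statement."""
    q1, _, q3 = quantiles(ratings, n=4, method="inclusive")
    return median(ratings), q3 - q1

def select_items(judge_ratings):
    """For each median scale value, keep the statement with the smallest IQR."""
    best = {}
    for statement, ratings in judge_ratings.items():
        med, iqr = item_stats(ratings)
        if med not in best or iqr < best[med][1]:
            best[med] = (statement, iqr)
    return {med: stmt for med, (stmt, _) in sorted(best.items())}

def respondent_score(scale_values, agreed):
    """Mean scale value of the statements a respondent agreed with."""
    return sum(scale_values[s] for s in agreed) / len(agreed)

# Hypothetical ratings (1 to 11) from five judges.
judge_ratings = {
    "S1": [2, 2, 3, 2, 2],
    "S2": [1, 1, 2, 4, 5],
    "S3": [6, 6, 6, 7, 5],
    "S4": [10, 11, 10, 9, 10],
    "S5": [4, 6, 8, 6, 9],
}
selected = select_items(judge_ratings)               # median value -> statement
scale_values = {stmt: med for med, stmt in selected.items()}
score = respondent_score(scale_values, ["S1", "S3"])  # respondent agreed with S1 and S3
```

S2 and S5 share a median with S1 and S3 respectively but show more disagreement among judges, so they are the ones dropped.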

B. Likert Scale

It was developed by Rensis Likert. Here the respondents are asked to indicate a degree of agreement or disagreement with each of a series of statements. Each scale item has 5 response categories ranging from strongly agree to strongly disagree. Each statement is assigned a numerical score ranging from 1 to 5.

5 Strongly agree

4 Agree

3 Indifferent

2 Disagree

1 Strongly disagree

It can also be scaled as -2 to +2. -2 -1 0 1 2

Steps:
1. Defining the Focus of what it is you are trying to measure. You might operationalize the definition as an instruction to the people who are going to create or generate the initial set of candidate items for your scale.
2. Generating the Items. You have to create the set of potential scale items. These should be items that can be rated on a 1-to-5 or 1-to-7 Disagree-Agree response scale.
3. Rating the Items. Have a group of judges rate the items. Usually you would use a 1-to-5 rating scale where:
   1 = strongly unfavorable to the concept
   2 = somewhat unfavorable to the concept
   3 = undecided
   4 = somewhat favorable to the concept
   5 = strongly favorable to the concept
   In scaling methods, the judges are not telling you what they believe -- they are judging how favorable each item is with respect to the construct of interest.
4. Selecting the Items. There are several analyses you can use in making judgements about which items to retain for the final scale.
   a. Compute the intercorrelations between all pairs of items, and the item-total correlation for each item, based on the ratings of the judges. Throw out any items that have a low correlation with the total (summed) score across all items.
   b. For each item, get the average rating for the top quarter of judges and the bottom quarter. Then do a t-test of the difference between the mean values for the item for the top and bottom quarter judges. Higher t-values mean that there is a greater difference between the highest and lowest judges. In more practical terms, items with higher t-values are better discriminators, so you want to keep these items.
5. Administering the Scale. The final score for the respondent on the scale is the sum of their ratings for all of the items (this is why this is sometimes called a "summated" scale), after reverse scoring the negatively worded items.
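The item-total correlation in step 4a and the summated score with reverse scoring in step 5 might look like the sketch below. The item names and ratings are hypothetical, and the total used here includes the item itself (an uncorrected item-total correlation), which is an assumption rather than something the steps prescribe.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def item_total_correlations(ratings):
    """ratings: {item: [rating from each judge]} -> {item: r with the summed score}."""
    n_judges = len(next(iter(ratings.values())))
    totals = [sum(r[j] for r in ratings.values()) for j in range(n_judges)]
    return {item: pearson(r, totals) for item, r in ratings.items()}

def summated_score(responses, reverse_items, low=1, high=5):
    """Sum the responses, flipping reverse-scored items (1 <-> 5, 2 <-> 4)."""
    return sum(
        (low + high - value) if item in reverse_items else value
        for item, value in responses.items()
    )

# Hypothetical ratings of three candidate items by four judges.
judge_ratings = {"a": [1, 2, 4, 5], "b": [2, 2, 4, 4], "c": [5, 4, 2, 1]}
correlations = item_total_correlations(judge_ratings)

# Hypothetical respondent; q3 is negatively worded, so it is reverse scored.
score = summated_score({"q1": 5, "q2": 4, "q3": 1}, reverse_items={"q3"})
```

In this made-up data, item "c" correlates negatively with the total, so it would be a candidate either for removal or for reverse scoring.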

After the questionnaire is completed, each item may be analyzed separately, or in some cases item responses may be summed to create a score for a group of items. Hence, Likert scales are often called summative scales.

Whether individual Likert items can be considered as interval-level data, or whether they should be considered merely ordered-categorical data, is the subject of disagreement. Many regard such items only as ordinal data because, especially when using only five levels, one cannot assume that respondents perceive all pairs of adjacent levels as equidistant. On the other hand, often (as in the example above) the wording of response levels clearly implies a symmetry of response levels about a middle category; at the very least, such an item would fall between ordinal- and interval-level measurement, and to treat it as merely ordinal would lose information. Further, if the item is accompanied by a visual analog scale, where equal spacing of response levels is clearly indicated, the argument for treating it as interval-level data is even stronger.

When treated as ordinal data, Likert responses can be collated into bar charts, with central tendency summarised by the median or the mode (but not the mean) and dispersion summarised by the interquartile range (but not the standard deviation), or analyzed using nonparametric tests, e.g. the chi-square test, Mann-Whitney test, Wilcoxon signed-rank test, or Kruskal-Wallis test. Parametric analysis of ordinary averages of Likert scale data is also justifiable by the Central Limit Theorem, although some would disagree that ordinary averages should be used for Likert scale data.

Responses to several Likert questions may be summed, provided that all questions use the same Likert scale and that the scale is a defendable approximation to an interval scale, in which case they may be treated as interval data measuring a latent variable. If the summed responses fulfill these assumptions, parametric statistical tests such as the analysis of variance can be applied. These can be applied only when more than 5 Likert questions are summed.

Data from Likert scales are sometimes reduced to the nominal level by combining all agree and disagree responses into two categories of "accept" and "reject". The chi-square, Cochran Q, or McNemar test are common statistical procedures used after this transformation.
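To make the ordinal treatment concrete, the sketch below summarises one Likert item by its median, mode, and interquartile range, then collapses it to "accept" and "reject". The responses are invented, the IQR uses Python's "inclusive" quantile method, and dropping the neutral category when collapsing is an assumption, since the text does not say how to handle it.

```python
from collections import Counter
from statistics import median, mode, quantiles

# Hypothetical responses to one item (1 = strongly disagree ... 5 = strongly agree).
responses = [5, 4, 4, 3, 2, 4, 1, 5, 4]

# Ordinal summaries: median and mode for central tendency, IQR for dispersion.
med = median(responses)
most_common = mode(responses)
q1, _, q3 = quantiles(responses, n=4, method="inclusive")
iqr = q3 - q1

# Data for a bar chart: how many respondents chose each level.
counts = Counter(responses)

# Reduce to the nominal level; the neutral category (3) is dropped here.
collapsed = {
    "accept": sum(1 for r in responses if r >= 4),
    "reject": sum(1 for r in responses if r <= 2),
}
```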

C. Guttman Scale

It was developed by Louis Guttman. In a Guttman scale, items are arranged in an order so that an individual who agrees with a particular item also agrees with items of lower rank-order. For example, a series of items could be (1) "I am willing to be near ice cream"; (2) "I am willing to smell ice cream"; (3) "I am willing to eat ice cream"; and (4) "I love to eat ice cream". Agreement with any one item implies agreement with the lower-order items. The concept of the Guttman scale applies to series of items in other kinds of tests that have binary outcomes. The Guttman scale is used mostly when researchers want to design short questionnaires with good discriminating ability. The Guttman model works best for constructs that are hierarchical and highly structured, such as social distance, organizational hierarchies, and evolutionary stages.

Steps:
1. Define the Focus. The variable should yield answers in binary form, e.g. mercy killing (right / wrong), capital punishment (should be banned / should not be banned).
2. Develop a large set of items that reflect the concept.
3. Rate the Items. Ask a group of judges to rate the items in terms of how favorable or unfavorable the statements are to the concept.
4. Develop the Cumulative Scale. The key to Guttman scaling is in the analysis. We construct a matrix or table that shows the responses of all the respondents on all of the items. We then sort this matrix so that respondents who agree with more statements are listed at the top and those agreeing with fewer are at the bottom. For respondents with the same number of agreements, we sort the statements from left to right from those that most agreed to those that fewest agreed to. Attach a scale value to each item by carrying out scalogram analysis.
5. Administering the Scale. The score is the sum of the scale values of all the items with which the respondent has agreed.
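The matrix sorting in step 4 can be sketched as below, using the ice cream items with made-up respondents. The function names are hypothetical, and the sketch only orders the matrix and flags rows that break the cumulative pattern; it does not derive the scale values that a full scalogram analysis would attach to each item.

```python
def sort_scalogram(responses):
    """responses: {respondent: {item: 0 or 1}} -> (row order, column order).

    Rows: respondents with the most agreements first.
    Columns: items with the most agreements first.
    """
    items = list(next(iter(responses.values())))
    item_totals = {i: sum(r[i] for r in responses.values()) for i in items}
    col_order = sorted(items, key=lambda i: -item_totals[i])
    row_order = sorted(responses, key=lambda p: -sum(responses[p].values()))
    return row_order, col_order

def is_cumulative(row, col_order):
    """True if the row is a run of agreements followed only by disagreements."""
    pattern = [row[i] for i in col_order]
    return pattern == sorted(pattern, reverse=True)

# Hypothetical responses (1 = agree, 0 = disagree).
responses = {
    "r1": {"near": 1, "smell": 1, "eat": 0, "love": 0},
    "r2": {"near": 1, "smell": 1, "eat": 1, "love": 1},
    "r3": {"near": 1, "smell": 0, "eat": 0, "love": 0},
    "r4": {"near": 1, "smell": 1, "eat": 1, "love": 0},
    "r5": {"near": 0, "smell": 1, "eat": 0, "love": 0},
}
row_order, col_order = sort_scalogram(responses)
deviant = [p for p in row_order if not is_cumulative(responses[p], col_order)]
```

Respondent r5 agrees with a higher-order item without agreeing with the lower-order one, so r5 is the only row that departs from a perfect cumulative pattern.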

D. Semantic Differential Scale

This is a 7-point scale whose end points are associated with bipolar labels. Bipolar means two opposite extremes. An individual can score between 1 and 7, or between -3 and +3. On the basis of these responses, profiles are made. We can analyse two or three products, and by joining their profiles we get a profile analysis. The profile could take any shape depending on the number of variables. Mean and median are used for comparison. This scale helps to determine overall similarities and differences among objects. When the Semantic Differential Scale is used to develop an image profile, it provides a good basis for comparing images of two or more items. The big advantage of this scale is its simplicity, while producing results comparable with those of the more complex scaling methods. The method is easy and fast to administer, but it is also sensitive to small differences in attitude, highly versatile, reliable and generally valid.

Unpleasant  1  2  3  4  5  6  7  Pleasant
Submissive  1  2  3  4  5  6  7  Dominant
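A small sketch of profile analysis: each object's profile is the mean rating on every bipolar attribute, and two profiles are compared attribute by attribute. The brand names and ratings are hypothetical, and the mean is used here although the median would serve equally well.

```python
from statistics import fmean

# Hypothetical 1-to-7 ratings from three respondents per attribute.
ratings = {
    "Brand A": {"Unpleasant-Pleasant": [6, 7, 5], "Submissive-Dominant": [3, 4, 2]},
    "Brand B": {"Unpleasant-Pleasant": [4, 4, 4], "Submissive-Dominant": [5, 6, 7]},
}

def profile(object_ratings):
    """Mean rating on each bipolar attribute for one object."""
    return {attr: fmean(vals) for attr, vals in object_ratings.items()}

profiles = {name: profile(r) for name, r in ratings.items()}

# Attribute-by-attribute gap between the two profiles.
gap = {
    attr: profiles["Brand A"][attr] - profiles["Brand B"][attr]
    for attr in profiles["Brand A"]
}

def recenter(score):
    """Map a 1-to-7 score onto the equivalent -3 to +3 coding."""
    return score - 4
```

Plotting each profile as a line across the attributes gives the joined profiles that the comparison relies on.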

E. Stapel Scale

It was developed by Jan Stapel. This scale has some distinctive features:
a. Each item has only one word/phrase indicating the dimension it represents.
b. Each item has ten response categories.
c. Each item has an even number of categories.
d. The response categories have numerical labels but no verbal labels.

For example, suppose that for quality of ice cream we ask respondents to rate from +5 to -5: select a plus number for words that describe the ice cream quality accurately, and a minus number for words that do not. Thus, we can select any number from +5, for words we think are very accurate, to -5, for words we think are very inaccurate. This scale is usually presented vertically. This is a unipolar rating scale.

+5
+4
+3
+2
+1
High Quality
-1
-2
-3
-4
-5

F. The Q Sort technique

The instrumental basis of Q methodology is the Q sort technique, which conventionally involves the rank-ordering of a set of statements from agree to disagree. It requires the participant to evaluate (or sort) a number of items along a continuum from, for example, "very like me" to "very unlike me". The respondent arranges the statements into a forced normal distribution of most to least agreement, yielding a model of subjective preferences within the given universe of discourse. The data from Q methodology are literally what participants make of a pool of items germane to the topic of concern when asked to rank them.

The Q sort is usually a self-directed process. To carry out a study there needs to be something for the participants to rank. This usually consists of between 10 and 100 items (the Q set). The activity of sorting them is known as Q sorting. Items are ordinarily provided on cards, or on paper which the participants are asked to cut up themselves. The Q set consists of a sample of items to be ranked by the research participants along a continuum, the poles of which are defined by the researcher in accordance with the demands of the research topic.

It is used to discriminate among a large number of objects quickly. It uses a rank-order procedure, and the objects are sorted into piles based on similarity with respect to some criterion. The number of objects to be sorted should be between 60 and 140, approximately. For example, suppose we take nine brands. On the basis of taste we classify the brands into tasty, moderate and not tasty. We can also classify on the basis of price: low, medium, high. We can then learn whether people prefer low-priced, moderately priced, or high-priced brands. In the same way, sixty brands can be sorted into three piles: low, medium, or high. Thus, the Q-sort technique is an attempt to classify subjects in terms of their similarity with respect to the attribute under study.

It is used in psychology and other social sciences to study people's "subjectivity" -- that is, their viewpoint. Q was developed by psychologist William Stephenson. It has been used both in clinical settings for assessing patients and in research settings to examine how people think about a topic. The methodology is particularly useful when researchers wish to understand and describe the variety of subjective viewpoints on an issue.

The name "Q" comes from the form of factor analysis that is used to analyze the data. Normal factor analysis, called "R method," involves finding correlations between variables (say, height and age) across a sample of subjects. Q, on the other hand, looks for correlations between subjects across a sample of variables. Q factor analysis reduces the many individual viewpoints of the subjects down to a few "factors," which represent shared ways of thinking. The data for Q factor analysis comes from a series of "Q sorts" performed by one or more subjects.
A Q sort is a ranking of variables, typically presented as statements printed on small cards, according to some "condition of instruction." For example, in a Q study of people's views of a celebrity, a subject might be given statements like "He is a deeply religious man" and "He is a liar," and asked to sort them from "most like how I think about this celebrity" to "least like how I think about this celebrity." The use of ranking, rather than asking subjects to rate their agreement with statements individually, is meant to capture the idea that people think about ideas in relation to other ideas, rather than in isolation.

The sample of statements for a Q sort is drawn from a "concourse" -- the sum of all things people say or think about the issue being investigated. Since concourses do not have clear membership lists (as would be the case in the population of subjects), statements cannot be drawn randomly. Commonly Q methodologists use a structured sampling approach in order to ensure that they include the full breadth of the concourse. One salient difference between Q and other social science research methodologies, such as surveys, is that it typically uses many fewer subjects. This can be a strength, as Q is sometimes used with a single subject. In such cases, a person will rank the same set of statements under different conditions of instruction. For example, someone might be given a set of statements about personality traits and then asked to rank them according to how well they describe herself, her ideal self, her father, her mother, etc.
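The distinction between R and Q analysis can be illustrated with the correlation step alone: in Q, the correlation is computed between two subjects across the statements. The sketch below uses three hypothetical Q sorts of nine statements on a -2 to +2 continuum with a forced distribution of 1, 2, 3, 2, 1 cards per pile; the factor analysis that would normally follow is not shown.

```python
from collections import Counter
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Assumed forced distribution: pile value -> number of cards allowed in that pile.
FORCED = {-2: 1, -1: 2, 0: 3, 1: 2, 2: 1}

def follows_forced_distribution(sort):
    """Check that a Q sort puts the required number of cards in each pile."""
    return Counter(sort) == Counter(FORCED)

# Hypothetical Q sorts: position k holds the pile given to statement k.
subject_a = [-2, -1, -1, 0, 0, 0, 1, 1, 2]
subject_b = [-2, -1, -1, 0, 0, 0, 1, 1, 2]  # same viewpoint as A
subject_c = [2, 1, 1, 0, 0, 0, -1, -1, -2]  # opposite viewpoint

r_ab = pearson(subject_a, subject_b)
r_ac = pearson(subject_a, subject_c)
```

Subjects whose sorts correlate highly would load on the same factor, that is, they share a way of thinking about the topic.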

G. Constant Sum Scale

1. Respondents allocate a constant sum of units, such as 100 points, to attributes of a product to reflect their importance.
2. If an attribute is unimportant, the respondent assigns it zero points.
3. If an attribute is twice as important as some other attribute, it receives twice as many points.
4. The sum of all the points is 100. Hence, the name of the scale.
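The four rules above translate into a simple check on each respondent's allocation. The attribute names and point values below are hypothetical.

```python
def validate_allocation(points, total=100):
    """Raise ValueError unless the points are non-negative and sum to the constant."""
    if any(p < 0 for p in points.values()):
        raise ValueError("points must be non-negative")
    if sum(points.values()) != total:
        raise ValueError(f"points must sum to {total}, got {sum(points.values())}")
    return points

def relative_importance(points, a, b):
    """How many times as important attribute a is compared with attribute b."""
    return points[a] / points[b]

# Hypothetical allocation: an unimportant attribute receives zero points.
allocation = validate_allocation(
    {"price": 50, "taste": 25, "packaging": 25, "brand": 0}
)
ratio = relative_importance(allocation, "price", "taste")  # price vs taste
```

Here price receives twice the points of taste, so the respondent is treating it as twice as important.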
