
PSY 395 Lab #2

Norms and Reliability


Norms
Psychological test scores are usually relative: we
compare one score or group of scores to another.
John is more agreeable than Sam.
Females tend to have higher verbal ability
scores than do males.
Norm-based interpretation: when a person's test
score is interpreted by comparing it to the scores of a
specific group of people. Many different kinds of
groups can be used for this comparison (e.g., the U.S.
population, all high school seniors in Wyoming, an
MSU psychology class).
Norms are
Based on the means and standard deviations
(descriptive statistics) of the sample used to
represent the comparison group.
Expressed as raw scores, standard scores (e.g., z-scores, T-scores), or percentiles.

PSY 395 Lab 2


Pg. 1

To calculate norms, we use standard scores, which
transform a raw score into values that help to
compare people's scores across different scales.
For instance, Person A's mechanical knowledge
test may have scores that range from 0-200, and
Person B may have a similar test whose scores
range from 0-50. Standardizing makes
Person A's and Person B's scores comparable to
one another by putting them on the same type of
scale.
Examples of standard scores:
z-score: Expresses the value of a raw score on the
standard normal distribution. Raw scores get
transformed into z-scores using a formula:
z = (raw score - mean) / standard deviation
The mean score always has a z-score of 0. A score one
standard deviation above the mean has a z-score of +1;
one standard deviation below the mean, a z-score of
-1; two standard deviations above the mean, a z-score
of +2, etc. (So Person A might have a z-score of +1 on
a measure of anxiety, and Person B might have a z-score
of -1.5 on a different measure of anxiety, but
these two scores can usefully be compared with one
another, at least to the extent the anxiety measures are
measuring pretty much the same thing.)
T-score: Another way to express z-scores, but T-scores use only positive numbers. The mean is always
assigned a T-score of 50, and for each standard
deviation away from the mean you add or subtract 10
from 50. Scores that are one standard deviation above
the mean = T score of 60, one standard deviation
below the mean = T score of 40, two standard
deviations above the mean = T score of 70, etc.
T = z*10 + 50
We can also express norms in terms of percentiles. A
percentile is the score at or below which a specified
percentage of scores in a distribution falls. Specific
percentiles correspond to the mean and to each standard
deviation of a normal distribution.


Here are example z, T, and percentile values for the normal
distribution. These are always the same no matter
what the raw scores were, so long as the data are
normally distributed:

z     -3     -2     -1      0     +1     +2     +3
T     20     30     40     50     60     70     80
%    0.1%    2%    16%    50%    84%    98%   99.9%

Accessing Data
Open Internet Explorer (DO NOT use Netscape)
Go to class website
Click on Lab 2 Materials
Click on the IPIP-self dataset
When it asks you if you want to save or open this file,
click SAVE TO DISK
Another window will open; be sure to browse to
wherever you want to save the file (AFS space, floppy
disk). SAVE YOUR WORK IN TWO PLACES:
TWO DISKS, OR AFS + DISK, ETC.
When it's done saving, click OPEN. This will open
the data in SPSS.


Plotting Norms on National Norm Chart


1) Need to find your raw score for each scale in the
data file.
Find the last five digits of your PID in the data file.
Scroll to the right to find your composite scale score
for each scale (neuro = neuroticism scale, extra =
extraversion scale, etc.)
2) Need a national norm chart sheet
3) Plot your raw scores on the national chart (for
both genders).
4) How many standard deviations is each of your
scores from the mean? (Calculate your z-scores
using the norms for your gender.)
5) What are the T-scores for each of your scores?
(Convert the z-scores from #4 above to T-scores.)
6) At what percentile do you fall for each of the
scales? (use the z table to figure this out).
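
The calculations in steps 4 through 6 can be sketched in Python. This is only an illustration and not part of the lab's SPSS procedure: the function names are made up for this example, the raw score, mean, and standard deviation are hypothetical placeholders (not values from the national norm chart), and the percentile assumes normally distributed scores, computed from the normal curve in place of a printed z table.

```python
import math

def z_score(raw, mean, sd):
    """Number of standard deviations a raw score lies from the mean."""
    return (raw - mean) / sd

def t_score(z):
    """T = z*10 + 50, so the mean is 50 and each SD is worth 10 points."""
    return z * 10 + 50

def percentile(z):
    """Percentage of a normal distribution at or below z (what a z table gives)."""
    return 100 * 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Hypothetical values for illustration only; use the norms for your own gender.
raw, norm_mean, norm_sd = 38.0, 30.0, 8.0

z = z_score(raw, norm_mean, norm_sd)
print(f"z = {z:.2f}, T = {t_score(z):.0f}, percentile = {percentile(z):.0f}")
```

Repeating the three calls for each of the five scales gives the z, T, and percentile values asked for above.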


Plotting Norms on Class Norm Chart


1) Need to find the norms for our class. What are they?
a) In SPSS, go to Analyze - Descriptive Statistics -
Descriptives
b) Put each of the IPIP scale variables into the
variables box
c) Hit OK
2) Use means and standard deviations to create a
class norm chart
For example: Suppose the mean of the neuroticism
scale was 28.46 with a standard deviation of 7.91.
28.46 + 7.91 = 36.37 (one standard deviation above
the mean)
28.46 + 7.91 + 7.91 = 44.28 (two standard deviations
above the mean)
28.46 - 7.91 = 20.55 (one standard deviation below
the mean)
28.46 - 7.91 - 7.91 = 12.64 (two standard deviations
below the mean)
3) Plot your raw scores on the chart


4) How many standard deviations is each of your
scores from the mean?
5) What are the T-scores for each of your scores?
6) At what percentile do you fall for each of the
scales?
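
Building the class norm chart can also be sketched in Python. This is a rough illustration under stated assumptions: `norm_chart` is a made-up helper name, the list of class scores is invented (it is not the lab data set), and the sample standard deviation (dividing by n - 1) is assumed to match the Std. Deviation that SPSS Descriptives reports.

```python
import statistics

def norm_chart(mean, sd, max_sd=2):
    """Map each SD step (-2 .. +2) to the raw score at that point on the chart."""
    return {step: round(mean + step * sd, 2) for step in range(-max_sd, max_sd + 1)}

# Hypothetical class scores on one scale (not the real lab data).
class_scores = [22, 31, 27, 40, 18, 29, 35, 25]
class_mean = statistics.mean(class_scores)
class_sd = statistics.stdev(class_scores)  # sample SD (divides by n - 1)
print(norm_chart(class_mean, class_sd))

# The worked example from the text: mean 28.46, SD 7.91.
example = norm_chart(28.46, 7.91)
print(example)
```

The keys of the returned dictionary are the z-score positions on the chart, so a raw score can be located between the two nearest entries.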


Reliability
What is reliability? How consistent a test score is.
This is the first requirement for good measurement.
For a test to be an adequate measure of an attribute, it
must be reliable, that is, it must be consistent.
Different types of reliability estimates:
- test-retest: consistency of test scores when the same
test is taken at two different times (consistency of
scores over time). We find this by correlating the scores
on the test taken at time 1 with the scores on the test
taken at time 2.
- internal consistency (alpha): consistency of a test
across its items. Basically assesses whether all the items
of a test are measuring the same thing, that is, whether
they are consistent with each other. It is based on the
average correlation among all of the test items, corrected
for (increased by) the length of the test.


- interobserver (or interrater): consistency of a
measure across different raters (different people
completing the measure). It is the correlation between
the test score from rater 1 and the test score from rater 2.
This is not interrater agreement (e.g., kappa). How is
reliability different from agreement?
- Why so many different types of reliability?


Acceptable Levels of Reliability


Reliability estimates are like correlations, and the
highest correlation = 1, so the closer a reliability
estimate is to 1 the higher it is.
Usually, reliability estimates from .8 to 1.0 indicate
high to moderate reliability, from .7 to .8 indicate
moderate reliability, and under .7 indicate low
reliability.
A reliability estimate of .8 indicates that 20% of the
variability in the test scores is due to measurement
error (because 1 - .80 = .20). A reliability estimate of
.5 means that true scores and error have equal effects
on test scores (because 1 - .50 = .50), which is not
desirable.
Remember that the acceptable level of reliability also
depends on what the test is used for. For example, for
making rough preliminary decisions or for screening
purposes (e.g., using the MMPI to eliminate the lowest 1%
of applicants for postal-service jobs), lower
reliability levels might be okay.
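
The rule of thumb and the error arithmetic above can be written out as a short Python sketch. The function names are made up for this example, and how the boundary values (exactly .7 and .8) are assigned is an assumption, since the ranges in the text overlap at their edges.

```python
def error_share(reliability):
    """Proportion of score variability attributed to measurement error."""
    return 1 - reliability

def reliability_label(reliability):
    """Rough label from the rule of thumb; boundary handling is an assumption."""
    if reliability >= 0.8:
        return "high to moderate"
    if reliability >= 0.7:
        return "moderate"
    return "low"

for r in (0.9, 0.8, 0.75, 0.5):
    print(f"reliability {r:.2f}: {reliability_label(r)}, "
          f"error share {error_share(r):.0%}")
```

The labels are only a starting point; as noted above, the acceptable level still depends on what the test is used for.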


Split-Half Reliability and Spearman-Brown Correction
One way to assess reliability of your measure is to
perform a split-half reliability test. To do this, you
divide the test into two halves. Then, you correlate
performance on one half of the test with performance
on the other half of the test.
To the extent that the correlation coefficient is high
(closer to 1), this indicates that your test is reliable.
To the extent that the correlation coefficient is low
(closer to 0), this indicates that your test is
unreliable.
In the data set, we have split the conscientiousness
scale into two halves (Consc1 & Consc2). One score
is for one half of the items, and the other score is for
the other half of the items. To assess the split-half
reliability, we will need to correlate responses on one
half with responses on the other.


In SPSS:
Go to Analyze - Correlate - Bivariate. Highlight the
two halves of the conscientiousness scale and select
them over to the box on the right. Select OK.
The correlation coefficient, r, should give you the
value for split-half reliability.
What is it?
However, this value is the reliability coefficient for
only half of the items. If we want the reliability
coefficient for the whole test, we need to apply the
Spearman-Brown Correction Formula:
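rSB = (k * rxx) / (1 + (k - 1) * rxx)
where rxx is the split-half correlation and k is the factor
by which the test is lengthened (k = 2 when going from
half of the items to the full scale).
Use the formula to calculate what the reliability would
be for the full conscientiousness scale. Compare this
to alpha when we calculate it below.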
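
For checking the hand calculation, the split-half correlation and the Spearman-Brown correction, rSB = (k * rxx) / (1 + (k - 1) * rxx), can be sketched in Python. This is a minimal illustration, not the SPSS procedure: the function names are made up, and the two lists are invented stand-ins for the Consc1 and Consc2 columns, not the real data.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def spearman_brown(r_xx, k=2):
    """Projected reliability when the test is made k times as long."""
    return (k * r_xx) / (1 + (k - 1) * r_xx)

# Hypothetical half-scale scores for six people (stand-ins for Consc1 and Consc2).
consc1 = [30, 25, 38, 22, 34, 28]
consc2 = [32, 24, 35, 26, 33, 25]

r_half = pearson_r(consc1, consc2)
print(f"split-half r = {r_half:.2f}, "
      f"full-scale estimate = {spearman_brown(r_half):.2f}")
```

With k = 2 and a positive split-half correlation below 1, the corrected value comes out higher than the split-half r, which reflects the full scale having twice as many items as each half.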


Internal Consistency Reliability


Coefficient alpha
Alpha compares every item to every other item. It
looks at the consistency across the items in a particular
scale, so we use individual items in the computation
(not scale scores, which are sums of item scores).
Notice the alpha of .63 is higher than any item
correlation in the matrix, which is generally true
because scales are more reliable than individual items.
Part of your homework assignment for this week is to
calculate coefficient alpha for all 5 IPIP scales.


Calculating Alpha
Using SPSS, go to Analyze - Scale - Reliability
Analysis
For each IPIP scale, put the items (not the scale
scores) for that scale in the items box (so
you'll do this five separate times, once for each scale).
There are 20 items for each scale.
Example: n1, n6, n11, n16, n21, etc. until you've
entered all n items. Make sure you use the
reverse-scored items where appropriate.
After entering the items for a scale, hit OK
You'll do this 5 times, one time for each of the IPIP
scales.
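
To see what the alpha computation does with item-level data, here is a minimal Python sketch using the standard variance form of coefficient alpha: alpha = k/(k - 1) * (1 - sum of item variances / variance of total scores). The function name is made up, and the responses are invented (5 people, 4 items on an assumed 1-5 scale), not the IPIP data; reverse-scored items are assumed to be recoded already.

```python
import statistics

def cronbach_alpha(rows):
    """Coefficient alpha from item-level data.

    rows: one list per respondent, holding that person's score on each item
    (reverse-scored items already recoded).
    """
    k = len(rows[0])                      # number of items
    items = list(zip(*rows))              # regroup the scores by item
    item_vars = [statistics.variance(item) for item in items]
    total_var = statistics.variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Hypothetical responses: 5 people x 4 items on a 1-5 scale.
responses = [
    [4, 5, 4, 3],
    [2, 2, 3, 2],
    [5, 4, 5, 4],
    [3, 3, 2, 3],
    [1, 2, 1, 2],
]
print(f"alpha = {cronbach_alpha(responses):.2f}")
```

Note that the function takes the individual items and not the scale scores, matching the instruction above to put items in the items box.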


Homework #2
[Ask your TA what to turn in
and what to e-mail]
Norms
1. National norms chart with z, T, and % reported for
each scale.
2. Class norms chart with z, T, and % reported for each
scale.
3. Pick one IPIP scale. Describe in 3-4 sentences why
your scores differ when comparing the class
norms to the national norms.
Reliability
4. Report alphas for all 5 IPIP scales.
5. Answer the following questions: What is alpha if
items in a scale are completely uncorrelated?
What happens to alpha if you keep adding items to
the scale that correlate positively with the other
items?
6. Answer the following questions: What do you
conclude if you have a high alpha reliability
coefficient? What do you conclude if alpha is
low? (Optional question for thought: Can a test
still be reliable if alpha is low?)