
Non-parametric Tests

Learning Objectives
1. Distinguish Parametric and Nonparametric Test Procedures
2. Explain Commonly Used Nonparametric Test Procedures
3. Perform Hypothesis Tests Using Nonparametric Procedures

Introduction
The word parametric comes from parameter, a characteristic of a population. Parametric tests include assumptions about the shape of the population distribution (e.g. that it is normally distributed). Non-parametric techniques do not have such stringent requirements and make no assumptions about the underlying population distribution, which is why they are sometimes referred to as distribution-free tests.

Hypothesis Testing Procedures

Parametric: Z Test, t Test, One-Way ANOVA
Nonparametric: Wilcoxon Rank Sum Test, Kruskal-Wallis H-Test
(Many more tests exist!)

Parametric Test Procedures


1. Involve Population Parameters (e.g. the Mean)
2. Have Stringent Assumptions (e.g. Normality)
3. Examples: Z Test, t Test, F Test

Nonparametric Test Procedures


1. Do Not Involve Population Parameters (Examples: Probability Distributions, Independence)
2. Data Measured on Any Scale (Ratio or Interval, Ordinal or Nominal)
3. Examples: Mann-Whitney, Kruskal-Wallis, Wilcoxon Rank Sum Test, χ² Test

Advantages of Nonparametric Tests


1. Used With All Scales
2. Easier to Compute
3. Make Fewer Assumptions
4. Need Not Involve Population Parameters
5. Results May Be as Exact as Parametric Procedures

Disadvantages of Nonparametric Tests


1. May Waste Information (the parametric model is more efficient if the data permit)
2. Difficult to Compute by Hand for Large Samples
3. Tables Not Widely Available

Popular Nonparametric Tests


1. Mann-Whitney Rank Sum Test
2. Sign Test
3. Wilcoxon Test
4. Kruskal-Wallis Test
5. Friedman Test
6. Spearman's Rank Correlation
7. Kolmogorov-Smirnov Test
8. Chi-square Test for Independence

Nonparametric Tests vs. Parametric Tests

Comparison               Non-parametric Test             Parametric Equivalent
2 Independent groups     Mann-Whitney Rank Sum Test      Independent t-test
2 Matched/Related        Sign Test, Wilcoxon Test        Paired samples t-test
>2 Independent groups    Kruskal-Wallis Test             One-way ANOVA
Two-way                  Friedman Test                   Two-way ANOVA
Correlation              Spearman's Rank Correlation     Pearson's Correlation
Distribution             Kolmogorov-Smirnov Test         None
Independence             Chi-square for Independence     None

Assumptions for non-parametric techniques


Random samples.

Independent observations. Each person or case can be counted only once; they cannot appear in more than one category or group, and the data from one subject cannot influence the data from another. The exception to this is the repeated measures techniques (Wilcoxon Signed Rank Test, Friedman Test), where the same subjects are retested on different occasions or under different conditions.

Some of the techniques discussed in this lecture have additional assumptions that are noted in the relevant sections below.

1. Chi-square
There are two different types of chi-square tests, both involving categorical data:

1. The chi-square test for goodness of fit (also referred to as the one-sample chi-square) explores the proportion of cases that fall into the various categories of a single variable, and compares these with hypothesised values.
2. The chi-square test for independence is used to determine whether two categorical variables are related. It compares the frequency of cases found in the various categories of one variable across the different categories of another variable. For example: is the proportion of smokers to non-smokers the same for males and females?

1. Chi-square test for independence


This test is used when you wish to explore the relationship between two categorical variables. Each of these variables can have two or more categories.

Summary for chi-square


Example of research question: there are a variety of ways questions can be phrased: Are males more likely to be smokers than females? Is the proportion of males that smoke the same as the proportion of females? Is there a relationship between gender and smoking behaviour?

What you need: two categorical variables, with two or more categories in each, for example: Gender (male/female) and Smoker (yes/no).

Cont.
Additional assumptions: the lowest expected frequency in any cell should be 5 or more. Some authors suggest a less stringent criterion: at least 80 per cent of cells should have expected frequencies of 5 or more. If you have a 1 by 2 or a 2 by 2 table, it is recommended that the expected frequency be at least 10. If you have a 2 by 2 table that violates this assumption, you should consider using Fisher's Exact Probability Test instead (also provided as part of the output from chi-square).

Procedure for chi-square


1. From the menu at the top of the screen click on: Analyze, then click on Descriptive Statistics, then on Crosstabs.
2. Click on one of your variables (e.g. sex) to be your row variable, and click on the arrow to move it into the box marked Row(s).
3. Click on the other variable to be your column variable (e.g. smoker), and click on the arrow to move it into the box marked Column(s).
4. Click on the Statistics button. Choose Chi-square. Click on Continue.
5. Click on the Cells button.
6. In the Counts box, click on the Observed and Expected boxes. (A Python sketch follows.)
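For readers working outside SPSS, here is a minimal Python sketch of the same chi-square test for independence using scipy; the DataFrame and its sex/smoker values are hypothetical stand-ins for your own data.

```python
# Minimal sketch of a chi-square test for independence with scipy.
# The DataFrame and its sex/smoker values are hypothetical examples.
import pandas as pd
from scipy.stats import chi2_contingency

df = pd.DataFrame({
    "sex":    ["male", "male", "female", "female", "male", "female"],
    "smoker": ["yes",  "no",   "no",     "yes",    "no",   "no"],
})

observed = pd.crosstab(df["sex"], df["smoker"])      # observed counts (SPSS Crosstabs)
chi2, p, dof, expected = chi2_contingency(observed)  # Yates correction applied for 2x2
print(observed)
print(f"chi-square = {chi2:.3f}, df = {dof}, p = {p:.3f}")
```

Note that chi2_contingency applies the Yates continuity correction by default for 2 by 2 tables, which matches the Continuity Correction row discussed in the interpretation section below.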

The output

Cont.

Interpretation of output from chi-square


Assumptions: the first thing you should check is whether you have violated one of the assumptions of chi-square concerning the minimum expected cell frequency, which should be 5 or greater (or at least 80 per cent of cells should have expected frequencies of 5 or more). This information is given in a footnote below the final table (labelled Chi-Square Tests). Footnote b in the example provided indicates that 0 cells (.0%) have an expected count less than 5. This means that we have not violated the assumption, as all our expected cell sizes are greater than 5 (in our case greater than 35.87).

Cont.
Chi-square tests: the main value that you are interested in from the output is the Pearson chi-square value. If you have a 2 by 2 table (i.e. each variable has only two categories), then you should use the value in the second row (Continuity Correction). In the example presented above the corrected value is .337, with an associated significance level of .56 (Asymp. Sig. (2-sided)). In this case the value of .56 is larger than the alpha value of .05, so we can conclude that our result is not significant. This means that the proportion of males that smoke is not significantly different from the proportion of females that smoke.

Summary information
To find what percentage of each sex smoke you will need to look at the summary information provided in the table labelled SEX*SMOKE Crosstabulation. This table may look a little confusing to start with, with a fair bit of information presented in each cell. To find out what percentage of males are smokers you need to read across the page in the first row, which refers to males. In this case we look at the values next to % within sex. For this example, 17.9 per cent of males were smokers, while 82.1 per cent were non-smokers. For females, 20.6 per cent were smokers and 79.4 per cent non-smokers. If we wanted to know what percentage of the sample as a whole smoked, we would move down to the Total row, which summarises across both sexes, and look at the values reported in that row.

2. The Chi-Square Test for Goodness-of-Fit


The chi-square test for goodness-of-fit uses frequency data from a sample to test hypotheses about the shape or proportions of a population. Each individual in the sample is classified into one category on the scale of measurement. The data, called observed frequencies, simply count how many individuals from the sample are in each category.


The Chi-Square Test for Goodness-of-Fit (cont.)


The null hypothesis specifies the proportion of the population that should be in each category. The proportions from the null hypothesis are used to compute expected frequencies that describe how the sample would appear if it were in perfect agreement with the null hypothesis.


Procedure for the chi-square goodness-of-fit test
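As an illustration outside SPSS, here is a minimal Python sketch of a goodness-of-fit test; the soda-preference counts are hypothetical, and the null hypothesis is that the three categories are equally preferred.

```python
# Minimal goodness-of-fit sketch with scipy; the counts are hypothetical.
from scipy.stats import chisquare

observed = [28, 15, 12]                 # observed frequencies per category
expected = [sum(observed) / 3] * 3      # H0: all three categories equally preferred

stat, p = chisquare(f_obs=observed, f_exp=expected)
print(f"chi-square = {stat:.2f}, p = {p:.3f}")
```

By default chisquare assumes equal expected frequencies, so f_exp could be omitted here; it is spelled out to show where hypothesised proportions would go.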

3. Mann-Whitney U Test
This technique is used to test for differences between two independent groups on a continuous measure. For example, do males and females differ in terms of their self-esteem? This test is the non-parametric alternative to the t-test for independent samples. The Mann-Whitney U Test compares medians. It converts the scores on the continuous variable to ranks across the two groups, then evaluates whether the ranks for the two groups differ significantly. As the scores are converted to ranks, the actual distribution of the scores does not matter.

Summary for Mann-Whitney U Test


Example of research question: do males and females differ in terms of their levels of self-esteem? Do males have higher levels of self-esteem than females?

What you need: two variables: one categorical variable with two groups (e.g. sex), and one continuous variable (e.g. total self-esteem).

Assumptions: the general assumptions for non-parametric techniques presented at the beginning of this presentation.

Parametric alternative: independent-samples t-test.

Procedure for Mann-Whitney U Test


1. From the menu at the top of the screen click on: Analyze, then click on Nonparametric Tests, then on 2 Independent Samples.
2. Click on your continuous (dependent) variable (e.g. total self-esteem) and move it into the Test Variable List box.
3. Click on your categorical (independent) variable (e.g. sex) and move it into the Grouping Variable box.
4. Click on the Define Groups button. Type in the value for Group 1 (e.g. 1) and for Group 2 (e.g. 2). These are the values that were used to code your values for this variable (see your codebook). Click on Continue.
5. Make sure that the Mann-Whitney U box is ticked under the section labelled Test Type. Click on OK. (A Python sketch follows.)
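For comparison, here is a minimal scipy sketch of the same test; the two score arrays are made-up stand-ins for the self-esteem scores of each group.

```python
# Hypothetical Mann-Whitney U test with scipy; the scores are made up.
from scipy.stats import mannwhitneyu

esteem_males   = [32, 28, 35, 30, 27, 33]   # total self-esteem, group 1
esteem_females = [29, 31, 26, 34, 28, 25]   # total self-esteem, group 2

u, p = mannwhitneyu(esteem_males, esteem_females, alternative="two-sided")
print(f"U = {u}, p = {p:.3f}")
```

scipy reports the U statistic rather than the Z approximation that SPSS prints, but the p-value is interpreted in the same way.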

The output

Interpretation of output from Mann-Whitney U Test


The two values that you need to look at in your output are the Z value and the significance level, which is given as Asymp. Sig. (2-tailed). If your sample size is larger than 30, SPSS will give you the value for a Z-approximation test, which includes a correction for ties in the data. In the example given above, the Z value is 1.23 (rounded) with a significance level of p = .22. The probability value (p) is not less than or equal to .05, so the result is not significant. There is no statistically significant difference in the self-esteem scores of males and females.

4. Wilcoxon Signed Rank Test


The Wilcoxon Signed Rank Test (also referred to as the Wilcoxon matched pairs signed ranks test) is designed for use with repeated measures: that is, when your subjects are measured on two occasions, or under two different conditions. It is the non-parametric alternative to the repeated measures t-test, but instead of comparing means the Wilcoxon converts scores to ranks and compares them at Time 1 and at Time 2. The Wilcoxon can also be used in situations involving a matched subject design, where subjects are matched on specific criteria.

Summary for Wilcoxon Signed Rank Test


Example of research question: is there a change in the scores on the Fear of Statistics test from Time 1 to Time 2?

What you need: one group of subjects measured on the same continuous scale or criterion on two different occasions. The variables involved are scores at Time 1 or Condition 1, and scores at Time 2 or Condition 2.

Assumptions: see the general assumptions for non-parametric techniques presented at the beginning of this presentation.

Parametric alternative: paired-samples t-test.

Procedure for Wilcoxon Signed Rank Test


1. From the menu at the top of the screen click on: Analyze, then click on Nonparametric Tests, then on 2 Related Samples.
2. Click on the variables that represent the scores at Time 1 and at Time 2 (e.g. fost1, fost2). Move these into the Test Pairs List box.
3. Make sure that the Wilcoxon box is ticked in the Test Type section. Click on OK. (A Python sketch follows.)
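An equivalent sketch in Python with scipy, assuming paired arrays of Fear of Statistics scores; the fost1/fost2 names echo the slides and the values are hypothetical.

```python
# Hypothetical Wilcoxon signed rank test with scipy.
from scipy.stats import wilcoxon

fost1 = [40, 38, 44, 41, 39, 42]   # Fear of Statistics scores, Time 1 (made up)
fost2 = [36, 35, 40, 38, 37, 39]   # Fear of Statistics scores, Time 2 (made up)

stat, p = wilcoxon(fost1, fost2)
print(f"W = {stat}, p = {p:.3f}")
```

As with the SPSS output, a p-value at or below .05 indicates a statistically significant change between the two occasions.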

The output

Interpretation of output from Wilcoxon Signed Rank Test


The two things to look for in the output are the Z value and the associated significance level, presented as Asymp. Sig. (2-tailed). If the significance level is equal to or less than .05 (e.g. .04, .01, .001) then you can conclude that the difference between the two scores is statistically significant. In this example the Sig. value is .000 (which really means less than .0005). Therefore we can conclude that the two sets of scores are significantly different.

5. Kruskal-Wallis Test
The Kruskal-Wallis Test (sometimes referred to as the Kruskal-Wallis H Test) is the non-parametric alternative to a one-way between-groups analysis of variance. It allows you to compare the scores on some continuous variable for three or more groups. It is similar in nature to the Mann-Whitney test presented earlier, but it allows you to compare more than just two groups. Scores are converted to ranks and the mean rank for each group is compared. This is a between-groups analysis, so different people must be in each of the different groups.

Summary for Kruskal-Wallis Test


Example of research question: is there a difference in optimism levels across three age levels?

What you need: two variables: one categorical independent variable with three or more categories (e.g. agegp3: 18-29, 30-44, 45+), and one continuous dependent variable (e.g. total optimism).

Assumptions: see the general assumptions for non-parametric techniques presented at the beginning of this presentation.

Parametric alternative: one-way between-groups analysis of variance.

Procedure for Kruskal-Wallis Test


1. From the menu at the top of the screen click on: Analyze, then click on Nonparametric Tests, then on K Independent Samples.
2. Click on your continuous (dependent) variable (e.g. total optimism) and move it into the Test Variable List box.
3. Click on your categorical (independent) variable (e.g. agegp3) and move it into the Grouping Variable box.
4. Click on the Define Range button. Type in the first value of your categorical variable (e.g. 1) in the Minimum box. Type the largest value for your categorical variable (e.g. 3) in the Maximum box. Click on Continue. (A Python sketch follows.)
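A minimal scipy version, assuming three hypothetical lists of total optimism scores for the age groups 18-29, 30-44 and 45+:

```python
# Hypothetical Kruskal-Wallis H test with scipy.
from scipy.stats import kruskal

optimism_18_29  = [20, 22, 19, 24, 21]   # made-up scores, 18-29
optimism_30_44  = [23, 25, 21, 26, 22]   # made-up scores, 30-44
optimism_45plus = [27, 24, 28, 26, 25]   # made-up scores, 45+

h, p = kruskal(optimism_18_29, optimism_30_44, optimism_45plus)
print(f"H = {h:.2f}, p = {p:.3f}")
```

kruskal reports the H statistic, which corresponds to the Chi-Square value in the SPSS output; compare the p-value with .05 as described below.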

The output

Interpretation of output from Kruskal-Wallis Test


The main pieces of information you need from this output are the Chi-Square value, the degrees of freedom (df) and the significance level (presented as Asymp. Sig.). If this significance level is less than .05 (e.g. .04, .01, .001), then you can conclude that there is a statistically significant difference in your continuous variable across the three groups. You can then inspect the Mean Rank for the three groups presented in your first output table. This will tell you which of the groups had the highest overall ranking, which corresponds to the highest score on your continuous variable. In the output presented above the significance level was .01 (rounded). This is less than the alpha level of .05, so these results suggest that there is a difference in optimism levels across the different age groups. An inspection of the mean ranks for the groups suggests that the older group (45+) had the highest optimism scores.

6. Friedman Test
The Friedman Test is the non-parametric alternative to the one-way repeated measures analysis of variance. It is used when you take the same sample of subjects or cases and measure them at three or more points in time, or under three different conditions.

Summary for Friedman Test


Example of research question: is there a change in Fear of Statistics scores across three time periods (pre-intervention, post-intervention and at follow-up)?

What you need: one sample of subjects, measured on the same scale at three different time periods, or under three different conditions.

Assumptions: see the general assumptions for non-parametric techniques.

Parametric alternative: one-way repeated measures analysis of variance.

Procedure for Friedman Test


1. From the menu at the top of the screen click on: Analyze, then click on Nonparametric Tests, then on K Related Samples.
2. Click on the variables that represent the three measurements (e.g. fost1, fost2, fost3).
3. In the Test Type section check that the Friedman option is selected. Click on OK. (A Python sketch follows.)
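A scipy sketch with three hypothetical repeated measurements, again reusing the fost variable names from the slides:

```python
# Hypothetical Friedman test with scipy.
from scipy.stats import friedmanchisquare

fost1 = [40, 38, 44, 41, 39]   # Time 1 scores (made up)
fost2 = [36, 35, 40, 38, 37]   # Time 2 scores (made up)
fost3 = [33, 32, 37, 35, 34]   # Time 3 scores (made up)

stat, p = friedmanchisquare(fost1, fost2, fost3)
print(f"chi-square = {stat:.2f}, p = {p:.3f}")
```

The chi-square statistic and p-value returned here correspond to the values in the SPSS Test Statistics table discussed below.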

The output

Interpretation of output from Friedman Test


The results of this test suggest that there are significant differences in the Fear of Statistics scores across the three time periods. This is indicated by a Sig. level of .000 (which really means less than .0005). Comparing the ranks for the three sets of scores, it appears that there was a steady decrease in Fear of Statistics scores over time.

Reporting Statistics in APA Style


The following examples illustrate how to report statistics in the text of a research report. You will note that significance levels in journal articles, especially in tables, are often reported as either "p > .05," "p < .05," "p < .01," or "p < .001." APA style dictates reporting the exact p value within the text of a manuscript (unless the p value is less than .001). Please pay attention to issues of italics and spacing; APA style is very precise about these. Also, with the exception of some p values, most statistics should be rounded to two decimal places.

EXAMPLES
Mean and standard deviation are most clearly presented in parentheses: The sample as a whole was relatively young (M = 19.22, SD = 3.45). The average age of students was 19.22 years (SD = 3.45).

Percentages are also most clearly displayed in parentheses, with no decimal places: Nearly half (49%) of the sample was married.

CONT.
Reporting a significant single sample t-test (μ ≠ μ0): Students taking statistics courses in psychology at the University of Washington reported studying more hours for tests (M = 121, SD = 14.2) than did UW college students in general, t(33) = 2.10, p = .034.

Reporting a significant t-test for dependent groups (μ1 ≠ μ2): Results indicate a significant preference for pecan pie (M = 3.45, SD = 1.11) over cherry pie (M = 3.00, SD = .80), t(15) = 4.00, p = .001.

CONT.
Reporting a significant t-test for independent groups (μ1 ≠ μ2): UW students taking statistics courses in Psychology had higher IQ scores (M = 121, SD = 14.2) than did those taking statistics courses in Statistics (M = 117, SD = 10.3), t(44) = 1.23, p = .09. Over a two-day period, participants drank significantly fewer drinks in the experimental group (M = 0.667, SD = 1.15) than did those in the wait-list control group (M = 8.00, SD = 2.00), t(4) = -5.51, p = .005.

CONT.
Reporting a significant omnibus F test for a one-way ANOVA: An analysis of variance showed that the effect of noise was significant, F(3, 27) = 5.94, p = .007. Post hoc analyses using the Scheffé post hoc criterion for significance indicated that the average number of errors was significantly lower in the white noise condition (M = 12.4, SD = 2.26) than in the other two noise conditions (traffic and industrial) combined (M = 13.62, SD = 5.56), F(3, 27) = 7.77, p = .042.

CONT.
Reporting the results of a chi-square test of independence: A chi-square test of independence was performed to examine the relation between religion and college interest. The relation between these variables was significant, χ²(2, N = 170) = 14.14, p < .01. Catholic teens were less likely to show an interest in attending college than were Protestant teens.

Reporting the results of a chi-square test of goodness of fit: A chi-square test of goodness-of-fit was performed to determine whether the three sodas were equally preferred. Preference for the three sodas was not equally distributed in the population, χ²(2, N = 55) =

Cont.
Regression results are often best presented in a table. APA doesn't say much about how to report regression results in the text, but if you would like to report the regression in the text of your Results section, you should at least present the unstandardized or standardized slope (beta), whichever is more interpretable given the data, along with the t-test and the corresponding significance level. (Degrees of freedom for the t-test is N - k - 1, where k equals the number of predictor variables.) It is also customary to report the percentage of variance explained along with the corresponding F test. Social support significantly predicted depression scores, b = -.34, t(225) = 6.53, p < .001. Social support also explained a significant proportion of variance in depression scores.

Cont.
Correlations are reported with the degrees of freedom (which is N - 2) in parentheses and the significance level: The two variables were strongly correlated, r(55) = .49, p < .01.

Tables are useful if you find that a paragraph has almost as many numbers as words. If you do use a table, do not also report the same information in the text; it's either one or the other.

Based on: American Psychological Association. (2010). Publication manual of the American Psychological Association (6th ed.). Washington, DC: Author.

THE END
