Presentation By Dr.S.SelvaRani, Principal, Sri Sarada Niketan College for Women, Amaravathipudur, Karaikudi
Tests of Hypotheses

Statistical tests arm the researcher to interpret data objectively, without intuitive, biased or careless generalization or particularization. A medical report from a diagnostic lab reads something like this:

Your Blood Total Sugar   Borderline High   High         Interpretation
172 mg/dL                200-239 mg/dL     >239 mg/dL   Fine; you are not a sweet person!
220 mg/dL                200-239 mg/dL     >239 mg/dL   Precaution; you are becoming a sweet person!
259 mg/dL                200-239 mg/dL     >239 mg/dL   Alas; you are a sweet person! It's dangerous!!

Here the precaution advice is based on a range band. That is what statistical tests do: they place a confidence level on a result. Outside the confidence level, reject H0.

Parametric and Non-parametric Tests

Statistical tests are of two broad types: parametric and non-parametric. Parametric and nonparametric statistical procedures test hypotheses under different assumptions. There are assumptions about:
- Shape of the data distribution: normal, some other distribution, or no definite shape at all
- Nature of the data: measurement-based or counting-based
- Variance
- Randomization of sampling, etc.
All these decide whether or not a particular class of test, parametric or non-parametric, is appropriate.
Parametric statistics test hypotheses based on the assumption that the samples come from populations that are normally distributed or conform to some other distribution. Also, parametric statistical tests assume that there is homogeneity of variance (variances within groups are the same). The level of measurement for parametric tests is assumed to be ratio or interval.
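As a sketch of how these assumptions can be screened in practice, one might run Shapiro-Wilk (normality) and Levene (equal variance) tests before choosing between a parametric test and its nonparametric counterpart. The data below are simulated, and the 0.05 screening thresholds are illustrative conventions, not a rule from these slides:

```python
# Sketch: checking parametric-test assumptions before choosing a test.
# The data here are simulated for illustration.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
group_a = rng.normal(loc=50, scale=10, size=30)
group_b = rng.normal(loc=55, scale=10, size=30)

# Normality of each group (Shapiro-Wilk): large p -> no evidence against normality
_, p_norm_a = stats.shapiro(group_a)
_, p_norm_b = stats.shapiro(group_b)

# Homogeneity of variance (Levene's test)
_, p_var = stats.levene(group_a, group_b)

if p_norm_a > 0.05 and p_norm_b > 0.05 and p_var > 0.05:
    stat, p = stats.ttest_ind(group_a, group_b)      # parametric route
else:
    stat, p = stats.mannwhitneyu(group_a, group_b)   # nonparametric fallback
```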
Parametric Tests
Nonparametric Tests

Nonparametric statistical procedures test hypotheses that do not require normal-distribution or variance assumptions about the populations from which the samples were drawn.
Nonparametric statistical procedures are designed for ordinal or nominal data. Non-parametric tests are versatile.

Nominal data consist of names, labels, or categories only; they cannot be arranged in an ordering scheme (such as low to high). For example, the numbers 24, 28 and 18 on the shirts of a team of football players are substitutes for names. They don't count or measure anything, so they are categorical data. If you record a number (width, height, speed, errors, etc.), it is a measurement. If you record a label (sex, popularity, beauty, etc.), it is nominal. If the data are capable of being ranked, they are ordinal. You know there are also interval and ratio data.

Choosing a Statistical Procedure - Guidelines
(rows: design; columns: measurement scale of the dependent variable)

Design                                        Interval/Ratio                       Ordinal          Nominal
One IV, two levels, two independent groups    Independent t-test                   Mann-Whitney U   Chi-Square
One IV, two levels, two dependent groups      Dependent t-test                     Wilcoxon         Chi-Square
One IV, >2 levels, multiple indep. groups     One-Way ANOVA                        Kruskal-Wallis   Chi-Square
One IV, >2 levels, multiple dep. groups       Repeated Measures ANOVA              Friedman
Factorial designs, independent groups         Two-Factor ANOVA
Factorial designs, dependent groups           Two-Factor ANOVA Repeated Measures

Nonparametric tests are easy to compute (really easy, if we know how!), and they work without making assumptions about population values or parameters.
They are distribution-free tests: they compare medians rather than means.

Nonparametric Tests: Features
- Many nonparametric methods do not use the raw data, but the rank order of the data, for analysis.
- Nonparametric methods can be used with small samples.
- They do not require the assumption of normality or the assumption of homogeneity of variance.

Many Non-Parametric Tests Exist

Test                                       Description
Chi-Square Test                            Tests for significance of the difference between expected and actual frequency distributions
Anderson-Darling Test                      Tests whether a sample is drawn from a given distribution
Friedman Test (two-way ANOVA on            Tests whether k treatments in randomized block designs have identical effects
  row-wise ranks)
Kendall's Tau                              Measures statistical dependence between two variables
Mann-Whitney U (Wilcoxon Rank-Sum) Test    Tests whether two samples are drawn from the same distribution, as compared to a given alternative hypothesis
Median Test                                Tests whether two samples are drawn from distributions with equal medians
Kolmogorov-Smirnov Test                    Tests whether a sample is drawn from a given distribution, or whether two samples are drawn from the same distribution
Kruskal-Wallis (one-way ANOVA on ranks)    Tests whether more than two independent samples are drawn from the same distribution
Kuiper's Test                              Tests whether a sample is drawn from a given distribution; sensitive to cyclic variations such as day of the week
Sign Test                                  Tests whether matched-pair samples are drawn from distributions with equal medians
Spearman's Rank Correlation Coefficient    Measures statistical dependence between two variables using a monotonic function
Wilcoxon Signed-Rank Test                  Tests whether matched-pair samples are drawn from populations with different mean ranks

Caution About Using Non-Parametric Tests
The main weakness of nonparametric tests is that they are less powerful than parametric tests.
They are less likely to reject the null hypothesis when it is false. We lose some information when data are converted to ranks. So, when the assumptions of parametric tests can be met, parametric tests should be used, because they are the most powerful tests available.

Type I Error: the null hypothesis is rejected when it is actually true. The probability of a Type I error is set by α (alpha).
Type II Error: the null hypothesis is accepted when it is actually false. The probability of making a Type II error is called β (beta).
Increasing alpha decreases beta, and vice versa. Setting alpha and beta depends upon the cost of making either type of error.

Chi-Square Test

The chi-square (χ²) test is undoubtedly the most important and most used member of the nonparametric family of statistical tests. Chi-square is employed to test the difference between an actual sample and another hypothetical or previously established distribution, such as that which may be expected due to chance or probability.
Chi-Square Test
Chi-square can also be used to test differences between two or more actual samples. We don't have scores, and we don't have means; we just have numbers, or frequencies. In other words, we have nominal, head-count data. We are counting the heads, not weighing them! The test is versatile, with many applications.
Chi-Square Distribution
Determining the cutoff for a chi-square statistic is all about the chi-square distribution and its CDF. The distribution starts at the value zero and extends up to infinity, and its shape depends on the degrees of freedom (df).
Chi-square values for different alpha levels and degrees of freedom (probability p across the top; values in the columns left of 0.05 are nonsignificant, values at 0.05 and beyond are significant):

df    0.95   0.90   0.80   0.70   0.50   0.30   0.20   0.10   0.05   0.01   0.001
1     0.004  0.02   0.06   0.15   0.46   1.07   1.64   2.71   3.84   6.64   10.83
2     0.10   0.21   0.45   0.71   1.39   2.41   3.22   4.60   5.99   9.21   13.82
3     0.35   0.58   1.01   1.42   2.37   3.66   4.64   6.25   7.82   11.34  16.27
4     0.71   1.06   1.65   2.20   3.36   4.88   5.99   7.78   9.49   13.28  18.47
5     1.14   1.61   2.34   3.00   4.35   6.06   7.29   9.24   11.07  15.09  20.52
6     1.63   2.20   3.07   3.83   5.35   7.23   8.56   10.64  12.59  16.81  22.46
7     2.17   2.83   3.82   4.67   6.35   8.38   9.80   12.02  14.07  18.48  24.32
8     2.73   3.49   4.59   5.53   7.34   9.52   11.03  13.36  15.51  20.09  26.12
9     3.32   4.17   5.38   6.39   8.34   10.66  12.24  14.68  16.92  21.67  27.88
10    3.94   4.86   6.18   7.27   9.34   11.78  13.44  15.99  18.31  23.21  29.59

A great question now: what? Isn't chi-square parametric?! Chi-square is related to the central limit theorem in the sense that proportions are in fact means, and proportions are normally distributed (with a mean of p [not 3.141592653...] and a variance of p(1 − p)). Therefore we can perform a normal-curve test for the difference between proportions such that Z² = χ² on one degree of freedom. Since Z is indubitably a parametric test, and chi-square can be related to Z, we can infer that it is, in fact, parametric.

Chi-Square Test for Goodness of Fit
Is any theoretical distribution (beta, binomial, Poisson, normal or any other) a good fit in a given case? For instance, we may assume the gender of a newborn is binomially distributed with q = p = 0.5, and test this in families each with four children; the acceptance of the binomial distribution needs testing here, and we may use chi-square. In project management, the activity time estimates (pessimistic, most likely and optimistic) are said to follow a four-parameter beta distribution; we may test this too using chi-square.
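The claim that Z² equals χ² on one degree of freedom, and the table entries above, can be checked numerically. A sketch, assuming scipy is available:

```python
from scipy.stats import chi2, norm

# The two-tailed 5% z cutoff, squared, equals the 5% chi-square cutoff on 1 df,
# i.e. the df = 1, alpha = 0.05 entry (3.84) in the table above.
z_crit = norm.ppf(0.975)            # two-tailed 5% z cutoff, about 1.960
chi2_crit = chi2.ppf(0.95, df=1)    # about 3.84
assert abs(z_crit**2 - chi2_crit) < 1e-8

# Reproduce another table entry: df = 3, alpha = 0.05 gives about 7.81
# (the table shows 7.82 because of rounding).
three_df_cutoff = chi2.ppf(0.95, df=3)
```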
For an empirical distribution, say 2 : 3 : 5 as the ratio of the numbers of newborn babies with underweight, normal weight and overweight in a rich country, we may likewise use chi-square to test its goodness of fit.
Do birds forage randomly on any tree, or do they choose particular kinds of trees? A study of bird foraging behavior in a forest in Oregon revealed the following. In a managed forest, 54% of the canopy volume was Douglas fir, 40% was ponderosa pine, 5% was grand fir, and 1% was western larch. Out of 156 observations of foraging by nuthatches, 70 (45% of the total) were in Douglas fir, 79 (51%) in ponderosa pine, 3 (2%) in grand fir, and 4 (3%) in western larch. The biological null hypothesis is that the birds forage randomly, without regard to what species of tree they are in; the statistical null hypothesis is that the proportions of foraging events are equal to the proportions of canopy volume.
Problem Expressed in Frequencies

Type of Tree                              Douglas Fir   Ponderosa Pine   Grand Fir   Western Larch   Total
Birds' actual foraging distribution (O)   70            79               3           4               156
Birds' expected distribution (E)
  (= tree distribution in the forest)     84            62               8           2               156

(The expected counts are 156 times the canopy proportions: 84.24, 62.4, 7.8 and 1.56, rounded above.)
The difference in proportions is significant: χ² = 13.59 (computed from the unrounded expected counts), which exceeds the table value of 7.82 for 3 d.f. at α = 0.05; P = 0.0035.

The formula for calculating chi-square:
χ² = Σ (O − E)² / E
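Recomputing this goodness-of-fit test with scipy.stats.chisquare, using the unrounded expected counts (156 × the canopy proportions), gives χ² ≈ 13.59, the value corresponding to the quoted P = 0.0035:

```python
from scipy.stats import chisquare

observed = [70, 79, 3, 4]                       # nuthatch foraging counts per tree species
canopy = [0.54, 0.40, 0.05, 0.01]               # canopy-volume proportions
expected = [p * sum(observed) for p in canopy]  # 84.24, 62.4, 7.8, 1.56

stat, p = chisquare(f_obs=observed, f_exp=expected)
# stat is about 13.59 and p about 0.0035, so H0 (random foraging) is rejected
```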
Degrees of freedom: df = n − 1, where n is the number of classes.

Contingency-Table-Based Chi-Square

A contingency table is also referred to as a cross tabulation, or cross tab. It is a type of matrix table displaying multiple variables as frequencies. The term was originally used by the great statistician Karl Pearson in the context of his own study, but it now stands generalized. Are the variables in question independent or dependent? Are the attributes related or unrelated, associated or unassociated? We use chi-square. Most of you know this very well; yet have a cursory glance over an example.

Suppose you have the following categorical data set: the incidence of three types of malaria in three tropical regions.

            Asia   Africa   South America   Totals
Malaria A    31      14          45            90
Malaria B     2       5          53            60
Malaria C    53      45           2           100
Totals       86      64         100           250
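For this malaria table, the expected counts and χ² can be obtained directly with scipy.stats.chi2_contingency. It gives χ² ≈ 125.52, agreeing with the hand computation (which comes out as 125.516 because of rounded intermediate values):

```python
import numpy as np
from scipy.stats import chi2_contingency

table = np.array([[31, 14, 45],    # Malaria A: Asia, Africa, South America
                  [ 2,  5, 53],    # Malaria B
                  [53, 45,  2]])   # Malaria C

chi2_stat, p, dof, expected = chi2_contingency(table)
# chi2_stat is about 125.52 on dof = 4; expected[0, 0] = 90*86/250 = 30.96
```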
We could now set up the following table:

Observed   Expected   |O − E|   (O − E)²   (O − E)²/E
31         30.96       0.04       0.0016    0.0000516
14         23.04       9.04      81.72      3.546
45         36.00       9.00      81.00      2.25
2          20.64      18.64     347.45     16.83
5          15.36      10.36     107.33      6.99
53         24.00      29.00     841.00     35.04
53         34.40      18.60     345.96     10.06
45         25.60      19.40     376.36     14.70
2          40.00      38.00    1444.00     36.10

Chi-square = 125.516; df = (c − 1)(r − 1) = 2 × 2 = 4. H0 is rejected; see below.

df    0.5     0.10    0.05     0.02     0.01
1     0.455   2.706   3.841    5.412    6.635
2     1.386   4.605   5.991    7.824    9.210
3     2.366   6.251   7.815    9.837    11.345
4     3.357   7.779   9.488    11.668   13.277
5     4.351   9.236   11.070   13.388   15.086

Reject H0 because 125.516 is greater than 9.488 (for alpha = 0.05). Thus, we reject the null hypothesis that there is no relationship between location and type of malaria. Inference: there is a relationship between type of malaria and location.

Mann-Whitney U Test

The nonparametric equivalent of the independent t test: two independent groups, with ordinal measurement of the dependent variable. The sampling distribution of U is known and is used to test hypotheses in the same way as the t distribution.
Other names: Mann-Whitney-Wilcoxon (MWW) test, Wilcoxon rank-sum test, or Wilcoxon-Mann-Whitney test.
Mann-Whitney U Test
To compute the Mann-Whitney U:
1. Rank the scores of both groups together, from highest to lowest.
2. Sum the ranks of the scores for each group.
The sums of ranks for the two groups are used to make the statistical comparison.
Group 1            Group 2
Income   Rank      Income   Rank
25       12        27       10
32        5        19       17
36        3        16       20
40        1        33        4
22       14        30        7
37        2        17       19
20       16        21       15
18       18        23       13
31        6        26       11
29        8        28        9
Sum:     85        Sum:    125

(Ranks run from 1 for the highest income to 20 for the lowest.)

Non-Directional Hypotheses
Null hypothesis: there is no difference in the scores of the two groups (i.e., the sum of ranks for group 1 does not differ from the sum of ranks for group 2).
Alternative hypothesis: there is a difference between the scores of the two groups (i.e., the sum of ranks for group 1 is significantly different from the sum of ranks for group 2).

Computing the Mann-Whitney U Using SPSS
Enter the data into the SPSS spreadsheet in two columns: the 1st column holds groups, the 2nd column holds scores (ratings). Then: Analyze > Nonparametric > 2 Independent Samples. Select the independent variable and move it to the Grouping Variable box; click Define Groups and enter 1 for group 1 and 2 for group 2. Select the dependent variable and move it to the Test Variable box. Make sure Mann-Whitney is selected, and click OK.

Interpreting the Output

Ranks (Equal Rights Attitudes by Income Status):
                   N    Mean Rank   Sum of Ranks
Income Producing   10   12.50       125.00
No Income          10    8.50        85.00
Total              20

Test Statistics (grouping variable: Income Status):
Mann-Whitney U                   30.000
Wilcoxon W                       85.000
Z                                -1.512
Asymp. Sig. (2-tailed)            .131
Exact Sig. [2*(1-tailed Sig.)]    .143   (not corrected for ties)

The output provides a z-score equivalent of the Mann-Whitney U statistic, along with significance levels for both one-tailed and two-tailed hypotheses. In the formula U = R1 − n1(n1 + 1)/2, n1 is the sample size for sample 1 and R1 is the sum of the ranks in sample 1. Note that it does not matter which of the two samples is taken as sample 1: take either U1 or U2 as U and compute z; the result is the same.
Here μ_U and σ_U are the mean and standard deviation of U, and Z is a standard normal deviate whose significance can be checked in tables of the normal distribution.

Mann-Whitney U test:
U   = R1 − n1(n1 + 1)/2 = 85 − (10 × 11)/2 = 85 − 55 = 30
σ_U = √[n1·n2·(n1 + n2 + 1)/12] = √(2100/12) = 13.23
μ_U = n1·n2/2 = 50
Z   = (U − μ_U)/σ_U = (30 − 50)/13.23 = −1.512
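The whole calculation, both the exact U and the normal approximation used above, can be reproduced with scipy.stats.mannwhitneyu. A sketch (the group1/group2 labels simply follow the two columns of the income table):

```python
import math
from scipy.stats import mannwhitneyu

group1 = [25, 32, 36, 40, 22, 37, 20, 18, 31, 29]   # column with rank sum 85
group2 = [27, 19, 16, 33, 30, 17, 21, 23, 26, 28]   # column with rank sum 125

res = mannwhitneyu(group1, group2, alternative='two-sided', method='exact')
n1, n2 = len(group1), len(group2)
u = min(res.statistic, n1 * n2 - res.statistic)     # U = 30, as computed by hand

# Normal approximation, as in the hand calculation above
mu_u = n1 * n2 / 2                                  # 50
sigma_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)   # 13.23
z = (u - mu_u) / sigma_u                            # about -1.512
```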
Mann-Whitney U Test II

Pair   Unmarried thorax width   Married thorax width
1      4                        2.8
2      3                        2.7
3      2.6                      2.6
4      3.85                     2.7
5      2.65                     2.6
6      2.7                      2.6
7      2.85                     2.7
8      2.85                     2.8
9      3.2                      2.9
10     2.9                      2.6

Rank both lists as one combined list (I found this a time-consuming task), then sum the ranks for each sample (n1 = number of observations in sample 1, n2 = number in sample 2):

Unmarried                    Married
Rank    Thorax width         Rank    Thorax width
3       2.6                  3       2.6
6       2.65                 3       2.6
8.5     2.7                  3       2.6
13.5    2.85                 3       2.6
13.5    2.85                 8.5     2.7
15.5    2.9                  8.5     2.7
17      3                    8.5     2.7
18      3.2                  11.5    2.8
19      3.85                 11.5    2.8
20      4                    15.5    2.9

R1 = 134, R2 = 76.

Wilcoxon Tests

Wilcoxon Dependent-Sample Signed-Rank Test: the nonparametric equivalent of the dependent (paired-samples) t test. It applies to two dependent groups (a within-subjects design), with ordinal-level measurement of the DV. The test statistic is T, and the sampling distribution is the T distribution. Let N be the sample size, i.e. the number of pairs; thus there are a total of 2N data points.

Procedure for Testing Hypotheses About Differences in Related Samples
Data: the data are computed from pairs of measurements on each of the n elements in the sample.
Assumptions: the random variables are independent and identically distributed, and their distribution is symmetric.
Test Statistic
Let R denote the signed ranks of the paired differences, with mean R̄ and standard deviation s_R. The test statistic is

T_R = R̄ / (s_R / √n),

which is referred to the t distribution with n − 1 degrees of freedom.

Example:
The reading scores in this section represent differences for 10 randomly selected individuals who were measured before (Y) and after (X) taking a speed-reading course. These differences were determined by the Lilliefors test to be not normally distributed; therefore the Wilcoxon signed-ranks test is used, with α = .01.
Alternative hypothesis: the mean reading scores are higher after the course.
Null hypothesis: the mean reading scores are not changed by the course.
With R̄ = 4.7, s_R = 4.27 and n = 10:
T_R = R̄/(s_R/√n) = 4.7/(4.27/√10) = 4.7/1.35 = 3.48.
From the t distribution with 9 degrees of freedom, the critical value is 2.8214. Since the test statistic T_R = 3.48 is greater than 2.8214, H0 is rejected. Inference: the mean reading scores are changed by the course.

Computing the Wilcoxon Test Using SPSS
Enter the data into the SPSS spreadsheet in two columns: the 1st column holds pretest scores, the 2nd column post-test scores. Then: Analyze > Nonparametric > 2 Related Samples; highlight both variables, move them to the Test Pair(s) List, and click OK.
To generate descriptives: Analyze > Descriptive Statistics > Explore; both variables go in the Dependent box; click Statistics, make sure Descriptives are checked, and click OK.
Wilcoxon Test
To compute the Wilcoxon T:
1. Determine the differences between scores.
2. Rank the absolute values of the differences.
3. Place the appropriate sign with each rank (each rank retains the positive or negative sign of its corresponding difference).
T = the sum of the ranks with the less frequent sign.
Pretest   Posttest   Difference   Signed Rank
36        21         15           11
23        24         -1           -1
48        36         12           10
54        30         24           12
40        32          8            7
32        35         -3           -3
50        43          7            6
44        40          4            4
36        30          6            5
29        27          2            2
33        22         11            9
45        36          9            8

Non-Directional Hypotheses
Null hypothesis: there is no difference in scores before and after the intervention (i.e., the sums of the positive and negative ranks will be similar).
Non-directional research hypothesis: there is a difference in scores before and after the intervention (i.e., the sums of the positive and negative ranks will be different).
Interpreting the Output

Ranks (POSTTEST − PRETEST):
                  N    Mean Rank   Sum of Ranks
Negative ranks    10   7.40        74.00         (POSTTEST < PRETEST)
Positive ranks     2   2.00         4.00         (POSTTEST > PRETEST)
Ties               0                             (POSTTEST = PRETEST)
Total             12

Test Statistics (Wilcoxon Signed-Ranks Test, based on positive ranks):
Z                        -2.746
Asymp. Sig. (2-tailed)     .006

The T test statistic is the sum of the ranks with the less frequent sign. The output provides the equivalent z-score for the test statistic, and the two-tailed significance is given.
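The same analysis can be reproduced with scipy.stats.wilcoxon, which returns T = 4 (the smaller rank sum); the z in the SPSS output then follows from the usual normal approximation. A sketch:

```python
import math
from scipy.stats import wilcoxon

pretest  = [36, 23, 48, 54, 40, 32, 50, 44, 36, 29, 33, 45]
posttest = [21, 24, 36, 30, 32, 35, 43, 40, 30, 27, 22, 36]

res = wilcoxon(pretest, posttest)           # statistic = min(T+, T-) = 4

# Normal approximation for the z reported by SPSS
n = len(pretest)                            # 12 pairs, no ties
mu_t = n * (n + 1) / 4                      # 39
sigma_t = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
z = (res.statistic - mu_t) / sigma_t        # about -2.746
```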
The Kruskal-Wallis Test
The Kruskal-Wallis test is a nonparametric test that can be used to determine whether three or more independent samples were selected from populations having the same distribution.
H0: There is no difference in the population distributions.
Ha: There is a difference in the population distributions.
Kruskal-Wallis Test
Combine the data and rank the values; then separate the data according to sample and find the sum of the ranks for each sample. Let R_i be the sum of the ranks for sample i.

Given three or more independent samples, the test statistic H for the Kruskal-Wallis test is

H = [12 / (N(N + 1))] × (R1²/n1 + R2²/n2 + … + Rk²/nk) − 3(N + 1),

where k represents the number of samples, n_i is the size of the i-th sample, N is the sum of the sample sizes, and R_i is the sum of the ranks of the i-th sample. Reject the null hypothesis when H is greater than the critical value (always use a right-tailed test).

You know the one-way ANOVA is an extension of the two-independent-groups t-test to a problem with three or more populations. Likewise, the Kruskal-Wallis test is an extension of the Mann-Whitney U test to three or more populations.
The Kruskal-Wallis test handles k independent groups of samples, and its statistic is referred to the chi-square distribution; in this form it is non-parametric.
Like the Mann-Whitney U test, this test uses ranks.
Note: the Kruskal-Wallis analysis carried out on rank-transformed data via a one-way ANOVA is based on the F test; in that form it is parametric. Procedure
1. Combine the observations of the various groups
2. Arrange them in order of magnitude from lowest to highest
3. Assign ranks to each of the observations and replace them in each of the groups
4. Original ratio data have therefore been converted into ordinal or ranked data
5. Ranks are summed in each group and the test statistic, H is computed
6. Ranks assigned to observations in each of the k groups are added separately to give k rank sums
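The six steps above can be sketched as a small function. The helper name kruskal_h is hypothetical; on tie-free data it agrees exactly with scipy.stats.kruskal (which additionally applies a tie correction when ties are present):

```python
from scipy.stats import rankdata, kruskal

def kruskal_h(*groups):
    """Steps 1-6: pool the groups, rank, split back, sum ranks, compute H."""
    pooled = [x for g in groups for x in g]          # step 1: combine
    ranks = rankdata(pooled)                         # steps 2-4: rank (mid-ranks for ties)
    n_total = len(pooled)
    h = 0.0
    start = 0
    for g in groups:                                 # steps 5-6: per-group rank sums
        r_sum = ranks[start:start + len(g)].sum()
        h += r_sum ** 2 / len(g)
        start += len(g)
    return 12.0 / (n_total * (n_total + 1)) * h - 3 * (n_total + 1)

a, b, c = [1.1, 2.2, 3.3], [4.4, 5.5, 6.6], [7.7, 8.8, 9.9]   # toy tie-free data
assert abs(kruskal_h(a, b, c) - kruskal(a, b, c).statistic) < 1e-9
```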
You want to compare the hourly pay rates of accountants who work in Michigan, New York and Virginia. To do so, you randomly select 10 accountants in each state and record their hourly pay rates as shown below. At the .01 level, can you conclude that the distributions of accountants' hourly pay rates in these three states are different?

MI (1)   NY (2)   VA (3)
14.24    21.18    17.02
14.06    20.94    20.63
14.85    16.26    17.47
17.47    21.03    15.54
14.83    19.95    15.38
19.01    17.54    14.90
13.08    14.89    20.48
15.94    18.88    18.50
13.48    20.06    12.80
16.94    21.81    15.57

1. Write the null and alternative hypotheses:
   H0: There is no difference in the hourly pay rates in the 3 states.
   Ha: There is a difference in the hourly pay rates in the 3 states.
2. State the level of significance: α = 0.01.
3. Determine the sampling distribution: chi-square with d.f. = 3 − 1 = 2.
4. Find the critical value: from the χ² table, the critical value is 9.210.
5. Find the rejection region: H > 9.210.

Test statistic: rank all 30 values together.

Value    State   Rank
12.80    VA      1
13.08    MI      2
13.48    MI      3
14.06    MI      4
14.24    MI      5
14.83    MI      6
14.85    MI      7
14.89    NY      8
14.90    VA      9
15.38    VA      10
15.54    VA      11
15.57    VA      12
15.94    MI      13
16.26    NY      14
16.94    MI      15
17.02    VA      16
17.47    MI      17.5
17.47    VA      17.5
17.54    NY      19
18.50    VA      20
18.88    NY      21
19.01    MI      22
19.95    NY      23
20.06    NY      24
20.48    VA      25
20.63    VA      26
20.94    NY      27
21.03    NY      28
21.18    NY      29
21.81    NY      30

Michigan ranks: 2, 3, 4, 5, 6, 7, 13, 15, 17.5, 22; the sum is 94.5.
New York ranks: 8, 14, 19, 21, 23, 24, 27, 28, 29, 30; the sum is 223.
Virginia ranks: 1, 9, 10, 11, 12, 16, 17.5, 20, 25, 26; the sum is 147.5.
So R1 = 94.5, R2 = 223, R3 = 147.5; n1 = 10, n2 = 10 and n3 = 10, so N = 30.
Find the test statistic:

H = [12 / (30(30 + 1))] × (94.5²/10 + 223²/10 + 147.5²/10) − 3(30 + 1) = 10.76.

Make your decision: the test statistic 10.76 exceeds 9.210, so it falls in the rejection region; reject the null hypothesis.
Interpret your decision: there is a difference in the salaries of the 3 states.

Purpose: such a test allows a scientist to assess the influence of the independent variable upon the dependent variable, while controlling for the influence of other variables.
Conclusions: the primary question is how reasonable these results would be if chance alone were responsible. If the results are not due to chance, then they are attributed to the experimental manipulation.
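scipy.stats.kruskal reproduces this result (it applies a small tie correction for the two 17.47 values, which does not change the rounded H):

```python
from scipy.stats import kruskal

mi = [14.24, 14.06, 14.85, 17.47, 14.83, 19.01, 13.08, 15.94, 13.48, 16.94]
ny = [21.18, 20.94, 16.26, 21.03, 19.95, 17.54, 14.89, 18.88, 20.06, 21.81]
va = [17.02, 20.63, 17.47, 15.54, 15.38, 14.90, 20.48, 18.50, 12.80, 15.57]

h, p = kruskal(mi, ny, va)   # H is about 10.76 with p < 0.01, so reject H0
```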
Example: a study is being conducted on whether entering college students gain weight during the freshman year. Below are the "Before" and "After" weights for a random sample of 30 students (part of the data is shown). Test whether there is a significant "gain" in weight after the freshman year in college.

Before   After   Sign       Before   After   Sign
133      135     +          121      125     +
152      160     +          144      140     −
169      180     +          106      108     +
156      154     −          182      175     +
178      185     +          122      120     −
220      226     +          110      114     +
145      150     +          130      134     +
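The sign-test decision can be reproduced with an exact binomial test on the sign counts from the full sample of 30 students (20 gains, 8 losses, 2 ties; ties are dropped):

```python
from scipy.stats import binomtest

gains, losses, ties = 20, 8, 2     # counts over all 30 students; ties are dropped
n = gains + losses                 # 28 students actually changed weight

# One-tailed sign test: is the probability of a gain greater than 0.5?
res = binomtest(gains, n=n, p=0.5, alternative='greater')
# res.pvalue is about 0.01785, the right-tail P-value quoted in the discussion
```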
Sign Test Calculation
Number of positive signs = 20; negative signs = 8; ties = 2.
Null hypothesis: no difference in weight between the pre and post stages, i.e. the assumed population P = Q = 0.5.
Sample p = 20/28 = 0.71 and, similarly, sample q = 8/28 = 0.29.
SE of the proportion = √(PQ/n) = √((0.5 × 0.5)/28) = 0.0945.
Z = (p − P)/SE = (0.71 − 0.5)/0.0945 = 0.21/0.0945 = 2.22.
H0 is rejected at the 5% significance level.

After entering the data in the appropriate lists and executing the SIGNTEST program (entering 2 for the alternative X < Y), we see that there are 20 persons who increased in weight (positive changes) out of the 28 persons whose weight actually changed. If there were no difference (i.e., if p = P(After > Before) = 0.5), then there would be only a 0.01785 probability (the right-tail P-value) of as many as 20 people out of 28 gaining weight. This low P-value gives evidence to reject the claim that there is no difference, in favor of the alternative that there is a tendency to gain weight.

Thank you