
Exploration of Normality and Equal Variance Assumptions in the ANOVA Test


The ANOVA (analysis of variance) tests hypotheses about the means of samples drawn from several different populations. It is used especially to test whether the means of two or more groups are equal, which is relevant to the analysis of behavioral data, for example testing whether task performance differs across feedback conditions.
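To make the hypothesis test concrete, here is a minimal sketch of what a one-way ANOVA computes: the ratio of between-group to within-group mean squares, referred to an F distribution. It assumes only base MATLAB (the upper-tail p-value is written with `betainc` rather than the toolbox's `fcdf`), and the three small groups are made-up illustrative data, not data from this assignment. For equal-variance normal data this is the same F test that `anova1` performs, but treat it as a sketch rather than a replacement.

```matlab
% Minimal sketch: one-way ANOVA F statistic and p-value computed by hand.
% The groups below are hypothetical illustrative data.
groups = {[1 2 3], [4 5 6], [7 8 9]};

allData   = [groups{:}];                 % pool every observation
grandMean = mean(allData);
k = numel(groups);                       % number of groups
N = numel(allData);                      % total number of observations

SSB = 0;                                 % between-group sum of squares
SSW = 0;                                 % within-group sum of squares
for g = 1:k
    y   = groups{g};
    SSB = SSB + numel(y) * (mean(y) - grandMean)^2;
    SSW = SSW + sum((y - mean(y)).^2);
end

df1 = k - 1;                             % between-group degrees of freedom
df2 = N - k;                             % within-group degrees of freedom
F   = (SSB/df1) / (SSW/df2);             % ratio of mean squares

% Upper-tail probability of F(df1, df2), expressed with the regularized
% incomplete beta function so no toolbox is required
p = betainc(df2/(df2 + df1*F), df2/2, df1/2);

fprintf('F(%d,%d) = %.2f, p = %.4f\n', df1, df2, F, p);
```

For these groups the sketch prints `F(2,6) = 27.00, p = 0.0010`.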

% The goal of this assignment is to explore the normality and equal variance
% assumptions of this parametric test.
%
% (1) NORMALITY:
% By assuming the data are normal, we accept that the mean represents the
% central tendency of the data. If the data are non-normal, we may increase
% our chance of finding a false positive. We expect the ANOVA to be fairly
% robust to non-normal data.
%
% (2) EQUAL VARIANCE:
% The equal variance assumption holds that each data group being compared
% comes from a population with the same spread. We expect the ANOVA to be
% less robust to violations of this assumption.
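To illustrate the normality point about central tendency, a minimal sketch (illustrative only, not part of the original assignment) compares the mean and median of a symmetric normal population and a right-skewed exponential population that share the same mean of 2. It assumes base MATLAB only, drawing exponential values by inverse-transform sampling; for an exponential with mean 2 the theoretical median is 2·ln 2, about 1.386, so the mean no longer sits at the "middle" of the data.

```matlab
% Sketch (illustrative, not part of the original assignment): compare mean and
% median for a symmetric normal population and a right-skewed exponential one.
rng(0);                                   % fixed seed for reproducibility
n  = 1e6;                                 % large sample so estimates are stable
mu = 2;                                   % both populations have mean 2

normalSample = mu + 2*randn(n,1);         % N(2, 2^2): symmetric
expSample    = -mu*log(rand(n,1));        % Exp(mean 2) by inverse transform: skewed

% For the normal sample, mean and median nearly coincide; for the exponential
% sample the median sits well below the mean (theory: mu*log(2), about 1.386)
fprintf('Normal:      mean %.3f, median %.3f\n', ...
        mean(normalSample), median(normalSample));
fprintf('Exponential: mean %.3f, median %.3f (theoretical median %.3f)\n', ...
        mean(expSample), median(expSample), mu*log(2));
```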

% The goal of this tutorial is to
% (a) explore the effects of violating the normality and equal variance
%     assumptions by considering the rate of false positives for the test
%     (the percentage of times the ANOVA finds a difference in group means
%     when there is none), and
% (b) consider how each of these assumptions is affected by sample size,
%     equal/unequal group sizes, and population characteristics.

clear all
close all
clc

% Set significance level
alpha = 0.05;

%%%%%%%%%%%%%%%%%%%%%%% Part 0: The ANOVA Test %%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Three test groups are randomly sampled from normal populations with
% equal variance; one group has a different mean. We run an ANOVA on the
% random samples 1000 times.
iterations = 1000;

% Conduct a one-way ANOVA, where a p-value < 0.05 indicates a significant

www. statisticsassignmentexperts.com email us at info@ statisticsassignmentexperts.com or call us at +1 520 8371215

% difference in means between at least two of the groups.
for i = 1:iterations
    % Draw three test groups (each with 100 samples)
    G1 = 2 + 2.*randn(100,1); % Normal distribution with mean = 2; SD = 2
    G2 = 2 + 2.*randn(100,1);
    G3 = 3 + 2.*randn(100,1); % Normal distribution with mean = 3; SD = 2
    data = [G1, G2, G3];

    % Run one-way ANOVA, suppress table output
    [p_0] = anova1(data,[],'off');
    % Collect p-values in a vector
    pval_0(i) = p_0;
    % Collect data samples
    G1_ALL(i,:) = G1;
    G2_ALL(i,:) = G2;
    G3_ALL(i,:) = G3;
end

% Visualize the averaged data
figure('Name','Demonstration of ANOVA test')
PlotHistogramData(mean(G1_ALL,1), mean(G2_ALL,1), mean(G3_ALL,1))
title('Averaged Samples from Three Normal Populations')

% Calculate the percentage of p-values that incorrectly find no difference
% (false negatives, since the G3 population mean really is different)
false_0 = find(pval_0 >= alpha);
PercentFalse_0 = length(false_0)/iterations*100;

% Plot the p-values across tests
figure('Name','Results of Demonstration ANOVA')
PlotHistogramPval(pval_0)
title(sprintf('Demonstration Data\n False: %f %%',PercentFalse_0));
clear G1 G2 G3 G1_ALL G2_ALL G3_ALL

% Results of the ANOVA show that a significant difference between group
% means is found with a very low rate of false negatives (< 5%).

%%%%%%%% Part 1. Exploring Non-Normality %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Case 1: As a baseline measure, we test for a difference in means between
% three groups of data generated from the same normal distribution with
% mean = 2 and SD = 2. When we run an ANOVA test with all groups having the
% same (large) number of samples = 100, we expect that there will be no
% significant difference in the means.

% Draw three test groups (each with 100 samples) and conduct ANOVA
for i = 1:iterations
    % Define test groups of data
    G1 = 2 + 2.*randn(100,1);
    G2 = 2 + 2.*randn(100,1);


    G3 = 2 + 2.*randn(100,1);
    data = [G1, G2, G3];
    % Perform one-way ANOVA using the Statistics and Machine Learning
    % Toolbox function anova1
    [p_AllNormal] = anova1(data,[],'off');
    % Collect data samples in vectors
    pval_AllNormal(i) = p_AllNormal;
    G1_ALL(i,:) = G1;
    G2_ALL(i,:) = G2;
    G3_ALL(i,:) = G3;
end

% Calculate percentage of iterations that are false positives
false_AllNormal = CalculatePercentFalse(pval_AllNormal, alpha, iterations);

% Visualize the data
figure('Name','Visualization of Data Samples Including Non-normal Populations')
subplot(2,3,1)
PlotHistogramData(mean(G1_ALL,1), mean(G2_ALL,1), mean(G3_ALL,1))
title('Averaged Samples from All Normal Populations')
clear G1 G2 G3 G1_ALL G2_ALL G3_ALL

% Given that we start with data sampled from the same normal distribution,
% we expect the group means to be statistically equivalent and predict that
% 95% of p-values should be greater than the significance level,
% alpha = 0.05. The ANOVA test finds a significant difference in the means
% between the three samples approximately 5% of the time, as expected.
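The helper `CalculatePercentFalse` is called here but not defined in this section. A minimal sketch, assuming it simply reports the percentage of p-values below alpha, is shown below under the hypothetical name `CalculatePercentFalseSketch`. The same sketch also estimates how much the reported percentage should wobble by chance: under the null each of the 1000 iterations is a Bernoulli trial with probability alpha, so the Monte Carlo standard error is binomial, which helps judge what "approximately 5%" means.

```matlab
% Hypothetical stand-in for CalculatePercentFalse (its real definition is not
% shown here); assumes it returns the percentage of p-values below alpha.
CalculatePercentFalseSketch = @(pval, alpha, iterations) ...
    100 * sum(pval < alpha) / iterations;

% Monte Carlo noise in that percentage: under the null, each iteration is a
% Bernoulli trial with success probability alpha, so the error is binomial.
alpha      = 0.05;
iterations = 1000;
sePercent  = 100 * sqrt(alpha*(1 - alpha)/iterations);   % about 0.69 points
ci95       = 100*alpha + [-1.96, 1.96]*sePercent;         % about 3.65 to 6.35 %

fprintf('Expected false-positive rate %.1f%% (SE %.2f, 95%% range %.2f to %.2f)\n', ...
        100*alpha, sePercent, ci95(1), ci95(2));

% Usage on a toy vector of p-values (illustrative values only): 2 of 4 < alpha
examplePercent = CalculatePercentFalseSketch([0.01 0.20 0.04 0.50], alpha, 4);
```

So with 1000 iterations, observed rates anywhere from roughly 3.6% to 6.4% are consistent with a true 5% false-positive rate.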

%-------------------------------------------------------------------------
% Case 2: Now we consider the case when one of the data groups is
% non-normal, but all population means are expected to be equal. The third
% data group is sampled from an exponential distribution with mean = 2. The
% ANOVA is balanced: each group has the same (large) number of
% samples = 100. To isolate the effect of the normality assumption, the
% variance is kept the same: the variance of the normal populations is 4,
% and the variance of the exponential distribution is 4.
%
% Characteristics of the exponential distribution:
%   mean     = 1/lambda       = 2   ->  lambda = 1/2
%   variance = 1/(lambda^2)   = 4   ->  sd = 2
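As a sanity check on these parameters, the sketch below draws a large exponential sample with lambda = 1/2 and compares its mean and variance with the theoretical 1/lambda = 2 and 1/lambda^2 = 4. MATLAB's `random('exp',mu,...)` used in the script takes the mean mu, not the rate lambda, so `random('exp',2,...)` corresponds to lambda = 1/2. The sketch assumes only base MATLAB and uses inverse-transform sampling instead of `random`; it is an illustration, not part of the original script.

```matlab
% Sketch: confirm empirically that an exponential population with mean 2
% (lambda = 1/2) has variance 4, matching the N(2, 2^2) groups.
rng(1);                                  % fixed seed for reproducibility
n      = 1e6;
mu     = 2;                              % exponential mean = 1/lambda
lambda = 1/mu;

expSample = -log(rand(n,1)) / lambda;    % Exp(lambda) draws via inverse transform
theoryVar = 1/lambda^2;                  % = 4, so sd = 2

fprintf('Sample mean %.3f (theory %.1f), sample variance %.3f (theory %.1f)\n', ...
        mean(expSample), mu, var(expSample), theoryVar);
```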

for i = 1:iterations
    G1 = 2 + 2.*randn(100,1);
    G2 = 2 + 2.*randn(100,1);
    G3 = random('exp',2,100,1);
    data = [G1, G2, G3];


    [p_wExpLarge] = anova1(data,[],'off');
    pval_wExpLarge(i) = p_wExpLarge;
    G1_ALL(i,:) = G1;
    G2_ALL(i,:) = G2;
    G3_ALL(i,:) = G3;
end

% Calculate percentage of false positives
false_wExpLarge = CalculatePercentFalse(pval_wExpLarge, alpha, iterations);

% Visualize the data
subplot(2,3,2)
PlotHistogramData(mean(G1_ALL,1), mean(G2_ALL,1), mean(G3_ALL,1))
title('Non-Normal Exp. Group, Lg. Equal Size: n = 100')
clear G1 G2 G3 G1_ALL G2_ALL G3_ALL

% Approximately 5% of iterations return false positives, showing that the
% ANOVA test is robust to non-normality when each group has the same
% (large) number of samples.

%-------------------------------------------------------------------------
% Case 3: We consider non-normally distributed data when we have equal but
% small sample sizes = 20.
for i = 1:iterations
    G1 = 2 + 2.*randn(20,1);
    G2 = 2 + 2.*randn(20,1);
    G3 = random('exp',2,20,1); % Mean of exponential distribution is 2
    data = [G1, G2, G3];
    [p_wExpSmall] = anova1(data,[],'off');
    pval_wExpSmall(i) = p_wExpSmall;
    G1_ALL(i,:) = G1;
    G2_ALL(i,:) = G2;
    G3_ALL(i,:) = G3;
end

% Calculate percentage of false positives
false_wExpSmall = CalculatePercentFalse(pval_wExpSmall, alpha, iterations);

% Visualize the data
subplot(2,3,3)
PlotHistogramData(mean(G1_ALL,1), mean(G2_ALL,1), mean(G3_ALL,1))
title('Non-Normal Exp. Group, Sm. Equal Size: n = 20')
clear G1 G2 G3 G1_ALL G2_ALL G3_ALL

%-------------------------------------------------------------------------
% Case 4: We now consider unequal sample sizes, with one group drawn from a
% non-normal distribution. The ANOVA test is unbalanced, and we do not


% expect it to be robust against non-normality.
for i = 1:iterations
    % Group labels: 30 for G1, 20 for G2, 10 for G3
    group = {'G1','G1','G1','G1','G1','G1','G1','G1','G1','G1',...
             'G1','G1','G1','G1','G1','G1','G1','G1','G1','G1',...
             'G1','G1','G1','G1','G1','G1','G1','G1','G1','G1',...
             'G2','G2','G2','G2','G2','G2','G2','G2','G2','G2',...
             'G2','G2','G2','G2','G2','G2','G2','G2','G2','G2',...
             'G3','G3','G3','G3','G3','G3','G3','G3','G3','G3'};
    G1 = 2 + 2.*randn(30,1);
    G2 = 2 + 2.*randn(20,1);
    G3 = random('exp',2,10,1); % Mean of exponential distribution is 2
    data = [G1', G2', G3'];

    [p_wExpUnequalGrp] = anova1(data, group,'off');
    pval_wExpUnequalGrp(i) = p_wExpUnequalGrp;
    G1_ALL(i,:) = G1;
    G2_ALL(i,:) = G2;
    G3_ALL(i,:) = G3;
end

% Calculate percentage of false positives
false_wExpUnequalGrp = CalculatePercentFalse(pval_wExpUnequalGrp, alpha, iterations);

% Visualize the data
subplot(2,3,4)
PlotHistogramData(mean(G1_ALL,1), mean(G2_ALL,1), mean(G3_ALL,1))
title('Non-Normal Exp. Group, Unequal Size: n(G1,G2,G3) = 30,20,10')
clear G1 G2 G3 G1_ALL G2_ALL G3_ALL

%-------------------------------------------------------------------------
% Case 5: Now the non-normal data are taken from the Poisson distribution,
% whose mean and variance are both equal to lambda. We specify the Poisson
% distribution by selecting the following parameters:
%   mean     = lambda = 2
%   variance = lambda = 2   ->  sd = sqrt(2)
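To check these parameters, the sketch below computes the mean and variance of a Poisson(lambda = 2) population directly from its probability mass function, P(X = k) = exp(-lambda) lambda^k / k!. It assumes base MATLAB only and truncates the sum at k = 60, an assumption that leaves a negligible tail for lambda = 2. Both moments come out as lambda, so the Poisson group matches the normal groups' mean of 2 and SD of sqrt(2).

```matlab
% Sketch: mean and variance of Poisson(lambda = 2) computed from its pmf,
% truncated at k = 60 (remaining tail mass is negligible for lambda = 2).
lambda = 2;
k      = 0:60;
pmf    = exp(-lambda + k*log(lambda) - gammaln(k + 1));   % P(X = k)

poissMean = sum(k .* pmf);                    % should equal lambda
poissVar  = sum((k - poissMean).^2 .* pmf);   % should also equal lambda
poissSD   = sqrt(poissVar);                   % sqrt(2), the SD used for G1, G2

fprintf('mean %.6f, variance %.6f, SD %.6f\n', poissMean, poissVar, poissSD);
```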

% In this case, all three groups are small (n = 15), and the third group
% comes from a non-normal Poisson distribution.
for i = 1:iterations
    G1 = 2 + sqrt(2).*randn(15,1);
    G2 = 2 + sqrt(2).*randn(15,1);
    G3 = random('poiss',2,15,1);


    data = [G1, G2, G3];
    [p_wPoissSmall] = anova1(data,[],'off');
    pval_wPoissSmall(i) = p_wPoissSmall;
    G1_ALL(i,:) = G1;
    G2_ALL(i,:) = G2;
    G3_ALL(i,:) = G3;
end

% Calculate percentage of false positives
false_wPoissSmall = CalculatePercentFalse(pval_wPoissSmall, alpha, iterations);

% Visualize the data
subplot(2,3,5)
PlotHistogramData(mean(G1_ALL,1), mean(G2_ALL,1), mean(G3_ALL,1))
title('Non-normal Poisson Group, Sm. Equal Size: n = 15')
clear G1 G2 G3 G1_ALL G2_ALL G3_ALL
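For readers without the Statistics and Machine Learning Toolbox, a toolbox-free rerun of Case 5 can be sketched as below. It makes two substitutions that are not part of the original script: Knuth's multiplication method for Poisson draws in place of `random('poiss',...)`, and a hand-computed one-way F test (p-value via `betainc`) in place of `anova1`. It uses a fixed seed, and its estimated false-positive percentage depends on that seed; with 1000 iterations the binomial Monte Carlo error is roughly plus or minus 1.4 percentage points around the true rate.

```matlab
% Sketch: toolbox-free rerun of Case 5 (two N(2, sqrt(2)^2) groups and one
% Poisson(2) group, n = 15 each) estimating the ANOVA false-positive rate.
rng(2);                                   % fixed seed for reproducibility
alpha      = 0.05;
iterations = 1000;
n          = 15;                          % samples per group
k          = 3;                           % number of groups
lambda     = 2;
pvals      = zeros(iterations, 1);

for it = 1:iterations
    G1 = 2 + sqrt(2).*randn(n,1);
    G2 = 2 + sqrt(2).*randn(n,1);

    % Knuth's method: count how many uniforms can be multiplied together
    % before the running product drops to exp(-lambda) or below
    G3 = zeros(n,1);
    L  = exp(-lambda);
    for j = 1:n
        count = 0;
        prodU = rand;
        while prodU > L
            count = count + 1;
            prodU = prodU * rand;
        end
        G3(j) = count;
    end

    % Hand-computed one-way ANOVA on the balanced n-by-3 data matrix
    data      = [G1, G2, G3];
    colMeans  = mean(data, 1);
    grandMean = mean(data(:));
    SSB = n * sum((colMeans - grandMean).^2);
    SSW = sum(sum((data - colMeans).^2));  % implicit expansion (R2016b+)
    df1 = k - 1;
    df2 = k*n - k;
    F   = (SSB/df1) / (SSW/df2);
    pvals(it) = betainc(df2/(df2 + df1*F), df2/2, df1/2);  % upper tail of F
end

falsePositivePercent = 100 * mean(pvals < alpha);
fprintf('False positives: %.1f%%\n', falsePositivePercent);
```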

