ANALYSIS OF VARIANCE FOR FIXED EFFECT MODELS
    PROC GLM for General Linear Models
    PROC ANOVA for Balanced Designs
    Comparing Group Means with PROC ANOVA and PROC GLM
    PROC TTEST for Comparing Two Groups
ANALYSIS OF VARIANCE FOR MIXED AND RANDOM EFFECT MODELS
ANALYSIS OF VARIANCE FOR CATEGORICAL DATA AND GENERALIZED LINEAR MODELS
NONPARAMETRIC ANALYSIS OF VARIANCE
CONSTRUCTING ANALYSIS OF VARIANCE DESIGNS
REFERENCES
Chapter 4
CATMOD GENMOD
GLM
MIXED
Tests of Effects
Analysis of variance tests are constructed by comparing independent mean squares. To test a particular null hypothesis, you compute the ratio of two mean squares that have the same expected value under that hypothesis; if the ratio is much larger than 1, then that constitutes significant evidence against the null. In particular, in an analysis-of-variance model with fixed effects only, the expected value of each mean square has two components: quadratic functions of fixed parameters and random variation. For example, for a fixed effect called A, the expected value of its mean square is
\[ E(\mathrm{MS}(A)) = Q(\beta) + \sigma_e^2 \]

Under the null hypothesis of no A effect, the fixed portion $Q(\beta)$ of the expected mean square is zero. This mean square is then compared to another mean square, say MS(E), that is independent of the first and has expected value $\sigma_e^2$. The ratio of the two mean squares

\[ F = \frac{\mathrm{MS}(A)}{\mathrm{MS}(E)} \]
has the F distribution under the null hypothesis. When the null hypothesis is false, the numerator term has a larger expected value, but the expected value of the denominator remains the same. Thus, large F values lead to rejection of the null hypothesis. The probability of getting an F value at least as large as the one observed, given that the null hypothesis is true, is called the significance probability value (or the p-value). A p-value of less than 0.05, for example, indicates that data with no real A effect will yield F values as large as the one observed less than 5% of the time. This is usually considered moderate evidence that there is a real A effect. Smaller p-values constitute even stronger evidence. Larger p-values indicate that the effect of interest cannot be distinguished from random noise. In this case, you can conclude either that there is no effect at all or that you do not have enough data to detect the differences being tested.
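As a rough illustration of the ratio described above, the following sketch computes MS(A), MS(E), and F for a one-way layout in plain Python. The data values, the level names, and the function name `one_way_f` are hypothetical and chosen only for this example; the p-value is not computed, since that requires the F distribution function.

```python
# Hypothetical data: three levels of a fixed effect A, four observations each.
groups = {
    "a1": [4.0, 5.0, 6.0, 5.0],
    "a2": [7.0, 8.0, 6.0, 7.0],
    "a3": [9.0, 10.0, 11.0, 10.0],
}


def one_way_f(groups):
    """Return (MS(A), MS(E), F) for a one-way fixed-effects layout."""
    all_values = [y for ys in groups.values() for y in ys]
    n = len(all_values)          # total number of observations
    k = len(groups)              # number of levels of A
    grand_mean = sum(all_values) / n

    ss_a = 0.0                   # between-level sum of squares
    ss_e = 0.0                   # within-level (error) sum of squares
    for ys in groups.values():
        level_mean = sum(ys) / len(ys)
        ss_a += len(ys) * (level_mean - grand_mean) ** 2
        ss_e += sum((y - level_mean) ** 2 for y in ys)

    ms_a = ss_a / (k - 1)        # mean square for A
    ms_e = ss_e / (n - k)        # mean square for error
    return ms_a, ms_e, ms_a / ms_e


ms_a, ms_e, f_value = one_way_f(groups)
print(ms_a, ms_e, f_value)
```

With these made-up values the level means differ far more than the observations vary within a level, so the ratio comes out well above 1.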
A linear model has the general form

\[ y_i = \beta_0 x_{0i} + \beta_1 x_{1i} + \cdots + \beta_k x_{ki} + \epsilon_i, \qquad i = 1, 2, \ldots, n \]

where $y_i$ is the response for the ith observation, $\beta_j$ are unknown parameters to be estimated, and $x_{ji}$ are design variables. Design variables for analysis of variance are indicator variables; that is, they are always either 0 or 1. The simplest model is to fit a single mean to all observations. In this case there is only one parameter, $\beta_0$, and one design variable, $x_{0i}$, which always has the value of 1:

\[ y_i = \beta_0 x_{0i} + \epsilon_i = \beta_0 + \epsilon_i \]
The least-squares estimator of $\beta_0$ is the mean of the $y_i$. This simple model underlies all more complex models, and all larger models are compared to this simple mean model. In writing the parameterization of a linear model, $\beta_0$ is usually referred to as the intercept.

A one-way model is written by introducing an indicator variable for each level of the classification variable. Suppose that a variable A has four levels, with two observations per level. The indicator variables are created as follows:

    Intercept   A1   A2   A3   A4
        1        1    0    0    0
        1        1    0    0    0
        1        0    1    0    0
        1        0    1    0    0
        1        0    0    1    0
        1        0    0    1    0
        1        0    0    0    1
        1        0    0    0    1

The linear model for this example is

\[ y_i = \beta_0 + \beta_1 A1_i + \beta_2 A2_i + \beta_3 A3_i + \beta_4 A4_i + \epsilon_i \]
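The indicator coding for the four-level variable A can be sketched in a few lines of Python. The level names follow the text; the function name `design_matrix` and the list `a` holding each observation's level are assumptions made for this illustration, not part of any SAS procedure.

```python
# Levels of the classification variable A, and the level of each of the
# eight observations (two observations per level, as in the text).
levels = ["A1", "A2", "A3", "A4"]
a = ["A1", "A1", "A2", "A2", "A3", "A3", "A4", "A4"]


def design_matrix(values, levels):
    """Build rows of [intercept, indicator for each level]."""
    rows = []
    for v in values:
        indicators = [1 if v == lev else 0 for lev in levels]
        rows.append([1] + indicators)   # the intercept column is always 1
    return rows


X = design_matrix(a, levels)
for row in X:
    print(row)
```

Each row has exactly one indicator equal to 1, so the four indicator columns add up to the intercept column.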
To construct crossed and nested effects, you can simply multiply out all combinations of the main-effect columns. This is described in detail in the section "Specification of Effects" in Chapter 30, "The GLM Procedure."
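To make "multiplying out" concrete, here is a minimal sketch for a crossed effect with two hypothetical two-level factors, A and B. Each column of the crossed effect is the element-wise product of one A indicator column and one B indicator column. The factor levels, the helper name `indicators`, and the column labels are all assumptions for this example.

```python
def indicators(values, levels):
    """Map each level to its 0/1 indicator column."""
    return {lev: [1 if v == lev else 0 for v in values] for lev in levels}


# Hypothetical factor levels for four observations.
a = ["a1", "a1", "a2", "a2"]
b = ["b1", "b2", "b1", "b2"]

A = indicators(a, ["a1", "a2"])
B = indicators(b, ["b1", "b2"])

# One column per combination of an A level and a B level.
AB = {
    f"{la}*{lb}": [x * y for x, y in zip(A[la], B[lb])]
    for la in A
    for lb in B
}

for name, column in AB.items():
    print(name, column)
```

A product column is 1 only for observations that fall in both the given level of A and the given level of B.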
Linear Hypotheses
When models are expressed in the framework of linear models, hypothesis tests are expressed in terms of a linear function of the parameters. For example, you may want to test that $\beta_2 - \beta_3 = 0$. In general, the coefficients for linear hypotheses are some set of Ls:

\[ H_0\colon L_0 \beta_0 + L_1 \beta_1 + \cdots + L_k \beta_k = 0 \]
Several of these linear functions can be combined to make one joint test. These tests can be expressed in one matrix equation:
\[ H_0\colon L\beta = 0 \]
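For each linear hypothesis, a sum of squares (SS) due to that hypothesis can be constructed. These sums of squares can be calculated either as a quadratic form of the estimates

\[ \mathrm{SS}(L\beta = 0) = (Lb)' \left( L (X'X)^{-} L' \right)^{-1} (Lb) \]

where $b$ is a least-squares estimate of $\beta$, $X$ is the design matrix, and $(X'X)^{-}$ is a generalized inverse of $X'X$, or, equivalently, as the increase in sums of squares for error (SSE) for the model constrained by the null hypothesis

\[ \mathrm{SS}(L\beta = 0) = \mathrm{SSE}(\text{constrained}) - \mathrm{SSE}(\text{full}) \]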
This SS is then divided by appropriate degrees of freedom and used as a numerator of an F statistic.
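The two routes to the hypothesis sum of squares can be checked against each other on a small example. The sketch below uses hypothetical responses for a four-level variable A with two observations per level and tests the hypothesis that levels A2 and A3 have the same mean. The constrained model pools those two levels into one group. For a single comparison of two level means, the quadratic form reduces to the squared difference of the means divided by (1/n2 + 1/n3); that reduction is used here instead of general matrix algebra. All data values and names are assumptions for illustration.

```python
def mean(ys):
    return sum(ys) / len(ys)


def sse(groups):
    """Error sum of squares when each group is fitted by its own mean."""
    return sum(sum((y - mean(ys)) ** 2 for y in ys) for ys in groups)


# Hypothetical responses, two observations per level of A.
y = {
    "A1": [10.0, 12.0],
    "A2": [14.0, 16.0],
    "A3": [20.0, 22.0],
    "A4": [11.0, 13.0],
}

# Full model: a separate mean for every level.
sse_full = sse([y["A1"], y["A2"], y["A3"], y["A4"]])

# Constrained model: A2 and A3 are forced to share one mean.
sse_constrained = sse([y["A1"], y["A2"] + y["A3"], y["A4"]])

# Route 1: increase in SSE caused by the constraint.
ss_by_difference = sse_constrained - sse_full

# Route 2: quadratic form, specialized to one difference of two means.
diff = mean(y["A2"]) - mean(y["A3"])
ss_by_quadratic_form = diff ** 2 / (1 / len(y["A2"]) + 1 / len(y["A3"]))

# F statistic: hypothesis SS over its 1 degree of freedom, divided by
# the error mean square of the full model (8 observations, 4 means).
df_hypothesis = 1
df_error = 8 - 4
f_value = (ss_by_difference / df_hypothesis) / (sse_full / df_error)

print(ss_by_difference, ss_by_quadratic_form, f_value)
```

Both routes give the same sum of squares for these values, which is the equivalence the text describes.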
... description of PROC RANK in the SAS Procedures Guide and in Conover and Iman (1981).
References
Analysis of variance was pioneered by R.A. Fisher (1925). For a general introduction to analysis of variance, see an intermediate statistical methods textbook such as Steel and Torrie (1980), Snedecor and Cochran (1980), Milliken and Johnson (1984), Mendenhall (1968), John (1971), Ott (1977), or Kirk (1968). A classic source is Scheffé (1959). Freund, Littell, and Spector (1991) bring together a treatment of these statistical methods and SAS/STAT software procedures. Schlotzhauer and Littell (1997) cover how to perform t tests and one-way analysis of variance with SAS/STAT procedures. Texts on linear models include Searle (1971), Graybill (1976), and Hocking (1984). Kennedy and Gentle (1980) survey the computing aspects.

Conover, W.J. and Iman, R.L. (1981), "Rank Transformations as a Bridge Between Parametric and Nonparametric Statistics," The American Statistician, 35, 124-129.

Fisher, R.A. (1925), Statistical Methods for Research Workers, Edinburgh: Oliver & Boyd.

Freund, R.J., Littell, R.C., and Spector, P.C. (1991), SAS System for Linear Models, Cary, NC: SAS Institute Inc.
The correct bibliographic citation for this manual is as follows: SAS Institute Inc., SAS/STAT User's Guide, Version 8, Cary, NC: SAS Institute Inc., 1999.

SAS/STAT User's Guide, Version 8
Copyright © 1999 by SAS Institute Inc., Cary, NC, USA.
ISBN 1-58025-494-2

All rights reserved. Produced in the United States of America. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, or otherwise, without the prior written permission of the publisher, SAS Institute Inc.

U.S. Government Restricted Rights Notice. Use, duplication, or disclosure of the software and related documentation by the U.S. government is subject to the Agreement with SAS Institute and the restrictions set forth in FAR 52.227-19 Commercial Computer Software-Restricted Rights (June 1987).

SAS Institute Inc., SAS Campus Drive, Cary, North Carolina 27513.

1st printing, October 1999

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are registered trademarks or trademarks of their respective companies.

The Institute is a private company devoted to the support and further development of its software and related services.