
This chapter discusses the analysis of variance model for two categorical explanatory variables. In

particular, it discusses the case where one of the factors is a blocking variable. Two-way ANOVA

can be analyzed as a regression model with two categorical explanatory variables. Each categorical

variable is represented by a set of indicator variables.

generally measured by collecting data at several levels of the factor.

• Each combination of levels from a set of factors comprises a treatment. If only one factor is

present, then the treatments are just the levels of that factor.

Blocking

Suppose we are measuring corn yield in a field experiment for 4 varieties (A, B, C, D). A square

field will be divided up into 16 plots and varieties randomly assigned to plots. Suppose also there is a

moisture gradient running East-West across the field. Random assignment of varieties to plots might,

by chance, end up assigning more of the plots for variety A to the East side of the field than the West

side and vice-versa for variety B. More importantly, if the moisture gradient is ignored in the analysis, it causes great variation in yields within each variety; if we can adjust for the moisture gradient, then we can more easily detect any difference between varieties.

• One way to adjust for the moisture gradient is to divide the field into 4 blocks of 4 plots each

from east to west and then randomly assign all 4 treatments within each block. The block factor

is then included in the analysis.

• If treatments are assigned randomly to plots within each block, then this is called a randomized

complete block design. (Note: “complete” refers to the fact that every treatment appears in every block.)

• A design is balanced if there is the same number of observations in every treatment combination. The corn experiment is balanced because there is exactly one plot in every one of the 16 variety-block combinations. How would the design look if we wanted two plots in every combination?
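The randomization described above can be sketched in Python. This is a minimal illustration rather than a procedure from the text: the function name `randomize_rcbd` and its `reps` and `seed` parameters are hypothetical. It shuffles the list of treatments independently within each block, so every treatment appears in every block.

```python
import random

def randomize_rcbd(treatments, n_blocks, reps=1, seed=None):
    """Randomized complete block design: every treatment appears `reps`
    times in every block, and the order of plots within each block is
    randomized independently of the other blocks."""
    rng = random.Random(seed)
    layout = {}
    for block in range(1, n_blocks + 1):
        plots = list(treatments) * reps   # each treatment `reps` times
        rng.shuffle(plots)                # random assignment within the block
        layout[block] = plots
    return layout

# Corn example: 4 varieties, 4 blocks of 4 plots (one plot per combination)
layout = randomize_rcbd(["A", "B", "C", "D"], n_blocks=4, seed=1)
for block, plots in layout.items():
    print(f"Block {block}: {' '.join(plots)}")
```

With `reps=2` the same function gives a balanced layout with two plots in every variety-block combination (8 plots per block).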


• Blocking is very similar to the idea behind doing matched pairs for comparing two treatments.

In fact, matched pairs is a special case of blocking where there are only two treatments and the

pairs are the blocks. The analysis of matched pairs took advantage of the blocking by analyzing

the difference within each pair. The analysis of a block design in ANOVA does so by including

Block as a factor in the ANOVA.

• We are primarily interested in making inferences about the treatment variable in a randomized

block design. The blocking variable is included simply to make us better able to detect a

treatment effect – we’re not usually interested in testing for a block effect, because we assume

there is one – that’s why we blocked.

• Both Variety and Block can be modeled with three indicator variables, say V1, V2 and V3 and

B1, B2 and B3. How might we define these indicator variables?

• Write the regression model for mean yield with only the main effects of Variety and Block.

How many coefficient parameters are there in the model?

• According to this model what is the mean yield for each of the 16 Block-Variety combinations?

                                             Treatment differences
            A       B       C       D        A-D     B-D     C-D
Block 1
Block 2
Block 3
Block 4
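One way to fill in the table under the main-effects model is sketched below in Python. It assumes reference coding with variety D and block 4 as the reference levels, so V1, V2, V3 indicate varieties A, B, C and B1, B2, B3 indicate blocks 1, 2, 3; the coefficient values are made up for illustration only. The model has 7 coefficients (b0 through b6), and every treatment difference comes out the same in every block.

```python
# Hypothetical coefficients for the main-effects model
#   mu{yield | V, B} = b0 + b1*V1 + b2*V2 + b3*V3 + b4*B1 + b5*B2 + b6*B3
# (variety D and block 4 are the reference levels; values are illustrative only)
b = {"b0": 100.0, "b1": 5.0, "b2": -3.0, "b3": 2.0,
     "b4": -10.0, "b5": -4.0, "b6": 6.0}

VARIETIES = ["A", "B", "C", "D"]
BLOCKS = [1, 2, 3, 4]

def indicators(variety, block):
    """Return the indicator values (V1, V2, V3, B1, B2, B3) for one plot."""
    v = [1 if variety == name else 0 for name in ("A", "B", "C")]
    bl = [1 if block == k else 0 for k in (1, 2, 3)]
    return v + bl

def mean_yield(variety, block):
    """Mean yield for one variety-block combination under the additive model."""
    V1, V2, V3, B1, B2, B3 = indicators(variety, block)
    return (b["b0"] + b["b1"] * V1 + b["b2"] * V2 + b["b3"] * V3
            + b["b4"] * B1 + b["b5"] * B2 + b["b6"] * B3)

table = {}
for blk in BLOCKS:
    means = {v: mean_yield(v, blk) for v in VARIETIES}
    diffs = {f"{v}-D": means[v] - means["D"] for v in ("A", "B", "C")}
    table[blk] = (means, diffs)
    print(f"Block {blk}:", means, diffs)
```

In each block, A-D equals b1, B-D equals b2 and C-D equals b3, which is why the profiles are parallel under this model.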


• What would a plot of mean yield versus variety look like, with a separate line for each block (see the example on p. 383)? This is sometimes called a “profile plot.”

• What would the regression model be if we included the Block by Variety interaction? How

many coefficient parameters are there in the model?

• According to this model what is the mean yield for each of the 16 Block-Variety combinations?

                                             Treatment differences
            A       B       C       D        A-D     B-D     C-D
Block 1
Block 2
Block 3
Block 4


• What would the profile plot look like?

• Note: we can’t estimate σ² in the model with Variety by Block interaction because there are no within-cell replicates. This model has 16 coefficient parameters plus σ², and only 16 observations. Without an estimate of σ², we can’t carry out statistical inferences. Our choices would be: a) don’t include the interaction, or b) include replicates within each Variety by Block combination. What are the advantages/disadvantages of each?

• Observations within each cell (a “cell” means a particular combination of levels of the two

factors) are independent observations from a normal distribution.

• The cell samples are drawn independently of each other (or there is random assignment to cells).
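The earlier note that σ² cannot be estimated in the Variety by Block interaction model can be checked numerically. The sketch below uses Python with NumPy (an assumption; any least-squares tool would do) and a hypothetical helper `design_matrix` with reference coding. With one plot per cell, the intercept, 3 + 3 indicators and 9 products give 16 columns of full rank for 16 observations, leaving 0 residual degrees of freedom.

```python
import numpy as np

def design_matrix(variety, block, n_var=4, n_blk=4, interaction=True):
    """Reference-coded design matrix (the last level of each factor is the reference)."""
    V = np.column_stack([(variety == k).astype(float) for k in range(n_var - 1)])
    B = np.column_stack([(block == k).astype(float) for k in range(n_blk - 1)])
    cols = [np.ones(len(variety)), V, B]
    if interaction:
        # products V_i * B_j for every pair of indicators
        cols.append(np.column_stack([V[:, i] * B[:, j]
                                     for i in range(n_var - 1)
                                     for j in range(n_blk - 1)]))
    return np.column_stack(cols)

# One plot in each of the 16 variety-block combinations (levels coded 0..3)
variety, block = np.meshgrid(np.arange(4), np.arange(4), indexing="ij")
variety, block = variety.ravel(), block.ravel()

for inter in (False, True):
    X = design_matrix(variety, block, interaction=inter)
    p = np.linalg.matrix_rank(X)
    print(f"interaction={inter}: {p} coefficients, residual d.f. = {len(variety) - p}")
```

The additive model leaves 16 − 7 = 9 residual d.f. for estimating σ²; the interaction model leaves none.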


Case study 13.1: Intertidal Seaweed Grazers

8 blocks, 6 treatments, 2 replicates per block by treatment combination.

A convenient graphical representation for the data in a two-way classification is the one in Display

13.7 on p. 383, sometimes called a “profile plot.” In SPSS, such a plot can be gotten by:

Graphs…Line…Multiple; choose “Other summary function.” The default function is “Mean” which

is what is desired. You can also obtain the same plot from Graphs…Interactive…Line. This plot

illustrates differences between treatments, between blocks, and treatment by block interaction (if

there is no interaction the profiles are parallel).

The plot below has the blocks in numerical order. The plot on p. 383 has the blocks ordered from smallest mean to largest. An advantage of the latter plot is that it makes it clear that there is more variability in the means as the means increase (left to right). This suggests that nonconstant variance might be a problem. To get a plot with the blocks in a different order in SPSS, we would create a new variable with values 1 to 8 which indicates the desired order of the blocks (for example, 1 would be for the block with the smallest mean and 8 for the largest). Then use value labels to indicate that “1” is really “Block 1” and “8” is really “Block 4.”

[Figure: profile plot of mean Cover versus Block (BLOCK 1 to BLOCK 8), one line per treatment (Treat: CONTROL, f, fF, L, Lf, LfF); dots/lines show means.]
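Outside SPSS, the recoding step can be sketched in Python. The function name `rank_codes` is hypothetical. The block means used here are the block averages of the logit-transformed response from the Descriptive Statistics table later in these notes; the plot on p. 383 orders blocks by mean percentage cover, which is not reproduced here, so treat the resulting order as illustrative.

```python
def rank_codes(means):
    """Map each block label to a new code 1..k, with 1 for the smallest mean."""
    ordered = sorted(means, key=means.get)     # labels from smallest to largest mean
    return {label: code for code, label in enumerate(ordered, start=1)}

# Block averages of logit(Cover/100), from the Descriptive Statistics table
block_means = {
    "BLOCK 1": -2.6357, "BLOCK 2": -2.1758, "BLOCK 3": -0.5311, "BLOCK 4": 0.3450,
    "BLOCK 5": -1.4197, "BLOCK 6": -0.6106, "BLOCK 7": -1.5272, "BLOCK 8": -1.3057,
}
codes = rank_codes(block_means)

# Value labels: new code -> original block name
value_labels = {code: label for label, code in codes.items()}
print(value_labels)
```

Plotting against the new code, with `value_labels` attached, reproduces the “smallest to largest” ordering.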

We should also examine the model assumptions through a residual analysis. To fit a two-way model,

you could create all the indicator variables necessary (7 for block and 5 for treatment plus the 35

products for interaction), and use the Regression procedure, but it’s much easier to use

Analyze…General Linear Model…Univariate. Block and Treatment are entered as “Fixed

factors.” Residuals can be saved under “Save.” The default model includes the interaction; the

model is specified under “Model.”
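To see where the counts in parentheses come from, here is a sketch in Python with NumPy (an assumption; the notes use SPSS) that builds the indicator variables for the Seaweed layout: 8 blocks, 6 treatments and 2 replicates per cell, with the last level of each factor as the reference. Together with the intercept this gives 1 + 7 + 5 + 35 = 48 columns, matching the 47 d.f. for the Corrected Model plus the intercept in the SPSS output that follows.

```python
import numpy as np

N_BLOCK, N_TREAT, N_REP = 8, 6, 2

# One row per observation: block codes 0..7 and treatment codes 0..5
block = np.repeat(np.arange(N_BLOCK), N_TREAT * N_REP)
treat = np.tile(np.repeat(np.arange(N_TREAT), N_REP), N_BLOCK)

def indicators(codes, n_levels):
    """Indicator columns for levels 0..n_levels-2 (the last level is the reference)."""
    return np.column_stack([(codes == k).astype(float) for k in range(n_levels - 1)])

B = indicators(block, N_BLOCK)                     # 7 block indicators
T = indicators(treat, N_TREAT)                     # 5 treatment indicators
BT = np.column_stack([B[:, i] * T[:, j]            # 35 interaction products
                      for i in range(B.shape[1]) for j in range(T.shape[1])])
X = np.column_stack([np.ones(len(block)), B, T, BT])

print(B.shape[1], T.shape[1], BT.shape[1])         # 7 5 35
print(X.shape, np.linalg.matrix_rank(X))           # (96, 48) 48
```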

A residual plot (p. 384) confirms the suspicion of nonconstant variance. Since the responses are

percentages between 0 and 100, which can be converted to proportions between 0 and 1, it’s not too

surprising that the variance is not constant, since the variance of a binomial proportion is p(1 − p)/n, which is not constant and is greatest for p = .5. The logit transformation is often useful for proportions. If Y is a proportion, then

$$\mathrm{Logit}(Y) = \ln\left(\frac{Y}{1 - Y}\right)$$

The quantity Y/(1 − Y) is the odds of an event whose probability is Y, so the logit is the log odds.
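The transformation, together with the binomial variance formula that motivates it, can be sketched in Python. The function names are hypothetical, and the guard against 0% and 100% cover is an addition: the logit is undefined at those values.

```python
import math

def binomial_var(p, n):
    """Variance of a binomial sample proportion: p(1 - p)/n."""
    return p * (1 - p) / n

def logit_cover(cover_percent):
    """Logit(Y) = ln(Y / (1 - Y)) with Y = Cover / 100."""
    y = cover_percent / 100.0            # convert the percentage to a proportion
    if not 0.0 < y < 1.0:
        raise ValueError("logit is undefined for 0% or 100% cover")
    return math.log(y / (1.0 - y))       # log odds

# The variance is largest at p = 0.5 and shrinks toward 0 and 1
for p in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(p, round(binomial_var(p, n=1), 3))

print(logit_cover(50))   # 0.0
```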

After dividing Cover by 100 and taking the logit, the profile plot (p. 385) is much “improved” and there is less evidence of interaction. A residual plot also indicates fewer problems.

Tests of Between-Subjects Effects

Source             Type III Sum of Squares   df    Mean Square          F    Sig.
Corrected Model                   188.462a   47          4.010     13.241    .000
Intercept                         145.854     1        145.854    481.618    .000
Treat                              96.993     5         19.399     64.055    .000
Block                              76.239     7         10.891     35.963    .000
Treat * Block                      15.230    35           .435      1.437    .121
Error                              14.536    48           .303
Total                             348.853    96
Corrected Total                   202.999    95
a. R Squared = .928 (Adjusted R Squared = .858)
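The derived columns of the table follow from the sums of squares and degrees of freedom alone: each mean square is SS/df, each F statistic is that mean square divided by the error mean square, and R² is the model SS over the corrected total SS. A short Python check using the numbers above (small discrepancies in the last digit come from SPSS working with unrounded sums of squares; the P-values need an F distribution and are not recomputed here):

```python
# Sums of squares and d.f. from the full-model table above
ss = {"Treat": (96.993, 5), "Block": (76.239, 7), "Treat*Block": (15.230, 35)}
ss_error, df_error = 14.536, 48
ss_model, ss_corrected_total, df_total = 188.462, 202.999, 95

mse = ss_error / df_error                       # error mean square, about .303
for source, (s, df) in ss.items():
    ms = s / df
    print(f"{source:12s} MS = {ms:.3f}  F = {ms / mse:.3f}")

r2 = ss_model / ss_corrected_total
adj_r2 = 1 - (1 - r2) * df_total / df_error     # adjusts for the 47 model d.f.
print(f"R^2 = {r2:.3f}, adjusted R^2 = {adj_r2:.3f}")
```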

Type III sum of squares (the default) indicates that the sum-of-squares for each effect is gotten by

comparing the full model (Treat + Block +Treat*Block) to the model without that effect in it; thus,

• for Treat*Block, compare the full model to the model Treat + Block


• for Treat, compare the full model to the model Block + Treat*Block

• for Block, compare the full model to the model Treat + Treat*Block

The latter two tests are not meaningful, because a model that contains Treat*Block but not Treat (or not Block) is not a sensible model. Therefore, only the test for the Treat*Block interaction makes sense. Unfortunately, some people naively use these tests as tests of the main effects. There are other options:

If the Treat*Block interaction is not significant, leave it out and refit the model Block + Treat. Then

the test of Treat makes sense (we’re not generally interested in the test of Block). We might also

carefully examine the profile plot, to make sure it’s reasonable to leave out Block*Treat even if it’s

not significant (not significant does not necessarily mean it’s zero).

If the Treat*Block interaction is significant, or if we simply want to be conservative and not assume it’s zero, we can:

1. Test Treat by comparing the model Block to the full model Block+Treat+Block*Treat (this is

not discussed in the text but is advocated by some authors)

2. Realize that an interaction means that the effect of Treat is different in different blocks, and

examine the effect in each block separately. This makes more sense than number 1.
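Each of these comparisons is an extra-sum-of-squares F-test between a full and a reduced model. A minimal Python sketch follows; the function name `extra_ss_f` is hypothetical, and the inputs are residual (Error) sums of squares and d.f. from the two SPSS tables in these notes. For option 1, the reduced model Block alone has residual SS equal to the Corrected Total SS minus the Block SS.

```python
def extra_ss_f(rss_reduced, df_reduced, rss_full, df_full):
    """Extra-sum-of-squares F statistic for a reduced model nested in a full model.

    rss_* are residual sums of squares and df_* their residual degrees of freedom.
    Returns (F, numerator d.f., denominator d.f.).
    """
    num_df = df_reduced - df_full
    f = ((rss_reduced - rss_full) / num_df) / (rss_full / df_full)
    return f, num_df, df_full

# Full model Block + Treat + Block*Treat (Error line of the first table)
RSS_FULL, DF_FULL = 14.536, 48

# Interaction test: reduced model Block + Treat (Error line of the second table)
print(extra_ss_f(29.767, 83, RSS_FULL, DF_FULL))

# Option 1: reduced model Block only, whose residual SS is
# Corrected Total SS minus Block SS, on 95 - 7 = 88 d.f.
print(extra_ss_f(202.999 - 76.239, 95 - 7, RSS_FULL, DF_FULL))
```

The first call reproduces the Treat*Block F of about 1.437 on 35 and 48 d.f.; the second tests Treat and Treat*Block jointly on 40 and 48 d.f.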

In the Seaweed Grazers example, since the interaction is not significant, and the profile plot indicates

treatment effects that are somewhat consistent across blocks, we might fit the model Block + Treat:

Tests of Between-Subjects Effects

Source             Type III Sum of Squares   df    Mean Square          F    Sig.
Corrected Model                   173.232a   12         14.436     40.252   .0000
Intercept                         145.854     1        145.854    406.691   .0000
Block                              76.239     7         10.891     30.368   .0000
Treat                              96.993     5         19.399     54.090   .0000
Error                              29.767    83          .3586
Total                             348.853    96
Corrected Total                   202.999    95
a. R Squared = .853 (Adjusted R Squared = .832)

• Note that the SS for Treat and Block have not changed, but the SS for Error has: it has become the SS for Error in the full model plus the SS for Treat*Block. The SS for Block and Treat are unchanged because the design is balanced (equal sample sizes in all cells). In unbalanced designs the SS do change, which makes the choice of an appropriate analysis more important.


• There is very strong evidence (P < .0001) that there is a difference in the mean log regeneration ratios among the treatments. There is also very strong evidence of a block effect, but that is of less interest: we expected a block effect, which is why blocking was used.

• The Block main effect should always be in the model even if it’s not statistically significant.

This is because we believe there is a block effect (that’s why we blocked) even if we don’t

find strong evidence of it in our particular experiment. It’s just as with paired data: we would

always use a paired t-test and wouldn’t ever use the two-sample t even if there didn’t appear

to be differences between the pairs of subjects.
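The note above that the Treat and Block sums of squares are unchanged because the design is balanced can be illustrated with simulated data. The Python/NumPy sketch below is an illustration under assumptions (a small 4-block by 3-treatment layout with 2 replicates and random responses, not the Seaweed data): it computes the SS for Treat entered first and entered after Block. With balanced data the two agree; after deleting a single observation they generally do not.

```python
import numpy as np

def indicators(codes, n_levels):
    """Reference-coded indicator columns (level 0 is the reference)."""
    return np.column_stack([(codes == k).astype(float) for k in range(1, n_levels)])

def rss(X, y):
    """Residual sum of squares from a least-squares fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

def treat_ss(y, block, treat, n_blk, n_trt):
    """Return (SS for Treat entered first, SS for Treat entered after Block)."""
    one = np.ones((len(y), 1))
    B, T = indicators(block, n_blk), indicators(treat, n_trt)
    first = rss(one, y) - rss(np.hstack([one, T]), y)
    after_block = rss(np.hstack([one, B]), y) - rss(np.hstack([one, B, T]), y)
    return first, after_block

rng = np.random.default_rng(0)
n_blk, n_trt, n_rep = 4, 3, 2
block = np.repeat(np.arange(n_blk), n_trt * n_rep)
treat = np.tile(np.repeat(np.arange(n_trt), n_rep), n_blk)
y = rng.normal(size=block.size) + block + 0.5 * treat    # simulated responses

print("balanced:  ", treat_ss(y, block, treat, n_blk, n_trt))

keep = np.arange(block.size) != 0                          # drop one observation
print("unbalanced:", treat_ss(y[keep], block[keep], treat[keep], n_blk, n_trt))
```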

First, to get a table of cell means as in Display 13.12 on p. 388, in General Linear Model…

Univariate use Options…Descriptive statistics. This will also give the Block averages and

Treatment averages.

Descriptive Statistics

Block     Treat        Mean   Std. Deviation    N
BLOCK 1   CONTROL   -1.5118           .42920    2
          f         -1.6217           .66331    2
          fF        -2.0491           .20949    2
          L         -3.1781           .00000    2
          Lf        -3.2103           .37594    2
          LfF       -4.2435           .49731    2
          Total     -2.6357          1.07644   12
BLOCK 2   CONTROL    -.9424           .45723    2
          f         -1.3077           .71783    2
          fF        -1.9659           .32712    2
          L         -2.5145           .10207    2
          Lf        -3.1138           .51234    2
          LfF       -3.2103           .37594    2
          Total     -2.1758           .95409   12
BLOCK 3   CONTROL    1.1123           .57146    2
          f           .2220           .20076    2
          fF         -.1206           .17053    2
          L          -.3108           .89607    2
          Lf        -1.5569          1.07022    2
          LfF       -2.5326           .30964    2
          Total      -.5311          1.33233   12
BLOCK 4   CONTROL    2.8480           .13640    2
          f          1.8382           .35717    2
          fF          .6382           .50401    2
          L          -.8068           .26558    2
          Lf         -.5215          1.13616    2
          LfF       -1.9262           .93410    2
          Total       .3450          1.76499   12
BLOCK 5   CONTROL    -.2716           .55397    2
          f          -.6857           .03174    2
          fF         -.6844           .51138    2
          L         -1.3995           .97761    2
          Lf        -2.6290           .44605    2
          LfF       -2.8480           .13640    2
          Total     -1.4197          1.10962   12
BLOCK 6   CONTROL     .7107           .54860    2
          f          -.1836           .37290    2
          fF         -.4062           .11793    2
          L         -1.2292           .60677    2
          Lf         -.6639           .54031    2
          LfF       -1.8914           .43246    2
          Total      -.6106           .91913   12
BLOCK 7   CONTROL    -.7851           .94036    2
          f          -.0809           .28425    2
          fF         -.7354           .22629    2
          L         -2.5969           .21863    2
          Lf        -2.5852           .83836    2
          LfF       -2.3799           .79843    2
          Total     -1.5272          1.16484   12
BLOCK 8   CONTROL     .2837           .23134    2
          f          -.6898           .22280    2
          fF        -1.2481          1.19167    2
          L         -1.6601           .10534    2
          Lf        -1.7544           .33664    2
          LfF       -2.7656           .25297    2
          Total     -1.3057          1.06369   12
Total     CONTROL     .1805          1.39899   16
          f          -.3137          1.07482   16
          fF         -.8214           .95985   16
          L         -1.7120          1.02149   16
          Lf        -2.0044          1.13986   16
          LfF       -2.7247           .83100   16
          Total     -1.2326          1.46179   96
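Because every cell has the same number of observations (2), each treatment average in the Total rows is simply the average of that treatment's eight cell means, and each block average is the average of its six cell means. A quick Python check using the cell means copied from the output above:

```python
# Cell means of logit(Cover/100) from the Descriptive Statistics table,
# listed for BLOCK 1 through BLOCK 8 within each treatment
cell_means = {
    "CONTROL": [-1.5118, -0.9424, 1.1123, 2.8480, -0.2716, 0.7107, -0.7851, 0.2837],
    "f":       [-1.6217, -1.3077, 0.2220, 1.8382, -0.6857, -0.1836, -0.0809, -0.6898],
    "fF":      [-2.0491, -1.9659, -0.1206, 0.6382, -0.6844, -0.4062, -0.7354, -1.2481],
    "L":       [-3.1781, -2.5145, -0.3108, -0.8068, -1.3995, -1.2292, -2.5969, -1.6601],
    "Lf":      [-3.2103, -3.1138, -1.5569, -0.5215, -2.6290, -0.6639, -2.5852, -1.7544],
    "LfF":     [-4.2435, -3.2103, -2.5326, -1.9262, -2.8480, -1.8914, -2.3799, -2.7656],
}

# Treatment averages: mean over the 8 blocks (valid because all cells have n = 2)
treat_means = {t: sum(m) / len(m) for t, m in cell_means.items()}

# Block averages: mean over the 6 treatments within each block
n_blocks = 8
block_means = [sum(m[b] for m in cell_means.values()) / len(cell_means)
               for b in range(n_blocks)]

for t, m in treat_means.items():
    print(f"{t:8s} {m:8.4f}")
print([round(b, 4) for b in block_means])
```

With unequal cell sizes the Total rows would be weighted averages instead, and this shortcut would no longer match them.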


• The General Linear Model procedure will give you estimates of and standard errors for some

specific types of linear combinations of cell means, but not for arbitrary linear combinations as

was possible in the One-way ANOVA procedure.

• Pairwise comparisons of means for each factor, along with some specific types of contrasts, can

be obtained in a couple of different ways which are discussed further down.

To obtain SE’s for arbitrary linear combinations of cell means, as on p. 389, you must do the work by

hand using the formulas we learned in Chapter 6, pp. 154-7. The key is that the estimate of the

common cell standard deviation σ is √MSE from your final model. In the Seaweed Grazers example, the final model is BLOCK + TREAT and √MSE = √.3586 = .599 with 83 d.f. This is used for s_p in the formulas of Chapter 6.

Example: In number 1, p. 389, the text examines the effect of large fish on the regeneration ratio

through the following contrast in the treatment means (why is it a contrast?):

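$$\gamma_1 = \frac{\mu_{fF} - \mu_{f}}{2} + \frac{\mu_{LfF} - \mu_{Lf}}{2}$$

This is estimated by the same function of the sample means. Note that the sample means are pooled over blocks, so the sample size for each mean is 16. Therefore,

$$g = \frac{\bar{Y}_{fF} - \bar{Y}_{f}}{2} + \frac{\bar{Y}_{LfF} - \bar{Y}_{Lf}}{2} = \frac{-0.8214 - (-0.3137)}{2} + \frac{-2.7247 - (-2.0044)}{2} = -0.6140$$

$$\mathrm{SE}(g) = \sqrt{\mathrm{MSE}}\,\sqrt{\frac{(1/2)^2}{16} + \frac{(-1/2)^2}{16} + \frac{(1/2)^2}{16} + \frac{(-1/2)^2}{16}} = 0.599\sqrt{\frac{1}{16}} = 0.1497$$

Thus a 95% confidence interval for the contrast is -.614 ± t₈₃(.975)(.1497) = -.614 ± 1.989(.1497) = -.614 ± .298 = -.912 to -.316.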

Can you reproduce the estimates and standard errors for the remaining contrasts on p. 389?
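The hand calculation can be packaged as a small Python function to help with that. This is a sketch, not an SPSS feature: the name `contrast_ci` is hypothetical, the treatment means are the pooled Total rows of the descriptive table, and the t quantile 1.989 is taken from the text rather than computed (the Python standard library has no t distribution).

```python
import math

def contrast_ci(coefs, means, ns, mse, t_crit):
    """Estimate, SE and confidence interval for sum(c_i * mean_i).

    SE = sqrt(MSE) * sqrt(sum(c_i**2 / n_i)), with sqrt(MSE) playing the role of s_p.
    """
    g = sum(c * m for c, m in zip(coefs, means))
    se = math.sqrt(mse) * math.sqrt(sum(c * c / n for c, n in zip(coefs, ns)))
    half = t_crit * se
    return g, se, (g - half, g + half)

# gamma_1 = (mu_fF - mu_f)/2 + (mu_LfF - mu_Lf)/2, means pooled over 8 blocks (n = 16)
coefs = [0.5, -0.5, 0.5, -0.5]
means = [-0.8214, -0.3137, -2.7247, -2.0044]   # fF, f, LfF, Lf
g, se, ci = contrast_ci(coefs, means, [16] * 4, mse=0.3586, t_crit=1.989)
print(f"g = {g:.4f}, SE = {se:.4f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```

For the other contrasts on p. 389, only `coefs` and `means` need to change; `mse` and `t_crit` stay the same as long as the final model is BLOCK + TREAT.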


• Choose Post Hoc on the General Linear Model window. You can get comparisons of Block

means and/or Treatment means. The text recommends (Sec. 13.5.5, p. 401) the Tukey-Kramer

procedure, as in one-way ANOVA, which is listed as “Tukey” in SPSS. Post Hoc will not give

results if there are only two levels of a factor; I have no idea why (there is only one comparison,

but no reason it can’t be done). You can use the Contrasts option (described below) with

Simple chosen, to get the SE and confidence interval for the difference in the two means.

• Some pairwise comparisons are also available by choosing Options, putting Treatment under

Display Means for: and checking Compare main effects. However, only “LSD”,

“Bonferroni” and “Sidak” are available; “Tukey” is not. In addition, while this procedure will

give the exact same confidence intervals as Post Hoc (if the same procedure is chosen) for

balanced designs (same number of observations in every cell), it won’t when the design isn’t

balanced. Stick with Post Hoc and don’t mess with this procedure.

• Some other types of contrasts can be obtained through the Contrasts option on the General

Linear Models window. We can define a set of contrasts to be evaluated for each factor. The

choices are limited to the ones listed. Simple gives contrasts that compare each level with a

reference level (either the first or last level). Deviation gives the deviation of the mean of each

level from the overall mean. The deviations are what are displayed as the “Block effect” and

“Treatment effect” in the last column and row of Display 13.12. In the Seaweed experiment for

the Treatment variable, the deviation contrasts have the form

$$\mu_i - \bar{\mu}, \qquad \text{where } \bar{\mu} = \frac{\mu_1 + \mu_2 + \cdots + \mu_6}{6}$$

is the mean of all the treatment means.

• Note on profile plots: the General Linear Models window also has a Plots… option which gives

“Profile” plots. However, it doesn’t plot the observed means, but the estimated means under the

fitted model. While it will give the “right” plot when a full factorial model has been selected

and the design is balanced, I recommend using Graphs…Line…Multiple as described earlier.
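For a balanced design, the deviation contrasts µᵢ − µ̄ described above can be estimated directly from the treatment averages. A Python sketch using the Total rows of the descriptive statistics table follows; these estimates should correspond, up to rounding, to the “Treatment effect” row of Display 13.12.

```python
# Treatment averages of logit(Cover/100), from the Total rows of the table
treat_means = {"CONTROL": 0.1805, "f": -0.3137, "fF": -0.8214,
               "L": -1.7120, "Lf": -2.0044, "LfF": -2.7247}

# mu-bar: the unweighted mean of the six treatment means
grand = sum(treat_means.values()) / len(treat_means)

# Deviation contrast for each level: mu_i - mu-bar
effects = {t: m - grand for t, m in treat_means.items()}
for t, e in effects.items():
    print(f"{t:8s} {e:+.4f}")
```

By construction the six deviations sum to zero, which is why SPSS reports only five of them as independent contrasts.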


Pairwise comparisons in the presence of an interaction

• If the Block*Treat interaction is significant and appears important, comparison of treatment

means over all blocks may not be meaningful, since the treatment means are averages over all

blocks. A negative effect in one block could be offset by a positive effect in another block so

that there appears to be no effect when averaging over all blocks.

• The presence of an interaction doesn’t mean that averaging over blocks is necessarily

meaningless. If the treatment effects are in the same direction in all blocks, but simply differ

somewhat in size, then averaging over blocks may still be useful. Also, with large sample sizes,

the interaction may be statistically significant but small in size.

• If an interaction is present and is important, then you should compare means within blocks.

You can use the procedure for arbitrary linear combinations of cell means to do this (Section

13.3.4; also described on a previous page of these notes). Just remember that the sample sizes

are the sample sizes for the cells involved in the comparison (for example, in the Seaweed

experiment, to compare two treatments within block 1, the sample sizes are both 2). You can

see that the resulting SE’s will be much larger than if we can pool across blocks.

• It’s possible that some intermediate model may describe what’s going on with an interaction

present. For example, perhaps the treatment effects are very similar for all blocks but one, so

we might analyze those blocks together.
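The loss of precision from comparing within blocks can be made concrete. The Python sketch below compares CONTROL with f within Block 1 (cell means from the descriptive table, n = 2 each) against the same comparison pooled over blocks (n = 16 each). To isolate the effect of sample size, both SEs use the full-model MSE (.303 on 48 d.f.); that choice is an assumption made for illustration.

```python
import math

def se_difference(mse, n1, n2):
    """SE of a difference between two means: sqrt(MSE) * sqrt(1/n1 + 1/n2)."""
    return math.sqrt(mse) * math.sqrt(1 / n1 + 1 / n2)

MSE_FULL = 14.536 / 48          # full-model error mean square, about .303

# CONTROL vs f within Block 1 (cells of size 2)
diff_block1 = -1.5118 - (-1.6217)
se_block1 = se_difference(MSE_FULL, 2, 2)

# The same comparison pooled over all 8 blocks (16 observations per treatment)
diff_pooled = 0.1805 - (-0.3137)
se_pooled = se_difference(MSE_FULL, 16, 16)

print(f"Block 1: {diff_block1:.4f} (SE {se_block1:.3f})")
print(f"Pooled:  {diff_pooled:.4f} (SE {se_pooled:.3f})")
print(f"SE ratio: {se_block1 / se_pooled:.3f}")   # sqrt(16/2), about 2.83
```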

The General Linear Models procedure uses the linear regression approach to estimating the

parameters of the model. Therefore, it works properly for unbalanced designs. It is not necessary to

use the Regression procedure in SPSS with user-defined indicator variables to analyze the Pygmalion

data, as described in the text (Section 13.4, p. 392). For example, the General Linear Models will

give the regression estimate for treatment effect discussed in Section 13.4.3.
