To illustrate dummy variables, consider the simple regression model for a posttest-only two-group
randomized experiment, yᵢ = β₀ + β₁Zᵢ + eᵢ, where yᵢ is a person's posttest score, Z is a dummy
variable coded 0 for the control group and 1 for the treatment group, and eᵢ is the error term.
This model is essentially the same as conducting a t-test on the posttest means of the two groups
or a one-way Analysis of Variance (ANOVA). The key term in the model is β₁, the estimate of the
difference between the groups. To see how dummy variables work, we'll use this simple model to
pull out the separate sub-equations for each subgroup. Then we'll show how to estimate the
difference between the subgroups by subtracting their respective equations. You'll see that we can
pack an enormous amount of information into a single equation using dummy variables. All I want to
show you here is that β₁ is the difference between the treatment and control groups.
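A small numerical check can make this claim concrete. The sketch below assumes the model y = β₀ + β₁Z + e with Z coded 0/1. It fits that model by ordinary least squares using the closed-form simple-regression formulas (plain Python, no statistics package) on a handful of made-up posttest scores. The data and the helper name `fit_dummy_regression` are hypothetical. If the argument holds, the fitted intercept should match the control-group mean and the fitted slope should match the difference between the two group means.

```python
from statistics import mean


def fit_dummy_regression(z, y):
    """Ordinary least squares for y = b0 + b1*z + e with a single predictor.

    Closed-form simple-regression estimates:
        b1 = sum((z - z_bar) * (y - y_bar)) / sum((z - z_bar) ** 2)
        b0 = y_bar - b1 * z_bar
    """
    z_bar, y_bar = mean(z), mean(y)
    sxy = sum((zi - z_bar) * (yi - y_bar) for zi, yi in zip(z, y))
    sxx = sum((zi - z_bar) ** 2 for zi in z)
    b1 = sxy / sxx
    b0 = y_bar - b1 * z_bar
    return b0, b1


# Hypothetical posttest scores, invented for illustration only.
control = [48.0, 52.0, 50.0, 47.0, 53.0]    # dummy Z = 0
treatment = [55.0, 58.0, 54.0, 60.0, 57.0]  # dummy Z = 1

# Stack both groups into one data set with a 0/1 dummy column.
y = control + treatment
z = [0] * len(control) + [1] * len(treatment)

b0, b1 = fit_dummy_regression(z, y)
print(f"b0 = {b0:.2f}   control mean    = {mean(control):.2f}")
print(f"b1 = {b1:.2f}   mean difference = {mean(treatment) - mean(control):.2f}")
```

Because Z takes only the values 0 and 1, the least-squares line passes through both group means. That is why the intercept equals the control mean and the slope equals the mean difference.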
To see this, the first step is to compute the equation for each of our two groups separately. For
the control group, Z = 0. When we substitute that into the model, and recognize that by assumption
the error term averages to 0, we find that the predicted value for the control group is β₀, the
intercept. To get the treatment group equation, we substitute 1 for Z, again recognizing that by
assumption the error term averages to 0. The predicted value for the treatment group is then the
sum of the two beta values, β₀ + β₁.
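The substitution step can be written out as a short derivation. This sketch assumes the notation yᵢ for the posttest score, Zᵢ coded 0 for control and 1 for treatment, eᵢ for an error term with mean 0, and ŷ_C and ŷ_T for the predicted control and treatment values.

```latex
\begin{aligned}
y_i &= \beta_0 + \beta_1 Z_i + e_i \\[4pt]
\text{Control } (Z_i = 0):\quad \hat{y}_C &= \beta_0 + \beta_1 \cdot 0 = \beta_0 \\
\text{Treatment } (Z_i = 1):\quad \hat{y}_T &= \beta_0 + \beta_1 \cdot 1 = \beta_0 + \beta_1
\end{aligned}
```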
Now we're ready to move on to the second step: computing the difference between the groups. How
do we determine that? The difference between the groups is simply the difference between the two
equations we worked out above. Subtracting the control group equation (β₀) from the treatment
group equation (β₀ + β₁) leaves β₁. Think about what this means. The difference between the
groups is β₁. OK, one more time just for the sheer heck of it: the difference between the groups
in this model is β₁!
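Written as a single line, the subtraction step looks like this. The sketch assumes the notation ŷ_T = β₀ + β₁ for the predicted treatment value and ŷ_C = β₀ for the predicted control value.

```latex
\hat{y}_T - \hat{y}_C = (\beta_0 + \beta_1) - \beta_0 = \beta_1
```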
Whenever you have a regression model with dummy variables, you can always see how the
variables are being used to represent multiple subgroup equations by following the two steps
described above: