Alright. We're still in it. Still in week six. This is module four of five.

We're going to talk about research designs. Again, it's part of this idea of statistical or research inference, scientific inference, but we're going to talk about how to design a study as opposed to how to run some statistics. And again, this is a key point, much like identification: it's not taught enough, in my opinion, and I want social epidemiologists to think long and hard about these issues. Of course, you can do whole PhDs that focus on research design, so I'm just going to highlight a few designs that I think are important, at least for beginning our conversation. Most people learn about statistics in college or somewhere like that, and some learn fancy statistical models in advanced work, but not enough people learn about research designs. I want to say again that research design is more important, at least for social epi, than fancy statistics. The idea, in my opinion, is to have a simple design: a simple design that can identify your effect, with intense content. It turns out that if you look at the history of public health advancement, of epidemiologic advancement, you'll see that the most important studies often had a simple design. It doesn't need to be complicated; it needs to be good, with intense content.

Here's one of my favorite quotes, which I've adapted a little for our purposes here: "The quality and strength of evidence provided by a research study is determined largely by its design. Excellent methods of analysis will not salvage a poorly designed study. Aspects of design include the framing of the questions and the assessment of measures, bias, and precision. An observational study that begins by examining an outcome variable is a formless, undisciplined investigation that lacks design. Design anticipates analysis." In other words, summarizing Paul Rosenbaum: design is most important.

If you just get some data and start analyzing it, you're going to make mistakes; you're not going to be able to draw the correct conclusions.

A quick note on traditional epidemiologic designs, which I'm not going to focus on much. As some of you may know, there are three major designs in conventional or traditional epidemiology. First is the cross-sectional survey: we survey a bunch of persons, perhaps a representative sample or perhaps not, maybe at a state fair, maybe in a national survey, at one point in time, and we ask them about their health, good or bad. We might also ask them about things they were exposed to: Did you smoke? Did you eat this? Did you have that toxin in your environment? That's the cross-sectional survey. Second is the cohort study, sometimes called a panel study, or a longitudinal study outside epidemiology. A cohort study is one where we get a bunch of people who do not have a disease and follow them through time, sometimes 20 to 30 years or more, and we see who gets the disease. Then we might see that the persons who got the disease all smoked, or all did this or did that. That's a cohort study. Finally, there is the case-control study, where we look at people who have the disease, and we find people who, to put it simply, don't have the disease but could have. That's the case-control study. It turns out case-control studies are just an efficient, cost-effective way to do a cohort study, but there's a lot involved. In any case, these are conventional epidemiologic studies. I thought I should mention them in this course, but I don't want to focus on them here.

I want to focus here on experiments, because I want you to think about experiments. The key point: experiments, doing things, are used to test our knowledge of the world and to correct it if our knowledge is found to be lacking.

This is important. We run experiments to ask: do we understand the world correctly? And then, if we don't, we correct our knowledge. It's an iterative process.

Well, what is an experiment? There are two important points, and I'll define these terms. We have an exogenous intervention or treatment, and we randomly assign subjects to that treatment or not. So: exogenous treatment, and randomization. Let me explain. Exogenous means something from the outside, like a meteor plummeting towards Earth; it's not something that comes from within. So in an epidemiologic cohort study, a virus might get introduced into a person's life, but it's not as if the person made the virus. If the person made that virus, that treatment, it would be an endogenous exposure as opposed to an exogenous one, and then all the inference and statistics get jumbled up. So the key idea of an experiment is that something happens from the outside. Often the scientist herself is manipulating something that subjects can't control. That's how experiments work. The second component is random: due to chance alone. We randomly assign people to treatment or control conditions, as opposed to letting them choose. If we let persons choose, then they are automatically different, because some choose treatment and some choose control, so there's automatically a difference between the treatment and control groups, and then we have a competing explanation: is it the treatment that changed their health, or is it their selection, something in their value system or whatever, that is the cause of the disease? That's an identification problem. So it's better to randomize. It balances out competing explanations, otherwise called confounding.

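Here's a minimal sketch, not from the lecture, of that selection problem. Everything in it is invented for illustration: the hypothetical trait health_conscious stands in for "something in their value system," the true treatment effect is set to 2.0, and we compare a self-selected world against a randomized one.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
TRUE_EFFECT = 2.0

# A hypothetical trait that drives both treatment choice and health.
health_conscious = rng.normal(size=n)

# Self-selection: health-conscious people are more likely to choose treatment.
chose = rng.random(n) < 1.0 / (1.0 + np.exp(-health_conscious))

# Random assignment: a fair coin, unrelated to anything about the person.
assigned = rng.random(n) < 0.5

def outcome(treated):
    # Health improves with the trait AND with the treatment.
    return health_conscious + TRUE_EFFECT * treated + rng.normal(size=n)

y_chose = outcome(chose)
y_assigned = outcome(assigned)

def naive_diff(y, treated):
    return y[treated].mean() - y[~treated].mean()

print(naive_diff(y_chose, chose))        # inflated: treatment plus selection
print(naive_diff(y_assigned, assigned))  # close to the true 2.0
```

The first comparison mixes the treatment with the trait; the second recovers the effect because the coin, not the person, decided who got treated.
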
The word random can get confused; there are two ways it's used in research. One is random sampling. That's good: it's one way to get a representative sample from a given target population. The way I'm using the word here, though, is random assignment. We are randomly assigning: I have a coin, heads treatment, tails control. That's not about sampling, it's about assignment, and here, in experiments, we're talking about assignment. Now it's true that some of the best experiments randomly sample people and then randomly assign them to treatment or control, but experiments entail random assignment, not necessarily random sampling. Random assignment balances both measured things that are different and unmeasured things, and that's important to know. One of the troubles of conventional epidemiologic studies is that we always wonder about the stuff we didn't measure; randomization takes care of that. This all comes from, again, Ronald Fisher's work on crops, on agricultural experiments. It's also worth noting that randomization only works when we have lots of people. You can flip a coin three times and get three heads; that's not very improbable. But do it three thousand times and you're likely to get approximately 1,500 heads, unless, of course, your coin is biased. So experiments are manipulating something exogenously and randomly assigning people.

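That point about needing lots of people fits in a two-line demo (again, just an illustrative sketch):

```python
import numpy as np

rng = np.random.default_rng(1)
print(rng.integers(0, 2, size=3).mean())     # easily 0.0 or 1.0: all tails or all heads
print(rng.integers(0, 2, size=3000).mean())  # reliably very close to 0.5
```
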
An observational study is not the same. In an observational study, we still have the exogenous intervention or exposure, but there's no random assignment, so people are inherently different between the treatment and the control group: some selected it, some did not. That's an observational study. And that's why observational studies are so difficult to analyze: we have to worry about competing explanations. It's an identification problem. In an observational study, we find the effects, and we wonder what the causes could be. We see that this group has a disease and that group doesn't. Hey, is it because the first group smoked and the second group didn't? Or is it because of their age, or something else? The food they eat? These are the competing explanations. So in observational studies, we're sort of working backwards.

In summary, experimental designs find effects of causes: we're doing something hypothesized to change health, and we see what happens. In experimental designs we, the scientists, the researchers, manipulate the treatment; we do it at random, by random assignment; and we rule out competing explanations by the design of the study. By contrast, observational studies find the causes of effects: we see the outcome and wonder what caused it. We're working backwards. There's no manipulation of the treatment, there's no random assignment, and we rule out competing explanations by analysis. Recalling the earlier quote that design trumps analysis, this is why experiments, when budget and ethics allow them (and they often don't), are superior to observational studies.

I want to briefly go over some simple research designs with you. Here's some nomenclature: O represents an observation or a measurement; X an intervention or treatment, which is what we do to people; R indicates that the groups were randomized, and NR that they were not. Here's something simple. This is the posttest-only design:

R  X  O
R     O

Here we can see that these two groups, group one and group two, are randomized. We have an intervention, and we observe the groups after the intervention. With enough people, we can compare these two groups, because they are counterfactual substitutes. The posttest-only design works, with lots of people: the two groups would have the same values of the outcome but for the treatment, if there is a treatment effect. Here's a very similar design, but in this case we also have pretests:

R  O  X  O
R  O     O

Look: there's a pretest, the intervention, and a posttest for one group, and a pretest and posttest for the other, control, group. Both groups are randomized, which makes them identical. Therefore, we can compare the posttest outcomes as we did before, or compare before and after within each group. Ideally, we do both; that's why this is a stronger design than the former.

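To make that concrete, here's a hedged sketch of both comparisons on simulated data; the effect size of 1.5, the noise levels, and the 5,000 people per arm are all invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n, EFFECT = 5_000, 1.5

# R makes the two arms identical in distribution at baseline.
pre_t = rng.normal(size=n)   # O: treatment-arm pretest
pre_c = rng.normal(size=n)   # O: control-arm pretest

post_t = pre_t + EFFECT + rng.normal(scale=0.5, size=n)  # O after X
post_c = pre_c + rng.normal(scale=0.5, size=n)           # O, no X

# Posttest-only comparison: difference of the final O's.
print(post_t.mean() - post_c.mean())

# Pretest-posttest comparison: difference of each arm's change.
print((post_t - pre_t).mean() - (post_c - pre_c).mean())
```

Both print roughly 1.5, but the second comparison is typically more precise because each person's baseline is subtracted out; that's the extra strength the pretests buy.
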
When it comes to medical science, the double-blind randomized controlled experiment is the so-called gold standard. Why is this? Because we're doing something subjects can't control: we are randomly assigning subjects to the treatment or not. Importantly, the researchers don't know whether the pill they're giving is an active drug or a placebo, and of course, neither do the patient subjects. Therefore the feeling that "this must be the drug" gets taken out of play, which is a good thing scientifically. Only the primary researchers or statisticians know who got what. That's why this is the gold standard for the Food and Drug Administration, the American body that says this works and this doesn't, which is so important. The double-blind RCT, however, is hardly ever feasible for social epidemiology; we have to find alternative methods.

One of the best alternatives is what's called the group or community randomized trial. Here we're running an experiment, but on intact social groups. So we might take this office, that office, ten or twelve offices, or schools, or neighborhoods; assign half to some treatment condition, say better food policy, poverty reduction, or vaccines; and the other half not. Then we compare the outcomes. What's important here is that we're doing the intervention to the whole group, the community, the office, the school, and the idea is that the members within that group are interacting. This creates conditions such as herd immunity, which might be biologic, and it might also be social: "kids in our school don't behave that way." Those kinds of social norms are so powerful. So the group randomized trial is, in my view, the most important social epidemiologic design. One of the troubles, though, is that it's hard to find lots of groups, so typically (this is a bit of jargon) group randomized trials are underpowered. That means it's hard to disentangle the effect we observe from statistical chance.

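Here's a rough simulation of that power problem, with made-up numbers: twelve clusters, a true effect of 1.0, and strong cluster-level noise. Running the same trial five times gives estimates scattered all over.

```python
import numpy as np

rng = np.random.default_rng(3)
TRUE_EFFECT, n_clusters, people = 1.0, 12, 50

estimates = []
for _ in range(5):  # replicate the same trial five times
    # Each office or school has its own shared environment (cluster noise).
    cluster_noise = rng.normal(scale=2.0, size=n_clusters)
    treated = rng.permutation(n_clusters) < n_clusters // 2  # half the groups
    cluster_means = np.empty(n_clusters)
    for c in range(n_clusters):
        y = cluster_noise[c] + TRUE_EFFECT * treated[c] + rng.normal(size=people)
        cluster_means[c] = y.mean()
    estimates.append(cluster_means[treated].mean() - cluster_means[~treated].mean())

print(np.round(estimates, 2))  # scattered widely around the true 1.0
```

Adding more people per cluster barely helps here; only more clusters would, and clusters are exactly what's hard to get.
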
Natural experiments are yet another alternative. Here, mother nature is doing something to persons, something not in the control of the researchers, but it's making persons equivalent on average. Sometimes policy does this: a federal policy, a government policy, says this group gets something and that group doesn't. This is exogenous to them, and if those two groups happen to have been equal, a policy change can be a good experiment. Human lotteries, such as military drafts, also make good natural experiments, and economists in particular have studied the outcomes of such things.

Let me talk briefly about observational designs: how we can study a policy change, and some of the hurdles to understanding how a policy affects an outcome. Let's look at this diagram. We see that the light blue group has some level of a health outcome variable, and it's lower than the red group's; the difference is, arbitrarily, 6. Down here there's time, and this vertical line indicates a change in policy at some given point in time. So if we see such a situation, that after a policy the two groups are different, we can say: hey, the policy caused a change in these two groups; one group is healthier than the other; the red is healthier than the blue. This is true, except if this is the situation: both groups went up. Now we have a situation where both groups went up equivalently from before to after the policy change, and there's no observable difference between them. Fair enough, but there might be more going on. We could have this situation: one group went up, one group didn't. Or there's lots of noise before the policy, the groups moving around quite a bit, the averages or some other statistic changing quite a bit before the policy change. Then what we observe after the policy change could be due either to the policy change or to random noise. That's an identification problem. The same thing is true here: did this policy change the values? On average, yes, but is it some sort of secular trend? It's hard to know. In this last case, it looks like the policy made the red group go up and the blue group go down; that would be rather convincing evidence, but it rarely happens in practice. The trouble is, if we only observe the sort of situation we saw earlier, if we don't observe both the before and the after, we can draw inappropriate conclusions. This is what makes our work so difficult.

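The lecture doesn't name the method, but this before-and-after, two-group logic is what's commonly called difference-in-differences. Here's a minimal sketch with invented numbers, where the red group simply starts 6 points higher and the policy truly does nothing.

```python
import numpy as np

rng = np.random.default_rng(4)
n, TREND = 10_000, 3.0

# The red group starts 6 points higher; the policy has NO true effect here.
pre_red = 56.0 + rng.normal(size=n)
pre_blue = 50.0 + rng.normal(size=n)
post_red = pre_red + TREND + rng.normal(size=n)   # shared secular trend only
post_blue = pre_blue + TREND + rng.normal(size=n)

# After-only comparison: red looks 6 points "healthier," but it already was.
print(post_red.mean() - post_blue.mean())         # roughly 6.0, misleading

# Difference-in-differences: subtract each group's own baseline first.
did = (post_red.mean() - pre_red.mean()) - (post_blue.mean() - pre_blue.mean())
print(did)                                        # roughly 0.0, the honest answer
```

This is exactly the trap in the diagrams: without the before measurements, the after-only gap of 6 looks like a policy effect.
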
I want to close with a few issues and assumptions. An experiment tells us that some hypothesized cause yields some effect; it rarely tells us how that effect came about, the mechanism. A full explanation requires a deep understanding of the mechanisms, and some of the best work in social epidemiology today is looking at the mechanisms of change. So what should you do? This is difficult territory. Here's my advice. First, seek causal effects; don't be satisfied with correlations, sometimes called associations. Causal effects are difficult to establish, but if you try for them you'll do better work. Second, always imagine an experiment: it'll help you clarify what data you have and what data you don't have. The differences between your ideal imagined experiment and the data you actually have are all the sources of bias. Then conduct an analysis that mimics the desired analysis in that hypothetical experiment; that will help you rule out competing explanations. So: imagine an experiment even if you've not run one, and rule out competing explanations by design to the extent possible. This is the best way to advance research in social epidemiology.
