
Thanks for staying with me.

This is still week six, design and inference, and this is module three of five. We're going to talk about identification, or more technically, effect identification. This is a key idea in social epidemiology, if not elsewhere, and it's not often taught or discussed, so I hope you find this module useful.

What's the core idea? We're back to our grapes. The core idea of effect identification is that we are interested in causal inference. That is, we are not just saying the grapes are sour. Even though it's very difficult to establish that some virus caused some outcome, or some social condition caused some health outcome, we still want to try, and that's what effect identification is about: trying to get to that causal inference. We say that identification means the effects are discernible: we can tease out the desired effect of the virus or the social condition from all the competing explanations. If there are competing explanations, you run into trouble; we say the effect is not identified. In the best studies, competing explanations are ruled out and confirmation bias is avoided. An effect is identifiable if it is theoretically possible to learn the true value of the parameter, mother nature's law if you will, how the world works, when the sample size approaches infinity. This is important. Often in statistics the issue is how much precision there is in our effect estimate given our sample size; we'll talk about that more in the last module for this week. But for now we want to assume complete information, a sample size of infinity, five million, huge, so that statistical imprecision is put aside. This means that issues of confidence intervals and p-values, if you know what those are, and if you don't, we'll get to those, have no role when we talk about effect identification. So often in research, researchers focus on confidence intervals, p-values, and estimating effects. With identification, none of that comes into the picture. We are simply asking: if we had complete data, could we find the effect we're looking for? That's why the idea is important. I call it pre-statistical.
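To state that a bit more formally, using a standard definition that the lecture gives only in words: a causal parameter \theta is identified when the distribution of the data we could observe pins down its value uniquely,

    \theta_1 \neq \theta_2 \;\Longrightarrow\; P_{\theta_1}(\text{observed data}) \neq P_{\theta_2}(\text{observed data}),

so with unlimited data, n \to \infty, we could learn the true \theta. If two different values of \theta imply the same distribution of observables, no sample size, however large, can tell them apart; the effect is not identified.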

So, setting imprecision aside: if more than one explanation for your effect exists, you have an identification problem. Imagine this little diagram. We have X1 causing Y. Great, but if it's also true that X2 causes Y, how do we know whether it's X1 alone, or X1 and X2 together, or whether X1 plays no role at all? Add X3, and now what do we do? There are two competing explanations for the effect of X1 on Y. And of course there can be infinitely many, things coming from right field if you wish, all the way out to X sub n. When we have this kind of scenario, the effect of X1 on Y is not identified. We cannot discern or disentangle the effect of X1 on Y from the competing explanations.
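To make that diagram concrete, here is a minimal simulation sketch; it is mine, not the lecture's, and the variable names and coefficients are invented. The point is that with a competing explanation X2 in play, even five million observations leave the naive estimate of X1's effect on Y just as biased as a small sample would.

import numpy as np

rng = np.random.default_rng(0)
n = 5_000_000                         # "sample size approaching infinity"

x2 = rng.normal(size=n)               # competing explanation: a common cause of x1 and y
x1 = 0.8 * x2 + rng.normal(size=n)    # exposure of interest, partly driven by x2
y = 1.0 * x1 + 2.0 * x2 + rng.normal(size=n)   # true causal effect of x1 on y is 1.0

# Naive slope of y on x1, ignoring x2: biased, and more data never fixes it.
naive = np.cov(x1, y)[0, 1] / np.var(x1)
print(f"naive estimate:    {naive:.2f}")     # about 1.98, not 1.0

# Adjusting for x2 recovers the effect. If x2 were unmeasured, the effect of
# x1 on y would not be identified, no matter how large the sample.
X = np.column_stack([np.ones(n), x1, x2])
adjusted = np.linalg.lstsq(X, y, rcond=None)[0][1]
print(f"adjusted estimate: {adjusted:.2f}")  # about 1.00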

Some of the great workers in this area are, of course, Bill Cochran and Charles Manski. Manski, the economist I mentioned earlier, has a fabulous book, Identification Problems in the Social Sciences; I think it's a critical piece of work for social epidemiologists. In epidemiology proper, the foundational paper is from Sander Greenland and James, or Jamie, Robins: their 1986 paper in the International Journal of Epidemiology. It's some tough sledding, but the advanced students may enjoy it. More recently from economics, the text Mostly Harmless Econometrics, which I mentioned earlier, has some really accessible insights into what effect identification is, and I want to talk about that a little here.

The authors of Mostly Harmless Econometrics talk about the importance of hypothetical experiments. This is a thought experiment, much like counterfactuals. The point they want to make is this: if we cannot imagine a hypothetical experiment that would identify, that is, disentangle, the causal effect we're interested in on the outcome, then we certainly can't do it in the real world with real data. So if you can't imagine the hypothetical experiment, there's no sense in trying to do it in the real world; you can't identify that causal effect. It's a fool's errand, if you will. And importantly, when thinking about these hypothetical experiments, it's fine to imagine no budget constraints and no ethical constraints. That's not how to practice in the real world, but it is how to think about it. Angrist and Pischke say that if a research question can't be answered even by a hypothetical experiment, we have a fundamentally unidentified question. This matters because it tells us which questions can and cannot be answered in social epidemiology. The fact is, not all questions can be answered. Some of you know about the area of neighborhood effects research: what is the effect of a neighborhood context on a health outcome? It's a really important social epidemiologic question, and I have concluded that with current data it is fundamentally unidentifiable. We can't disentangle what's causing what; the competing explanations are too difficult to rule out with the data we can actually observe.

Importantly, effect identification is nothing new to physicians. The idea is very akin, very similar, to the physician's differential diagnosis. Here's a photograph of my favorite TV physician, Gregory House. Differential diagnosis involves making a list of possible diagnoses, a list of possible causes, and then attempting to eliminate each one given the data, given the symptoms of the patient.

When you have eliminated all but one, you have the causal effect, the cause of the illness. That's exactly how effect identification works: we have potential competing causes, we rule them out, and the one that remains is the identified effect. That's what's causing the disease or the health outcome. Effect identification is also exactly the way Sherlock Holmes, that great detective from fiction, works. He wants to eliminate causes: when you have eliminated the impossible, whatever remains, however improbable, must be the truth. Interestingly, Arthur Conan Doyle, the author of Sherlock Holmes, was a physician himself and clearly understood the strengths of differential diagnosis. The same thing is happening with effect identification: assume the sample size goes to infinity, so all the statistical issues are gone, and ask, can we discern the effect?

There are three core criteria to identify an effect: positivity, exchangeability, and consistency. Let me describe these. Positivity simply means that the subjects, the people in our study, could be exposed. If the subjects could not be exposed, they should not be in the data. Now, some will be exposed and some will not be, and that's good, that's the comparison. But if the comparison group could not be exposed, because they didn't live in the right place, or they weren't eligible for some reason, then they should not be in the study; they are uninformative. Second, exchangeability. We've talked about this. Exchangeability means we could flip-flop those exposed and not exposed and still get the same answer; there's no confounding. That's what exchangeability means. Finally, consistency. Consistency is the criterion that the treatment given to persons or groups is the same. We're not going to compare the effect of aspirin and morphine together unless we're doing an aspirin-and-morphine experiment. The treatment has to be the same. If you're doing something in one neighborhood, something else in another neighborhood, and yet something else in a third neighborhood to see the health outcomes, you really have three different experiments, and your effect is not identifiable from those three data points. That's consistency.
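In potential-outcomes notation, which the lecture doesn't write out but which is standard in the causal inference literature, with exposure A, outcome Y, covariates L, and potential outcome Y^{a}:

    Consistency:      if A_i = a, then Y_i = Y_i^{a}
    Exchangeability:  Y^{a} \perp A for all a  (or Y^{a} \perp A \mid L, conditional exchangeability)
    Positivity:       \Pr[A = a \mid L = l] > 0 for every a and every l with \Pr[L = l] > 0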

How about an example? This is from some of my own work. We were interested in the effect of poverty on American Indian infant mortality here in the city of Minneapolis, Minnesota, where infant mortality for Native Americans was six times the rate for white Americans. Now, ideally, hypothetically, we'd like to compare an infant's risk of mortality when born to a mother living in poverty to the same child's risk with the same mother if she were not living in poverty. That is the counterfactual; that's the hypothetical ideal. It can't be done, but it's what we want. Second best, as a no-ethics, no-budget thought experiment: we take a group of Native American women, randomize some to poverty and some to no poverty, and observe the infant mortality rates. That would identify our effect. It's ethically impossible, obviously, but it helps us understand what we want to get to. That's the ideal experiment.
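In the same counterfactual notation, and this is my formalization rather than the lecture's: write Y_i^{a=1} for infant i's outcome if the mother lives in poverty and Y_i^{a=0} if she does not. The unobservable ideal, and the quantity the randomized thought experiment would deliver, are

    Y_i^{a=1} - Y_i^{a=0}                    (individual causal effect, never observable)
    \Pr[Y^{a=1} = 1] - \Pr[Y^{a=0} = 1]      (average causal effect, identified by the hypothetical randomization)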

What we can do in the real world is compare the birth outcomes of Native American women living in poverty to Native American women not living in poverty, all in the city of Minneapolis at a certain time. We're left with what we have; this is what's possible. The trouble is, it turns out, and I'll add unfortunately, that there are not enough Native American women in Minneapolis not living in poverty. So to whom do we compare the Native American women living in poverty? African American women living in poverty? White women living in poverty? Men? It's not clear. This is the identification problem: we don't have enough informative data to make our proper comparison. It's not about statistics. We can collect as much data as we want, but the fact is there are not enough people in that actual comparison group. If we compare the infant mortality rates of Native American women in Minneapolis to, say, African American women living in Minneapolis, we might find a difference in infant mortality rates. That's important, but the question will be whether the difference we observe is due to poverty or perhaps to something to do with biological race, whatever that might be. So here's the problem: we have two competing explanations, race or poverty, and we can't tell them apart. That's an identification problem, and it's one we can't overcome. That's why this idea of identification is so critical to social epidemiology.
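As one last small sketch of what the positivity problem looks like in practice, with hypothetical data and invented column names rather than the actual study data: before estimating anything, tabulate exposure within each comparison group and see whether the cell you need is effectively empty.

import pandas as pd

# Hypothetical birth records: one row per birth, with invented columns
# "group" and "poverty" (1 = living in poverty, 0 = not living in poverty).
births = pd.DataFrame({
    "group":   ["Native American"] * 6 + ["White"] * 6,
    "poverty": [1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 0],
})

# Does every group contain both exposed and unexposed people?
print(pd.crosstab(births["group"], births["poverty"]))

# If the (Native American, poverty = 0) cell is empty or nearly so, the
# within-group comparison cannot be made. That is a positivity problem, not a
# precision problem: collecting more of the same data will not fill that cell.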
