
Frontiers of Computational Journalism
Columbia Journalism School
Week 10: Drawing Conclusions from Data
November 19, 2012

Week 10: Drawing Conclusions


Fooled by Randomness
Conditional Probability
Competing Hypotheses
Correlation and Causation

Is something causing cancer?

Cancer rate per county. Darker = greater incidence of cancer.

Which of these is real data?

Global temperature record

How likely is it that the temperature won't increase over the next decade?

From The Signal and the Noise, Nate Silver

It is conceivable that the 14 elderly people who are reported to have died soon after receiving the vaccination died of other causes. Government officials in charge of the program claim that it is all a coincidence, and point out that old people drop dead every day. The American people have even become familiar with a new statistic: Among every 100,000 people 65 to 75 years old, there will be nine or ten deaths in every 24-hour period under most normal circumstances. Even using the official statistic, it is disconcerting that three elderly people in one clinic in Pittsburgh, all vaccinated within the same hour, should die within a few hours thereafter. This tragedy could occur by chance, but the fact remains that it is extremely improbable that such a group of deaths should take place in such a peculiar cluster by pure coincidence. - New York Times editorial, 14 October 1976

Assuming that about 40 percent of elderly Americans were vaccinated within the first 11 days of the program, then about 9 million people aged 65 and older would have received the vaccine in early October 1976. Assuming that there were 5,000 clinics nationwide, this would have been 164 vaccinations per clinic per day. A person aged 65 or older has about a 1-in-7,000 chance of dying on any particular day; the odds of at least three such people dying on the same day from among a group of 164 patients are indeed very long, about 480,000 to one against. However, under our assumptions, there were 55,000 opportunities for this extremely improbable event to occur: 5,000 clinics, multiplied by 11 days. The odds of this coincidence occurring somewhere in America, therefore, were much shorter: only about 8 to 1. - Nate Silver, The Signal and the Noise, Ch. 7 footnote 20
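A quick way to check this arithmetic is to compute the binomial probabilities directly. The sketch below (an illustration added here, not from the original slides) reproduces the "480,000 to one" and "8 to 1" figures under Silver's assumptions, using scipy:

    from scipy.stats import binom

    p_death = 1 / 7000        # daily chance of death for a person aged 65+
    n_patients = 164          # vaccinations per clinic per day (assumed)
    clinic_days = 5000 * 11   # 5,000 clinics over the first 11 days

    # Chance of 3 or more deaths among 164 patients on a single clinic-day
    p_cluster = 1 - binom.cdf(2, n_patients, p_death)    # roughly 1 in 480,000

    # Chance that this happens at least once in 55,000 clinic-days
    p_somewhere = 1 - (1 - p_cluster) ** clinic_days      # ~0.11, odds of about 8 to 1 against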

The probabilities of polling


If Romney is two points ahead of Obama, 49% to 47%, in a poll with 5.5% margin of error, how likely is it that Obama is actually leading?

Given:
R = 49%, O = 47%, MOE(R) = MOE(O) = 5.5%

How likely is it that Obama is actually ahead? Let D = R - O = 2%. This is an observed value; if we polled the whole population, we would see a true value D'. We want the probability that Obama is actually ahead, i.e. P(D' < 0).

Margin of error on D: naively MOE(R) + MOE(O) = 11%, because R and O are almost completely dependent (R + O ≈ 100%). For a better analysis, see http://abcnews.go.com/images/PollingUnit/MOEFranklin.pdf, which gives MOE(D) = 10.8%.

[Figure: distribution of D, with areas labeled P(Obama ahead) and P(Romney ahead).]

Std. dev of D = MOE(D) / 1.96 = 5.5%, since the MOE is quoted as a 95% confidence interval. Z-score of -D = -2% / 5.5% = -0.36. P(z < -0.36) ≈ 0.36, so there is about a 36% chance that Romney is not actually ahead, or about 1 in 3.
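A minimal sketch of this calculation in Python (an illustration added here, not from the original slides; it assumes scipy is available):

    from scipy.stats import norm

    d = 0.02             # observed lead, R - O
    moe_d = 0.108        # margin of error on the difference (Franklin method)
    sd_d = moe_d / 1.96  # MOE is quoted as a 95% confidence interval -> ~0.055

    p_obama_ahead = norm.cdf((0 - d) / sd_d)   # P(D' < 0), about 0.36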

Random Happens
"Unlikely to happen by chance" is only a good argument if you've es%mated the chance. Also: a par9cular coincidence may be rare, but some coincidence somewhere occurs constantly.

Week 10: Drawing Conclusions


Fooled by Randomness
Conditional Probability
Competing Hypotheses
Correlation and Causation

Mammograms and Cancer


Suppose I tell you:
14 of 1000 women under 50 have breast cancer.
If a woman has cancer, a mammogram is positive 75% of the time.
If a woman does not have cancer, a mammogram is positive 10% of the time.
If a woman has a positive mammogram, how likely is she to have cancer?

From The Signal and the Noise, Nate Silver

Conditional probabilities
Pr(positive|cancer) = 75%
Pr(positive|no cancer) = 10%
What is Pr(cancer|positive)?

Bayes' Theorem
Tells us how to go from Pr(A|B) to Pr(B|A):
Pr(B|A) = Pr(A|B) Pr(B) / Pr(A)

Bayesian Mammograms
Pr(cancer|positive) = Pr(positive|cancer) Pr(cancer) / Pr(positive)
Pr(positive|cancer) = 0.75
Pr(cancer) = 0.014
Pr(positive) = Pr(positive|no cancer) Pr(no cancer) + Pr(positive|cancer) Pr(cancer) = 0.10 × 0.986 + 0.75 × 0.014 = 0.1091

Bayesian Mammograms
Pr(cancer|positive) = Pr(positive|cancer) Pr(cancer) / Pr(positive) = (0.75 × 0.014) / 0.1091 = 0.0962, i.e. a 9.6% chance she has cancer if the mammogram is positive
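The same arithmetic as a small Python sketch (an illustration added here, not part of the original slides):

    p_cancer = 0.014              # 14 of 1,000 women under 50
    p_pos_given_cancer = 0.75     # mammogram positive if cancer
    p_pos_given_no_cancer = 0.10  # mammogram positive if no cancer

    # Total probability of a positive mammogram
    p_pos = (p_pos_given_cancer * p_cancer
             + p_pos_given_no_cancer * (1 - p_cancer))            # 0.1091

    # Bayes' theorem
    p_cancer_given_pos = p_pos_given_cancer * p_cancer / p_pos    # about 0.096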

[Diagram slides: a population grid split by cancer / no cancer and positive / negative mammogram, showing each probability as a ratio of counts in the grid.]

Pr(positive|cancer) = 0.75 = N(positive & cancer) / N(cancer); in the diagram, N(cancer) = 4 and N(positive & cancer) = 3.

Pr(positive|no cancer) = 0.1 = N(positive & no cancer) / N(no cancer); N(no cancer) = 1000 and N(positive & no cancer) = 100.

Pr(cancer) = 0.014 = N(cancer) / N

Pr(cancer|positive) = 9.6%

Get conditional probabilities right

A lot of the probabilities we're interested in are conditional probabilities. It's easy not to realize this, and they are easy to get backwards (which leads to, e.g., the base rate fallacy). Use Bayes' theorem to reverse conditional probabilities.

Week 10: Drawing Conclusions


Fooled by Randomness
Conditional Probability
Competing Hypotheses
Correlation and Causation

Cognitive biases
Availability heuristic: we use examples that come to mind, instead of statistics.
Preference for earlier information: what we learn first has a much greater effect on our judgment.
Memory formation: whatever seems important at the time is what gets remembered.
Confirmation bias: we seek out and give greater importance to information that confirms our expectations.

Confirmation bias
Comes in many forms:
...unconsciously filtering information that doesn't fit expectations.
...not looking for contrary information.
...not imagining the alternatives.

The thing about evidence...


As the amount of information increases, it gets more likely that some information somewhere supports any particular hypothesis. In other words, if you go looking for confirmation, you will find it. This is not a complete truth-finding method.

Method of competing hypotheses


Start with multiple hypotheses H0, H1, ..., HN
(Remember, if you can't imagine it, you can't conclude it!)

Go looking for information that gives you the best ability to discriminate between hypotheses. Evidence which supports Hi is much less useful than evidence which supports Hi much more than Hj, if the goal is to choose a hypothesis.

Quantified support for hypotheses


How likely is a hypothesis H, given evidence E? Or, what is Pr(H|E)? It depends on:
how likely H was before E, Pr(H)
how likely the evidence E would be if H is true, Pr(E|H)
how common the evidence is, Pr(E)

Bayes learns from evidence


Pr(H|E) = Pr(E|H) Pr(H) / Pr(E)
or

Pr(H|E) = Pr(E|H)/Pr(E) × Pr(H)


Likelihood, Pr(H|E): how likely is H given evidence E?
Prior, Pr(H): how likely was H to begin with?
Model of H, Pr(E|H): probability of seeing E if H is true.
Model of E, Pr(E): how commonly do we see E at all?

Alice is coughing. Does she have a cold?


Hypothesis H = Alice has a cold
Evidence E = we just saw her cough

Alice is coughing. Does she have a cold?


Hypothesis H = Alice has a cold
Evidence E = we just saw her cough
Prior P(H) = 0.05 (5% of our friends have a cold)
Model P(E|H) = 0.9 (most people with colds cough)
Model P(E) = 0.1 (10% of everyone coughs today)

Alice is coughing. Does she have a cold?


P(H|E) = P(E|H)P(H)/P(E) = 0.9 * 0.05 / 0.1 = 0.45
If you believe your initial probability estimates, you should now believe there's a 45% chance she has a cold.

A good model has a theory of the world. Bad models, bad inferences.

Method of competing hypotheses, quantitative form


Start with multiple hypotheses H0, H1, ..., HN
Each is a model of what you'd expect to see, P(E|Hi), with initial probability P(Hi).

For each new piece of evidence, use Bayes' rule to update the probability of all hypotheses. The inference result is the probability of each hypothesis given all the evidence: { P(H0|E), P(H1|E), ..., P(HN|E) }
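One way to make this concrete is a small update function. The hypotheses, priors, and likelihoods below are made up for illustration; they are not from the slides:

    def bayes_update(priors, likelihoods):
        # priors[h] = P(h); likelihoods[h] = P(E|h) for the new piece of evidence E
        unnormalized = {h: likelihoods[h] * priors[h] for h in priors}
        total = sum(unnormalized.values())   # plays the role of P(E)
        return {h: v / total for h, v in unnormalized.items()}

    # Hypothetical example: three hypotheses, two pieces of evidence
    posterior = {"H0": 1/3, "H1": 1/3, "H2": 1/3}
    for e in [{"H0": 0.1, "H1": 0.4, "H2": 0.8},
              {"H0": 0.2, "H1": 0.3, "H2": 0.9}]:
        posterior = bayes_update(posterior, e)
    # posterior now concentrates on H2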

Probability distribution over hypotheses

[Figure: two probability distributions over hypotheses H0, H1, H2. Left: a wide distribution, no clear winner. Right: a narrow distribution, H2 seems best.]

Week 10: Drawing Conclusions


Fooled by Randomness
Conditional Probability
Competing Hypotheses
Correlation and Causation

What is "causa%on"?
Y X thing in the world interac%on observable thing

How correlation happens

X causes Y
Y causes X
Z causes X and Y
a hidden variable causes X and Y
random chance!

What an experiment is: intervene in a network of causes

Does the Facebook news feed cause people to share links?

A difficult example
NYPD performs ~600,000 street stop-and-frisks per year. What sorts of conclusions could we draw from this data? How?

Stop and Frisk Causation


Suppose you take the address of every mosque in NYC, and discover that there are 15% more stop-and-frisks within 100m of mosques than the overall average. Can we conclude that the police are targeting Muslims?
