
```{r}

summary(anes_sub$presapp_econ_x)
```

From the summary statistics for the variable "presapp_econ_x", we see that a large
number of people disapprove strongly of Barack Obama's handling of the economy, and
a large but slightly smaller number approve strongly. The "Disapprove Not Strongly"
category has the fewest people.
```{r}
summary(anes_sub$presvote2012_x)
```

From the summary statistics, we see that the majority of the people voted for
Barack Obama. Fewer people voted for Mitt Romney, and very few voted for other
presidential candidates.

---
title: "project title"
date: "date"
output:
  html_document:
    theme: cerulean
---

<!-- For more info on RMarkdown see http://rmarkdown.rstudio.com/ -->

<!-- Enter the code required to load your data in the space below. The data will be
loaded but the line of code won't show up in your write up (echo=FALSE) in order to
save space-->
```{r echo=FALSE}
load(url("http://bit.ly/dasi_anes_data"))
```

<!-- In the remainder of the document, add R code chunks as needed -->

### Introduction:

Is there a relationship between people's opinion of Barack Obama's handling of
the economy and whether they voted for him in the subsequent presidential election?

I care about this question because I want to know whether people's opinions are
reflected in their voting pattern, and to verify that relationship. If a relationship
is found, this research could be useful to others in the future, as one might predict
support for a presidential candidate from people's opinion of how the President has
handled the economy in the past. The findings may also be informative for first-time
voters, as one of the factors they consider before casting their votes.

### Data:

The data were collected during the 2012 Presidential election in two phases, two
months before the election (pre-election survey) and two months after the election
(post-election survey), from the same respondents. The sampled respondents were
people of all racial and ethnic backgrounds. The data were collected either through
face-to-face interviews or over the internet. The face-to-face interviews used an
address-based, stratified, multi-stage cluster sample in 125 census areas and were
conducted at the respondent's home. Upon reaching the area, the interviewer randomly
selected the respondents with the help of a computer and conducted the interview
using a touch-sensitive tablet computer. Most of the questions were administered by
CAPI (computer-assisted personal interviewing) and CASI (computer-assisted
self-interviewing). Internet respondents were invited to the survey by email and
followed a link to a web address to complete the survey online. The survey began
with a consent screen that provided information about the study and included contact
information for the study administrators. Data were collected only from respondents
who were 18 or older by election day.

The cases are all the randomly sampled respondents who were personally interviewed
and all those who completed the survey after being invited by email. Each case is a
U.S. citizen who is 18 years or older by election day.

For this research, the two variables under study are "presapp_econ_x" and
"presvote2012_x". Both are categorical variables. The first records the respondent's
opinion of Barack Obama's handling of the economy and was a pre-election question.
It has four categories: "Approve Strongly", "Approve Not Strongly", "Disapprove Not
Strongly" and "Disapprove Strongly". The second records the candidate the respondent
voted for and was a post-election question. It has three categories: "Barack Obama",
"Mitt Romney" and "Other". For analysis purposes, the ANES data set was subsetted to
just these two variables.

This is an observational study, as it merely records the responses of each
respondent. The sampling design shows that the respondents were randomly selected,
not randomly assigned to treatments, which confirms that the study is observational.

The population of interest is all U.S. citizens who are 18 years or older. The
findings from this analysis can be generalized to that population because the
respondents were randomly selected from it for both the face-to-face and the
internet survey. A potential source of bias that might prevent generalizability is
non-response: if respondents refuse to answer a question, the resulting non-response
bias might prevent the analysis from being generalized to the population. Another
source of bias arises when a respondent who was interviewed before the election is
not available for the post-election interview.

Because the data come from random sampling without random assignment, causal links
cannot be established between the variables of interest. Causal links can be
established only when the data are collected using random assignment of subjects to
treatment and control groups.

### Exploratory data analysis:


We first clean the data by removing rows with NA values in either variable. Next we
create a subset called anes_sub which contains just the two variables of interest.
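```{r echo=TRUE}
# Keep only respondents with non-missing values for both variables
clean_data = subset(anes, !is.na(presapp_econ_x) & !is.na(presvote2012_x))
# Retain just the two variables of interest
anes_sub = clean_data[, c("presapp_econ_x", "presvote2012_x")]
```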

```{r echo=TRUE, fig.height=2.5}
# Bar plot of opinions on Obama's handling of the economy
barplot(table(anes_sub$presapp_econ_x), xlab = "People's Opinion",
        ylab = "Frequency", main = "Distribution of People's opinion")
```

From the barplot, we see that there are large numbers of people in both the
"Approve Strongly" and "Disapprove Strongly" categories, relatively few in the
"Approve Not Strongly" category, and the fewest in the "Disapprove Not Strongly"
category.

```{r echo=TRUE, fig.height=2.5}
# Bar plot of the candidate each respondent voted for
barplot(table(anes_sub$presvote2012_x),
        xlab = "Choice of Presidential candidate", ylab = "Number of people",
        main = "Distribution of People's choice of the Presidential candidate")
```

From the barplot, we see that Barack Obama is preferred by the majority of the
people. A large number of people voted for Mitt Romney, but fewer than voted for
Barack Obama. The number of people who voted for the other presidential candidates
is very small in comparison with the other two candidates.

We now take a look at the cross-tabulation of these two variables.

```{r}
table(anes_sub)
```

From the contingency table, we see that, among the people who voted for Barack
Obama, a large number approve of his handling of the economy, both strongly and
not strongly. We also see that, among the people who voted for Mitt Romney, a large
number disapprove of Barack Obama's handling of the economy, both strongly and not
strongly. Finally, among the people who voted for other candidates, very few approve
strongly of Barack Obama's handling of the economy, many disapprove strongly, and
almost equal numbers fall in the other two categories.
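
Raw counts are easier to compare across candidates when converted to proportions. The sketch below shows one way to do this with `prop.table()`. The counts are a toy table invented purely for illustration, not the ANES results, and the names `toy` and `col_props` are hypothetical.

```{r}
# Toy counts invented for illustration only (NOT the ANES results)
toy = matrix(c(400,  20, 10,
               300,  60, 15,
               100,  90, 20,
                50, 500, 35),
             nrow = 4, byrow = TRUE,
             dimnames = list(
               opinion = c("Approve Strongly", "Approve Not Strongly",
                           "Disapprove Not Strongly", "Disapprove Strongly"),
               vote = c("Barack Obama", "Mitt Romney", "Other")))

# Share of each candidate's voters in each opinion category (columns sum to 1)
col_props = prop.table(toy, margin = 2)
round(col_props, 2)
```

With the real data, the same call would presumably be `prop.table(table(anes_sub), margin = 2)`. Using `margin = 1` instead conditions on opinion rather than on vote.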

```{r,fig.width=10}
# Cross-tabulate opinion (rows) against vote (columns) and draw a mosaic plot
econ_vs_presvote = table(anes_sub$presapp_econ_x, anes_sub$presvote2012_x)
mosaicplot(econ_vs_presvote, xlab = "People's Opinion",
           ylab = "Presidential candidate",
           main = "People's opinion vs People's choice of the presidential candidate")
```

From the mosaic plot, we see that among the people who approve, both strongly and
not strongly, almost all voted for Barack Obama. Among the people who disapprove
not strongly, almost equal numbers voted for Obama and Romney, with few voting for
other candidates. Among the people who disapprove strongly, almost all voted for
Romney, with few voting for Obama and other candidates.

From the exploratory data analysis, we see there is some relationship between
people's opinion of the President's handling of the economy and the votes received
by him during the subsequent presidential election. In order to confirm this
relationship, we need to see whether it is statistically significant.

### Inference:

In order to evaluate the relationship between two categorical variables, where at
least one variable has more than two levels, a chi-square test of independence
should be used. The test statistic is a chi-square statistic, calculated as the sum
over all cells of (Observed - Expected)^2 / Expected. If the p-value is less than
5%, we conclude that there is enough sample evidence to reject the null hypothesis
H0 in favour of the alternative hypothesis.
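
Written out, the statistic described above takes the following form. The notation is introduced here for illustration: $O_{ij}$ and $E_{ij}$ are the observed and expected counts in row $i$ and column $j$, $R$ and $C$ are the numbers of rows and columns, and $n$ is the total count.

$$\chi^2 = \sum_{i=1}^{R} \sum_{j=1}^{C} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}, \qquad E_{ij} = \frac{(\text{row } i \text{ total}) \times (\text{column } j \text{ total})}{n}$$

$$df = (R - 1)(C - 1)$$

For the two variables studied here, with four opinion categories and three vote categories, $df = (4 - 1)(3 - 1) = 6$.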

Since the respondents were selected by random sampling, the sample size (which is
5914) is less than 10% of the population, and each case contributes to only one
cell in the table, we conclude that the independence condition is met. If we
calculate the expected frequency for each cell under the assumption that the
variables are independent, we observe that all expected counts are at least 5
(which is shown in the output below). Hence both conditions for carrying out the
chi-square test of independence are met.
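
The expected-count condition can also be checked by hand. Under independence, each expected count is the row total times the column total divided by the overall total. The sketch below computes this for a toy table whose counts are invented for illustration (not the ANES results); `toy` and `expected` are hypothetical names.

```{r}
# Toy counts invented for illustration only (NOT the ANES results)
toy = matrix(c(400,  20, 10,
               300,  60, 15,
               100,  90, 20,
                50, 500, 35),
             nrow = 4, byrow = TRUE,
             dimnames = list(
               opinion = c("Approve Strongly", "Approve Not Strongly",
                           "Disapprove Not Strongly", "Disapprove Strongly"),
               vote = c("Barack Obama", "Mitt Romney", "Other")))

# Expected count per cell under independence: row total * column total / n
expected = outer(rowSums(toy), colSums(toy)) / sum(toy)
round(expected, 1)

# Sample-size condition: every expected count should be at least 5
all(expected >= 5)
```

With the real data, `toy` would be replaced by `table(anes_sub)`.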

To verify whether the relationship is statistically significant, we test the two
hypotheses:

Null hypothesis, H0: There is no relationship between the two variables, i.e. the
two variables are independent.

Alternative hypothesis, HA: There is a relationship between the two variables, i.e.
the two variables are dependent.

```{r}
# inference() is not base R; it must already be loaded in the session
inference(anes$presapp_econ_x, anes$presvote2012_x, type = "ht",
          alternative = "greater", method = "theoretical", est = "proportion",
          sum_stats = FALSE, eda_plot = FALSE, inf_plot = FALSE)
```
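
As a cross-check, the same kind of test can be sketched with base R's `chisq.test()`, which does not depend on `inference()`. The counts are a toy table invented for illustration, not the ANES results, so the statistic and p-value it prints say nothing about the real data. The names `toy` and `res` are hypothetical.

```{r}
# Toy counts invented for illustration only (NOT the ANES results)
toy = matrix(c(400,  20, 10,
               300,  60, 15,
               100,  90, 20,
                50, 500, 35),
             nrow = 4, byrow = TRUE,
             dimnames = list(
               opinion = c("Approve Strongly", "Approve Not Strongly",
                           "Disapprove Not Strongly", "Disapprove Strongly"),
               vote = c("Barack Obama", "Mitt Romney", "Other")))

# Pearson chi-square test of independence on the 4 x 3 table
res = chisq.test(toy)
res$statistic   # chi-square statistic
res$parameter   # degrees of freedom: (4 - 1) * (3 - 1) = 6
res$p.value     # reject H0 at the 5% level if this is below 0.05
```

With the real data, the call would presumably be `chisq.test(table(anes_sub))`.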

### Conclusion:

Insert conclusion here...
