
MATH 153 GREAT PROJECT

LUKAS KASSA

Correlation and Linear Regression


[Figure: Scatterplot of percent of low-income working families (y-axis) vs. percent of 18-64 yr olds with no HS diploma (x-axis)]

1. The logical predictor variable (x) is the percentage of 18-64 year olds with no HS
diploma; the logical response variable (y) is the percentage of low-income working
families.

2. There appears to be a relatively strong, positive linear association between the
percentage of 18-64 year olds with no HS diploma and the percentage of low-income
working families. The percentage of 18-64 year olds with no HS diploma influences the
percentage of low-income working families: the more people who don't receive high
school diplomas, the more low-income working families there are.


3. The slope of this model is 1.400: for every 1 percentage point increase in the percent
of 18-64 yr. olds with no HS diploma, I expect to see a 1.4 percentage point increase in
the percent of low-income working families.
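
As a small illustration of this slope interpretation, the sketch below turns a change in the predictor into the predicted change in the response. Only the slope of 1.400 comes from the model above; the function name and the example changes are made up for illustration, and the intercept is not needed because only differences are computed.

```python
SLOPE = 1.400  # fitted slope reported above


def predicted_change(delta_x: float) -> float:
    """Predicted change in percent of low-income working families
    (in percentage points) for a change of delta_x percentage points
    in the percent of 18-64 year olds with no HS diploma."""
    return SLOPE * delta_x


# Illustrative changes in the predictor (hypothetical values).
for delta in (1.0, 2.5, -2.0):
    print(f"{delta:+.1f} points in x -> {predicted_change(delta):+.2f} points in y")
```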

4. (R-squared = 49.1%) About 49.1% of the variation in the percent of low-income
working families can be attributed to the percent of 18-64 year olds with no HS diploma
(based on the model).
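
For a simple linear regression with one predictor, the correlation coefficient is the square root of R-squared, carrying the sign of the slope. The sketch below recovers it from the two values reported above; the variable names are illustrative only.

```python
import math

R_SQUARED = 0.491  # R-squared reported above, as a proportion
SLOPE = 1.400      # fitted slope reported above

# r has the same sign as the slope in simple linear regression.
r = math.copysign(math.sqrt(R_SQUARED), SLOPE)
print(f"r is approximately {r:.2f}")
```

A correlation of roughly 0.70 fits the description of a positive and fairly strong linear relationship.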

5. Based on what I know about correlation, I agree with the student's idea that a decrease in
the percent of 18-64 year olds with no HS diploma will lead to a decrease in the percent
of low-income working families. I believe this because there seems to be a positive linear
relationship between the two variables, indicating that when one increases the other
increases, and when one decreases the other decreases as well.


6. There is no validity to the student's belief in this question. No information was given to
indicate that those with no high school diploma are similar in terms of their nationality,
native language, and disability status. If data on these characteristics were included, it
would completely change the analysis, because all of them would be variables as well. In
addition, even if they were similar in these characteristics, you can't conclude that it
would be the responsibility of these subgroups and their advocates to address the
working poor issue themselves. Statistically, there is no real basis or reasoning for this
belief.

Confidence Intervals

7. To collect a simple random sample of size n=20 from the full list of jurisdictions, a
researcher should use Minitab. First, he should have the data set in Minitab. Next, he
should select Calc, Random Data, Sample From Columns. Then he should enter the desired
sample size in the number of rows box on the Minitab prompt. Then he should select the
data or columns he wants the sample to be generated from and select where he wants the
new sample to be stored. Finally, he should check the box that says sample with
replacement, so that every jurisdiction has the same chance of being selected on each draw.
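
The same procedure can be sketched outside Minitab. The example below is only an illustration in Python: the list of jurisdiction names is a hypothetical placeholder (the real list is not reproduced here), and the seed is arbitrary.

```python
import random

# Hypothetical sampling frame; stands in for the full list of jurisdictions.
jurisdictions = [f"Jurisdiction {i}" for i in range(1, 52)]

rng = random.Random(153)  # arbitrary seed so the draw can be repeated

# Draw n = 20 with replacement: every jurisdiction has the same chance
# on each draw, so the same jurisdiction can appear more than once.
sample = rng.choices(jurisdictions, k=20)

for name in sample:
    print(name)
```

`random.choices` draws with replacement, mirroring the checked box in Minitab; `random.sample` would draw without replacement instead.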
8.
Jurisdiction  %LIWF    Jurisdiction  %LIWF    Jurisdiction  %LIWF    Jurisdiction  %LIWF
Vermont       26.2     Connecticut   21.1     Kansas        32.0     N Carolina    36.2
Arkansas      41.8     Ohio          31.8     Colorado      27.6     Ohio          31.8
Oklahoma      37.4     Montana       36.0     S Dakota      31.0     Virginia      23.3
Missouri      32.7     Oklahoma      37.4     Hawaii        25.8     Virginia      23.3
Virginia      23.3     W Virginia    36.1     N Dakota      27.2     N Dakota      27.2


9. My sample mean = 30.46, my sample st dev = 5.81

It is very unlikely that any two samples would produce the same result, because a random
number generator was used to select the values and replacement was used in the sampling
method to make it truly random. This makes the chance that another sample would randomly
select the same jurisdictions (or values) very small.
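
The summary statistics in item 9 can be checked against the 20 sampled %LIWF values listed in item 8. The sketch below uses Python's standard library rather than Minitab; the values are copied from the table, and the standard deviation is the sample version with n - 1 in the denominator.

```python
import statistics

# %LIWF values of the 20 sampled jurisdictions from item 8
# (repeats appear because sampling was done with replacement).
liwf = [
    26.2, 41.8, 37.4, 32.7, 23.3,
    21.1, 31.8, 36.0, 37.4, 36.1,
    32.0, 27.6, 31.0, 25.8, 27.2,
    36.2, 31.8, 23.3, 23.3, 27.2,
]

sample_mean = statistics.mean(liwf)
sample_sd = statistics.stdev(liwf)  # sample standard deviation (n - 1)

print(f"n = {len(liwf)}, mean = {sample_mean:.2f}, st dev = {sample_sd:.2f}")
```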

10. The 90% confidence interval for n = 20, sample mean = 31.08%, and sample st dev = 5.48%
is (28.96, 33.20).
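
This interval is consistent with a one-sample t interval, mean ± t* × (st dev / √n). The sketch below reproduces it under that assumption; the critical value 1.729 for 19 degrees of freedom is taken from a standard t table rather than from the project output.

```python
import math

n = 20
sample_mean = 31.08  # percent
sample_sd = 5.48     # percent

# Assumption: t critical value for 90% confidence with n - 1 = 19 degrees
# of freedom, read from a standard t table.
T_STAR_90 = 1.729

standard_error = sample_sd / math.sqrt(n)
margin = T_STAR_90 * standard_error
lower, upper = sample_mean - margin, sample_mean + margin

print(f"90% CI: ({lower:.2f}, {upper:.2f})")
```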

11. I am 90% confident that the true national mean percent of low-income working families
lies in the interval (28.96, 33.20).

12. I believe that an interval with a higher confidence level would be more advantageous to
the jurisdictions in general, because the higher the confidence level, the wider the
interval between its lower and upper limits. Therefore, if funds are only available for the
jurisdictions whose percent of low-income working families falls within the reported
confidence interval, then a higher confidence level would give a wider interval, allowing
more jurisdictions to fall in the interval.
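
To illustrate how the interval widens as the confidence level rises, the sketch below recomputes the t interval from item 10 at three levels. The critical values for 19 degrees of freedom are assumptions taken from a standard t table, and the helper name is made up for this example.

```python
import math

n, sample_mean, sample_sd = 20, 31.08, 5.48

# Assumption: two-sided t critical values for 19 degrees of freedom
# from a standard t table.
T_CRITICAL = {0.90: 1.729, 0.95: 2.093, 0.99: 2.861}


def t_interval(level: float) -> tuple[float, float]:
    """Return the (lower, upper) t interval for the given confidence level."""
    margin = T_CRITICAL[level] * sample_sd / math.sqrt(n)
    return sample_mean - margin, sample_mean + margin


widths = {}
for level in T_CRITICAL:
    lower, upper = t_interval(level)
    widths[level] = upper - lower
    print(f"{level:.0%}: ({lower:.2f}, {upper:.2f}), width {widths[level]:.2f}")
```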

13. I believe that if a lawmaker were to report only research results that ensure that the
jurisdiction he or she represents gets federal aid, this would be a misuse of statistics.


This would be a misuse because it would be a poor representation of the data. Whether
the lawmaker used a poor sampling method such as convenience sampling or distorted the
visual representation of the data, it is a misuse because, as clearly stated, the lawmaker
has a bias and is trying to manipulate the data in order to benefit the jurisdiction that
he or she personally represents. The alternate reporting practices that I would suggest are
(a) to have somebody who does not have a bias that favors any jurisdiction conduct the
research and report the results, (b) to ensure that a proper sampling method is used and
that the sample size is large enough to be truly representative of the population, and
(c) to at least state clearly what is being claimed when giving the visual representation
of the data, so that it is not misleading.

Hypothesis Testing

14. The type of test that should be performed is a left-tailed hypothesis test for the mean
percent of low-income working families. The test should be left-tailed because the claim
is that the national average percent of low-income working families has improved, and in
this context "improved" means decreased. If we have fewer poor people, we have seen an
improvement. Therefore, I believe that a left-tailed test should be used to see if the
average national percent of low-income working families has improved since 2011.


15. The Null Hypothesis:

H0: μ = 31.3% (mean is equal to 31.3%)

The Alternate Hypothesis:

Ha: μ < 31.3% (mean is less than 31.3%)

16. The test statistic and p-value associated with the sample results are as follows:

T = -1.46

P-value = 0.082

17. The result of the hypothesis test is that the p-value is less than the significance level
(0.082 < 0.10). Therefore, we reject the null hypothesis.
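
The decision rule used here can be written as a short sketch. The function name is made up for illustration; the p-value and the 0.10 significance level are the ones reported above.

```python
def decide(p_value: float, alpha: float) -> str:
    """Reject H0 when the p-value is below the significance level."""
    return "reject H0" if p_value < alpha else "fail to reject H0"


P_VALUE = 0.082  # reported in item 16
ALPHA = 0.10     # significance level used in item 17

print(decide(P_VALUE, ALPHA))
```

The conclusion depends on the chosen significance level: with the same p-value, a level of 0.05 would not lead to rejecting the null hypothesis.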

18. There is sufficient evidence to support the claim that the average percent of low-income
working families has improved since 2011.
