
Module 3

Interpreting Scatterplots
Form: linear, curved, clusters, no pattern
Direction: positive, negative, no direction
Strength: how well the points fit the “form”
Outliers: points that deviate from the overall pattern

Correlation Coefficient is the measure of the direction and strength of a linear relationship.
- r = (1/(n-1)) * Σ [(x_i - x̄)/s_x] * [(y_i - ȳ)/s_y]
- Between -1 and 1
- Values of r close to 0 imply weak or no linear association: knowing X tells almost nothing about Y through a linear relationship (a curved relationship is still possible).
- WATCH out for outliers
Rule of thumb – Values larger than .8 or smaller than -.8 represent VERY strong correlation. Anything between -.3 and .3 indicates weak or no association.

Least Squares Regression Analysis
Describes the linear relationship and how one variable changes with the other.
Y = a + bX

Interpreting Slope
Slope b represents the increase or decrease in Y for any one-unit increase in X.

Coefficient of determination R^2
R^2 varies between 0 and 1.
If close to 1, then there is an almost perfect linear association between Y and X.
Values of R^2 larger than .6 show that X is a strong predictor of Y, and that the regression line on X provides a good explanation of changes in Y. Values of R^2 less than .25 indicate that X is not a good predictor of Y, and that a large fraction of the variation in Y is not explained by changes in X.

Residuals
Vertical deviations – difference between the observed y and the y predicted by the least squares regression line.
Average of residuals is equal to zero.
1st – Linear – randomly scattered around the zero line
2nd – Problem – non-linear association – curved pattern – indicates the relationship between y and x is curved and not linear.
3rd – Problem – non-constant variance – variation in y increases as x increases (fan shaped)

Example residual explanation: The residual plot of the least squares regression line for CPU usage time on LINET shows no pattern in the residuals. Residuals are randomly scattered around the zero line in a horizontal band. The regression line appears to be a good representation of the relationship between TIME and LINET.

Module 4

Sampling approaches use statistical techniques to produce representative samples from a large population.
Sampling plan states objectives of study, target population, method of sampling, set of characteristics and variables of interest, and more.
Probabilistic sampling chooses individuals or units in the sample using a random procedure. Eliminates selection bias. Following are types:
Simple random sample – a set of individuals chosen in such a way that everyone has the same chance of being chosen.
Stratified Sampling – population is divided into groups of similar units (or strata), and a random sample is drawn from each group.
Non-random approaches (prone to bias):
Convenience Sampling – individuals are chosen based on the ease with which the data can be collected. Conclusions can’t be generalized to the entire population.
Voluntary Response Sampling – individuals select themselves by responding to a data collection appeal.

Observational Study vs. Experiment
Observational – the researcher has no control over the factors and can’t assign subjects to the variables of interest. Just observe. (Hard to prove causation)
Experiment – the researcher is able to control factors and assign individuals to different treatments. (Can prove cause and effect).

Experimental Design:
Controlled Experiments – comparing several treatments. One is often a placebo.
Randomization – Subjects are assigned randomly to each treatment. Reduces bias.
Replication – Each treatment is applied to many units to reduce the effect of chance variation.
Double-blind process – Neither the researchers nor the subjects know which treatment the subjects are in – reduces unconscious bias.

Module 6

Point estimate – single value computed from a sample to estimate the value of a population parameter. Point estimates alone provide no information on accuracy of estimate.
Confidence Interval – range of values that is believed to contain the value of the population parameter.
Sample mean denoted by x̄
Population mean denoted by μ
Central Limit Theorem – for a population with mean μ and standard deviation σ, if the sample size n is large, the sampling distribution of the sample mean is approximately normal, centered at the population mean, with standard deviation equal to σ / sqrt(n).
To get more accurate estimates, we should increase the sample size.

Confidence Interval
95%: x̄ ± 1.96 * s / sqrt(n)
90%: x̄ ± 1.645 * s / sqrt(n)
99%: x̄ ± 2.576 * s / sqrt(n)

Reduce Margin of error:
- Reduce confidence level
- Increase sample size
- Reduce variation in s
(Sample size should be over 30)

Example: analysis shows with 95% confidence that the user’s mean time between strokes is between 0.225 and 0.375 seconds. The observed attempt has an average time of 0.39 seconds, which is above the 95% confidence interval. We conclude that the recent attempt cannot be explained by chance variation and therefore is likely an unauthorized intrusion.

Module 7

Sampling Distribution of Sample Proportions
We use proportions when dealing with categorical data – when there are two possible outcomes for variables.
For large samples, the sampling distribution of the sample proportion is approximately normal with mean p and standard deviation sqrt(p(1-p)/n).
In large samples, the std. deviation of the sample proportion is computed by replacing the population proportion p with the sample estimate p̂: sqrt(p̂(1-p̂)/n).

Large Sample Confidence Interval for population proportion p:
95%: p̂ ± 1.96 * sqrt(p̂(1-p̂)/n)
90%: p̂ ± 1.645 * sqrt(p̂(1-p̂)/n)
99%: p̂ ± 2.576 * sqrt(p̂(1-p̂)/n)

Module 8

Test of significance – uses a probability to measure how well the data support the hypothesis.
Null and Alternative Hypotheses
Null hypothesis – no difference
Alternative hypothesis – must be true if the null hypothesis is false.
Ex. Verify mean response time of call center is less than 15.

Null and Alternative Hypotheses on population average
H0: μ = μ0
Ha: μ < μ0, μ > μ0, or μ != μ0

Calculating a Test Statistic
z = (x̄ - μ0) / (s / sqrt(n))
How “far away” is the sample mean from the general mean? If the sample mean is 2 or 3 standard errors away from the general mean (|z| of 2 or 3), then it is far away.
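The "how far away" question can be sketched in code. This is a minimal illustration, assuming the usual large-sample form z = (sample mean - null mean) / (s / sqrt(n)); the function name and the call-center numbers are hypothetical, not taken from the course material.

```python
import math

def z_statistic(sample_mean, null_mean, sample_sd, n):
    """Number of standard errors between the sample mean and the null mean."""
    standard_error = sample_sd / math.sqrt(n)
    return (sample_mean - null_mean) / standard_error

# Hypothetical data: 36 calls, mean 13.5 minutes, s = 3 minutes,
# tested against a null (general) mean of 15 minutes.
z = z_statistic(sample_mean=13.5, null_mean=15.0, sample_sd=3.0, n=36)
print(z)  # the sample mean sits 3 standard errors below the null mean
```

A value of this size counts as "far away" under the 2-or-3 standard error rule of thumb.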
P-Value – probability that test statistic is equal
to or more extreme than the value obtained
from data when null hypothesis is true. The
smaller the p-value, the stronger evidence
against the null hypothesis.
- If the p-value is LESS than α = 0.05, the null hypothesis IS REJECTED at the 5% significance level. The test result is called “statistically significant”.
- If the p-value is LESS than α = 0.01, the null hypothesis IS REJECTED at the 1% significance level. The test result is called “highly significant”.
- If the p-value is LARGER than 0.05, the null hypothesis CANNOT BE REJECTED at the 5% significance level. The test is “not significant”.

For a one-sided test – Ha: μ < μ0 (population mean below the hypothesized value μ0)
Find the entry in the standard normal table corresponding to z*.
For a one-sided test – Ha: μ > μ0
Find the entry in the standard normal table corresponding to z* and compute 1 – that value.
For a two-sided test – Ha: μ != μ0
Find the entry in the table corresponding to -|z*| and compute 2 * that value.
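The three table lookups can be sketched with Python's standard library, where `statistics.NormalDist().cdf` plays the role of the standard normal table. The function name, the labels for the alternatives, and the value of z* are hypothetical choices for illustration.

```python
from statistics import NormalDist

def p_value(z, alternative):
    """p-value for a z statistic.

    alternative: 'less' (Ha below), 'greater' (Ha above), or 'two-sided'.
    """
    cdf = NormalDist().cdf  # standard normal table: area to the left of z
    if alternative == "less":
        return cdf(z)
    if alternative == "greater":
        return 1 - cdf(z)
    if alternative == "two-sided":
        return 2 * cdf(-abs(z))
    raise ValueError("alternative must be 'less', 'greater', or 'two-sided'")

z_star = -1.5  # hypothetical test statistic
p_less = p_value(z_star, "less")
p_greater = p_value(z_star, "greater")
p_two_sided = p_value(z_star, "two-sided")

# Decision at the 5% significance level for the two-sided test.
reject_at_5_percent = p_two_sided < 0.05
print(p_less, p_greater, p_two_sided, reject_at_5_percent)
```

With z* = -1.5 the two-sided p-value is larger than 0.05, so the null hypothesis would not be rejected at the 5% level.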

Example answers:
- The p-value is extremely small, so we can conclude that the data provide very strong evidence against the null hypothesis. The test is highly significant, indicating that the mean response time is significantly lower than 15 minutes.
- The p-value is 0.053, slightly larger than 0.05, so we cannot reject the null hypothesis at the 5% significance level. However, since the p-value < 0.10, we could reject it at the 10% significance level. Thus the test does not provide enough evidence at the 5% level to support the claim that there is a significant change in the mean percentage of visits from search engines. More data are necessary to evaluate the test hypotheses.

Confidence Interval based decision rule:
Construct a 95% confidence interval. Use the sample mean, not the general (hypothesized) mean μ0, to compute it.
If μ0 is NOT contained in the 95% C.I., then reject the null hypothesis at the 5% significance level in favor of Ha.
If μ0 IS contained, then you cannot reject the null hypothesis at the 5% significance level.
WHY? The sample data provide a range of plausible values for the general mean. If μ0 is not contained in the C.I., there is strong evidence that the population value is not equal to μ0, and therefore we can reject H0.
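The interval-based rule can be sketched as follows. This is a minimal illustration, assuming a large-sample 95% interval of the form sample mean ± 1.96 * s / sqrt(n) and a two-sided alternative; the function name and the data are hypothetical.

```python
import math

def ci_decision(sample_mean, sample_sd, n, null_mean, z_crit=1.96):
    """Return (lower, upper, reject) for the interval-based decision rule."""
    margin = z_crit * sample_sd / math.sqrt(n)
    lower = sample_mean - margin
    upper = sample_mean + margin
    # Reject H0 only when the null value falls outside the interval.
    reject = not (lower <= null_mean <= upper)
    return lower, upper, reject

# Hypothetical data: 36 calls, mean 13.5 minutes, s = 3 minutes, null mean 15.
lower, upper, reject = ci_decision(13.5, 3.0, 36, null_mean=15.0)
print(lower, upper, reject)
```

Here the null value of 15 lies above the upper end of the interval, so the rule rejects the null hypothesis at the 5% significance level.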
