Null hypothesis
in the general population there is no change, no difference, or no relationship
in the context of an experiment, the IV (treatment) has NO EFFECT on the dependent variable
Alternative/Scientific Hypothesis
in the general population there is a change, a difference, or a relationship
in the context of an experiment, the IV (treatment) HAS AN EFFECT on the dependent variable
Null hypothesis + alternative hypothesis
mutually exclusive and exhaustive, i.e. they cannot both be true, and one of them must be true
A larger sample size leads to a smaller standard error, which leads to a larger z-score; thus the
likelihood of rejecting the null hypothesis INCREASES
A smaller standard deviation produces a smaller standard error, which leads to a larger z-score; thus
the likelihood of rejecting the null hypothesis INCREASES
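A minimal sketch (with made-up numbers) of this chain, computing z = (M - mu) / (sigma / sqrt(n)), where the denominator is the standard error:

```python
# Sketch: how sample size and standard deviation drive the z-score
# through the standard error. All numbers are hypothetical.
import math

def z_score(sample_mean, pop_mean, sigma, n):
    """z = (M - mu) / (sigma / sqrt(n)); the denominator is the standard error."""
    standard_error = sigma / math.sqrt(n)
    return (sample_mean - pop_mean) / standard_error

# Same 2-point mean difference, larger n -> smaller SE -> larger z.
print(z_score(52, 50, 10, 4))    # SE = 5.0 -> z = 0.4
print(z_score(52, 50, 10, 100))  # SE = 1.0 -> z = 2.0 (easier to reject H0)

# Same n, smaller sigma -> smaller SE -> larger z.
print(z_score(52, 50, 5, 25))    # SE = 1.0 -> z = 2.0
```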
The Process of Statistical Inference
The more variability, the harder it will be to reject the null hypothesis
Significant
a result is significant if it is very unlikely to occur when the null hypothesis is true
null hypothesis is thus REJECTED
Statistical Power
the probability of correctly rejecting H0 (the null hypothesis) when H0 is false / when there is a real
treatment effect / when H1 is true
power = 1 - beta, where beta is the probability of committing a Type II error
Relationships
1. as effect size INCREASES, the probability of rejecting H0 increases, which means the power of
the test INCREASES
2. as the power of a test INCREASES, the probability of a type II error DECREASES
i.e. as the probability (of correctly rejecting a null hypothesis when there is a real treatment
effect) increases, the probability (of saying there is no treatment effect when there actually is)
decreases
3. as sample size INCREASES, the probability of rejecting the null hypothesis INCREASES, which
means the power of the test INCREASES
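Relationships 1 and 3 can be demonstrated with a short sketch; it assumes a one-tailed z-test, and the effect sizes and sample sizes are made up:

```python
# Sketch: power rises with effect size and with sample size.
# Assumes a one-tailed z-test; numbers are hypothetical.
import math
from scipy.stats import norm

def power_one_tailed_z(effect_size, n, alpha=0.05):
    """Power of a one-tailed z-test; effect_size = (mu1 - mu0) / sigma."""
    z_crit = norm.ppf(1 - alpha)        # rejection cutoff under H0
    shift = effect_size * math.sqrt(n)  # where the H1 distribution is centered
    return 1 - norm.cdf(z_crit - shift) # P(z > z_crit | H1 true) = 1 - beta

print(power_one_tailed_z(0.2, 25))   # small effect, small n -> low power (~0.26)
print(power_one_tailed_z(0.5, 25))   # larger effect         -> higher power (~0.80)
print(power_one_tailed_z(0.5, 100))  # larger n              -> higher still (~1.00)
```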
Confidence Intervals
a range of values that we feel confident will include the true population mean
Two-Tailed Test
the critical region of the distribution is divided between its two tails
whenever a two-tailed test is used, it means there is a nondirectional hypothesis: one that does not
predict the exact pattern of results
willing to accept extreme differences that go in either direction
Test Statistics
inferential statistics: statistics that can be used as indicators of what is going on in the population
also called test statistics because they can be used to evaluate results
The Mean
The sum of the scores divided by the number of scores
Requires scores that are numerical values measured on an INTERVAL or RATIO scale
Properties of the Mean
1. the mean is sensitive to the exact values of all the scores in the distribution (e.g. if any one
score changes, the mean changes)
2. the mean is very sensitive to extreme scores (e.g. an extreme high scoring outlier can pull
up the entire mean)
3. the mean is least subject to sampling variations
The mean won't work when
1. data was obtained using a NOMINAL or ORDINAL scale (i.e. discrete variables)
2. the distribution has a few extreme scores (or is skewed)
The Median
median is the midpoint of the list when scores are arranged from smallest to largest
50% of the scores in the distribution have values that are equal to or less than the median
Usually, the median can be found by a simple counting procedure
1. with an ODD # of scores, list the values in order, and median is the middle score on the list
2. with an EVEN # of scores, list the values in order, and the median is the halfway between
the middle two scores
The median won't work when:
1. data was obtained using a NOMINAL scale
2. The data set is very SMALL
The median is not affected by the presence of an outlier, i.e. it is a resilient measure of central tendency
The Mode
The most frequently occurring score in the distribution (or category, if the data are nominal)
Corresponds to the PEAK in the frequency distribution graph
Primary value: it can be used for data measured on a nominal, ordinal, interval, or ratio scale
The mode won't work when:
1. the distribution is rectangular or bimodal/multimodal
The mode is the only measure of central tendency which produces a # that actually appears in the distribution
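The contrast among the three measures, in particular the mean's sensitivity to an outlier versus the median's resilience, can be seen in a quick sketch with made-up scores:

```python
# Sketch: all three measures of central tendency on the same (hypothetical) data.
from statistics import mean, median, mode

scores = [2, 3, 3, 4, 5, 6, 30]  # note the extreme score (30)

print(mean(scores))    # ~7.57 -- pulled up by the outlier
print(median(scores))  # 4     -- resilient to the outlier
print(mode(scores))    # 3     -- the most frequent score, appears in the data
```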
Normal Distribution
mean = median = mode
Positively Skewed
mode is smaller than the median, which is smaller than the mean
mode < median < mean
Negatively Skewed
mean is smaller than the median, which is smaller than the mode
mean < median < mode
When to use the mean
generally considered the best measure of central tendency
computed using every score in the distribution
useful when the data fit a normal distribution
Measures of Variability
Variability
Statistical measure of the differences between scores in a distribution
The degree to which scores are spread out or close together/clustered together
Variability: the extent to which the scores in a distribution differ from each other
Measures of Variability
1. Range
difference between the largest and smallest scores in a set of data
however it does not reflect the precise amount of variability
2. Standard deviation
measures the standard distance from the mean
average deviation of scores from their mean
3. Variance
transforming variability into a standard form
measures the average squared deviation of scores from their mean
tells us how much scores are spread out, or dispersed, around the mean of the data
SS = sum of squared deviations
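A sketch with made-up scores tying the three measures together (population formulas assumed, i.e. SS is divided by N rather than N - 1):

```python
# Sketch: range, SS, variance, and standard deviation on hypothetical scores.
scores = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(scores)
m = sum(scores) / n                        # mean = 5.0

data_range = max(scores) - min(scores)     # 9 - 2 = 7
ss = sum((x - m) ** 2 for x in scores)     # sum of squared deviations = 32.0
variance = ss / n                          # SS / N = 4.0 (population formula)
std_dev = variance ** 0.5                  # sqrt(variance) = 2.0

print(data_range, ss, variance, std_dev)
```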
Chapter 14
Which test do I use?
Levels of Measurements
Nominal (Discrete)
a matter of distinguishing by name
classifies items into distinct categories with NO quantitative relationship to one another
provides the least information - refers to quality more than quantity
tells nothing about magnitude/order, and thus does not have equal intervals between values
e.g. binary category for computers of 0 and 1, course, religion, marital status, favorite TV show
e.g. 1 = male, 2 = female, height (short or not)
Ordinal (Discrete)
refers to order in measurement, order matters but the difference between values does not
indicates DIRECTION, in addition to providing nominal information
reflects differences only in magnitude, where magnitude is measured in the form of ranks
cannot be sure that the intervals between values are equal
scale has no true zero
e.g. movie ratings from * to *****, weight (underweight, average, overweight), Starbucks sizes (tall,
grande, venti), letter grades or UP grades
e.g. (express the amount of pain patients feel on a scale of 1 to 10) - a score of 7 means more pain
than a score of 5, and that is more than a score of 3. But the difference between the 7 and the 5 may
not be the same as that between the 5 and the 3. The values simply express an order.
Interval (Continuous)
ORDERED categories, possess EQUAL intervals, distance between values has MEANING
e.g. The difference between a temperature of 100 degrees and 90 degrees is the same
difference as between 90 degrees and 80 degrees.
NO TRUE ZERO POINT - a unit of measurement exists (values can be added and subtracted), but
zero does not signify absence of the characteristic
measures magnitude
e.g. time of day on a 12-hour clock, calendar dates, Celsius or Fahrenheit temperature (even if the
temperature is zero, it doesn't mean there is no temperature)
Ratio (Continuous)
has an absolute zero point (a point where none of the quality being measured exists; values below
zero are not meaningful)
equal intervals between all of its values
ratios of #s reflect ratios of magnitude
express relationships between values on these scales as ratios: we can say 2 minutes is twice as
long as 1 minute
e.g. ruler: inches or centimetres, income: money earned last year, years of work experience
Parametric vs Non-Parametric
PARAMETRIC
rely on assumptions about the parameters of the population represented by the sample, and test
hypotheses about those parameters
t-tests, ANOVAs
can only be used if your data allow you to compute means and variances
thus nominal/ordinal data won't work
NON-PARAMETRIC
do not require assumptions about population parameters nor do they test hypotheses about
population parameters; used when parametric assumptions are not met
nominal, ordinal data can be used
We can use data in the form of frequencies rather than numerical scores
means and variances cannot be computed
The Chi-Square Test
Chi-Square
chi-square (χ²): determines whether the frequencies of responses in our sample represent the
frequencies expected in the population
Not measuring a numerical score for each individual
Individuals are simply classified into categories
when H0 is true, χ² is expected to be small (near 0), i.e. the frequencies of responses in the sample do
not differ in any meaningful way from those we would expect in the population
as differences between expected and obtained frequencies become greater, the value of χ²
increases
when χ² is larger than the critical value, we reject H0 at p < 0.05
Observed frequency: the number of individuals from the sample who are classified in a particular
category. Each individual is counted in one and only one category.
Expected frequency: for each category, the frequency value that is predicted from the proportions
in the null hypothesis and the sample size (n). The expected frequencies define an ideal,
hypothetical sample distribution that would be obtained if the sample proportions were in perfect
agreement with the proportions specified in the null hypothesis.
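A sketch of a chi-square goodness-of-fit test using scipy; the frequencies are made up, and H0 assumes the four categories are equally likely:

```python
# Sketch: chi-square goodness-of-fit on hypothetical category frequencies.
from scipy.stats import chisquare

observed = [35, 25, 20, 20]  # observed frequencies, n = 100
expected = [25, 25, 25, 25]  # expected under H0 (equal proportions)

chi2, p = chisquare(f_obs=observed, f_exp=expected)
print(chi2, p)               # chi2 = 6.0, df = 4 - 1 = 3, p ~ 0.11
if p < 0.05:
    print("reject H0")
else:
    print("fail to reject H0")
```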
Degrees of Freedom
tells us how many members of a set of data could vary or change value without changing the value
of a statistic we already know for those data
e.g. if we know the mean of a set of data, the degrees of freedom tell us how many members of that
set can change without altering the value of the mean (if 3 scores have a mean of 10, any 2 scores
can vary freely, but the third is then fixed by the required sum of 30, so df = 2)
The t-test
Requirements
1. Data should be interval or ratio scale (since t-test is a PARAMETRIC test)
2. The values in the sample must consist of independent observations
3. The population that is sampled must be normally distributed
Degrees of Freedom
the greater the df, the smaller the critical value of t needed before you can reject H0
the critical value of t is LARGER when the degrees of freedom are SMALLER
the fewer degrees of freedom, the more difficult it will be to reject H0
the fewer the subjects, the higher the critical value of t
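A sketch of an independent-samples t-test with scipy; the two groups of scores are made up:

```python
# Sketch: independent-samples t-test on hypothetical scores from two groups.
from scipy.stats import ttest_ind

group_a = [12, 14, 15, 15, 16, 18]
group_b = [10, 11, 12, 13, 13, 14]

t, p = ttest_ind(group_a, group_b)  # df = n1 + n2 - 2 = 10
print(t, p)
if p < 0.05:
    print("reject H0: the group means differ")
```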
Effect Size
transform t values and dfs into a correlation coefficient, r, via r = √(t² / (t² + df))
Confidence Intervals
represents a range of values above and below our sample mean that is likely to contain the
population mean at a given probability level (usually 95% or 99%)
e.g. with a sample mean of 20 and a calculated 95% CI of +/- 3.20, we can be 95% confident that the
true population mean falls somewhere within that range, i.e. between 20 - 3.20 and 20 + 3.20
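The CI computation can be sketched with made-up scores, using the t distribution (as is usual when the population standard deviation is unknown):

```python
# Sketch: 95% confidence interval around a hypothetical sample mean.
import math
from scipy.stats import t

sample = [18, 19, 20, 20, 21, 22]
n = len(sample)
m = sum(sample) / n                                          # sample mean = 20.0
sd = math.sqrt(sum((x - m) ** 2 for x in sample) / (n - 1))  # sample std dev
se = sd / math.sqrt(n)                                       # standard error
margin = t.ppf(0.975, df=n - 1) * se                         # critical t * SE

print(f"95% CI: {m - margin:.2f} to {m + margin:.2f}")       # ~18.52 to ~21.48
```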
Analysis of Variance
evaluates the SIGNIFICANCE of the sample mean differences between two or more
treatments (or populations)
Outcome of an ANOVA tells us if the variation in the scores comes from the treatment (independent
variable) or just from chance differences.
with ANOVA (unlike running a series of t-tests), the Type I error rate is maintained at a manageable level
within-groups variability: the degree to which the scores of subjects in the SAME treatment group
differ from one another i.e. how much subjects vary from others in the group
size of the differences that exist inside each of the samples (i.e. differences expected without
any treatment effect)
between-groups variability: degree to which the scores of different treatment groups differ from
one another i.e. how much subjects vary across each different level/conditions of the IV
the size of the difference between the sample means i.e. treatment effects
when the treatment means differ across conditions, the means themselves are variable; ANOVA asks
whether this between-groups variability is larger than expected by chance (F-ratio = between-groups
variability / within-groups variability)
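A sketch of a one-way, independent-measures ANOVA with scipy; the three treatment groups are made up:

```python
# Sketch: one-way ANOVA on hypothetical scores from three treatment conditions.
from scipy.stats import f_oneway

treatment_1 = [4, 5, 6, 5]
treatment_2 = [7, 8, 8, 9]
treatment_3 = [10, 11, 12, 11]

f, p = f_oneway(treatment_1, treatment_2, treatment_3)
print(f, p)  # large F: between-groups variability >> within-groups variability
if p < 0.05:
    print("reject H0: at least one treatment mean differs")
```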
Getting one sample from the population, and measuring that same sample under the different
conditions (each treatment condition is now represented by the same sample)
What's new?
eliminates individual differences NATURALLY from the between-treatments variability - the same
participant is in every treatment condition
removes individual differences systematically by splitting within-treatments variability into
between-subjects (individual diffs) and ERROR (w/o indiv. diffs; just random, unsystematic factors)
the result: similar to the independent-measures F ratio but with all individual differences removed
F-ratio of repeated-measures differs from independent measures in that it includes NO
VARIABILITY CAUSED BY INDIVIDUAL DIFFERENCES
Sources of Variability
individual differences
result of procedures
all of these together are called ERROR: individual differences, undetected mistakes in recording
data, variations in testing conditions, and a host of extraneous variables
The Statistical Inference Process
assume that the samples of interest, or treatment groups, are DRAWN from the same population
assume that each of these samples is NORMALLY DISTRIBUTED on the DV (i.e. individual
scores are not too dispersed or different from each other)
Why? if too variable (i.e. std dev and variance too high), it's difficult to detect effects of IVs
large sample size is important to approximate normality
i.e. assume NO DIFFERENCE and NORMALITY
Treat the samples differentially: subject these samples to different levels of the IV
Afterwards, measure these samples on the DV, and compute statistics to test whether the samples are
now different from each other on this variable
CHI-SQUARE TEST OF INDEPENDENCE
IV: presence or absence of a food sample
DV: purchase decision (buy or not buy)
H0: There is no association or relationship between prior taste of a food product and deciding to
buy it
Expected frequencies correspond to the NULL hypothesis, i.e. there is no relationship between the
taste variable and the buy variable
Ergo, being given a taste does not make one more or less likely to buy
H1: There is an association or relationship between prior taste and decision to buy
Observed frequencies refer to ACTUAL frequencies obtained in study
alternative hypothesis: decision to buy is DEPENDENT ON being given prior taste of the food
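This test can be sketched with scipy; the 2×2 frequencies below are made up:

```python
# Sketch: chi-square test of independence on hypothetical 2x2 frequencies
# (tasted vs. not tasted x bought vs. did not buy).
from scipy.stats import chi2_contingency

#                 bought  did not buy
observed = [[30,  20],   # given a prior taste
            [15,  35]]   # not given a taste

chi2, p, df, expected = chi2_contingency(observed)
print(chi2, p, df)       # df = (2 - 1) * (2 - 1) = 1
if p < 0.05:
    print("reject H0: buying is associated with prior taste")
```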
Multiple ANOVA:
4 effect components
main effects of each IV
interaction effects between the IVs
effects due to individual differences (ID)
effects due to other sources of error
NHST (Null Hypothesis Significance Testing)
requires a specific, dichotomous, non-arbitrary hypothesis: if this (posited cause), then that (posited
effect)
IV: nominal
DV: scale
Margin of Error:
via confidence interval