
Basic statistics

Hui Bian Office for Faculty Excellence

Basic statistics
My contact information: Hui Bian, Statistics & Research Consultant Office for Faculty Excellence, 1413 Joyner library (temporary office) Email: bianh@ecu.edu Website: http://core.ecu.edu/ofe/StatisticsResearch/

Basic statistics
Statistics: a branch of mathematics used to summarize, analyze, and interpret a group of numbers and observations.
*It is a tool.
*It cannot replace your research design, your research questions, or the theory or model you want to use.

Basic statistics
Population: any group of interest, or any group that researchers want to learn more about.
Population parameters (unknown to us): characteristics of the population.
Sample: a group of individuals or data drawn from the population of interest.
Sample statistics: characteristics of the sample.

Basic statistics
We are much more interested in the population from which the sample was drawn. Example: 30 GPAs as a representative sample drawn from the population of GPAs of the freshmen currently in attendance at a certain university or the population of freshmen attending colleges similar to a certain university.

Basic statistics

Two types of statistics:
Descriptive statistics
Inferential statistics

Basic statistics
Descriptive statistics: procedures used to summarize, organize, and make sense of a set of scores or observations.

Basic statistics
Inferential statistics: procedures that allow researchers to infer or generalize observations made with samples to the larger population from which they were selected.

Descriptive statistics
Use descriptive statistics to describe, summarize, and organize a set of measurements.
Use descriptive statistics to communicate with other researchers and the public.
Descriptive statistics: central tendency and dispersion

Descriptive statistics
Measures of central tendency: statistical measures that locate a single score that is most representative of all scores in a distribution.
Mean
Median
Mode

Descriptive statistics
The notations used to represent population parameters and sample statistics are different. For example:
Population size: N
Sample size: n

Descriptive statistics
Mean
$\bar{x}$ (or M) denotes the sample mean and $\mu$ the population mean.
$\bar{x} = \frac{\sum x}{n}$
$\sum x$ means the sum of all individual scores $x_1, \ldots, x_n$.
$n$ means the number of scores.

Descriptive statistics
Example 1: we want to know how 25 students performed in math tests. Data are in the next slide.


Descriptive statistics
Score (X)   Frequency (f)   fX
60          1               60
65          2               130
70          3               210
75          4               300
80          5               400
85          4               340
90          3               270
95          2               190
100         1               100
Sum         25              2000

Descriptive statistics
How to calculate the mean for those 25 scores?
$\bar{x} = \frac{\sum fX}{n} = \frac{2000}{25} = 80.00$
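The calculation for Example 1 can be checked with a short script. This is a minimal sketch: the dictionary layout and the function name `mean_from_frequencies` are illustrative choices, not part of the original slides.

```python
# Frequency table from Example 1: score -> number of students with that score
freq = {60: 1, 65: 2, 70: 3, 75: 4, 80: 5, 85: 4, 90: 3, 95: 2, 100: 1}


def mean_from_frequencies(table):
    """Return sum(f * X) / n for a {score: frequency} table."""
    n = sum(table.values())                               # total number of scores
    total = sum(score * f for score, f in table.items())  # sum of fX
    return total / n


example1_mean = mean_from_frequencies(freq)
print(example1_mean)  # 80.0
```

Multiplying each score by its frequency avoids typing out all 25 individual scores.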


Descriptive statistics
Distribution of Example 1
Mean = 80


Descriptive statistics
Median
Data: 2, 3, 4, 5, 7, 10, 80. Mean of those scores is 15.86. 80 is an outlier. Mean fails to reflect most of the data. We use median instead of mean to remove the influence of an outlier. Median is the middle value in a distribution of data listed in a numeric order.


Descriptive statistics
Median
Position of median = $\frac{n+1}{2}$
For an odd-numbered sample size: 3, 6, 5, 3, 8, 6, 7. First place each score in numeric order: 3, 3, 5, 6, 6, 7, 8. Position = $\frac{7+1}{2}$ = 4. Median = 6

Descriptive statistics
Median
For an even-numbered sample size: 3, 6, 5, 3, 8, 6. First place each score in numeric order: 3, 3, 5, 6, 6, 8. Position = $\frac{6+1}{2}$ = 3.5, halfway between the third and fourth scores. Median = $\frac{5+6}{2}$ = 5.5
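The two median cases can be sketched in a few lines of Python. The function name `median_by_position` is a hypothetical helper written for this illustration; it sorts the scores and then picks the middle score (odd n) or averages the two middle scores (even n).

```python
def median_by_position(scores):
    """Median of a list of numbers using the (n + 1) / 2 position rule."""
    ordered = sorted(scores)   # place each score in numeric order
    n = len(ordered)
    mid = n // 2               # 0-based index of the upper middle score
    if n % 2 == 1:
        return ordered[mid]    # odd n: position (n + 1) / 2 is a whole number
    # even n: the position falls between two scores, so average them
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_median = median_by_position([3, 6, 5, 3, 8, 6, 7])
even_median = median_by_position([3, 6, 5, 3, 8, 6])
outlier_median = median_by_position([2, 3, 4, 5, 7, 10, 80])
print(odd_median, even_median, outlier_median)  # 6 5.5 5
```

The last call uses the outlier data set from the earlier slide: the median stays at 5 even though the mean is pulled up to 15.86.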

Example 2: we want to know average salary of 36 cases.



Descriptive statistics
Salary   Frequency
$20k     1
$25k     2
$30k     3
$35k     4
$40k     5
$45k     6
$50k     5
$55k     4
$200k    3
$205k    2
$210k    1
Total    36

Descriptive statistics
Median = ?
Position = (36 + 1) / 2 = 18.5
Which number is at position 18.5? In numeric order, positions 16 through 21 all hold $45k, so Median = $45k
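Example 2 can be reproduced by expanding the frequency table into 36 individual salaries. The sketch below uses Python's standard `statistics` module; salaries are stored in thousands of dollars, and the variable names are illustrative.

```python
from statistics import mean, median

# Salary (in $k) -> frequency, copied from the Example 2 table
salary_freq = {20: 1, 25: 2, 30: 3, 35: 4, 40: 5, 45: 6,
               50: 5, 55: 4, 200: 3, 205: 2, 210: 1}

# Expand the table into one entry per case (36 salaries in total)
salaries = [s for s, f in salary_freq.items() for _ in range(f)]

salary_median = median(salaries)  # average of positions 18 and 19
salary_mean = mean(salaries)      # pulled upward by the six salaries of $200k or more

print(len(salaries), salary_median, round(salary_mean, 2))
```

The mean comes out well above the median here, which is why the median is the better summary of these salaries.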


Descriptive statistics
Mode
The value in a data set that occurs most often.
Example: 2, 3, 3, 3, 4, 4, 4, 4, 7, 7, 8, 8, 8. Mode = 4
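A quick way to find the mode is to count how often each value occurs. This is a small sketch using `collections.Counter` from the Python standard library; it reports only one mode, so it assumes the data have a single most frequent value, as in the example.

```python
from collections import Counter

data = [2, 3, 3, 3, 4, 4, 4, 4, 7, 7, 8, 8, 8]

counts = Counter(data)                              # value -> number of occurrences
mode_value, mode_count = counts.most_common(1)[0]   # most frequent value and its count

print(mode_value, mode_count)  # 4 4
```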


Descriptive statistics
Dispersion (Variability): a measure of the spread of scores in a distribution.


Descriptive statistics
Compare different distributions


Descriptive statistics
Two sets of data have the same sample size, mean, and median. But they are different in terms of variability.


Descriptive statistics
Dispersion
Range
Variance
Standard deviation

Descriptive statistics
Range
It is the difference between the largest value and the smallest value.
It is informative for data without outliers.

Descriptive statistics
Variance
It measures the average squared distance that scores deviate from their mean.
Sample variance: $s^2$; population variance: $\sigma^2$ (sigma squared)

Descriptive statistics
How to calculate variance?
$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$ or $s^2 = \frac{SS}{n - 1}$
SS means sum of squares.
$n - 1$ is the degrees of freedom: the number of scores in a sample that are free to vary.

Descriptive statistics
Standard deviation (s, $\sigma$)
It is the square root of the variance: $s = \sqrt{s^2}$
It is roughly the average distance that scores deviate from their mean.

Descriptive statistics
Example 3: calculate standard deviation
Scores (x)   Frequency (f)   d                      d^2      f*d^2 (SS)
100          6               100 - 115.5 = -15.5    240.25   6 * 240.25
110          12              110 - 115.5 = -5.5     30.25    12 * 30.25
120          16              120 - 115.5 = 4.5      20.25    16 * 20.25
130          6               130 - 115.5 = 14.5     210.25   6 * 210.25
Sum          40                                              3390.0

Descriptive statistics
$s = \sqrt{\frac{3390}{40 - 1}} = 9.32$
$\bar{x} = 115.5$
Summary: When individual scores are close to the mean, the standard deviation (SD) is smaller.

Descriptive statistics
Summary
When individual scores are spread out far from the mean, the standard deviation is larger.
SD is always positive (or zero when all scores are identical).
It is typically reported with the mean.
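The Example 3 calculation can be sketched step by step in Python. The variable names are illustrative; the steps follow the slide: mean, deviations, sum of squares, division by n - 1, and square root.

```python
from math import sqrt

# Score -> frequency, copied from the Example 3 table
freq = {100: 6, 110: 12, 120: 16, 130: 6}

n = sum(freq.values())                                      # 40 scores
mean = sum(x * f for x, f in freq.items()) / n              # 115.5
ss = sum(f * (x - mean) ** 2 for x, f in freq.items())      # sum of squares, 3390.0
variance = ss / (n - 1)                                     # sample variance, df = n - 1
sd = sqrt(variance)                                         # sample standard deviation

print(mean, ss, round(sd, 2))  # 115.5 3390.0 9.32
```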

Descriptive statistics
Choosing a proper measure of central tendency depends on:
the type of distribution
the scale of measurement

Descriptive statistics
The mean describes data that are normally distributed and measured on an interval or ratio scale.
The median is used when the data are not normally distributed.

Descriptive statistics
Normal distribution/Normal curve
Data are symmetrically distributed around the mean, median, and mode.
Also called the symmetrical, Gaussian, or bell-shaped distribution.

Descriptive statistics
Normal curve


Descriptive statistics
Characteristics of normal distribution
The normal distribution is mathematically defined.
The normal distribution is theoretical.
The mean, median, and mode are all the same value at the center of the distribution.
The normal distribution is symmetrical.
The form of a normal distribution is determined by its mean and standard deviation.

Descriptive statistics
Characteristics of normal distribution
Standard deviation can be any positive value.
The total area under the curve is equal to 1.
The tails of the normal distribution always approach the x axis but never touch it.

Descriptive statistics
Normal distribution: the standard deviation indicates precisely how the scores are distributed.
Empirical rule: About 68% of all scores lie within one standard deviation of the mean. In other words, roughly two thirds of the scores lie within one standard deviation on either side of the mean.

Descriptive statistics
Normal distribution
About 95% of all scores lie within two standard deviations of the mean.
About 99.7% of all scores lie within three standard deviations of the mean.

Descriptive statistics
In other words, we have a 95% chance of selecting a score that is within 2 standard deviations of the mean.
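The empirical rule percentages can be verified against the standard normal curve. This sketch uses `statistics.NormalDist` from the Python standard library (available since Python 3.8); the helper name `area_within` is an illustrative choice.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1


def area_within(k):
    """Area under the normal curve within k standard deviations of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)


for k in (1, 2, 3):
    print(k, round(area_within(k), 4))
```

The areas are about 0.68, 0.95, and 0.997. Because any normal distribution can be rescaled to the standard one, the same proportions hold for every mean and standard deviation.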


Descriptive statistics
Normal distribution
[Figure: normal curve, area labeled .95]

Descriptive statistics
Descriptive statistics in SPSS
Frequencies
Descriptives
Explore

Inferential statistics
The goal of statistics is to make inferences about a population based upon information obtained in a sample. Hypothesis testing is the method we use to test a claim or hypothesis about a parameter in a population using observed data.

Inferential statistics
Steps of hypothesis testing
State a hypothesis.
Set the criterion.
Compare what we observe with what we expect.
Make a decision.

Inferential statistics
Five elements in a significance test
Assumptions
Hypotheses
Test statistic
P-value
Conclusion

Inferential statistics
Assumptions
Type of data
Form of the population distribution
Method of sampling
Sample size

Inferential statistics
State a hypothesis
Null hypothesis (H0): in hypothesis testing, we start by assuming the null hypothesis is true.
Alternative hypothesis: directly contradicts the null hypothesis.
Hypothesis testing is all about testing the null hypothesis.

Inferential statistics
Null hypothesis (H0): population values are NOT different from each other.
Example: H0: There is no difference in blood pressure between the treatment group and the control group among patients.
Example: H0: $\mu_1 = \mu_2$ or $\mu_1 - \mu_2 = 0$
Example: H0: two samples are drawn from the same population.

Inferential statistics
Alternative hypothesis (H1): population values are different from each other.
Example: H1: There is a difference in blood pressure between the treatment group and the control group among patients (nondirectional, two-tailed test).
Example: or H1: The blood pressure of the treatment group is lower than the blood pressure of the control group (directional, one-tailed test).

Inferential statistics
Or two samples are drawn from different populations.
Or H1: $\mu_1 - \mu_2 \neq 0$
Or H1: $\mu_1 - \mu_2 > 0$
Or H1: $\mu_1 - \mu_2 < 0$

Inferential statistics
The difference in blood pressure between treatment and control group is because of random error or chance (not statistically different). Or the difference is large enough to conclude that blood pressure values are statistically different between two groups or because of treatment effect.

Inferential statistics
Set the criterion: set the level of significance (a prespecified cutoff point)
Typically set at 0.05 ($\alpha$ level) or 0.01. The smaller the $\alpha$ level, the stronger the evidence must be to reject H0.

Inferential statistics
P-value
The p-value is the probability of obtaining a test statistic at least as extreme as the one computed from the sample data when the null hypothesis is true.
If the p-value is less than 5%, we reject the null hypothesis (why?).

Inferential statistics
Compute the test statistic
Test statistic: a value such as t or F (which one depends on the test used in the data analysis) that measures the extent of apparent departure from H0.
The value of the test statistic is used to make a decision regarding the null hypothesis: compare the test statistic to the critical value.

Inferential statistics
Obtain critical value: it depends on degree of freedom.
We need to look at a t test table, for example, to obtain the critical value. If the absolute value of the test statistic is less than the critical value, then you fail to reject the null hypothesis.

Inferential statistics
Make a decision
Compare the test statistic to the critical value.
p value: the probability of obtaining a test statistic at least as extreme as the one observed, given that the null hypothesis is true.
Significance: when p < .05, we reject the null hypothesis and say the result reaches significance.
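The decision rule can be sketched in code. As a simplifying assumption, this example uses the standard normal distribution in place of a t table, which is a reasonable approximation only for large samples; the function names `two_tailed_p` and `decide` are hypothetical.

```python
from statistics import NormalDist

std_normal = NormalDist()


def two_tailed_p(test_stat):
    """Two-tailed p value under the large-sample normal approximation."""
    return 2 * (1 - std_normal.cdf(abs(test_stat)))


def decide(test_stat, alpha=0.05):
    """Return (critical value, p value, reject H0?) for a two-tailed test."""
    critical = std_normal.inv_cdf(1 - alpha / 2)  # about 1.96 when alpha = .05
    p_value = two_tailed_p(test_stat)
    reject = p_value < alpha                      # same as abs(test_stat) > critical
    return critical, p_value, reject


for stat in (1.0, 2.5):
    critical, p_value, reject = decide(stat)
    print(stat, round(critical, 2), round(p_value, 3), reject)
```

Comparing the p value to alpha and comparing the test statistic to the critical value lead to the same decision.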

Inferential statistics
Critical region

(Neutens & Rubinson, 1997)



Inferential statistics
Types of errors
Type I error ($\alpha$): the probability of rejecting a null hypothesis that is actually true.
Type II error ($\beta$): the probability of retaining a null hypothesis that is actually false.

Truth in the population (null hypothesis)
Decision                   True                      False
Fail to reject the null    Correct ($1 - \alpha$)    Type II error ($\beta$)
Reject the null            Type I error ($\alpha$)   Correct ($1 - \beta$, power)

Inferential statistics
Type I and Type II errors are inversely related: for a given sample size, the smaller the $\alpha$ level, the larger the Type II error. To keep both errors low, a large sample size is important.

Inferential statistics
Power: the probability of rejecting H0 when it is in fact false.
Power = $1 - \beta$ ($\beta$ is the Type II error rate)
Statistical power is the ability to detect a true effect.

Inferential statistics
Power If statistical power is high, the probability of making a Type II error, or concluding there is no effect when, in fact, there is one, goes down.

Inferential statistics
Power
For example, 80% power in a clinical trial means that the study has an 80% chance of ending up with a p value of less than 5% in a statistical test (i.e., a statistically significant treatment effect) if there really was an important difference between treatments.

Inferential statistics
Effect size Example: how much of an effect/a difference the intervention had/made (magnitude of intervention effect). We want to know if the intervention effect is large or small, meaningful or trivial.

Inferential statistics
Effect size
Mean difference
Correlation coefficient
Odds ratio
R2

Inferential statistics
The relationship between effect size, power, and sample size
When effect size increases, power increases.
When sample size increases, power increases.
Example: Cohen's d = $\frac{M_1 - M_2}{s_{pooled}}$
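The relationship between effect size, sample size, and power can be illustrated numerically. This is a sketch under stated assumptions: the pooled standard deviation uses the usual formula weighted by degrees of freedom, and power is approximated with a two-tailed normal (z) test for two equal-sized groups rather than an exact t-based calculation. The function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def cohens_d(group1, group2):
    """Cohen's d = (M1 - M2) / s_pooled for two independent samples."""
    n1, n2 = len(group1), len(group2)
    pooled_var = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    return (mean(group1) - mean(group2)) / sqrt(pooled_var)


def approx_power(d, n_per_group, alpha=0.05):
    """Approximate power of a two-tailed, two-sample z test (equal group sizes)."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = abs(d) * sqrt(n_per_group / 2)  # expected size of the test statistic
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


d_example = cohens_d([2, 4, 6], [1, 3, 5])  # made-up scores, for illustration only
print(round(d_example, 2))
for d, n in ((0.5, 64), (0.5, 128), (0.8, 64)):
    print(d, n, round(approx_power(d, n), 3))
```

With d = 0.5 and 64 cases per group, the approximation gives power of about 0.8; increasing either the effect size or the sample size raises it.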

Inferential statistics
Test means: t tests and Analysis of variance
T tests
One-sample t test
Independent-samples t test
Paired-samples t test

Inferential statistics
Test means: t tests and Analysis of variance
Analysis of variance (ANOVA)
One-way/two-way between-subject design
One-way/two-way within-subject design
Mixed design

Inferential statistics
Correlation
Linear regression
Non-parametric tests
Chi-square tests
Sign test
Wilcoxon signed-rank test
Mann-Whitney U test
Kruskal-Wallis H test
Friedman test

Inferential statistics
SPSS demonstration


Inferential statistics
T-test for two independent sample means
Example: we want to know if there is a gender (Q2) difference in height.
H0: $\mu_1$ (Female) = $\mu_2$ (Male); H1: $\mu_1$ (Female) $\neq$ $\mu_2$ (Male)

Inferential statistics
Think about the following.
The mean difference between two groups can be due to chance: sampling and measurement error.
Tests and measuring instruments used to collect data are not perfect.

Inferential statistics
Calculate t value: SPSS can do that for us.
$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$
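For readers without SPSS at hand, the t value can also be computed directly. This sketch assumes the slide's formula is the unpooled form, t = (mean1 - mean2) / sqrt(s1^2/n1 + s2^2/n2); the height values below are made up for illustration and are not the YRBSS data.

```python
from math import sqrt
from statistics import mean, variance


def independent_t(sample1, sample2):
    """t = (mean1 - mean2) / sqrt(s1^2 / n1 + s2^2 / n2) for two independent samples."""
    n1, n2 = len(sample1), len(sample2)
    standard_error = sqrt(variance(sample1) / n1 + variance(sample2) / n2)
    return (mean(sample1) - mean(sample2)) / standard_error


# Hypothetical heights in meters (illustration only)
female_heights = [1.58, 1.62, 1.65, 1.60, 1.67, 1.63]
male_heights = [1.72, 1.78, 1.75, 1.80, 1.70, 1.76]

t_value = independent_t(female_heights, male_heights)
print(round(t_value, 2))
```

The sign of t depends only on which group is listed first: a negative value here means the first group's mean is lower.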


Inferential statistics
T-test for two independent sample means

t = -98.15 < $t_{critical}$ = -1.96 (critical value), p < .05. There is a significant difference in height between females and males.
When sample size is greater than 120, $t_{critical}$ is approximately 1.96 at $\alpha$ = 0.05 (two-tailed).

Descriptive statistics
Confidence interval (CI) for a Mean A CI for a parameter is a range of numbers within which the parameter is believed to fall.


Descriptive statistics
Standard error is the standard deviation of a sampling distribution of sample means. It is the average distance that sample mean values deviate from the value of the population mean.

Descriptive statistics
How to calculate CI?
Compute the sample mean and standard error.
Choose the level of confidence and find the critical value at that level.
Compute the estimation formula to find the confidence interval.
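The three steps above can be sketched as follows. Two assumptions are made for illustration: the standard error is estimated as s / sqrt(n), and the critical value is taken from the standard normal distribution (about 1.96 at the 95% level), which suits large samples; for small samples a t table value would be used instead. The sample data are made up.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(sample, confidence=0.95):
    """Return (lower, upper) bounds: sample mean +/- critical value * standard error."""
    n = len(sample)
    sample_mean = mean(sample)                      # step 1: sample mean
    standard_error = stdev(sample) / sqrt(n)        # step 1: standard error
    alpha = 1 - confidence
    critical = NormalDist().inv_cdf(1 - alpha / 2)  # step 2: critical value
    margin = critical * standard_error              # step 3: estimation formula
    return sample_mean - margin, sample_mean + margin


# Hypothetical heights in meters (illustration only)
heights = [1.58, 1.62, 1.65, 1.60, 1.67, 1.63, 1.72, 1.78, 1.75, 1.80]

lower95, upper95 = mean_confidence_interval(heights, 0.95)
lower99, upper99 = mean_confidence_interval(heights, 0.99)
print(round(lower95, 3), round(upper95, 3))
```

Raising the confidence level from 95% to 99% widens the interval, because a larger critical value is needed.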

Descriptive statistics
95% CI = sample mean ± critical value at the 95% level ($\alpha$ = .05) × standard error
Example 4: two-independent-sample t-test, gender differences in height.
We use YRBSS 2011 data: Q2 (gender) and Q6 (height).

Descriptive statistics
Two-independent-sample t-test: go to Analyze > Compare Means > Independent-Samples T Test

Descriptive statistics
Click Options > choose 95%

Descriptive statistics
Output


Descriptive statistics
In this case, critical value = 1.96 (check the t distribution table, df = $\infty$).
What do we learn from the 95% CI of the mean difference?

Basic statistics
References
Agresti, A., & Finlay, B. (1997). Statistical methods for the social sciences. Upper Saddle River, NJ: Prentice Hall, Inc.
Neutens, J. J., & Rubinson, L. (1997). Research techniques for the health sciences. Needham Heights, MA: Allyn & Bacon.
Privitera, G. J. (2012). Statistics for the behavioral sciences. Thousand Oaks, CA: SAGE Publications, Inc.
