Outline
Concepts:
Descriptive statistics vs. Inferential
statistics
Sample vs. Population
Sampling Procedure
Qualitative vs. Quantitative variables
Discrete vs. Continuous variables
Sample Mean, Median, Range, Variance,
Standard Deviation
A Glimpse of Motivation
What is Statistics?
Branch of mathematics, dealing with the
collection and analysis of data, leading to
statistical inference
3 keywords: mathematics, data analysis,
statistical inference
Statistical inference: deducing a general
conclusion based on collected samples.
Accepting/rejecting a hypothesis
Deriving estimates
Quality assurance engineer: perform statistical investigation, e.g., using ANOVA
Variations in Data
Two sources of variations:
Variation over time/space: differences between the value observed at one
point in time (or space) and the value observed at another
Ideally,
If the observed values in a process were always the same and were
always on target, there would be no need for statistical method.
If in one batch of thermometers produced, the thermometers (used on
the same person at the same time with the same environment condition)
always gave the same value and the value was accurate (correct),
no statistical analysis to evaluate the products is needed.
However,
Our data tend to vary, so we need statistical methods to estimate, as
closely as possible, the actual value(s) behind our data.
Samples: collection of observations taken from a population.
Example:
During the campaign period, campaign managers conducted a survey
to understand the conditions of the voters. The population was all
Indonesian citizens with the right to vote in the 2014 presidential
election. The sample was a certain number of Indonesian citizens
located in various regions of Indonesia, with different ages,
genders, and occupations.
Example Scenario:
Consider a market researcher for a soft drink company who
might want to determine the sweetness preferences of
Americans between the ages of 15 and 25.
Obviously, gathering data from every individual in this
population would be nearly impossible and prohibitively
expensive.
It would be more practical to collect data from a subset,
or sample, of the population.
If the sample is unbiased, the sample data can be used
to make inferences about the population.
Sampling Procedure
Population: every American in the age group 15 to 25.
Sample: In order for a sample to be unbiased, it must be:
1) representative of the population
2) randomly selected
everybody in the population has an equal chance of being
selected for the sample
If the sample is only from 1 city or 1 school, the
conclusion is only applicable to that city/school and a
narrower age group
3) sufficiently large
If the research only involved 3 respondents, would you
trust the result? Why?
A small sample is sensitive to bias (a single wrong answer
largely affects the final result)
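To make the sample-size point concrete, here is a minimal sketch. The numbers are hypothetical (a true answer of 5.0 and one wrong answer of 8.0 are assumptions, not survey data), and the helper name is made up for illustration. It compares how far a single wrong answer moves the mean of 3 respondents versus 300 respondents.

```python
def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


def shift_from_one_wrong_answer(n, true_value, wrong_value):
    """Change in the sample mean when one of n identical answers is wrong.

    Hypothetical helper for illustration only.
    """
    clean = [true_value] * n
    corrupted = [wrong_value] + clean[1:]
    return mean(corrupted) - mean(clean)


# Assumed values: every respondent should answer 5.0, one answers 8.0 by mistake.
small_shift = shift_from_one_wrong_answer(3, 5.0, 8.0)
large_shift = shift_from_one_wrong_answer(300, 5.0, 8.0)

print(f"n = 3:   mean shifts by {small_shift:.2f}")
print(f"n = 300: mean shifts by {large_shift:.2f}")
```

The shift equals the size of the error divided by n, so the same mistake matters 100 times less in the larger sample.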
Sampling Procedure
Random Selection:
1) Simple Random Sampling
any particular sample has the same chance of being
selected as any other sample.
Sampling Procedure
Example: taking a sample of 50 from the Binus
International population
1) Simple Random Sampling
Randomly take any 50 respondents
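Simple random sampling can be sketched in a few lines of Python. This is only an illustration: the population of 1000 ID numbers and the seed are assumptions, not the real list of people.

```python
import random

# Hypothetical sampling frame: ID numbers standing in for the population.
population = list(range(1, 1001))

# Fixed seed (an arbitrary choice) so the draw can be reproduced.
rng = random.Random(42)

# random.sample draws without replacement, so every possible
# group of 50 IDs has the same chance of being chosen.
sample = rng.sample(population, 50)

print(len(sample), "respondents selected, no one chosen twice:",
      len(set(sample)) == len(sample))
```

Drawing without replacement matters here: the same respondent cannot be counted twice.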
Sampling Procedure
Non-Random Selection:
1) Convenience Sampling / Accidental Sampling
the units that are selected for inclusion in the sample are
the easiest to access
Example: the first 50 respondents (but be careful, most of
them may come from IS program)
2) Systematic Sampling
the researcher first randomly picks the first item or
subject from the population. Then, the researcher
selects every nth subject from the list.
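The two procedures can be compared with a small sketch. The list of 1000 ID numbers, the step of 20, and both function names are assumptions made for illustration.

```python
import random


def convenience_sample(frame, size):
    """Take the first `size` units: easy to reach, but not random."""
    return frame[:size]


def systematic_sample(frame, step, rng):
    """Pick a random start among the first `step` units,
    then take every `step`-th unit after it."""
    start = rng.randrange(step)
    return frame[start::step]


# Hypothetical sampling frame of 1000 ID numbers.
frame = list(range(1, 1001))
rng = random.Random(7)  # arbitrary seed for reproducibility

first_50 = convenience_sample(frame, 50)
every_20th = systematic_sample(frame, 20, rng)

print("convenience:", first_50[:5], "...")
print("systematic: ", every_20th[:5], "...")
```

The convenience sample only ever covers the start of the list, while the systematic sample is spread evenly across the whole list.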
Example
Two samples of 10 northern red oak seedlings were planted in a
greenhouse, one containing seedlings treated with nitrogen
and the other containing seedlings with no nitrogen. All other
environmental conditions were held constant. All seedlings
contained the fungus Pisolithus tinctorius.
The stem weights in grams were recorded after 140 days.
$\bar{x}$ (nitrogen) = ?
$\bar{x}$ (no nitrogen) = ?
Mean (nitrogen) = 0.565
Median = 0.635
Before trimming:
But we know that the two data sets are not identical!
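Range:
$\text{range} = x_{\max} - x_{\min}$
Sample variance (n-1 degrees of freedom):
$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
Sample standard deviation:
$s = \sqrt{s^2} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$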
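As a worked illustration, the sample mean, median, range, variance (with the n-1 divisor), and standard deviation can be computed as follows. The six values are made up for this sketch; they are not the seedling data.

```python
import statistics

# Hypothetical observations, chosen so the arithmetic is easy to follow.
data = [2.0, 4.0, 4.0, 5.0, 7.0, 8.0]
n = len(data)

mean = sum(data) / n                      # 30 / 6 = 5.0
median = statistics.median(data)          # middle of sorted data: (4 + 5) / 2
data_range = max(data) - min(data)        # 8 - 2

# Sum of squared deviations from the mean, divided by n - 1.
variance = sum((x - mean) ** 2 for x in data) / (n - 1)   # 24 / 5
std_dev = variance ** 0.5

print(f"mean={mean}, median={median}, range={data_range}")
print(f"variance={variance:.3f}, std={std_dev:.3f}")
```

Python's `statistics.variance` and `statistics.stdev` use the same n-1 divisor, so they can be used to check the hand computation.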
Why n-1?
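One way to explore this question is by simulation. The sketch below is not a proof. It assumes a population that is normal with variance 1, a sample size of 5, and an arbitrary seed. It repeatedly draws samples and averages the variance computed with divisor n and with divisor n-1.

```python
import random


def variance_with_divisor(values, divisor):
    """Sum of squared deviations from the sample mean, over `divisor`."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values) / divisor


rng = random.Random(0)   # arbitrary seed for reproducibility
n = 5                    # assumed sample size
trials = 20000           # number of repeated samples

total_n = 0.0
total_n_minus_1 = 0.0
for _ in range(trials):
    # Assumed population: normal with mean 0 and variance 1.
    sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
    total_n += variance_with_divisor(sample, n)
    total_n_minus_1 += variance_with_divisor(sample, n - 1)

avg_n = total_n / trials
avg_n_minus_1 = total_n_minus_1 / trials

print(f"average with divisor n:   {avg_n:.3f}")
print(f"average with divisor n-1: {avg_n_minus_1:.3f}")
```

Because deviations are measured from the sample mean rather than the unknown population mean, dividing by n tends to underestimate the population variance; on average the n-1 version lands close to the true value of 1.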
Summary
Concepts:
Descriptive statistics vs. Inferential
statistics
Sample vs. Population
Sampling Procedure: random & nonrandom
Qualitative vs. Quantitative variables
Discrete vs. Continuous variables
Sample Mean, Median, Range, Variance,
Standard Deviation
Exercise
Sample size: 15
Mean: 3.78
Median: 3.6
Trimmed: 2.5, 2.8, 2.8 and 5.6, 5.2, 4.8
Trimmed mean: 3.68
Variance: 0.943
Std: 0.97
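The trimmed mean used in this exercise drops the same number of smallest and largest observations before averaging. A minimal sketch follows; the function name and the five values are hypothetical, not the exercise data.

```python
def trimmed_mean(values, k):
    """Mean after removing the k smallest and k largest values.

    Hypothetical helper for illustration; requires 2 * k < len(values).
    """
    if 2 * k >= len(values):
        raise ValueError("k is too large for this sample")
    ordered = sorted(values)
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)


# Made-up data with one extreme value.
data = [1.0, 2.0, 3.0, 4.0, 100.0]

plain = sum(data) / len(data)        # pulled upward by the value 100
trimmed = trimmed_mean(data, 1)      # averages 2, 3, 4 only

print(f"mean={plain}, trimmed mean={trimmed}")
```

In the exercise, removing three values from each end of a sample of 15 corresponds to `k = 3`.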