Week 1 Intro To Statistics

Week 1: Introduction to
Statistics & Data Analysis

Arfika Nurhudatiana
Outline
Concepts:
Descriptive statistics vs. Inferential
statistics
Sample vs. Population
Sampling Procedure
Qualitative vs. Quantitative variables
Discrete vs. Continuous variables
Sample Mean, Median, Range, Variance,
Standard Deviation
A Glimpse of Motivation
What is Statistics?
Branch of mathematics, dealing with the
collection and analysis of data, leading to
statistical inference
3 keywords: mathematics, data analysis,
statistical inference
Statistical inference: deducing a general
conclusion based on collected samples.
Accepting/rejecting a hypothesis
Deriving estimates
Example: manufacturing industry

In the manufacturing industry, it is common to have the following roles:
Process engineers: monitor the different processes involved (one engineer per process)
Product engineers: monitor the output (one engineer per product)
Quality assurance engineers: perform statistical investigations, e.g., using ANOVA
The result of the investigation allows the company to determine the modifications necessary to keep the process at a desired level of quality.
Variations in Data
Two sources of variations:
Variation over time/space: the difference between the value observed at one point in time (or space) and the value observed at another
Variation in measurement: the difference between the observed value and the true value
Ideally,
If the observed values in a process were always the same and always on target, there would be no need for statistical methods.
If every thermometer in a production batch (used on the same person at the same time under the same environmental conditions) always gave the same value, and that value was accurate (correct), no statistical analysis would be needed to evaluate the products.
However,
Our data tend to vary, and thus we need statistical methods to estimate, as closely as possible, the actual value(s) behind our data.
Descriptive vs. Inferential Statistics

Descriptive statistics: statistics that help describe or characterize the nature of the dataset. They provide simple summaries about the sample and the measures.
Example: a student has completed 100 SCUs.
Measures of central tendency: GPA (average score) gives a hint of the student's overall performance.
Measures of spread: we may also be interested in how many As, Bs, and Cs he/she has and in which semester he/she got them.
Inferential statistics: statistics used to reach conclusions that extend beyond the immediate data alone.
Example: is his/her GPA within the range of graduates who usually get jobs immediately after graduation?
Sample vs. Population

Population: the collection of all individual items of a particular type.
Samples: collections of observations taken from a population.
Example:
During the campaign period, campaign managers conducted a survey to understand the conditions of the voters. The population was Indonesian citizens who had the right to vote in the 2014 presidential election. The samples were a number of Indonesian citizens located in various regions of Indonesia, with different ages, genders, and occupations.
Populations and samples can also be students at Binus International, trees in the forest, fish in the sea, manufactured products, etc.
Example Scenario:
Consider a market researcher for a soft drink company who
might want to determine the sweetness preferences of
Americans between the ages of 15 and 25.
Obviously, gathering data from every individual in this
population would be nearly impossible and prohibitively
expensive.
It would be more practical to collect data from a subset,
or sample, of the population.
If the sample is unbiased, the sample data can be used
to make inferences about the population.
Sampling Procedure
Population: every American aged 15 to 25.
Sample: In order for a sample to be unbiased, it must be:
1) representative of the population
it fulfils the criteria above: Americans aged between 15 and 25
2) randomly selected
everybody in the population has an equal chance of being selected for the sample
If the sample comes from only 1 city or 1 school, the conclusion is applicable only to that city/school and a narrower age group
3) sufficiently large
If the research involved only 3 respondents, would you trust the result? Why?
A small sample is sensitive to bias (1 wrong answer largely affects the final result)
Various Sampling Procedures
Sampling Procedure
Random Selection:
1) Simple Random Sampling
any particular sample has the same chance of being selected as any other sample of the same size.
2) Stratified Random Sampling
Used when the sampling units are not homogeneous overall but fall naturally into non-overlapping groups/segments that are each homogeneous
These groups are called strata
Stratified random sampling means random selection of a sample within each stratum (the singular of strata).
The purpose is to ensure that no stratum is underrepresented (or overrepresented).
3) Cluster Random Sampling
Random sampling within clusters (subgroups)
Sampling Procedure
Example: taking 50 samples from Binus
International population
1) Simple Random Sampling
Randomly take any 50 respondents
2) Stratified Random Sampling

Binus International population is composed of 40% males
and 60% females
Randomly take 20 male respondents & 30 female
respondents to be sure that each of the strata is not
underrepresented (or overrepresented).
3) Cluster Random Sampling

There are 4 major faculties: computing, communication &
film, business, HTM
Take 11-13 respondents from each faculty
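The first two procedures in the example above can be sketched in Python. This is only an illustration under stated assumptions: the population of 1,000 students, the record layout, and the function names are hypothetical; only the 40%/60% split and the sample size of 50 come from the example.

```python
import random

# Hypothetical sampling frame: 1,000 students, 40% male and 60% female,
# mirroring the proportions in the example. IDs and size are made up.
population = [{"id": i, "gender": "M" if i < 400 else "F"} for i in range(1000)]

def simple_random_sample(units, n, rng):
    """Draw n units without replacement; every subset of size n is equally likely."""
    return rng.sample(units, n)

def stratified_random_sample(units, key, n, rng):
    """Draw randomly within each stratum, in proportion to the stratum's size."""
    strata = {}
    for unit in units:
        strata.setdefault(unit[key], []).append(unit)
    sample = []
    for members in strata.values():
        quota = round(n * len(members) / len(units))
        sample.extend(rng.sample(members, quota))
    return sample

rng = random.Random(2014)  # fixed seed so the sketch is repeatable
srs = simple_random_sample(population, 50, rng)
stratified = stratified_random_sample(population, "gender", 50, rng)
males = sum(1 for s in stratified if s["gender"] == "M")
print(len(srs), len(stratified), males)  # 50 50 20
```

The simple random sample may contain any mix of males and females, while the stratified sample always contains exactly 20 males and 30 females.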
Sampling Procedure
Non-Random Selection:
1) Convenience Sampling / Accidental Sampling
the units that are selected for inclusion in the sample are
the easiest to access
Example: the first 50 respondents (but be careful, most of
them may come from IS program)
2) Systematic Sampling
the researcher first randomly picks the first item or
subject from the population. Then, the researcher
selects every nth subject from the list.
3) Purposive Sampling / Judgmental Sampling

Usually involves small sample size
Usually in the form of qualitative/investigative research
to focus on particular characteristics of a population that
are of interest
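Systematic sampling, as described in item 2 above, can be sketched as follows. The roster of 500 student numbers and the function name are hypothetical, and choosing the random start within the first interval is an assumption of this sketch rather than something the slides specify.

```python
import random

def systematic_sample(units, n, rng):
    """Pick a random start within the first interval, then take every k-th unit.

    Assumes len(units) >= n so that the interval k is at least 1.
    """
    k = len(units) // n        # sampling interval ("every nth subject")
    start = rng.randrange(k)   # random first pick within the first interval
    return [units[start + i * k] for i in range(n)]

roster = list(range(1, 501))   # hypothetical list of 500 student numbers
sample = systematic_sample(roster, 50, random.Random(7))
print(len(sample), sample[1] - sample[0])  # 50 10
```

Only the starting point is random; once it is fixed, the rest of the sample is fully determined by the list order.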
Qualitative vs. Quantitative Variables

Quantitative variables: measure values or counts and are expressed as numbers (can be discrete or continuous).
Example: how many children do you have? How often do you go shopping?
Qualitative (categorical) variables: measure 'types' and may be represented by a name, a symbol, or a number code.
Example: which major do you study? What is your occupation?
Qualitative variables can be nominal (no order/ranking sequence) or ordinal (ordered, e.g., like, neutral, dislike).
Discrete vs. Continuous Variables

Discrete variables are countable in a finite amount of time.
Example: the number of students in a classroom (whole numbers; int).
Continuous variables are usually obtained by measuring.
Example: length, weight, and time. Since continuous variables are real numbers, we usually round them (double).
Measures of Location: The Sample Mean and Median
Measures of location are designed to provide the analyst with quantitative values that indicate where the centre, or some other location, of the data lies.
1) Mean: average value
Measures of Location: The Sample Mean and Median
2) Median
The purpose of the sample median is to reflect the central tendency of the
sample in such a way that it is uninfluenced by extreme values or outliers.
Median and mean can be quite different.
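A small sketch of how far apart the two measures can be. The income figures are hypothetical and chosen only so that one value is an obvious outlier; `mean` and `median` come from Python's standard `statistics` module.

```python
from statistics import mean, median

# Hypothetical monthly incomes; the last value is an outlier.
incomes = [4, 5, 5, 6, 7, 8, 100]

print(round(mean(incomes), 2))  # 19.29, pulled up by the single extreme value
print(median(incomes))          # 6, the middle value after sorting

# Without the outlier the two measures are close to each other.
typical = incomes[:-1]
print(round(mean(typical), 2))  # 5.83
print(median(typical))          # 5.5
```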
Example
Two samples of 10 northern red oak seedlings were planted in a
greenhouse, one containing seedlings treated with nitrogen
and the other containing seedlings with no nitrogen. All other
environmental conditions were held constant. All seedlings
contained the fungus Pisolithus tinctorus.
The stem weights in grams were recorded after 140 days.
$\bar{x}$ (nitrogen) = ?
$\bar{x}$ (no nitrogen) = ?
$\tilde{x}$ (nitrogen) = ?
$\tilde{x}$ (no nitrogen) = ?
Mean (nitrogen) = 0.565
Median = 0.635
Which group has healthier stems (higher stem weights)?
Other measures of location:
A trimmed mean is computed by trimming away a certain percentage of both the largest and the smallest values.
For example, the 10% trimmed mean is found by eliminating the largest 10% and the smallest 10% of the values and computing the average of the remaining values.
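The trimming step can be sketched in a few lines of Python. The function name and the ten measurements are hypothetical, and rounding the number of trimmed values down is an assumption of this sketch.

```python
def trimmed_mean(values, percent):
    """Mean after dropping `percent` % of the values from each end.

    Assumes percent < 50 so that at least one value remains.
    """
    ordered = sorted(values)
    k = len(ordered) * percent // 100   # values dropped per end (rounded down)
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)

# Hypothetical sample of 10 measurements with one unusually large value.
data = [2, 4, 4, 5, 5, 6, 6, 7, 8, 23]
print(sum(data) / len(data))    # 7.0, the ordinary mean
print(trimmed_mean(data, 10))   # 5.625, after dropping 2 and 23
```

With 10 values, a 10% trim removes one value from each end, so the extreme value 23 no longer pulls the average upward.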
Before trimming:
What do you observe?
- The mean changes slightly; the median does not change.
- The mean gives more detailed information (it is more sensitive to variations).
Sample mean vs. Population mean

Is what we just calculated the sample mean or the population mean?
The sample mean gives incomplete information (it is true for the collected sample only, not for the real population).
However, by collecting the right samples, we expect the sample mean to be as near as possible to the population mean.
Therefore, in future chapters, the sample mean is calculated as an estimate of the population mean.
Measures of Variability: The Sample Range, Standard Deviation, and Variance
Data Set 1: 3, 5, 7, 10, 10
Data Set 2: 7, 7, 7, 7, 7
What are the mean and median of each data set above?
But we know that the two data sets are not identical!
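A quick check of the two data sets, using Python's standard `statistics` module (the variable names are illustrative only):

```python
from statistics import mean, median

data_set_1 = [3, 5, 7, 10, 10]
data_set_2 = [7, 7, 7, 7, 7]

for name, data in (("Data Set 1", data_set_1), ("Data Set 2", data_set_2)):
    print(name, "mean =", mean(data), "median =", median(data))
```

Both data sets have a mean of 7 and a median of 7, so measures of location alone cannot tell them apart.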
Measures of Variability: The Sample Range, Standard Deviation, and Variance
How data points differ from the mean can be measured using
variance, standard deviation, and range.
The variance, standard deviation, and range are basically
measures of spread.
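Sample variance ($n - 1$ degrees of freedom):
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$
Sample standard deviation:
$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$
Sample range:
$$\text{range} = x_{\max} - x_{\min}$$
The average of the squared deviations about the mean is called the sample variance.
Q: Why squared? Why n-1?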
If not squared, the numerator always equals 0, because the negative deviations about the mean always cancel out the positive deviations about the mean.
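The cancellation can be checked directly on Data Set 1 (3, 5, 7, 10, 10). This is a minimal sketch; the variable names are illustrative.

```python
data = [3, 5, 7, 10, 10]        # Data Set 1 from the slides
x_bar = sum(data) / len(data)   # 7.0

deviations = [x - x_bar for x in data]
print(deviations)                        # [-4.0, -2.0, 0.0, 3.0, 3.0]
print(sum(deviations))                   # 0.0, negatives cancel positives
print(sum(d ** 2 for d in deviations))   # 38.0, squaring prevents cancellation
```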
Sample variance ($n - 1$ degrees of freedom): $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
Sample standard deviation: $s = \sqrt{s^2}$
Why n-1?
Because the deviations about the mean sum to zero, the last value of $x_i - \bar{x}$ is determined by the first n-1 of them.
If n is very large, the difference between n and n-1 becomes insignificant.
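As a sketch, the measures of spread for the two data sets introduced earlier (3, 5, 7, 10, 10 and 7, 7, 7, 7, 7) can be computed with Python's standard `statistics` module. `variance` and `stdev` divide by n - 1; `pvariance` divides by n and is shown only for comparison.

```python
from statistics import pvariance, stdev, variance

data_sets = {
    "Data Set 1": [3, 5, 7, 10, 10],
    "Data Set 2": [7, 7, 7, 7, 7],
}

for name, data in data_sets.items():
    data_range = max(data) - min(data)
    s2 = variance(data)        # sample variance, divides by n - 1
    s = stdev(data)            # sample standard deviation, square root of s2
    by_n = pvariance(data)     # divides by n instead, for comparison
    print(f"{name}: range={data_range}, s^2={s2}, s={s:.3f}, divide-by-n={by_n}")

# Why n - 1: once n - 1 deviations are known, the last one is forced,
# because all n deviations must sum to zero.
data = data_sets["Data Set 1"]
x_bar = sum(data) / len(data)
first_deviations = [x - x_bar for x in data[:-1]]
last_deviation = -sum(first_deviations)
print(last_deviation == data[-1] - x_bar)  # True
```

For Data Set 1 the range is 7, the sample variance is 9.5, and the standard deviation is about 3.082; for Data Set 2 all three are 0, which is the difference the mean and median could not show.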
Standard deviation (no nitrogen) = 0.0728 gram
Standard deviation (nitrogen) = 0.1867 gram
Conclusion: The group with nitrogen has a larger variance, and the group without nitrogen tends to be more consistent.
Summary
Concepts:
Descriptive statistics vs. Inferential
statistics
Sampling Procedure: random & nonrandom
Qualitative vs. Quantitative variables
Discrete vs. Continuous variables
Sample Mean, Median, Range, Variance,
Standard Deviation
Exercise
Sample size: 15
Mean: 3.78
Median: 3.6
Trimmed values: 2.5, 2.8, 2.8 (smallest) and 4.8, 5.2, 5.6 (largest)
Trimmed mean: 3.68
Variance: 0.943
Standard deviation: 0.97
