
Fundamentals of Statistics
MEMBERS OF GROUP 1:
DETYA INDRIAWAN
DIAH AULIA I
KARINA PRAVITASARI
MASATUL FARHAH
TIARA ARISENDA K

Pendidikan Biologi Reguler


2013

Statistics?
A collection of quantitative data from a sample or
population.
The science that deals with the collection, tabulation,
analysis, interpretation, and presentation of quantitative
data.

Types of statistics
Deductive or descriptive statistics
describe and analyze a complete
data set.
Inductive statistics
deal with a limited amount of data
(a sample).
Conclusions: probability?

Population
A population is any entire collection of people, animals,
plants or things from which we may collect data.
It is the entire group we are interested in, which we wish
to describe or draw conclusions about.
For each population there are many possible samples.

Sample
A sample is a group of units selected from a larger group
(population).
By studying the sample it is hoped to draw valid conclusions about
the population.
The sample should be representative of the general population.
The best way is by random sampling.

Parameter
A parameter is a value, usually unknown (and which
therefore has to be estimated), used to represent a
certain population characteristic.
For example, the population mean is a parameter that is
often used to indicate the average value of a quantity.

Inferential Statistics
Statistical Inference makes use of information from a
sample to draw conclusions (inferences) about the
population from which the sample was taken.

Types of data
Variables data
Quality characteristics that are measurable values.
Measurable and normally continuous;
may take on any value, e.g. weight in kg.

Attribute data
Quality characteristics that are observed to be either
present or absent, conforming or nonconforming.
Countable and normally discrete; integer values, e.g. 0, 1,
5, 25, ..., but not 4.65.

Describing the Data


Graphical:
Plot or picture of a frequency distribution.

Analytical:
Summarize data by computing a measure of central tendency
and dispersion.

Sampling Methods
Sampling methods are methods for selecting a
sample from the population:
Simple random sampling - equal chance for each
member of the population to be selected for the sample.
Systematic sampling - the process of selecting every n-th
member of the population arranged in a list.
Stratified sampling - obtained by dividing the population
into subgroups and then randomly selecting from each
subgroup.
Cluster sampling - groups are selected rather than
individuals.
Incidental or convenience sampling - taking an intact
group (e.g. your own fourth grade class of pupils).
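
The sampling methods listed above can be sketched in a few lines of Python. This is only a minimal illustration: the population of 100 numbered units, the function names, and the fixed seeds are assumptions made for the example, not part of the original slides.

```python
import random

def simple_random_sample(population, k, seed=0):
    """Simple random sampling: every member has an equal chance."""
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    return rng.sample(population, k)

def systematic_sample(population, step, start=0):
    """Systematic sampling: every step-th member of the ordered list."""
    return population[start::step]

def stratified_sample(strata, k_per_stratum, seed=0):
    """Stratified sampling: random selection inside each subgroup."""
    rng = random.Random(seed)
    return {name: rng.sample(members, k_per_stratum)
            for name, members in strata.items()}

def cluster_sample(clusters, n_clusters, seed=0):
    """Cluster sampling: whole groups are selected, not individuals."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: clusters[name] for name in chosen}

# Hypothetical population of 100 numbered units.
population = list(range(1, 101))

# Hypothetical subgroups: units 1-50 and units 51-100.
strata = {"first_half": population[:50], "second_half": population[50:]}

print(simple_random_sample(population, 10))
print(systematic_sample(population, 10))   # units 1, 11, 21, ..., 91
print(stratified_sample(strata, 5))
print(cluster_sample(strata, 1))
```

Convenience sampling needs no code: it simply takes whatever intact group is at hand, which is why it gives no guarantee of representativeness.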

Frequency Distribution
Consider the following set of data, which are the high
temperatures recorded for 30 consecutive days.
We wish to summarize these data by creating a frequency
distribution of the temperatures.

Data Set - High Temperatures for 30 Days

50  45  49  50  43  49  50  49  45  49
47  47  44  51  51  44  47  46  50  44
51  49  43  43  49  45  46  45  51  46

Example of Frequency Distribution

Frequency Distribution for High Temperatures

Temperature   Tally     Frequency
51            ////      4
50            ////      4
49            //////    6
48                      0
47            ///       3
46            ///       3
45            ////      4
44            ///       3
43            ///       3
                        N = 30
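
As a cross-check, the frequency distribution above can be reproduced with a short Python sketch. The variable names are illustrative; the data are the 30 high temperatures listed earlier.

```python
from collections import Counter

# High temperatures for 30 days, as listed in the data set.
temps = [50, 45, 49, 50, 43, 49, 50, 49, 45, 49,
         47, 47, 44, 51, 51, 44, 47, 46, 50, 44,
         51, 49, 43, 43, 49, 45, 46, 45, 51, 46]

counts = Counter(temps)  # maps each temperature to its frequency

# List every value from highest to lowest, including values
# that never occur (a Counter returns 0 for a missing key).
for value in range(max(temps), min(temps) - 1, -1):
    freq = counts[value]
    print(f"{value:>3}  {'/' * freq:<6}  {freq}")

print("N =", sum(counts.values()))
```

Iterating over the full range rather than over the keys of the counter is what keeps the empty row for 48 in the table.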

Cumulative Frequency Distribution


A cumulative frequency distribution can be created by
adding an additional column called "Cumulative
Frequency."
The cumulative frequency for a given value is obtained
by adding the frequency for that value to the
cumulative frequency for the next lower value.
For example: the cumulative frequency for 45 is 10, which is
the cumulative frequency for 44 (6) plus the frequency for 45
(4).
Finally, notice that the cumulative frequency for the highest
value should be the same as the total of the frequency
column.

Cumulative Frequency Distribution for High Temperatures

Temperature   Tally     Frequency   Cumulative Frequency
51            ////      4           30
50            ////      4           26
49            //////    6           22
48                      0           16
47            ///       3           16
46            ///       3           13
45            ////      4           10
44            ///       3           6
43            ///       3           3
                        N = 30
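
The running total described above can be sketched in Python with `itertools.accumulate`. The names are illustrative, and the data are again the 30 high temperatures.

```python
from collections import Counter
from itertools import accumulate

temps = [50, 45, 49, 50, 43, 49, 50, 49, 45, 49,
         47, 47, 44, 51, 51, 44, 47, 46, 50, 44,
         51, 49, 43, 43, 49, 45, 46, 45, 51, 46]

counts = Counter(temps)

# Work from the lowest value upward, because each cumulative
# frequency builds on the one for the next lower value.
values = list(range(min(temps), max(temps) + 1))
freqs = [counts[v] for v in values]
cumulative = dict(zip(values, accumulate(freqs)))

# Print from highest to lowest to match the layout of the table.
for v in reversed(values):
    print(f"{v:>3}  {counts[v]:>2}  {cumulative[v]:>2}")
```

The last cumulative value (for 51) equals the number of observations, which is the check mentioned in the text.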

Grouped frequency distribution


In some cases it is necessary to group the values of the
data to summarize the data properly.
E.g., we wish to create a frequency distribution for the IQ
scores of 30 pupils.
The IQ scores range from 73 to 139.
To include these scores in a frequency distribution we would
need 67 different score values (139 down to 73).
This would not summarize the data very much.
To solve this problem we group scores together
and create a grouped frequency distribution.
If the data have more than 20 score values, we should create
a grouped frequency distribution by grouping score values
together into class intervals.

Grouped frequency

Look at the following data of high temperatures for 50 days.
The highest temperature is 59 and the lowest temperature is 39.
We would have 21 temperature values.
This is greater than 20 values, so we should create a
grouped frequency distribution.

Data Set - High Temperatures for 50 Days

57  39  52  52  43  50  53  42  58  55
58  50  53  50  49  45  49  51  44  54
49  57  55  59  45  50  45  51  54  58
53  49  52  51  41  52  40  44  49  45
43  47  47  43  51  55  55  46  54  41

Grouped Frequency Distribution for High Temperatures

Class Interval   Tally          Interval Midpoint   Frequency
57-59            //////         58                  6
54-56            ///////        55                  7
51-53            ///////////    52                  11
48-50            /////////      49                  9
45-47            ///////        46                  7
42-44            //////         43                  6
39-41            ////           40                  4
                                                    N = 50
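
A grouped frequency distribution like the one above can be built with a short Python sketch. The class width of 3 and the starting point at the lowest value (39) are read off the intervals in the table; the function and variable names are illustrative assumptions.

```python
from collections import Counter

# High temperatures for 50 days, as listed in the data set.
temps50 = [57, 39, 52, 52, 43, 50, 53, 42, 58, 55,
           58, 50, 53, 50, 49, 45, 49, 51, 44, 54,
           49, 57, 55, 59, 45, 50, 45, 51, 54, 58,
           53, 49, 52, 51, 41, 52, 40, 44, 49, 45,
           43, 47, 47, 43, 51, 55, 55, 46, 54, 41]

WIDTH = 3              # each class interval covers 3 values
lowest = min(temps50)  # the first interval starts at the lowest value

def class_interval(value):
    """Return the (lower, upper) limits of the interval holding value."""
    lower = lowest + ((value - lowest) // WIDTH) * WIDTH
    return (lower, lower + WIDTH - 1)

grouped = Counter(class_interval(t) for t in temps50)

for lower, upper in sorted(grouped, reverse=True):
    midpoint = (lower + upper) // 2
    freq = grouped[(lower, upper)]
    print(f"{lower}-{upper}  {'/' * freq:<11}  {midpoint}  {freq}")

print("N =", sum(grouped.values()))
```

With 21 possible values and a width of 3, the data fall into 7 class intervals, which is a far more compact summary than 21 separate rows.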

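Histograms
Constructing a Histogram for Discrete Data
First, determine the frequency and relative frequency of each x value.
Then mark the possible x values on a horizontal scale.
Above each value, draw a rectangle whose height is the relative
frequency (or the frequency) of that value.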
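
The first step, finding the frequency and relative frequency of each x value, can be sketched in Python. Using the 30-day temperature data here is an assumption made for illustration, and the text bars only stand in for the rectangles of a real histogram.

```python
from collections import Counter

temps = [50, 45, 49, 50, 43, 49, 50, 49, 45, 49,
         47, 47, 44, 51, 51, 44, 47, 46, 50, 44,
         51, 49, 43, 43, 49, 45, 46, 45, 51, 46]

counts = Counter(temps)
n = len(temps)

# Relative frequency = frequency / number of observations.
rel_freq = {x: counts[x] / n for x in range(min(temps), max(temps) + 1)}

# One row per possible x value; the bar length is the frequency.
for x, rf in rel_freq.items():
    print(f"{x:>3} | {'#' * counts[x]:<6} {rf:.3f}")
```

The relative frequencies always sum to 1, so they allow histograms of data sets with different sizes to be compared.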

Descriptive statistics
Measures of Central Tendency
Describes the center position of the data
Mean, Median, Mode

Measures of Dispersion
Describes the spread of the data
Range, Variance, Standard deviation

Measures of central tendency: Mean
Arithmetic mean:

$$\bar{x} = \frac{1}{N} \sum_{i=1}^{N} x_i$$

where $x_i$ is one observation, $\sum$ means add up what
follows, and $N$ is the number of observations.

So, for example, if the data are 0, 2, 5, 9, 12,
the mean is (0+2+5+9+12)/5 = 28/5 = 5.6

Median - mode
Median = the observation in the middle of sorted data
Mode = the most frequently occurring value

Median and mode

100 91 85 84 75 72 72 69 65
Mode = 72
Median = 75
Mean = 79.22
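
The three measures of central tendency can be checked with Python's standard `statistics` module. This is a minimal sketch using the nine scores from the example above; the variable names are illustrative.

```python
import statistics

scores = [100, 91, 85, 84, 75, 72, 72, 69, 65]

mean = statistics.mean(scores)      # sum of the scores divided by 9
median = statistics.median(scores)  # middle (5th) value of the sorted scores
mode = statistics.mode(scores)      # most frequently occurring value

print(f"mean = {mean:.2f}, median = {median}, mode = {mode}")
```

Here the mean (79.22) lies above the median (75) because the few high scores pull it upward, while the median and mode are unaffected by how large those scores are.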

Measures of dispersion:
range
The range is calculated by taking the maximum value and
subtracting the minimum value.

2 4 6 8 10 12 14
Range = 14 - 2 = 12

Measures of dispersion:
variance
Calculate the deviation from the mean for every
observation.
Square each deviation.
Add them up and divide by the number of observations.

$$\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \bar{x})^2$$

Measures of dispersion:
standard deviation
The standard deviation is the square root of the variance.
The variance is in square units so the standard
deviation is in the same units as x.

$$\sigma = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (x_i - \bar{x})^2}$$
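
The range, variance, and standard deviation steps described above can be followed one at a time in Python. This sketch reuses the values 2, 4, ..., 14 from the range example and divides by the number of observations, as the variance slide describes; the variable names are illustrative.

```python
import math
import statistics

data = [2, 4, 6, 8, 10, 12, 14]
n = len(data)

data_range = max(data) - min(data)      # maximum minus minimum

mean = sum(data) / n
deviations = [x - mean for x in data]   # deviation of each observation
squared = [d ** 2 for d in deviations]  # square each deviation
variance = sum(squared) / n             # divide by the number of observations
std_dev = math.sqrt(variance)           # back in the same units as x

print(data_range, variance, std_dev)

# Cross-check against the standard library's population formulas.
assert math.isclose(variance, statistics.pvariance(data))
assert math.isclose(std_dev, statistics.pstdev(data))
```

For these data the variance is 16 (in squared units) and the standard deviation is 4, which can be compared directly with the observations themselves.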

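Standard Deviation for a Sample
General formula/ungrouped data:

$$s = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$$

For computation purposes:

$$s = \sqrt{\frac{n \sum_{i=1}^{n} X_i^2 - \left( \sum_{i=1}^{n} X_i \right)^2}{n(n - 1)}}$$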
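
The general formula and the computational formula for the sample standard deviation give the same result, which a short Python sketch can confirm. The data 0, 2, 5, 9, 12 are borrowed from the mean example; the variable names are illustrative.

```python
import math
import statistics

data = [0, 2, 5, 9, 12]
n = len(data)
mean = sum(data) / n

# General formula: squared deviations divided by n - 1.
s_general = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))

# Computational formula: only the sum of X and the sum of X squared.
sum_x = sum(data)
sum_x2 = sum(x * x for x in data)
s_computational = math.sqrt((n * sum_x2 - sum_x ** 2) / (n * (n - 1)))

print(s_general, s_computational)

# Both agree with the standard library's sample standard deviation.
assert math.isclose(s_general, s_computational)
assert math.isclose(s_general, statistics.stdev(data))
```

The computational form avoids calculating the mean and each deviation first, which is why it was convenient for hand calculation.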

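Standard Deviation for a Sample
Grouped data:

$$s = \sqrt{\frac{n \sum_{i=1}^{k} f_i X_i^2 - \left( \sum_{i=1}^{k} f_i X_i \right)^2}{n(n - 1)}}$$

where $X_i$ is the midpoint of class interval $i$, $f_i$ is its
frequency, $k$ is the number of class intervals, and $n = \sum f_i$.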
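
As an illustration of the grouped-data formula, the sketch below applies it to the grouped frequency distribution of the 50-day temperatures. The midpoints come from that table and the frequencies are counted from its tallies; the variable names are illustrative assumptions.

```python
import math

# (interval midpoint, frequency) pairs from the grouped table.
classes = [(58, 6), (55, 7), (52, 11), (49, 9), (46, 7), (43, 6), (40, 4)]

n = sum(f for _, f in classes)               # total number of observations
sum_fx = sum(f * x for x, f in classes)      # sum of f * X
sum_fx2 = sum(f * x * x for x, f in classes) # sum of f * X squared

s_grouped = math.sqrt((n * sum_fx2 - sum_fx ** 2) / (n * (n - 1)))

print(f"n = {n}, s = {s_grouped:.3f}")
```

Because every observation in an interval is treated as if it sat at the midpoint, the result is an approximation of the standard deviation of the raw data.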

Standard deviation and


curve shape
If σ is small, there is a high probability of getting a value
close to the mean.
If σ is large, there is a correspondingly higher probability
of getting values further away from the mean.

The Normal Curve


The normal curve or the normal
frequency distribution or Gaussian
distribution is a hypothetical
distribution that is widely used in
statistical analysis.
The characteristics of the normal
curve make it useful in education and
in the physical and social sciences.

Characteristics of the
Normal Curve
The normal curve is a symmetrical distribution
of data with an equal number of data above and
below the midpoint of the abscissa.
Since the distribution of data is symmetrical the
mean, median, and mode are all at the same
point on the abscissa.
In other words, mean = median = mode.
If we divide the distribution up into standard
deviation units, a known proportion of data lies
within each portion of the curve.

34.13% of the data lie between the mean (μ) and 1σ above the mean.
34.13% lie between the mean and 1σ below the mean.
Approximately two-thirds (68.26%) lie within 1σ of the mean.
13.59% of the data lie between one and two standard
deviations above the mean, and another 13.59% between one and
two standard deviations below it.
Finally, almost all of the data (99.74%) are within 3σ of the
mean.
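
These proportions can be computed rather than looked up. The sketch below uses `statistics.NormalDist` from the Python standard library; the variable names are illustrative.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

# Area under the curve between two points = difference of the
# cumulative distribution function at those points.
between_mean_and_1 = z.cdf(1) - z.cdf(0)
within_1 = z.cdf(1) - z.cdf(-1)
between_1_and_2 = z.cdf(2) - z.cdf(1)
within_3 = z.cdf(3) - z.cdf(-3)

print(f"mean to 1 sd:  {between_mean_and_1:.2%}")
print(f"within 1 sd:   {within_1:.2%}")
print(f"1 sd to 2 sd:  {between_1_and_2:.2%}")
print(f"within 3 sd:   {within_3:.2%}")
```

The computed values for "within 1 sd" and "within 3 sd" round to 68.27% and 99.73%. Figures obtained by adding up the already rounded areas (34.13%, 13.59%, and so on) differ from these in the last digit.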

Standardized normal value, Z
When a score is expressed in standard deviation
units, it is referred to as a Z-score.
A score that is one standard deviation above the
mean has a Z-score of 1.
A score that is one standard deviation below the
mean has a Z-score of -1.
A score that is at the mean would have a Z-score
of 0.
The normal curve with Z-scores along the
abscissa looks exactly like the normal curve with
standard deviation units along the abscissa.

Z-value
Deviation IQ Scores, sometimes called Wechsler IQ scores,
are a standard score with a mean of 100 and a standard
deviation of 15.
What percentage of the general population have deviation
IQs lower than 85?

z = (85 - 100) / 15 = -1
So an IQ of 85 is equivalent to a z-value of -1.
So 50% - 34.13% = 15.87% of the population has IQ
scores lower than 85.
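
The same calculation can be sketched in Python. The mean of 100 and standard deviation of 15 are the values given above; the variable names are illustrative.

```python
from statistics import NormalDist

mean_iq = 100
sd_iq = 15
score = 85

# Z-score: distance from the mean in standard deviation units.
z_value = (score - mean_iq) / sd_iq

# Share of the population with a deviation IQ lower than the score.
share_below = NormalDist(mean_iq, sd_iq).cdf(score)

print(f"z = {z_value}, share below = {share_below:.2%}")
```

The negative sign of the z-value matters: it shows that 85 lies below the mean, so the answer is the area in the lower tail of the curve.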

Frequency Polygon
A frequency polygon is what you may think of as a curve.
A frequency polygon can be created with interval or ratio
data.
Let's create a frequency polygon with the data we used
earlier to create a histogram.

To create a frequency polygon


Arrange the values along the abscissa (horizontal axis).
Arrange the lowest data on the left and the highest on
the right.
Add one value below the lowest data and one above
the highest data.
Create an ordinate (vertical axis).
Arrange the frequency values along the ordinate.
Provide a label for the ordinate (Frequency).
Create the body of the frequency polygon by placing a
dot for each value.
Connect each of the dots to the next dot with a straight
line.
Provide a title for the frequency polygon.
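
The steps above can be sketched in Python by computing the dots and line segments of the polygon. Reusing the 30-day temperature data is an assumption made for illustration, as are the variable names; the resulting points could be passed to any plotting tool.

```python
from collections import Counter

temps = [50, 45, 49, 50, 43, 49, 50, 49, 45, 49,
         47, 47, 44, 51, 51, 44, 47, 46, 50, 44,
         51, 49, 43, 43, 49, 45, 46, 45, 51, 46]

counts = Counter(temps)

# Abscissa: lowest value on the left, highest on the right, plus
# one extra value below the lowest and one above the highest.
xs = list(range(min(temps) - 1, max(temps) + 2))

# One dot per value; the two added values have a frequency of 0,
# so the polygon starts and ends on the horizontal axis.
points = [(x, counts[x]) for x in xs]

# Each dot is connected to the next one with a straight line.
segments = list(zip(points, points[1:]))

title = "Frequency Polygon of High Temperatures"
print(title)
for x, freq in points:
    print(f"{x:>3}  {freq}")
```

Adding the two zero-frequency values is what closes the figure, which is the reason it is called a polygon rather than a line graph.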
