Basic Statistics PDF

1-1
Quantitative Techniques for geographical data analysis
1. Descriptive statistics
2. Spatial data analysis
3. Inferential statistics
4. Correlation & regression
1-2
Chapter 1
1. Descriptive Statistics
1-3
1 Descriptive Statistics
 1.1. Note on descriptive and inferential statistics
 1.2. Types of data and some characteristics
 1.3. Percentiles, deciles and Quartiles
 1.4. Measures of Central Tendency
 1.5. Measures of Variability
 1.6. Methods of data presentation/displaying
 Table of frequency distribution

 Graphs
 Skewness and
 Kurtosis
1-4
1 LEARNING OBJECTIVES
After studying this chapter, you should be able to:
 Distinguish between descriptive and inferential statistical
data.
 Describe nominal, ordinal, interval, and ratio scales of
measurements.
 Calculate and interpret quartiles, deciles and percentiles.
 Explain measures of central tendency and how to compute
them.
 Create different types of charts that describe data sets.
1-5
1-1. Note on descriptive and inferential statistics
 Descriptive Statistics
   Collect
   Organize
   Summarize
   Display
   Analyze
 Inferential Statistics
   Collect data from population or representative samples
   Predict and forecast values of population parameters
   Test hypotheses about values of population parameters
   Make decisions
1-6
1.2. Types of data and some characteristics
 Qualitative - Categorical or Nominal:
  Examples are-
   Color
   Gender
   Nationality
 Quantitative - Measurable or Countable:
  Examples are-
   Temperatures
   Salaries
   Number of points scored on a 100 point exam
1-7
Some Characteristics of Data
 Not all data is the same. There are some limitations

as to what can and cannot be done with a data set,
depending on the characteristics of the data
 Some key characteristics that must be considered
are:
 A. Continuous vs. Discrete
 B. Grouped vs. Individual
 C. Scale of Measurement
1-8
A. Continuous vs. Discrete Data
 Continuous data can include any value (i.e., real

numbers)
 e.g., 1, 1.43, and 3.1415926 are all acceptable values.
 Geographic examples: distance, tree height, amount
of precipitation, etc
 Discrete data only consists of discrete values, and
the numbers in between those values are not
defined (i.e., whole or integer numbers)
 e.g., 1, 2, 3.
1-9
B. Grouped vs. Individual Data
 The distinction between individual and grouped

data is somewhat self-explanatory, but the issue
pertains to the effects of grouping data
 While a family income value is collected for each
household (individual data), for the purpose of
analysis it is transformed into a set of classes
(e.g., 80Birr/hh vs. 0 - 100Birr, 100-200Birr,
200- 300Birr, etc)
1-10
B. Grouped vs. Individual Data
 In grouped data, the raw individual data is categorized into several classes, and then analyzed
 The act of grouping the data, by taking the central value of each class, introduces a significant distortion
 Grouping always reduces the amount of information contained in the data
1-11
C. Scales of Measurement
 The data used in statistical analyses can be divided into four types:
1. The Nominal Scale
2. The Ordinal Scale
3. The Interval Scale
4. The Ratio Scale
As we progress through these scales, the types of data they describe have increasing information content

1-12
The Nominal Scale
Nominal scale data are data that can simply be

broken down into categories, i.e., having to do
with names or types:
 The categories cannot be ranked or ordered
(no greater/less than)
It can appear in the form of:
* Dichotomous or binary
* Multichotomous
1-13
continued
Dichotomous or binary nominal

data has just two types, e.g., yes/no,
female/male, is/is not, hot/cold, etc
Multichotomous data has more
than two types, e.g., vegetation
types, soil types, counties, eye
color, etc
1-14
The Ordinal Scale
 Ordinal scale data can be categorized AND can

be placed in an order, i.e., categories that can be
assigned a relative importance and can be ranked
such that numerical category values have
 star-system restaurant rankings
5 stars > 4 stars, 4 stars > 3 stars, 5 stars > 2 stars
 BUT ordinal data still are not scalar in the sense
that differences between categories do not have a
quantitative meaning
1-15
The Interval Scale
 Interval scale data take the notion of ranking items in order one step further, since the distances between adjacent points on the scale are equal
 For instance, the Fahrenheit scale is an interval scale, since each degree is equal but there is no absolute zero point.
 This means that although we can add and subtract degrees (100° is 10° warmer than 90°), we cannot multiply values or create ratios (100° is not twice as warm as 50°)
1-16
The Ratio Scale
 Similar to the interval scale, but with the addition of a meaningful zero value, which allows us to compare values using multiplication and division operations, e.g., precipitation, weights, heights, etc
 e.g., rain – We can say that 2 cm of rain is twice as much rain as 1 cm of rain because this is a ratio scale measurement
 e.g., age – a 100-year old person is indeed twice as old as a 50-year old one
1-17
1.3 Quartiles, Deciles and Percentiles
1-18
Quartiles (for raw data)
 If a set of data is organized in order of magnitude, quartiles are the values that divide the data at every 25% of the observations
 There are only 3 quartiles.
 The values are denoted by:
 Q1, 1st or lower quartile (the value below which the first 25% of the observations lie),
 Q2, 2nd quartile (the value that divides the observations into two halves of 50% each), and
 Q3, 3rd quartile (the value below which 75% of the observations lie).
1-19
Example for raw data - Sales and Sorted Sales
Sales Sorted Sales
9 6
6 9
12 10
10 12
13 13
15 14
16 14
14 15
14 16
16 16
17 16
16 17
24 17
21 18
22 18
18 19
19 20
18 21
20 22
17 24
1-20
Quartiles (cont’d)
 This method of data classification is important when we

want to describe the observations in four groups according
to their order of magnitude.
 The serial number (position) of the first or lower quartile, Q1, is computed by
\[ \frac{1}{4}(N + 1)\text{th} \]
 To find the position of Q1, determine the data point in position
\[ \frac{1}{4}(20 + 1)\text{th} = 21/4 = 5.25 \]
1-21
Thus, Q1 is located at the 5.25th position
The 5th observation is 13, and the 6th observation is 14.
Q1 is a point lying 0.25 of the way from 13 to 14 and is thus = 13 + 1*0.25 = 13.25.
1-22
 The serial number of the 2nd quartile (Q2) is computed by
\[ \frac{2}{4}(N + 1)\text{th} = [2 \times 21]/4 = 42/4 = 10.5 \]
 Thus, Q2 is located at the 10.5th position
 The 10th observation is 16, and the 11th observation is also 16.
 Thus Q2 will lie halfway between the 10th and 11th values (which are both 16 in this case) and is thus 16.
1-23
 The serial number of the 3rd quartile (Q3) is computed by
\[ \frac{3}{4}(N + 1)\text{th} = [3 \times 21]/4 = 63/4 = 15.75 \]
 Thus, Q3 is located at the 15.75th position
 The 15th observation is 18, and the 16th observation is 19.
 Thus Q3 will lie 0.75 of the way from 18 to 19 and is thus = 18 + 1*0.75 = 18.75.
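The position rule used for Q1, Q2 and Q3 can be sketched in Python. This is a minimal illustration that assumes the k(N + 1)/4 serial-number rule with linear interpolation between neighbouring observations; the names `quartile` and `sorted_sales` are chosen here for illustration and do not come from the slides.

```python
import math

# Sorted Sales column from the raw-data example (N = 20).
sorted_sales = [6, 9, 10, 12, 13, 14, 14, 15, 16, 16,
                16, 17, 17, 18, 18, 19, 20, 21, 22, 24]

def quartile(sorted_data, k):
    """Return the k-th quartile (k = 1, 2, 3) using the k(N + 1)/4 position rule."""
    n = len(sorted_data)
    position = k * (n + 1) / 4          # 1-based serial number, e.g. 5.25 for Q1
    lower = math.floor(position)        # observation just below the position
    fraction = position - lower         # how far to move toward the next observation
    # Clamp so that very small data sets cannot index outside the list.
    lower = min(max(lower, 1), n)
    upper = min(lower + 1, n)
    low_value = sorted_data[lower - 1]  # convert 1-based position to 0-based index
    high_value = sorted_data[upper - 1]
    return low_value + fraction * (high_value - low_value)

q1, q2, q3 = (quartile(sorted_sales, k) for k in (1, 2, 3))
print(q1, q2, q3)  # 13.25 16.0 18.75
```

The three results agree with the hand calculations for the sales data.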
1-24
Deciles (for raw data)
 This method of data classification is important

when we want to describe the observations in 10
groups according to their order of magnitude.
 There are nine values called deciles which divide
the distribution into every 10%.
 Deciles divide the total distribution into 10 equal
parts.
 These values are denoted by D1, D2, D3, …, D9.
1-25
Deciles (cont’d)
The serial number of any partition value, such as the Kth decile, can be computed by:
\[ \frac{K}{10}(N + 1)\text{th} \]
1-26
Percentiles (for raw data)
There are ninety-nine values called

percentiles which divide the
distribution into every 1%.
Percentiles divide the total
distribution into 100 equal parts.
These values are denoted by P1,
P2, P3, …, P99.
1-27
Percentiles (cont’d)
The serial number of any partition value, such as the Kth percentile, can be computed by
\[ \frac{K}{100}(N + 1)\text{th} \]
1-28
Example: Percentiles (cont’d)
 Find the 50th, 80th, and the 90th percentiles of this

data set.
 To find the 50th percentile, determine the data point
in position (n + 1)P/100 = (20 + 1)(50/100)
= 10.5.
 Thus, the percentile is located at the 10.5th
position.
 The 10th observation is 16, and the 11th observation
is also 16.
 The 50th percentile will lie halfway between the
10th and 11th values (which are both 16 in this case)
and is thus 16.
1-29
• To find the 80th percentile, determine

the data point in position (n + 1)P/100 =
(20 + 1)(80/100) = 16.8.
 Thus, the percentile is located at the
16.8th position.
 The 16th observation is 19, and the 17th
observation is 20.
 The 80th percentile is a point lying 0.8
of the way from 19 to 20 and is thus
19.8.
1-30

• To find the 90th percentile, determine the
data point in position (n + 1)P/100 = (20
+ 1)(90/100) = 18.9.
 Thus, the percentile is located at the
18.9th position.
 The 18th observation is 21, and the 19th observation is 22.
 The 90th percentile is a point lying 0.9 of the way from 21 to 22 and is thus 21.9.
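The three percentiles can be cross-checked with Python's standard library. This sketch assumes that `statistics.quantiles` with `method="exclusive"` is an acceptable stand-in for the (n + 1)P/100 rule; that method places the i-th of n cut points at position i(N + 1)/n, which is the same positioning. The variable names are illustrative only.

```python
import statistics

# Sorted Sales data from the raw-data example (N = 20).
sorted_sales = [6, 9, 10, 12, 13, 14, 14, 15, 16, 16,
                16, 17, 17, 18, 18, 19, 20, 21, 22, 24]

# n=100 returns the 99 cut points P1 ... P99; index k - 1 holds the k-th percentile.
cut_points = statistics.quantiles(sorted_sales, n=100, method="exclusive")
p50, p80, p90 = cut_points[49], cut_points[79], cut_points[89]
print(p50, p80, p90)  # approximately 16, 19.8 and 21.9
```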
1-31
Relationship among quartiles, deciles &

percentiles
The 2nd quartile, 5th decile and 50th percentile all correspond to the median.
The 25th and 75th percentiles correspond to the 1st and 3rd quartiles respectively.
1-32
Quartiles – Special Percentiles
 Quartiles are the percentage points that break

down the ordered data set into quarters.
 The first quartile is the 25th percentile. It is the
point below which lie 1/4 of the data.
 The second quartile is the 50th percentile. It is
the point below which lie 1/2 of the data. This
is also called the median.
 The third quartile is the 75th percentile. It is
the point below which lie 3/4 of the data.
1-33
Quartiles and Interquartile Range
 The first quartile, Q1, (25th percentile) is

often called the lower quartile.
 The second quartile, Q2, (50th
percentile) is often called the median
or the middle quartile.
 The third quartile, Q3, (75th percentile)
is often called the upper quartile.
 The interquartile range is the difference
between the first and the third quartiles.
1-34
Quartiles for Grouped data
For grouped frequency data, any partition value (Q1, Q2 or Q3) is calculated by an interpolation formula, shown here for Q1:
\[ Q_1 = L + \frac{\frac{1}{4}(N + 1) - (\sum f)_1}{f_{Q_1}} \times c \]
1-35
Cont’d
Where,
 L = lower class boundary of the Q1 class
 N = number of observations in the data (total frequency)
 (\sum f)_1 = sum of frequencies of all classes lower than the Q1 class
 f_{Q_1} = frequency of the Q1 class
 c = size of the class interval
1-36
Cont’d
 For example, a grouped frequency of monthly income of X-factory’s employees
Class   Class interval (monthly salary in Dollars)   Frequency (f)
1       30-39                                        1
2       40-49                                        3
3       50-59                                        11
4       60-69                                        21
5       70-79                                        43
6       80-89                                        32
7       90-100                                       9
Total                                                120
1-37
Cont’d
Calculate the lower quartile

(Q1) of the distribution of
monthly salary and interpret
the result
1-38
Cont’d
\[ Q_1 = 59.5 + \frac{\frac{1}{4}(120 + 1) - 15}{21} \times 10 = 66.8 \]
Interpretation:
About 25% of X-factory’s employees have a monthly salary of 66.8 Dollars or lower
1-39
Percentiles for Grouped data
\[ P_p = L + \frac{p(N + 1) - (\sum f)_1}{f_p} \times c \]
Calculate the 60th percentile of the distribution of monthly salary of employees and interpret the result
1-40
Cont’d
\[ P_{60} = 69.5 + \frac{0.60(120 + 1) - 36}{43} \times 10 = 78.01 \]
Interpretation:
About 60% of X-factory’s employees have a monthly salary of 78.01 Dollars or less
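The interpolation for grouped data can be sketched as follows. This is an illustration under stated assumptions: each class is described by its lower class boundary and frequency, the class width is taken as c = 10 for every class as in the worked examples (the last class, 90-100, is slightly wider), and the function name `grouped_partition_value` is hypothetical.

```python
# (lower class boundary, frequency) for each monthly-salary class.
salary_classes = [(29.5, 1), (39.5, 3), (49.5, 11), (59.5, 21),
                  (69.5, 43), (79.5, 32), (89.5, 9)]

def grouped_partition_value(classes, proportion, width):
    """Interpolate the value below which `proportion` of the observations lie."""
    total = sum(freq for _, freq in classes)
    target = proportion * (total + 1)    # serial number, e.g. 0.25 * 121 = 30.25
    cumulative = 0                       # sum of frequencies below the current class
    for lower_boundary, freq in classes:
        if cumulative + freq >= target:  # this class contains the target position
            return lower_boundary + (target - cumulative) / freq * width
        cumulative += freq
    raise ValueError("proportion is too large for these classes")

q1 = grouped_partition_value(salary_classes, 0.25, 10)
p60 = grouped_partition_value(salary_classes, 0.60, 10)
print(round(q1, 2), round(p60, 2))  # 66.76 78.01
```

The unrounded Q1 is about 66.76, which the slides report as 66.8.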
1-41
Summary Measures: Population Parameters & Sample Statistics
 Measures of Central Tendency
   Median
   Mode
   Mean
 Measures of Variability
   Range
   Interquartile range
   Variance
   Standard Deviation
   Coefficient of variation (CV)
 Other summary measures:
   Skewness
   Kurtosis
1-42
1-4 Measures of Central Tendency or Location
 Median: Middle value when sorted in order of magnitude; 50th percentile
 Mode: Most frequently-occurring value
 Mean: Average
1-43
Example – Median (Data is used from Example 1-1)
The median is the middle value of raw data sorted in order of magnitude.
1-44
Median for grouped data
In the case of grouped data the median would be obtained by interpolation:
\[ \text{Median} = L_1 + \left( \frac{\frac{N}{2} - (\sum f)_1}{f_{\text{median}}} \right) c \]
1-45
Cont’d
 Where,
 L_1 = lower class boundary of the median class
 N = number of observations in the data (total frequency)
 (\sum f)_1 = sum of frequencies of all classes lower than the median class
 f_{\text{median}} = frequency of the median class
 c = size of median class interval
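As an illustration, the interpolation can be applied to the monthly-salary frequency table used earlier for the grouped quartiles. The slides do not work this example, so the result below is only a sketch; the function name `grouped_median` and the common class width of 10 are assumptions.

```python
# (lower class boundary, frequency) for the monthly-salary classes.
salary_classes = [(29.5, 1), (39.5, 3), (49.5, 11), (59.5, 21),
                  (69.5, 43), (79.5, 32), (89.5, 9)]

def grouped_median(classes, width):
    """Median of grouped data: L1 + ((N/2 - cumulative f below) / f_median) * c."""
    total = sum(freq for _, freq in classes)
    half = total / 2                     # N/2 = 60 for the salary table
    cumulative = 0                       # frequencies of classes below the current one
    for lower_boundary, freq in classes:
        if cumulative + freq >= half:    # first class whose cumulative total reaches N/2
            return lower_boundary + (half - cumulative) / freq * width
        cumulative += freq
    raise ValueError("no classes supplied")

median = grouped_median(salary_classes, 10)
print(round(median, 2))  # 75.08
```

Here the median class is 70-79, with 36 observations below it and a frequency of 43.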

1-46
Mode
The mode of a set of data is the value

that occurs with the greatest frequency.
It represents the most common value
Note that, the mode as an average may

be used when a frequency distribution
represents data measured only on a
nominal scale.
1-47
Mode in array data
 In array data the mode may not exist

and sometimes if it does exist may not
be unique. For example:
 Monthly income (in Birr) of 9
employees of small private business
may be: 800, 950, 1200, 1300, 2000,
2500, 2800, 2900, 3000, has no mode.
1-48
Cont’d
 Monthly income (in Birr) of 10 teachers of one elementary school: 650, 700, 700, 700, 700, 800, 950, 950, 1000, 1050, has mode 700 Birr. A distribution with one mode is called unimodal.
 Monthly income (in Birr) of 10 farmers: 100, 250, 250, 250, 300, 350, 400, 400, 400, 500, has two modes, 250 and 400, and the distribution of the variable is bimodal.
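The three income examples can be checked with a short Python sketch. It assumes the convention used above that data in which no value repeats has no mode; the function name `modes` is illustrative.

```python
from collections import Counter

def modes(values):
    """Return every most-frequent value, or an empty list when no value repeats."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []                        # every value occurs once: no mode
    return sorted(v for v, c in counts.items() if c == highest)

employees = [800, 950, 1200, 1300, 2000, 2500, 2800, 2900, 3000]
teachers = [650, 700, 700, 700, 700, 800, 950, 950, 1000, 1050]
farmers = [100, 250, 250, 250, 300, 350, 400, 400, 400, 500]

print(modes(employees), modes(teachers), modes(farmers))  # [] [700] [250, 400]
```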
1-49
Mode for grouped data
For a frequency distribution of grouped data or a histogram, the mode can be computed using the formula:
\[ \text{Mode} = L_1 + \left( \frac{\Delta_1}{\Delta_1 + \Delta_2} \right) c \]
1-50
Continued
Where,
 L_1 = Lower class boundary of modal class (class containing the mode)
 \Delta_1 = excess of modal frequency over frequency of the next lower class
 \Delta_2 = excess of modal frequency over frequency of the next higher class
 c = size of modal class interval
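A sketch of this formula, applied to the monthly-salary frequency table used earlier for the grouped quartiles. The slides give no worked example here, so the data choice, the function name `grouped_mode`, and the treatment of a missing neighbouring class as frequency 0 are all assumptions.

```python
# (lower class boundary, frequency) for the monthly-salary classes; class width c = 10.
salary_classes = [(29.5, 1), (39.5, 3), (49.5, 11), (59.5, 21),
                  (69.5, 43), (79.5, 32), (89.5, 9)]

def grouped_mode(classes, width):
    """Mode of grouped data: L1 + (d1 / (d1 + d2)) * c for the modal class."""
    freqs = [freq for _, freq in classes]
    i = freqs.index(max(freqs))                          # first class with the highest frequency
    below = freqs[i - 1] if i > 0 else 0                 # assumed 0 when there is no lower class
    above = freqs[i + 1] if i < len(freqs) - 1 else 0    # assumed 0 when there is no higher class
    d1 = freqs[i] - below                                # excess over the next lower class
    d2 = freqs[i] - above                                # excess over the next higher class
    return classes[i][0] + d1 / (d1 + d2) * width

mode = grouped_mode(salary_classes, 10)
print(round(mode, 2))  # 76.17
```

For this table the modal class is 70-79, so the excesses are 43 - 21 = 22 and 43 - 32 = 11.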
1-51
Means
There are four types of means:
Arithmetic mean
Weighted arithmetic mean
Geometric mean
Harmonic mean
1-52
Arithmetic Mean or Average
The mean of a set of observations is their average - the sum of the observed values divided by the number of observations.
Population Mean:
\[ \mu = \frac{\sum_{i=1}^{N} x_i}{N} \]
Sample Mean:
\[ \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} \]
1-53
Weighted arithmetic mean

 Sometimes collected data may not have equal weights.
 In such cases weighting of data falling under different classes or categories may be important, and thus a weighting factor (w) has to be applied using the formula:
\[ \bar{X}_w = \frac{w_1 x_1 + w_2 x_2 + \dots + w_k x_k}{w_1 + w_2 + \dots + w_k} = \frac{\sum w_i x_i}{\sum w_i} \]
1-54
Arithmetic mean for grouped data
 Arithmetic mean of grouped data is computed using class marks (m) and frequency distributions, by assuming that all observations of a given class coincide with the class mark or midpoint of the interval, using the formula:
\[ \bar{X} = \frac{m_1 f_1 + m_2 f_2 + \dots + m_k f_k}{f_1 + f_2 + \dots + f_k} = \frac{\sum m f}{\sum f} \]
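The grouped-mean formula can be illustrated with the monthly-salary frequency table used earlier for the grouped quartiles. This is a sketch only: the slides do not compute this mean, and the class marks below are assumed to be the midpoints of the stated class limits (for example, (30 + 39) / 2 = 34.5).

```python
# (class mark m, frequency f) for the monthly-salary classes.
salary_classes = [(34.5, 1), (44.5, 3), (54.5, 11), (64.5, 21),
                  (74.5, 43), (84.5, 32), (95.0, 9)]

total_frequency = sum(f for _, f in salary_classes)       # sum of f = 120
weighted_sum = sum(m * f for m, f in salary_classes)       # sum of m * f
grouped_mean = weighted_sum / total_frequency
print(grouped_mean)  # 74.0375
```

The same pattern computes a weighted arithmetic mean when the frequencies are read as weights.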
1-55
Empirical relation among Mean, Median

and Mode
In the case of a symmetrical (unimodal) distribution the relation is defined as:
 Mean = Mode = Median.
For unimodal frequency curves which are moderately skewed (asymmetrical) the empirical relation is:
\[ \text{Mean} - \text{Mode} = 3(\text{Mean} - \text{Median}) \]
1-56
Which one is better: mean, median, or mode?
 The mean is valid only for interval and ratio data.

 The median is valid for ordinal, interval and ratio data.
 The mode is valid for nominal, ordinal, interval, and
ratio data
 Median & mode are the only measures of central

tendency that can be used with ordinal data
 Mode is the only measure of central tendency that can

be used with nominal data
1-57
1-5 Measures of Variability or

Dispersion
 Range
 Difference between maximum and minimum values
 Interquartile Range
 Difference between third and first quartile (Q3 - Q1)
 Variance
 Average*of the squared deviations from the mean
 Standard Deviation
 Square root of the variance
Definitions of population variance and sample variance differ slightly.
1-58
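Variance and Standard Deviation
Population Variance & Standard deviation
\[ \sigma^2 = \frac{\sum_{i=1}^{N}(x_i - \mu)^2}{N} = \frac{\sum_{i=1}^{N} x_i^2 - \frac{\left(\sum_{i=1}^{N} x_i\right)^2}{N}}{N}, \qquad \sigma = \sqrt{\sigma^2} \]
Sample Variance & Standard deviation
\[ s^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n - 1} = \frac{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}{n - 1}, \qquad s = \sqrt{s^2} \]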
1-59
Calculation of Sample Variance & Standard deviation (x̄ = 15.85)
x      x - x̄    (x - x̄)^2    x^2
6      -9.85    97.0225      36
9      -6.85    46.9225      81
10     -5.85    34.2225      100
12     -3.85    14.8225      144
13     -2.85    8.1225       169
14     -1.85    3.4225       196
14     -1.85    3.4225       196
15     -0.85    0.7225       225
16     0.15     0.0225       256
16     0.15     0.0225       256
16     0.15     0.0225       256
17     1.15     1.3225       289
17     1.15     1.3225       289
18     2.15     4.6225       324
18     2.15     4.6225       324
19     3.15     9.9225       361
20     4.15     17.2225      400
21     5.15     26.5225      441
22     6.15     37.8225      484
24     8.15     66.4225      576
Total  317  0   378.5500     5403
\[ s^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n - 1} = \frac{378.55}{20 - 1} = \frac{378.55}{19} = 19.923684 \]
\[ s^2 = \frac{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}{n - 1} = \frac{5403 - \frac{317^2}{20}}{20 - 1} = \frac{5403 - \frac{100489}{20}}{19} = \frac{5403 - 5024.45}{19} = \frac{378.55}{19} = 19.923684 \]
\[ s = \sqrt{s^2} = \sqrt{19.923684} = 4.46 \]
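The two routes to the sample variance can be reproduced in a few lines of Python. This is a minimal sketch; the variable names are illustrative, and `statistics.variance` is used only as an independent check because it also divides by n - 1.

```python
import math
import statistics

# Sales data from the raw-data example (n = 20, sum = 317).
sales = [6, 9, 10, 12, 13, 14, 14, 15, 16, 16,
         16, 17, 17, 18, 18, 19, 20, 21, 22, 24]

n = len(sales)
mean = sum(sales) / n                                    # 317 / 20 = 15.85

# Definitional form: sum of squared deviations from the mean, divided by n - 1.
s2_definition = sum((x - mean) ** 2 for x in sales) / (n - 1)

# Shortcut form: needs only the sum of x and the sum of x squared.
s2_shortcut = (sum(x * x for x in sales) - sum(sales) ** 2 / n) / (n - 1)

s = math.sqrt(s2_definition)
print(round(s2_definition, 6), round(s2_shortcut, 6), round(s, 2))
```

Both forms give about 19.923684 for the variance and 4.46 for the standard deviation.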
1-60
Standard Deviation - Grouped frequencies
Standard deviation of grouped data can be calculated using:
\[ s = \sqrt{\frac{\sum f_i m_i^2}{\sum f_i} - \bar{X}^2} \]
 Where,
 mi : is class midpoint (class mark)
 fi : is frequency
 X : sample mean
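A sketch of the grouped standard deviation, using the monthly-salary frequency table from the grouped-quartile example as illustrative data. It assumes the form s = sqrt(sum(f * m^2) / sum(f) - mean^2), which divides by the total frequency rather than by n - 1, and class marks taken as midpoints of the class limits.

```python
import math

# (class mark m, frequency f) for the monthly-salary classes.
salary_classes = [(34.5, 1), (44.5, 3), (54.5, 11), (64.5, 21),
                  (74.5, 43), (84.5, 32), (95.0, 9)]

total = sum(f for _, f in salary_classes)
mean = sum(m * f for m, f in salary_classes) / total                 # grouped mean
mean_of_squares = sum(f * m ** 2 for m, f in salary_classes) / total
s_grouped = math.sqrt(mean_of_squares - mean ** 2)

# Equivalent definitional form, used here only as a cross-check.
s_check = math.sqrt(sum(f * (m - mean) ** 2 for m, f in salary_classes) / total)
print(round(s_grouped, 2), round(s_check, 2))
```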
1-61
Coefficient of variation
 The actual variation or dispersion as determined from the standard deviation is called the absolute dispersion.
 Absolute dispersion alone cannot tell how large the variability is relative to the size of the values.
 Thus, this effect is measured by the relative dispersion or coefficient of variation.
 It is generally expressed as a percentage:
\[ CV = \frac{s}{\bar{X}} \times 100 \]
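For instance, the coefficient of variation of the sales data from the raw-data example can be computed as below. This is an illustrative sketch; the slides do not report this value, and the sample standard deviation (n - 1 divisor) is assumed.

```python
import statistics

sales = [6, 9, 10, 12, 13, 14, 14, 15, 16, 16,
         16, 17, 17, 18, 18, 19, 20, 21, 22, 24]

mean = statistics.mean(sales)      # 15.85
s = statistics.stdev(sales)        # sample standard deviation, about 4.46
cv = s / mean * 100                # relative dispersion in percent
print(round(cv, 2))
```

The result is roughly 28%, meaning the standard deviation is a little over a quarter of the mean.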
1-62
1-6 Methods of Data presentation
 Methods of data presentation/displaying

 Table of frequency distribution
 Graphs
 Line graphs:
 Ogives
 Time plot
 Pie charts
 Bar graphs
 Skewness and
 Kurtosis
1-63
Frequency distribution
 Dividing data into groups or classes or intervals
 Groups should be:
   Mutually exclusive
     Not overlapping - every observation is assigned to only one group
   Exhaustive
     Every observation is assigned to a group
   Equal-width (if possible)

1-64
Frequency distribution
 Large size of raw data has to be organized into

classes or categories containing a number of
individuals belonging to each class.
 Number of individuals in a given class is known
as the class frequency.
 A tabular arrangement of data by classes together
with the corresponding class frequencies is called
a frequency distribution or frequency table.
1-65
General rules for forming frequency

distribution
1. Identify the largest and the smallest

numbers in the raw data and thus find
the range.
 For example, the largest number of
the raw data in Table 2.1 is 4887,
whilst the smallest number is 950.
 The range is 4887 – 950 = 3937.
1-66
Cont’d
2. Divide the range by a convenient number of

classes.
 For our example 10 classes are used.
 You may have different classes depending on
the nature and size of the data.
 Thus,
\[ \frac{\text{Range}}{\text{Classes}} = \frac{3937}{10} = 393.7 \approx 394 \text{ (class width)} \]
1-67
Cont’d
3. Determine the class interval for the 10 classes.
 A common practice for determining class limits is:
\[ lcl = SRD - 1 \]
\[ ucl = (lcl - 1) + Cw \]
 Where,
 SRD is smallest number of the raw data,
 lcl is lower class limit,
 ucl is upper class limit
 Cw is class width.
1-68
Cont’d
In our example:

 SRD = 950 and Cw is 394.
Thus, lower class limit of the 1st

class is 950 – 1 = 949 and
the upper class limit of the same
class is [(949 -1) + 394] = 1342
(Table 2.3).
1-69
Cont’d
 For other successive classes:

 build the lower class limit of the next higher class by adding
the class interval on the lower class limit of the preceding class
and,
 The upper class limit of the next higher class by adding the
class interval on the upper class limit of the preceding class.
 For example, the class limits of the first class are 949 – 1342
 The next class limits are
(949 + 394) = 1343 (lower class limit of the 2nd class)
(1342 + 394) = 1736 (upper class limit of the 2nd class)
1343 – 1736
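The rule for building successive class limits can be sketched in Python. The function name `class_limits` is hypothetical; the logic follows the lcl = SRD - 1 and ucl = (lcl - 1) + Cw convention stated above, then adds the class width to both limits for each later class.

```python
def class_limits(smallest, class_width, n_classes):
    """Build (lower, upper) class limits following the rule on the slides."""
    lower = smallest - 1                  # lcl = SRD - 1
    upper = (lower - 1) + class_width     # ucl = (lcl - 1) + Cw
    limits = []
    for _ in range(n_classes):
        limits.append((lower, upper))
        lower += class_width              # next class: add the class width
        upper += class_width              # to both limits
    return limits

limits = class_limits(smallest=950, class_width=394, n_classes=10)
print(limits[0], limits[1], limits[-1])  # (949, 1342) (1343, 1736) (4495, 4888)
```

The tenth class ends at 4888, so the largest observation (4887) is covered.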
1-70
Table 2.1. Raw data of agricultural production (kg ha-1 yr-1) (an example)
Plot code Yield Plot code Yield Plot code Yield Plot code Yield
(kg ha-1) (kg ha-1) (kg ha-1) (kg ha-1)
1 950 20 1258 39 3689 58 2058
2 1250 21 1509 40 3824 59 1020
3 1504 22 2051 41 4823 60 3687
4 2058 23 1028 42 1886 61 3891
5 1020 24 3681 43 1382 62 4230
6 3687 25 3820 44 4666 63 3228
7 3891 26 4875 45 1785 64 3468
8 4887 27 1825 46 2423 65 4356
9 1895 28 1345 47 3271 66 2598
10 1324 29 4699 48 4430 67 1050
11 4657 30 1735 49 3228 68 1258
12 1765 31 2412 50 3468 69 4130
13 2456 32 3285 51 4356 70 3228
14 3214 33 4145 52 2598 71 3468
15 4167 34 3230 53 1050 72 4356
16 3264 35 3483 54 1258 73 2598
17 3478 36 4099 55 1509 74 1050
18 4052 37 2568 56 2051 75 1258
19 2567 38 990 57 1028 76 1735

1-71
Assignment 1.
 Build a frequency table for the agricultural yield indicated in the previous slide.
 Number of classes of the frequency distribution
should be 10
 Build the class intervals in class boundary
 Construct the class marks for each class
 What is the upper class boundary of the 3rd class?
 What is the lower class boundary of class 7?
 What is the frequency of the 2nd class?

1-72
Frequency distribution can appear in
 Simple frequency
 Cumulative frequency
 relative cumulative frequency,

or
 relative cumulative frequency
percentage
1-73
Example: Frequency Distribution
x f(x)
Spending Class ($) Frequency (number of customers)
0 - 100 30
100 - 200 38
200 - 300 50
300 - 400 31
400 - 500 22
500 - 600 13
Total 184
1-74
Example: Relative Frequency Distribution
x f(x) f(x)/n
Spending Class ($) Frequency (number of customers) Relative Frequency
0 - 100 30 0.163
100 - 200 38 0.207
200 - 300 50 0.272
300 - 400 31 0.168
400 - 500 22 0.120
500 - 600 13 0.070
Total 184 1.000
• Example of relative frequency for the 1st class: 30/184 = 0.163

• Sum of relative frequencies = 1
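The relative frequencies can be recomputed with a short sketch; the variable names are illustrative.

```python
# Frequencies of the six spending classes ($0-100 ... $500-600).
frequencies = [30, 38, 50, 31, 22, 13]

n = sum(frequencies)                                  # 184 customers
relative = [f / n for f in frequencies]               # f(x) / n for each class
print([round(r, 3) for r in relative])
print(round(sum(relative), 3))                        # relative frequencies sum to 1
```

Note that 13/184 rounds to 0.071; the table lists 0.070, presumably so that the rounded column adds up to exactly 1.000.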
1-75
Continued….
Cumulative frequency (up to/less than)
Spending ($)   Cumulating up   Relative cumulative “less than”   Relative cumulative percentage “less than”
< 0            0               0                                 0.0
< 100          30              0.163                             16.3
< 200          68              0.37                              37.0
< 300          118             0.641                             64.1
< 400          149             0.81                              81.0
< 500          171             0.929                             92.9
< 600          184             1                                 100
Cumulative frequency (down/greater than)
Spending ($)   Cumulating down   Relative cumulative “greater than”   Relative cumulative percentage “greater than”
> 0            184               1                                    100
> 100          154               0.837                                83.7
> 200          116               0.630                                63.0
> 300          66                0.359                                35.9
> 400          35                0.190                                19.0
> 500          13                0.071                                7.1
> 600          0                 0                                    0
1-76
Less than cumulative frequency
 It is the total frequency of all values

successively from the lowest to the highest (“less
than”) upper class boundary of a given class
interval including the frequency of that class.
 For example, the cumulative frequency up to
(“less than”) and including the class interval
300 – 400 in the spending Table is 30+38+50+31
= 149, indicating that 149 customers have
spending less than 400 $ .
1-77
Greater than cumulative frequency
 It is the total frequency of all values successively from the highest to the lowest (“more than”) lower class boundary of a given class interval including the frequency of that class.
 For example, the cumulative frequency down to (“more than”) and including the class interval 300 – 400 in the spending Table is 13 + 22 + 31 = 66, indicating that 66 customers have spending greater than 300 $.
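Both cumulative columns can be derived from the class frequencies. A minimal sketch, with illustrative variable names:

```python
from itertools import accumulate

# Frequencies of the six spending classes ($0-100 ... $500-600).
frequencies = [30, 38, 50, 31, 22, 13]

# "Less than" counts at the boundaries 0, 100, ..., 600 (running total from the lowest class).
less_than = [0] + list(accumulate(frequencies))
total = less_than[-1]

# "Greater than" counts at the same boundaries (whatever is not yet accumulated).
greater_than = [total - c for c in less_than]

print(less_than)     # [0, 30, 68, 118, 149, 171, 184]
print(greater_than)  # [184, 154, 116, 66, 35, 13, 0]
```

Plotting either list against the class boundaries gives the corresponding ogive.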
1-78
Cont’d
A graph showing the cumulative

frequency less than the upper class
boundary plotted against the upper
class boundary is known as less than
cumulative frequency polygon or
Ogives
1-79
Continued
[Figure: line graph of Customers (number), 0 to 200, against Spending ($), < 0 to < 600]
Figure __. Cumulative frequency distribution (“less than”): Ogives

1-80
continued
A graph showing the cumulative

frequency greater than the lower
class boundary plotted against the
lower class boundary is known as
greater than cumulative frequency
polygon or Ogives
1-81
continued
[Figure: line graph of Customers (number), 0 to 200, against Spending ($), > 0 to > 600]
Figure __. Cumulative frequency distribution (“greater than”): Ogives

1-82
Assignment 2
Represent data of Table 2.1 in

cumulative frequency less than
“Ogives” and
cumulative frequency greater
than “Ogives”
And discuss some of the results
1-83
Histogram
 A histogram is a chart made of bars of

different heights.
Widths and locations of bars correspond
to widths and locations of data groupings
Heights of bars correspond to
frequencies or relative frequencies of
data groupings
1-84
Histogram Example
Frequency Histogram
1-85
Histogram Example
Relative Frequency Histogram

1-86
Skewness and Kurtosis
 Skewness
 Measure of the asymmetry (lack of symmetry) of a frequency distribution
 Skewed to left
 Symmetric or unskewed
 Skewed to right
 Kurtosis
 Measure of flatness or peakedness of a frequency distribution
 Platykurtic (relatively flat)
 Mesokurtic (normal)
 Leptokurtic (relatively peaked)
1-87
Skewness
Skewed to left
Mean < Median< Mode
[Figure: histogram of Frequency against Monthly expenses (Dollar), 0 to 700]
1-88
Skewness
Symmetric
1-89
Skewness
Skewed to right
1-90
Kurtosis
Platykurtic - flat distribution

1-91
Kurtosis
Mesokurtic - not too flat and not too peaked

1-92
Kurtosis
Leptokurtic - peaked distribution
