
Chapter 1
Overview and Descriptive Statistics

1.1
Populations, Samples, and Processes

Populations and Samples

A population is a well-defined collection of objects. When information is available for the entire population we have a census. A subset of the population is a sample.

Data and Observations

Univariate data consists of observations on a single variable (multivariate: more than two variables).

Branches of Statistics

Descriptive Statistics: summary and description of collected data.

Inferential Statistics: generalizing from a sample to a population.

Relationship Between Probability and Inferential Statistics

[Figure: Probability reasons from the population to the sample; inferential statistics reasons from the sample to the population.]

1.2
Pictorial and Tabular Methods in Descriptive Statistics

Stem-and-Leaf Displays

1. Select one or more leading digits for the stem values. The trailing digits become the leaves.
2. List stem values in a vertical column.
3. Record the leaf for every observation.
4. Indicate the units for the stem and leaf on the display.
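
As a minimal sketch of the four steps above, the following Python builds a stem-and-leaf display for nonnegative integers, taking the tens digit as the stem and the units digit as the leaf. The function name `stem_and_leaf` and the `|` separator are illustrative choices rather than anything prescribed by the slides; the input is the set of observed values from the stem-and-leaf example in these slides.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Return display lines with the tens digit as stem and the units digit as leaf.

    Assumes nonnegative integers (an assumption of this sketch).
    """
    leaves = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)  # step 1: split into stem and leaf
        leaves[stem].append(leaf)   # step 3: record the leaf for every observation

    lines = []
    # Step 2: list every stem in a vertical column, including stems with no leaves.
    for stem in range(min(leaves), max(leaves) + 1):
        lines.append(f"{stem} | {''.join(str(d) for d in leaves[stem])}")
    return lines


observed = [9, 10, 15, 22, 9, 15, 16, 24, 11]
display = stem_and_leaf(observed)
print("\n".join(display))
# Step 4: indicate the units for the stem and leaf.
print("Stem: tens digit   Leaf: units digit")
```

Sorting the values first puts the leaves on each stem in increasing order, which makes gaps and peaks easier to spot.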

Stem-and-Leaf Example

Observed values:
9, 10, 15, 22, 9, 15, 16, 24, 11

0 | 99
1 | 01556
2 | 24

Stem: tens digit   Leaf: units digit

Stem-and-Leaf Displays

• Identify typical value
• Extent of spread about a value
• Presence of gaps
• Extent of symmetry
• Number and location of peaks
• Presence of outlying values

Dotplots

Represent data with dots.

Observed values:
9, 10, 15, 22, 9, 15, 16, 24, 11

[Dotplot of the observed values on an axis marked 5, 10, 15, 20, 25.]

Types of Variables

A variable is discrete if its set of possible values constitutes a finite set or can be listed in an infinite sequence. A variable is continuous if its set of possible values consists of an entire interval on a number line.

Histograms: Discrete Data

Determine the frequency and relative frequency for each value of x. Then mark possible x values on a horizontal scale. Above each value, draw a rectangle whose height is the relative frequency of that value.

Ex. Students from a small college were asked how many charge cards they carry. x is the variable representing the number of cards, and the results are below.

Frequency Distribution

x    #people    Rel. Freq.
0      12         0.08
1      42         0.28
2      57         0.38
3      24         0.16
4       9         0.06
5       4         0.03
6       2         0.01

Histograms

[Histogram of the credit card results: Relative Frequency (0 to 0.4) on the vertical axis, Number of Cards (0 to 6) on the horizontal axis.]

Histograms (Continuous Data): Equal Class Widths

Determine the frequency and relative frequency for each class. Then mark the class boundaries on a horizontal measurement axis. Above each class interval, draw a rectangle whose height is the relative frequency.
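
The relative frequencies in the charge-card table can be reproduced from the counts. The short Python sketch below divides each frequency by the total number of students and rounds to two decimals, as the table does. The variable names and the rough text histogram are illustrative additions, not part of the slides.

```python
# Counts from the charge-card table: x (number of cards) -> number of students.
counts = {0: 12, 1: 42, 2: 57, 3: 24, 4: 9, 5: 4, 6: 2}

n = sum(counts.values())  # total number of students surveyed

# Relative frequency = frequency / n, rounded to two decimals as in the table.
rel_freq = {x: round(c / n, 2) for x, c in counts.items()}

# Rough text histogram: one '#' per 0.01 of relative frequency.
for x, rf in rel_freq.items():
    print(f"{x}  {rf:.2f}  {'#' * round(rf * 100)}")
```

Because every rectangle has the same width here, the bar lengths are proportional to the relative frequencies, which is what the discrete-data histogram draws.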
Histograms (Continuous Data): Unequal Widths

After determining frequencies and relative frequencies, calculate the height of each rectangle using:

$$\text{rectangle height} = \frac{\text{relative frequency of the class}}{\text{class width}}$$

The resulting heights are called densities and the vertical scale is the density scale.

Histogram Shapes

[Figure panels: symmetric unimodal, bimodal, positively skewed, negatively skewed.]

1.3
Measures of Location

The Mean

The average (mean) of the n numbers $x_1, x_2, \ldots, x_n$ is $\bar{x}$, where

$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n}$$

Population mean: µ
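
As a small worked example of the sample mean, the sketch below applies the formula to the nine observed values from the stem-and-leaf example. The variable names are illustrative; `statistics.fmean` from the Python standard library is used only as a cross-check.

```python
from statistics import fmean

observed = [9, 10, 15, 22, 9, 15, 16, 24, 11]

n = len(observed)
total = sum(observed)  # x_1 + x_2 + ... + x_n
xbar = total / n       # sample mean

print(f"n = {n}, sum = {total}, mean = {xbar:.2f}")

# Cross-check against the standard-library implementation.
assert abs(xbar - fmean(observed)) < 1e-12
```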

Median

The sample median is the middle value in a set of data that is arranged in ascending order. For an even number of data points the median is the average of the middle two.

Population median: $\tilde{\mu}$

Three Different Shapes for a Population Distribution

[Figure panels: symmetric, negative skew, positive skew.]

1.4
Measures of Variability

Sample Variance

Variance is a measure of the spread of the data.

The sample variance of the sample $x_1, x_2, \ldots, x_n$ of n values of X is given by

$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = \frac{S_{xx}}{n - 1}$$

We refer to $s^2$ as being based on n − 1 degrees of freedom.
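
The sample variance definition can be sketched directly in Python. The function name `sample_variance` is a hypothetical helper, and the data are the observed values from the stem-and-leaf example; `statistics.variance`, which also divides by n − 1, serves as a cross-check.

```python
from statistics import variance


def sample_variance(xs):
    """Sample variance: sum of squared deviations from the mean, divided by n - 1."""
    n = len(xs)
    xbar = sum(xs) / n
    sxx = sum((x - xbar) ** 2 for x in xs)  # numerator S_xx
    return sxx / (n - 1)                    # n - 1 degrees of freedom


observed = [9, 10, 15, 22, 9, 15, 16, 24, 11]
s2 = sample_variance(observed)
print(f"s^2 = {s2:.4f}")  # about 30.28

# Cross-check against the standard library's sample variance.
assert abs(s2 - variance(observed)) < 1e-9
```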

Standard Deviation

Standard deviation is a measure of the spread of the data using the same units as the data.

The sample standard deviation is the square root of the sample variance:

$$s = \sqrt{s^2}$$

Formula for $s^2$

An alternative expression for the numerator of $s^2$ is

$$S_{xx} = \sum (x_i - \bar{x})^2 = \sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n}$$

Properties of $s^2$

Let $x_1, x_2, \ldots, x_n$ be any sample and c be any nonzero constant.

1. If $y_1 = x_1 + c, \ldots, y_n = x_n + c$, then $s_y^2 = s_x^2$.
2. If $y_1 = c x_1, \ldots, y_n = c x_n$, then $s_y^2 = c^2 s_x^2$,

where $s_x^2$ is the sample variance of the x's and $s_y^2$ is the sample variance of the y's.

Upper and Lower Fourths

After the n observations in a data set are ordered from smallest to largest, the lower (upper) fourth is the median of the smallest (largest) half of the data, where the median is included in both halves if n is odd. A measure of the spread that is resistant to outliers is the fourth spread (IQR), $f_s = \text{upper fourth} - \text{lower fourth}$.
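
The definition of the fourths translates into a few lines of Python. This is a sketch under the rule stated above: the median is included in both halves when n is odd. The function name `fourths` is hypothetical, and the data are the observed values from the stem-and-leaf example.

```python
from statistics import median


def fourths(xs):
    """Return (lower fourth, upper fourth) of the data."""
    data = sorted(xs)
    n = len(data)
    half = (n + 1) // 2  # half size; includes the median when n is odd
    lower = median(data[:half])      # median of the smallest half
    upper = median(data[n - half:])  # median of the largest half
    return lower, upper


observed = [9, 10, 15, 22, 9, 15, 16, 24, 11]
lower_fourth, upper_fourth = fourths(observed)
fs = upper_fourth - lower_fourth  # fourth spread

print(f"lower fourth = {lower_fourth}, upper fourth = {upper_fourth}, fs = {fs}")
```

Note that library quantile functions often use different conventions, so their quartiles may not match the fourths exactly.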

Boxplots

[Figure: boxplot with the lower fourth, median, and upper fourth labeled, and with mild outliers and extreme outliers marked.]

Outliers

Any observation farther than $1.5 f_s$ from the closest fourth is an outlier. An outlier is extreme if it is more than $3 f_s$ from the nearest fourth, and it is mild otherwise.
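
The outlier rule can be sketched as a small classifier. The function name `classify` and the test points 26 and 40 are hypothetical; the fourths 10 and 16 are those of the nine observed values in the stem-and-leaf example (9, 9, 10, 11, 15, 15, 16, 22, 24).

```python
def classify(x, lower_fourth, upper_fourth):
    """Label x as 'not an outlier', 'mild outlier', or 'extreme outlier'."""
    fs = upper_fourth - lower_fourth  # fourth spread

    # Distance from x to the closest fourth (zero if x lies between the fourths).
    if x < lower_fourth:
        distance = lower_fourth - x
    elif x > upper_fourth:
        distance = x - upper_fourth
    else:
        distance = 0

    if distance > 3 * fs:
        return "extreme outlier"
    if distance > 1.5 * fs:
        return "mild outlier"
    return "not an outlier"


# With lower fourth 10 and upper fourth 16, fs = 6, so 1.5*fs = 9 and 3*fs = 18.
for value in (24, 26, 40):
    print(value, classify(value, 10, 16))
```

The largest observed value, 24, lies 8 units above the upper fourth, which is within 1.5 fs = 9, so this sample has no outliers.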
