
E370

5/13/17
Week 01, Part 1
Statistics, symbols
and data
19th-century scientific philosophy
described a clockwork universe
A small number of mathematical laws
could be used to describe reality and
predict the future.
All that was needed were the laws
and sufficiently precise
measurements to put into the laws.
An error function was part of the
laws and any error was attributed to
measurement inaccuracies.

A little history
more accurate measurements
were accompanied by MORE
error.
attempts to discover the laws
of biology and sociology had
failed.
well-known laws of physics
and chemistry were proving to
be rough approximations.

By the end of the 19th century


almost all sciences had shifted to
using statistical models.
Such models make random variability
endogenous.
Data recording and storing has been
simplified by the computer.
There is a LOT of data out there, and a
single observation is not enough.
At the simplest level, statistics is all
about making sense out of data, which
we have in abundance in our world.

By the end of the 20th century . . .


An academic discipline
The science of uncertainty.
A branch of mathematics that consists
of a set of analytical techniques that can
be applied to data to help make
judgments and decisions in problems
involving uncertainty.
A set of tools and methods that allow
one to get information from data.
The science of collecting, organizing,
presenting, analyzing and interpreting
data.

What is Statistics?
Specific
numbers that
summarize or describe
samples.
The sample of adult males was
67.5 inches tall on average.

A sample of workers in New York
and California typically spend
35.22 minutes commuting to work.

Specific numbers . . .

Populations
Parameters
Lower case Greek letters, or
UPPER CASE LATIN: μ, σ, ρ, N
Samples
Statistics
Lower case Latin letters: x̄, s, r,
n
Groups of data and their
symbols
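
The parameter/statistic distinction can be sketched in a few lines of Python. This is only an illustration: the population below is simulated (the mean of 67.5 inches echoes the earlier example, while the spread of 3.0 inches, the population size, and the sample size are assumptions), and the variable names simply follow the usual convention of mu, sigma, N for a population and x_bar, s, n for a sample.

```python
import random
import statistics

# Hypothetical population: simulated heights (inches) of 10,000 adult males.
# These values are generated for illustration only; they are not real data.
random.seed(370)
population = [random.gauss(67.5, 3.0) for _ in range(10_000)]

# Parameters describe the whole population.
N = len(population)
mu = statistics.fmean(population)       # population mean
sigma = statistics.pstdev(population)   # population standard deviation (divides by N)

# Statistics describe a sample drawn from that population.
sample = random.sample(population, 50)
n = len(sample)
x_bar = statistics.fmean(sample)        # sample mean
s = statistics.stdev(sample)            # sample standard deviation (divides by n - 1)

print(f"Parameters: N={N}, mu={mu:.2f}, sigma={sigma:.2f}")
print(f"Statistics: n={n}, x_bar={x_bar:.2f}, s={s:.2f}")
```

In practice the parameters are usually unknown, which is why the sample statistics are computed in the first place.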
A distinction is made between
two branches of statistics. The
difference lies in their purpose.
Descriptive Statistics
Methods that display, organize and
summarize data in an informative way.
They describe a fixed data set.
Inferential Statistics
Methods that draw conclusions, make
inferences (guesses) and make
predictions about populations based
on a sample of data.
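
A small sketch may help separate the two branches. The commute times below are made-up values, and the interval uses the normal critical value 1.96 as a simplifying assumption (for a sample this small a t-based value would give a somewhat wider interval).

```python
import math
import statistics

# Hypothetical sample of commute times in minutes (made-up values).
commutes = [22.0, 41.5, 35.0, 28.5, 50.0, 33.0, 38.5, 30.0, 44.0, 29.5]

# Descriptive statistics: summarize this fixed data set.
n = len(commutes)
mean = statistics.fmean(commutes)
median = statistics.median(commutes)
s = statistics.stdev(commutes)
print(f"Described: n={n}, mean={mean:.2f}, median={median:.2f}, s={s:.2f}")

# Inferential statistics: use the sample to say something about the
# population mean. Rough 95% interval, assuming the normal value 1.96.
std_error = s / math.sqrt(n)
low = mean - 1.96 * std_error
high = mean + 1.96 * std_error
print(f"Inferred: population mean is plausibly between {low:.2f} and {high:.2f}")
```

The first half makes no claim beyond the ten observations; the second half is a guess about all commuters and carries uncertainty.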
Variable
A variable is some characteristic of a
population or sample that will likely
change from item to item in the
sample or population.
Observation
The individual values of a variable
are observations.
Data or data set
Data are the observed values of a
variable.

Components of data
What distinguishes different types
of data?
An inherent order based on
The amount of information one observation
provides.
The complexity of the information.
The number of statistical techniques that
can be used on that data.
The Hierarchy of Data
Like any hierarchy, the higher in the system
you get, the information, complexity and
number of techniques usable on the data
increase.
Characteristics of a lower data level are
passed along to a higher level, but not the
reverse.

Data Differences
Qualitative or Categorical
Generally, the presence or absence of a
quality or characteristic in the observation.
Values may be represented with text or may be
coded into numerical values, for convenience
only.
Nominal data tells only the name of the
characteristic or quality and an indication
of its presence or absence.
Ordinal data is nominal data that has an
order that adds information.

Quantitative, Numerical or Interval
Real numbers which are the result of a count
or measurement, which are always reported
numerically and have numerical meaning.
Discrete data is the result of a count and is
represented by integers.
Continuous data is the result of a
measurement and is represented by any real
number.

Data, Variables, Observations

Categories of Data--Levels
Categorical or qualitative data
Nominal
The presence or absence of a specific characteristic in an
item, for example, the characteristic of gender. A person
is either female or not female.
The simplest type or the lowest level of data.
It contains the least information, only the name of the
characteristic or quality and an indication of its presence or
absence.
The fewest statistical methods are available for use on
nominal data.
A two-category variable is often called binary.
Variables with more than two categories also exist.
Categories may be represented by numbers, but the
numbers are assigned arbitrarily, and thus have no
numerical meaning.

Types of Data-Nominal
Categorical or qualitative data
Ordinal:
A ranked multi-category nominal variable, for example, an
individual's response to the request, "Rank the quality of your
meal." Categories for response are: low quality, poor quality,
average quality, good quality and excellent quality.
The ranking provides more information than simply the
presence of a characteristic.
It is understood that as the categories are moved through in
order, the amount of some quality is increasing or
decreasing.
The difference between levels cannot be measured.
Each individual's perception of the quality and its level is likely
different.
This data is more complex than nominal and is of a higher level.
The ranking may be represented by ordered numbers, but the
numbers are assigned arbitrarily, and thus have no numerical
meaning beyond their order.

Types of Data-Ordinal
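
The difference between nominal and ordinal data can be sketched as follows. The responses are made-up, and the category labels are taken from the meal-quality example; the point is only that nominal values support counting, while ordinal values also support order-based summaries such as a median category.

```python
from collections import Counter
import statistics

# Nominal: category labels with no order (hypothetical observations).
regions = ["New York", "California", "California", "New York", "California"]
# Counting and finding the most frequent category are meaningful.
counts = Counter(regions)
most_common_region = counts.most_common(1)[0][0]
print("Most common region:", most_common_region)

# Ordinal: the categories have an order, but distances between them
# cannot be measured.
quality_scale = ["low quality", "poor quality", "average quality",
                 "good quality", "excellent quality"]
rank = {label: i for i, label in enumerate(quality_scale, start=1)}

# Hypothetical responses to "Rank the quality of your meal."
responses = ["good quality", "average quality", "excellent quality",
             "poor quality", "good quality"]

# The ranks only encode order, so an order-based summary (the median
# category) is appropriate; averaging the ranks would assume equal spacing.
ranks = sorted(rank[r] for r in responses)
median_label = quality_scale[statistics.median_low(ranks) - 1]
print("Median response:", median_label)
```

Using `median_low` keeps the result an actual category even when the number of responses is even.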
Quantitative, numerical or interval
data
Discrete
Values are the result of a count, thus only integers are
possible observations, for example, the number of seats in
a row, the number of stocks that increased in value, etc.
It can assume only a countable number of different values.
It is possible to list the complete set of values such a
variable can take on.
It is considered to have gaps between values along the
real number line, although no intervening number is
skipped.
Values are always represented by numbers, the numbers
are always ordered, and no values are skipped even if
there are no observations there.

Types of Data-Discrete
Quantitative, numerical or interval
data
Continuous
Values are the result of a measurement, thus any real
number is a potential observation, for example,
individual height and weight are both continuous.
The inability to measure to an infinite level of detail does
not make the variable other than continuous. If an
infinite number of values are possible, the variable is
continuous.
It is not possible to list the complete set of values such a
variable can take on.
Values are always represented by numbers but the
numbers are generally rounded due to circumstances or
for convenience. They may be represented by integers
only.

Types of Data-Continuous
The techniques we will use in a
given situation will be determined
by the type or level of data we
have.
Since qualities of data levels carry over as
one moves up the hierarchy of data,
but not down, in general the lowest
level of data present dictates the statistical
methods available for use.
Statisticians have developed many
clever ways to define their way out
of this problem, as we will learn over
the semester.
The tyranny of the data type
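
One way to picture this rule is a lookup from data level to usable summaries. The sketch below is an illustration, not a complete catalogue: the `Level` names and the particular summaries listed for each level are assumptions chosen for the example.

```python
from enum import IntEnum


class Level(IntEnum):
    """Data levels, ordered from lowest to highest (illustrative)."""
    NOMINAL = 1
    ORDINAL = 2
    QUANTITATIVE = 3


# Summaries that first become meaningful at each level (assumed examples).
SUMMARIES_INTRODUCED = {
    Level.NOMINAL: ["counts", "mode"],
    Level.ORDINAL: ["median", "percentiles"],
    Level.QUANTITATIVE: ["mean", "standard deviation"],
}


def available_summaries(level):
    """Summaries usable at `level`: its own plus those of every lower level."""
    return [name
            for lvl in Level if lvl <= level
            for name in SUMMARIES_INTRODUCED[lvl]]


def methods_for_dataset(levels):
    """When several data levels are involved, the lowest one governs."""
    return available_summaries(min(levels))


print(available_summaries(Level.QUANTITATIVE))
print(methods_for_dataset([Level.QUANTITATIVE, Level.ORDINAL]))
```

Higher levels inherit everything below them, which is why mixing in a lower-level variable shrinks the list rather than extending it.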
