
TWO DIFFERENT TYPES OF DATA

1. Quantitative data: numerical measurements or counts
e.g. measurements of the ozone level in the air
2. Qualitative data: classification into different groups or categories
e.g. complaints received at an auto paint shop
UNIVARIATE: only 1 measurement taken per unit
e.g. measuring the ozone level each day
BIVARIATE: 2 measurements taken per unit
e.g. measuring the ozone and carbon monoxide levels each day
MULTIVARIATE: more than 2 measurements taken per unit
e.g. measuring ozone, carbon monoxide, sulfur dioxide,
and particulate matter each day

VISUALIZING DISTRIBUTIONS
The overall pattern of a distribution of measurements is generally described by
3 components:
1. CENTER: describes a point near the middle of a distribution that serves as
a balance point for the distribution.
2. SPREAD: describes how the data points spread out around the center.
3. SHAPE: describes the basic pattern of the plotted data along with
any notable departures from that pattern. The most basic
shapes can be described using the idea of symmetry.
SYMMETRIC DISTRIBUTION: one half of the distribution is
approximately a mirror image of the other half.
RIGHT-SKEWED DISTRIBUTION: the distribution has a longer right tail
than left tail. There are more low values than high values.
LEFT-SKEWED DISTRIBUTION: the distribution has a longer left tail
than right tail. There are more high values than low values.
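
As a rough illustration of the three components, the sketch below computes a center (mean and median), a spread (standard deviation), and a shape indicator (the moment coefficient of skewness, which is positive for a longer right tail and negative for a longer left tail). The function names, the `tolerance` cutoff, and the choice of these particular statistics are assumptions made for illustration; the notes do not prescribe specific formulas.

```python
import statistics


def skewness(values):
    """Moment coefficient of skewness: mean cubed deviation divided by sd**3."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)  # population standard deviation
    if sd == 0:
        return 0.0  # all values identical, so there is no tail in either direction
    n = len(values)
    return sum((x - mean) ** 3 for x in values) / (n * sd ** 3)


def describe(values, tolerance=0.1):
    """Summarize center, spread, and shape of a list of numbers.

    `tolerance` is an arbitrary, hypothetical cutoff: skewness values within
    +/- tolerance of zero are labeled "symmetric".
    """
    skew = skewness(values)
    if skew > tolerance:
        shape = "right-skewed"
    elif skew < -tolerance:
        shape = "left-skewed"
    else:
        shape = "symmetric"
    return {
        "mean": statistics.mean(values),        # center
        "median": statistics.median(values),    # center
        "std_dev": statistics.pstdev(values),   # spread
        "skewness": skew,                       # shape
        "shape": shape,
    }


# Made-up sample with a few large values stretching the right tail
print(describe([1, 2, 2, 3, 3, 3, 4, 10]))
```

A plot of the data is still the better way to judge shape; a single number such as skewness only summarizes it.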

OUTLIERS: measurements that appear to be different from the rest of
the data.
Example: for battery lifetimes, most of the batteries had to be
replaced within 100 hrs, but one lasted over 120 hrs.
CLUSTERS AND GAPS: are any observations grouped together
unusually? Are there significant gaps in the data?
Example: a summary of the diameters of holes made by a
stamping machine shows the data clustering at 2 different
points; a possible reason is that the 2 machine
operators set the machine differently during their
shifts.
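
One simple way to look for clusters and gaps numerically is to sort the measurements and split them wherever two neighboring values are further apart than some chosen distance. The sketch below does this; the function name, the `min_gap` threshold, and the hole diameters are all hypothetical and chosen only to mirror the stamping machine example.

```python
def split_into_clusters(values, min_gap):
    """Group sorted values into clusters separated by gaps larger than min_gap."""
    ordered = sorted(values)
    if not ordered:
        return []
    clusters = [[ordered[0]]]
    for previous, current in zip(ordered, ordered[1:]):
        if current - previous > min_gap:
            clusters.append([current])    # gap found: start a new cluster
        else:
            clusters[-1].append(current)  # still within the same cluster
    return clusters


# Hypothetical hole diameters in mm, as if set by two different operators
diameters = [5.01, 5.22, 5.02, 5.21, 5.03, 5.24, 5.02, 5.22, 5.04, 5.23]
for cluster in split_into_clusters(diameters, min_gap=0.1):
    print(len(cluster), "holes between", cluster[0], "and", cluster[-1])
```

The threshold has to be chosen with the measurement scale in mind; a dot plot or histogram of the same data usually makes the clusters visible without any threshold at all.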

A boxplot splits the data set into quartiles. The body of the
boxplot consists of a "box" (hence, the name), which goes
from the first quartile (Q1) to the third quartile (Q3).
Within the box, a vertical line is drawn at Q2, the median
of the data set.
Two horizontal lines, called whiskers, extend from the front (left)
and back (right) of the box. The front whisker goes from Q1 to the
smallest non-outlier in the data set, and the back whisker goes
from Q3 to the largest non-outlier.
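
The numbers behind a boxplot can be sketched as follows. The notes do not say how a point is judged to be an outlier, so this example assumes the common convention that any value more than 1.5 times the box width beyond Q1 or Q3 is an outlier. The function name, the `whisker_factor` parameter, and the battery lifetimes are hypothetical. Quartile conventions also differ between textbooks and software, so other tools may give slightly different Q1 and Q3 values.

```python
import statistics


def boxplot_summary(values, whisker_factor=1.5):
    """Return the quartiles, whisker ends, and outliers used to draw a boxplot."""
    ordered = sorted(values)
    # "inclusive" treats the data as the whole population of interest
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    box_width = q3 - q1
    low_fence = q1 - whisker_factor * box_width    # assumed outlier cutoff
    high_fence = q3 + whisker_factor * box_width   # assumed outlier cutoff
    inside = [x for x in ordered if low_fence <= x <= high_fence]
    outliers = [x for x in ordered if x < low_fence or x > high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": inside[0],    # smallest non-outlier
        "whisker_high": inside[-1],  # largest non-outlier
        "outliers": outliers,
    }


# Hypothetical battery lifetimes in hours, with one unusually long-lived battery
lifetimes = [62, 70, 75, 78, 81, 84, 88, 90, 95, 124]
print(boxplot_summary(lifetimes))
```

With these values the box runs from 75.75 to 89.5, the whiskers end at 62 and 95, and 124 is flagged as an outlier.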

HOW TO INTERPRET A BOXPLOT


The MEDIAN is indicated by the vertical line that runs down
the center of the box. In the boxplot above, the median is
about 400.

Additionally, boxplots display two common measures of the
variability or spread in a data set.

RANGE. If you are interested in the spread of all the data, it is
represented on a boxplot by the horizontal distance between
the smallest value and the largest value, including any outliers.

INTERQUARTILE RANGE (IQR). The middle half of a data
set falls within the interquartile range. In a boxplot, the
interquartile range is represented by the width of the box (Q3
minus Q1).
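
To see how the two measures of spread behave, the sketch below computes both for a small made-up data set, once with and once without a single extreme value. The function names and the data are hypothetical, and the quartiles use the "inclusive" method of Python's `statistics.quantiles`, which is only one of several conventions.

```python
import statistics


def data_range(values):
    """Largest value minus smallest value, outliers included."""
    return max(values) - min(values)


def interquartile_range(values):
    """Q3 minus Q1: the width of the box in a boxplot."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1


# Hypothetical battery lifetimes in hours; the last value is unusually large
with_outlier = [62, 70, 75, 78, 81, 84, 88, 90, 95, 124]
without_outlier = with_outlier[:-1]

for label, data in [("with outlier", with_outlier),
                    ("without outlier", without_outlier)]:
    print(label, "range:", data_range(data),
          "IQR:", interquartile_range(data))
```

Dropping the one extreme value cuts the range from 62 to 33 hours, while the IQR only moves from 13.75 to 13. The range reacts strongly to outliers; the IQR describes the middle half of the data and is barely affected.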
