
Statistics is the branch of mathematics that transforms numbers into useful information for
decision makers.

DESCRIPTIVE STATISTICS
Descriptive statistics are the methods that help collect, summarize, present, and analyze a set
of data.
INFERENTIAL STATISTICS
Inferential statistics are the methods that use the data collected from a small group to draw
conclusions about a larger group.

VARIABLE
A variable is a characteristic of an item or individual.
DATA
Data are the different values associated with a variable.

POPULATION
A population consists of all the items or individuals about which you want to reach conclusions.
SAMPLE
A sample is the portion of a population selected for analysis.

PARAMETER
A parameter is a measure that describes a characteristic of a population.
STATISTIC
A statistic is a measure that describes a characteristic of a sample.

Identifying Types of Variables


Categorical variables (also known as qualitative variables) have values that can only be
placed into categories such as yes and no. “Do you currently own bonds?” (yes or no) and the
level of risk of a bond fund (below average, average, or above average) are examples of
categorical variables.
Numerical variables (also known as quantitative variables) have values that represent
quantities. Numerical variables are further identified as being either discrete or continuous
variables.
Discrete variables have numerical values that arise from a counting process. “The number
of premium cable channels subscribed to” is an example of a discrete numerical variable
because the response is one of a finite number of integers. You subscribe to zero, one, two, or
more channels. “The number of items purchased” is also a discrete numerical variable because
you are counting the number of items purchased.
Continuous variables produce numerical responses that arise from a measuring process.

Measurement Scales

A nominal scale classifies data into distinct categories in which no ranking is implied.
An ordinal scale classifies values into distinct categories in which ranking is implied.
An interval scale is an ordered scale in which the difference between measurements is a meaningful
quantity but does not involve a true zero point. Examples include temperature (in degrees Celsius
or Fahrenheit) and standardized exam scores (e.g., ACT or SAT).
A ratio scale is an ordered scale in which the difference between the measurements involves a
true zero point, as in height, weight, age, or salary measurements.

Simple random sampling is the basic sampling technique where we select a group of subjects (a sample) for
study from a larger group (a population). Each individual is chosen entirely by chance and each member of the
population has an equal chance of being included in the sample.
Stratified sampling is a sampling method in which the researcher divides the population into separate groups,
called strata. Then a probability sample (often a simple random sample) is drawn from each group. Stratified
sampling has several advantages over simple random sampling; for example, it ensures that every stratum is
represented in the sample.
Cluster sampling is a sampling technique used when "natural" but relatively heterogeneous groupings are evident
in a statistical population. It is often used in marketing research. In this technique, the total population is divided into
these groups (or clusters) and a simple random sample of the groups is selected.
Systematic sampling is a type of probability sampling method in which sample members from a larger population
are selected according to a random starting point and a fixed periodic interval. This interval, called
the sampling interval, is calculated by dividing the population size by the desired sample size.
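The four sampling methods above can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation: the population of 100 numbered items, the "low"/"high" strata, the ten clusters, and all function names are hypothetical, and the systematic sample assumes the population size is an exact multiple of the sample size.

```python
import random

def simple_random_sample(population, n, rng):
    """Every member has an equal chance of being selected."""
    return rng.sample(population, n)

def systematic_sample(population, n, rng):
    """Random starting point, then every k-th member, where k = N // n."""
    k = len(population) // n      # sampling interval
    start = rng.randrange(k)      # random start within the first interval
    return population[start::k][:n]

def stratified_sample(strata, n_per_stratum, rng):
    """Draw a simple random sample from each stratum (dict: name -> members)."""
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

def cluster_sample(clusters, n_clusters, rng):
    """Pick whole clusters at random and keep every member of each one."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: clusters[name] for name in chosen}

rng = random.Random(42)                 # fixed seed so the sketch is repeatable
population = list(range(1, 101))        # hypothetical population of 100 items

srs = simple_random_sample(population, 10, rng)
sys_sample = systematic_sample(population, 10, rng)
strata = {"low": population[:50], "high": population[50:]}
strat = stratified_sample(strata, 5, rng)
clusters = {f"c{i}": population[i * 10:(i + 1) * 10] for i in range(10)}
clus = cluster_sample(clusters, 2, rng)
```

Note the difference between the last two functions: stratified sampling takes some members from every group, while cluster sampling takes every member from some groups.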

• Define the variables that you want to study in order to solve a business problem or meet
a business objective
• Collect the data from appropriate sources
• Organize the data collected by developing tables
• Visualize the data by developing charts
• Analyze the data by examining the appropriate tables and charts (and in later chapters by
using other statistical methods) to reach conclusions.

CENTRAL TENDENCY
The central tendency is the extent to which the data values group around a typical or central
value.
VARIATION
The variation is the amount of dispersion, or scattering, of values away from a central value.
SHAPE
The shape is the pattern of the distribution of values from the lowest value to the highest value.

SAMPLE MEAN
The sample mean is the sum of the values in a sample divided by the number of values in the
sample. Because all the values play an equal role, a mean is greatly affected by any value that is
greatly different from the others. When you have such extreme values, you should avoid using
the mean as a measure of central tendency. The mean can suggest a typical or central value for
a data set.

The Median
The median is the middle value in an ordered array of data that has been ranked from smallest
to largest. Half the values are smaller than or equal to the median, and half the values are larger
than or equal to the median. The median is not affected by extreme values, so you can use the
median when extreme values are present.
The Mode
The mode is the value in a set of data that appears most frequently. Like the median and unlike
the mean, extreme values do not affect the mode. Often, there is no mode or there are several
modes in a set of data.
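To see how an extreme value affects the three measures, the following sketch compares a small made-up sample with a copy in which the largest value is replaced by 50. It uses Python's standard `statistics` module; the data values are hypothetical and chosen only for illustration.

```python
import statistics

values = [2, 3, 3, 4, 5]           # hypothetical sample
with_extreme = [2, 3, 3, 4, 50]    # same sample, largest value replaced by 50

summary = {}
for label, data in (("original", values), ("with extreme value", with_extreme)):
    summary[label] = {
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        # multimode returns a list because a data set can have several modes
        "modes": statistics.multimode(data),
    }
    print(label, summary[label])
```

The mean moves from 3.4 to 12.4, while the median and the mode both stay at 3.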

Variation and Shape


In addition to central tendency, every data set can be characterized by its variation and shape.
Variation measures the spread, or dispersion, of values in a data set. One simple measure of
variation is the range, the difference between the largest and smallest values. More commonly
used in statistics are the standard deviation and variance.
The Range
The range is the simplest numerical descriptive measure of variation in a set of data. The range
does not consider how the values distribute or cluster between the extremes.
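As a small illustration of that limitation, the sketch below builds two made-up data sets with the same smallest and largest values. The data and the helper name `value_range` are hypothetical; the standard deviation comes from Python's `statistics` module.

```python
import statistics

clustered = [1, 5, 5, 5, 9]   # most values sit in the middle
spread = [1, 1, 5, 9, 9]      # values pile up at the extremes

def value_range(data):
    """Range = largest value minus smallest value."""
    return max(data) - min(data)

range_clustered = value_range(clustered)
range_spread = value_range(spread)
sd_clustered = statistics.stdev(clustered)   # sample standard deviation
sd_spread = statistics.stdev(spread)

print(range_clustered, range_spread)
print(sd_clustered, sd_spread)
```

Both ranges equal 8, yet the second data set has the larger standard deviation, because the range looks only at the two extremes.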

The Variance and the Standard Deviation


Because the sum of squares will always be nonnegative according to the rules of algebra,
neither the variance nor the standard deviation can ever be negative. For virtually all sets of data, the
variance and standard deviation will be positive values. Both of these statistics will be zero only
if there is no variation in a set of data, which happens only when each value in the sample is the
same.
SAMPLE VARIANCE
The sample variance is the sum of the squared differences around the mean divided by the
sample size minus 1.
SAMPLE STANDARD DEVIATION
The sample standard deviation is the square root of the sum of the squared differences around
the mean divided by the sample size minus 1.
In practice, you will most likely use the sample standard deviation as the measure of variation.
Unlike the sample variance, which is a squared quantity, the standard deviation is always a
number that is in the same units as the original sample data.
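The two definitions above translate directly into a short calculation. The sketch below works through them step by step on a made-up sample and then checks the result against `statistics.variance` and `statistics.stdev` from the Python standard library, which also divide by the sample size minus 1.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical sample
n = len(data)

mean = sum(data) / n                           # sample mean
sum_sq = sum((x - mean) ** 2 for x in data)    # sum of squared differences around the mean
sample_variance = sum_sq / (n - 1)             # divide by n - 1, not n
sample_sd = math.sqrt(sample_variance)         # back in the original units

# The standard library should agree with the hand calculation.
lib_variance = statistics.variance(data)
lib_sd = statistics.stdev(data)
print(sample_variance, sample_sd)
```

Here the mean is 5 and the sum of squared differences is 32, so the sample variance is 32/7.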

The characteristics of the range, variance, and standard deviation can be summarized as
follows:
• The greater the spread or dispersion of the data, the larger the range, variance, and standard
deviation.
• The smaller the spread or dispersion of the data, the smaller the range, variance, and standard
deviation.
• If the values are all the same (so that there is no variation in the data), the range, variance, and
standard deviation will all equal zero.
• None of the measures of variation (the range, variance, and standard deviation) can ever be
negative.

The Coefficient of Variation


Unlike the measures of variation presented previously, the coefficient of variation is a relative
measure of variation that is always expressed as a percentage rather than in terms of the units of
the particular data. The coefficient of variation, denoted by the symbol CV, measures the scatter
in the data relative to the mean. The coefficient of variation is equal to the standard deviation
divided by the mean, multiplied by 100%. The coefficient of variation is especially useful when
comparing two or more sets of data that are measured in different units.
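A minimal sketch of the calculation, assuming the sample standard deviation is used in the numerator. The heights and weights are made-up values, and `coefficient_of_variation` is a hypothetical helper name.

```python
import statistics

def coefficient_of_variation(data):
    """CV = (standard deviation / mean) * 100, expressed as a percentage."""
    return statistics.stdev(data) / statistics.mean(data) * 100

heights_cm = [160, 165, 170, 175, 180]   # hypothetical heights in centimeters
weights_kg = [55, 62, 70, 81, 95]        # hypothetical weights in kilograms

cv_heights = coefficient_of_variation(heights_cm)
cv_weights = coefficient_of_variation(weights_kg)
print(f"heights: {cv_heights:.2f}%  weights: {cv_weights:.2f}%")
```

The standard deviations themselves (centimeters versus kilograms) cannot be compared directly, but the two percentages can: in this made-up data the weights vary more, relative to their mean, than the heights do.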

Z Scores
An extreme value or outlier is a value located far away from the mean. The Z score, which is
the difference between the value and the mean, divided by the standard deviation, is useful in
identifying outliers. Values located far away from the mean will have either very small (negative)
Z scores or very large (positive) Z scores.
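The Z score calculation can be sketched as follows, using the sample mean and sample standard deviation. The data are made up, with one deliberately extreme value at the end, and `z_scores` is a hypothetical helper name.

```python
import statistics

def z_scores(data):
    """Z = (value - mean) / standard deviation, computed for every value."""
    mean = statistics.mean(data)
    sd = statistics.stdev(data)
    return [(x - mean) / sd for x in data]

values = [10, 12, 11, 13, 12, 11, 10, 12, 11, 40]   # hypothetical; 40 is extreme
zs = z_scores(values)

for x, z in zip(values, zs):
    print(f"{x:>3}  Z = {z:+.2f}")
```

In this data set the value 40 has by far the largest Z score, and every other value has a negative Z score because the extreme value pulls the mean above all of them.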

Shape
Shape is the pattern of the distribution of data values throughout the entire range of all the
values.
A distribution is either symmetrical or skewed. In a symmetrical distribution, the values below
the mean are distributed in exactly the same way as the values above the mean. In this case, the
low and high values balance each other out. In a skewed distribution, the values are not
symmetrical around the mean. This skewness results in an imbalance of low values or high
values.
Shape also can influence the relationship of the mean to the median. In most cases:
• Mean < median: negative, or left-skewed
• Mean = median: symmetric, or zero skewness
• Mean > median: positive, or right-skewed

Skewness and kurtosis are two shape-related statistics. The skewness statistic measures the
extent to which a set of data is not symmetric. The kurtosis statistic measures the relative
concentration of values in the center of the distribution of a data set, as compared with the tails.
A symmetric distribution has a skewness value of zero. A right-skewed distribution has a positive
skewness value, and a left-skewed distribution has a negative skewness value.
A bell-shaped distribution has a kurtosis value of zero. A distribution that is flatter than a
bell-shaped distribution has a negative kurtosis value. A distribution with a sharper peak (one that
has a higher concentration of values in the center of the distribution than a bell-shaped
distribution) has a positive kurtosis value.
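The signs described above can be checked with a short sketch. It assumes the simple moment-based formulas: skewness as the third central moment divided by the 1.5 power of the second, and kurtosis as the fourth central moment divided by the square of the second, minus 3 so that a bell-shaped distribution scores zero. Spreadsheet and statistics packages often apply small-sample adjustments, so their values can differ from these. The data sets and function names are hypothetical.

```python
def central_moment(data, k):
    """Average of (value - mean) ** k over the data set."""
    mean = sum(data) / len(data)
    return sum((x - mean) ** k for x in data) / len(data)

def skewness(data):
    """Moment-based skewness: m3 / m2 ** 1.5."""
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5

def excess_kurtosis(data):
    """Moment-based kurtosis, shifted so a bell shape gives zero: m4 / m2 ** 2 - 3."""
    return central_moment(data, 4) / central_moment(data, 2) ** 2 - 3

symmetric = [1, 2, 3, 4, 5]                  # evenly spaced, flatter than a bell shape
right_skewed = [1, 1, 2, 2, 3, 10]           # long tail toward high values
left_skewed = [-x for x in right_skewed]     # mirror image: long tail toward low values

for name, data in (("symmetric", symmetric),
                   ("right-skewed", right_skewed),
                   ("left-skewed", left_skewed)):
    print(name, skewness(data), excess_kurtosis(data))
```

The symmetric data set has zero skewness and a negative kurtosis value, the right-skewed set has positive skewness, and its mirror image has negative skewness.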

Exploring Numerical Data
