
A sample is a group of units selected from a larger group (the population).

By studying the sample, one hopes to draw valid conclusions about the larger group.

A sample is generally selected for study because the population is too large to study in its entirety. The sample should be representative of the general population. This is often best achieved by random sampling. Also, before collecting the sample, it is important that one carefully and completely defines the population, including a description of the members to be included.

A common problem in business statistical decision-making arises when we need information about a collection called a population but find that the cost of obtaining the information is prohibitive. For instance, suppose we need to know the average shelf life of current inventory. If the inventory is large, the cost of checking records for each item might be high enough to cancel the benefit of having the information. On the other hand, a hunch about the average shelf life might not be good enough for decision-making purposes.

This means we must arrive at a compromise that involves selecting a small number of items and calculating an average shelf life as an estimate of the average shelf life of all items in inventory. This is a compromise, since the measurements for a sample from the inventory will produce only an estimate of the value we want, but at substantial savings. What we would like to know is how "good" the estimate is and how much more it will cost to make it "better". Information of this type is intimately related to sampling techniques.
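As a rough illustration of how "good" a sample estimate is, the sketch below draws simple random samples of increasing size from a made-up inventory and reports the sample mean together with its standard error. The inventory data, the function name `estimate_mean`, and the sample sizes are all assumptions for illustration; the standard error formula ignores the finite-population correction.

```python
import math
import random
import statistics

def estimate_mean(population, sample_size, seed=0):
    """Estimate the population mean from a simple random sample.

    Returns (sample_mean, standard_error).
    """
    rng = random.Random(seed)
    sample = rng.sample(population, sample_size)  # sampling without replacement
    mean = statistics.mean(sample)
    # Standard error of the mean: s / sqrt(n), no finite-population correction.
    std_error = statistics.stdev(sample) / math.sqrt(sample_size)
    return mean, std_error

# Hypothetical inventory: shelf life in days for 10,000 items (made-up data).
rng = random.Random(42)
inventory = [rng.uniform(30, 120) for _ in range(10_000)]

for n in (25, 100, 400):
    mean, se = estimate_mean(inventory, n)
    print(f"n={n:4d}  estimated mean={mean:6.1f} days  standard error={se:4.1f}")
```

Because the standard error shrinks roughly with the square root of the sample size, halving it takes about four times as many items, which is one way to put a price on a "better" estimate.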

Cluster sampling can be used whenever the population is homogeneous but can be partitioned. In many applications the partitioning is a result of physical distance. For instance, in the insurance industry, there are small "clusters" of employees in field offices scattered about the country. In such a case, a simple random sample of employee work habits might require travel to many of the "clusters" or field offices in order to get the data. Completely sampling each one of a small number of clusters chosen at random can eliminate much of the cost associated with the data requirements of management.
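The field-office example can be sketched in a few lines. Here the offices, the employee labels, and the helper `cluster_sample` are hypothetical; the point is only that a few clusters are chosen at random and then every unit inside the chosen clusters is surveyed.

```python
import random

def cluster_sample(clusters, num_clusters, seed=0):
    """Pick `num_clusters` clusters at random and survey every unit in them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), num_clusters)
    sampled_units = [unit for name in chosen for unit in clusters[name]]
    return chosen, sampled_units

# Hypothetical field offices mapped to their employees (made-up data).
offices = {
    f"office_{i:02d}": [f"emp_{i:02d}_{j}" for j in range(5)]
    for i in range(20)
}

chosen, employees = cluster_sample(offices, num_clusters=3)
print(chosen)          # only these offices need a visit
print(len(employees))  # 15: 3 offices x 5 employees each
```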

Regression analysis is a powerful technique for studying the relationship between dependent variables (i.e., output, performance measure) and independent variables (i.e., inputs, factors, decision variables). Summarizing relationships among the variables by the most appropriate equation (i.e., modeling) allows us to predict or identify the most influential factors and study their impacts on the output for any changes in their current values.

Unlike deterministic decision-making processes, such as linear optimization by solving systems of equations or parametric systems of equations, in decision making under pure uncertainty the variables are often more numerous and more difficult to measure and control. However, the steps are the same. They are:

1. Simplification
2. Building a decision model
3. Testing the model
4. Using the model to find the solution:

It is a simplified representation of the actual situation.

It need not be complete or exact in all respects.

It concentrates on the most essential relationships and ignores the less essential ones.

It is more easily understood than the empirical (i.e., observed) situation, and hence permits the problem to be solved more readily with minimum time and effort.

5. It can be used again and again for similar problems or can be modified.

Fortunately, the probabilistic and statistical methods for analysis and decision making under uncertainty are more numerous and powerful today than ever before. The computer makes possible many practical applications. A few examples of business applications are the following:


An auditor can use random sampling techniques to audit the accounts receivable for clients.

A plant manager can use statistical quality control techniques to assure the quality of his production with a minimum of testing or inspection.

A financial analyst may use regression and correlation to help understand the relationship of a financial ratio to a set of other variables in business.

A market researcher may use tests of significance to accept or reject hypotheses about a group of buyers to which the firm wishes to sell a particular product.

A sales manager may use statistical techniques to forecast sales for the coming year.
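To make the regression and forecasting examples concrete, here is a minimal sketch of fitting a straight line by ordinary least squares with a single input variable. The text does not name a fitting method, so least squares is an assumption, and the spend and sales figures and the function `fit_line` are made up for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = (sum of cross-deviations) / (sum of squared x-deviations).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Made-up data: advertising spend (x) and sales (y), in arbitrary units.
spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [2.1, 3.9, 6.2, 7.8, 10.0]

intercept, slope = fit_line(spend, sales)
print(f"sales ~ {intercept:.2f} + {slope:.2f} * spend")
print(f"forecast at spend=6: {intercept + slope * 6:.2f}")
```

The fitted slope is the estimated change in output per unit change in the input, which is the "impact" the regression paragraph refers to.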

The arithmetic mean (or the average, simple mean) is computed by summing all numbers in an array of numbers ($x_i$) and then dividing by the number of observations ($n$) in the array:

$$\text{Mean} = \bar{x} = \frac{\sum_{i} x_i}{n},$$

where the sum is over all $i$. The mean uses all of the observations, and each observation affects the mean. Even though the mean is sensitive to extreme values (i.e., extremely large or small data can cause the mean to be pulled toward the extreme data), it is still the most widely used measure of location. This is due to the fact that the mean has valuable mathematical properties that make it convenient for use with inferential statistical analysis. For example, the sum of the deviations of the numbers in a set of data from the mean is zero, and the sum of the squared deviations of the numbers in a set of data from the mean is the minimum value; that is, it is smaller than the sum of squared deviations from any other value.
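The two properties of the mean can be checked numerically. The sketch below uses an arbitrary small data set chosen for illustration; `sum_sq_dev` is a helper name introduced here.

```python
import statistics

data = [2.5, 2.0, 1.5, 6.0]  # illustrative values; any numeric sample works
mean = statistics.mean(data)

def sum_sq_dev(values, center):
    """Sum of squared deviations of `values` from `center`."""
    return sum((v - center) ** 2 for v in values)

# Property 1: deviations from the mean sum to zero (up to rounding).
print(sum(v - mean for v in data))

# Property 2: squared deviations are smallest when measured from the mean.
for center in (mean - 1, mean, mean + 1):
    print(center, sum_sq_dev(data, center))
```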
(B.) Harmonic Mean:

The harmonic mean (H) is another specialized average, which is useful in averaging variables expressed as a rate per unit of time, such as miles per hour or number of units produced per day. The harmonic mean (H) of $n$ non-zero numerical values $x_i$ is:

$$H = \frac{n}{\sum_{i=1}^{n} (1/x_i)}.$$
An Application:
Suppose 4 machines in a machine shop are used to produce the same part. However, the four machines take 2.5, 2.0, 1.5, and 6.0 minutes to make one part, respectively. What is the average time needed to make one part? The harmonic mean is: H = 4/[(1/2.5) + (1/2.0) + (1/1.5) + (1/6.0)] = 2.31 minutes. If all machines work for one hour, how many parts will be produced? Since four machines running for one hour represent 240 minutes of operating time, 240 / 2.31 = 104 parts will be produced.
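The machine-shop figures can be reproduced with Python's standard `statistics.harmonic_mean`. As a cross-check (an addition for illustration, not part of the original example), the sketch also counts the parts machine by machine and shows what the arithmetic mean would have predicted.

```python
import statistics

minutes_per_part = [2.5, 2.0, 1.5, 6.0]

h = statistics.harmonic_mean(minutes_per_part)
print(round(h, 2))  # 2.31 minutes per part

machine_minutes = len(minutes_per_part) * 60          # 240 minutes in total
parts_via_h = machine_minutes / h                     # using the harmonic mean
parts_direct = sum(60 / m for m in minutes_per_part)  # machine by machine
print(round(parts_via_h), round(parts_direct))        # 104 104

# The arithmetic mean (3.0 minutes) would understate the output.
print(round(machine_minutes / statistics.mean(minutes_per_part)))  # 80
```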
(C.) The Geometric Mean:
The geometric mean (G) of $n$ non-negative numerical values is the $n$th root of the product of the $n$ values. If some values are very large in magnitude and others are small, then the geometric mean is a better representative of the data than the simple average. In a "geometric series", the most meaningful average is the geometric mean (G). The arithmetic mean is very biased toward the larger numbers in the series.
An Application:
Suppose sales of a certain item increase to 110% in the first year and to 150% of that in the second year. For simplicity, assume you sold 100 items initially. Then the number sold in the first year is 110 and the number sold in the second is 150% x 110 = 165. The arithmetic average of 110% and 150% is 130%, so that we would incorrectly estimate that the number sold in the first year is 130 and the number in the second year is 169. The geometric mean of 110% and 150% is $G = (1.65)^{1/2}$, so that we would correctly estimate that we would sell $100\,G^2 = 165$ items in the second year.
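The sales example can be verified directly. This sketch assumes Python 3.8 or later, where `statistics.geometric_mean` is available; the variable names are illustrative.

```python
import statistics

growth = [1.10, 1.50]  # year-over-year growth factors from the example
start = 100            # items sold initially

g = statistics.geometric_mean(growth)  # sqrt(1.10 * 1.50) = sqrt(1.65)
a = statistics.mean(growth)            # 1.30

print(round(start * g ** 2))  # 165, matches the actual second-year sales
print(round(start * a ** 2))  # 169, the arithmetic mean overshoots
```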


(D.) Median:
The median is the middle value in an ordered array of observations. If there is an even number of observations in the array, the median is the average of the two middle numbers. If there is an odd number of data in the array, the median is the middle number.

The median is often used to summarize the distribution of an outcome. If the distribution is skewed, the median and the interquartile range (IQR) may be better than other measures to indicate where the observed data are concentrated.

Generally, the median provides a better measure of location than the mean when there are some extremely large or small observations; i.e., when the data are skewed to the right or to the left. For this reason, median income is used as the measure of location for U.S. household income. Note that if the median is less than the mean, the data set is skewed to the right. If the median is greater than the mean, the data set is skewed to the left. For a normal population, the sample median is distributed approximately normally with $\mu$ = the mean, and the standard error of the median is $\sqrt{\pi/2}$ times the standard error of the mean.

The mean has two distinct advantages over the median. It is more stable, and one can compute the mean of two combined samples by combining the two sample means, weighted by sample size.
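A short sketch of both points: the median resists an extreme observation, while the mean of two samples can be recovered from the two sample means. The income figures and the helper `combined_mean` are made up for illustration.

```python
import statistics

# Made-up incomes (in thousands) with one extreme value.
incomes = [32, 35, 38, 41, 45, 48, 400]
print(round(statistics.mean(incomes), 1))  # 91.3, pulled toward the extreme
print(statistics.median(incomes))          # 41, the middle of seven values

# Even number of observations: average of the two middle numbers.
print(statistics.median([1, 2, 3, 4]))     # 2.5

def combined_mean(mean_a, n_a, mean_b, n_b):
    """Mean of the pooled sample, using only the two means and the sizes."""
    return (n_a * mean_a + n_b * mean_b) / (n_a + n_b)

a, b = [1, 2, 3], [10, 20]
print(combined_mean(statistics.mean(a), len(a), statistics.mean(b), len(b)))
print(statistics.mean(a + b))              # same value as the line above
```

No comparable shortcut exists for the median: the median of a pooled sample cannot in general be computed from the two sample medians alone.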
(E.) Mode:
The mode is the most frequently occurring value in a set of observations. Why use the mode? The classic example is the shirt/shoe manufacturer who wants to decide what sizes to introduce. Data may have two modes. In this case, we say the data are bimodal, and sets of observations with more than two modes are referred to as multimodal. Note that the mode is not always a helpful measure of location, because there can be more than one mode or even no mode.

When the mean and the median are known, it is possible to estimate the mode for a unimodal distribution using the other two averages as follows:

$$\text{Mode} \approx 3(\text{median}) - 2(\text{mean})$$

This estimate is applicable to both grouped and ungrouped data sets.
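The sketch below finds modes with the standard library (Python 3.8 or later for `statistics.multimode`) and applies the rule of thumb above. The shirt sizes, the input values, and the helper `estimate_mode` are assumptions for illustration.

```python
import statistics

# Made-up shirt sizes sold; the most frequent size is the mode.
sizes = [38, 40, 40, 42, 42, 42, 44]
print(statistics.mode(sizes))                 # 42

# Bimodal data: multimode returns every value tied for most frequent.
print(statistics.multimode([1, 1, 2, 2, 3]))  # [1, 2]

def estimate_mode(mean, median):
    """Rule-of-thumb estimate: mode ~ 3 * median - 2 * mean."""
    return 3 * median - 2 * mean

# Illustrative inputs for a right-skewed, unimodal distribution.
print(estimate_mode(mean=5.0, median=4.5))    # 3.5
```

With the median below the mean, the estimated mode falls below both, as expected for data skewed to the right.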
