
Business Statistics

Definition


Statistics is a set of methods for collecting, organizing, summarizing, presenting, analyzing, and interpreting data in order to draw conclusions and make decisions based on that analysis. Statistics is used extensively by engineers, managers, governments, and businesspeople throughout the world.

 

Collection of data

Types of data

Secondary data
  Whether data are suitable?
  Whether data are adequate?
  Whether data are reliable?

Primary data
  Questioning
  Observation

Presentation of data

Classification
  Geographical
  Chronological
  Quantitative
  Qualitative

Classification according to class interval
  Frequency distribution
  Class limits: exclusive method, inclusive method
  Class intervals
  Class frequency

Tabulation of data
  Parts of table

Charting of data
  Bar diagrams
  Pie diagrams
  Line graphs
  Histograms
  Frequency polygon

Functions of Statistics
   

 

Presents facts in a definite form
Simplifies mass of figures
Facilitates comparison
Helps in formulating and testing hypotheses
Helps in prediction
Helps in the formulation of suitable policies

Populations and Samples




A population is the complete set of all possible instances of a particular object, for example, the students in this College.

A sample is a subset of the population, for example, any one of the classes.

We use samples to draw conclusions about the parent population.
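The relationship between a population and a sample can be sketched in a few lines of Python. This is only an illustration: the population of 500 roll numbers and the sample size of 30 are hypothetical and do not come from the slides.

```python
import random

# Hypothetical population: roll numbers 1..500 standing in for
# every student in a college (not data from the slides).
population = list(range(1, 501))

random.seed(42)  # fixed seed so the draw is repeatable

# Simple random sample without replacement: a subset of the population.
sample = random.sample(population, 30)

print(len(population), len(sample))  # 500 30
```

Every element of `sample` is also an element of `population`, which is exactly what "subset" means here.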

Measures of Central Tendency




If you have to declare a single value to represent a population or a sample, what do you use? The most common value is the mean, also called the average or the expected value. Another common value is the mode or the most likely (most common) value. Another value is the median or the middle of the data set.

Measures of Central Tendency (ungrouped)




Mean: the mathematical average of a set of numbers.

Median: the middle value of a set of data that has been arranged from lowest to highest.

Mode: the value that occurs most often in a set of data.


Monthly expenditure is a good way to discuss these three measures. Suppose we want to know the average expenditure of NIFT students.

Let's take a random sample of the monthly expenditure of NIFT students.

What is the Mean?




The mean is the sum of all of the values in the data set divided by the number of values.


The equation for calculating the mean is the same for both samples and populations.
$$\bar{x} = \frac{\sum x}{n}$$

Sample Mean

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

Where:
$\bar{x}$ (x-bar) is the mean
$x_i$ are the data points
$n$ is the sample size

Population Mean

$$\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$$

Where:
$\mu$ is the population mean
$x_i$ are the data points
$N$ is the total number of observations in the population

Measures of Central Tendency (ungrouped)


The sample gives these values:  5000, 6000, 30000, 110000, 15000, 6000, 17000, 13000, 12000, 11000, 8000, 6000, 15000, 6000, 11500
 

The Mean


 

This is the average.
Sum of values = 271500
Total N = 15
Mean = 271500 / 15 = 18100
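The calculation of the mean for this sample can be checked with a short Python sketch. The variable names (`expenditure`, `total`, `n`, `mean`) are chosen here for illustration only.

```python
# Monthly expenditure sample of 15 NIFT students, as listed on the slide.
expenditure = [5000, 6000, 30000, 110000, 15000, 6000, 17000, 13000,
               12000, 11000, 8000, 6000, 15000, 6000, 11500]

total = sum(expenditure)   # sum of all values
n = len(expenditure)       # number of values
mean = total / n           # sum divided by count

print(total, n, mean)  # 271500 15 18100.0
```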

What is the Median?




If the data has been sorted (ascending or descending), the median is the middle value (for an odd number of points) or the average of the two middle values (for an even number of points). The median is used to characterize data sets with a few extreme values that distort the relevance of the mean, such as house values or family incomes.

$$\text{Median} = \left(\frac{n+1}{2}\right)\text{th item in the data array}$$

Measures of Central Tendency (ungrouped)


The sample gives these values:  5000, 6000, 30000, 110000, 15000, 6000, 17000, 13000, 12000, 11000, 8000, 6000, 15000, 6000, 11500



The Median


This is the middle value:
5000, 6000, 6000, 6000, 6000, 8000, 11000, 11500, 12000, 13000, 15000, 15000, 17000, 30000, 110000
The median here is 11500.
In cases where there are two middle values, we average the two.
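As a sketch of the same procedure in Python: sort the values, locate the ((n + 1) / 2)th item, and compare with the standard library's `statistics.median`, which also handles the even case by averaging the two middle values. The variable names are illustrative.

```python
import statistics

data = [5000, 6000, 30000, 110000, 15000, 6000, 17000, 13000,
        12000, 11000, 8000, 6000, 15000, 6000, 11500]

ordered = sorted(data)
n = len(ordered)

# (n + 1) / 2 gives a whole-number position only when n is odd, as it is here.
position = (n + 1) // 2
median_by_position = ordered[position - 1]  # lists are 0-indexed

median = statistics.median(data)

print(position, median_by_position, median)  # 8 11500 11500

# With an even number of points the two middle values are averaged.
print(statistics.median([1, 2, 3, 4]))  # 2.5
```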

What is the Mode?




If the data is discrete, or has been grouped into discrete intervals, the mode is that value that occurs the most often.  In other words it is the value most likely to occur.

Measures of Central Tendency (ungrouped)


The sample gives these values:  5000, 6000, 30000, 110000, 15000, 6000, 17000, 13000, 12000, 11000, 8000, 6000, 15000, 6000, 11500



The Mode


 

This is the most numerous value:
5000, 6000, 6000, 6000, 6000, 8000, 11000, 11500, 12000, 13000, 15000, 15000, 17000, 30000, 110000
The mode here is 6000.
Sometimes there is no mode, or even two modes!
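A minimal Python sketch of finding the mode by counting occurrences. It uses `collections.Counter` and `statistics.multimode` (available from Python 3.8); the second call shows a made-up data set with two modes.

```python
from collections import Counter
import statistics

data = [5000, 6000, 30000, 110000, 15000, 6000, 17000, 13000,
        12000, 11000, 8000, 6000, 15000, 6000, 11500]

# Count how often each value occurs and take the most frequent one.
counts = Counter(data)
value, freq = counts.most_common(1)[0]
print(value, freq)  # 6000 4

# multimode returns every value that shares the highest frequency.
modes = statistics.multimode(data)
print(modes)  # [6000]

# Illustrative data set (not from the slides) with two modes.
print(statistics.multimode([1, 1, 2, 2, 3]))  # [1, 2]
```

When every value occurs exactly once, `multimode` returns all of them, which corresponds to the "no mode" case.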

Measures of Central Tendency (ungrouped)




So given these values 5000, 6000, 6000, 6000, 6000, 8000, 11000, 11500, 12000, 13000, 15000, 15000, 17000, 30000, 110000

what is the best measure of central tendency for this random sample of NIFT students?  Mean?...18100  Median?...11500  Mode?...6000


What Is the Range?




Range: the distance between the lowest and the highest values in the set. For example, suppose the time to drive to Churchgate is 2 hours plus or minus 15 minutes, that is, 105 to 135 minutes. The range is then 135 − 105 = 30 minutes.

Measures of Dispersion or Spread (ungrouped)




Range
 

The highest value minus the lowest value. From our last example, the range would be: 110000 − 5000 = 105000
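In Python the range of the sample is one subtraction; the name `data_range` is used here only to avoid shadowing the built-in `range`.

```python
data = [5000, 6000, 30000, 110000, 15000, 6000, 17000, 13000,
        12000, 11000, 8000, 6000, 15000, 6000, 11500]

# Range = highest value minus lowest value.
data_range = max(data) - min(data)

print(max(data), min(data), data_range)  # 110000 5000 105000
```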

What is the Variance?




The Variance of a population is the sum of the squares of the differences between the mean and the individual data points divided by the number of data points. The Variance of a sample is the sum of the squared differences divided by the number of data points less one.
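Written as formulas, the two definitions read as follows, where $\mu$ is the population mean, $N$ the population size, $\bar{x}$ the sample mean, and $n$ the sample size. The symbols $\sigma^2$ and $s^2$ for the two variances follow the usual convention and are an assumption here, since the slide gives the definitions only in words.

$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2$$

$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$$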

What is the Standard Deviation?




Standard Deviation


This is a measure of the typical distance of your values from the mean.

The standard deviation is the square root of the variance.

Computing Standard Deviation




Population

$$\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2}$$

Sample

$$s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2}$$

The expression under the square root sign is the variance. It is important that you recognize the difference between these two equations!

Measures of Dispersion or Spread (ungrouped)




Standard Deviation

Let's return to our NIFT random sample:
5000, 6000, 6000, 6000, 6000, 8000, 11000, 11500, 12000, 13000, 15000, 15000, 17000, 30000, 110000

Follow these steps while we calculate the standard deviation as a class on the board:

1. Calculate the mean, which is 18100.
2. Find the distance that each value has from the mean.
3. Square each distance.
4. Add up the squared distances and divide by the sample size minus 1.
5. Take the square root of this number.

Standard Deviation

| X | Mean (x-bar) | X − x-bar | (X − x-bar)² |
|---|---|---|---|
| 5000 | 18100 | -13100 | 17161 × 10⁴ |
| 6000 | 18100 | -12100 | 14641 × 10⁴ |
| 6000 | 18100 | -12100 | 14641 × 10⁴ |
| 6000 | 18100 | -12100 | 14641 × 10⁴ |
| 6000 | 18100 | -12100 | 14641 × 10⁴ |
| 8000 | 18100 | -10100 | 10201 × 10⁴ |
| 11000 | 18100 | -7100 | 5041 × 10⁴ |
| 11500 | 18100 | -6600 | 4356 × 10⁴ |
| 12000 | 18100 | -6100 | 3721 × 10⁴ |
| 13000 | 18100 | -5100 | 2601 × 10⁴ |
| 15000 | 18100 | -3100 | 961 × 10⁴ |
| 15000 | 18100 | -3100 | 961 × 10⁴ |
| 17000 | 18100 | -1100 | 121 × 10⁴ |
| 30000 | 18100 | 11900 | 14161 × 10⁴ |
| 110000 | 18100 | 91900 | 844561 × 10⁴ |

Standard Deviation


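We sum (x − x-bar)², divide by n − 1 = 14, and take the square root of the result. This is the standard deviation.

Sum of squared differences = 962410 × 10⁴ = 9,624,100,000
9,624,100,000 / 14 ≈ 687,435,714

What is the square root? Approx. 26,219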
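The five steps can be followed literally in Python. This is a sketch of the sample standard deviation (divisor n − 1); the variable names are illustrative.

```python
import math

data = [5000, 6000, 6000, 6000, 6000, 8000, 11000, 11500, 12000,
        13000, 15000, 15000, 17000, 30000, 110000]
n = len(data)

mean = sum(data) / n                        # step 1: the mean
deviations = [x - mean for x in data]       # step 2: distance from the mean
squared = [d ** 2 for d in deviations]      # step 3: square each distance
sample_variance = sum(squared) / (n - 1)    # step 4: sum, divide by n - 1
s = math.sqrt(sample_variance)              # step 5: square root

print(mean)        # 18100.0
print(round(s))    # 26219
```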

The Subtle Difference Between s and σ

The difference in the divisors (N versus n − 1) results in s being slightly larger than the value the σ formula would give for the same data. This accounts for the fact that s (from a sample) is an estimate of σ (of a population), and this adds a degree of error to the value. Note: for large n the difference is trivial.
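To see the size of the difference on the NIFT sample, the sketch below applies both divisors to the same 15 values, using `statistics.stdev` (divisor n − 1) and `statistics.pstdev` (divisor N) from the Python standard library. Treating the sample as if it were a whole population is done here purely for comparison.

```python
import statistics

data = [5000, 6000, 6000, 6000, 6000, 8000, 11000, 11500, 12000,
        13000, 15000, 15000, 17000, 30000, 110000]
n = len(data)

s = statistics.stdev(data)       # sample formula, divisor n - 1
sigma = statistics.pstdev(data)  # population formula, divisor N

print(round(s), round(sigma))    # 26219 25330

# The two differ by the factor sqrt(n / (n - 1)), which tends to 1 as n grows.
ratio = s / sigma
print(round(ratio, 4))           # 1.0351
```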

A Valuable Tool


The ideas behind the standard deviation go back to Gauss's work on the errors observed in measured star positions; the term itself was introduced by Karl Pearson in the 1890s. Today it is used in everything from quality control to measuring risk in financial investments.

Measures of Central Tendency and Dispersion (Grouped Data)




Remember that grouped data is data that has been placed into categories (classes). Thus we need to calculate the mean and standard deviation differently, but the idea is the same.

A.M for Grouped Data


The following is the frequency distribution of 500 workers according to their weekly income (in Rs.). Find the average income.

| Income (Rs.) | Persons |
|---|---|
| 0 - 50 | 90 |
| 50 - 100 | 150 |
| 100 - 150 | 100 |
| 150 - 200 | 80 |
| 200 - 250 | 70 |
| 250 - 300 | 10 |

A.M for Grouped Data


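| Income (Rs.) | Persons (f) | Mid values (x) | Deviations (d) | f × d |
|---|---|---|---|---|
| 0 - 50 | 90 | 25 | -2 | -180 |
| 50 - 100 | 150 | 75 | -1 | -150 |
| 100 - 150 | 100 | 125 | 0 | 0 |
| 150 - 200 | 80 | 175 | 1 | 80 |
| 200 - 250 | 70 | 225 | 2 | 140 |
| 250 - 300 | 10 | 275 | 3 | 30 |
| Total | 500 | | | -80 |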

A.M for Grouped Data

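$$\bar{x} = A + \frac{\sum_{i=1}^{n} f_i d_i}{\sum_{i=1}^{n} f_i} \times h = 125 + \frac{-80}{500} \times 50 = \text{Rs. } 117$$

Here A = 125 is the assumed mean (the mid value of the class with d = 0) and h = 50 is the class width.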
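The step-deviation calculation for the income data can be sketched in Python as follows. The assumed mean A = 125 and class width h = 50 are read off the deviation column; the variable names are illustrative.

```python
# Class limits and frequencies from the income table.
classes = [(0, 50), (50, 100), (100, 150), (150, 200), (200, 250), (250, 300)]
freq = [90, 150, 100, 80, 70, 10]

A = 125  # assumed mean: mid value of the class with d = 0
h = 50   # class width

mid = [(lo + hi) / 2 for lo, hi in classes]   # 25, 75, ..., 275
d = [(x - A) / h for x in mid]                # -2, -1, 0, 1, 2, 3

N = sum(freq)                                         # 500
sum_fd = sum(f * di for f, di in zip(freq, d))        # -80
mean = A + sum_fd / N * h

print(N, sum_fd, round(mean, 2))  # 500 -80.0 117.0

# Cross-check with the direct formula sum(f * x) / sum(f).
direct = sum(f * x for f, x in zip(freq, mid)) / N
print(direct)  # 117.0
```

The direct formula gives the same answer; the step-deviation form only makes the hand arithmetic smaller.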

Advantages /Disadvantages of the Arithmetic Mean


 

 

   

Advantages:
1) Familiar and intuitively clear to most people
2) Every data set has one and only one mean
3) Useful for performing statistical procedures

Disadvantages:
1) May be affected by extreme values
2) Tedious to compute
3) Difficult to compute for data sets with open-ended classes

Computation of Mean, Median, and Mode for grouped Data

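| Age (Yrs) | Mid value (x) | d = (x − A)/h | No. of Pts. (f) | f × d | Cumulative frequency |
|---|---|---|---|---|---|
| 10 - 20 | 15 | -3 | 5 | -15 | 5 |
| 20 - 30 | 25 | -2 | 19 | -38 | 24 |
| 30 - 40 | 35 | -1 | 26 | -26 | 50 |
| 40 - 50 | 45 | 0 | 35 | 0 | 85 |
| 50 - 60 | 55 | 1 | 15 | 15 | 100 |
| 60 - 70 | 65 | 2 | 3 | 6 | 103 |
| Total | | | 103 | -58 | |

Arithmetic mean = 45 + (−58 / 103) × 10 = 39.4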

Computation of Mean, Median, and Mode for grouped Data

$$\text{Median} = L + \frac{N/2 - \text{C.F.}}{F} \times h$$

where L is the lower limit of the median class, N is the total frequency, C.F. is the cumulative frequency of the class preceding the median class, F is the frequency of the median class, and h is the class width.

N/2 = 103/2 = 51.5. This value lies in the class interval 40-50 (as seen from the cumulative frequency column). Hence L = 40, C.F. = 50, F = 35, and h = 10.

Median = 40 + (51.5 − 50) / 35 × 10 = 40.43
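A minimal Python sketch of the grouped-median formula for the age data. It walks the cumulative frequencies to find the median class and then interpolates within it, using N/2 = 51.5; the variable names are illustrative.

```python
# Lower class limits and frequencies from the age table (class width 10).
lower = [10, 20, 30, 40, 50, 60]
freq = [5, 19, 26, 35, 15, 3]
h = 10

N = sum(freq)   # 103
half = N / 2    # 51.5

cf = 0          # cumulative frequency of the classes before the current one
for L, F in zip(lower, freq):
    if cf + F >= half:
        # This is the median class: interpolate inside it.
        median = L + (half - cf) / F * h
        break
    cf += F

print(L, cf, F)          # 40 50 35
print(round(median, 2))  # 40.43
```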

Comparing the Mean, Median, and Mode

[Figure: two distribution curves, each with the positions of its mode, median, and mean marked]

Summary of Central Tendency Measures


| Measure | Equation | Description |
|---|---|---|
| Mean | Σx / n | Balance point |
| Median | ((n + 1) / 2)th item in array | Middle value in ordered array |
| Mode | none | Most frequent |

Standard Deviation (Grouped data)

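$$\text{S.D.} = \sqrt{\frac{\sum f d^2}{\sum f} - \left(\frac{\sum f d}{\sum f}\right)^2} \times h$$

Where f is the frequency and d is the deviation, computed as

$$d_i = \frac{x_i - A}{h}$$

with A the assumed mean and h the class width.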

SD for Grouped Data


The following data give the chest measurements (in cm) of 50 MBBS students. Find the mean and S.D.

| Chest measurement (cm) | No. of students |
|---|---|
| 61 - 70 | 2 |
| 71 - 80 | 10 |
| 81 - 90 | 20 |
| 91 - 100 | 17 |
| 101 - 110 | 1 |

S.D for Grouped Data


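| CI | Mid values (x) | Fr. (f) | d = (x − A)/h | f × d | f × d² |
|---|---|---|---|---|---|
| 61 - 70 | 65.5 | 2 | -2 | -4 | 8 |
| 71 - 80 | 75.5 | 10 | -1 | -10 | 10 |
| 81 - 90 | 85.5 | 20 | 0 | 0 | 0 |
| 91 - 100 | 95.5 | 17 | 1 | 17 | 17 |
| 101 - 110 | 105.5 | 1 | 2 | 2 | 4 |
| Total | | 50 | | 5 | 39 |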

S.D for Grouped Data

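$$\bar{x} = A + \frac{\sum_{i=1}^{n} f_i d_i}{\sum_{i=1}^{n} f_i} \times h = 85.5 + \frac{5}{50} \times 10 = 86.5$$

$$\text{S.D.} = \sqrt{\frac{\sum_{i=1}^{n} f_i d_i^2}{\sum_{i=1}^{n} f_i} - \left(\frac{\sum_{i=1}^{n} f_i d_i}{\sum_{i=1}^{n} f_i}\right)^2} \times h = \sqrt{\frac{39}{50} - \left(\frac{5}{50}\right)^2} \times 10 = \sqrt{0.77} \times 10 \approx 8.77$$

Here A = 85.5 and h = 10. With the sample divisor n − 1 = 49 in place of n = 50, the same data give approximately 8.86.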
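The grouped mean and standard deviation for the chest-measurement data can be checked with the sketch below. It assumes A = 85.5 and h = 10, as implied by the deviation column, and computes the standard deviation with both the divisor N (the formula as written with Σf) and the divisor N − 1, so the two results can be compared. Variable names are illustrative.

```python
import math

# Mid values and frequencies from the chest-measurement table.
mid = [65.5, 75.5, 85.5, 95.5, 105.5]
freq = [2, 10, 20, 17, 1]

A = 85.5  # assumed mean (mid value of the class with d = 0)
h = 10    # class width

d = [(x - A) / h for x in mid]                        # -2, -1, 0, 1, 2
N = sum(freq)                                         # 50
sum_fd = sum(f * di for f, di in zip(freq, d))        # 5
sum_fd2 = sum(f * di ** 2 for f, di in zip(freq, d))  # 39

mean = A + sum_fd / N * h

# Divisor N, matching the formula written with sums of f.
sd_population = math.sqrt(sum_fd2 / N - (sum_fd / N) ** 2) * h

# Divisor N - 1, the sample version of the same calculation.
sd_sample = math.sqrt((sum_fd2 - sum_fd ** 2 / N) / (N - 1)) * h

print(mean)                     # 86.5
print(round(sd_population, 2))  # 8.77
print(round(sd_sample, 2))      # 8.86
```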

Uses of Standard Deviation




Aside from being a measure of dispersion, the standard deviation:

Determines where values of a frequency distribution are in relation to the mean ("standard scores")
Measures the percentage of items within specific ranges
  Chebyshev's Theorem
  Normal distribution

Coefficient of Variation
    

1. Measure of relative dispersion
2. Always a %
3. Shows variation relative to mean
4. Used to compare 2 or more groups

Population

$$CV = \frac{\sigma}{\mu}(100)$$

Sample

$$CV = \frac{s}{\bar{x}}(100)$$

Coefficient of Variation Example


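Which technician shows more variability?

Technician A: $\mu_A = 40$, $\sigma_A = 5$
Technician B: $\mu_B = 160$, $\sigma_B = 15$

Solution

$$CV = \frac{\sigma}{\mu}(100)$$

Technician A: CV = (5 / 40)(100) = 12.5%
Technician B: CV = (15 / 160)(100) = 9.4%

Technician A shows more variability relative to the mean, even though Technician B has the larger standard deviation.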
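The comparison can be reproduced with a small helper function. The function name `coefficient_of_variation` is hypothetical, chosen for this sketch.

```python
def coefficient_of_variation(sd, mean):
    """Return the standard deviation as a percentage of the mean."""
    return sd / mean * 100

cv_a = coefficient_of_variation(5, 40)     # Technician A
cv_b = coefficient_of_variation(15, 160)   # Technician B

print(cv_a)  # 12.5
print(cv_b)  # 9.375, about 9.4%

# The larger CV marks the technician with more relative variability.
more_variable = "A" if cv_a > cv_b else "B"
print(more_variable)  # A
```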

Summary of Variation Measures


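| Measure | Equation | Description |
|---|---|---|
| Range | $x_{\text{largest}} - x_{\text{smallest}}$ | Total spread |
| Interquartile Range | $Q_3 - Q_1$ | Spread of middle 50% |
| Standard Deviation (Sample) | $\sqrt{\dfrac{\sum (x - \bar{x})^2}{n - 1}}$ | Dispersion about sample mean |
| Standard Deviation (Population) | $\sqrt{\dfrac{\sum (x - \mu)^2}{N}}$ | Dispersion about population mean |
| Variance (Sample) | $\dfrac{\sum (x - \bar{x})^2}{n - 1}$ | Squared dispersion about sample mean |
| Coeff. of Variation | $\dfrac{s}{\bar{x}}(100)$ | Relative variation |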
