Basics of Stats

Business Statistics
Definition

Statistics is a standard method for collecting, organizing, summarizing, presenting, analyzing, and interpreting data in order to draw conclusions and make decisions based on the analysis of those data. Statistics are used extensively by engineers, managers, governments, business people, and others throughout the world.

Collection of data

Types of data:
- Secondary data: before using them, check whether the data are suitable, adequate, and reliable.
- Primary data: collected by questioning or by observation.
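Presentation of data

- Classification: geographical, chronological, quantitative, qualitative, and classification according to class intervals
- Frequency distribution:
  - Class limits: exclusive method (the upper limit of one class is the lower limit of the next, as in 0 - 50, 50 - 100) or inclusive method (both limits belong to the class, as in 61 - 70, 71 - 80)
  - Class intervals and class frequency
- Tabulation of data: parts of a table
- Charting of data: bar diagrams, pie diagrams, line graphs, histograms, and frequency polygons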
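To make the exclusive method concrete, here is a minimal Python sketch that tallies the monthly-expenditure sample of 15 NIFT students (listed later in these notes) into classes of width 10000. The class width and the helper name `exclusive_class` are illustrative assumptions, not part of the original notes. Under the exclusive method a value equal to an upper limit, such as 30000, is counted in the next class.

```python
from collections import Counter

def exclusive_class(value, width):
    """Return (lower, upper) for the exclusive class containing value.

    The upper limit is not included, so a value equal to an upper
    limit (e.g. 30000) falls into the next class.
    """
    lower = (value // width) * width
    return (lower, lower + width)

# Monthly expenditure sample of 15 NIFT students (from these notes)
expenditure = [5000, 6000, 30000, 110000, 15000, 6000, 17000, 13000,
               12000, 11000, 8000, 6000, 15000, 6000, 11500]

# Hypothetical class width chosen for illustration
WIDTH = 10000
freq = Counter(exclusive_class(x, WIDTH) for x in expenditure)

# Print the non-empty classes in ascending order
for (lower, upper), f in sorted(freq.items()):
    print(f"{lower} - {upper}: {f}")
```

Classes with no values (for example 20000 - 30000) do not appear in the printed output; a full frequency table would list them with a frequency of 0.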
Functions of Statistics

- Presents facts in a definite form
- Simplifies a mass of figures
- Facilitates comparison
- Helps in formulating and testing hypotheses
- Helps in prediction
- Helps in the formulation of suitable policies
Populations and Samples

A population is the complete set of all possible instances of a particular object; for example, all students in this college.

A sample is a subset of the population; for example, any one of the classes.

We use samples to draw conclusions about the parent population.
Measures of Central Tendency

If you have to declare a single value to represent a population or a sample, what do you use? The most common choice is the mean, also called the average or the expected value. Another common choice is the mode, the most likely (most frequent) value. A third is the median, the middle of the ordered data set.
Measures of Central Tendency (ungrouped)

- Mean: the mathematical average of a set of numbers.
- Median: the middle value of a set of data that has been arranged from lowest to highest.
- Mode: the value that occurs most often in a set of data.

Expenditure is a good way to discuss these three measures. Suppose we want to know the average expenditure of NIFT students.

Let's take a random sample of the monthly expenditure of NIFT students.
What is the Mean?

The mean is the sum of all of the values in the data set divided by the number of values.

The equation for calculating the mean is the same for both samples and populations.
Sample mean:
$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
where $\bar{x}$ (x-bar) is the sample mean, $x_i$ are the data points, and $n$ is the sample size.

Population mean:
$\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$
where $\mu$ is the population mean, $x_i$ are the data points, and $N$ is the total number of observations in the population.

The sample gives these values: 5000, 6000, 30000, 110000, 15000, 6000, 17000, 13000, 12000, 11000, 8000, 6000, 15000, 6000, 11500

The Mean

This is the average: sum of values = 271500, n = 15, so mean = 271500 / 15 = 18100.
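The same arithmetic as a short Python sketch (the variable names are illustrative):

```python
# Monthly expenditure sample of 15 NIFT students (from these notes)
expenditure = [5000, 6000, 30000, 110000, 15000, 6000, 17000, 13000,
               12000, 11000, 8000, 6000, 15000, 6000, 11500]

total = sum(expenditure)   # sum of the values
n = len(expenditure)       # sample size
mean = total / n           # x-bar = (1/n) * sum of x_i

print(total, n, mean)      # 271500 15 18100.0
```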
What is the Median?

If the data have been sorted (ascending or descending), the median is the middle value (for an odd number of points) or the average of the two middle values (for an even number of points). The median is used to characterize data sets with a few extreme values that distort the relevance of the mean, such as house values or family incomes.

$\text{Median} = \left(\frac{n+1}{2}\right)^{\text{th}}$ item in the data array

The Median

This is the middle value of the sorted data: 5000, 6000, 6000, 6000, 6000, 8000, 11000, 11500, 12000, 13000, 15000, 15000, 17000, 30000, 110000. With n = 15, the median is the (15 + 1)/2 = 8th item, so the median here is 11500. In cases where there are two middle values, we average the two.
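A minimal Python sketch of the median rule, covering both the odd and the even case. The function name `median` is our own; Python's standard `statistics.median` gives the same results.

```python
def median(values):
    """Median of ungrouped data: the ((n + 1)/2)th item of the sorted data."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data")
    mid = n // 2               # 0-based index of the middle item when n is odd
    if n % 2 == 1:
        return ordered[mid]
    # Even n: average the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2

# Monthly expenditure sample of 15 NIFT students (from these notes)
expenditure = [5000, 6000, 30000, 110000, 15000, 6000, 17000, 13000,
               12000, 11000, 8000, 6000, 15000, 6000, 11500]

print(median(expenditure))     # 11500
```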
What is the Mode?

If the data are discrete, or have been grouped into discrete intervals, the mode is the value that occurs most often. In other words, it is the value most likely to occur.


The Mode

This is the most frequent value: 5000, 6000, 6000, 6000, 6000, 8000, 11000, 11500, 12000, 13000, 15000, 15000, 17000, 30000, 110000. The mode here is 6000, which occurs four times. Sometimes there is no mode, or even two modes!
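A minimal sketch that returns every mode, so data with two modes are handled. It assumes the convention that a data set in which no value repeats has no mode. Python's `statistics.multimode` instead returns every value in that case, which is why this hypothetical `modes` helper is written out by hand.

```python
from collections import Counter

def modes(values):
    """Return all values that occur most often, or [] if no value repeats."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        return []  # assumed convention: all values distinct means "no mode"
    return sorted(v for v, c in counts.items() if c == top)

# Monthly expenditure sample of 15 NIFT students (from these notes)
expenditure = [5000, 6000, 30000, 110000, 15000, 6000, 17000, 13000,
               12000, 11000, 8000, 6000, 15000, 6000, 11500]

print(modes(expenditure))        # [6000]
print(modes([1, 1, 2, 2, 3]))    # [1, 2]  (two modes)
print(modes([1, 2, 3]))          # []      (no mode)
```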

So, given these values: 5000, 6000, 6000, 6000, 6000, 8000, 11000, 11500, 12000, 13000, 15000, 15000, 17000, 30000, 110000, what is the best measure of central tendency for this random sample of NIFT students?
- Mean: 18100
- Median: 11500
- Mode: 6000
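One way to think about this question is to see how each measure reacts to the single extreme value, 110000. This Python sketch, using the standard `statistics` module, recomputes all three measures with that value removed; dropping it is purely an illustration, not a recommended cleaning step.

```python
import statistics

# Monthly expenditure sample of 15 NIFT students (from these notes)
expenditure = [5000, 6000, 30000, 110000, 15000, 6000, 17000, 13000,
               12000, 11000, 8000, 6000, 15000, 6000, 11500]

# Remove the single extreme value to see which measures move
without_outlier = [x for x in expenditure if x != 110000]

for label, data in [("all 15 values", expenditure),
                    ("without 110000", without_outlier)]:
    print(label,
          round(statistics.mean(data), 2),
          statistics.median(data),
          statistics.mode(data))
```

Removing one value moves the mean from 18100 to about 11536, while the median only moves from 11500 to 11250 and the mode stays at 6000. This is why the median is often preferred for data such as expenditure or income.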

What is the Range?

The range is the distance between the lowest and the highest values in the set. For example, suppose the drive to Churchgate takes 2 hours plus or minus 15 minutes, that is, 105 to 135 minutes. Then the range is 135 - 105 = 30 minutes.
Measures of Dispersion or Spread (ungrouped)

Range

The highest value minus the lowest value. For the expenditure sample above, the range is 110000 - 5000 = 105000.
What is the Variance?

The variance of a population is the sum of the squared differences between the mean and the individual data points, divided by the number of data points (N). The variance of a sample is the sum of the squared differences divided by the number of data points less one (n - 1).
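The two definitions differ only in the divisor. As a minimal Python sketch (the function names are illustrative):

```python
def population_variance(values):
    """Sum of squared deviations from the mean, divided by N."""
    N = len(values)
    mu = sum(values) / N
    return sum((x - mu) ** 2 for x in values) / N

def sample_variance(values):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("sample variance needs at least two values")
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)

# Monthly expenditure sample of 15 NIFT students (from these notes)
expenditure = [5000, 6000, 30000, 110000, 15000, 6000, 17000, 13000,
               12000, 11000, 8000, 6000, 15000, 6000, 11500]

print(population_variance(expenditure))   # divisor N = 15
print(sample_variance(expenditure))       # divisor n - 1 = 14
```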
What is the Standard Deviation?

Standard Deviation

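Roughly, this is the typical distance of your values from the mean. (Strictly, it is the square root of the average squared distance, not the plain average distance.)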
The Standard Deviation is the square root of the variance
Computing Standard Deviation

Population:
$\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2}$

Sample:
$s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}$

In each case, the expression under the square root sign is the variance. It is important that you recognize the difference between these two equations!
Measures of Dispersion or Spread (ungrouped)

Standard Deviation: let's return to our NIFT random sample: 5000, 6000, 6000, 6000, 6000, 8000, 11000, 11500, 12000, 13000, 15000, 15000, 17000, 30000, 110000

Follow these steps while we calculate the standard deviation as a class on the board:
1. Calculate the mean, which is 18100.
2. Find the distance of each value from the mean.
3. Square each distance.
4. Add up the squared distances and divide by the sample size minus 1 (n - 1).
5. Take the square root of this number.
Standard Deviation

X | Mean (x-bar) | X - x-bar | (X - x-bar)^2 (× 10^4)
5000 | 18100 | -13100 | 17161
6000 | 18100 | -12100 | 14641
6000 | 18100 | -12100 | 14641
6000 | 18100 | -12100 | 14641
6000 | 18100 | -12100 | 14641
8000 | 18100 | -10100 | 10201
11000 | 18100 | -7100 | 5041
11500 | 18100 | -6600 | 4356
12000 | 18100 | -6100 | 3721
13000 | 18100 | -5100 | 2601
15000 | 18100 | -3100 | 961
15000 | 18100 | -3100 | 961
17000 | 18100 | -1100 | 121
30000 | 18100 | 11900 | 14161
110000 | 18100 | 91900 | 844561
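Standard Deviation

We sum the squared deviations (x - x-bar)^2, which gives 962410 × 10^4 = 9,624,100,000. Dividing this sum by n - 1 = 14 gives the sample variance, about 687,435,714. The square root of the variance is the standard deviation. What is that square root?

Approx. 26,219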
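The five steps above, written as a Python sketch for the same sample:

```python
import math

# Monthly expenditure sample of 15 NIFT students (from these notes)
expenditure = [5000, 6000, 30000, 110000, 15000, 6000, 17000, 13000,
               12000, 11000, 8000, 6000, 15000, 6000, 11500]
n = len(expenditure)

mean = sum(expenditure) / n                    # step 1: the mean, 18100
deviations = [x - mean for x in expenditure]   # step 2: distance from the mean
squared = [d ** 2 for d in deviations]         # step 3: square each distance
variance = sum(squared) / (n - 1)              # step 4: add up, divide by n - 1
sd = math.sqrt(variance)                       # step 5: take the square root

print(sum(squared))   # 9624100000.0
print(round(sd))      # 26219
```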
The Subtle Difference Between s and σ

The difference in the divisors (N versus n - 1) makes s slightly larger than σ computed from the same data. This accounts for the fact that s (from a sample) is an estimate of σ (of a population): deviations measured from the sample mean x-bar tend to be smaller than deviations from the true mean μ, and dividing by n - 1 compensates for this. Note: for large n the difference is trivial.
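A short Python check of this relationship, using the standard library's `statistics.stdev` (divisor n - 1) and `statistics.pstdev` (divisor N). For the same data, s / σ equals sqrt(n / (n - 1)), which approaches 1 as n grows; the sizes in the loop are arbitrary illustrations.

```python
import math
import statistics

# Monthly expenditure sample of 15 NIFT students (from these notes)
expenditure = [5000, 6000, 30000, 110000, 15000, 6000, 17000, 13000,
               12000, 11000, 8000, 6000, 15000, 6000, 11500]

s = statistics.stdev(expenditure)        # sample SD, divisor n - 1
sigma = statistics.pstdev(expenditure)   # population SD, divisor N

print(round(s), round(sigma))            # 26219 25330

# The ratio s / sigma = sqrt(n / (n - 1)) shrinks toward 1 as n grows
for size in (15, 100, 1000):
    print(size, round(math.sqrt(size / (size - 1)), 4))
```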
A Valuable Tool

The standard deviation is a relatively modern tool. The idea goes back to Gauss, who used it to explain the error observed in measured star positions; the term "standard deviation" itself was introduced later by Karl Pearson. Today it is used in everything from quality control to measuring risk in financial investments.
Measures of Central Tendency and Dispersion (Grouped Data)

Remember that grouped data are data that have been placed into classes (categories). Because the individual values are no longer available, we calculate the mean and standard deviation differently, using class mid-values, but the idea is the same.
A.M. for Grouped Data

The following is the frequency distribution of 500 workers according to their weekly income (in Rs.). Find the average income.

Income (Rs.) | Persons (f) | Mid value (x) | Deviation d = (x - 125)/50 | fd
0 - 50 | 90 | 25 | -2 | -180
50 - 100 | 150 | 75 | -1 | -150
100 - 150 | 100 | 125 | 0 | 0
150 - 200 | 80 | 175 | 1 | 80
200 - 250 | 70 | 225 | 2 | 140
250 - 300 | 10 | 275 | 3 | 30
Total | 500 | | | -80

Using the assumed mean A = 125 and class width h = 50:

$\bar{x} = A + \frac{\sum f_i d_i}{\sum f_i} \times h = 125 + \frac{-80}{500} \times 50 = 125 - 8 = \text{Rs. } 117$
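The step-deviation calculation above as a minimal Python sketch; the helper name `step_deviation_mean` is hypothetical. The direct method, Σfx / Σf, is included as a cross-check and should agree.

```python
def step_deviation_mean(midpoints, freqs, assumed_mean, width):
    """Grouped arithmetic mean by the step-deviation method:
    x_bar = A + (sum(f * d) / sum(f)) * h, where d = (x - A) / h.
    """
    d = [(x - assumed_mean) / width for x in midpoints]
    sum_f = sum(freqs)
    sum_fd = sum(f * di for f, di in zip(freqs, d))
    return assumed_mean + sum_fd * width / sum_f

# Weekly income example: class mid-values and number of persons
midpoints = [25, 75, 125, 175, 225, 275]
persons = [90, 150, 100, 80, 70, 10]

mean_income = step_deviation_mean(midpoints, persons, assumed_mean=125, width=50)
print(mean_income)   # 117.0

# Cross-check with the direct method, sum(f * x) / sum(f)
direct = sum(f * x for f, x in zip(persons, midpoints)) / sum(persons)
print(direct)        # 117.0
```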
Advantages/Disadvantages of the Arithmetic Mean

Advantages:
1) Familiar and intuitively clear to most people
2) Every data set has one and only one mean
3) Useful for performing statistical procedures

Disadvantages:
1) May be affected by extreme values
2) Tedious to compute
3) Difficult to compute for data sets with open-ended classes
Computation of Mean, Median, and Mode for Grouped Data

Age (yrs) | Mid value (x) | d = (x - A)/h | No. of Pts (f) | fd | Cumulative frequency
10 - 20 | 15 | -3 | 5 | -15 | 5
20 - 30 | 25 | -2 | 19 | -38 | 24
30 - 40 | 35 | -1 | 26 | -26 | 50
40 - 50 | 45 | 0 | 35 | 0 | 85
50 - 60 | 55 | 1 | 15 | 15 | 100
60 - 70 | 65 | 2 | 3 | 6 | 103
Total | | | 103 | -58 |

Here A = 45 and h = 10.

Arithmetic mean = 45 + (-58/103) × 10 = 45 - 5.63 ≈ 39.4
$\text{Median} = L + \frac{N/2 - \text{C.F.}}{f} \times h$

where L is the lower limit of the median class, N is the total frequency, C.F. is the cumulative frequency of the class preceding the median class, f is the frequency of the median class, and h is the class width.

N/2 = 103/2 = 51.5. From the cumulative frequency column, this value lies in the class interval 40 - 50. Hence L = 40, C.F. = 50, f = 35, and h = 10.

Median = 40 + (51.5 - 50)/35 × 10 ≈ 40.43
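A minimal Python sketch of the grouped-median formula, assuming continuous classes of equal width listed in ascending order; `grouped_median` is a hypothetical helper name.

```python
def grouped_median(lower_limits, freqs, width):
    """Median = L + ((N/2 - C.F.) / f) * h for a continuous grouped distribution.

    L: lower limit of the median class, C.F.: cumulative frequency of the
    class before it, f: frequency of the median class, h: class width.
    """
    n = sum(freqs)
    if n == 0:
        raise ValueError("empty distribution")
    half = n / 2
    cum = 0  # cumulative frequency of the classes before the current one
    for lower, f in zip(lower_limits, freqs):
        if cum + f >= half:   # first class whose cumulative frequency reaches N/2
            return lower + (half - cum) / f * width
        cum += f
    raise ValueError("N/2 not reached")   # unreachable when freqs sum to n

# Age example from these notes: lower class limits and frequencies
lower_limits = [10, 20, 30, 40, 50, 60]
counts = [5, 19, 26, 35, 15, 3]

print(round(grouped_median(lower_limits, counts, width=10), 2))   # 40.43
```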
Comparing the Mean, Median, and Mode
[Figure: positions of the mean, median, and mode in skewed distributions]
Summary of Central Tendency Measures

Measure | Equation | Description
Mean | $\sum x / n$ | Balance point
Median | $\left(\frac{n+1}{2}\right)^{\text{th}}$ item in ordered array | Middle value in ordered array
Mode | none | Most frequent value
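Standard Deviation (Grouped Data)

$\text{S.D.} = \sqrt{\frac{\sum f d^2}{\sum f} - \left(\frac{\sum f d}{\sum f}\right)^2} \times h$

where f is the frequency and d is the step deviation, $d_i = \frac{x_i - A}{h}$, with A the assumed mean and h the class width.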
SD for Grouped Data

The following data give the chest measurements (in cm) of 50 MBBS students. Find the mean and SD.

Chest measurement (cm) | 61 - 70 | 71 - 80 | 81 - 90 | 91 - 100 | 101 - 110
No. of students | 2 | 10 | 20 | 17 | 1
CI | Mid value (x) | Fr. (f) | d = (x - A)/h | f × d | f × d²
61 - 70 | 65.5 | 2 | -2 | -4 | 8
71 - 80 | 75.5 | 10 | -1 | -10 | 10
81 - 90 | 85.5 | 20 | 0 | 0 | 0
91 - 100 | 95.5 | 17 | 1 | 17 | 17
101 - 110 | 105.5 | 1 | 2 | 2 | 4
Total | | 50 | | 5 | 39
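With A = 85.5 and h = 10:

$\bar{x} = A + \frac{\sum f_i d_i}{\sum f_i} \times h = 85.5 + \frac{5}{50} \times 10 = 86.5$

$\text{S.D.} = \sqrt{\frac{\sum f_i d_i^2}{\sum f_i} - \left(\frac{\sum f_i d_i}{\sum f_i}\right)^2} \times h = \sqrt{\frac{39}{50} - \left(\frac{5}{50}\right)^2} \times 10 = \sqrt{0.77} \times 10 \approx 8.77$

If the sample divisor n - 1 = 49 is used instead, $s = \sqrt{\frac{\sum f_i d_i^2 - (\sum f_i d_i)^2 / n}{n - 1}} \times h = \sqrt{\frac{38.5}{49}} \times 10 \approx 8.86$.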
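A Python sketch of the same step-deviation calculation, assuming equal class widths. The hypothetical `sample` flag switches the divisor from N to N - 1, which gives about 8.77 with divisor N and about 8.86 with divisor N - 1.

```python
import math

def grouped_mean_sd(midpoints, freqs, assumed_mean, width, sample=False):
    """Grouped mean and SD by the step-deviation method.

    sample=False divides by N; sample=True divides by N - 1 (sample SD s).
    """
    d = [(x - assumed_mean) / width for x in midpoints]
    n = sum(freqs)
    sum_fd = sum(f * di for f, di in zip(freqs, d))
    sum_fd2 = sum(f * di ** 2 for f, di in zip(freqs, d))
    mean = assumed_mean + sum_fd * width / n
    # Sum of f * d^2 measured about the actual mean (in units of h)
    ss = sum_fd2 - sum_fd ** 2 / n
    divisor = n - 1 if sample else n
    return mean, math.sqrt(ss / divisor) * width

# Chest measurement example: class mid-values and number of students
midpoints = [65.5, 75.5, 85.5, 95.5, 105.5]
students = [2, 10, 20, 17, 1]

mean, sd = grouped_mean_sd(midpoints, students, assumed_mean=85.5, width=10)
_, s = grouped_mean_sd(midpoints, students, assumed_mean=85.5, width=10, sample=True)
print(mean, round(sd, 2), round(s, 2))   # 86.5 8.77 8.86
```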
Uses of Standard Deviation

Aside from being a measure of dispersion, the standard deviation:
- Determines where values of a frequency distribution lie in relation to the mean (standard scores)
- Measures the percentage of items within specific ranges, through Chebyshev's theorem and the normal distribution
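A minimal Python sketch of both uses on the NIFT expenditure sample: standard scores z = (x - x-bar) / s, and a check of Chebyshev's theorem, which says that for any k > 1 at least 1 - 1/k² of the values lie within k standard deviations of the mean. The choice k = 2 and the use of the sample SD s are assumptions made for illustration.

```python
import statistics

# Monthly expenditure sample of 15 NIFT students (from these notes)
expenditure = [5000, 6000, 30000, 110000, 15000, 6000, 17000, 13000,
               12000, 11000, 8000, 6000, 15000, 6000, 11500]

x_bar = statistics.mean(expenditure)
s = statistics.stdev(expenditure)

# Standard score: how many standard deviations a value lies from the mean
z_scores = [(x - x_bar) / s for x in expenditure]

# Chebyshev: for k > 1, at least 1 - 1/k^2 of values lie within k SDs of the mean
k = 2
within = sum(1 for z in z_scores if abs(z) < k)
print(within, len(expenditure))   # 14 15
print(1 - 1 / k ** 2)             # 0.75
```

Here 14 of the 15 values (about 93%) lie within two standard deviations, above the 75% that Chebyshev's theorem guarantees; the only value outside is 110000, with z ≈ 3.5.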
Coefficient of Variation

1. Measure of relative dispersion
2. Always expressed as a percentage
3. Shows variation relative to the mean
4. Used to compare two or more groups

Population: $CV = \frac{\sigma}{\mu}(100)$
Sample: $CV = \frac{s}{\bar{x}}(100)$
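Coefficient of Variation Example

Which technician shows more variability?
Technician A: $\mu_A = 40$, $\sigma_A = 5$
Technician B: $\mu_B = 160$, $\sigma_B = 15$

Solution
$CV = \frac{\sigma}{\mu}(100)$
Technician A: CV = 5/40 × 100 = 12.5%
Technician B: CV = 15/160 × 100 ≈ 9.4%

Technician B has the larger standard deviation, but relative to its mean, Technician A shows more variability.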
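The same comparison as a Python sketch; `coefficient_of_variation` is a hypothetical helper name.

```python
def coefficient_of_variation(sd, mean):
    """CV = (standard deviation / mean) * 100, expressed as a percentage."""
    return sd / mean * 100

cv_a = coefficient_of_variation(5, 40)     # Technician A
cv_b = coefficient_of_variation(15, 160)   # Technician B
print(cv_a, cv_b)                          # 12.5 9.375
```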
Summary of Variation Measures

Measure | Equation | Description
Range | $x_{\text{largest}} - x_{\text{smallest}}$ | Total spread
Interquartile range | $Q_3 - Q_1$ | Spread of middle 50%
Standard deviation (sample) | $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$ | Dispersion about sample mean
Standard deviation (population) | $\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$ | Dispersion about population mean
Variance (sample) | $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$ | Squared dispersion about sample mean
Coefficient of variation | $(s / \bar{x})(100)$ | Relative variation
