Name:

Roll No:

Learning Centre:

Subject: MB0040 – STATISTICS FOR MANAGEMENT

Date of Submission at the Learning Centre:

Assignment Set- 1
Q1. Why is it necessary to summarise data? Explain the approaches
available to summarise data distributions.

Answer:

Graphical representation is a good way to present summarised data.
However, graphs provide only an overview and thus may not be suitable for
further analysis. Hence, we use summary statistics, such as averages, to
analyse the data. Mass data, which is collected, classified, tabulated and
presented systematically, is analysed further to reduce it to a single
representative figure. This single figure is a measure found in the
central part of the range of all values. It represents the entire data set
and is therefore called a measure of central tendency.

In other words, the tendency of data to cluster around a centrally located
figure is known as central tendency. A measure of central tendency, or
average of the first order, describes the concentration of a large number of
values around a particular value. It is a single value which represents all
units.

Statistical Averages: The commonly used statistical averages are the
arithmetic mean, geometric mean and harmonic mean.

Arithmetic mean is defined as the sum of all values divided by the number of
values and is represented by X̄ (X-bar).

Before we study how to compute the arithmetic mean, we have to be familiar
with terms such as discrete data, frequency and frequency distribution,
which are used in this unit.

If a variable can take only a finite or countable number of distinct values,
the data are said to be discrete. The number of occurrences of each value in
the data set is called the frequency of that value. A systematic
presentation of the values taken by a variable together with their
corresponding frequencies is called a frequency distribution of the
variable.

Median: Median of a set of values is the value which is the middle most
value when they are arranged in the ascending order of magnitude. Median is
denoted by ‘M’.

Mode: Mode is the value which has the highest frequency and is denoted by
Z.

Modal value is most useful for business people. For example, shoe and
readymade garment manufacturers will like to know the modal size of the
people to plan their operations. For discrete data with or without frequency, it
is that value corresponding to highest frequency.
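
The three averages defined above can be checked on a small data set. The sketch below uses Python's standard `statistics` module; the shoe sizes are hypothetical values chosen only for illustration and do not come from the assignment.

```python
from statistics import mean, median, mode

# Hypothetical shoe sizes sold in one day (illustrative data only)
shoe_sizes = [7, 8, 8, 9, 8, 10, 7, 8, 9, 11]

x_bar = mean(shoe_sizes)    # sum of all values divided by the number of values
m = median(shoe_sizes)      # middle-most value of the sorted data
z = mode(shoe_sizes)        # value with the highest frequency

print("Mean:", x_bar)
print("Median:", m)
print("Mode:", z)
```

For a manufacturer, the mode (size 8 here) is the figure that indicates which size to stock most heavily.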
Appropriate Situations for the use of Various Averages

1. Arithmetic mean is used when:

a. In depth study of the variable is needed

b. The variable is continuous and additive in nature

c. The data are in the interval or ratio scale

d. The distribution is symmetrical

2. Median is used when:

a. The variable is discrete

b. There exist abnormal values

c. The distribution is skewed

d. The extreme values are missing

e. The characteristics studied are qualitative

f. The data are on the ordinal scale

3. Mode is used when:

a. The variable is discrete

b. There exists an abnormal value

c. The distribution is skewed

d. The extreme values are missing

e. The characteristics studied are qualitative

4. Geometric mean is used when:

a. The rate of growth, ratios and percentages are to be studied

b. The variable is of multiplicative nature

5. Harmonic mean is used when:

a. The study is related to speed, time

b. Average of rates which produce equal effects has to be found
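
The last two cases in the list can be illustrated with a short sketch. The growth factors and speeds below are hypothetical figures, assumed only for the example; `geometric_mean` and `harmonic_mean` are functions of Python's standard `statistics` module.

```python
from statistics import geometric_mean, harmonic_mean

# Hypothetical yearly growth factors: +10%, +20% and +5% (illustrative only)
growth_factors = [1.10, 1.20, 1.05]
avg_growth = geometric_mean(growth_factors)   # average multiplicative growth per year

# Hypothetical trip: the same distance covered at 40 km/h and then at 60 km/h
speeds = [40, 60]
avg_speed = harmonic_mean(speeds)             # average speed over the whole trip

print("Average growth factor:", round(avg_growth, 4))
print("Average speed:", avg_speed)
```

The average speed works out to 48 km/h rather than the arithmetic mean of 50 km/h, because more time is spent travelling at the lower speed.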

Positional Averages
Median is the mid-value of series of data. It divides the distribution into two
equal portions. Similarly, we can divide a given distribution into four, ten or
hundred or any other number of equal portions.

Q2. Explain the purpose of tabular presentation of statistical data.
Draft a form of tabulation to show the distribution of population
according to i) community by age, ii) literacy, iii) sex, and iv)
marital status.

Answer:

The objectives of tabulation are to:

i. Simplify complex data

ii. Highlight important characteristics

iii. Present data in minimum space

iv. Facilitate comparison

v. Bring out trends and tendencies

vi. Facilitate further analysis

Tabulation is an orderly arrangement of data in columns and rows in a
tabular form. It is the logical listing of related quantitative
data in vertical columns and horizontal rows. The presentation of data in
tables should be simple, systematic and unambiguous.

The purpose of tabular presentation of statistical data is to:

a) Simplify complex data

Tabulation simplifies complex data by presenting them systematically
in columns and rows in a condensed form. It avoids the unnecessary detail
that is found in a narrative form.

b) Highlight important characteristics

It also helps to highlight the important characteristics of the data, as it
avoids the unnecessary detail that is usually found in a narrative form.

c) Present data in minimum space


Tabulation achieves economy in using the space for presenting the data.
The textual matter is presented neatly in a short form without sacrificing
utility of the data.

d) Facilitate comparison

The data presented in a tabular form is helpful for a comparative study.
The relationship among the various items can be easily understood.

e) Bring out trends and tendencies

Tabulation shows the trends and tendencies of the data at first glance in
the form of figures, which cannot be grasped as easily when the same data
are in a narrative form.

f) Facilitate further analysis

Tabulation is analytical in nature and hence helps in further analysis.

Following is the form of tabulation to present the distribution of population
according to community by age, literacy, sex and marital status:

Marital Status | Sex | Educated: Below 20 yrs | Educated: 20-40 | Educated: Above 40 | Non-Educated: Below 20 yrs | Non-Educated: 20-40 | Non-Educated: Above 40
Married | Male | | | | | |
Married | Female | | | | | |
Unmarried | Male | | | | | |
Unmarried | Female | | | | | |
Q3. Give a brief note of the measures of central tendency together
with their merits & Demerits. Which is the best measure of central
tendency and why?

Answer:

Condensation of data is necessary for proper statistical analysis. A large
number of big numbers is not only confusing to the mind but also difficult to
analyse. After a thorough scrutiny of the collected data, classification,
which is the process of arranging data into homogeneous classes according to
resemblances and similarities, is carried out first. Tabulation of the data
follows. Classification and tabulation of the collected data, besides
removing complexity, enable condensation and comparison.

An average is defined as a value which represents the whole mass of
data. It is a typical or central value summarizing the whole data. It is also
called a measure of central tendency because the individual
values in the data show some tendency to centre about this average. It is
located between the minimum and the maximum of the values in the
data.

There are five types of average: arithmetic mean, median, mode, geometric
mean and harmonic mean.

Arithmetic Mean

The arithmetic mean, or simply the mean, is the best-known, most easily
understood and most frequently used average in statistical analysis. It is
defined as the sum of all the values in the data divided by the number of
values.

Median: Median is another widely known and frequently used average. It is
defined as the most central or middle-most value of the data given in the
form of an array. By an array, we mean an arrangement of the data in either
ascending or descending order of magnitude. In the case of ungrouped
data one has to form an array first and then locate the middle-most value,
which is the median. For ungrouped data the median is fixed by using

Median = the [(n+1)/2]th value in the array.

Mode: The word mode seems to have been derived from the French 'a la mode',
which means 'that which is in fashion'. It is defined as the value in the
data which occurs most frequently. For ungrouped data we form the array and
then fix the mode as the value which occurs most frequently. If all the
values are distinct from each other, the mode cannot be fixed. For a
frequency distribution with just one highest frequency (such data are called
unimodal) or two highest frequencies (such data are called bimodal), the
mode is found by using the formula

Mode = l + (c × f2) / (f1 + f2)

where l is the lower limit of the modal class, c is its class interval, f1 is
the frequency preceding the highest frequency and f2 is the frequency
succeeding the highest frequency.
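
A minimal sketch of the grouped-data mode formula stated above, Mode = l + c·f2/(f1 + f2). The function name `grouped_mode` and the frequency distribution are hypothetical, introduced only for illustration.

```python
def grouped_mode(lower_limit, class_interval, f_preceding, f_succeeding):
    """Mode = l + c * f2 / (f1 + f2), as stated in the text above."""
    return lower_limit + class_interval * f_succeeding / (f_preceding + f_succeeding)

# Hypothetical distribution: classes 0-10, 10-20, 20-30 with frequencies 5, 12, 7.
# The modal class is 10-20, so l = 10, c = 10, f1 = 5 and f2 = 7.
mode_value = grouped_mode(10, 10, 5, 7)
print(round(mode_value, 2))
```

Because the class after the modal class has the larger frequency (7 against 5), the mode is pulled above the midpoint of the modal class.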

Relative merits and demerits of Mean, Median and Mode:

Mean: The mean is the most commonly and frequently used average. It is a
simple average, understandable even to a layman. It is based on all the
values in a given data set. It is easy to calculate and is basic to the
calculation of further statistical measures such as dispersion and
correlation. Of all the averages, it is the most stable one. However, it has
some demerits. It gives undue weight to extreme values; in other words, it is
greatly influenced by them. Moreover, it cannot be calculated for data with
open-ended classes at the extremes. It cannot be fixed graphically, unlike
the median or the mode. It is the most useful average when the analysis is
made with full reference to the nature of the individual values of the data.
In spite of a few shortcomings, it is the most satisfactory average.

Median: The median is another well-known and widely used average. It has a
well-defined formula and is easily understood. It is advantageously used as a
representative value of qualitative factors which cannot be measured
numerically but can be ranked. Unlike the mean, the median can be located
graphically. It is also possible to find the median for data with open-ended
classes at the extremes. However, it is an average not based on all the
values of the given data, and it is not amenable to further algebraic
treatment. It is not as stable as the mean. It has only a limited use in
practice.

Mode: It is a useful measure of central tendency, as a representative of the
majority of values in the data. It is a practical average, easily understood
even by laymen. Its calculation is not difficult. It can be ascertained even
for data with open-ended classes at the extremes. It can be located by
graphical means using a frequency curve. However, the mode is not based on
all the values in the data. It becomes less useful when the data distribution
is not unimodal. Of all the averages, it is the most unstable.
Q4. Machines are used to pack sugar into packets supposedly
containing 1.20 kg each. On testing a large number of packets over a
long period of time, it was found that the mean weight of the
packets was 1.24 kg and the standard deviation was 0.04 Kg. A
particular machine is selected to check the total weight of each of
the 25 packets filled consecutively by the machine. Calculate the
limits within which the weight of the packets should lie assuming
that the machine has not been classified as faulty.

Answer:

• Mean weight of the packets = 1.24 kg

• Standard deviation, SD = 0.04 kg

• Variance = 0.04^2 = 0.0016

• Standard error, SE = 0.04/sqrt(25) = 0.04/5 = 0.008 kg

• Considering a 99.7% confidence level (3 standard errors on either side of
the mean), the mean weight of the 25 packets will lie between (1.24 - 3SE)
and (1.24 + 3SE)

• Upper limit = 1.24 + 3(0.008) = 1.264 kg

• Lower limit = 1.24 - 3(0.008) = 1.216 kg
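
The limits can be re-computed with a short sketch. It assumes that the interval is centred on the observed long-run mean of 1.24 kg given in the question and uses 3 standard errors (roughly 99.7% under normality); the variable names are illustrative.

```python
import math

mean_weight = 1.24   # kg, long-run mean weight from the question
sd = 0.04            # kg, standard deviation from the question
n = 25               # number of packets checked
k = 3                # standard errors on each side (about 99.7% under normality)

se = sd / math.sqrt(n)            # standard error of the sample mean
lower = mean_weight - k * se
upper = mean_weight + k * se

print("SE:", round(se, 3))
print("Limits:", round(lower, 3), "to", round(upper, 3))
```

Changing `k` to 1.96 would give the narrower 95% limits instead.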

Q5. A packaging device is set to fill detergent powder packets with a
mean weight of 5 Kg. The standard deviation is known to be 0.01 Kg.
These are known to drift upwards over a period of time due to
machine fault, which is not tolerable. A random sample of 100
packets is taken and weighed. This sample has a mean weight of
5.03 Kg and a standard deviation of 0.21 Kg. Can we conclude that
the mean weight produced by the machine has increased? Use 5%
level of significance.

Answer:

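• Hypothesised mean weight of packages, X1 = 5 kg

• Population standard deviation, SD1 = 0.01 kg

• Sample size, N = 100

• Sample mean weight, X2 = 5.03 kg

• Sample standard deviation, SD2 = 0.21 kg

• Null hypothesis H0: mean weight = 5 kg; alternative hypothesis H1: mean
weight > 5 kg (one-tailed test, since only an upward drift matters)

• At the 5% level of significance, the one-tailed critical value is Z = 1.645

• Standard error, SE = SD1/sqrt(N) = 0.01/sqrt(100) = 0.001

• Z = (X2 - X1)/SE = (5.03 - 5)/0.001 = 30

• Since 30 > 1.645, H0 is rejected: the mean weight produced by the machine
has increased.

• If the sample standard deviation SD2 = 0.21 kg is used instead, SE = 0.021
and Z = 0.03/0.021 = 1.43 < 1.645, so H0 would not be rejected. The two
standard deviations given in the question therefore lead to different
conclusions.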
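
The question asks for a one-tailed test at the 5% level, which can be sketched as a z-test. The helper name `z_statistic` is hypothetical; the critical value 1.645 is the standard one-tailed 5% value of the normal distribution. Because the question gives two different standard deviations, the sketch computes the statistic with each.

```python
import math

def z_statistic(sample_mean, hypothesised_mean, sd, n):
    """Z = (sample mean - hypothesised mean) / (sd / sqrt(n))."""
    return (sample_mean - hypothesised_mean) / (sd / math.sqrt(n))

CRITICAL_Z = 1.645  # one-tailed critical value at the 5% level of significance

# Using the population standard deviation stated as known (0.01 kg)
z_known_sd = z_statistic(5.03, 5.0, 0.01, 100)

# Using the sample standard deviation (0.21 kg) instead
z_sample_sd = z_statistic(5.03, 5.0, 0.21, 100)

for label, z in [("known SD", z_known_sd), ("sample SD", z_sample_sd)]:
    decision = "reject H0" if z > CRITICAL_Z else "do not reject H0"
    print(f"{label}: Z = {z:.2f}, {decision}")
```

The two runs show how strongly the conclusion depends on which standard deviation is taken as the measure of spread.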

Q6. Find the probability that at most 5 defective bolts will be found
in a box of 200 bolts, if it is known that 2 per cent of such bolts are
expected to be defective. (You may take the distribution to be
Poisson; e^-4 = 0.0183).

Answer:

Poisson distribution

A Poisson random variable is the number of successes that result from a
Poisson experiment. The probability distribution of a Poisson random variable
is called a Poisson distribution.

Given the mean number of successes (μ) that occur in a specified region, we
can compute the Poisson probability based on the following formula:
Poisson Formula. Suppose we conduct a Poisson experiment, in which the
average number of successes within a given region is μ. Then, the Poisson
probability is:

P(x; μ) = (e^-μ) (μ^x) / x!

where x is the actual number of successes that result from the experiment,
and e is approximately equal to 2.71828.

The Poisson distribution has the following properties:

The mean of the distribution is equal to μ.

The variance is also equal to μ .

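Here, μ = np = 200 × 0.02 = 4.

P(X ≤ 5) = P(0; 4) + P(1; 4) + P(2; 4) + P(3; 4) + P(4; 4) + P(5; 4)

= e^-4 × (1 + 4 + 4^2/2! + 4^3/3! + 4^4/4! + 4^5/5!)

= 0.0183 × (1 + 4 + 8 + 10.6667 + 10.6667 + 8.5333)

= 0.0183 × 42.8667

= 0.7845

Thus, the probability that at most 5 defective bolts will be found in a box of
200 bolts is approximately 0.7845.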
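
As a cross-check, the cumulative Poisson probability P(X ≤ 5) with μ = np = 200 × 0.02 = 4 can be summed directly from the formula P(x; μ) = (e^-μ)(μ^x)/x!. The function names below are illustrative helpers, not part of any library.

```python
import math

def poisson_pmf(x, mu):
    """P(x; mu) = e^(-mu) * mu^x / x!"""
    return math.exp(-mu) * mu ** x / math.factorial(x)

def poisson_cdf(k, mu):
    """P(X <= k): sum of the Poisson probabilities for x = 0, 1, ..., k."""
    return sum(poisson_pmf(x, mu) for x in range(k + 1))

mu = 200 * 0.02                     # expected number of defective bolts per box
p_at_most_5 = poisson_cdf(5, mu)

print("Mean:", mu)
print("P(X <= 5):", round(p_at_most_5, 4))
```

The result is about 0.785 when the unrounded value of e^-4 is used, which agrees closely with a hand calculation based on the rounded figure e^-4 = 0.0183 given in the question.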

Assignment Set- 2
Q1. What do you mean by Statistical Survey? Differentiate between
“Questionnaire” and “Schedule”.

Answer:

1. Definition of statistical survey

A statistical survey is a scientific process of collection and analysis of
numerical data. Statistical surveys are used to collect numerical information
about units in a population. Surveys involve asking questions of individuals.
Surveys of human populations are common in the government, health, social
science and marketing sectors.

Stages of Statistical Survey


Statistical surveys are categorised into two stages –

• Planning and

• Execution.

The figure below shows the two broad stages of Statistical survey.

Fig.1: Stages of Statistical Survey

Information is collected through mailed questionnaires

Often, information is collected through questionnaires. The questionnaires
contain questions pertaining to the investigation. They are sent to the
respondents with a covering letter soliciting their cooperation
(respondents are the people who respond to questions in the
questionnaire). The respondents are asked to give correct information and to
mail the questionnaire back. The objectives of the investigation are
explained in the covering letter, together with an assurance that the
information provided by the respondents will be kept confidential.

Good questionnaire construction is an important contributing factor to the
success of a survey. When questionnaires are properly framed and
constructed, they become important tools by which statements can be made
about specific people or entire populations.

This method is generally adopted by research workers and other official and
non-official agencies. This method is used to cover large areas of
investigation. It is more economical and free from investigator’s bias.
However, it results in many “non-response” situations. The respondent may
be illiterate. The respondent may also provide wrong information due to
wrong interpretation of questions.

If the questionnaire consists of invalid questions, or questions in incorrect
order, or questions in an inappropriate format, or questions that are biased,
then the survey would be useless. An important method for checking whether a
questionnaire is accurately capturing the intended information is to pre-test
it among a smaller subset of target respondents. The success of the
questionnaire method of data collection depends mainly on proper drafting of
the questionnaire. You have to keep the following points in mind while
preparing a questionnaire:

· The respondent should not take much time in completing the questionnaire.
It should be short, not lengthy.

· The questions asked should be well structured and unambiguous.

· The questions asked should be in proper logical sequence.

· Questions should be unbiased. The questions in the questionnaire should
not disturb the privacy of the respondents.

· The task of completion of questionnaire should not have much writing work.

· Necessary instructions and glossary should be given in covering letter.

· Questions involving technical jargon and mathematical calculations
should be avoided.

· The completed questionnaire should be kept confidential and used only for
the purpose of the survey as mentioned in the investigation.

· There should not be any scope for misinterpretation in the questions.

There are different types of questions that can be used in the questionnaire.
A questionnaire can have contingency questions, matrix questions, closed
ended questions and open ended questions. Let’s have a look at each one in
detail.

Contingency questions are questions that are answered only if the
respondent gives a particular response to a previous question. This avoids
asking people questions that do not apply to them.

Matrix questions are questions which are placed one under the other,
forming a matrix. The response categories are placed along the top and a list
of questions is placed down the side. This makes efficient use of page
space and respondents’ time.

Closed ended questions are those where the respondents’ answers are
limited to a fixed set of responses. Usually scales are closed ended.

There are various types of closed ended questions.

Yes/no questions – here the respondents answer with “yes” or “no”. Some
of the examples are:

Multiple choices – here the respondents have several options from which to
choose. For example:

Scaled questions – here the responses are graded on a continuum (for
example, rating the appearance of a product on a scale from 1 to 10, with 10
implying the most preferred appearance and 1 implying the least preferred
appearance). Scaled questions are mostly questions related to attitudes. A
Likert scale provides a number of attitude statements. The respondent has to
say how much they agree or disagree with each one.

Open ended questions are those questions for which the respondent
supplies their own answer without any fixed set of possible responses.
Examples of types of open ended questions include:

Sentence completion – In these, respondents complete an incomplete
sentence.

Story completion – In these, respondents complete an incomplete story.

Picture completion – In these, respondents fill in an empty conversation
balloon.

Thematic Apperception Test – In these, respondents explain a picture or
make up a story about what they think is happening in the picture.

Information through schedule filled by investigators


Information can be collected through schedules filled by investigators
through personal contact. In order to get reliable information, the investigator
should be well trained, tactful, unbiased and hard working.

A schedule is suitable for an extensive area of investigation through the
investigator’s personal contact. The problem of non-response is minimised.

There is a difference between a schedule and a questionnaire. A schedule is a
form that the investigator fills in himself while surveying the units or
individuals. A questionnaire is a form sent (usually mailed) by an investigator
to respondents. The respondent has to fill it in and then send it back to the
investigator.

The differences between a questionnaire and a schedule are as follows:

S.No. | Basis | Questionnaire | Schedule
1. | Liability for filling up | Informant is liable for filling it up. | Enumerator fills it up after getting answers from informants.
2. | Means of information | It is sent to the informants by post. | Enumerators themselves take up schedules and contact the informants.
3. | Personal relationship | Investigator does not have a personal contact with the informants. | Both investigator and informants have personal contact through the schedule.
4. | Nature of information | Sometimes incomplete as there is lack of personal contact. | Complete information is received because of the personal contact between the investigator and informants.
5. | Scope of enquiry | The use of a questionnaire is suitable where the informants are literate. | A schedule can be used for both literate and illiterate persons.
6. | Economy | Information by the mailed questionnaire method is economical. | It is comparatively a costly method as most of the enumerators are paid.
7. | Reliability | The information collected through it is less reliable as informants cannot give correct answers to some of the questions. | It is a reliable method as the enumerators can get correct answers after clarifying the questions to the informants.
8. | Delay | There is delay in the receipt of information by this method. | The information is quickly collected by the enumerators.

Q2. The table shows the data of Expenditure of a family on food,
clothing, education, rent and other items.

Items Expenditure

Food 4300

Clothing 1200

Education 700

Rent 2000

Others 600

Depict the data shown in the table using Pie chart.

Answer:

Items | Food | Clothing | Education | Others | Rent
Expenditure | 4300 | 1200 | 700 | 600 | 2000

PIE CHART

[Figure: pie chart of family expenditure with sectors for Food (4300),
Clothing (1200), Education (700), Others (600) and Rent (2000)]
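
To draw the pie chart by hand, each expenditure is converted into a sector angle as (item / total) × 360 degrees. A minimal sketch of that arithmetic, using the figures from the table (the variable names are illustrative):

```python
# Expenditure figures from the table in the question
expenditure = {
    "Food": 4300,
    "Clothing": 1200,
    "Education": 700,
    "Rent": 2000,
    "Others": 600,
}

total = sum(expenditure.values())

# Sector angle for each item: its share of the total, times 360 degrees
angles = {item: amount / total * 360 for item, amount in expenditure.items()}

for item, angle in angles.items():
    share = expenditure[item] / total * 100
    print(f"{item}: {share:.1f}% of total, {angle:.1f} degrees")
```

Food takes up nearly half of the circle (about 176 degrees), and the five angles add up to 360 degrees.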
Q3. Average weight of 100 screws in box ‘A’ is 10.4 gms. It is mixed
with 150 screws of box ‘B’. Average weight of mixed screws is 10.9
gms. Find the average weight of screws of box ‘B’.

Answer:

Given: Average Weight in Box ‘A’ [XA] = 10.4 gms.

Number of Screws in Box ‘A’ [NA] = 100.

Number of Screws in Box ‘B’ [NB] =150.

Average Weight of mixed Screws [XAB] =10.9 gms.

[XAB] = (NA × XA + NB × XB) / (NA + NB)

10.9 = [(100 × 10.4) + (150 × XB)] / (100 + 150)

10.9 × 250 = 1040 + 150 × XB

150 × XB = 2725 - 1040 = 1685

XB = 1685 / 150 = 11.23 gms.
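
The same rearrangement of the combined-mean formula can be checked in a few lines. The function name `mean_of_second_group` is a hypothetical helper introduced for this sketch.

```python
def mean_of_second_group(n_a, mean_a, n_b, combined_mean):
    """Solve combined_mean = (n_a*mean_a + n_b*mean_b) / (n_a + n_b) for mean_b."""
    combined_total = combined_mean * (n_a + n_b)   # total weight of all screws
    return (combined_total - n_a * mean_a) / n_b   # remaining weight shared by box B

x_b = mean_of_second_group(n_a=100, mean_a=10.4, n_b=150, combined_mean=10.9)
print(round(x_b, 2))
```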

Q4. (a) Discuss the rules of “Probability”.

(b) What is meant by “Conditional Probability”?


Answer:

4(a)Rules of Probability (unit-5) (5.2 and 5.3)

Managers very often come across situations where they have to take
decisions about implementing either course of action A or course of action B
or course of action C. Sometimes, they have to take decisions regarding the
implementation of both A and B.

For example, a sales manager may like to know the probability that he will
exceed the target for product A or product B. Sometimes, he would like to
know the probability that the sales of both product A and product B will
exceed the target. The first type of probability is answered by the addition
rule. The second type of probability is answered by the multiplication rule.

Addition rule:

The addition rule of probability states that:

i) If ‘A’ and ‘B’ are any two events then the probability of the occurrence
of either ‘A’ or ‘B’ is given by:

P(A or B) = P(A) + P(B) - P(A and B)

ii) If ‘A’ and ‘B’ are two mutually exclusive events then the probability of
occurrence of either A or B is given by:

P(A or B) = P(A) + P(B)

iii) If A, B and C are any three events then the probability of occurrence of
either A or B or C is given by:

P(A or B or C) = P(A) + P(B) + P(C) - P(A and B) - P(B and C) - P(A and C)
+ P(A and B and C)

In terms of a Venn diagram, we can calculate the probability of occurrence of
either event ‘A’ or event ‘B’, given that event ‘A’ and event ‘B’ are dependent
events. From the figure 5.5, we can calculate the probability of occurrence of
either ‘A’ or ‘B’, given that, events ‘A’ and ‘B’ are independent events. From
the figure 5.6, we can calculate the probability of occurrence of either ‘A’ or
‘B’ or ‘C’, given that, events ‘A’, ‘B’ and ‘C’ are dependent events.

iv) If A1, A2, A3, ..., An are ‘n’ mutually exclusive and exhaustive events
then the probability of occurrence of at least one of them is given by:

P(A1 or A2 or ... or An) = P(A1) + P(A2) + ... + P(An) = 1

Multiplication rule :

If ‘A’ and ‘B’ are two independent events then the probability of occurrence of
‘A’ and ‘B’ is given by:

P(A and B) = P(A) × P(B)

4(b) Conditional Probability :

Sometimes we wish to know the probability that the price of a particular
petroleum product will rise, given that the finance minister has increased the
petrol price. Such probabilities are known as conditional probabilities.

Thus the conditional probability of occurrence of an event ‘A’ given that the
event ‘B’ has already occurred is denoted by P (A / B). Here, ‘A’ and ‘B’ are
dependent events. Therefore, we have the following rules.

If ‘A’ and ‘B’ are dependent events, then the probability of occurrence of ‘A
and B’ is given by:

P(A and B) = P(B) × P(A / B)

It follows that:

P(A / B) = P(A and B) / P(B)

For any bivariate distribution, there exist two marginal distributions and
‘m + n’ conditional distributions, where ‘m’ and ‘n’ are the number of
classifications/characteristics studied on two variables.
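
The addition rule, the multiplication rule and conditional probability can be verified by counting outcomes. The sketch below is an illustrative example that is not taken from the assignment: it enumerates the 36 outcomes of rolling two fair dice and uses exact fractions. The events A, B and C are assumptions chosen for the demonstration.

```python
from fractions import Fraction
from itertools import product

# Sample space: all 36 equally likely outcomes of rolling two fair dice
outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event = favourable outcomes / total outcomes."""
    favourable = sum(1 for o in outcomes if event(o))
    return Fraction(favourable, len(outcomes))

def a(o): return o[0] == 6            # A: first die shows 6
def b(o): return o[0] + o[1] >= 10    # B: total is at least 10
def c(o): return o[1] % 2 == 0        # C: second die is even

p_a, p_b, p_c = prob(a), prob(b), prob(c)
p_a_and_b = prob(lambda o: a(o) and b(o))
p_a_or_b = prob(lambda o: a(o) or b(o))
p_a_and_c = prob(lambda o: a(o) and c(o))

# Conditional probability: P(A / B) = P(A and B) / P(B)
p_a_given_b = p_a_and_b / p_b

print("P(A or B) =", p_a_or_b, "=", p_a + p_b - p_a_and_b)   # addition rule
print("P(A / B) =", p_a_given_b)                             # A depends on B
print("P(A and C) =", p_a_and_c, "=", p_a * p_c)             # A, C independent
```

Here A and B are dependent (knowing the total is at least 10 raises the chance of a 6 on the first die from 1/6 to 1/2), while A and C are independent, so their joint probability is simply the product of the two.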
Q5. (a) What is meant by “Hypothesis Testing”? Give Examples

(b) Differentiate between “Type-I” and “Type-II” Errors

Answer:
5(a) In hypothesis testing, we must state the assumed or hypothesised value
of the population parameter before we begin sampling. The assumption we
wish to test is called the null hypothesis and is symbolised by ‘H0’.

The term ‘null hypothesis’ arises from earlier agricultural and medical
applications of statistics. In order to test the effectiveness of a new fertilizer
or drug, the tested hypothesis (the null hypothesis) was that it had no effect,
that is, there was no difference between treated and untreated samples. If we
use a hypothesised value of a population mean in a problem, we would
represent it symbolically as ‘µ H0’. This is read – ‘The hypothesised value of
the population mean’.

If our sample results fail to support the null hypothesis, we must conclude
that something else is true. Whenever we reject the hypothesis, the
conclusion we do accept is called the alternative hypothesis and is
symbolised H1 (“H sub-one”).

For the null hypothesis H0: µ = 200, we will consider three alternative
hypotheses:

H1: µ ≠ 200 (population mean is not equal to 200)

H1: µ > 200 (population mean greater than 200)

H1: µ < 200 (population mean less than 200)

Example

We want to test the hypothesis that the population mean is equal to 500.
The null hypothesis is that the population mean = 500, written as
H0: µ = 500.

The purpose of hypothesis testing is not to question the computed value of
the sample statistic but to make a judgment about the difference between
that sample statistic and a hypothesised population parameter.

The next step after stating the null and alternative hypotheses is to decide
what criterion to use for deciding whether to accept or reject the null
hypothesis. If we assume the hypothesis is correct, then the significance level
indicates the percentage of sample means that fall outside certain limits (in
estimation, the confidence level indicates the percentage of sample means
that fall within the defined confidence limits).

5(b) Type I error:

Suppose that making a Type I error (rejecting a null hypothesis when it is
true) involves the time and trouble of reworking a batch of chemicals that
should have been accepted. At the same time, making a Type II error
(accepting a null hypothesis when it is false) means taking a chance that an
entire group of users of this chemical compound will be poisoned. Obviously,
the management of this company will prefer a Type I error to a Type II error
and, as a result, will set very high levels of significance in its testing to get
low β’s.

Type II error:

Suppose, on the other hand, that making a Type I error involves
disassembling an entire engine at the factory, but making a Type II error
involves relatively inexpensive warranty repairs by the dealers. Then the
manufacturer is more likely to prefer a Type II error and will set lower
significance levels in its testing.

Q6. From the following table, calculate Laspeyres’ Index Number,
Paasche’s Index Number, Fisher’s Price Index Number and Dorbish &
Bowley’s Index Number taking 2008 as the base year.

Commodity | 2008 Price (Rs. per Kg) | 2008 Quantity (Kg) | 2009 Price (Rs. per Kg) | 2009 Quantity (Kg)
A | 6 | 50 | 10 | 56
B | 2 | 100 | 2 | 120
C | 4 | 60 | 6 | 60
D | 10 | 30 | 12 | 24
E | 8 | 40 | 12 | 36

Answer:

Commodity | 2008 Price, Rs. per Kg (P0) | 2008 Qty in Kg (q0) | 2009 Price, Rs. per Kg (P1) | 2009 Qty in Kg (q1) | P0q0 | P1q1 | P0q1 | P1q0
A | 6 | 50 | 10 | 56 | 300 | 560 | 336 | 500
B | 2 | 100 | 2 | 120 | 200 | 240 | 240 | 200
C | 4 | 60 | 6 | 60 | 240 | 360 | 240 | 360
D | 10 | 30 | 12 | 24 | 300 | 288 | 240 | 360
E | 8 | 40 | 12 | 36 | 320 | 432 | 288 | 480
Total (Σ) | | | | | 1360 | 1880 | 1344 | 1900

a) Laspeyres’ Method: P01 = (Σ P1q0 / Σ P0q0) × 100

Laspeyres’ Index Number: P01 = (1900 / 1360) × 100 = 139.71

b) Paasche’s Method: P01 = (Σ P1q1 / Σ P0q1) × 100

Paasche’s Index Number: P01 = (1880 / 1344) × 100 = 139.88

c) Dorbish and Bowley’s Method: P01 = [(Σ P1q0 / Σ P0q0) + (Σ P1q1 / Σ P0q1)] / 2 × 100

Or (L + P) / 2, where L is Laspeyres’ index and P is Paasche’s index

Dorbish & Bowley’s Index Number: P01 = (139.7059 + 139.8810) / 2 = 139.79

d) Fisher’s Method: P01 = sqrt[(Σ P1q0 / Σ P0q0) × (Σ P1q1 / Σ P0q1)] × 100

Fisher’s Price Index Number: P01 = sqrt[(1900 / 1360) × (1880 / 1344)] × 100 = 139.79
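
The four index numbers can be re-computed from the price and quantity table with a short sketch. The data are taken from the question; the variable names are illustrative.

```python
import math

# (p0, q0, p1, q1) for each commodity: 2008 price and quantity, 2009 price and quantity
data = {
    "A": (6, 50, 10, 56),
    "B": (2, 100, 2, 120),
    "C": (4, 60, 6, 60),
    "D": (10, 30, 12, 24),
    "E": (8, 40, 12, 36),
}

sum_p0q0 = sum(p0 * q0 for p0, q0, p1, q1 in data.values())
sum_p1q1 = sum(p1 * q1 for p0, q0, p1, q1 in data.values())
sum_p0q1 = sum(p0 * q1 for p0, q0, p1, q1 in data.values())
sum_p1q0 = sum(p1 * q0 for p0, q0, p1, q1 in data.values())

laspeyres = sum_p1q0 / sum_p0q0 * 100      # base-year quantities as weights
paasche = sum_p1q1 / sum_p0q1 * 100        # current-year quantities as weights
bowley = (laspeyres + paasche) / 2         # arithmetic mean of the two
fisher = math.sqrt(laspeyres * paasche)    # geometric mean of the two

for name, value in [("Laspeyres", laspeyres), ("Paasche", paasche),
                    ("Dorbish-Bowley", bowley), ("Fisher", fisher)]:
    print(f"{name}: {value:.2f}")
```

Since the Laspeyres and Paasche indices are so close here, their arithmetic mean (Dorbish and Bowley) and geometric mean (Fisher) agree to two decimal places.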
