Professional Documents
Culture Documents
n .p = ----- N .
Karl Pearson developed the concept of square goodness of fit test. Sir Ronald
fischer (1890-1962) made a major contribution in the field of experimental
design turning into science. Since 1935 “design of experiments” has made rapid
progress making collection and analysis of statistical prompter and more
economical. Design of experiments is the complete sequence of steps taken ahead
of time to ensure that the appropriate data will be obtained, which will permit an
objective analysis and will lead to valid inferences regarding the stated problem
MERITS OF STATSTICS
Presenting facts in a definite form
Simplifying mass of figure- condensation into few significant figures
Facilitating comparison
Helping in formulating and testing of hypothesis and developing new
theories.
Helping in predictions.
Helping in formulation of suitable policies.
LIMITATION OF STATSTICS
Does not deal with individual measurement.
Deals only with quantities characteristics.
Result is true only on an average.
It is only one of the methods of studying a problem.
Statistics can be measured. It requires skills to use it effectively, otherwise
misinterpretation is possible
It is only a tool or means to an end and not the end itself which has to be
intelligently identified using this tool.
Alphabets like A, B, C, X, U; S etc are used to denote sets. Braces like ‘{ }’ are used
as a notation for collection of objects or elements in the set.’ Greek letter epsilon “
” is used to denote “belongs to”. A vertical line ‘|’ is used to denote expression
‘such that’. Alphabet ‘I’ is used to denote an ‘integer’. Using above notation a set
called a considering of elements 0, 1, 2,3,4,5 May be mathematically denoted in
any of the following manner
Read as A is (a set of) (variable x) (such that) (x lies between 0 and 5, both
inclusive) where variable x belongs to integers.
E.g. each of sets A = {0, 1, 2, 3, 4, 5} OR THE SET B = {1, 3, 5} are subset of set C
WHERE C= {0, 1, 2, 3, 4, 5}
7) Equal sets if A is sub set of B and B is a subset of A then A and B are called
equal sets. This can be denoted as follows
If A B and B A then A= B
2. Secondary data: (are compiled by someone other than the user of the data for
decision making purpose)
Tabulation:
Go to next chapter>>>>>>>>>>>>>>>>>>>.
The arithmetic mean is the "standard" average, often simply called the "mean". It
is used for many purposes but also often abused by incorrectly using it to describe
skewed distributions, with highly misleading results. The classic example is
average income - using the arithmetic mean makes it appear to be much higher
than is in fact the case. Consider the scores {1, 2, 2, 2, 3, 9}. The arithmetic mean
is 3.16, but five out of six scores are below this!
The arithmetic mean of a set of numbers is the sum of all the members of the set
divided by the number of items in the set. (The word set is used perhaps
somewhat loosely; for example, the number 3.8 could occur more than once in
such a "set".) The arithmetic mean is what pupils are taught very early to call the
"average." If the set is a statistical population, then we speak of the population
mean. If the set is a statistical sample, we call the resulting statistic a sample
mean. The mean may be conceived of as an estimate of the median. When the
mean is not an accurate estimate of the median, the set of numbers, or frequency
distribution, is said to be skewed.
We denote the set of data by X = {x1, x2, ..., xn}. The symbol µ (Greek: mu) is
used to denote the arithmetic mean of a population. We use the name of the
variable, X, with a horizontal bar over it as the symbol ("X bar") for a sample
mean. Both are computed in the same way:
Geometric mean
The geometric mean is an average which is useful for sets of numbers which are
interpreted according to their product and not their sum (as is the case with the
arithmetic mean). For example rates of growth.
The geometric mean of a set of positive data is defined as the product of all the
members of the set, raised to a power equal to the reciprocal of the number of
members. In a formula: the geometric mean of a1, a2, ..., an is , which is . The
geometric mean is useful to determine "average factors". For example, if a stock
rose 10% in the first year, 20% in the second year and fell 15% in the third year,
then we compute the geometric mean of the factors 1.10, 1.20 and 0.85 as (1.10 ×
1.20 × 0.85)1/3 = 1.0391... and we conclude that the stock rose on average 3.91
percent per year. The geometric mean of a data set is always smaller than or
equal to the set's arithmetic mean (the two means are equal if and only if all
members of the data set are equal). This allows the definition of the arithmetic-
geometric mean, a mixture of the two which always lies in between. The
geometric mean is also the arithmetic-harmonic mean in the sense that if two
sequences (an) and (hn) are defined:
and
Harmonic mean
The harmonic mean is an average which is useful for sets of numbers which are
defined in relation to some unit, for example speed (distance per unit of time).
The harmonic mean is never larger than the geometric mean or the arithmetic
mean (see generalized mean). In certain situations, the harmonic mean provides
the correct notion of "average". For instance, if for half the distance of a trip you
travel at 40 miles per hour and for the other half of the distance you travel at 60
miles per hour, then your average speed for the trip is given by the harmonic
mean of 40 and 60, which is 48; that is, the total amount of time for the trip is the
same as if you traveled the entire trip at 48 miles per hour. Similarly, if in an
electrical circuit you have two resistors connected in parallel, one with 40 ohms
and the other with 60 ohms, then the average resistance of the two resistors is 48
ohms; that is, the total resistance of the circuit is the same as it would be if each
of the two resistors were replaced by a 48-ohm resistor. (Note: this is not to be
confused with their equivalent resistance, 24 ohm, which is the resistance needed
for a single resistor to replace the two resistors at once.) Typically, the harmonic
mean is appropriate for situations when the average of rates is desired.
Another formula for the harmonic mean of two numbers is to multiply the two
numbers, and divide that quantity by the arithmetic mean of the two numbers. In
mathematical terms:
Merits:
It is based on each and every item of the series.
It is rigidly defined
It is useful in averaging ratio and percentage in determining
rates of increase or decrease.
it gives less weight to large items and more to small items. Thus
geometric mean of the geometric of values is always less than
their arithmetic mean.
It is capable of algebraic manipulation like computing the grand
geometric mean of the geometric mean of different sets of
values.>
Limitation:
It is relatively difficult to comprehend, compute and interpret.
A G.M with zero value cannot be compounded with similar
other non-zero values with negative sign
Go to next chapter>>>>>>>>>>>>>>>>>>>.
what to you understand by concept of probability. Explain
various theories of probability
probability is a branch of mathematics that measures the likelihood that an event
will occur. Probabilities are expressed as numbers between 0 and 1. The
probability of an impossible event is 0, while an event that is certain to occur has
a probability of 1. Probability provides a quantitative description of the likely
occurrence of a particular event. Probability is conventionally expressed on a
scale of zero to one. A rare event has a probability close to zero. A very common
event has a probability close to one.
Four theories of probability
1)Classical or a priori probability: this is the oldest concept evolved in 17th
century and based on the assumption that outcomes of random experiments (like
tossing of coin, drawing cards from a pack or throwing a die) are equally likely.
For this reason this is not valid in the following cases (a) Where outcomes of
experiments are not equally likely, for example lives of different makes of bulbs.
2) Empirical concept: this was developed in 19th centaury for insurance business
data and is based on the concept of relative frequency. It is based on historical
data being used for future prediction. When we toss a coin, the probability of a
head coming up is ½ because there are two equally likely events, namely
appearance of a head or that of a tail. This is an approach of determining a
probability from deductive logic.
a) The probability of an event ranges from 0 to 1. That is, an event surely not be
happen has probability 0 and another event sure to happen is associated with
probability 1.
b) The probability of an entire sample space (that is any, some or all the possible
outcomes of an experiment) is 1. Mathematically, P(S) =1
E= expected frequencies
2) Take difference between O and E for each cell and calculate their square (O-E)
2
4) Compare calculated value with table value at given degree of freedom and
specified level of significance. If at a stated level, the calculated value is more
than table values, the difference between theoretical and observed frequencies
are considered to be significant. It could not have arisen due to fluctuation of
simple sampling. However if the values is less than table value it is not
considered as significant, regarded as due to fluctuation of simple sampling and
therefore ignored. Condition for applying x2
1) N must be large, say more than 50, to ensure the similarity between
theoretically correct distribution and our sampling distribution.
2) no theoretical cell frequency cell frequency should be too small, say less than
5,because that may be over estimation of the value of x2 and may result into
rejection of hypotheses. In case we get such frequencies, we should pool them up
with the previous or succeeding frequencies. This action is called Yates correction
for continuity.
Go to next chapter>>>>>>>>>>>>>>>>>>>.
2) Index number measure the net change in a group of related variable over a
period of time.
3) Index number measure the effect of change over a period of time, across the
range of industries, geographical regions or countries.
1) Obtain decision about class of people for whom the index number is to be
computed, for instance, the industrial personnel, officers or teachers etc. also
decide on the geographical area to be covered.
2) Conduct a family budget inquiry covering the class of people for whom the
index number is to be computed. The enquiry should be conducted for the base
year by the process of random sampling. This would give information regarding
the nature, quality and quantities of commodities consumed by an average family
of the class and also the amount spent on different items of consumption.
4) Collect retail prices in respect of the items from the localities in which the class
of people concerned reside, or from the markets where they usually make their
purchases.
5) as the relative importance of various items for different classes of people is not
the same, the price or price relative are always weighted and therefore, the cost of
living index is always a weighted index.
6) The percentage expenditure on an item constitutes the weight of the item and
the percentage expenditure in the five groups constitutes the group weight.
7) Separate index number are first of all determined for each of the five major
groups, by calculating the weighted average of price-relatives of the selected
items in the group.
INDEX NUMBER OF INDUSTRIAL
PRODUCTION
The index number of industrial production is designed to measure increase or
decrease in the level of industrial production in a given period of time compared
to some base periods. Such an index measures changes in the quantities of
production and not their values. Data about the level of industrial output in the
base period and in the given period is to be collected first under the following
heads
Textile industries to include cotton, woolen, silk etc.
Mining industries like iron ore, iron, coal, copper, petroleum etc.
Metallurgical industries like automobiles, locomotive, aero planes etc
Industries subject to excise duties like sugar, tobacco, match etc.
Miscellaneous like glass, detergents, chemical, cement etc.
The figure of output for a various industries classifies above are
obtained on a monthly, quarterly or yearly basis. Weights
are assigned to various industries on the basis of some
criteria such as capital invested turnover, net output,
production etc. usually the weights in the index are based
on the values of net output of different industries. The
index of industrial production is obtained by taking the
simple mean or geometric mean of relatives. When the
simple arithmetic mean is used the formula for
constructing the index is as follows.
Go to next chapter>>>>>>>>>>>>>>>>>>>.
Q. enumerate probability distribution; explain the
histogram and probability distribution curve.
for any x in R.
The Bernoulli distribution, which takes value 1 with probability p and value 0
with probability q = 1 - p.
k! is the factorial of k,
e= exponential constant=2.7183
= mathematical constant=3.1416
HISTOGRAM AND PROBABILITY DISTRIBUTION curve
Histograms--bar charts in which the area of the bar is proportional to the number
of observations having values in the range defining the bar. As an example
construct histograms of populations .the population histogram describes the
proportion of the population that lies between various limits. It also describes the
behavior of individual observations drawn at random from the population, that
is, it gives the probability that an individual selected at random from the
population will have a value between specified limits. When we're talking about
populations and probability, we don't use the words "population histogram".
Instead, we refer to probability densities and distribution functions. (However, it
will sometimes suit my purposes to refer to "population histograms" to remind
you what a density is.) When the area of a histogram is standardized to 1, the
histogram becomes a probability density function. The area of any portion of the
histogram (the area under any part of the curve) is the proportion of the
population in the designated region. It is also the probability that an individual
selected at random will have a value in the designated region. For example, if
40% of a population has cholesterol values between 200 and 230 mg/dl, 40% of
the area of the histogram will be between 200 and 230 mg/dl. The probability
that a randomly selected individual will have a cholesterol level in the range 200
to 230 mg/dl is 0.40 or 40%. Strictly speaking, the histogram is properly a
density, which tells you the proportion that lies between specified values. A
(cumulative) distribution function is something else. It is a curve whose value is
the proportion with values less than or equal to the value on the horizontal axis,
as the example to the left illustrates. Densities have the same name as their
distribution functions. For example, a bell-shaped curve is a normal density.
Observations that can be described by a normal density are said to follow a
normal distribution.