Importance of the normal distribution:
1.
When n is large, the binomial distribution tends to normality; the normal distribution is therefore a limiting form of the binomial distribution.
2.
The normal distribution is the most important distribution in statistics; it plays an important role in both theoretical and applied statistics.
3.
Most of the methods of theoretical statistics have been developed on the basis of the normal distribution.
4.
In applied statistics this distribution has many uses, because many distributions in the biological sciences (e.g. the distributions of height, weight, etc.) are approximately normal; the normal distribution therefore plays a central role in statistics.
5.
It provides a good continuous approximation to various discrete probability distributions.
Properties of normal distribution
1.
The normal curve is bell-shaped and symmetrical about its mean.
2.
The mean, median and mode are all equal (mean = median = mode = μ).
3.
The total area under the normal curve is 1 (unity).
4.
The curve is asymptotic to the x-axis at both ends.
5.
The curve has points of inflection at x = μ − σ and x = μ + σ.
6.
The normal distribution has its maximum ordinate (height of curve) at x = μ; the value of the maximum ordinate is 1/(σ√(2π)).
7.
All odd-order moments about the mean are zero, so the distribution has no skewness (β1 = 0).
8.
The distribution is mesokurtic (β2 = 3).
9.
The mean deviation is approximately 4/5 of the standard deviation, and the quartile deviation is approximately 2/3 of the standard deviation.
Reproductive Property:
If X1 and X2 are two independent normal variables, i.e.
X1 ~ N(μ1, σ1²) and X2 ~ N(μ2, σ2²), then their sum is also a normal variable: X1 + X2 ~ N(μ1 + μ2, σ1² + σ2²).
[Stated differently: the sum of independent normal variables is a normal variable.]
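A quick simulation sketch of the reproductive property (the parameters below are illustrative assumptions): sums of draws from N(10, 2²) and N(5, 1²) should have mean near 10 + 5 = 15 and variance near 4 + 1 = 5.

```python
import random
import statistics

# Illustrative parameters (assumptions, not from the notes):
# X1 ~ N(10, 2^2), X2 ~ N(5, 1^2); the sum should be ~ N(15, 5).
random.seed(1)
mu1, s1 = 10.0, 2.0
mu2, s2 = 5.0, 1.0

sums = [random.gauss(mu1, s1) + random.gauss(mu2, s2) for _ in range(200_000)]

mean_sum = statistics.fmean(sums)     # close to mu1 + mu2 = 15
var_sum = statistics.pvariance(sums)  # close to s1^2 + s2^2 = 5
```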
Standard / Standardized normal variable (S.N.D.):
A normal variable X with mean μ and standard deviation σ can easily be transformed into a standard normal variable Z with mean 0 and variance 1, i.e. Z = (X − μ)/σ.
If X ~ N(μ, σ²), then
Z = (X − μ)/σ ~ N(0, 1).
Z expresses X in terms of its deviation from its mean, divided by its standard deviation: X ~ N(μ, σ²) and Z ~ N(0, 1).
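Standardizing a small data set (the sample values below are invented for the example) shows the effect of the transformation: the z-scores always have mean 0 and standard deviation 1.

```python
import statistics

# Illustrative sample values (assumptions for the example only)
data = [12.0, 15.0, 9.0, 14.0, 10.0]
mu = statistics.fmean(data)      # sample mean, here 12.0
sigma = statistics.pstdev(data)  # standard deviation

# Z = (x - mu) / sigma for every observation
z_scores = [(x - mu) / sigma for x in data]

z_mean = statistics.fmean(z_scores)  # 0 after standardizing
z_sd = statistics.pstdev(z_scores)   # 1 after standardizing
```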
Sample:
A sample is a part of the population, e.g. a food inspector checks a sample of milk, wheat products, etc. to see whether it is pure or not.
Sampling:
The process of selecting a sample is called sampling.
Population:
A population consists of all the elements under study [a collection of all possible observations].
Finite Population:
A population consisting of a limited number of elements, e.g. all employees of LMS, Lahore (or any firm), or the number of boy students in Lahore city.
Infinite Population:
A population consisting of an unlimited number of elements, e.g. the population of stars in the sky or the population of all fish in the sea.
Advantages of sampling
1.
A sample survey saves money: it costs less to get information from a small sample than from the whole population.
2.
A sample survey saves a great deal of time and energy compared with a study of the whole population.
3.
A sample survey can provide information as accurate as that obtained by a complete census (of the population).
4.
A sample survey provides a valid measure of reliability for the sample estimates.
5.
In the case of an infinite (or inaccessible) population, sampling is the only method of obtaining information.
Purpose of sampling
To get maximum information about the characteristics of the population with minimum cost, time and effort.
Sampling with replacement:
If the selected sampling unit is returned to the population before the next unit is drawn, the sampling is with replacement.
Sampling without replacement:
If the selected sampling unit is not returned to the population before the next unit is drawn, the sampling is without replacement.
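The two schemes can be sketched with Python's standard library (the population and sample size below are illustrative assumptions): `random.sample` draws without replacement, so no unit repeats, while `random.choices` returns each unit to the population before the next draw.

```python
import random

# Illustrative population of 10 numbered units (an assumption for the sketch)
random.seed(7)
population = list(range(1, 11))

# Without replacement: a unit can appear at most once in the sample.
without = random.sample(population, 5)

# With replacement: each unit is returned before the next draw,
# so the same unit may appear more than once.
with_repl = random.choices(population, k=5)
```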
Parameter:
A numerical quantity which is computed from the population, e.g. μ, σ, etc.
Statistic:
A numerical quantity which is computed from a sample, e.g. x̄, s, etc.
Sampling unit:
If we draw samples of students, then one student is a sampling unit.
Sampling units:
Before the sample is selected, the population must be divided into parts, which are called sampling units.
Sampling frame:
A list of all the sampling units.
Census and survey:
A census is taken after every ten years; a survey may be taken at any time, when necessary.
Sample survey:
A survey which is based on the sampling method is called a sample survey. [Used to collect data, e.g. on a federal basis by a bureau of statistics.]
Random sample:
When friends decide who will pay for drinks or tea, they often write their names on slips of paper, mix them (after folding) and draw one slip; the name drawn is a random sample. A good sample (free from bias) is also known as a random sample.
Simple random sampling:
It is a procedure of selecting a sample (from a population) in such a way that:
(i)
each possible sample of size n (of the same size) has an equal chance (or probability) of being selected;
(ii)
each unit in the population has an equal chance (or probability) of being drawn.
Simple random sampling is suitable when the population is relatively small and homogeneous. Two methods are used to select a random sample:
(i) the lottery method;
(ii) the use of random number tables.
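The lottery method can be sketched as follows (the names are invented for the example): one slip per unit, mixed by shuffling, so that every slip is equally likely to be drawn.

```python
import random

# Hypothetical names on the slips (invented for the example)
random.seed(3)
names = ["Ali", "Babar", "Saad", "Usman", "Hamza", "Bilal"]

slips = names[:]       # one slip per population unit
random.shuffle(slips)  # mixing the folded slips
sample = slips[:2]     # drawing two slips gives a simple random sample
```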
Bias:
It is the difference between the expected value of a sample statistic and the true value of the population parameter.
Biased sample:
If some units of the population have a greater chance of being selected than others, the sample is biased; e.g. if we ask one boy to call four or five boys from his class, it is possible that they will all be his friends.
Un-biased sample:
A sample free from bias is called a good or un-biased sample.
Sampling error:
Sampling error is defined for a particular sample: it is the difference between the value of the sample statistic (for that sample) and the corresponding population parameter, i.e.
E = x̄ − μ
It is negative when the parameter is under-estimated and positive when the parameter is over-estimated. The error decreases (is reduced) as the sample size increases.
Non-sampling error:
These errors occur due to mistakes in processing the data, clerical errors, unclear questions, and errors in the selection of samples (or data); such errors are called non-sampling errors. These errors are present in a complete census as well as in a sample survey.
Sampling Design:
It is a layout, procedure, method or plan used to select the sample from the population. A sampling design is specified before any data are collected.
Probability sampling:
When each unit in a population has a known (equal or unequal) probability (or chance) of being selected in the sample, the sampling is called probability sampling. Simple random sampling, stratified sampling and cluster sampling are examples of probability sampling.
Stratified random sampling:
In this method we divide the population (which is not homogeneous with respect to the characteristic under study) into groups or classes called strata; the items within each stratum (or group) are homogeneous.
Cluster sampling:
Suppose that a population is divided into N smaller groups (of equal or unequal size) called clusters, e.g. the blocks of a city, etc. (a cluster is a collection of sampling units).
Non-probability sampling:
When no unit has a known probability of being selected in the sample, the sampling is non-probability sampling; here personal judgement plays an important role in the selection of the sample. Judgement (or purposive) sampling and quota sampling are examples of non-probability sampling.
Standard error:
The standard deviation of the sampling distribution of a sample statistic (x̄, s, etc.) is called the standard error.
Uses of standard error
1)
Standard error measures the variability among the sample means, whereas the population standard deviation measures the variability among the individual observations.
2)
A larger value of the standard error shows greater variability among the sample means, while a smaller value of the standard error shows greater reliability (precision) of the sample estimate.
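A simulation sketch (μ, σ and n below are illustrative assumptions): the standard deviation of many sample means comes out close to σ/√n, the standard error of the mean.

```python
import math
import random
import statistics

# Illustrative population parameters (assumptions for the sketch)
random.seed(5)
mu, sigma, n = 100.0, 12.0, 36

# Draw many samples of size n and record each sample mean.
means = [statistics.fmean(random.gauss(mu, sigma) for _ in range(n))
         for _ in range(20_000)]

observed_se = statistics.pstdev(means)  # variability among the sample means
theoretical_se = sigma / math.sqrt(n)   # sigma / sqrt(n) = 12 / 6 = 2.0
```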
Statistical Inference:
Statistical inference is the process of drawing inferences (or conclusions) about the value of a population parameter on the basis of sample information (or sample observations). There are two types of statistical inference:
(1). Estimation (2). Testing of hypotheses.
Estimation:
Estimation is a procedure by which we obtain (or estimate) the value of an unknown population parameter using sample observations.
There are two types of estimation:
(1). Point estimation (2). Interval estimation.
1. Point Estimation:
The process of finding a single value from the sample which will represent the value of the unknown population parameter; e.g. the population mean, population variance and population proportion are estimated from the corresponding sample statistics (sample mean, sample variance, sample proportion, etc.).
2. Interval Estimation:
It is a procedure (or process) in which we estimate (or determine) a range (or interval) of values within which the unknown population parameter is believed to lie (or expected to lie, or likely to fall).
Estimate:
The specific value of an (unknown) population parameter which we obtain using sample observations.
Estimator:
The formula or rule with the help of which the value of a population parameter is estimated.
(i) Point estimate:
A single numerical value calculated from the sample.
Unbiasedness:
If the expected value of a sample statistic is equal to the population parameter, i.e. E(x̄) = μ.
Unbiased estimator:
An estimator is said to be unbiased if the expected value of the sample statistic is equal to the value of the population parameter, e.g. E(x̄) = μ.
Biased estimator:
An estimator is said to be biased if the expected value of the sample statistic is not equal to the population parameter, i.e. E(statistic) ≠ parameter.
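The distinction can be sketched by simulation (all figures below are illustrative assumptions): the average of many sample means stays near μ, the sample variance with divisor n systematically underestimates σ², and the version with divisor n − 1 is unbiased.

```python
import random
import statistics

# Illustrative parameters (assumptions): population N(50, 8^2), samples of size 5
random.seed(11)
mu, sigma, n = 50.0, 8.0, 5

xbars, var_n, var_n1 = [], [], []
for _ in range(40_000):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    xbars.append(statistics.fmean(sample))
    var_n.append(statistics.pvariance(sample))   # divisor n (biased)
    var_n1.append(statistics.variance(sample))   # divisor n - 1 (unbiased)

e_xbar = statistics.fmean(xbars)     # near mu: x-bar is unbiased for mu
e_var_n = statistics.fmean(var_n)    # near sigma^2 * (n - 1)/n = 51.2
e_var_n1 = statistics.fmean(var_n1)  # near sigma^2 = 64
```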
Confidence interval:
We find the value of a population parameter from its sample statistic; the standard error is very helpful in finding the maximum and minimum limits within which the population parameter is expected to lie. Such an interval is called a confidence interval.
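A sketch of a 95% confidence interval for the mean when σ is known (x̄, σ and n are made-up figures; 1.96 is the standard normal value for 95% confidence): the limits are x̄ ± z · σ/√n.

```python
import math

# Illustrative sample figures (assumptions): x-bar = 72, sigma = 9, n = 36.
# 1.96 is the standard normal value for a 95% confidence level.
xbar, sigma, n = 72.0, 9.0, 36
z = 1.96

se = sigma / math.sqrt(n)   # standard error = 9 / 6 = 1.5
lower = xbar - z * se       # minimum limit
upper = xbar + z * se       # maximum limit
```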
Confidence level:
The probability of accepting a true null hypothesis is called the confidence level and it is denoted by 1 − α, where α is the probability that the interval does not include the true value of the population parameter. Thus the confidence level is the complement of α, i.e.
confidence level = 1 − α.
Testing of hypothesis:
The procedure which enables us to decide whether to accept or reject a hypothesis is called a test of hypothesis.
Null hypothesis:
The hypothesis which is tested for possible rejection under the assumption that it is true. It is denoted by H0. Suppose the average age of ICS students is 16 years; then H0: μ = 16.
Alternative hypothesis:
It is the hypothesis that differs from the null hypothesis (H0) and is accepted when the null hypothesis is rejected. If H0: μ = 16, then H1: μ ≠ 16, H1: μ < 16 or H1: μ > 16.
Simple hypothesis:
A hypothesis in which all the parameters of the distribution are specified is called a simple hypothesis; e.g. if the average age of ICS students is 16 years, then H0: μ = 16 years is a simple hypothesis.
Composite hypothesis:
A hypothesis in which all the parameters of the distribution are not specified is called a composite hypothesis; H1: μ > 16 years or H1: μ < 16 years are composite hypotheses.
Type-I error:
Rejecting the null hypothesis when it is actually true is called a Type-I error, e.g. an innocent boy is punished by the police. Its probability is denoted by α.
Type-II error:
Accepting the null hypothesis when it is false is called a Type-II error, e.g. a weak student may be passed by the examiner. Its probability is denoted by β.
Level of significance:
The probability of making a Type-I error is called the level of significance. It is denoted by α, i.e.
α = P(reject H0 | H0 is true)
Test statistic:
It is a function (or formula or rule) for testing the null hypothesis; it is the ratio of the sampling error to the standard error.
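As a sketch with made-up figures, the z test statistic for H0: μ = 16 is the sampling error (x̄ − μ0) divided by the standard error (σ/√n):

```python
import math

def z_statistic(xbar, mu0, sigma, n):
    # sampling error (xbar - mu0) divided by standard error (sigma / sqrt(n))
    return (xbar - mu0) / (sigma / math.sqrt(n))

# Made-up figures: testing H0: mu = 16 with xbar = 16.5, sigma = 2, n = 64
z = z_statistic(16.5, 16.0, 2.0, 64)  # 0.5 / 0.25 = 2.0
```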
One-tailed test (or one-sided test):
If the critical region is located at one end (tail), the test is called a one-tailed test, e.g. H1: μ < 16 or H1: μ > 16.
Two-tailed test (or two-sided test):
If the critical region is located at both ends (tails), the test is called a two-tailed test, e.g.
H1: μ ≠ 16
Critical region:
It is the rejection region: the set of values of the test statistic which leads to rejection of the null hypothesis, e.g. when testing
H0: μ = 16 years against
H1: μ ≠ 16 years.
Degrees of freedom:
The total number of independent sample observations minus the number of unknown population parameters being estimated from the sample.
Power of a test:
Power = P(rejecting H0 | H0 is false) = 1 − β.
Regression:
The dependence of one variable upon another variable is called regression; e.g. the yield of a crop on the basis of the amount of fertilizer used, where the yield of the crop is the dependent variable and the amount of fertilizer is the independent variable; or the height of children on the basis of their age, etc.
Regression relation:
With a regression relation we are able to predict the value of one variable on the basis of another variable. Two variables are said to have a linear relationship when a unit change in the independent variable leads to a constant change in the dependent variable.
+ve Correlation:
If the movements of the two variables are in the same direction (both upward or both downward), the correlation is positive, e.g. hot weather and the demand for ice-cream, etc.
-ve Correlation:
If the movements of the two variables are in opposite directions (one upward and the other downward), the correlation is negative, e.g. cold weather and the demand for ice-cream, etc.
No Correlation:
If a change in one variable has no effect on the other variable, there is no correlation, e.g. head size and I.Q. of persons.
Correlation Co-efficient:
A numerical measure of correlation is called the correlation co-efficient; it measures the degree (strength / closeness) of the relationship between two variables. It is denoted by r.
Properties of correlation co-efficient r
1). It is a pure number, which is free from the unit of measurement.
2). It is symmetrical with respect to x and y, i.e. rxy = ryx.
3). It always lies between −1 and +1.
4). It is the G.M. of the two regression co-efficients, i.e. r = ±√(bxy · byx), where
r = +ve if both bxy and byx are +ve,
r = −ve if both bxy and byx are −ve.
5). If r = 0, the two variables are uncorrelated (there is no linear relationship between them).
6). It is independent of origin and scale, i.e. ruv = rxy.
7). It remains unchanged if a constant is added to, subtracted from, or multiplied into the variables.
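Properties 2), 3) and 6) can be checked numerically with a small hand-rolled Pearson r (the data below are invented for the example):

```python
import math

def corr(x, y):
    # Pearson correlation coefficient r of two equal-length lists
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Illustrative data (assumptions for the sketch)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

r_xy = corr(x, y)
r_yx = corr(y, x)            # property 2): symmetry, r_xy = r_yx
u = [10 * a + 3 for a in x]  # change of origin (+3) and scale (x10)
r_uv = corr(u, y)            # property 6): unchanged by origin and scale
```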
Variable:
Those characteristics which can be measured are called variables, e.g. heights, weights, etc.
Attribute:
Those characteristics which cannot be measured, but whose presence or absence can only be described, are called attributes, e.g. beauty, eye colour, etc.
Correlation:
To measure the degree (strength / closeness) of the relationship between two variables, e.g. heights and weights, or ages of husbands and their wives, etc. [The variables must be quantitative.]
Association:
To measure the degree (strength / closeness) of the relationship between two qualitative variables (attributes) is called association.
-ve Attribute:
The absence of an attribute, denoted by α, β, γ, ...
+ve Attribute:
The presence of an attribute, denoted by A, B, C, ...
Contingency Table:
It is a tabular arrangement for classifying two attributes: if attribute A has r rows, A1, A2, A3, ..., Ar, and attribute B has c columns, B1, B2, B3, ..., Bc, then the arrangement is an r × c contingency table.
Dichotomy:
If we divide the data (or population) into two different classes of a single attribute, it is called a dichotomy, e.g. dividing by marital status into single or married.
Trichotomy:
If we divide the data (or population) into three classes of an attribute, it is called a trichotomy.
Manifold:
If we divide the data (or population) into more than three classes (or many classes), it is called manifold classification, e.g. division by religion into Muslim, Hindu, etc.
Yates' correction for continuity:
In applying the χ²-approximation, we have to combine the smaller frequencies (less than 5) with larger ones. But in the case of only two classes (a 2 × 2 table), we cannot combine the smaller frequency with the larger one. For such situations Yates made an adjustment, the correction for continuity, which is used when there is only one degree of freedom, because the χ²-distribution is a continuous distribution while the data in a contingency table are discrete. Yates' correction is also applied in the normal approximation to the binomial distribution.
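A sketch of the corrected χ² for an illustrative 2 × 2 table (the observed frequencies are invented for the example): with Yates' correction, each cell contributes (|O − E| − 0.5)² / E.

```python
# Observed frequencies for an illustrative 2 x 2 table (invented figures)
table = [[10, 20],
         [30, 40]]

row = [sum(r) for r in table]        # row totals
col = [sum(c) for c in zip(*table)]  # column totals
total = sum(row)

# Corrected chi-square: sum of (|O - E| - 0.5)^2 / E over the four cells
chi_sq = 0.0
for i in range(2):
    for j in range(2):
        expected = row[i] * col[j] / total
        chi_sq += (abs(table[i][j] - expected) - 0.5) ** 2 / expected
```

The result is compared with the χ² table for 1 degree of freedom.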
Additive model:
Y = T + S + C + I, where T stands for the secular trend, S for the seasonal variations, C for the cyclical variations (fluctuations) and I for the irregular variations (movements).
Multiplicative model:
Y = T × S × C × I.
1. Secular trend:
It is a long-term variation which indicates a regular pattern of change in the same direction (either upward or downward), e.g. the decline in the death rate due to advancement in science, or the continually increasing need for food due to the increasing population, etc.
2. Cyclical movements:
These movements are long-term waves (oscillations or swings) about the trend line (or curve); since the movements take upward and downward swings, they are also called "cycles". A business cycle has four phases: (i). Prosperity or boom (ii). Decline or recession
(iii). Depression or trough (iv). Improvement or recovery.
3. Seasonal movements:
The main cause of these variations is the seasons; they are short-term waves (or variations). A cycle may be completed within a day, a week, a month or a quarter, but most are completed within one year (the seasonal effect is within a period of one year), e.g. increased sales of cotton clothes in summer, the demand for ready-made garments before Eid (or in Ramzan), and Polka ice-cream sales in summer.
4. Irregular variations:
These variations occur in an irregular, unpredictable manner, e.g. due to floods, strikes, wars, etc.
Semi-average method:
In this method the data are divided into two equal parts (if the number of values is odd, the middle value is ignored); then the average of each part is placed against the mid-point of that part (the centre of each part), and a trend line is drawn through the two averages.
Moving average method:
The method of semi-averages is appropriate when the trend is linear. Another simple method which can be used to eliminate seasonal, cyclical and irregular movements is the method of moving averages. In this method we find averages successively by taking a fixed number of values at a time; for example, if we want to find a 3-year moving average, we find the average of the first three values, then drop the first value and include the 4th value, and the process is continued until all the values are exhausted. Each average is placed in the middle of its group. When the number of values in each group is even, the middle of the group lies between two years; to make the average correspond to a particular year, we centre the averages by calculating a further two-year moving average. The averages so obtained are called centred moving averages.
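The procedure can be sketched as follows (the yearly series is invented for the example): a 3-year moving average, and a 4-year moving average centred by taking a further 2-period average.

```python
def moving_average(values, k):
    # k-period moving averages, each placed at the centre of its group
    return [sum(values[i:i + k]) / k for i in range(len(values) - k + 1)]

# Illustrative yearly series (assumed figures)
series = [12.0, 15.0, 18.0, 21.0, 24.0, 27.0]

ma3 = moving_average(series, 3)   # 3-year moving averages

# With an even period the averages fall between two years, so a further
# 2-period moving average centres them on particular years.
ma4 = moving_average(series, 4)
centred = moving_average(ma4, 2)  # centred 4-year moving averages
```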
Least squares method: