
Normal Distribution:

As n becomes large, the binomial distribution tends to normality; therefore the normal
distribution is a limiting case of the binomial distribution.

The mean (μ) and variance (σ²) are called the parameters of the normal distribution; if X is
such a random variable, we write X ~ N(μ, σ²).
Importance of normal distribution
1. The normal distribution is the most important distribution in statistics; it is used to solve
many problems in both probability and statistical inference.
2. The normal distribution plays an important role in both theoretical and applied statistics.
3. Most of the methods of theoretical statistics have been developed on the basis of the
normal distribution.
4. In applied statistics this distribution has many uses, because many distributions in the
biological sciences (e.g. distributions of height, weight etc.) are approximately normal.
Therefore the normal distribution plays a central role in statistics.
It also provides a good continuous approximation to various discrete probability distributions.
Properties of normal distribution
1. It is a continuous probability distribution ranging from -∞ to +∞.
2. It is a bell-shaped distribution.
3. It is a symmetrical distribution about its mean; therefore mean = median = mode.
4. The total area under the normal curve is unity (one).
5. The normal distribution has its maximum ordinate (height of the curve) at x = μ. Since the
exponent -(x - μ)²/(2σ²) equals zero when x - μ = 0, the value of the maximum ordinate is
f(μ) = 1/(σ√(2π)).
6. All odd-order moments about the mean are zero, i.e. μ1 = μ3 = μ5 = 0.
7. The normal distribution has two points of inflection, at x = μ - σ and x = μ + σ.
8. In the normal distribution β1 = 0 and β2 = 3 (equivalently γ1 = γ2 = 0).
9. In the normal distribution, mean deviation ≈ (4/5)σ and quartile deviation ≈ (2/3)σ.

Reproduction Property:
If X1 and X2 are two independent normal variables, i.e.
X1 ~ N(μ1, σ1²) and X2 ~ N(μ2, σ2²), then their sum is also a normal variable:
X1 + X2 ~ N(μ1 + μ2, σ1² + σ2²).
[Stated differently: the sum of independent normal variables is a normal variable.]
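The reproduction property can be checked by simulation; a minimal sketch in Python, with
parameter values chosen purely for illustration:

```python
import random
import statistics

# Reproduction property: if X1 ~ N(mu1, sigma1^2) and X2 ~ N(mu2, sigma2^2)
# are independent, then X1 + X2 ~ N(mu1 + mu2, sigma1^2 + sigma2^2).
random.seed(0)
mu1, sd1 = 10.0, 2.0   # illustrative values, not from the text
mu2, sd2 = 5.0, 1.0

sums = [random.gauss(mu1, sd1) + random.gauss(mu2, sd2) for _ in range(100_000)]

# The sample mean should be close to mu1 + mu2 = 15 and the sample
# variance close to sd1^2 + sd2^2 = 5.
print(round(statistics.mean(sums), 1))
print(round(statistics.variance(sums), 1))
```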
Standard / Standardized normal variable (S.N.V.):
A normal variable X with mean μ and standard deviation σ can easily be transformed into the
standard normal variable Z with mean 0 and variance 1, i.e. Z = (X - μ)/σ.
If X ~ N(μ, σ²), then Z = (X - μ)/σ and Z ~ N(0, 1).
That is, X is expressed in terms of its deviation from its mean, divided by its standard
deviation.
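The standardization Z = (X - μ)/σ is a one-line computation; a minimal sketch:

```python
# Standardizing a normal variable: Z = (X - mu) / sigma transforms
# X ~ N(mu, sigma^2) into Z ~ N(0, 1).
def standardize(x, mu, sigma):
    """Return the z-score of x for a normal variable with mean mu and s.d. sigma."""
    return (x - mu) / sigma

# Example (illustrative values): X ~ N(50, 5^2); the observation x = 60
# lies 2 standard deviations above the mean.
print(standardize(60, 50, 5))  # 2.0
```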

Sample:
A sample is a part of a population, e.g. a food inspector checks a sample of milk, wheat or
other produce to see whether or not it is pure.
Sampling:
The process of selecting a sample is called sampling.
Population:
A population consists of elements; it is the collection of all possible observations.
Finite Population:
A population consisting of a limited number of elements, e.g. all employees of LMS, Lahore
(or any firm), or the number of boy students in Lahore city.
Infinite Population:
A population consisting of an unlimited number of elements, e.g. the stars in the sky, or all
the fish in the sea.
Advantages of sampling
1. A sample survey saves money: it is cheaper to get information from a small sample than
from the whole population.
2. A sample survey saves a great deal of time and energy compared with a complete census.
3. A sample survey can provide information as accurate as that obtained by a complete
census of the population.
4. A sample survey provides a valid measure of reliability for the sample estimates.
5. In the case of an infinite (or inaccessible) population, sampling is the only method of
obtaining information.
Purpose of sampling
(i) To find the reliability of the estimates derived from the sample.
(ii) To get maximum information about the characteristics of the population with minimum
cost, time and effort.
Sampling with replacement:
If the selected sampling unit is returned to the population before the next unit is drawn.
Sampling without replacement:
If the selected sampling unit is not returned to the population before the next unit is drawn.

Parameter:
A numerical quantity computed from a population, e.g. μ, σ etc.
Statistic:
A numerical quantity computed from a sample, e.g. x̄, s etc.
Sampling unit:
A single element of the sample; if we draw a sample of students, then one student is a
sampling unit.
Sampling units:
Before the sample is selected, the population must be divided into parts; these parts are
called sampling units.
Sampling frame:
A list of all the sampling units.
Census and survey:
A census is taken after every ten years; a survey may be taken at any time, whenever
necessary.
Sample survey:
A survey based on the sampling method is called a sample survey. [Used to collect data,
e.g. by the Federal Bureau of Statistics.]
Random sample:
Friends often draw lots to decide who will pay for drinks or tea: they write their names on
slips of paper, fold and mix them, and draw one slip; the name drawn is a random sample.
A good sample (free from bias) is also known as a random sample.
Simple random sampling:
It is a procedure of selecting a sample from a population in such a way that:
(i) each possible sample of size n (of the same size) has an equal chance (probability) of
being selected;
(ii) each unit in the population has an equal chance (probability) of being drawn.
Simple random sampling is suitable when the population is relatively small and
homogeneous. Two methods are used to select a random sample:

Lottery method

Random number table.
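Both the lottery method and a random number table can be imitated with a pseudo-random
generator; a minimal sketch using Python's standard library (the population values are
illustrative):

```python
import random

# Simple random sampling without replacement: every possible sample of
# size n has the same chance of selection (random.sample mimics the
# lottery method: each unit can be drawn at most once).
population = list(range(1, 101))   # 100 numbered sampling units (illustrative)
random.seed(1)

sample = random.sample(population, 10)   # draw n = 10 without replacement
print(sample)
print(len(set(sample)) == len(sample))   # no unit is drawn twice
```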

Bias:
It is the difference between the expected value of a sample statistic and the true value of the
population parameter (bias refers to the sampling procedure, not to one particular sample),
i.e.
Bias = E(x̄) - μ
The bias is positive if E(x̄) > μ, negative if E(x̄) < μ, and zero if E(x̄) = μ. The bias increases
with an increase in sample size. It is to be noted that bias is different from sampling error.
Causes of Bias:
(i) Faults in the process of selecting the sample
(ii) Wrong collection of information
(iii) Faults in the method of analysis

Biased sample:
A sample in which some units have a greater chance of being selected than others, e.g. if
we ask one boy to call four or five boys from his class, it is possible they will all be his
friends.
Un-biased sample:
A sample free from bias is called a good or un-biased sample.
Sampling error:
It refers to a particular sample, i.e. the sampling error is the difference between the value of
a sample statistic (for a particular sample) and the corresponding population parameter, i.e.
E = x̄ - μ
It is negative when the parameter is under-estimated and positive when the parameter is
over-estimated. The error decreases (is reduced) as the sample size increases.
Non-sampling error:
These errors occur due to mistakes in processing the data, clerical errors, unclear
questions, or errors in the selection of samples (or data); such errors are called
non-sampling errors. They are present in a complete census as well as in a sample survey.
Sampling Design:
It is a layout, procedure, method or plan which is used to select the sample from the
population. A sampling design is specified before any data are collected.
Probability sampling:
When each unit in a population has a known (equal or unequal) probability (or chance) of
being selected in the sample, this is called probability sampling. Simple random sampling,
stratified sampling and cluster sampling are examples of probability sampling.
Stratified random sampling:
In this method, we divide a population that is not homogeneous (according to some
characteristic) into groups or classes called strata; the items within each stratum (or group)
are homogeneous.
Cluster sampling:
The population is divided into N smaller groups (of equal or unequal size) called clusters,
e.g. the blocks of a city; each cluster is a collection of sampling units.

Non-probability sampling (or non-random sampling):
When the selection of units (or elements) from a population is not based on known
probabilities (no unit has a known probability of being selected in the sample); instead,
personal judgement plays an important role in the selection of the sample. Judgement
(purposive) sampling and quota sampling are examples of non-probability sampling.
Standard error:
The standard deviation of the sampling distribution of a sample statistic (x̄, s etc.) is called
its standard error.
Uses of standard error
1) It plays an important role in statistical inference and testing of hypotheses.
2) The standard error measures the variability among the sample means, whereas the
population standard deviation measures the variability among the population values.
3) A larger value of the standard error shows greater variability, while a smaller value shows
that most of the values are close to the expected value.
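The standard error of the sample mean, s/√n, can be computed directly; a minimal sketch
with illustrative data:

```python
import math
import statistics

# Standard error of the sample mean: s / sqrt(n), where s is the sample
# standard deviation and n the sample size. A smaller standard error
# means the sample means cluster more tightly around the expected value.
def standard_error(sample):
    return statistics.stdev(sample) / math.sqrt(len(sample))

data = [12, 15, 11, 14, 13, 16, 12, 15]   # illustrative observations
print(round(standard_error(data), 3))
```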


Systematic Sampling:
In this method, samples are selected (or drawn) in some systematic way (at equal intervals).
Sometimes the elements of the population are arranged in a specified order, e.g. a list made
in order of magnitude. Under this method every k-th element (or unit) in the population is
selected for the sample, where k (the sampling interval) is given by
k = population size / sample size
The sample thus selected is called a systematic sample.
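A sketch of systematic selection with k = N/n (the unit numbers are illustrative):

```python
import random

# Systematic sampling: pick every k-th unit, where k = population size /
# sample size, starting from a random unit among the first k.
def systematic_sample(population, n):
    k = len(population) // n          # sampling interval
    start = random.randrange(k)       # random start within the first interval
    return population[start::k][:n]

random.seed(2)
units = list(range(1, 41))            # N = 40 illustrative units
picked = systematic_sample(units, 8)  # k = 40 / 8 = 5
print(picked)                         # every 5th unit from the random start
```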

Statistical Inference:
The process of drawing inferences (or conclusions) about the value of a population
parameter on the basis of sample information (or sample observations). There are two types
of statistical inference:
(1). Estimation (2). Testing of hypotheses.
Estimation:
Estimation is a procedure by which we obtain (or estimate) the value of an unknown
population parameter using sample observations.
There are two types of estimation:
(1). Point estimation (2). Interval estimation.
1. Point Estimation:
The process of finding a single value from the sample which will represent the value of an
unknown population parameter; e.g. the population mean, population variance and
population proportion are estimated from the corresponding sample statistics (sample
mean, sample variance, sample proportion etc.).
2. Interval Estimation:
It is a procedure (or process) in which we estimate (or determine) a range (or interval) of
values within which the unknown population parameter is believed to lie (or expected to lie,
or likely to fall).
Estimate:
The specific value of an (unknown) population parameter which we obtain using sample
observations.
Estimator:
The formula or rule with the help of which the value of a population parameter is obtained.
(i) Point estimate:
A single numerical value calculated from the sample.
(ii) Interval estimate:
The interval (or range, or limits) in which the true value of the population parameter is
expected to lie.
(i) Point estimator:
The rule (or formula) which is used to estimate a population parameter; it is also simply
called an estimator.
(ii) Interval estimator:
A rule (or formula) for determining a range (or interval) of values within which the population
parameter is expected to lie (or likely to fall).

Unbiasedness:
The property that the expected value of a sample statistic is equal to the population
parameter, i.e. E(x̄) = μ.
Unbiased estimator:
An estimator is said to be unbiased if the expected value of the sample statistic is equal to
the value of the population parameter, e.g. E(x̄) = μ.
Biased estimator:
An estimator is said to be biased if the expected value of the sample statistic is not equal to
the population parameter, i.e. E(x̄) ≠ μ.

Confidence interval:
We find the value of a population parameter from its sample statistic; the standard error is
very helpful in finding the maximum and minimum limits within which the population
parameter is expected to lie. Such an interval is called a confidence interval.
Confidence level:
The probability of accepting a true null hypothesis is called the confidence level. It is
denoted by 1 - α, where α is the probability that the interval does not include the true value
of the population parameter; thus 1 - α is the complement of α.
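A 95% confidence interval for the population mean, using the normal approximation
x̄ ± z·s/√n with z = 1.96, can be sketched as follows (the sample values are illustrative):

```python
import math
import statistics

# 95% confidence interval for the population mean (normal approximation):
# x_bar +/- z * s / sqrt(n), with z = 1.96 for confidence level 1 - alpha = 0.95.
def mean_confidence_interval(sample, z=1.96):
    x_bar = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(len(sample))
    return (x_bar - z * se, x_bar + z * se)

ages = [16, 15, 17, 16, 18, 15, 16, 17]   # illustrative sample of student ages
low, high = mean_confidence_interval(ages)
print(round(low, 2), round(high, 2))      # limits enclosing the sample mean 16.25
```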
Testing of hypothesis:
The procedure which enables us to decide whether to accept or reject a hypothesis is called
a test of hypothesis.
Null hypothesis:
The hypothesis which is tested for possible rejection under the assumption that it is true. It
is denoted by H0; suppose the average age of ICS students is 16 years, so that H0: μ = 16.
Alternative hypothesis:
Any hypothesis different from the null hypothesis (H0), accepted when the null hypothesis is
rejected. If H0: μ = 16, then H1: μ ≠ 16, H1: μ < 16 or H1: μ > 16.
Simple hypothesis:
A hypothesis in which all the parameters of the distribution are specified is called a simple
hypothesis, e.g. if the average age of ICS students is 16 years, then H0: μ = 16 years is a
simple hypothesis.

Composite hypothesis:
A hypothesis in which all the parameters of the distribution are not specified is called a
composite hypothesis; H1: μ > 16 years or H1: μ < 16 years are composite hypotheses.
Type-I error:
Rejecting the null hypothesis when it is actually true is called a Type-I error, e.g. an innocent
boy punished by the police. Its probability is denoted by α.
Type-II error:
Accepting the null hypothesis when it is false is called a Type-II error, e.g. a weak student
passed by the examiner. Its probability is denoted by β.
Level of significance:
The probability of making a Type-I error is called the level of significance. It is denoted by α,
i.e.
α = P(reject H0 | H0 is true)
Test statistic:
It is a function (or formula, or rule) used for testing the null hypothesis; it is the ratio of the
sampling error to the standard error.
One-tailed test (or one-sided test):
If the critical region is located at one end, it is called a one-tailed test, e.g. H1: μ < 16 or
H1: μ > 16.
Two-tailed test (or two-sided test):
If the critical region is located at both ends, it is called a two-tailed test, e.g.
H1: μ ≠ 16.
Critical region:
It is the rejection region, which leads to rejection of the null hypothesis, e.g. for
H0: μ = 16 years against H1: μ ≠ 16 years.
Degrees of freedom:
The total number of independent sample observations minus the number of unknown
population parameters being estimated from the sample.
Power of a test:
P(rejecting H0 | H0 is false); it equals 1 - β.
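The definitions above can be tied together in a small one-sample z-test sketch: the test
statistic is the sampling error divided by the standard error, and H0 is rejected at the 5%
level of significance when |z| > 1.96 (two-tailed). The sample values are illustrative:

```python
import math
import statistics

# One-sample z-test sketch for H0: mu = 16 against H1: mu != 16 (two-tailed).
# Test statistic = sampling error / standard error = (x_bar - mu0) / (s / sqrt(n)).
def z_statistic(sample, mu0):
    se = statistics.stdev(sample) / math.sqrt(len(sample))
    return (statistics.mean(sample) - mu0) / se

ages = [17, 18, 16, 19, 17, 18, 17, 18]   # illustrative sample
z = z_statistic(ages, 16)

# Reject H0 at the 5% level of significance if |z| > 1.96 (two-tailed).
print(abs(z) > 1.96)
```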

Regression:
The dependence of one variable upon another variable is called regression, e.g. the yield of
a crop on the basis of the amount of fertilizer used, where the yield of the crop is the
dependent variable and the amount of fertilizer is the independent variable; or the height of
children on the basis of their age, etc.
Regression relation:
It enables us to predict the value of one variable on the basis of another variable. Two
variables are said to have a linear relationship when a unit change in the independent
variable leads to a constant (absolute) change in the dependent variable.


Regressand:
The dependent variable is called the regressand, predictand, or explained variable. It is
usually denoted by Y.
Regressor:
The independent variable is called the regressor, predictor, explanatory or controlled
variable. It is usually denoted by X.
Simple linear regression:
Suppose the relationship between X and Y is exactly a straight line; then it can be
expressed as
Y = α + βX
where α and β are the parameters of the regression model: α is the y-intercept (the value of
Y at X = 0) and β is the slope of the line.

*A regression model with an error term (e) can be written as Y = α + βX + e.


Scatter Diagram:
To see the relationship between two (paired) variables by graphical representation, we plot
the paired observations
(X1, Y1), (X2, Y2), ..., (Xn, Yn) on a graph; the resulting set of points is called a scatter
diagram. We plot the independent variable on the x-axis and the dependent variable on the
y-axis.
Properties of Regression Lines
1. The regression line always passes through the point (x̄, ȳ); that is, the equation is
satisfied when we substitute x̄ for X and ȳ for Y.
2. The sum of the residuals is zero, i.e. Σ(y - ŷ) = 0, since e = y - ŷ gives Σe = 0.
3. The sum of squares of the residuals is a minimum, i.e. Σ(y - ŷ)² is minimum.
4. Σy = Σŷ.
5. a and b are unbiased estimates of α and β.
Curve fitting:
It is the method of estimating the unknown parameters involved in an equation (or curve).
Method of least squares:
The method (principle) of least squares estimates the parameters by minimizing the sum of
squares of the residuals, where the residual e = y - ŷ.
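The least-squares estimates b = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)² and a = ȳ - b·x̄ can be computed
directly; a minimal sketch with illustrative data chosen to lie exactly on a line:

```python
# Least-squares fit of the simple linear regression y = a + b*x:
# b = sum((x - x_bar)(y - y_bar)) / sum((x - x_bar)^2),  a = y_bar - b * x_bar.
def least_squares(xs, ys):
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    a = y_bar - b * x_bar
    return a, b

# Illustrative data: fertilizer used (x) and crop yield (y), lying exactly
# on the line y = 1 + 2x.
x = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 11]
a, b = least_squares(x, y)
print(a, b)                   # 1.0 2.0

# The fitted line passes through (x_bar, y_bar), so the residuals sum to zero.
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
print(sum(residuals))         # 0.0
```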
Correlation:
Correlation measures the degree (strength / closeness) of the relationship between two
variables, e.g. heights and weights, ages of husbands and their wives, etc. [The variables
must be quantitative.]
+ve Correlation:
If the movements of the two variables are in the same direction (both upward or both
downward), the correlation is positive, e.g. hot weather and the demand for ice-cream.
-ve Correlation:
If the movements of the two variables are in opposite directions (one upward and the other
downward), the correlation is negative, e.g. cold weather and the demand for ice-cream.
No Correlation:
If a change in one variable has no effect on the other variable, e.g. head size and I.Q. of
persons.
Correlation Co-efficient:
The numerical measure of correlation is called the correlation co-efficient; it measures the
degree (strength / closeness) of the relationship between two variables and is denoted by r.
Properties of correlation co-efficient r
1). It is a pure number, free from the unit of measurement.
2). It is symmetrical with respect to x and y, i.e. r_xy = r_yx.
3). It always lies between -1 and +1.
4). It is the geometric mean of the two regression co-efficients, i.e. r = ±√(b_xy · b_yx),
where r is +ve if both b_xy and b_yx are +ve, and r is -ve if both b_xy and b_yx are -ve.
5). If r = 0, the two variables are uncorrelated (linearly unrelated).
6). It is independent of origin and scale, i.e. r_uv = r_xy.
7). It remains unchanged if a constant is added to, subtracted from, or multiplied by the
variables.
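The correlation co-efficient r = S_xy / √(S_xx · S_yy) can be computed as follows (the paired
data are illustrative):

```python
import math

# Pearson correlation coefficient:
# r = S_xy / sqrt(S_xx * S_yy), which always lies between -1 and +1.
def correlation(xs, ys):
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

heights = [60, 62, 65, 67, 70]           # illustrative paired observations
weights = [110, 120, 130, 140, 155]
r = correlation(heights, weights)
print(round(r, 3))                       # strong positive correlation

# r is symmetric with respect to the two variables: r_xy = r_yx.
print(correlation(weights, heights))
```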

Variable:
Those characteristics which can be measured, e.g. heights, weights etc.
Attribute:
Those characteristics which cannot be measured, e.g. beauty, eye colour etc.; only their
presence or absence can be described.
Correlation:
It measures the degree (strength / closeness) of the relationship between two quantitative
variables, e.g. heights and weights, ages of husbands and their wives, etc.
Association:
Measuring the degree (strength / closeness) of the relationship between two qualitative
variables (attributes) is called association.
-ve Attribute:
The absence of an attribute, denoted by α, β, γ, ...
+ve Attribute:
The presence of an attribute, denoted by A, B, C, ...

Contingency Table:
It is a tabular arrangement for classifying two attributes: if attribute A has r rows, i.e. A1,
A2, A3, ..., Ar, and attribute B has c columns, B1, B2, B3, ..., Bc, then the result is an r × c
contingency table.
Dichotomy:
If the data (or population) are divided into two different classes of a single attribute, e.g.
marital status divided into single or married, it is called a dichotomy.
Or:
If we divide the data (or population) into two different classes of an attribute, it is called a
dichotomy.
Trichotomy:
If we divide the data (or population) into three classes of an attribute, it is called a
trichotomy.
Manifold:
If we divide the data (or population) into more than three classes (many classes), it is called
manifold classification, e.g. division by religion into Muslim, Hindu etc.
Yates' correction for continuity:
In applying the χ² approximation, we have to combine the smaller frequencies (less than 5)
with larger ones. But in the case of only two classes, we cannot combine the smaller
frequency with the larger one. For such situations Yates made an adjustment, the correction
for continuity, for the 2 × 2 table. This correction is used when there is only one degree of
freedom, because the χ² distribution is a continuous distribution while the data in a
contingency table are discrete. Yates' correction is likewise applied in the normal
approximation to the binomial distribution.
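Yates' correction for a 2 × 2 table subtracts 0.5 from each |O - E| before squaring; a minimal
sketch (the table entries are illustrative):

```python
# Chi-square statistic for a 2x2 contingency table with Yates' continuity
# correction: chi2 = sum((|O - E| - 0.5)^2 / E) over the four cells, where
# E = (row total * column total) / grand total.
def chi_square_yates(table):
    row = [sum(r) for r in table]
    col = [sum(c) for c in zip(*table)]
    total = sum(row)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / total
            chi2 += (abs(table[i][j] - expected) - 0.5) ** 2 / expected
    return chi2

table = [[20, 30],   # illustrative 2x2 table of two attributes
         [25, 25]]
print(round(chi_square_yates(table), 3))
```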

Time series [y = f(t) + e]:

An arrangement of data by successive time periods is called a time series. (The time may be
yearly, monthly, weekly, quarterly etc.)
Analysis of time series:
It is the process of measuring and studying the components of a time series. The
mathematical relationship between the different components of a time series is called a
model of the time series:
(a). Additive model (b). Multiplicative model.
Additive model:
Y = T + C + S + I, where T stands for the secular trend, C for cyclical variations
(fluctuations), S for seasonal variations and I for irregular variations (movements).
Multiplicative model:
Y = T × C × S × I.
1. Secular (long-term) trend:
It is a long-term variation which indicates a regular pattern of change in the same direction
(either upward or downward), e.g. the decline in the death rate due to advancement in
science, or the continually increasing need for food due to the growing population.
2. Cyclical variations (or movements):
These movements are long-term waves (oscillations or swings) about the trend line (or
curve); since the movements take upward and downward swings, they are also called
"cycles". A business cycle has four phases: (i). Prosperity or boom (ii). Decline or recession
(iii). Depression or trough (iv). Improvement or recovery.
3. Seasonal movements:
The main cause of these variations is the seasons; they are short-term waves (or variations)
which may complete their cycle within a day, a week, a month, a quarter, or at most one
year (i.e. the seasonal effect on the time series occurs within a period of one year), e.g.
increased sales of cotton garments in summer, the demand for ready-made garments before
Eid (or in Ramzan), or Polka ice-cream sales in summer.
4. Irregular variations:
These movements appear as a result of random events, e.g. wars, floods, earthquakes,
strikes etc. These movements are also called accidental, sudden or random movements.
Signal:
The first three components of a time series show a regular (or systematic) pattern of
variation; this systematic component of variation is denoted by f(t), i.e.
y = f(t).
Noise:
The fourth component of a time series shows the irregular (or random), non-systematic
component of variation; this irregular component is denoted by u, i.e. y = f(t) + u, where u
denotes a random sequence (noise) and f(t) is a systematic sequence (signal).
Measurement of secular trend
Free-hand curve method:
It is a graphic method for measuring the secular trend, taking time on the x-axis and the
observations on the y-axis. First of all we draw a historigram for the given time series and
then draw a smooth curve through the plotted points. If the curve shows the pattern of a
straight line, we have a linear trend; if it shows the pattern of a curve, the trend is
non-linear; if the curve has only one bend, the trend is quadratic.
Semi-average method:
It is another simple method for measuring the secular trend. In this method, to obtain the
trend line (or trend values), we divide the data into two equal parts (if the number of values
is odd, the middle value is ignored); then the average of each part is placed against the
mid-point of that part (the centre of each part).
Moving average method:
The method of semi-averages is appropriate when the trend is linear; another simple
method, which can also be used to eliminate the seasonal, cyclical and irregular
movements, is the method of moving averages. In this method we find averages
successively by taking a fixed number of values at a time; for example, to find a 3-year
moving average, we find the average of the first three values, then drop the first value and
include the 4th, and continue the process until all the values are exhausted. Each average
is placed at the middle of its group. When the number of values in each group is even, the
middle of the group lies between two years; to make the average correspond to a particular
year, we centre the averages by calculating a further two-year moving average. The averages
so obtained are called (centred) moving averages.
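A 3-period moving average can be sketched in a few lines (the yearly values are illustrative):

```python
# Three-period moving average: average each run of 3 consecutive values
# and place the result against the middle period of that run.
def moving_average(values, period=3):
    return [sum(values[i:i + period]) / period
            for i in range(len(values) - period + 1)]

sales = [10, 12, 11, 13, 15, 14, 16]   # illustrative yearly values
print(moving_average(sales))           # [11.0, 12.0, 13.0, 14.0, 15.0]
```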
Least squares method:
