A Short Course On PROBABILITY and SAMPLING

A short Course on Probability Theory and Sampling, originally prepared as lecture notes for M.Sc.
(Geography) students of Vidyasagar Univ, WB, India. Compiled by Dr. A. Kar Gupta, kg.abhi@gmail.com, Physics Deptt, Panskura B. College, WB, India
PROBABILITY and SAMPLING

Concept of Probability, the Probability Rules, Probability Distributions and Applications
For randomly occurring events, we would like to know how many times we get a desired result out of all trials, that is, the fraction of favourable events or trials. Suppose we flip a coin a number of times. We know there is a 50-50 chance of getting a Head or a Tail. We may count how many times there is a Head or a Tail out of all the flips. Let $n$ = No. of favourable events and $N$ = Total no. of events.
$n/N$ = fraction of favourable events. We can also say this is the relative frequency in the usual language of Statistics. Now, if we do the trials a large number of times, this fraction tends to some fixed value specific to the event. The limiting value of the fraction is what we call probability.
Note: When we are drawing samples out of a total population, the total no. of trials is the size of the sample. As the no. of trials is increased, the sample becomes bigger.
Definition of Probability: Probability is the ratio of the number of favourable events to the total number of events, provided the total number of events is very large (actually infinity): $p = n/N$, when $N \to \infty$ (infinity). So by definition, $p$ is a fraction between 0 and 1: $0 \le p \le 1$.
$p = 0$: No favourable outcome. $p = 1$: All the outcomes are in favour.
We can also think in the following way: $p$ = probability of occurring an event, $q$ = probability of not occurring the event. Since either the event will occur or not occur, we must write: $p + q = 1$.
Therefore, we have $q = 1 - p$.
Example #1: In a coin toss, we know from our experience that $p(H) = 1/2$. So, $p(T) = 1 - p(H) = 1/2$.
Example #2: In a throw of a dice, we know that the probabilities of the dice facing 1 up, 2 up, 3 up etc. are $p(1) = 1/6$, $p(2) = 1/6$, $p(3) = 1/6$ and so on. Here, the probability of not getting 1 is $1 - p(1) = 5/6$.
Note: The condition that the total probability of all the events has to be 1 is called normalization of probabilities: $\sum_i p_i = 1$.
Rules of Probability:
When more than one event takes place, we need to calculate the joint probability for all the events.
Mutually Exclusive Events
Two events are mutually exclusive (or disjoint) when they cannot occur at the same time. Suppose the two events are A and B and the individual probabilities for them are designated as $P(A)$ and $P(B)$. Mutually exclusive means $P(A \text{ and } B) = 0$.
Addition Rule: $P(A \text{ or } B) = P(A) + P(B)$
Example #1: The probability of getting either Head or Tail in a coin toss, $P(H \text{ or } T) = 1/2 + 1/2 = 1$.
Example #2: The probability of getting either 1 or 6 in a dice throw, $P(1 \text{ or } 6) = 1/6 + 1/6 = 1/3$.
Independent Events
When the occurrence of one event does not influence the other but they can occur at the same time, they are called independent. For example, rainfall today and Manchester United winning a match.
Multiplication Rule: $P(A \text{ and } B) = P(A) \times P(B)$
Example #1: What is the probability that two Heads will occur when we toss two coins together? $P(H) = 1/2$ for the first coin and $P(H) = 1/2$ for the second coin. $P(HH) = 1/2 \times 1/2 = 1/4$. Note that if we flip a single coin two times and ask for the probability of getting Heads twice, we get the same answer.
Example #2: Now we ask the question, what is the probability of getting one Head and one Tail in the flipping of two coins together? Consider the probability of obtaining Head in the first coin and Tail in the second coin: $P(HT) = 1/2 \times 1/2 = 1/4$. And the probability of obtaining Tail in the first and Head in the second: $P(TH) = 1/2 \times 1/2 = 1/4$. Now the total probability of the above two events (either of them occurs, mutually exclusively): $P(HT) + P(TH) = 1/4 + 1/4 = 1/2$. Note that in the flipping of two coins together, there are 4 types of events, HH, HT, TH, TT, out of which the relative occurrence of one Head and one Tail is 2/4 = 1/2.
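The two-coin examples can be checked by listing all equally likely outcomes and counting the favourable ones. The following is only a small sketch in Python; the helper name `probability` is made up for this illustration and is not part of the notes.

```python
from fractions import Fraction
from itertools import product

# All equally likely outcomes for two coins: HH, HT, TH, TT.
outcomes = ["".join(pair) for pair in product("HT", repeat=2)]

def probability(event):
    """Fraction of the equally likely outcomes for which event(outcome) is true."""
    favourable = sum(1 for outcome in outcomes if event(outcome))
    return Fraction(favourable, len(outcomes))

# Two Heads: agrees with the multiplication rule P(H) * P(H).
p_two_heads = probability(lambda o: o == "HH")

# One Head and one Tail: agrees with the addition rule P(HT) + P(TH).
p_one_each = probability(lambda o: sorted(o) == ["H", "T"])

print(p_two_heads, p_one_each)
```

Counting outcomes directly works here only because all four outcomes are equally likely; for a biased coin one would have to multiply the individual probabilities instead.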
When Events are NOT Mutually Exclusive: If the events are not mutually exclusive, there is some overlap. Suppose we designate an area A corresponding to the probability of some event A and an area B to the probability of another event B. The overlap between the two areas then represents the joint probability, $P(A \text{ and } B)$. Note that for two mutually exclusive events the overlap would be zero.
Addition Rule in this case: $P(A \text{ or } B) = P(A) + P(B) - P(A \text{ and } B)$
When Events are NOT Independent: Multiplication rule: $P(A \text{ and } B) = P(A)\,P(B|A) = P(B)\,P(A|B)$
$P(B|A)$: The probability of B given A. This is a conditional probability, i.e., the probability of occurring B provided A occurs first. Similarly, $P(A|B)$: The probability of A, given B.
Note here that $P(B|A) = P(B)$ when B does not depend on A, and $P(A|B) = P(A)$ when A does not depend on B. In both cases A and B are independent.
So, we can write the formula for conditional probability: $P(B|A) = \dfrac{P(A \text{ and } B)}{P(A)}$
Let us consider the following table and use the probability rules. In a survey of 100 people, the question was asked whether they are graduates or not.

               Male   Female   Total
Graduate        40      10       50
Nongraduate     20      30       50
Total           60      40      100
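Q.1 What is the probability that a randomly selected person is a male?
Ans. $P(\text{male}) = 60/100 = 0.6$
Q.2 What is the probability that a randomly selected person is a female?
Ans. $P(\text{female}) = 40/100 = 0.4$
Q.3 What is the probability that a randomly selected person is a male who is a graduate?
Ans. $P(\text{male and graduate}) = 40/100 = 0.4$ [Also we can think, $P(\text{male}) \times P(\text{graduate}|\text{male}) = (60/100) \times (40/60) = 0.4$]
Q.4 What is the probability that a randomly selected person is a female who is a non-graduate?
Ans. $P(\text{female and nongraduate}) = 30/100 = 0.3$ [Also, $P(\text{female}) \times P(\text{nongraduate}|\text{female}) = (40/100) \times (30/40) = 0.3$]
Q.5 What is the probability that the randomly selected person is either a male graduate or a female non-graduate?
Ans. These two events are mutually exclusive and by the law of addition, $0.4 + 0.3 = 0.7$.
Q.6 If we now select two persons, what is the probability that one of them is a male graduate and another is a female non-graduate?
Ans. Two independent events are occurring together. So by the law of multiplication of probabilities, $0.4 \times 0.3 = 0.12$ (for the first person being the male graduate and the second the female non-graduate).
Q.7 What is the probability that a randomly selected non-graduate is a female? [Prob. of female among non-graduates]
Ans. This is the no. of females out of total non-graduates, $30/50 = 0.6$.
Q.8 What is the probability that a randomly selected graduate is a male?
Ans. This is the no. of males out of total graduates, $40/50 = 0.8$.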
Note: In Q.7 & 8, each probability is a conditional probability. However, we gave the answers by looking at the table directly. Now we answer them in terms of the law of conditional probability.
Ans. to Q.8: Suppose A = graduate, B = male, and $P(B|A)$ = probability of male given that they are graduates. We use the formula: $P(B|A) = \dfrac{P(A \text{ and } B)}{P(A)}$
Here,
$P(A \text{ and } B)$ = Prob. of male graduates = $40/100 = 0.4$.
$P(A)$ = prob. of graduates = $50/100 = 0.5$.
Therefore, $P(B|A) = 0.4/0.5 = 0.8$.
Exercise: Q.7 can also be answered in terms of conditional probability formula. Do this and check yourself.
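Q.9 What is the probability that the selected person is either male or graduate? Ans. Here the two events can happen together (a person can be both male and graduate), so they are not mutually exclusive. So we use the formula: $P(\text{male or graduate}) = P(\text{male}) + P(\text{graduate}) - P(\text{male and graduate}) = 0.6 + 0.5 - 0.4 = 0.7$.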
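The answers read off the survey table can be reproduced with a few lines of Python. This is only an illustrative sketch: the dictionary `counts` holds the four cell counts of the table, and the helper `prob` is a name invented here.

```python
from fractions import Fraction

# Cell counts of the survey table: (education, sex) -> number of people.
counts = {
    ("graduate", "male"): 40,
    ("graduate", "female"): 10,
    ("nongraduate", "male"): 20,
    ("nongraduate", "female"): 30,
}
total = sum(counts.values())

def prob(education=None, sex=None):
    """Probability that a random person matches the given labels (None = any)."""
    matching = sum(
        c for (e, s), c in counts.items()
        if (education is None or e == education) and (sex is None or s == sex)
    )
    return Fraction(matching, total)

p_male = prob(sex="male")                        # Q.1
p_graduate = prob(education="graduate")
p_male_graduate = prob("graduate", "male")       # Q.3

# Conditional probability: P(male | graduate) = P(male and graduate) / P(graduate)
p_male_given_graduate = p_male_graduate / p_graduate            # Q.8

# Addition rule for events that are not mutually exclusive.
p_male_or_graduate = p_male + p_graduate - p_male_graduate      # Q.9

print(p_male, p_male_given_graduate, p_male_or_graduate)
```

Exact fractions are used instead of decimals so that the results can be compared directly with the ratios taken from the table.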
Probability Distributions
Let us think of the probabilities for a number of events marked 1, 2, 3, ... and so on. For each event $i$ we can have a probability $p_i$ with $0 \le p_i \le 1$, and also for all the events, $\sum_i p_i = 1$ (normalization). So, we have a set of probabilities corresponding to a set of events. This collection of probabilities is a probability distribution for all those discrete events. Imagine that, instead of discrete events, we have $x$ as a variable which can take continuous values. Also, there is the probability $p(x)$ for each value of $x$. Now if we plot $p(x)$ against $x$, we get a continuous curve which is the continuous probability distribution curve (commonly referred to as the probability distribution curve).
Fig. 3.1
Area under the curve (above x-axis) can be obtained by summing up the areas of the approximate rectangular bars (which we may easily find by plotting this on a graph paper). The approximate area of one such bar of width $\Delta x$ and height $p(x)$ is $p(x)\,\Delta x$. So, the approximate total area between the two end points $x = a$ and $x = b$ is $\sum p(x)\,\Delta x$. To calculate it exactly, we need the help of Integral Calculus, which essentially sums up the areas of the rectangles (bars) of infinitesimally small (smaller than the smallest you can think) width. Those not familiar with the Mathematics of Calculus do not have to worry, as the following explanation and symbols can be understood qualitatively, which may serve the purpose for now. The area under the curve (between the two extreme points shown in the above figure) is the following definite integral: Area $= \int_a^b p(x)\,dx = P(a \le x \le b)$.
This area is the total probability for all the values of $x$ between the two limits. That is why $p(x)$ is often referred to as the probability density. So, $p(x)\,dx$ is the actual probability in between $x$ and $x + dx$, where $dx$ is the infinitesimally small (smaller than you can think) range! Note that the area of the bar of height $p(x)$ and width $dx$ at some position $x$ is $p(x)\,dx$. As in the discrete case, the area is the sum of the probabilities of all the mutually exclusive events. [The sum $\sum$ (called sigma) in the discrete case becomes $\int$ (called integral) for the continuous case.]
Also, $\int_{-\infty}^{+\infty} p(x)\,dx = 1$ (Normalization)
Normalization means that the total area under the curve (extended from negative infinity to positive infinity that means over the entire stretch of the curve.) is unity. This is true as in discrete case we know that the sum of all the probabilities for all the events should be 1. For discrete events, we calculated the relative frequency and then the Bar diagram from them. Here for the continuous case, the bars merge together to form a continuous spectrum and that is the probability distribution. The relative frequencies tend to the probabilities for corresponding values of the variable for large number of events. Now given the probability distribution curve, we would like to know about the shape and size of the curve, some specific quantities that are representative of the character of the event.
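From a discrete data set to a continuous Prob. distribution:
For any discrete set of data, we measure the central tendency of the data set. We commonly calculate the mean, the mean of square and the variance.
Mean: $\bar{x} = \dfrac{1}{N}\sum_i f_i x_i = \sum_i \left(\dfrac{f_i}{N}\right) x_i$,
where $f_i$ is the frequency of occurrence for event $i$ and we have the total frequency, $N = \sum_i f_i$. [Note: $f_i/N$ = relative frequency]
Mean of Square: $\overline{x^2} = \sum_i \left(\dfrac{f_i}{N}\right) x_i^2$
Variance: Var($x$) $= \sum_i \left(\dfrac{f_i}{N}\right)(x_i - \bar{x})^2 = \overline{x^2} - (\bar{x})^2$
Standard deviation $\sigma = \sqrt{\text{Var}(x)}$ is the square root of the variance.
Now for a large number of events, each of the ratios $f_i/N$ in the above formulas becomes the corresponding probability $p_i$: $f_i/N \to p_i$ as $N$ tends to very large.
Therefore, we write the above quantities in terms of probabilities:
Mean, $\bar{x} = \sum_i p_i x_i$
Mean of Square, $\overline{x^2} = \sum_i p_i x_i^2$
Variance, $\sigma^2 = \overline{x^2} - (\bar{x})^2$
Standard deviation, $\sigma = \sqrt{\overline{x^2} - (\bar{x})^2}$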
If the probabilities $p_1$, $p_2$, etc. are known for the values $x_1$, $x_2$, and so on, we can say that we have a discrete probability distribution. When the values are so closely spaced that we can have probabilities for all possible continuous values of the variable $x$, we can say that there is a function $p(x)$ of $x$ which is called the continuous probability distribution function.
[Note: However, in a practical calculation, when instead of probabilities we are given the frequencies $f_1$, $f_2$, ... for the quantities $x_1$, $x_2$, ... that appear in a data set, we calculate the mean or average: $\bar{x} = \dfrac{\sum_i f_i x_i}{\sum_i f_i}$.]
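As a quick illustration of the frequency formulas, the sketch below computes the mean, the mean of squares, the variance (as mean of squares minus square of the mean) and the standard deviation from a list of values and their frequencies. The data set is hypothetical and chosen only to keep the arithmetic easy to follow; the function name is likewise invented for this example.

```python
import math

def summary_from_frequencies(values, freqs):
    """Mean, mean of squares, variance and standard deviation
    of a data set given as values x_i with frequencies f_i."""
    n_total = sum(freqs)  # total frequency N
    mean = sum(f * x for x, f in zip(values, freqs)) / n_total
    mean_sq = sum(f * x * x for x, f in zip(values, freqs)) / n_total
    variance = mean_sq - mean ** 2
    return mean, mean_sq, variance, math.sqrt(variance)

# Hypothetical data: the value 1 occurs twice, 2 three times, and so on.
values = [1, 2, 3, 4]
freqs = [2, 3, 4, 1]

mean, mean_sq, variance, sd = summary_from_frequencies(values, freqs)
print(mean, mean_sq, variance, sd)
```

Replacing each ratio of frequency to total frequency by a probability gives the same formulas for a probability distribution.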
Expectation Values:
As the probability distribution (no matter discrete or continuous) for some event or some population is known, we may expect what its mean value would be, either through mathematical calculations or through our experience. [In Statistics, population means the entire set of all possible data. Taking a few data (which we call a sample) from the population, we often try to estimate the mean, which will in general differ from the population mean. But we know that with larger and larger sample size, this mean (which we call the sample mean) should tend towards the population mean. This means we expect the population mean. More on this aspect will be discussed in the chapter on Sampling.]
So, the expectation value $E(x)$ is the mean of $x$. Likewise, we can have the expectation value of any power of $x$.
$E(x) = \int_{-\infty}^{+\infty} x\,p(x)\,dx$ [Continuous case]
$E(x^2) = \int_{-\infty}^{+\infty} x^2\,p(x)\,dx$ [Continuous case]
Combination Rules:
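When we scale a variable, that is, when we multiply a variable by a number or add a number to it, we need to know how the scaled variable behaves. Does it have the same statistical measures? Does it follow the same kind of distribution? We also ask the same questions for two or more variables when they are scaled and added together to form a combined variable.
When the new variable is $ax$:
Mean: $E(ax) = a\,E(x)$  Variance: $\text{Var}(ax) = a^2\,\text{Var}(x)$
When the new variable is $ax + b$:
Mean: $E(ax + b) = a\,E(x) + b$  Variance: $\text{Var}(ax + b) = a^2\,\text{Var}(x)$
If $x$ has a Normal distribution, $ax + b$ is also a Normal distribution.
When the new variable is $ax + by$ (with $x$ and $y$ independent):
Mean: $E(ax + by) = a\,E(x) + b\,E(y)$  Variance: $\text{Var}(ax + by) = a^2\,\text{Var}(x) + b^2\,\text{Var}(y)$
If $x$ and $y$ are separately Normal distributions, $ax + by$ is then also a Normal distribution.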
Following the combination rules in the above box, we can solve the following problem.
Example: The weight of individual people follows Normal distribution, probability distribution of weight of 10 people taking together? Ans. Here, mean , . + = Mean weight of 10 people, = 40 Variance, + = . What will be the
= 500 .
The probability distribution of weight of 10 people taking together,

Normal Distribution:
For many naturally occurring events, and for random errors of measurement in many experiments, the distribution that occurs is (at least approximately) the Normal distribution. The bell shaped symmetric curve is called the Normal curve. If we calculate the height or age distribution or the distribution of IQ level among a population, the probability distribution often turns out to be close to Normal. The name normal is given as it occurs normally. In Mathematics or Physics literature, it is also called the Gaussian distribution after the great mathematician Carl Friedrich Gauss.
Properties of Normal Distribution:
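A Bell shaped Symmetric distribution with the peak at the middle.
The distribution curve is extended from $-\infty$ to $+\infty$ [from minus infinity to plus infinity].
Mean, Mode and Median at the same position (at the peak).
Area under the curve: Total area under the curve = 100%
A = 68%, [Area within one standard deviation ($\sigma$) from the mean ($\mu$) on both sides, that is, between $\mu - \sigma$ and $\mu + \sigma$]
A = 95%, [Area within two standard deviations from the mean, between $\mu - 2\sigma$ and $\mu + 2\sigma$]
A = 99.7%, [Area within three standard deviations from the mean, between $\mu - 3\sigma$ and $\mu + 3\sigma$]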
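The three areas quoted above can be checked numerically. For a Normal distribution, the area within $k$ standard deviations of the mean equals $\mathrm{erf}(k/\sqrt{2})$, where erf is the error function available in Python's standard `math` module. The function name `area_within` below is chosen only for this sketch.

```python
import math

def area_within(k):
    """Area under a Normal curve between (mean - k*sd) and (mean + k*sd)."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(k, round(100 * area_within(k), 1), "%")
```

The result does not depend on the particular mean or standard deviation, which is why the same three percentages apply to every Normal distribution.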
The Normal distribution is the most commonly observed and the most widely used and discussed. There are various other kinds of distributions which can be identified by their shapes and mathematical expressions.
NOTE: If we combine a set of Normal distributions, we get a Normal distribution as a result. Consider some $n$ numbers ($x_1, x_2, \dots, x_n$), each of which is drawn independently from a Normal distribution. Calculate the mean of the numbers: $\bar{x} = (x_1 + x_2 + \dots + x_n)/n$. If we draw $n$ numbers again and again, the mean would be different each time, but the mean would follow a Normal distribution. More interestingly, the individual distributions from which the numbers are drawn do not matter (as long as they have a finite variance): provided the number $n$ is sufficiently large, the distribution of the mean always turns out to be approximately Normal. This is the Central Limit Theorem.
Experiment with rolling dice: So, here we roll dice, calculate probabilities of occurring numbers and try to establish some truth!
Example #1 Throwing of a single dice: The chance of any side turning up is equal, which is 1 out of 6. We take these as the a priori probabilities for each case and find the mean and variance from the following table.

x       p(x)    x p(x)   x^2 p(x)
1       1/6     1/6      1/6
2       1/6     2/6      4/6
3       1/6     3/6      9/6
4       1/6     4/6      16/6
5       1/6     5/6      25/6
6       1/6     6/6      36/6
Total   1       21/6     91/6

From the table, we can calculate the mean, $\bar{x} = \sum x\,p(x) = 21/6 = 3.5$, the variance, $\sigma^2 = \sum x^2 p(x) - (\bar{x})^2 = 91/6 - (3.5)^2 = 35/12 \approx 2.92$, and the standard deviation, $\sigma \approx 1.71$.
If we plot $p(x)$ against $x$, we obtain the probability distribution for this case. This distribution is uninteresting as we can check that the probabilities for all values of $x$ are the same! The curve obtained by joining the points will be a horizontal straight line.
Fig.
Now we do a similar experiment taking two dice together.
Example #2 (Two Dice) We look for the value of $x$, which is the sum of the two numbers on the top faces of the two dice as rolled. Here we shall have $6 \times 6 = 36$ possible combinations of events, and $x$ can have a minimum value, $x = 2$, and a maximum value, $x = 12$.

x       p(x)    x p(x)    x^2 p(x)
2       1/36    2/36      4/36
3       2/36    6/36      18/36
4       3/36    12/36     48/36
5       4/36    20/36     100/36
6       5/36    30/36     180/36
7       6/36    42/36     294/36
8       5/36    40/36     320/36
9       4/36    36/36     324/36
10      3/36    30/36     300/36
11      2/36    22/36     242/36
12      1/36    12/36     144/36
Total   1       252/36    1974/36

Mean, $\bar{x} = 252/36 = 7$, Variance, $\sigma^2 = 1974/36 - 7^2 = 35/6 \approx 5.83$
Now if we plot $p(x)$ against $x$, taking the values from the above table, we get an interesting distribution: it shows a peak at the middle, at $x = 7$ (the mean value), and it is symmetric around the peak!
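The two-dice table can be generated by listing all 36 equally likely combinations and counting how often each sum occurs. The sketch below does this with exact fractions; the variable names are chosen only for illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

faces = range(1, 7)

# Count how many of the 36 combinations give each sum x.
counts = Counter(a + b for a, b in product(faces, repeat=2))
dist = {x: Fraction(c, 36) for x, c in sorted(counts.items())}

mean = sum(x * p for x, p in dist.items())
mean_sq = sum(x * x * p for x, p in dist.items())
variance = mean_sq - mean ** 2

for x, p in dist.items():
    print(x, p)
print("mean =", mean, " variance =", variance)
```

Python reduces the fractions automatically (for example 6/36 is shown as 1/6), so the printed probabilities are the same numbers as in the table, written in lowest terms.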
We can go on doing such experiment taking 3 or more dice together and ask for the sum of values and the corresponding probabilities as above. It can be understood that the smoothness of the distribution would be more and more tending towards a definite shape while retaining the peak at the centre. [In fact, the envelope of the probability values at different (joining the top of the height bars) of the discrete distribution will slowly assume a continuous symmetric curve!] In the limit of large number of events obtained from the large number of dice throwing together, we tend to get a continuous bell shaped symmetric distribution. This is Normal Distribution.
For a large number of independent random observations, the probability distribution for the mean of the observations can be shown to be Normal distribution. This is called Central Limit Theorem.
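How quickly the dice sums approach the bell shape can be examined without actually rolling dice. The sketch below builds the exact distribution of the sum of several fair dice by adding one dice at a time, and compares it with a Normal curve having the same mean ($3.5$ per dice) and variance ($35/12$ per dice). The function names and the choice of "largest difference over all sums" as a measure of closeness are assumptions made for this illustration.

```python
import math

def dice_sum_distribution(n_dice):
    """Exact probabilities for the sum of n_dice fair dice, as {sum: probability}."""
    dist = {0: 1.0}
    for _ in range(n_dice):
        new = {}
        for total, p in dist.items():
            for face in range(1, 7):
                new[total + face] = new.get(total + face, 0.0) + p / 6
        dist = new
    return dist

def normal_density(x, mean, variance):
    """Height of the Normal curve with the given mean and variance at x."""
    return math.exp(-(x - mean) ** 2 / (2 * variance)) / math.sqrt(2 * math.pi * variance)

def max_gap(n_dice):
    """Largest difference between the exact probabilities and the Normal curve."""
    dist = dice_sum_distribution(n_dice)
    mean = 3.5 * n_dice
    variance = n_dice * 35 / 12
    return max(abs(p - normal_density(x, mean, variance)) for x, p in dist.items())

for n in (1, 2, 5, 10):
    print(n, max_gap(n))
```

Because the possible sums are one unit apart, each probability can be compared directly with the height of the Normal curve at that sum. The gap shrinks as more dice are added, which is the behaviour the Central Limit Theorem describes.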
Shape of a Distribution: Symmetry, Skewness, Kurtosis
Skewness:
A Normal distribution is symmetric around its peak. The peak corresponds to the most probable value, that is, the value for which the probability is the maximum. An interesting thing about a symmetric distribution with a single peak is that the mean, median and mode are at the same position. The skewness is any deviation from symmetry or, we can say, lack of symmetry. For a symmetric distribution, skewness is zero.
Coefficient of skewness = (Mean - Mode) / (Standard deviation)
The following mathematical definition is often used to measure the skewness: Skew $= \dfrac{E[(x - \bar{x})^3]}{\sigma^3}$,
where $\sigma$ is the standard deviation of the distribution. So, we see that the skewness is a dimensionless quantity. Skewness can be positive or negative. A distribution with a positive value of skewness is called positively skewed, which means the tail of the distribution is more extended towards the more positive values of $x$. On the other hand, a distribution with a negative value of skewness is called negatively skewed, which means the tail is more extended towards more negative values (or lower values) of $x$. Below are the two figures demonstrating the negative and positive skewness: the distributions are correspondingly called negatively skewed and positively skewed distributions.
Fig. (Negative Skewness: Mean < Mode)
Fig. (Positive Skewness: Mean > Mode)

Kurtosis:
Kurtosis is another kind of measure of the shape of the distribution. It tells us about the peakedness (what the peak looks like) or flatness of the probability distribution. A Normal distribution is considered as a standard (or benchmark) in this regard. So, any change of shape of the peak of a distribution (peakedness or flatness) compared to a Normal distribution is measured. The mathematical expression for kurtosis: Kurt $= \dfrac{E[(x - \bar{x})^4]}{\sigma^4} - 3$
Note that the number 3 is subtracted from the expression so as to make the value of kurtosis for a Normal distribution equal to zero. It can be shown that $E[(x - \bar{x})^4]/\sigma^4 = 3$, so that Kurt = 0, for a Normal distribution. Such a distribution is called mesokurtic. When kurtosis is positive, the peak of the distribution appears sharper relative to a Normal distribution. The distribution is then called leptokurtic. On the other hand, when the kurtosis is negative, the distribution looks flatter compared to a Normal distribution. As the distribution looks almost flat on top, it is called platykurtic.
Fig. Platykurtic distribution (Negative kurtosis)
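To see the two shape measures at work, the sketch below evaluates them for discrete distributions, taking skewness as the mean of $((x - \bar{x})/\sigma)^3$ and kurtosis as the mean of $((x - \bar{x})/\sigma)^4$ minus 3. The single dice comes from the earlier example; the second, lopsided distribution is hypothetical and included only to show a positive skewness. The function name is invented for this illustration.

```python
import math

def shape_measures(dist):
    """Return (skewness, kurtosis) of a discrete distribution given as {x: p}.
    Kurtosis is measured relative to the Normal distribution (3 is subtracted)."""
    mean = sum(x * p for x, p in dist.items())
    variance = sum((x - mean) ** 2 * p for x, p in dist.items())
    sigma = math.sqrt(variance)
    skew = sum(((x - mean) / sigma) ** 3 * p for x, p in dist.items())
    kurt = sum(((x - mean) / sigma) ** 4 * p for x, p in dist.items()) - 3
    return skew, kurt

# A single fair dice: flat and symmetric.
single_die = {x: 1 / 6 for x in range(1, 7)}

# A hypothetical distribution with a longer tail towards larger x.
lopsided = {0: 0.5, 1: 0.3, 2: 0.2}

skew_die, kurt_die = shape_measures(single_die)
skew_lop, kurt_lop = shape_measures(lopsided)
print(skew_die, kurt_die)
print(skew_lop, kurt_lop)
```

The flat dice distribution gives zero skewness and a negative kurtosis (platykurtic), while the lopsided one gives a positive skewness because its tail extends towards larger values.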
If a distribution has more than one peak
The distribution we discussed (and we shall consider throughout) is a unimodal distribution, that is, a distribution which has a single mode or one peak. But in many practical cases, we can have a distribution with many peaks or many modes. For example, a distribution with two peaks (in the fig. below) is called a bimodal distribution.
Z-Distribution
What is a Z-distribution?
A Z-distribution is nothing but a Normal distribution with the peak (mean) at zero and a standard deviation of one. The peak of a Normal distribution is generally at a finite value $\mu$ with a standard deviation $\sigma$ (say). If we consider a new variable $z = (x - \mu)/\sigma$, the given Normal distribution (of variable $x$) becomes another Normal distribution (of variable $z$) with the peak value at $z = 0$, and this is then called the Z-distribution. [The derivation of Z-distribution is given in appendix for those who are interested to know.] For solving problems with Normal distribution, it is often advantageous to obtain a Z-distribution and then to consult a Z-table. In the following, we demonstrate with some examples how that is done.
Consider the following typical situations where we have to calculate the areas from the Z-distribution:
Fig. (Total area under the curve = 1)
Fig. (Area between $z = -\infty$ and $z = 0$ is 0.5 or area between $z = 0$ and $z = +\infty$ is 0.5 because of symmetry)
Fig. (Area between $z = 0$ and any other value of $z$)
Fig. (Area between two positive values of $z$ or between two negative values)
Fig. (Area between a negative value and a positive value)
Fig. (Area less than a negative or greater than a positive value)
Important:
In the z-score table we always look for the area between zero and any other value (as the integral is actually done that way). So, zero is always the reference point. Finally, the area between any two values of $z$ is obtained by adding or subtracting the scores involving zero. This will be clear from the following examples.
Examples: (Some typical problems are discussed, consult the z-score table given in the appendix.)
#1. In the Geography examination, the marks distribution is known to be Normal where the mean is 52 and the standard deviation is 15. Determine the z-scores of students receiving marks: (i) 40, (ii) 95, (iii) 52.
Solution: Here, $\mu = 52$, $\sigma = 15$.
(i) $z = (40 - 52)/15 = -0.8$
(ii) $z = (95 - 52)/15 \approx 2.87$
(iii) $z = (52 - 52)/15 = 0$
So, we see the z-scores can be negative, positive or zero.
#2. Find the area under the normal curve in each of the following cases:
(i) Between $z = 0$ and $z = 1.2$
Area = 0.3849 from table.
(ii) Between $z = -0.68$ and $z = 0$
Area = 0.2518
(Note: The area is equal to the area between $z = 0$ and $z = 0.68$ as the curve is symmetric.)
(iii) Between $z = -0.46$ and $z = 2.21$
Area = (area between $z = 0$ and 2.21) + (area between $z = 0$ and -0.46) = 0.4864 + 0.1772 = 0.6636
(Note: The areas are added as they are on both sides of $z = 0$.)
(iv) Between $z = 0.81$ and $z = 1.94$
Required area = (area between $z = 0$ and 1.94) - (area between $z = 0$ and 0.81) = 0.4738 - 0.2910 = 0.1828
(Note: There is the subtraction as the two areas are on the same side of $z = 0$.)
(v) To the left of $z = -0.6$
Required area = 0.5 - (area between $z = 0$ and -0.6) = 0.5 - 0.2257 = 0.2743
(vi) To the right of $z = -1.28$
Required area = (area between $z = -1.28$ and $z = 0$) + 0.5 = (area between $z = 0$ and $z = 1.28$) + 0.5 = 0.3997 + 0.5 = 0.8997
#3. Among 1000 students, the mean score in the final examination is 25 and the standard deviation is 4.0. Assume the distribution is Normal. Find the following.
(a) How many students score between 22 and 27?
$\mu = 25$, $\sigma = 4.0$
$z_1 = (22 - 25)/4 = -0.75$, $z_2 = (27 - 25)/4 = 0.5$
So the probability is the area under the curve between -0.75 and 0.5 = (area between 0 and -0.75) + (area between 0 and 0.5) = 0.2734 + 0.1915 = 0.4649
The number of students in this marks range = $1000 \times 0.4649 \approx 465$
(b) How many students score above 30?
$z = (30 - 25)/4 = 1.25$
Probability = area to the right of $z = 1.25$ = 0.5 - (area between 0 and 1.25) = 0.5 - 0.3944 = 0.1056
The number of students = $1000 \times 0.1056 \approx 106$
(c) How many students score below 15?
$z = (15 - 25)/4 = -2.5$
Area = 0.5 - (area between 0 and -2.5) = 0.5 - 0.4938 = 0.0062
The number of students = $1000 \times 0.0062 \approx 6$
(d) How many score 24? Here we have to calculate the area between 23.5 and 24.5.
$z_1 = (23.5 - 25)/4 = -0.375 \approx -0.38$, $z_2 = (24.5 - 25)/4 = -0.125 \approx -0.13$
Area between $z_1$ and $z_2$ = (area between 0 and $z_1$) - (area between 0 and $z_2$) = 0.1480 - 0.0517 = 0.0963
The number of students = $1000 \times 0.0963 \approx 96$
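The table look-ups in Example #3 can be cross-checked with a short program. The sketch below computes the area to the left of a score for a Normal distribution using the error function from Python's standard `math` module, so no Z-table is needed. The function name `normal_cdf` is chosen here for illustration; the results may differ from the table-based answers in the fourth decimal place, because the table method rounds $z$ to two decimals.

```python
import math

def normal_cdf(x, mean, sd):
    """Area under the Normal curve to the left of x."""
    return 0.5 * (1 + math.erf((x - mean) / (sd * math.sqrt(2))))

n_students, mean, sd = 1000, 25, 4.0

between_22_27 = normal_cdf(27, mean, sd) - normal_cdf(22, mean, sd)   # part (a)
above_30 = 1 - normal_cdf(30, mean, sd)                               # part (b)
below_15 = normal_cdf(15, mean, sd)                                   # part (c)
score_24 = normal_cdf(24.5, mean, sd) - normal_cdf(23.5, mean, sd)    # part (d)

for label, p in [("22 to 27", between_22_27), ("above 30", above_30),
                 ("below 15", below_15), ("score 24", score_24)]:
    print(label, round(p, 4), round(n_students * p))
```

Working with the area to the left of a value, rather than the area measured from zero, removes the need to decide case by case whether two table entries should be added or subtracted.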
Binomial Distribution
Before we discuss Binomial distribution, we should know certain basic mathematical operations. Those who are not familiar with some of the mathematical notations and rules may consult the necessary introduction given in the following Box.
Binomial Probability:
Suppose the probability of occurring a certain event is $p$ and of not occurring the event is $q = 1 - p$. In a total of $n$ trials, the particular event occurs $r$ times, each with probability $p$, and does not occur $(n - r)$ times, each with probability $q$. Also, we have to know which $r$ events will occur out of the total $n$ events. The number of ways we can do that is the number of combinations $\binom{n}{r}$. Consider a variable $x$ which is equal to the relative frequency, $x = r/n$. As the events are considered independent, the joint probability will be
$P(x) = \binom{n}{r}\, p^r q^{n-r}$
The above probability is called binomial probability.

Box:
Factorial: $n! = n(n-1)(n-2)\cdots 3 \cdot 2 \cdot 1$. For example, $3! = 3 \times 2 \times 1 = 6$. Consider that factorials of negative integers have no meaning and $0! = 1$. Note that we can write $n! = n\,(n-1)!$
Permutation: In how many different ways can $n$ objects be arranged among themselves? The answer is the permutation of $n$ objects, $n!$. For example, for three objects A, B, C, the different arrangements are ABC, ACB, BCA, BAC, CAB, CBA: total 6 ways = $3!$
Combination: $\binom{n}{r}$ or $^nC_r = \dfrac{n!}{r!\,(n-r)!}$
This is the number of ways $r$ objects can be selected from $n$ objects. For example, if we want to know in how many ways 2 students can be selected from a total of 3 students, the answer is $\binom{3}{2} = \dfrac{3!}{2!\,1!} = 3$.
Also note for quick calculations, $\binom{n}{0} = \binom{n}{n} = \dfrac{n!}{n!\,0!} = 1$ and $\binom{n}{1} = \dfrac{n!}{1!\,(n-1)!} = n$.
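Now consider the following table based on the binomial probability:

r:      0                     1                         2                           ...   n
P(r):   $\binom{n}{0} q^n$    $\binom{n}{1} p\,q^{n-1}$    $\binom{n}{2} p^2 q^{n-2}$    ...   $\binom{n}{n} p^n$

If we add all the terms of the second row above, we get the following binomial expansion:
$(q + p)^n = \binom{n}{0} q^n + \binom{n}{1} p\,q^{n-1} + \binom{n}{2} p^2 q^{n-2} + \dots + \binom{n}{n} p^n$ (1)
From the expression (1) above, we can easily check the following known algebraic formulas:
$(q + p)^2 = q^2 + 2qp + p^2$
$(q + p)^3 = q^3 + 3q^2 p + 3q p^2 + p^3$
and so on. The coefficients of the terms on the right of the above can be arranged in the following triangular form, which is called Pascal's triangle:

                        1
                      1   1
                    1   2   1
                  1   3   3   1
                1   4   6   4   1
              1   5  10  10   5   1
            1   6  15  20  15   6   1
          1   7  21  35  35  21   7   1
        1   8  28  56  70  56  28   8   1

The Rule: As indicated above, a number in a row (except the rightmost and leftmost ones) is the sum of the two numbers on its two sides in the preceding row. So, from the row for $n = 8$ in Pascal's triangle we can easily write the binomial expansion:
$(q + p)^8 = q^8 + 8q^7 p + 28q^6 p^2 + 56q^5 p^3 + 70q^4 p^4 + 56q^3 p^5 + 28q^2 p^6 + 8q p^7 + p^8$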
Remember that each term represents a binomial probability. A binomial distribution is a collection of these discrete binomial probabilities. Note: Since $q + p = 1$, expression (1) shows that all the binomial probabilities add up to 1.
Example #1: Five independent shots are fired at a target. The probability of a hit from each shot is 0.4.
Q. What is the probability that two shots will hit the target?
Ans. Here $n = 5$, $r = 2$, $p = 0.4$, $q = 0.6$.
$P(2) = \binom{5}{2} (0.4)^2 (0.6)^3 = \dfrac{5!}{2!\,3!} \times 0.16 \times 0.216 = 0.3456$
Q. What is the probability that there will be more than two hits?
Ans. Prob. $= P(3) + P(4) + P(5) = \binom{5}{3}(0.4)^3(0.6)^2 + \binom{5}{4}(0.4)^4(0.6) + \binom{5}{5}(0.4)^5 = 0.2304 + 0.0768 + 0.01024 \approx 0.3174$
Q. What is the expectation value of the hits (that is, the mean value of hitting the target out of all five shots)?
Ans. For this we have to calculate the probabilities $P(0)$, $P(1)$, $P(2)$, ... for the corresponding number of hits 0, 1, 2, ...
The expectation value, $E(r) = \sum_r r\,P(r) = 0 \times P(0) + 1 \times P(1) + 2 \times P(2) + 3 \times P(3) + 4 \times P(4) + 5 \times P(5)$
= 0 + 0.2592 + 0.6912 + 0.6912 + 0.3072 + 0.0512 = 2.0 (which equals $np = 5 \times 0.4$)
Example #2: Now, imagine a situation where we toss 8 coins together or we toss one coin 8 times consecutively. We measure the relative occurrence of Head in 8 trials. Let us attach values, Head = 1 and Tail = 0. So, we can think of a variable $x$ which can take values 0, 1/8, 2/8, 3/8, 4/8, ... and so on up to 8/8. Thus we can associate probabilities for the values of $x$ directly from Pascal's triangle (or by using the formula). Note that the probability of occurring Head, $p = 1/2$, and of not occurring Head, $q = 1/2$.
$P(0) = \binom{8}{0}(1/2)^8 = 1/256$, $P(1/8) = \binom{8}{1}(1/2)^8 = 8/256$, $P(2/8) = 28/256$, $P(3/8) = 56/256$, $P(4/8) = 70/256$, $P(5/8) = 56/256$, $P(6/8) = 28/256$, $P(7/8) = 8/256$, $P(8/8) = 1/256$
If we now plot $P(x)$ against $x$, we get the following symmetric discrete distribution with the peak value at $x = 4/8 = 1/2$.
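Both binomial examples can be verified with the formula (number of combinations) x $p^r$ x $q^{n-r}$, using `math.comb` from Python's standard library (available from Python 3.8) for the number of combinations. The function name `binomial_probability` is chosen only for this sketch.

```python
from math import comb

def binomial_probability(r, n, p):
    """Probability of exactly r occurrences in n independent trials,
    each with probability p of occurring."""
    return comb(n, r) * p ** r * (1 - p) ** (n - r)

# Example #1: five shots, probability of a hit 0.4 each.
n, p = 5, 0.4
p_two_hits = binomial_probability(2, n, p)
p_more_than_two = sum(binomial_probability(r, n, p) for r in range(3, n + 1))
expected_hits = sum(r * binomial_probability(r, n, p) for r in range(n + 1))
print(p_two_hits, p_more_than_two, expected_hits)

# Example #2: eight coins. The numerators over 256 are the n = 8 row of Pascal's triangle.
coin_row = [comb(8, r) for r in range(9)]
print(coin_row)
```

Summing `binomial_probability(r, n, p)` over all values of `r` from 0 to `n` gives 1, which is the normalization of the binomial distribution.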
Fig.
For a large number of trials, this distribution approaches the Normal distribution. Therefore, we can say the following:
Binomial Probability distribution for a random variable becomes Normal distribution for a large number of trials.
The Z-Table
(Each entry is the area under the Z-distribution between 0 and z. The row gives z to the first decimal place and the column gives the second decimal place.)

z     0.00     0.01     0.02     0.03     0.04     0.05     0.06     0.07     0.08     0.09
0.1   0.03983  0.04380  0.04776  0.05172  0.05567  0.05962  0.06356  0.06749  0.07142  0.07535
0.2   0.07926  0.08317  0.08706  0.09095  0.09483  0.09871  0.10257  0.10642  0.11026  0.11409
0.3   0.11791  0.12172  0.12552  0.12930  0.13307  0.13683  0.14058  0.14431  0.14803  0.15173
0.4   0.15542  0.15910  0.16276  0.16640  0.17003  0.17364  0.17724  0.18082  0.18439  0.18793
0.5   0.19146  0.19497  0.19847  0.20194  0.20540  0.20884  0.21226  0.21566  0.21904  0.22240
0.6   0.22575  0.22907  0.23237  0.23565  0.23891  0.24215  0.24537  0.24857  0.25175  0.25490
0.7   0.25804  0.26115  0.26424  0.26730  0.27035  0.27337  0.27637  0.27935  0.28230  0.28524
0.8   0.28814  0.29103  0.29389  0.29673  0.29955  0.30234  0.30511  0.30785  0.31057  0.31327
0.9   0.31594  0.31859  0.32121  0.32381  0.32639  0.32894  0.33147  0.33398  0.33646  0.33891
1.0   0.34134  0.34375  0.34614  0.34849  0.35083  0.35314  0.35543  0.35769  0.35993  0.36214
1.1   0.36433  0.36650  0.36864  0.37076  0.37286  0.37493  0.37698  0.37900  0.38100  0.38298
1.2   0.38493  0.38686  0.38877  0.39065  0.39251  0.39435  0.39617  0.39796  0.39973  0.40147
1.3   0.40320  0.40490  0.40658  0.40824  0.40988  0.41149  0.41308  0.41466  0.41621  0.41774
1.4   0.41924  0.42073  0.42220  0.42364  0.42507  0.42647  0.42785  0.42922  0.43056  0.43189
1.5   0.43319  0.43448  0.43574  0.43699  0.43822  0.43943  0.44062  0.44179  0.44295  0.44408
1.6   0.44520  0.44630  0.44738  0.44845  0.44950  0.45053  0.45154  0.45254  0.45352  0.45449
1.7   0.45543  0.45637  0.45728  0.45818  0.45907  0.45994  0.46080  0.46164  0.46246  0.46327
1.8   0.46407  0.46485  0.46562  0.46638  0.46712  0.46784  0.46856  0.46926  0.46995  0.47062
1.9   0.47128  0.47193  0.47257  0.47320  0.47381  0.47441  0.47500  0.47558  0.47615  0.47670
2.0   0.47725  0.47778  0.47831  0.47882  0.47932  0.47982  0.48030  0.48077  0.48124  0.48169
2.1   0.48214  0.48257  0.48300  0.48341  0.48382  0.48422  0.48461  0.48500  0.48537  0.48574
2.2   0.48610  0.48645  0.48679  0.48713  0.48745  0.48778  0.48809  0.48840  0.48870  0.48899
2.3   0.48928  0.48956  0.48983  0.49010  0.49036  0.49061  0.49086  0.49111  0.49134  0.49158
2.4   0.49180  0.49202  0.49224  0.49245  0.49266  0.49286  0.49305  0.49324  0.49343  0.49361
2.5   0.49379  0.49396  0.49413  0.49430  0.49446  0.49461  0.49477  0.49492  0.49506  0.49520
2.6   0.49534  0.49547  0.49560  0.49573  0.49585  0.49598  0.49609  0.49621  0.49632  0.49643
2.7   0.49653  0.49664  0.49674  0.49683  0.49693  0.49702  0.49711  0.49720  0.49728  0.49736
2.8   0.49744  0.49752  0.49760  0.49767  0.49774  0.49781  0.49788  0.49795  0.49801  0.49807
2.9   0.49813  0.49819  0.49825  0.49831  0.49836  0.49841  0.49846  0.49851  0.49856  0.49861
3.0   0.49865  0.49869  0.49874  0.49878  0.49882  0.49886  0.49889  0.49893  0.49896  0.49900
Sampling
Basic Concept: What is sampling? Sampling is to take a subsection of the population for a particular study. The aim is to select the data sample in order to represent the total data set. In statistics, population means the total collection of data. When the population or the entire collection of data is studied, it is called census. In short, population is the total set and the sample is the subset of it. Why the sampling is done? When the number of elements in a population is large it is often not possible to investigate the population completely due to lack of time, money and resources. This is why the sampling is necessary. Sampling is done in such a way that the subset of data represents the entire set. Example: If a TV channel wants to know the popularity of a program it would be expensive to ask everybodys opinion. Instead a subsection of viewers are interviewed and the data is collected. Methods of Sampling: A sample of size means there are -data points in the collection. A sample of size is collected from a population of size in such a way that all the features of the population are well represented by this. If a sampling method does over-represent or under-represent a feature of the population it is said to be biased. The aim of any selection method is to reduce the chance of bias as far as possible. There are several methods of sampling; among them the most common is the random sampling. Random sampling: For a sample of size , we collect -data from the population. We collect many such samples for our evaluation. If this is done randomly so that each group of size taken
28
from the population has an equal chance of being selected, we call this random sampling. Sometimes, it is called simple random sampling. For random sampling, the successive drawings have to be independent. Let us suppose we want to select a sample of size 100 from a population of size 10000. In random sampling, we decide which elements are to be picked with the help of random numbers (generated by a computer), by consulting a random number table, or by some kind of dice throwing.

Systematic Sampling: If simple random sampling from the population is not possible, systematic sampling may be done. First, the population is enumerated from 1 onwards. If a sample of size $n$ is to be obtained from a population of size $N$, every $k$-th item is selected, where $k = N/n$. First a random number between 1 and $k$ is selected and the item with that serial number is taken as the 1st element. After this, every $k$-th element is taken.

Example: Follow the table given below.

Sl no.   1   2   3   4   5   6   7   8   9  10  11  12
Value   20  27  33  21  15  22  45  13  32  29  10  16

For a sample of size $n = 4$ we have $k = 12/4 = 3$. Select a random number between 1 and 3: choose 2, for example. Start with #2 and then take the data numbered 5, 8 and 11. The sample is then 27, 15, 13, 10.
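The two selection rules described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names are hypothetical, the fixed seed is there purely to make the run repeatable, and the simple random sample is drawn without replacement.

```python
import random

# Values from the table above, listed by serial number 1..12.
values = [20, 27, 33, 21, 15, 22, 45, 13, 32, 29, 10, 16]

def simple_random_sample(population, n, rng):
    """Draw n items so that every group of size n is equally likely."""
    return rng.sample(population, n)

def systematic_sample(population, n, start):
    """Take every k-th item (k = N // n), beginning at serial number `start`."""
    k = len(population) // n
    if not 1 <= start <= k:
        raise ValueError("start must lie between 1 and k")
    # Serial numbers are 1-based, Python indices are 0-based.
    return [population[i] for i in range(start - 1, len(population), k)][:n]

rng = random.Random(0)  # fixed seed, only so the sketch is repeatable
srs = simple_random_sample(values, 4, rng)
sys_sample = systematic_sample(values, 4, start=2)
print("simple random sample:", srs)
print("systematic sample   :", sys_sample)  # items with serial numbers 2, 5, 8, 11
```

With `start=2` the systematic sample picks the items numbered 2, 5, 8 and 11, as in the example.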
Stratified Sampling: In this method, the population is first divided into groups (strata). Each element of the population belongs to exactly one such group. Divide the population of size $N$ into $k$ non-overlapping groups containing $N_1, N_2, \dots, N_k$ data such that $N_1 + N_2 + \dots + N_k = N$. Next do simple random sampling to collect one or a few elements from each group. Suppose a population is classified into several groups according to age or something like that. Then random samples are collected from each group. Note: This is also called restricted random sampling.

Cluster Sampling: In this method, as before, the population is divided into groups, here called clusters. Then some clusters are chosen randomly and the elements in the chosen clusters are collected as the sample.
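The difference between the two methods can be seen in a short Python sketch. The age groups and their sizes below are hypothetical, as are the function names; the point is only that stratified sampling draws a few elements from every group, while cluster sampling keeps every element of a few randomly chosen groups.

```python
import random

def stratified_sample(strata, per_stratum, rng):
    """Simple random sampling of `per_stratum` items inside each stratum."""
    sample = {}
    for name, members in strata.items():
        sample[name] = rng.sample(members, min(per_stratum, len(members)))
    return sample

def cluster_sample(clusters, n_clusters, rng):
    """Pick whole clusters at random and keep every element in them."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: list(clusters[name]) for name in chosen}

# Hypothetical population of 120 people (labelled 0..119) in three age groups.
groups = {
    "young": list(range(0, 40)),
    "middle": list(range(40, 90)),
    "old": list(range(90, 120)),
}

rng = random.Random(1)  # fixed seed, only so the sketch is repeatable
strat = stratified_sample(groups, 5, rng)
clust = cluster_sample(groups, 2, rng)
print({name: len(items) for name, items in strat.items()})
print({name: len(items) for name, items in clust.items()})
```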
Probability sampling
Any method of sampling that uses (probabilistically) random selection is in general called probability sampling.
Sampling variation:
When sampling from a population is done, we may take not one sample but several different samples of the same size. The samples differ from one another, and we call this sampling variation. In practice, we often draw only one sample, or one set of data, from a population. But we cannot be sure what would happen if we drew several other samples. Will we get the same result? The answer is no. If we look at the mean value, we see that the mean is not the same for all the samples that we are able to draw. We then get a distribution of the sample means. $N$ = population size, $n$ = sample size, $n/N$ = the sample fraction.
Many samples of the same size yield a sampling distribution. The sampling distribution is usually assumed to follow some well-known probability distribution. We look for various properties from the distribution curves and see how varying the sample size affects these properties.
From experience and theory, we can say that the variability of the sampling distribution decreases as the sample size increases.
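A small simulation illustrates this statement. The population below is hypothetical (10000 values drawn once from a Normal distribution with mean 50 and standard deviation 15), and the helper name is an assumption made for this sketch. We draw 500 random samples at two different sample sizes and compare how much the sample means scatter.

```python
import random
import statistics

def spread_of_sample_means(population, n, n_samples, rng):
    """Standard deviation of the means of `n_samples` random samples of size n."""
    means = [statistics.mean(rng.sample(population, n)) for _ in range(n_samples)]
    return statistics.stdev(means)

rng = random.Random(42)  # fixed seed, only so the sketch is repeatable
# Hypothetical population: 10000 values, roughly Normal(50, 15).
population = [rng.gauss(50, 15) for _ in range(10_000)]

spread_small = spread_of_sample_means(population, 10, 500, rng)
spread_large = spread_of_sample_means(population, 100, 500, rng)
print("spread of means, n = 10 :", round(spread_small, 2))
print("spread of means, n = 100:", round(spread_large, 2))
```

The scatter of the means for $n = 100$ comes out roughly $\sqrt{10}$ times smaller than for $n = 10$, in line with a standard deviation of $\sigma/\sqrt{n}$ for the sample mean.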
SAMPLING DISTRIBUTIONS
What do you do after the sample is collected?
The first thing one can do with a set of data is to measure its central tendency and its spread. Usually, we calculate the mean and the variance. The calculation of the mean (or variance) is done over many samples of the same size. Let us suppose we have collected $m$ samples of the same size $n$. The mean values $\bar{x}_1, \bar{x}_2, \dots, \bar{x}_m$ of the various samples are calculated. The grand mean of all these mean values is taken as the sample mean, $\bar{x}$. The mean of the sample means is the estimate of the population mean. Similarly, the variance of the mean values calculated from the set of samples (of equal size $n$) estimates $\sigma^2/n$, so $n$ times this variance is an estimate of the population variance $\sigma^2$. It can be shown:
The sample mean $\bar{x}$ is the unbiased estimate of the population mean, $\mu$. For the population variance $\sigma^2$, the unbiased estimate is

$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2 .$$
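Why the unbiased estimate of the variance divides by $n - 1$ rather than $n$ can be checked numerically. The following is a sketch under stated assumptions: samples of size 5 are drawn from a standard Normal population (true variance 1), and the sample size, seed and number of trials are arbitrary choices. Python's `statistics.pvariance` divides by $n$, while `statistics.variance` divides by $n - 1$.

```python
import random
import statistics

rng = random.Random(7)  # fixed seed, only so the sketch is repeatable
n, trials = 5, 20_000

biased, unbiased = [], []
for _ in range(trials):
    # One small sample from Normal(0, 1); the true population variance is 1.
    sample = [rng.gauss(0, 1) for _ in range(n)]
    biased.append(statistics.pvariance(sample))   # divides by n
    unbiased.append(statistics.variance(sample))  # divides by n - 1

avg_biased = statistics.mean(biased)
avg_unbiased = statistics.mean(unbiased)
print("average of the divide-by-n estimate    :", round(avg_biased, 3))
print("average of the divide-by-(n-1) estimate:", round(avg_unbiased, 3))
```

On average the divide-by-$n$ estimate falls short of the true variance by the factor $(n-1)/n$, while the divide-by-$(n-1)$ estimate does not.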
Hypothesis Testing
What is Hypothesis?
On the basis of sample information, we make certain decisions about the population. In taking such decisions we make certain assumptions. These assumptions are known as statistical hypotheses. [Note: A collected set of data points which is a part of the population (a small number of data) is called a sample. The process of selection is called sampling. When all the data are considered for a study, this is called the population.]
How to test Hypothesis?
Assuming the hypothesis is correct, we calculate the probability of getting the observed sample. If this probability is less than a certain assigned value (the significance level), the hypothesis is rejected. The hypothesis that there is no significant difference between the observed value and the expected value is called the Null Hypothesis.
Test of significance:
The tests which enable us to decide whether to accept or to reject the null hypothesis are called tests of significance. If the differences between the sample values and the population values are significantly large, the null hypothesis is to be rejected. It is known that the mean $\bar{x}$ of a sample is an unbiased estimate of the population mean $\mu$. It is called a point estimate. But we know that if we collect different samples, the mean ($\bar{x}$) varies from sample to sample. The means of the samples form a distribution which we call the sampling distribution. Note that the sampling distribution is Normal if the variable in the population is normally distributed. Now the question is, how close is a calculated mean to the population mean? We have to estimate that with some level of accuracy.
Confidence Interval:
A confidence interval is a range of values within which we can trap the population mean with some probability. So, we consider the probability distribution of sample means in order to find that probability of trapping. Suppose we have a sample mean $\bar{x}$ and we consider a symmetric interval around it: $(\bar{x} - c,\ \bar{x} + c)$, where $c$ is a value that we shall determine.
If $|\bar{x} - \mu| < c$, the confidence interval traps the population mean $\mu$.
How to calculate confidence interval?
Suppose the variable $X$ follows a Normal distribution with mean $\mu$ and standard deviation $\sigma$. Symbolically, $X \sim N(\mu, \sigma^2)$.
So, for a sample of size $n$, $\bar{X} \sim N(\mu, \sigma^2/n)$.
This means that the mean ($\bar{X}$) of samples of size $n$ follows a Normal distribution with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$.
If the confidence interval is 95%, the interval has a probability 0.95 to trap the population mean: $P\left(|\bar{X} - \mu| < c\right) = 0.95$. Now consider the standardized variable $Z = \dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}}$. Here $Z$ follows the z-distribution, $Z \sim N(0, 1)$ [Normal distribution with mean = 0, standard deviation = 1].
Now let us look up the z-table. The total area under the curve is 100%, which gives us the total probability = 1. The shaded area is 95% of the total area, which corresponds to probability = 0.95.
Half of the shaded area = 0.95/2 = 0.475, as it is symmetric around zero. In the z-distribution, we now find from the z-table the critical value $z_c$ for which the area from $z = 0$ to $z = z_c$ is 0.475. This gives $z_c = 1.96$.
Thus 95% confidence interval:

$$-1.96 < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < 1.96$$

If the sample mean is $\bar{x}$, the confidence interval is $\bar{x} - 1.96\,\dfrac{\sigma}{\sqrt{n}} < \mu < \bar{x} + 1.96\,\dfrac{\sigma}{\sqrt{n}}$. So we can say with 95% confidence level that the population mean lies in this interval.
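NOTE: For a sample of size $n$, with population variance $\sigma^2$, a 95% confidence interval means

$$\left[\ \bar{x} - 1.96\,\frac{\sigma}{\sqrt{n}},\ \ \bar{x} + 1.96\,\frac{\sigma}{\sqrt{n}}\ \right]$$
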
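Let us now calculate the width of the confidence interval for 95% confidence: width $= 2 \times 1.96\,\dfrac{\sigma}{\sqrt{n}} = 3.92\,\dfrac{\sigma}{\sqrt{n}}$. So, we can see that the interval decreases with the increase of sample size. That is, we can narrow down the search for the population mean as we take a larger sample size. Then we can say with more accuracy that our measured mean is close to the population mean. For example, increasing the sample size from $n$ to $4n$ halves the width.
For 98% Confidence interval: Shaded area = 0.98. Half the shaded area = 0.98/2 = 0.49, which is the area between $z = 0$ and $z = 2.326$. Thus the 98% confidence interval is

$$\left[\ \bar{x} - 2.326\,\frac{\sigma}{\sqrt{n}},\ \ \bar{x} + 2.326\,\frac{\sigma}{\sqrt{n}}\ \right].$$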
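The shrinking of the interval with sample size can be made concrete with a short Python sketch. The sample mean (50) and population standard deviation (15) used here are hypothetical numbers chosen only for illustration, and the function name is an assumption. The critical value is obtained from the standard Normal distribution instead of a printed z-table.

```python
from math import sqrt
from statistics import NormalDist

def z_confidence_interval(sample_mean, sigma, n, level=0.95):
    """Interval sample_mean +/- z_c * sigma / sqrt(n), for a known population sigma."""
    # z_c is the point for which the area from 0 to z_c equals level / 2.
    z_c = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z_c * sigma / sqrt(n)
    return sample_mean - half_width, sample_mean + half_width

# Hypothetical numbers: sample mean 50, population standard deviation 15.
lo25, hi25 = z_confidence_interval(50.0, 15.0, 25)
lo100, hi100 = z_confidence_interval(50.0, 15.0, 100)
print("n = 25 :", round(lo25, 2), "to", round(hi25, 2))
print("n = 100:", round(lo100, 2), "to", round(hi100, 2))
```

Taking four times as many data points halves the width of the interval, because the width is proportional to $1/\sqrt{n}$.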
NOTE #1: Symbolically, it is often said that the confidence level is $100(1 - \alpha)\%$, where $\alpha$ is the significance level.
For example, for $\alpha = 0.05$: Confidence level = 95%, Significance level = 5%. Confidence level and significance level are complementary.
Confidence Level   90%     95%    98%     99%
$z_c$              1.645   1.96   2.326   2.576
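The critical values in this table can be reproduced without a printed z-table. The sketch below uses Python's standard library; the variable names are chosen here for illustration. For a confidence level $L$, the critical value is the point below which a fraction $0.5 + L/2$ of the standard Normal distribution lies.

```python
from statistics import NormalDist

levels = [0.90, 0.95, 0.98, 0.99]
standard_normal = NormalDist()  # mean 0, standard deviation 1

# Critical value for each confidence level, rounded as in a z-table.
z_table = {level: round(standard_normal.inv_cdf(0.5 + level / 2), 3) for level in levels}

for level, z_c in z_table.items():
    print(f"{level:.0%} confidence level: z_c = {z_c}")
```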
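NOTE #2: When we are not sure if the population is Normal and we do not know the population variance $\sigma^2$, we can still use this method of calculating the confidence interval by considering the variance $s^2$ of a large sample (usually $n \geq 30$), with $s^2 = \dfrac{1}{n-1}\sum\left(x_i - \bar{x}\right)^2$. Then we consider the interval

$$\left[\ \bar{x} - z_c\,\frac{s}{\sqrt{n}},\ \ \bar{x} + z_c\,\frac{s}{\sqrt{n}}\ \right].$$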
Student's t-test:
This is applied to find the confidence interval for a small sample. The population is Normal. Consider the variable $t$ defined as

$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$

[Note that here we use $s$, calculated from the sample, instead of $\sigma$.] The value of the variable $t$ varies from sample to sample, and thus it forms a distribution looking very similar to the Normal distribution. This is the t-distribution. As we take larger and larger samples, the t-distribution becomes closer and closer to the Normal distribution $N(0, 1)$, which is nothing but the z-distribution.
Now, instead of sample size, the family of distributions is characterized by a parameter called degrees of freedom (df), usually denoted by $\nu$ [Nu]. Degrees of freedom = No. of independent values used for the calculation of $s$. For example, if $n$ is the sample size, we use $n$ data points, but they are related by their mean, $\bar{x} = \frac{1}{n}\sum x_i$. Such a condition in the form of a relation or equation is called a constraint. Thus we have $n - 1$ independent quantities, and this is the degrees of freedom here.
Degrees of freedom, $\nu$ = Number of values $-$ Number of constraints. In this case, $\nu = n - 1$.
The t-distributions are now designated as $t_\nu$-distributions. As $\nu$ gets higher, the $t_\nu$-distributions tend more and more towards the z-distribution.
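Like the z-table, we now have a t-table to consult, from which we get the area under the curve within some $t$-range.
So, for a Normal distribution, for a sample of size $n$, we have the confidence interval

$$\left[\ \bar{x} - t_c\,\frac{s}{\sqrt{n}},\ \ \bar{x} + t_c\,\frac{s}{\sqrt{n}}\ \right]$$

for a $100(1 - \alpha)\%$ confidence level, for $\nu$ degrees of freedom, where $t_c$ is the critical value read from the t-table.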
EXAMPLE #1: Consider the following 10 measurements of some variable. The hypothesis is that the population mean takes a stated value $\mu_0$. We have to verify that. Assume that the readings follow a Normal Distribution.

No. of Obs.     1      2      3      4      5      6      7      8      9     10
Values ($x$)  0.13  -0.09   0.06   0.15  -0.02   0.03   0.01  -0.02  -0.07   0.05

From the data, $\bar{x} = 0.023$ and $s = 0.078$. Degrees of freedom, $\nu = 10 - 1 = 9$. From the t-table for $\nu = 9$ with 95% confidence level, we have $t_c = 2.262$.
Confidence interval: $\left[\ \bar{x} - 2.262\,\dfrac{s}{\sqrt{10}},\ \ \bar{x} + 2.262\,\dfrac{s}{\sqrt{10}}\ \right] = [-0.033,\ 0.079]$.
The hypothesised mean $\mu_0$ is trapped inside the above interval. So the hypothesis is right: the null hypothesis is accepted.

EXAMPLE #2:
The mean life time (in hours) of an electric bulb is measured to be 10.4. Now a technology is introduced to increase the life time. Experimental data are collected from a random sample of size $n$. Test whether there is any evidence at the 10% significance level that the new technology has actually increased the life time. [Note that it is not asked whether there is any decrease in life time. The question is whether there is an increase or the life time remains the same.]
Ans. Null hypothesis, $H_0: \mu = 10.4$. Alternate hypothesis, $H_1: \mu > 10.4$. Here we consider a one tail t-test as we are to look for the increase only.
Sample mean, $\bar{x} = \dfrac{1}{n}\sum x$.
Unbiased estimate of the population variance (from the sample), $s^2 = \dfrac{n}{n-1}\left[\dfrac{\sum x^2}{n} - \bar{x}^2\right]$.
For the t-test, $t = \dfrac{\bar{x} - 10.4}{s/\sqrt{n}}$.
Here, degrees of freedom, $\nu = n - 1$. So we look for the area under the curve for the $t_\nu$-distribution.
For 10% significance, i.e. for 90% confidence level, we find the critical value $t_c$ from the table. The observed value of $t$ is greater than $t_c$. Thus our observed value lies in the rejection region. That means that the mean life time is increased: the alternate hypothesis is accepted.
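The confidence interval of EXAMPLE #1 can be checked numerically from the ten listed measurements. This is a sketch using only Python's standard library; the critical value 2.262 is the 95% two-tail value for 9 degrees of freedom read from a t-table, not computed by the code.

```python
from math import sqrt
from statistics import mean, stdev

# The ten measurements of EXAMPLE #1.
data = [0.13, -0.09, 0.06, 0.15, -0.02, 0.03, 0.01, -0.02, -0.07, 0.05]
T_CRIT_9_95 = 2.262  # 95% two-tail critical value for 9 degrees of freedom, from a t-table

n = len(data)
x_bar = mean(data)
s = stdev(data)  # square root of the unbiased variance (divides by n - 1)
half_width = T_CRIT_9_95 * s / sqrt(n)
interval = (x_bar - half_width, x_bar + half_width)

print("sample mean =", round(x_bar, 3))
print("s           =", round(s, 4))
print("95% interval:", round(interval[0], 3), "to", round(interval[1], 3))
```

Any hypothesised mean lying between the two printed limits is accepted at the 95% confidence level.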
EXAMPLE #3: You are measuring some length which is 10 cm. Your five measurements are 9.88, 10.18, 10.23, 10.39, 10.25 cm. Assume that the measurements follow a Normal distribution. Test at the 5% significance level whether these results support the claim or whether the measurement is biased.
Ans. Since the bias can be in either direction (positive or negative), we consider a two tail test. The Hypothesis: Null $H_0: \mu = 10$ cm, Alternate $H_1: \mu \neq 10$ cm.
Sample mean, $\bar{x} = 10.186$ cm; this is an unbiased estimate of the population mean. Variance, $s^2 = \dfrac{1}{n-1}\sum\left(x_i - \bar{x}\right)^2 = 0.0353$ cm$^2$; this is an unbiased estimate of the population variance.
For the t-test, $t = \dfrac{\bar{x} - \mu}{s/\sqrt{n}} = \dfrac{10.186 - 10}{0.188/\sqrt{5}} = 2.21$.
For 5% significance level, we consider the area of 0.95 around the centre, and an area of 0.025 on both sides (at both the tails).
We consider the $t_4$-distribution as the degrees of freedom, $\nu = 5 - 1 = 4$. The rejection region on either side corresponds to $|t| > 2.776$, from the table.
Here we find that the t-value is below the rejection region, that is, in the acceptance region. Thus the hypothesis ($\mu = 10$ cm) is accepted: the null hypothesis holds.
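The test in EXAMPLE #3 can be reproduced with a few lines of Python. This is a sketch, not a general-purpose routine: the critical value 2.776 is the two-tail 5% value for 4 degrees of freedom read from a t-table, and the variable names are chosen for illustration.

```python
from math import sqrt
from statistics import mean, stdev

lengths = [9.88, 10.18, 10.23, 10.39, 10.25]  # measurements in cm
mu_0 = 10.0          # claimed length in cm (null hypothesis)
T_CRIT_4_95 = 2.776  # two-tail 5% critical value for 4 degrees of freedom, from a t-table

n = len(lengths)
x_bar = mean(lengths)
s = stdev(lengths)  # unbiased estimate, divides by n - 1
t_value = (x_bar - mu_0) / (s / sqrt(n))

# Two tail test: reject only if t is too far from zero on either side.
reject_null = abs(t_value) > T_CRIT_4_95
print("t =", round(t_value, 3), "| reject null hypothesis:", reject_null)
```

For a one tail test, the comparison would use only one side, for instance `t_value > t_c` when looking for an increase, with $t_c$ taken from the one-tail column of the table.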
NOTE: In a one tail t-test, we consider only one side of the t-distribution, either the right side (for increase or positive values) or the left side (for decrease or negative values). For a two tail t-test, we consider both sides of the distribution (as we have done before), considering the fact that the value of the variable can increase or decrease from the mean value.
Chi-Squared Test ($\chi^2$-Test):
In some measurements, we obtain the frequencies of some events. We call them observed frequencies ($O_i$). We have to test whether the observed frequencies are consistent with the expected frequencies ($E_i$) according to some given distribution or hypothesis. The measure of discrepancy between the observed and expected frequencies is defined by the following quantity:

$$\chi^2 = \sum_i \frac{\left(O_i - E_i\right)^2}{E_i}$$

Note: $\chi^2$ (Chi-square) is a non-negative quantity; the lower its value, the better is the agreement between the observed and expected frequencies. In other words, it gives the goodness of fit of the model or hypothesis. For $\chi^2 = 0$, the agreement is absolute. Like the t-distribution, we also have the $\chi^2$-distribution. We measure the $\chi^2$ values for different samples of the same size and obtain a distribution. The distribution, here also, is characterized by the degrees of freedom $\nu$. So for $\nu$ degrees of freedom we write $\chi^2_\nu$.
EXAMPLE #1: In a dice throw experiment, we obtain the following frequencies, where the dice was thrown 600 times.

Score    1    2    3    4    5    6
Freq.   90  108  110   95  100   97

Let us check the above with the $\chi^2$-test. Our hypothesis is that for each score, the probability = 1/6 (for a fair dice). So the expected frequency = $600 \times \frac{1}{6} = 100$.
Hypothesis, $H_0$: the dice is fair, $H_1$: the dice is not fair. In this example, the degrees of freedom, $\nu = 6 - 1 = 5$. So after we calculate $\chi^2$ from the following table, we have to look it up in the $\chi^2$-table.

Score    $O$    $E$    $O - E$   $(O - E)^2/E$
1         90    100     -10       1
2        108    100       8       0.64
3        110    100      10       1
4         95    100      -5       0.25
5        100    100       0       0
6         97    100      -3       0.09
Total    600    600       0       2.98

To calculate: $\chi^2 = \sum \dfrac{(O - E)^2}{E} = 2.98$. If we consider 90% confidence level, we have $\alpha = 0.10$.
From the table, we see $\chi^2_5(0.10) = 9.236$. Our obtained value for $\chi^2$ is below this. So it falls within the acceptance region. The dice is fair: the null hypothesis is accepted.
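The $\chi^2$ value for the dice experiment can be checked with a short Python sketch. The helper name `chi_square` is chosen for illustration, and the critical value 9.236 is the upper 10% point for 5 degrees of freedom read from a $\chi^2$-table, not computed here.

```python
def chi_square(observed, expected):
    """Sum of (O - E)^2 / E over all classes."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [90, 108, 110, 95, 100, 97]   # frequencies of scores 1..6
expected = [sum(observed) / 6] * 6       # fair dice: 600 * (1/6) = 100 for each score
CHI2_CRIT_5_10 = 9.236                   # upper 10% point for 5 degrees of freedom, from a table

chi2 = chi_square(observed, expected)
dice_is_fair = chi2 < CHI2_CRIT_5_10
print("chi-square =", round(chi2, 2), "| accept 'dice is fair':", dice_is_fair)
```

The same helper applies to any set of observed and expected frequencies, provided the two lists have the same total.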
EXAMPLE #2: In a genetic study, it is predicted that the children with both parents of blood group AB will fall into blood groups AB, A and B in the ratio 2:1:1. Out of a random sample of 100, we find 55 children have blood group AB, 27 have blood group A and 18 have blood group B. Test at the 10% significance level whether the observed results agree with the theoretical prediction.
Ans. Hypothesis, $H_0$: The children's blood groups are in the ratio 2:1:1. $H_1$: The children's blood groups are NOT in the ratio 2:1:1.
The ratio of probabilities AB, A, B is 2:1:1 = $\frac{1}{2} : \frac{1}{4} : \frac{1}{4}$.

Blood group    $O$    $E$    $O - E$   $(O - E)^2/E$
AB              55     50       5       0.5
A               27     25       2       0.16
B               18     25      -7       1.96
Total          100    100       0       2.62

Degrees of freedom, $\nu = 3 - 1 = 2$. For 10% significance level we look in the $\chi^2$-distribution table: we find $\chi^2_2(0.10) = 4.605$. The rejection region is thus above $\chi^2 = 4.605$.
Here the obtained value of $\chi^2 = 2.62$ is below the rejection region, so it falls in the acceptance region. The hypothesis is correct: the null hypothesis is accepted.
EXAMPLE #3: The rain fall ($X$) at some place is measured in cm in the following table. We assume that $X$ is a random variable and it follows a Normal distribution with mean $\mu = 50$ and standard deviation $\sigma = 15$.

Class        <35   35-45   45-55   55-65   >65
Obs. Freq.    10      18      28      18    12

(i) Calculate the expected frequencies of the different classes.
(ii) Carry out a $\chi^2$ goodness of fit analysis at the 5% level of significance and test the hypothesis that the random variable $X$ actually follows the Normal distribution $N(50, 15^2)$.

Ans. (i) For $x$ = 35, 45, 55, 65 we have $z = \dfrac{x - \mu}{\sigma}$ = -1, -0.333, 0.333, 1 respectively. Here, total frequency = 86. Now follow the z-table.
For $x < 35$, $z < -1$, we have $P = 0.5 - 0.3413 = 0.1587$. Expected frequency = $86 \times 0.1587 = 13.65$.
For $35 < x < 45$, $-1 < z < -0.333$, we have $P = 0.3413 - 0.1304 = 0.2109$. Expected frequency = $86 \times 0.2109 = 18.14$.
For $45 < x < 55$, $-0.333 < z < 0.333$, we have $P = 2 \times 0.1304 = 0.2608$. Expected frequency = $86 \times 0.2608 = 22.43$.
By symmetry, the expected frequencies for the 4th and 5th groups are 18.14 and 13.65 respectively.
(ii) To carry out the $\chi^2$-test we prepare the following table.

Class     $O$     $E$      $O - E$   $(O - E)^2/E$
<35        10    13.65     -3.65      0.98
35-45      18    18.14     -0.14      0.0
45-55      28    22.43      5.57      1.38
55-65      18    18.14     -0.14      0.0
>65        12    13.65     -1.65      0.2
Total      86    86.01     -0.01      2.56

Here, $\nu = 5 - 1 = 4$. From the $\chi^2$-distribution table, $\chi^2_4(0.05) = 9.488$, for 5% significance level.
Since 2.56 is not in the rejection region, the data follow the Normal distribution: the null hypothesis is accepted.
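The expected frequencies and the $\chi^2$ value of the rainfall example can be recomputed with Python's standard library. This is a sketch under one stated assumption: the mean 50 and standard deviation 15 are the values implied by the quoted z-values (35, 45, 55, 65 correspond to $z$ = -1, -0.333, 0.333, 1). The results differ slightly from the hand calculation because the code does not round the z-table entries.

```python
from statistics import NormalDist

rain = NormalDist(mu=50, sigma=15)  # assumed from the z-values quoted in the example
edges = [35, 45, 55, 65]            # class boundaries in cm
observed = [10, 18, 28, 18, 12]     # classes <35, 35-45, 45-55, 55-65, >65
total = sum(observed)

# Cumulative probabilities at the class boundaries, padded with 0 and 1.
cdf = [0.0] + [rain.cdf(x) for x in edges] + [1.0]
# Probability of each class = difference of successive cumulative values.
expected = [total * (hi - lo) for lo, hi in zip(cdf, cdf[1:])]

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print("expected frequencies:", [round(e, 2) for e in expected])
print("chi-square =", round(chi2, 2))
```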
Additional Information

Type I and Type II errors: In hypothesis testing, we call
Type I Error -> When we incorrectly reject a true Null Hypothesis.
Type II Error -> When we fail to reject a false Null Hypothesis.

Probability Density Function: In probability theory, the probability density function (P.D.F.) of a continuous random variable gives the probability per unit interval around a certain value. The P.D.F., when integrated over a finite interval, gives the probability that the variable lies in that interval: $P(a \leq X \leq b) = \int_a^b f(x)\,dx$, where $f(x)$ is the P.D.F.
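The statement that integrating the P.D.F. over an interval gives a probability can be checked numerically. The sketch below integrates the standard Normal P.D.F. from 0 to 1.96 with the trapezoidal rule; the helper name `integrate` and the number of steps are arbitrary choices made for this illustration.

```python
from statistics import NormalDist

def integrate(f, a, b, steps=10_000):
    """Trapezoidal rule for the integral of f from a to b."""
    h = (b - a) / steps
    total = 0.5 * (f(a) + f(b)) + sum(f(a + i * h) for i in range(1, steps))
    return total * h

z = NormalDist()  # standard Normal: mean 0, standard deviation 1
area = integrate(z.pdf, 0.0, 1.96)
print("area under the P.D.F. from 0 to 1.96 =", round(area, 4))
print("difference of cumulative values      =", round(z.cdf(1.96) - z.cdf(0.0), 4))
```

Both numbers agree with the value 0.475 read from the z-table for the 95% confidence interval.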
*The Lecture notes are for private circulation only. Some of the ideas and examples are taken from books and from numerous materials available on the internet.
Books and Websites:
1. Advanced Level Mathematics: STATISTICS 1 & 2, Steve Dobbs and Jane Miller (Pub: CAMBRIDGE International Examinations)
2. The Analysis of Time Series: An Introduction (fifth edition), C. Chatfield (Pub: Chapman & Hall)
3. Basic & Clinical Biostatistics (fourth edition), Beth Dawson, Robert G. Trapp (Pub: Lange Medical Books/McGraw-Hill)
4. Numerical Recipes (2nd Ed, Vol I FORTRAN), William H. Press, Saul A. Teukolsky, William T. Vetterling, Brian P. Flannery (Pub: Cambridge University Press)
5. Mathematical Physics, H.K. Dass
6. Website: people.richland.edu