2. Introduction to Probability

Department of Economics, University of Essex
18/20 October 2011
EC114 Introduction to Quantitative Economics

Outline

1. Frequency Distributions
2. Basic Probability

Reference: R. L. Thomas, Using Statistics in Economics, McGraw-Hill, 2005, Prerequisites 2 and 3.

Frequency Distributions

Sometimes descriptive statistics, such as the mean and variance, summarise a data set too much. A frequency distribution is a halfway house between the raw data and summary statistics. Table P.4 in Thomas provides a frequency distribution for the data set on clothing expenditure.

For the 10 observations from lecture 1 we can count the number of observations that fall into certain ranges or classes. Recall that the (ordered) observations were:

1572 1666 1743 2111 2401 2651 2806 2848 3201 3567

Suppose we choose classes of width 250 beginning at 1500, i.e. 1500–1749, 1750–1999, etc.

We obtain the following frequency distribution:

Class        Frequency
1500–1749    3
1750–1999    0
2000–2249    1
2250–2499    1
2500–2749    1
2750–2999    2
3000–3249    1
3250–3499    0
3500–3749    1

Note that the class frequencies sum to 10 (the number of data points). The class 1500–1749 is the modal class because it has more observations (3) than any other.

We can also depict the frequency distribution in a histogram:
[Figure: histogram of the frequency distribution. Horizontal axis: annual expenditure on clothing and footwear, 1500 to 3750 in steps of 250. Vertical axis: frequency, 0 to 3.]
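The class counts behind the histogram can be reproduced with a few lines of Python. This is a minimal sketch, not part of the original slides: the function name `frequency_distribution` and its parameters are illustrative assumptions, and each class is treated as the half-open interval from its lower limit up to (but excluding) the next lower limit.

```python
# Ordered clothing-expenditure observations from the slides.
observations = [1572, 1666, 1743, 2111, 2401, 2651, 2806, 2848, 3201, 3567]


def frequency_distribution(data, start, width, n_classes):
    """Count observations in classes [start + k*width, start + (k+1)*width).

    Hypothetical helper written for this illustration.
    """
    counts = [0] * n_classes
    for x in data:
        k = (x - start) // width  # index of the class that contains x
        if 0 <= k < n_classes:
            counts[k] += 1
    return counts


frequencies = frequency_distribution(observations, start=1500, width=250, n_classes=9)

# Print one line per class, labelled as on the slide (e.g. 1500-1749).
for k, f in enumerate(frequencies):
    lower = 1500 + 250 * k
    print(f"{lower}-{lower + 249}: {f}")

# The modal class is the class with the largest frequency.
modal_index = frequencies.index(max(frequencies))
print("Modal class starts at", 1500 + 250 * modal_index)
```

The counts should match the table above, and their sum should equal the number of observations.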
Note that it is the areas of the columns, not their heights, that are proportional to frequency. This matters when the classes are of unequal width, as the following example shows.

Suppose we combine the first two and last two classes:

Class        Frequency
1500–1999    3
2000–2249    1
2250–2499    1
2500–2749    1
2750–2999    2
3000–3249    1
3250–3749    1

The first and last class widths are now twice the size of the others.

Allowing heights to represent the frequencies overstates the larger classes:
[Figure: histogram with the first two and last two classes combined, column heights equal to frequencies. Horizontal axis: annual expenditure on clothing and footwear, 1500 to 3750. Vertical axis: frequency, 0 to 3.]
Allowing areas to represent the frequencies provides a more accurate picture:
[Figure: histogram with the first two and last two classes combined, column areas proportional to frequencies. Horizontal axis: annual expenditure on clothing and footwear, 1500 to 3750. Vertical axis: frequency, 0 to 3.]
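One way to make areas represent frequencies is to divide each frequency by its class width. The sketch below is illustrative only: it rescales heights to "frequency per 250 units of width" (the `standard_width` name and the choice of 250 are assumptions made here so that the unchanged classes keep their original heights).

```python
# Class boundaries after combining the first two and last two classes.
edges = [1500, 2000, 2250, 2500, 2750, 3000, 3250, 3750]
frequencies = [3, 1, 1, 1, 2, 1, 1]

# Assumed reference width: heights are expressed per 250 units of expenditure.
standard_width = 250

heights = []
for k, f in enumerate(frequencies):
    width = edges[k + 1] - edges[k]
    # Column height chosen so that height * width is proportional to f.
    heights.append(f * standard_width / width)

# Recover the frequencies from the areas to confirm the proportionality.
areas = [h * (edges[k + 1] - edges[k]) / standard_width for k, h in enumerate(heights)]

for k, h in enumerate(heights):
    print(f"{edges[k]}-{edges[k + 1] - 1}: frequency {frequencies[k]}, height {h}")
```

On this scale the two double-width classes are drawn at half their frequency, so the tallest column is 2750–2999 rather than the widened first class.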
The shape of the histogram tells us something about the distribution. A symmetric distribution with a single peak has mode = mean = median:
[Figure: symmetric frequency curve. Horizontal axis: X. Vertical axis: freq. The mean, median and mode coincide.]

A distribution that is skewed to the right often has mode < median < mean:
[Figure: right-skewed frequency curve. Horizontal axis: X. Vertical axis: freq. The mode, median and mean are marked.]

Here, there are some large extreme values that skew the distribution.

A distribution that is skewed to the left often has mean < median < mode:
[Figure: left-skewed frequency curve. Horizontal axis: X. Vertical axis: freq. The mode, median and mean are marked.]

Here there are some small extreme values.

It is sometimes useful to replace the frequencies of observations in a distribution by their relative frequencies, particularly when the number of observations is large. The relative frequencies are the proportions of the total number of observations that belong to the various classes. For our data on clothing expenditure we obtain:

Class        Frequency    Relative frequency
1500–1749    3            0.3
1750–1999    0            0.0
2000–2249    1            0.1
2250–2499    1            0.1
2500–2749    1            0.1
2750–2999    2            0.2
3000–3249    1            0.1
3250–3499    0            0.0
3500–3749    1            0.1

Basic Probability

Probabilities

Note that relative frequencies add up to one. In general, if the frequency in a class is f and the number of observations is n, then the relative frequency is f/n. When n becomes very large we call this a probability. For example, we might say that the probability of clothing expenditure being in the range 1500–1749 is 0.3, or Pr(X in class 1500–1749) = 0.3. Note, however, that the number of observations in our example is small. More precisely, we define a probability as a limiting relative frequency.

Suppose an event E has n possibilities/opportunities of occurring but actually occurs f times. The probability of E occurring is the value that the relative frequency f/n approaches as n becomes large. Mathematically, the probability is the limit of f/n as n tends to infinity:

$\Pr(E) = \lim_{n \to \infty} (f/n)$.

Suppose we have a fair six-sided die and we roll it 12 times to obtain the following frequencies:

Outcome               1      2      3      4      5      6
Frequency             2      5      0      2      2      1
Relative frequency    1/6    5/12   0      1/6    1/6    1/12
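The limiting relative-frequency idea can be illustrated by simulating die rolls. This is a rough sketch rather than anything from the slides: the function name `relative_frequencies`, the fixed seed and the chosen numbers of rolls are all assumptions, and a pseudo-random generator only approximates a fair die.

```python
import random
from collections import Counter


def relative_frequencies(n_rolls, seed=0):
    """Simulate n_rolls of a fair six-sided die and return f/n for each face.

    Hypothetical helper; the seed is fixed only to make runs repeatable.
    """
    rng = random.Random(seed)
    counts = Counter(rng.randint(1, 6) for _ in range(n_rolls))
    # Counter returns 0 for faces that never appeared.
    return {face: counts[face] / n_rolls for face in range(1, 7)}


# With few rolls the relative frequencies can be far from 1/6;
# with many rolls they should settle close to it.
for n in (12, 1_000, 100_000):
    freqs = relative_frequencies(n)
    print(n, [round(freqs[face], 3) for face in range(1, 7)])
```

With only 12 rolls, uneven results like those in the table above are unsurprising; the spread around 1/6 shrinks as the number of rolls grows.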
We would expect the relative frequencies to be close to 1/6. In the limit, as the number of rolls increases, each relative frequency would tend to 1/6. Each outcome in this case is equally likely, or has equal probability, e.g.

$\Pr(\text{Roll a 3}) = \lim_{n \to \infty} (\text{Number of 3s}/n) = 1/6$.

Suppose we now roll two six-sided dice; there are 36 possible outcomes, all equally likely. For example, $\Pr(4, 4) = 1/36$ and $\Pr(5, 2) = 1/36$.

Consider two more complicated events:

$E_1$: the sum of the two dice is equal to 10;
$E_2$: at least one of the dice shows a 2.

In the grid of outcomes below (colour-coded on the original slide, $E_1$ in green and $E_2$ in purple), $E_1$ comprises 4,6, 5,5 and 6,4, while $E_2$ comprises the 11 outcomes in the second row and second column:

1,1  1,2  1,3  1,4  1,5  1,6
2,1  2,2  2,3  2,4  2,5  2,6
3,1  3,2  3,3  3,4  3,5  3,6
4,1  4,2  4,3  4,4  4,5  4,6
5,1  5,2  5,3  5,4  5,5  5,6
6,1  6,2  6,3  6,4  6,5  6,6

All outcomes are equally likely and so $\Pr(E_1) = 3/36 = 1/12$ and $\Pr(E_2) = 11/36$.

Mutually Exclusive Events

Note that the events $E_1$ and $E_2$ can't occur at the same time (if one die shows a 2, the sum is at most 8): they are mutually exclusive. The probability of both $E_1$ and $E_2$ occurring is zero:

$\Pr(E_1 \text{ and } E_2) = 0$.

However, the probability of either $E_1$ or $E_2$ occurring is the sum of the individual probabilities:

$\Pr(E_1 \text{ or } E_2) = 3/36 + 11/36 = 14/36$.

Venn diagrams are often used to depict the probabilities of events. For the mutually exclusive events $E_1$ and $E_2$:
[Figure: Venn diagram showing $E_1$ and $E_2$ as two separate, non-overlapping regions.]
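The probabilities for the two-dice events can be checked by listing all 36 outcomes. The sketch below is an illustration under stated assumptions: the helper names `prob`, `e1` and `e2` are invented here, and "one of the dice shows a 2" is read as "at least one die shows a 2", which is the reading consistent with the probability 11/36.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes (first die, second die).
outcomes = list(product(range(1, 7), repeat=2))


def prob(event):
    """Probability of an event as (favourable outcomes) / (all outcomes)."""
    favourable = [o for o in outcomes if event(o)]
    return Fraction(len(favourable), len(outcomes))


def e1(o):
    """The sum of the two dice equals 10."""
    return o[0] + o[1] == 10


def e2(o):
    """At least one die shows a 2."""
    return 2 in o


p_e1 = prob(e1)
p_e2 = prob(e2)
p_both = prob(lambda o: e1(o) and e2(o))
p_either = prob(lambda o: e1(o) or e2(o))

# Fractions are printed in lowest terms, e.g. 3/36 appears as 1/12.
print(p_e1, p_e2, p_both, p_either)
```

Because no outcome belongs to both events, the probability of "either" equals the sum of the two separate probabilities.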
However, suppose we have an event $E_3$: at least one of the dice shows a 6. We find that $\Pr(E_3) = 11/36$ while

$\Pr(E_1 \text{ or } E_3) = 12/36$ and $\Pr(E_1 \text{ and } E_3) = 2/36 = 1/18$.

In this case $E_1$ and $E_3$ are non-mutually exclusive events (the events $E_2$ and $E_3$ are also non-mutually exclusive).

The Venn diagram for $E_1$ and $E_3$ is:
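[Figure: Venn diagram of two overlapping regions, $E_1$ and $E_3$. The part of $E_1$ outside $E_3$ is labelled 1/36, the shaded overlap ($E_1$ and $E_3$) is labelled 2/36, and the part of $E_3$ outside $E_1$ is labelled 9/36.]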
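The three regions of this diagram can be recovered by treating the events as sets of outcomes. This is a small sketch, not from the slides: the names `E1`, `E3` and `prob` are chosen here for illustration, and $E_3$ is taken to mean that at least one die shows a 6.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of rolling two dice.
outcomes = list(product(range(1, 7), repeat=2))

# Events represented as sets of outcomes.
E1 = {o for o in outcomes if sum(o) == 10}  # the sum is 10
E3 = {o for o in outcomes if 6 in o}        # at least one die shows a 6


def prob(event):
    """Probability of a set of equally likely outcomes."""
    return Fraction(len(event), len(outcomes))


only_e1 = prob(E1 - E3)  # in E1 but not in E3
overlap = prob(E1 & E3)  # in both events (the shaded region)
only_e3 = prob(E3 - E1)  # in E3 but not in E1
union = prob(E1 | E3)    # in at least one of the events

# Adding Pr(E1) and Pr(E3) counts the overlap twice, so subtract it once.
addition_rule = prob(E1) + prob(E3) - overlap

print(only_e1, overlap, only_e3, union, addition_rule)
```

The three regions add up to the probability of the union, which is the same value obtained by adding the two event probabilities and subtracting the overlap once.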
Note that the entire area represents $\Pr(E_1 \text{ or } E_3)$ while the shaded area represents $\Pr(E_1 \text{ and } E_3)$.

The diagram shows that we can't simply write

$\Pr(E_1 \text{ or } E_3) = \Pr(E_1) + \Pr(E_3)$

because the shaded area would be included twice. However, we can write

$\Pr(E_1 \text{ or } E_3) = \Pr(E_1) + \Pr(E_3) - \Pr(E_1 \text{ and } E_3)$.

We know that $\Pr(E_1) = 3/36$, $\Pr(E_3) = 11/36$ and $\Pr(E_1 \text{ and } E_3) = 2/36$, so that

$\Pr(E_1 \text{ or } E_3) = 3/36 + 11/36 - 2/36 = 12/36 = 1/3$.

Conditional Probabilities

Sometimes we are interested in the probability of an event given that we know another event has already occurred. This is known as conditional probability. Suppose we wish to find the probability that the sum of the dice is 10 given that one of the dice is a 6. This is the probability of $E_1$ given that $E_3$ has occurred. We write

$\Pr(E_1 \mid E_3) = \frac{\Pr(E_1 \text{ and } E_3)}{\Pr(E_3)}$,

i.e. the probability of both events occurring as a proportion of the probability of $E_3$ occurring. We therefore find that

$\Pr(E_1 \mid E_3) = \frac{2/36}{11/36} = 2/11$.

By reversing the roles of $E_1$ and $E_3$ we obtain:

$\Pr(E_3 \mid E_1) = \frac{\Pr(E_3 \text{ and } E_1)}{\Pr(E_1)}$.

We find that

$\Pr(E_3 \mid E_1) = \frac{2/36}{3/36} = 2/3$,

which makes sense because two of the three outcomes in $E_1$ coincide with $E_3$. We can use the formula for conditional probability to obtain an expression for $\Pr(E_1 \text{ and } E_3)$:

$\Pr(E_1 \text{ and } E_3) = \Pr(E_1 \mid E_3)\Pr(E_3) = \Pr(E_3 \mid E_1)\Pr(E_1)$.

Independent Events

An important concept in statistics is that of independence.
Two events are said to be independent if, and only if, the occurrence or non-occurrence of one has no influence on the probability of occurrence of the other. For example, with a fair die, each roll of the die is independent of all others. For independent events $E_x$ and $E_y$ it must be the case that

$\Pr(E_x \mid E_y) = \Pr(E_x)$ and $\Pr(E_y \mid E_x) = \Pr(E_y)$.

It is also true that, for independent events,

$\Pr(E_x \text{ and } E_y) = \Pr(E_x)\Pr(E_y)$.

This generalises to more than two independent events:

$\Pr(E_{x_1} \text{ and } E_{x_2} \text{ and } \ldots \text{ and } E_{x_n}) = \Pr(E_{x_1})\Pr(E_{x_2}) \cdots \Pr(E_{x_n})$.

Summary

Frequency distributions and histograms.
Relative frequencies and probabilities.
Venn diagrams; mutually exclusive events; conditional probabilities; independence.

Next week: Discrete probability distributions.
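As a supplementary check on the conditional probability and independence formulas covered in this lecture, the two-dice example can be enumerated directly. This is a sketch under assumptions: the helpers `prob`, `conditional` and `independent` are hypothetical names, and the events `first_is_6` and `second_is_6` are introduced here purely for illustration (they do not appear in the slides).

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes (first die, second die).
outcomes = list(product(range(1, 7), repeat=2))


def prob(event):
    """Probability of a set of equally likely outcomes."""
    return Fraction(len(event), len(outcomes))


def conditional(a, b):
    """Pr(a | b) = Pr(a and b) / Pr(b); assumes Pr(b) > 0."""
    return prob(a & b) / prob(b)


def independent(a, b):
    """Check the product rule Pr(a and b) == Pr(a) * Pr(b)."""
    return prob(a & b) == prob(a) * prob(b)


E1 = {o for o in outcomes if sum(o) == 10}  # the sum is 10
E3 = {o for o in outcomes if 6 in o}        # at least one die shows a 6

# Illustrative events (not in the slides): each concerns one die only.
first_is_6 = {o for o in outcomes if o[0] == 6}
second_is_6 = {o for o in outcomes if o[1] == 6}

p_e1_given_e3 = conditional(E1, E3)
p_e3_given_e1 = conditional(E3, E1)

print(p_e1_given_e3, p_e3_given_e1)
print(independent(E1, E3), independent(first_is_6, second_is_6))
```

Knowing that a 6 was rolled changes the probability that the sum is 10, so $E_1$ and $E_3$ fail the product rule, whereas the results of the two separate dice satisfy it.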