
Lab 5: Probability density and cumulative distributions

1 Introduction
The purpose of this lab is to become familiar with the concepts of probability density, cumulative distribution, mean, and median. In particular, we will study the empirical data from a set of Bernoulli trials, which should fit a binomial distribution. We will plot the probability density and the cumulative distribution as bar graphs, and then compute the mean and median from these bar graphs. The mean $\bar{x}$ of a random variable whose possible values $x_i$ have probabilities $p(x_i)$ is defined by $\bar{x} = \sum_i x_i\, p(x_i)$. The median $m$ of an (unordered) list of numbers is the middle number after ordering. For instance, if the ordered list is 1, 2, 2, 2, 3, 3, 4, then the middle number (median) is 2. Thus, the median of the related random variable is the first $x$ for which the cumulative distribution function reaches or exceeds 0.5. Convince yourself that this is so! (When there is an even number of data points, the convention is to use the average of the middle two; in this lab the two middle values always turn out to be equal, so the convention never changes the answer.)
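Both definitions can be checked numerically on the example list. The following is a minimal Python sketch, not part of the lab's spreadsheet; the function names `pmf_from_list`, `mean_from_pmf`, and `median_from_cdf` are hypothetical. It builds $p(x)$ from the list 1, 2, 2, 2, 3, 3, 4, computes $\bar{x} = \sum_i x_i\, p(x_i)$, and returns the first $x$ at which the running cumulative probability reaches 0.5.

```python
from collections import Counter
from fractions import Fraction


def pmf_from_list(values):
    """Empirical probability p(x): the fraction of the list equal to each value x."""
    counts = Counter(values)
    n = len(values)
    return {x: Fraction(counts[x], n) for x in sorted(counts)}


def mean_from_pmf(pmf):
    """Mean: the sum of x * p(x) over all possible values x."""
    return sum(x * p for x, p in pmf.items())


def median_from_cdf(pmf):
    """Median: the first x (in increasing order) whose cumulative probability F(x) >= 1/2."""
    cumulative = Fraction(0)
    for x in sorted(pmf):
        cumulative += pmf[x]
        if cumulative >= Fraction(1, 2):
            return x
    raise ValueError("probabilities do not reach 1/2")


values = [1, 2, 2, 2, 3, 3, 4]
pmf = pmf_from_list(values)
print(mean_from_pmf(pmf))    # 17/7
print(median_from_cdf(pmf))  # 2
```

Exact fractions are used so that a cumulative value landing exactly on 0.5 is not disturbed by floating-point rounding. For this list, $F(1) = 1/7$ and $F(2) = 4/7 \ge 1/2$, so the rule picks 2, the same value as the middle of the ordered list.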

2 Problems

2.1 Problem Set 1


Given below is the distribution of the number of heads obtained by a group of people in an experiment in which each person tossed a fair coin ten times. Let $x$ be the number of heads obtained out of ten tosses, and let $N(x)$ be the number of people who got $x$ heads.

Use the data in Table 1 to plot a bar graph of $p(x)$, i.e., the empirical probability distribution of obtaining $x$ heads in 10 tosses. Center the bars at the integers. (See the note at the bottom for plotting.) Plot the cumulative distribution (of obtaining up to $x$ heads) for the empirical data. Technically this should be a step function, but step functions are difficult to display using Mathsheet, so we suggest using a bar graph. Use the spreadsheet to calculate the expected number of heads based on the same empirical data. Find the median number of heads of the empirical data. Finally, plot a bar graph of the theoretical probability distribution in the same way. Submit one graph containing these three bar graphs, and write the values you found for the mean and the median of the empirical data at the bottom of the page. Also, please explain how you computed the median.


x      0   1   2   3   4   5   6   7   8   9  10
N(x)   0   1   8  11  20  36  25  13   8   2   0


Table 1: Problem Set 1. Of 124 people, $N(x)$ tossed $x$ heads.

Note: It is somewhat tricky to display all three of these bar graphs without having them cover each other. One possible solution is to plot the cumulative distribution first (filled), with the empirical probability distribution filled on top. Finally, the theoretical probability distribution can be plotted over everything, but with the fill setting off so that the other graphs show through.
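For a fair coin tossed ten times, the theoretical probability of exactly $x$ heads is the binomial probability $p(x) = \binom{10}{x} / 2^{10}$, and multiplying by the group size gives the theoretically expected number of people with $x$ heads. A short Python sketch using the standard library's `math.comb`; the helper name `binomial_pmf` is our own, and the group size of 124 is taken from Table 1:

```python
from fractions import Fraction
from math import comb


def binomial_pmf(n, x):
    """Probability of exactly x heads in n tosses of a fair coin: C(n, x) / 2**n."""
    return Fraction(comb(n, x), 2 ** n)


n_tosses = 10
group_size = 124  # number of people in Problem Set 1

for x in range(n_tosses + 1):
    p = binomial_pmf(n_tosses, x)
    expected_people = group_size * p  # theoretically expected N(x)
    print(f"x={x:2d}  p(x)={float(p):.4f}  expected N(x)={float(expected_people):.2f}")
```

The eleven probabilities sum to exactly 1, and the largest one, $p(5) = 252/1024 \approx 0.246$, sits at the center of the distribution.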


2.1.1 Solution to Problem Set 1



Figure 1: Solution for problem set 1.

The median number of heads is 5 (the first $x$ for which the cumulative distribution function exceeds 0.5), and the expected number of heads (also called the mean number of heads) is calculated by the spreadsheet as $\bar{x} = 5.1$. Here's how the data could be organized. Column 1 contains the number $x$ of heads, from 0 to 10. Column 2 contains the number of people who got $x$ heads. Column 3 contains the theoretically expected number of people who got $x$ heads. Column 4 totals up the size of the group of people (there were 124 people in all). The numbers in column 5 are then the fraction $N(x)/124$ of the group that got a given number of heads; this is the probability $p(x)$ of getting $x$ heads. Column 6 computes the cumulative function $F(x)$. Column 7 computes $x\,p(x)$, and column 8 then sums these up to get the mean value of $x$ for the probability density. We see that the mean value is $\bar{x} = 5.1$. Figure 1 then shows the graphs of $p(x)$, $F(x)$, and the theoretical distribution on the same graph. From this graph, we can read off the median value: we locate the $x$ at which the cumulative distribution function (blue curve) exceeds 0.5 for the first time. This is at $m = 5$.
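The spreadsheet layout described above can be mirrored in a few lines of Python. This is a sketch, not the lab's Mathsheet file: the comments map each list to the column it imitates, and the counts are those of Table 1.

```python
from itertools import accumulate
from math import comb

# Column 1: number of heads x; Column 2: number of people N(x) (Table 1 data).
heads = list(range(11))
people = [0, 1, 8, 11, 20, 36, 25, 13, 8, 2, 0]

# Column 4: size of the group.
total = sum(people)

# Column 3: theoretically expected number of people, total * C(10, x) / 2**10.
expected = [total * comb(10, x) / 2 ** 10 for x in heads]

# Column 5: empirical probability p(x) = N(x) / total.
p = [n / total for n in people]

# Column 6: cumulative distribution F(x) = p(0) + p(1) + ... + p(x).
F = list(accumulate(p))

# Column 7: x * p(x); Column 8: their sum, which is the mean.
xp = [x * px for x, px in zip(heads, p)]
mean = sum(xp)

# Median: the first x at which F(x) reaches 0.5.
median = next(x for x, Fx in zip(heads, F) if Fx >= 0.5)

for row in zip(heads, people, expected, p, F, xp):
    print("{:2d} {:3d} {:7.2f} {:6.3f} {:6.3f} {:6.3f}".format(*row))
print("total:", total, "mean:", round(mean, 2), "median:", median)
```

The last line reports a group of 124 people, a mean of about 5.10 (exactly $633/124$), and a median of 5, in agreement with the values read from the spreadsheet.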


2.2 Problem Set 2


Given below is the distribution of the number of heads obtained by a group of people in an experiment in which each person tossed a fair coin ten times. Let $x$ be the number of heads obtained out of ten tosses, and let $N(x)$ be the number of people who got $x$ heads.

x      0   1   2   3   4   5   6   7   8   9  10
N(x)   0   1   3  13  16  14  20   9   5   1   0


Table 2: Problem Set 2. Of 82 people, $N(x)$ tossed $x$ heads.

Use the above data to plot a bar graph of $p(x)$, i.e., the empirical probability distribution of obtaining $x$ heads in 10 tosses. Center the bars at the integers. (See the note at the bottom for plotting.) Plot the cumulative distribution (of obtaining up to $x$ heads) for the empirical data. Technically this should be a step function, but step functions are difficult to display using Mathsheet, so we suggest using a bar graph. Use the spreadsheet to calculate the expected number of heads based on the same empirical data. Find the median number of heads of the empirical data. Finally, plot a bar graph of the theoretical probability distribution in the same way. Submit one graph containing these three bar graphs, and write the values you found for the mean and the median of the empirical data at the bottom of the page. Also, please explain how you computed the median.

Note: It is somewhat tricky to display all three of these bar graphs without having them cover each other. One possible solution is to plot the cumulative distribution first (filled), with the empirical probability distribution filled on top. Finally, the theoretical probability distribution can be plotted over everything, but with the fill setting off so that the other graphs show through.
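The remark that the cumulative distribution is technically a step function means that $F(t)$ is defined for every real $t$, not only at the integers: $F(t)$ is the fraction of people with at most $t$ heads, so it stays flat between integers and jumps at each integer. A minimal Python sketch using the Table 2 counts; the helper name `cdf_at` is hypothetical:

```python
from bisect import bisect_right
from itertools import accumulate

heads = list(range(11))
people = [0, 1, 3, 13, 16, 14, 20, 9, 5, 1, 0]  # Table 2 counts
total = sum(people)
cumulative_counts = list(accumulate(people))  # people with at most x heads


def cdf_at(t):
    """Step-function CDF: fraction of people with at most t heads, for any real t."""
    k = bisect_right(heads, t)  # number of integer values x with x <= t
    return 0.0 if k == 0 else cumulative_counts[k - 1] / total


for t in (3.0, 3.5, 3.99, 4.0):
    print(t, round(cdf_at(t), 3))
```

Here $F(3)$, $F(3.5)$, and $F(3.99)$ all equal $17/82$, and the value jumps to $33/82$ at $t = 4$; a bar graph centered at the integers shows only the values at the jumps.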


2.2.1 Solution to Problem Set 2



Figure 2: Solution for problem set 2.

The median number of heads is 5 (the first $x$ for which the cumulative distribution function exceeds 0.5), and the expected number of heads (also called the mean number of heads) is calculated by the spreadsheet as $\bar{x} = 5.02$. Here's how the data could be organized. Column 1 contains the number $x$ of heads, from 0 to 10. Column 2 contains the number of people who got $x$ heads. Column 3 contains the theoretically expected number of people who got $x$ heads. Column 4 totals up the size of the group of people (there were 82 people in all). The numbers in column 5 are then the fraction $N(x)/82$ of the group that got a given number of heads; this is the probability $p(x)$ of getting $x$ heads. Column 6 computes the cumulative function $F(x)$. Column 7 computes $x\,p(x)$, and column 8 then sums these up to get the mean value of $x$ for the probability density. We see that the mean value is $\bar{x} = 5.02$. Figure 2 then shows the graphs of $p(x)$, $F(x)$, and the theoretical distribution on the same graph. From this graph, we can read off the median value: we locate the $x$ at which the cumulative distribution function (blue curve) exceeds 0.5 for the first time. This is at $m = 5$.
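The median can also be found directly from its definition as the middle of the ordered list, which is a useful check on the value read from the cumulative distribution. Problem Set 2 has an even number of people (82), so the convention is to average the 41st and 42nd ordered values. A minimal Python sketch using the standard library's `statistics.median`, which applies that averaging convention:

```python
import statistics

heads = range(11)
people = [0, 1, 3, 13, 16, 14, 20, 9, 5, 1, 0]  # Table 2 counts

# Expand the table into one entry per person; the result is already in increasing order.
ordered = [x for x, n in zip(heads, people) for _ in range(n)]

n = len(ordered)                                    # 82 people, an even count
middle_two = (ordered[n // 2 - 1], ordered[n // 2])  # the 41st and 42nd values
print(n, middle_two, statistics.median(ordered))
```

Both middle values are 5, so averaging them still gives 5, matching the first $x$ at which $F(x)$ exceeds 0.5 (since $F(4) = 33/82$ and $F(5) = 47/82$).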


2.3 Problem Set 3


Given below is the distribution of the number of heads obtained by a group of people in an experiment in which each person tossed a fair coin ten times. Let $x$ be the number of heads obtained out of ten tosses, and let $N(x)$ be the number of people who got $x$ heads.

x      0   1   2   3   4   5   6   7   8   9  10
N(x)   0   1   9  17  25  37  20  14   6   1   0


Table 3: Problem Set 3. Of 130 people, $N(x)$ tossed $x$ heads.

Use the above data to plot a bar graph of $p(x)$, i.e., the empirical probability distribution of obtaining $x$ heads in 10 tosses. Center the bars at the integers. (See the note at the bottom for plotting.) Plot the cumulative distribution (of obtaining up to $x$ heads) for the empirical data. Technically this should be a step function, but step functions are difficult to display using Mathsheet, so we suggest using a bar graph. Use the spreadsheet to calculate the expected number of heads based on the same empirical data. Find the median number of heads of the empirical data. Finally, plot a bar graph of the theoretical probability distribution in the same way. Submit one graph containing these three bar graphs, and write the values you found for the mean and the median of the empirical data at the bottom of the page. Also, please explain how you computed the median.

Note: It is somewhat tricky to display all three of these bar graphs without having them cover each other. One possible solution is to plot the cumulative distribution first (filled), with the empirical probability distribution filled on top. Finally, the theoretical probability distribution can be plotted over everything, but with the fill setting off so that the other graphs show through.


2.3.1 Solution to Problem Set 3



Figure 3: Solution for problem set 3.

The median number of heads is 5 (the first $x$ for which the cumulative distribution function exceeds 0.5), and the expected number of heads (also called the mean number of heads) is calculated by the spreadsheet as $\bar{x} = 4.85$. Here's how the data could be organized. Column 1 contains the number $x$ of heads, from 0 to 10. Column 2 contains the number of people who got $x$ heads. Column 3 contains the theoretically expected number of people who got $x$ heads. Column 4 totals up the size of the group of people (there were 130 people in all). The numbers in column 5 are then the fraction $N(x)/130$ of the group that got a given number of heads; this is the probability $p(x)$ of getting $x$ heads. Column 6 computes the cumulative function $F(x)$. Column 7 computes $x\,p(x)$, and column 8 then sums these up to get the mean value of $x$ for the probability density. We see that the mean value is $\bar{x} = 4.85$. Figure 3 then shows the graphs of $p(x)$, $F(x)$, and the theoretical distribution on the same graph. From this graph, we can read off the median value: we locate the $x$ at which the cumulative distribution function (blue curve) exceeds 0.5 for the first time. This is at $m = 5$.
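For comparison with the empirical mean, the mean of the theoretical distribution can be computed from the same formula $\bar{x} = \sum_x x\,p(x)$ using the binomial probabilities; for ten fair tosses it comes out to exactly 5 (in general, $np$ for $n$ tosses with head probability $p$). A short exact-arithmetic Python sketch, with the Table 3 counts on the empirical side:

```python
from fractions import Fraction
from math import comb

heads = range(11)
people = [0, 1, 9, 17, 25, 37, 20, 14, 6, 1, 0]  # Table 3 counts
total = sum(people)

# Theoretical mean: sum of x * C(10, x) / 2**10 over x = 0..10.
theoretical_mean = sum(Fraction(x * comb(10, x), 2 ** 10) for x in heads)

# Empirical mean: sum of x * N(x), divided by the group size.
empirical_mean = Fraction(sum(x * n for x, n in zip(heads, people)), total)

print(theoretical_mean, float(empirical_mean))
```

The empirical mean is $630/130 = 63/13 \approx 4.85$, the value reported by the spreadsheet, compared with the theoretical value of 5.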


2.4 Problem Set 4


Given below is the distribution of the number of heads obtained by a group of people in an experiment in which each person tossed a fair coin ten times. Let $x$ be the number of heads obtained out of ten tosses, and let $N(x)$ be the number of people who got $x$ heads.

x      0   1   2   3   4   5   6   7   8   9  10
N(x)   0   0   1   8  35  17  20  14   3   1   0


Table 4: Problem Set 4. Of 99 people, $N(x)$ tossed $x$ heads.

Use the above data to plot a bar graph of $p(x)$, i.e., the empirical probability distribution of obtaining $x$ heads in 10 tosses. Center the bars at the integers. (See the note at the bottom for plotting.) Plot the cumulative distribution (of obtaining up to $x$ heads) for the empirical data. Technically this should be a step function, but step functions are difficult to display using Mathsheet, so we suggest using a bar graph. Use the spreadsheet to calculate the expected number of heads based on the same empirical data. Find the median number of heads of the empirical data. Finally, plot a bar graph of the theoretical probability distribution in the same way. Submit one graph containing these three bar graphs, and write the values you found for the mean and the median of the empirical data at the bottom of the page. Also, please explain how you computed the median.

Note: It is somewhat tricky to display all three of these bar graphs without having them cover each other. One possible solution is to plot the cumulative distribution first (filled), with the empirical probability distribution filled on top. Finally, the theoretical probability distribution can be plotted over everything, but with the fill setting off so that the other graphs show through.


2.4.1 Solution to Problem Set 4



Figure 4: Solution for problem set 4.

The median number of heads is 5 (the first $x$ for which the cumulative distribution function exceeds 0.5), and the expected number of heads (also called the mean number of heads) is calculated by the spreadsheet as $\bar{x} = 5.07$. Here's how the data could be organized. Column 1 contains the number $x$ of heads, from 0 to 10. Column 2 contains the number of people who got $x$ heads. Column 3 contains the theoretically expected number of people who got $x$ heads. Column 4 totals up the size of the group of people (there were 99 people in all). The numbers in column 5 are then the fraction $N(x)/99$ of the group that got a given number of heads; this is the probability $p(x)$ of getting $x$ heads. Column 6 computes the cumulative function $F(x)$. Column 7 computes $x\,p(x)$, and column 8 then sums these up to get the mean value of $x$ for the probability density. We see that the mean value is $\bar{x} = 5.07$. Figure 4 then shows the graphs of $p(x)$, $F(x)$, and the theoretical distribution on the same graph. From this graph, we can read off the median value: we locate the $x$ at which the cumulative distribution function (blue curve) exceeds 0.5 for the first time. This is at $m = 5$.


2.5 Problem Set 5


Given below is the distribution of the number of heads obtained by a group of people in an experiment in which each person tossed a fair coin ten times. Let $x$ be the number of heads obtained out of ten tosses, and let $N(x)$ be the number of people who got $x$ heads.

x      0   1   2   3   4   5   6   7   8   9  10
N(x)   0   2   1  17  15  31  28  10   2   1   0


Table 5: Problem Set 5. Of 107 people, $N(x)$ tossed $x$ heads.

Use the above data to plot a bar graph of $p(x)$, i.e., the empirical probability distribution of obtaining $x$ heads in 10 tosses. Center the bars at the integers. (See the note at the bottom for plotting.) Plot the cumulative distribution (of obtaining up to $x$ heads) for the empirical data. Technically this should be a step function, but step functions are difficult to display using Mathsheet, so we suggest using a bar graph. Use the spreadsheet to calculate the expected number of heads based on the same empirical data. Find the median number of heads of the empirical data. Finally, plot a bar graph of the theoretical probability distribution in the same way. Submit one graph containing these three bar graphs, and write the values you found for the mean and the median of the empirical data at the bottom of the page. Also, please explain how you computed the median.

Note: It is somewhat tricky to display all three of these bar graphs without having them cover each other. One possible solution is to plot the cumulative distribution first (filled), with the empirical probability distribution filled on top. Finally, the theoretical probability distribution can be plotted over everything, but with the fill setting off so that the other graphs show through.


2.5.1 Solution to Problem Set 5



Figure 5: Solution for problem set 5.

The median number of heads is 5 (the first $x$ for which the cumulative distribution function exceeds 0.5), and the expected number of heads (also called the mean number of heads) is calculated by the spreadsheet as $\bar{x} = 4.98$. Here's how the data could be organized. Column 1 contains the number $x$ of heads, from 0 to 10. Column 2 contains the number of people who got $x$ heads. Column 3 contains the theoretically expected number of people who got $x$ heads. Column 4 totals up the size of the group of people (there were 107 people in all). The numbers in column 5 are then the fraction $N(x)/107$ of the group that got a given number of heads; this is the probability $p(x)$ of getting $x$ heads. Column 6 computes the cumulative function $F(x)$. Column 7 computes $x\,p(x)$, and column 8 then sums these up to get the mean value of $x$ for the probability density. We see that the mean value is $\bar{x} = 4.98$. Figure 5 then shows the graphs of $p(x)$, $F(x)$, and the theoretical distribution on the same graph. From this graph, we can read off the median value: we locate the $x$ at which the cumulative distribution function (blue curve) exceeds 0.5 for the first time. This is at $m = 5$.
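Because the five problem sets differ only in their data, the group sizes, means, and medians stated in the solutions can be cross-checked together. A minimal Python sketch that recomputes them from Tables 1 to 5; the `TABLES` dictionary and the `summarize` helper are our own illustrative names:

```python
from itertools import accumulate

TABLES = {
    1: [0, 1, 8, 11, 20, 36, 25, 13, 8, 2, 0],
    2: [0, 1, 3, 13, 16, 14, 20, 9, 5, 1, 0],
    3: [0, 1, 9, 17, 25, 37, 20, 14, 6, 1, 0],
    4: [0, 0, 1, 8, 35, 17, 20, 14, 3, 1, 0],
    5: [0, 2, 1, 17, 15, 31, 28, 10, 2, 1, 0],
}


def summarize(counts):
    """Return (group size, mean number of heads, median number of heads)."""
    total = sum(counts)
    mean = sum(x * n for x, n in enumerate(counts)) / total
    # Median: the first x whose cumulative count reaches half the group
    # (integer comparison 2 * c >= total avoids floating-point rounding).
    median = next(x for x, c in enumerate(accumulate(counts)) if 2 * c >= total)
    return total, mean, median


for problem_set, counts in TABLES.items():
    total, mean, median = summarize(counts)
    print(f"Problem Set {problem_set}: {total} people, mean {mean:.2f}, median {median}")
```

Each line agrees with the corresponding solution: 124, 82, 130, 99, and 107 people, means of 5.10, 5.02, 4.85, 5.07, and 4.98, and a median of 5 in every case.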

