
Soda Comparison Lab

Alejandra Zepeda Stats 1510 (Day)

Abstract: UNDER CONSTRUCTION.

Introduction: A regular consumer of Diet Pepsi noticed that his 12 oz Diet Pepsi can seemed to contain less soda than the Diet Coke and Coke Zero cans his friends were drinking. This group of consumers decided to find out which company does a better job of filling its soda cans with exactly 12 oz: the Coca-Cola Company or the Pepsi Company. A study was designed to compare the weights of the three types of soda. The populations of interest are Diet Pepsi, Coke Zero, and Diet Coke cans; the cans available for sampling were 96 Diet Pepsi cans, 72 Coke Zero cans, and 85 Diet Coke cans. The variable of interest is the weight, in grams, of each sampled Diet Pepsi, Coke Zero, and Diet Coke can.

Hypothesis: The Coca-Cola Company does a better job of filling Diet Coke cans that are labeled 12 oz.

Methods: An Apple application called TC-Stats was used for all of the data analysis. It performs the following functions: summary statistics, histograms, box-and-whiskers plots, scatter plots, frequency distribution tables, probability calculations based on the binomial, standard and nonstandard normal, t, F, and chi-squared distributions...sample size calculations for inferences regarding a population mean or proportion. TC-Stats can also import files in CSV format and has Dropbox support. The sampling technique used to represent each population was the simple random sample. First, 96 Diet Pepsi cans were bought, numbered from 1 to 96, and placed on a table, and a sample of 15 numbers was randomly generated using TC-Stats. Second, 72 Coke Zero cans were bought, numbered from 1 to 72, and placed on a table, and a sample of 15 numbers was randomly generated using TC-Stats. Third, 85 Diet Coke cans were bought, numbered from 1 to 85, and placed on a table, and a sample of 15 numbers was randomly generated using TC-Stats.
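In the lab, TC-Stats generated the random can numbers. The same idea can be sketched in Python, assuming the 15 numbers are drawn without replacement so that 15 distinct cans are selected. The function name `simple_random_sample` and its optional `seed` parameter are our own illustration, not part of TC-Stats; `random.sample` is Python's standard routine for drawing distinct items uniformly at random.

```python
import random

def simple_random_sample(population_size, n, seed=None):
    """Draw n distinct can numbers from 1..population_size without replacement.

    Every subset of size n is equally likely, which is the defining
    property of a simple random sample. (Hypothetical helper, not TC-Stats.)
    """
    if not 0 <= n <= population_size:
        raise ValueError("sample size must be between 0 and the population size")
    rng = random.Random(seed)  # an optional seed makes the draw reproducible
    return sorted(rng.sample(range(1, population_size + 1), n))

# Population sizes from this lab: cans numbered 1-96, 1-72 and 1-85.
populations = {"Diet Pepsi": 96, "Coke Zero": 72, "Diet Coke": 85}
for soda, size in populations.items():
    print(soda, simple_random_sample(size, 15))
```

Sorting the selected numbers is only a convenience for finding the cans on the table; it does not affect the randomness of the draw.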
The measurement scale for weight in grams is ratio, and the sample number selected from each population is nominal. The simple random sample technique suits this data collection because every possible sample of size n from the population has an equal probability of being chosen. In this case, TC-Stats was used to generate the randomized sample for each soda type, and an electric scale was used to measure the weight of each can in grams. The sample size is n = 15 for each population, and the devices used to generate the numbers and to weigh the cans were kept the same for all three samples.

First, to obtain the sample of 15 Diet Pepsi cans, the lower bound was set to 1 and the upper bound to 96, and the generator was set to produce 15 numbers. Once the sample numbers were obtained, we went to the table where the numbered Diet Pepsi cans were placed and took the selected cans to be weighed. Second, to obtain the sample of 15 Coke Zero cans, the lower bound was set to 1 and the upper bound to 72, and the generator was again set to produce 15 numbers; the selected Coke Zero cans were then taken from the table to be weighed. Third, to obtain the sample of 15 Diet Coke cans, the lower bound was set to 1 and the upper bound to 85, and the generator was set to produce 15 numbers; the selected Diet Coke cans were then taken from the table to be weighed.

Figure 1:

Summary Statistics: Figure 1 above shows that the sample size (n) is 15 for all three samples. In this table, summary statistics for the can weights have been calculated separately for each soda. With this information we can compare the weights, construct a five-number summary, draw box plots and histograms for each sample, and make inferences about what the data mean. Figure 1 displays the sum, mean, population standard deviation, sample standard deviation, minimum, first quartile, median, third quartile, and maximum of the weights for each sample. The mean is the balancing point, or center, of the data on the number line, and the median is the middle value when the data are sorted in order. Quartiles are measures of position that divide the sorted data into four groups, each containing at most 25% of the data. At most 25% of the data fall below the first quartile and at most 75% lie above it; at most 75% of the data fall below the third quartile and at most 25% lie above it. The five-number summary for each sample, below, helps make this concrete.

Five-number summary (weight in grams):
Diet Coke: Min 367.92, Q1 371.13, Median 373.19, Q3 374.99, Max 375.35
Coke Zero: Min 368.46, Q1 370.04, Median 372.25, Q3 373.6, Max 374.53
Diet Pepsi: Min 366.5, Q1 368.61, Median 369.5, Q3 370.34, Max 370.67
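The five-number summary can be computed from the raw weights with Python's standard `statistics` module. This is a sketch, not TC-Stats's method: it assumes the `exclusive` quartile rule of `statistics.quantiles`, and because statistics programs use different quartile rules, the Q1 and Q3 it returns may differ slightly from the values in Figure 1. The helper name `five_number_summary` is hypothetical.

```python
import statistics

def five_number_summary(weights):
    """Return (min, Q1, median, Q3, max) for a list of weights.

    Quartiles come from statistics.quantiles with method="exclusive";
    other software, TC-Stats included, may use a different quartile rule.
    """
    data = sorted(weights)
    # The three cut points split the sorted data into four groups;
    # the middle cut point is the median.
    q1, median, q3 = statistics.quantiles(data, n=4, method="exclusive")
    return data[0], q1, median, q3, data[-1]
```

Calling `five_number_summary` on a list of the 15 recorded Diet Pepsi weights (not reproduced in this report) should give values close to the Diet Pepsi line above.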

The five-number summary lets us construct a box-and-whiskers plot for each variable of interest. Box-and-whiskers plots display measures of position: they show the location of the quartiles and the median and point out potential outliers. Figure 2 shows the data as stacked box plots so that the distributions can be compared. The weight of Diet Pepsi appears skewed left, the weight of Coke Zero appears fairly bimodal, and the weight of Diet Coke appears skewed left. The box plots for Diet Pepsi and Diet Coke suggest that there might be an outlier in those data.

Figure 2:
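One common way to decide whether a box plot shows an outlier is the 1.5 × IQR rule: a value is flagged if it lies more than 1.5 interquartile ranges below Q1 or above Q3. We do not know which rule TC-Stats uses, so the sketch below is only an illustration. It works from the five-number summaries reported above rather than from the raw data, and the names `iqr_fences` and `check_extremes` are hypothetical.

```python
# Five-number summaries (grams) as reported above: (min, Q1, median, Q3, max).
SUMMARIES = {
    "Diet Coke": (367.92, 371.13, 373.19, 374.99, 375.35),
    "Coke Zero": (368.46, 370.04, 372.25, 373.6, 374.53),
    "Diet Pepsi": (366.5, 368.61, 369.5, 370.34, 370.67),
}

def iqr_fences(q1, q3, k=1.5):
    """Return (lower, upper) outlier fences using the k * IQR rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def check_extremes(summaries, k=1.5):
    """Report IQR, range, fences, and whether min or max lies outside the fences.

    Every observation lies between min and max, so if both extremes are
    inside the fences, no point in that sample is flagged by this rule.
    """
    report = {}
    for soda, (low, q1, _median, q3, high) in summaries.items():
        lower, upper = iqr_fences(q1, q3, k)
        report[soda] = {
            "iqr": round(q3 - q1, 2),
            "range": round(high - low, 2),
            "fences": (round(lower, 3), round(upper, 3)),
            "flagged": low < lower or high > upper,
        }
    return report

for soda, info in check_extremes(SUMMARIES).items():
    print(soda, info)
```

With the quartiles listed above, neither the minimum nor the maximum of any sample falls outside its fences (for Diet Pepsi the fences are 366.015 and 372.935 grams, while the observed weights run from 366.5 to 370.67 grams), so this particular rule would not flag an outlier. The ranges it computes, 4.17, 6.07, and 7.43 grams, match the ranges quoted in this report.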

To get a better visual idea of the distribution of Diet Pepsi weights, look at Figure 4, a histogram generated by TC-Stats. Before constructing the histogram, a frequency table was constructed (Figure 3). The classes start at the lowest value, 366.5 grams, and have a width of 0.53 grams. Starting at the lowest value is the natural choice, and a width of 0.53 grams was chosen because it shows the left skew. For each sample, the same class width is used for both the frequency table and the histogram. A frequency table shows how many data values fall in each class (the frequency) and that count relative to the size of the data set (the relative frequency). The cumulative frequency of a class is the sum of the frequencies of that class and all classes below it, and the cumulative relative frequency is the cumulative frequency divided by the sample size. Figure 3 shows how often Diet Pepsi weights in each class were observed.

Figure 3:

In the histograms below, the y-axis shows the frequency and the x-axis shows the weight in grams. The histogram in Figure 4, based on the Diet Pepsi data, looks fairly skewed left, as the box plot in Figure 2 also suggested. It does not display a perfectly left-skewed distribution, but if we trace a curve over the tops of the bars, the shape becomes visible: the distribution is clumped on the right and tails off to the left.

Because the Diet Pepsi data are skewed left, the median is a more appropriate measure of location than the mean. The median can be found for every measurement scale except nominal. Our data are already sorted by size, and they include both extreme values and clumped values. The range of the Diet Pepsi data is 4.17 grams. The median is the preferred measure of location here because, unlike the mean, it is resistant to extreme values. The median for Diet Pepsi is 369.5 grams, which tells us that about half of the sampled cans weighed less than 369.5 grams and about half weighed more. We can also refer to the mean, 369.15 grams, which is the center or balancing point of the data. Since our data are on a ratio scale, we can also use measures of position, in this case quartiles. As noted above, about half (50%) of the cans weighed more than 369.5 grams and about half weighed less. The first quartile is 368.61 grams and the third quartile is 370.34 grams. The first quartile tells us that at most 25% of the Diet Pepsi cans weighed under 368.61 grams and at most 75% weighed above it. The third quartile tells us that at most 75% of the cans weighed under 370.34 grams and at most 25% weighed above it. The summary statistics for Diet Pepsi give a variance of 1.334. It is important to know how the weights of the individual Diet Pepsi cans are dispersed. The dispersion for Diet Pepsi is small, which is good because it means the cans are filled consistently, without some being much more overfilled or underfilled than others.

Figure 4:

TC-Stats also generated a histogram for the Coke Zero sample data. Before constructing the histogram for Coke Zero, a frequency table was constructed (Figure 5).
The classes have a width of 0.75 grams. Starting at the lowest value and counting by 0.75 grams was chosen because it gives a clearer picture of the distribution. Figure 6 helps us see how often Coke Zero weights in each class were observed.
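The frequency tables in Figures 3, 5, and 7 follow the recipe described in this report: classes start at the lowest observed weight and have a fixed width (0.53 grams for Diet Pepsi, 0.75 grams for Coke Zero, 0.93 grams for Diet Coke). A minimal sketch of that recipe follows. It assumes half-open classes [lower, upper), which may not match TC-Stats's boundary convention, and the name `frequency_table` is hypothetical.

```python
import math

def frequency_table(weights, start, width):
    """Frequency table with half-open classes [start, start + width), ...

    Each row is (lower, upper, frequency, relative frequency,
    cumulative frequency, cumulative relative frequency).
    """
    if not weights:
        raise ValueError("need at least one weight")
    if min(weights) < start:
        raise ValueError("start must not exceed the smallest weight")
    n = len(weights)
    # Enough classes to reach the largest weight.
    n_classes = math.floor((max(weights) - start) / width) + 1
    counts = [0] * n_classes
    for w in weights:
        # Class index; the clamp guards against floating-point rounding at the top edge.
        index = min(int((w - start) // width), n_classes - 1)
        counts[index] += 1
    rows, cumulative = [], 0
    for i, freq in enumerate(counts):
        cumulative += freq  # running total of the frequencies so far
        lower = start + i * width
        rows.append((round(lower, 2), round(lower + width, 2),
                     freq, freq / n, cumulative, cumulative / n))
    return rows
```

For example, `frequency_table(coke_zero_weights, start=368.46, width=0.75)` would build the Coke Zero table, where `coke_zero_weights` stands for the list of 15 recorded weights (not reproduced in this report). Because the class edges are computed in floating point, a weight lying exactly on a class boundary could occasionally land in the neighboring class.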

Figure 5:

A histogram was also constructed for the Coke Zero sample data (Figure 6). This histogram looks fairly bimodal, as the box plot in Figure 2 also suggested. Knowing that the data for Coke Zero are bimodal, we can calculate the median and mean. The range of the Coke Zero data is 6.07 grams. The median for Coke Zero is 372.25 grams, which tells us that about half of the sampled cans weighed less than 372.25 grams and about half weighed more. We can also refer to the mean, 371.9173 grams, which is the center or balancing point of the data. The summary statistics show that the first quartile is 370.04 grams: at most 25% of the Coke Zero cans weighed under 370.04 grams and at most 75% weighed above it. The third quartile tells us that at most 75% of the cans weighed under 373.6 grams and at most 25% weighed above it. To better understand how the Coke Zero sample data are spread out, we can calculate the standard deviation, which is 1.974771 grams. It is important to know how the weights of the individual Coke Zero cans are dispersed. The dispersion for Coke Zero is small, though a little larger than for Diet Pepsi, which is good because it means the cans are filled fairly consistently.

Figure 6:

TC-Stats also generated a histogram and a frequency table for the Diet Coke sample data. The frequency table is shown in Figure 7. The classes have a width of 0.93 grams. Starting at the lowest value and counting by 0.93 grams was chosen so that the histogram corresponds to the box plot display. Figure 8 helps us see how often Diet Coke weights in each class were observed.
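Figure 1 lists both a population standard deviation and a sample standard deviation, and the values quoted for Coke Zero (1.974771 grams) and Diet Coke (2.180016 grams) do not state which formula they come from. The sketch below shows the difference using Python's `statistics.pstdev`, which divides by n, and `statistics.stdev`, which divides by n - 1. The helper names are hypothetical.

```python
import math
import statistics

def population_and_sample_sd(weights):
    """Return (population SD, sample SD) for a list of weights.

    statistics.pstdev divides the sum of squared deviations by n;
    statistics.stdev divides it by n - 1.
    """
    return statistics.pstdev(weights), statistics.stdev(weights)

def sample_sd_from_population_sd(pop_sd, n):
    """Rescale a population-formula SD to the sample formula for the same n values."""
    return pop_sd * math.sqrt(n / (n - 1))

# Ratio between the two versions for samples of 15 cans.
print(round(math.sqrt(15 / 14), 3))  # prints 1.035
```

For samples of n = 15, the sample standard deviation is larger than the population one by a factor of about 1.035, so the two versions in Figure 1 should differ by roughly 3.5%.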

Figure 7:

A histogram was also constructed for the Diet Coke sample data (Figure 8). This histogram looks skewed left, as the box plot in Figure 2 also suggested. Knowing that the data for Diet Coke are skewed left, we can calculate the median and mean. The range of the Diet Coke data is 7.43 grams. The median for Diet Coke is 373.19 grams, which tells us that about half of the sampled cans weighed less than 373.19 grams and about half weighed more. We can also refer to the mean, 373.19 grams, which is the center or balancing point of the data. The summary statistics show that the first quartile is 371.13 grams: at most 25% of the Diet Coke cans weighed under 371.13 grams and at most 75% weighed above it. The third quartile tells us that at most 75% of the cans weighed under 374.99 grams and at most 25% weighed above it. To better understand how the Diet Coke sample data are spread out, we can calculate the standard deviation, which is 2.180016 grams. It is important to know how the weights of the individual Diet Coke cans are dispersed. The dispersion for Diet Coke is greater than for the other two samples, so its cans are filled less consistently; together with its higher median weight, this suggests that some Diet Coke cans may be slightly overfilled. Consumers may not mind getting extra soda, but for the Coca-Cola Company this could mean wasted resources.

Figure 8:

In this design there is possible error because of the way the cans were being measured. The electric scale was placed on top of the table, and the table was unstable. People were moving around the table, which could have affected the measurements of the cans.
