
Boxed Vs. Bottled: A Wine Tasting Experiment


By Blake Hilgemann

1. Introduction and Problem Statement


In the wine industry, there is a vast disparity in prices and perceived quality among products.
Walk into any supermarket today and you will find a large array of choices. Not only are there
choices in bottled wine, but there are also many choices in a different package: boxed wine.
Because wine as an adult beverage is often thought of as a status symbol, boxed wine
has long sat at the low end of the scale.

This dynamic brings a number of questions to mind. Does boxed wine have a noticeably
different quality than bottled wine? Given how much more a sophisticated wine drinker could
pay for the same volume of wine, is it really worth the “investment” in a relatively expensive
bottle versus a boxed wine that costs, say, a fifth of the price by volume? Is it worth the
expenditure even for an inexpensive bottle of wine versus a still less expensive box? Are there
inherent quality differences between boxed wines that sell at a higher price and those that are
the cheapest of all? It all boils down to this: where does the best value lie when deciding
whether to buy boxed or bottled wine? Or are there good and bad wines at all price points? Can
people tell the difference between boxed and bottled wine, or is it just the packaging that
makes it a “less sophisticated” wine?

The screening experiment described below was devised to draw some initial conclusions on
these matters.

2. Selection of Response Variable


Wine rating is the response variable for this experiment. As the Wine Rating Sheet in the
appendix shows, the wine rating is composed of four categories: aroma, body, taste, and finish.
Each category is scored on a scale of 1-5 for simplicity; however, aroma and body are given a
weight (multiplier) of one while taste and finish are given a weight of two. Therefore, the
wine rating ranges from 6 to 30.
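
The weighting described above can be written out as a short calculation. The following is a minimal Python sketch rather than anything used in the original experiment; the names `WEIGHTS` and `wine_rating` and the dictionary layout are illustrative assumptions, while the weights and the 1-5 scale come from the text.

```python
# Weights from the text: aroma and body count once, taste and finish count twice.
# The names WEIGHTS and wine_rating are illustrative, not from the original report.
WEIGHTS = {"aroma": 1, "body": 1, "taste": 2, "finish": 2}


def wine_rating(scores):
    """Return the weighted rating for one sample, given a 1-5 score per category."""
    if set(scores) != set(WEIGHTS):
        raise ValueError("scores must cover exactly: " + ", ".join(WEIGHTS))
    for category, value in scores.items():
        if not 1 <= value <= 5:
            raise ValueError(f"{category} score must be between 1 and 5")
    return sum(WEIGHTS[category] * scores[category] for category in WEIGHTS)


lowest = wine_rating({category: 1 for category in WEIGHTS})   # every category scored 1
highest = wine_rating({category: 5 for category in WEIGHTS})  # every category scored 5
print(lowest, highest)  # 6 30
```

Scoring every category at 1 gives 1 + 1 + 2 + 2 = 6, and scoring every category at 5 gives 5 + 5 + 10 + 10 = 30, which is where the 6-30 range comes from.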

3. Choice of Factors, Levels, and Range


For this experiment, three design factors and one held-constant factor were chosen. One
allowed-to-vary factor and four nuisance factors were identified. The factors are broken down as
follows:
Design factors
  - Type of grape – cabernet and merlot
  - Type of packaging – boxed and bottled
  - Price range – “low” (under $5/750mL) and “high” ($12-$20/750mL)

Held-constant factors
  - Growing region – California, USA

Allowed-to-vary factors
  - Distributor – the wines may come from separate distributors, which allows for different
    storage procedures. This is assumed to have negligible impact on the outcome of this
    experiment.

Nuisance factors
  - Year – year can have a great effect on wine quality, but it is assumed that using year as a
    design factor would make the samples (boxes/bottles) rather hard to acquire for this
    experiment.
  - Winery – though all wines will come from California, they will be produced by different
    wineries. Again, this could affect wine quality, but we are looking at price as a factor,
    and price is assumed to be an indicator of quality. It may be a hypothesis for further
    experimentation that one should be concerned not with price but with the wine’s maker
    (or instead with the year and region in which it was grown).
  - Time – because the experiment will be a taste test, a taster’s opinion of wine may change
    (positively or negatively) as the test goes on. Time correlates with the amount of alcohol
    the taster has consumed, so alcohol consumed will not be considered a separate factor.
  - Tasters – ratings from individual tasters are potentially an extremely large source of
    variation. Each taster must therefore be treated as an individual block, and a replicate
    of all factor combinations must be evaluated by each taster to account for this variability.

4. Choice of Experimental Design


In this wine tasting experiment, there are 3 factors with 2 levels each. Therefore, a 2^3 factorial
design is well-suited. There must be 8 runs per replication. Additionally, there are two nuisance
variables that must be blocked: individual tasters and drink order (i.e. time). These two
randomization restrictions call for a Latin Square design. Therefore, I will run an 8 x 8 Latin
Square.

To visualize the design, I have assigned low and high levels to each factor below.
Package: (-) Box, (+) Bottle
Price: (-) Low, (+) High
Type: (-) Cabernet Sauvignon, (+) Merlot

Then the coded treatment combinations follow a binary pattern:


0: --- (Box, Low, Cab)
1: --+ (Box, Low, Merlot)
2: -+- (Box, High, Cab)
3: -++ (Box, High, Merlot)
4: +-- (Bottle, Low, Cab)
5: +-+ (Bottle, Low, Merlot)
6: ++- (Bottle, High, Cab)
7: +++ (Bottle, High, Merlot)
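
The binary coding can be reproduced by enumerating all sign combinations in order. This is a small illustrative sketch, not the author's procedure; the names `FACTORS` and `treatment_combinations` are assumptions, and the factor order (Package, Price, Type) and level names are taken from the coding above.

```python
from itertools import product

# Factor order and level names follow the coding in the text: (-) level first, (+) second.
# FACTORS and treatment_combinations are illustrative names, not from the original report.
FACTORS = (
    ("Package", ("Box", "Bottle")),
    ("Price", ("Low", "High")),
    ("Type", ("Cab", "Merlot")),
)


def treatment_combinations():
    """Return (code, sign pattern, level names) for each run of the 2^3 design."""
    rows = []
    for code, bits in enumerate(product((0, 1), repeat=len(FACTORS))):
        signs = "".join("+" if bit else "-" for bit in bits)
        levels = tuple(names[bit] for (_, names), bit in zip(FACTORS, bits))
        rows.append((code, signs, levels))
    return rows


for code, signs, levels in treatment_combinations():
    print(f"{code}: {signs} ({', '.join(levels)})")
```

Because the last factor varies fastest, the code number is simply the sign pattern read as a three-bit binary number with (-) as 0 and (+) as 1.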

Now a Latin Square table can be created to ensure randomization. Table 4-1 is the random Latin
Square chosen for the experiment. Blocking is performed on drink order and on taster to
minimize the influence of these nuisance factors.
Drink                 Taster
Order     1   2   3   4   5   6   7   8
  1       7   5   3   4   2   0   1   6
  2       5   4   1   6   0   2   7   3
  3       3   7   6   0   5   1   2   4
  4       1   3   2   5   4   6   0   7
  5       6   1   0   7   3   4   5   2
  6       0   2   7   1   6   3   4   5
  7       2   0   4   3   7   5   6   1
  8       4   6   5   2   1   7   3   0
Table 4-1: 8x8 Latin Square design showing treatment combinations for each taster and
drink order
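
The defining property of a Latin Square is that every treatment appears exactly once in each row (drink order) and once in each column (taster). The sketch below checks that property for the square in Table 4-1; it is an illustrative check with assumed names (`LATIN_SQUARE`, `is_latin_square`) and is not how the original square was generated.

```python
# Table 4-1, transcribed row by row: LATIN_SQUARE[drink_order][taster] = treatment code.
# The names LATIN_SQUARE and is_latin_square are illustrative assumptions.
LATIN_SQUARE = [
    [7, 5, 3, 4, 2, 0, 1, 6],
    [5, 4, 1, 6, 0, 2, 7, 3],
    [3, 7, 6, 0, 5, 1, 2, 4],
    [1, 3, 2, 5, 4, 6, 0, 7],
    [6, 1, 0, 7, 3, 4, 5, 2],
    [0, 2, 7, 1, 6, 3, 4, 5],
    [2, 0, 4, 3, 7, 5, 6, 1],
    [4, 6, 5, 2, 1, 7, 3, 0],
]


def is_latin_square(square):
    """True if every symbol 0..n-1 occurs exactly once in each row and each column."""
    n = len(square)
    symbols = set(range(n))
    rows_ok = all(len(row) == n and set(row) == symbols for row in square)
    cols_ok = rows_ok and all(set(col) == symbols for col in zip(*square))
    return rows_ok and cols_ok


print(is_latin_square(LATIN_SQUARE))  # True
```

In practical terms, the row condition means each round serves every treatment combination once, and the column condition means each taster rates every treatment combination once.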

Finally, notice that the binary treatment combinations are ordered. That is, the first 4 numbers
are the boxes, and the last 2 numbers are the highest priced bottles. In an attempt to eliminate
the possibility of a taster’s conscious or unconscious association of a numbered wine with a
certain rating (e.g. wine #7 must be the highest rated wine), I randomly assigned a planet’s name
to each of the combinations. These codes were only used for the purposes of the rating sheet that
each taster received, and were used to eliminate bias during the blind taste test. This coding is
shown in Table 4-2 on the following page. I have also included a column that gives the actual
wines chosen for each treatment combination.
Table 4-2: Planet code structure and wines chosen for each treatment combination

5. Carrying Out the Experiment


The experiment was performed on April 24, 2009 in Sedona, AZ. Eight wine enthusiasts were
selected, including my in-laws and six of their friends (some of whom have been drinking wine
for decades longer than I have been alive). To give an idea of the tasters’ experience level with
wine, and hopefully of the development of their palates as well, two simple questions were asked
of them:
• What is the average price (retail) of a bottle of wine that you drink?
• About how much do you spend on wine annually (retail)?
The mean answer for the first question was $17.20 with a range of $7-$30. The mean answer for
the second question was $706 with a range of $250-$1200. This at least indicated to me that my
tasting group consisted mostly of people who knew their way around wine, although the range in
answers suggests a variety of tastes in wine.

Each round of one-ounce samples was served simultaneously, and the sampling glasses were
cleared of residue by rinsing with water between samples. Wine crackers were used between
rounds to cleanse the palate, and each taster had a glass of drinking water. The wines were
served from pitchers marked only with their planet code names. Because of the
nature of the Latin Square, each round consisted of one sample of each factor combination. This
proved to be tedious, but was necessary to collect the best set of data possible.

6. Results of Statistical Analysis


The wine scores are shown in Table 6-1 below.

Wine (Treatment                  Taster
Combination)       1    2    3    4    5    6    7    8
     0            13    8   12   21    7   14   12   15
     1            14   11   12   11   12    9   14   11
     2             9   16   13   13   22   16   18   17
     3            17   13   19   15   12   14   13   15
     4             7    9   15    9    8    8   16    6
     5             8   27   15   18   10   14    9   18
     6            23   27   20   15   20   11   22   27
     7            10   26   21   18   18   16   25    6
Table 6-1: Wine rating for each taster and treatment combination
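
Before fitting a model, the raw table already allows a quick look at the main effects: for each factor, compare the mean rating at the (+) level with the mean rating at the (-) level. The sketch below is an illustrative calculation on the Table 6-1 values, not the JMP analysis; `SCORES`, `FACTOR_BITS`, and `main_effect` are assumed names, and the bit positions follow the binary treatment coding (Package, Price, Type).

```python
# Ratings from Table 6-1: SCORES[treatment_code][taster].
SCORES = [
    [13, 8, 12, 21, 7, 14, 12, 15],
    [14, 11, 12, 11, 12, 9, 14, 11],
    [9, 16, 13, 13, 22, 16, 18, 17],
    [17, 13, 19, 15, 12, 14, 13, 15],
    [7, 9, 15, 9, 8, 8, 16, 6],
    [8, 27, 15, 18, 10, 14, 9, 18],
    [23, 27, 20, 15, 20, 11, 22, 27],
    [10, 26, 21, 18, 18, 16, 25, 6],
]

# Position of each factor in the 3-bit treatment code (Package is the leading sign).
# These names are illustrative assumptions, not from the original report.
FACTOR_BITS = {"Package": 2, "Price": 1, "Type": 0}


def main_effect(factor):
    """Mean rating at the (+) level minus mean rating at the (-) level."""
    bit = FACTOR_BITS[factor]
    high = [y for code, row in enumerate(SCORES) if (code >> bit) & 1 for y in row]
    low = [y for code, row in enumerate(SCORES) if not (code >> bit) & 1 for y in row]
    return sum(high) / len(high) - sum(low) / len(low)


for factor in FACTOR_BITS:
    print(f"{factor}: {main_effect(factor):+.4f}")
```

On these data the differences come out to about 2.0 rating points for package (bottle minus box), 4.8 for price (high minus low), and 0.06 for grape type (merlot minus cabernet). These are raw differences in means only; whether they are statistically significant depends on the error variance from the fitted model.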

At first glance, the ratings for some wines appear to vary widely from taster to taster. JMP
analysis shows, however, that a meaningful conclusion can still be made.

I have included all of the two-factor and three-factor interactions as well as the two blocking
factors in the model that will be fit to the data. This was set up using the Custom Design option
in JMP, and can be seen in Figure 6-1.
Figure 6-1: JMP effects modeled

The JMP output is as follows. With an R^2 of just 0.53, the actual-by-predicted plot and the
Summary of Fit (Figure 6-2 and Table 6-2) attest to the large variance in the data.

Figure 6-2: Actual vs. Predicted plot


Table 6-2: Summary of fit for the given model

The ANOVA and model parameter estimates, given in Tables 6-3 through 6-5, show that only one
parameter is highly significant: price. Package type was significant only at the
8% level, and the type of wine was not significant at all. None of the interaction terms were
significant at the 5% level, although the price-type and package-price-type interactions were
significant at the 8% and 6% levels respectively. Interestingly, the block effects for Tasters
and Drink Order (time) were not significant overall. The effect of the first drink, however, was
nearly significant at the 5% level, suggesting that blocking was still worthwhile.
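
The sums of squares behind a Latin Square ANOVA can be reproduced by hand from Tables 4-1 and 6-1. The sketch below is a plain-Python illustration, not the JMP output; it assumes tasters, drink order, and the eight treatment combinations are all treated as categorical effects, so that the 7 treatment degrees of freedom cover the three main effects and all of their interactions. All variable names are illustrative.

```python
# Ratings from Table 6-1: SCORES[treatment_code][taster].
SCORES = [
    [13, 8, 12, 21, 7, 14, 12, 15],
    [14, 11, 12, 11, 12, 9, 14, 11],
    [9, 16, 13, 13, 22, 16, 18, 17],
    [17, 13, 19, 15, 12, 14, 13, 15],
    [7, 9, 15, 9, 8, 8, 16, 6],
    [8, 27, 15, 18, 10, 14, 9, 18],
    [23, 27, 20, 15, 20, 11, 22, 27],
    [10, 26, 21, 18, 18, 16, 25, 6],
]
# Table 4-1: LATIN_SQUARE[drink_order][taster] = treatment code.
LATIN_SQUARE = [
    [7, 5, 3, 4, 2, 0, 1, 6],
    [5, 4, 1, 6, 0, 2, 7, 3],
    [3, 7, 6, 0, 5, 1, 2, 4],
    [1, 3, 2, 5, 4, 6, 0, 7],
    [6, 1, 0, 7, 3, 4, 5, 2],
    [0, 2, 7, 1, 6, 3, 4, 5],
    [2, 0, 4, 3, 7, 5, 6, 1],
    [4, 6, 5, 2, 1, 7, 3, 0],
]
n = 8

# Re-index the ratings by (drink order, taster) using the square.
by_order = [[SCORES[LATIN_SQUARE[r][t]][t] for t in range(n)] for r in range(n)]


def group_ss(totals, correction):
    """Sum of squares for a grouping with n observations per group."""
    return sum(total ** 2 for total in totals) / n - correction


grand_total = sum(sum(row) for row in SCORES)
correction = grand_total ** 2 / (n * n)

ss_total = sum(y * y for row in SCORES for y in row) - correction
ss_treatment = group_ss([sum(row) for row in SCORES], correction)
ss_taster = group_ss([sum(col) for col in zip(*SCORES)], correction)
ss_order = group_ss([sum(row) for row in by_order], correction)
ss_error = ss_total - ss_treatment - ss_taster - ss_order

r_squared = 1 - ss_error / ss_total
mse = ss_error / ((n - 1) * (n - 2))          # error df for an n x n Latin Square
f_treatment = (ss_treatment / (n - 1)) / mse  # overall F for the 8 treatments

print(f"SS total={ss_total:.2f} treatment={ss_treatment:.2f} "
      f"taster={ss_taster:.2f} order={ss_order:.2f} error={ss_error:.2f}")
print(f"R^2={r_squared:.3f}  F(treatment)={f_treatment:.2f}")
```

Under these assumptions the decomposition gives an R^2 of about 0.53, in line with the value reported in the Summary of Fit. The treatment sum of squares is by far the largest of the three model terms, which is consistent with the block effects being small.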

Tables 6-3 and 6-4: ANOVA and Parameter Estimates


Table 6-5: ANOVA breakdown

The two-factor interaction plots are shown in Figure 6-3 below. Again, the price-type interaction
shows low significance, but there is some cross-over there.

Figure 6-3: Two-factor interaction profiles


Finally, the residuals were analyzed to check the assumptions made in the ANOVA model.
Figure 6-4 gives the JMP residuals vs. fitted values. The normal
probability plot of residuals and the residuals vs. drink order, taster, and treatment combination
were calculated and plotted in Excel and are shown in Figures 6-5 through 6-8. None of these
plots shows any abnormality, so the model assumptions appear to hold and no data
transformation is necessary.

Figure 6-4: Residual vs. predicted values


Figure 6-5: Normal probability plot of residuals

Figure 6-6: Residuals vs. Tasters


Figure 6-7: Residuals vs. Drink Order

Figure 6-8: Residuals vs. Treatment Combination


7. Conclusions
This experiment has answered some of the questions posed in the problem statement (Section 1).
Being a screening experiment, however, it has also opened up many possibilities for optimization
and follow-up testing. It found no highly significant difference in rating between packaging
types, so something can be said for buying boxed wines these days. There are definitely
“drinkable” boxed wines out there. On the other hand, a higher priced wine could be worth the
expense, whether it is a higher priced box or a higher priced bottle of wine. The type of wine
was not at all an issue in this experiment. These conclusions could allow a factor or two to be
dropped in further experimentation.

Although there was no significant interaction among the factors, the interaction profiles actually
provide a little more insight into the results. Figure 7-1 below is a copy of Figure 6-3. The left-
center plot shows that the low priced boxed and bottled wines had practically equal ratings and
the high priced bottled wine had greater ratings than the high priced boxed wine. This could lead
to further experimentation and optimization to understand precisely where the value lies since
the high priced bottle still cost more by volume than the high priced box.

Figure 7-1: Two-factor interaction profiles


Another observation from Figure 7-1 comes from the bottom-center plot. This plot shows that
low priced cabs may be worse than high priced cabs, while the difference may be smaller
in the case of merlots. This could again lead to further experimentation to determine whether cab
drinkers must spend more money than merlot drinkers to get good value for their wine. It has
been said that good cabs are more difficult to make than good merlots, which gives this
observation some intuitive appeal.
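
The cell means that underlie these two interaction observations can be recomputed directly from the Table 6-1 ratings. The sketch below is an illustrative calculation, not the JMP interaction profiler; `SCORES`, `FACTOR_BITS`, `LEVELS`, and `cell_means` are assumed names, and the bit positions follow the binary treatment coding (Package, Price, Type).

```python
# Ratings from Table 6-1: SCORES[treatment_code][taster].
SCORES = [
    [13, 8, 12, 21, 7, 14, 12, 15],
    [14, 11, 12, 11, 12, 9, 14, 11],
    [9, 16, 13, 13, 22, 16, 18, 17],
    [17, 13, 19, 15, 12, 14, 13, 15],
    [7, 9, 15, 9, 8, 8, 16, 6],
    [8, 27, 15, 18, 10, 14, 9, 18],
    [23, 27, 20, 15, 20, 11, 22, 27],
    [10, 26, 21, 18, 18, 16, 25, 6],
]

# Illustrative lookup tables: bit position and level names for each factor.
FACTOR_BITS = {"Package": 2, "Price": 1, "Type": 0}
LEVELS = {
    "Package": ("Box", "Bottle"),
    "Price": ("Low", "High"),
    "Type": ("Cab", "Merlot"),
}


def cell_means(factor_a, factor_b):
    """Mean rating for each level pair of two factors, averaged over the third factor."""
    cells = {}
    for code, row in enumerate(SCORES):
        level_a = LEVELS[factor_a][(code >> FACTOR_BITS[factor_a]) & 1]
        level_b = LEVELS[factor_b][(code >> FACTOR_BITS[factor_b]) & 1]
        cells.setdefault((level_a, level_b), []).extend(row)
    return {key: sum(values) / len(values) for key, values in cells.items()}


for pair in (("Package", "Price"), ("Price", "Type")):
    print(pair)
    for key, mean in cell_means(*pair).items():
        print(f"  {key}: {mean:.2f}")
```

On these data the low priced box and bottle average about 12.3 each, while the high priced box and bottle average about 15.1 and 19.1. For grape type, cabs average about 11.3 (low price) and 18.1 (high price), while merlots average about 13.3 and 16.1, so the price gap is roughly 6.8 points for cabs and 2.8 for merlots.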

The main takeaway from this experiment, unfortunately, is that a good wine does in general
come with a higher price tag. But do not give up hope: there is always a chance of finding a
“diamond in the rough,” since this experiment was run with a very limited subset of the wines
available on the market. There are certainly bargain wines out there, and the relationship
between the price you pay for a wine and its overall quality is not linear. This leads to the
final idea for experimentation brought about by my analysis: for a given price point, how great
is the range in quality? This experiment would be run with wine as a random factor so that wines
could be randomly sampled rather than requiring a test involving the entire population of wines.

Wine Scoring Sheet 1

What is the average price (retail) of a bottle of wine that you drink? $_____

About how much do you spend on wine annually (retail)? $_____________
