ABSTRACT
INTRODUCTION
Products
Altogether 12 cheese samples of the same variety (“Norvegia cheese”)
were evaluated by cheese expert assessors and profiled by a trained panel of
selected assessors. The cheese samples were produced at three different dairies
in Norway, represented different production batches and had been stored for
12 weeks. Based on the results from the expert assessors and the sensory
profiling, five cheeses were selected for consumer testing.
Consumer Testing
A consumer panel of 110 consumers evaluated the five selected
cheeses. The consumers were recruited from local clubs and associations in the
community and were selected according to the following criteria: eating
“Norvegia cheese” at least twice a week, being 25–55 years old and not being
employed at Matforsk or the nearby Norwegian University of Life Sciences,
Department of Chemistry, Biotechnology and Food Science. The test was carried
out in sensory booths in the laboratory at Matforsk, and the consumers arrived
in groups of 10. The consumers were presented with five coded 200 g samples of
cheese (blind test), each supplied with a cheese slicer, and were
asked to remove one slice from each sample before tasting. Water was
served to rinse the mouth during the test. The consumers were given two
different questionnaires. The first questionnaire asked the consumers to score
the cheese samples for overall liking on a seven-point numerical hedonic
category scale anchored with “dislike extremely” and “like extremely” and
with a neutral center point of “neither like nor dislike.” After this
questionnaire had been collected, the consumers received a second questionnaire
in which they were asked to report the perceived flavor and texture of each
cheese sample on seven-point intensity scales (also numerical and categorical).
For intensity
PERCEPTION OF CHEESE 337
rating of cheese flavor, the scale was anchored from “little flavor” to “much
flavor” with a center point of “neither little nor much flavor.” For intensity
rating of cheese texture, the scale was anchored from “soft” to “firm” with a
center point of “neither soft nor firm.” No further information about the
products or the experiment was given during the tasting session. Serving order
was varied according to a cyclic design balanced for order and carry-over
effects (MacFie et al. 1989).
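The serving-order scheme can be illustrated with a short sketch. The Python code below builds a Williams-type Latin square design, a common construction for serving orders balanced for position and first-order carry-over effects. It is an assumption that this corresponds to the design actually used in the study; the function name `williams_design` is introduced here for illustration only.

```python
from collections import Counter

def williams_design(n):
    """Return serving orders (lists of 0..n-1) balanced for position and
    first-order carry-over. For odd n the mirrored orders are added."""
    # Base order 0, 1, n-1, 2, n-2, ... (alternating from both ends).
    base = [0]
    low, high = 1, n - 1
    take_low = True
    while low <= high:
        if take_low:
            base.append(low)
            low += 1
        else:
            base.append(high)
            high -= 1
        take_low = not take_low
    # Cyclic shifts of the base order form a Latin square.
    rows = [[(x + shift) % n for x in base] for shift in range(n)]
    if n % 2 == 1:
        rows += [row[::-1] for row in rows]
    return rows

samples = [3, 4, 7, 10, 11]          # sample numbers used in the consumer test
design = williams_design(len(samples))
serving_orders = [[samples[i] for i in row] for row in design]

# How often each sample is served in each position.
position_counts = [Counter(row[k] for row in serving_orders)
                   for k in range(len(samples))]
# How often each sample is served directly after each other sample.
carry_over = Counter((row[k], row[k + 1])
                     for row in serving_orders
                     for k in range(len(row) - 1))

for order in serving_orders:
    print(order)
```

With five samples this construction gives 10 serving orders, which would divide evenly among 110 consumers (11 consumers per order).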
Statistical Methods
Analysis of Variance (ANOVA) was performed on the three data sets. The
models for expert data and descriptive data included main effect of product and
main effect of assessor, plus interaction effects between product and assessor.
The effects of products were considered fixed, while the effects of assessors
and the interaction effects were considered random (Næs and Langsrud 1998).
For consumer data we used an ANOVA model with main effects for product
and consumer. The interaction in this model was confounded with the error
term and not estimable due to the design. The analyses of the expert data and
the consumer data were performed using Minitab version 14. The statistical
analysis of the sensory descriptive data was performed using SAS version 8.2.
A reported P-value of 0.01 means a P-value equal to or less than 0.01.
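The structure of these ANOVA models can be sketched in a few lines of Python. The example below is a minimal illustration using made-up scores, not the study's data, and it does not reproduce the Minitab or SAS procedures. It assumes a balanced design and tests the fixed product effect against the product × assessor interaction, which is the usual choice when assessors are treated as random. The function name `two_way_anova` is hypothetical, and P-values are omitted to keep the sketch within the standard library.

```python
def two_way_anova(scores):
    """scores[i][j] is the list of replicate scores for product i, assessor j.
    Returns sums of squares and F-ratios for a balanced two-way layout."""
    a, b, r = len(scores), len(scores[0]), len(scores[0][0])
    cell = [[sum(reps) / r for reps in row] for row in scores]
    product_mean = [sum(row) / b for row in cell]
    assessor_mean = [sum(cell[i][j] for i in range(a)) / a for j in range(b)]
    grand = sum(product_mean) / a

    ss_product = b * r * sum((m - grand) ** 2 for m in product_mean)
    ss_assessor = a * r * sum((m - grand) ** 2 for m in assessor_mean)
    ss_inter = r * sum(
        (cell[i][j] - product_mean[i] - assessor_mean[j] + grand) ** 2
        for i in range(a) for j in range(b))
    ss_error = sum((y - cell[i][j]) ** 2
                   for i in range(a) for j in range(b) for y in scores[i][j])
    ss_total = sum((y - grand) ** 2
                   for i in range(a) for j in range(b) for y in scores[i][j])

    df_product, df_inter, df_error = a - 1, (a - 1) * (b - 1), a * b * (r - 1)
    ms_product = ss_product / df_product
    ms_inter = ss_inter / df_inter
    # Product (fixed) is tested against the interaction with the random factor.
    f_product = ms_product / ms_inter
    # Without replicates the interaction cannot be separated from the error.
    f_inter = ms_inter / (ss_error / df_error) if df_error > 0 else None
    return {"ss_product": ss_product, "ss_assessor": ss_assessor,
            "ss_interaction": ss_inter, "ss_error": ss_error,
            "ss_total": ss_total, "f_product": f_product,
            "f_interaction": f_inter}

# Made-up expert scores: 3 products x 2 assessors x 2 replicates.
expert_scores = [
    [[3.0, 3.5], [2.5, 3.0]],
    [[4.0, 4.0], [3.5, 4.5]],
    [[2.5, 2.0], [3.0, 2.5]],
]
# Made-up consumer ratings: 2 products x 3 consumers, one rating each.
consumer_scores = [
    [[5], [4], [6]],
    [[4], [5], [7]],
]

expert_result = two_way_anova(expert_scores)
consumer_result = two_way_anova(consumer_scores)
print(expert_result["f_product"], expert_result["f_interaction"])
print(consumer_result["f_product"], consumer_result["f_interaction"])
```

In the consumer case each consumer rates each product once, so the interaction mean square doubles as the error term and no separate interaction test is available, which mirrors the confounding described above.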
Principal component analysis (PCA) and partial least squares regression
(PLSR) were performed using the Unscrambler statistical package (Camo,
version 8.0, Oslo, Norway). PCA was used to study the main sources of
systematic variation in the average sensory descriptive data. Furthermore,
PLSR was conducted to study the relationship between descriptive data from
selected assessors and scores from the expert assessors, and the relationship
between descriptive data and hedonic liking from the consumers. The variables
were standardized and full cross-validation was applied. Correlation loadings
plots were used (Westad et al. 2003), with circles indicating 50 and 100%
explained variance, respectively. In the correlation loadings plots, products
were included as dummy variables (passified in the data matrix) to improve the
visual interpretation (Martens and Martens 2001).
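To make the correlation loadings and the two circles concrete, the following Python sketch computes them for a small made-up data table (not the study's data). It standardizes the variables, extracts two principal components by power iteration with deflation (a NIPALS-style procedure assumed here for illustration, not necessarily what the Unscrambler does internally), and takes the correlation between each variable and each score vector. Because the score vectors are orthogonal, the sum of the squared correlations is the fraction of a variable's variance explained by the two components, so the 50% and 100% circles have radii of sqrt(0.5) and 1.

```python
import math

# Made-up panel means (rows = cheese samples); NOT the study's data.
attributes = ["firmness", "acidity", "saltiness"]
data = [
    [5.1, 3.2, 4.0],
    [4.8, 3.9, 4.4],
    [6.0, 2.5, 3.9],
    [5.5, 3.0, 4.6],
    [4.2, 4.1, 4.1],
    [5.9, 2.8, 4.3],
]

def dot(x, y):
    return sum(p * q for p, q in zip(x, y))

def standardize(values):
    """Center a column and scale it to unit standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]

def correlation(x, y):
    """Pearson correlation between two equally long lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cx, cy = [v - mx for v in x], [v - my for v in y]
    return dot(cx, cy) / math.sqrt(dot(cx, cx) * dot(cy, cy))

def component_scores(columns, n_components=2, iterations=500):
    """Score vectors of the leading components (power iteration + deflation)."""
    residual = [list(col) for col in columns]
    n_rows = len(columns[0])
    all_scores = []
    for _ in range(n_components):
        weights = [1.0] * len(residual)
        for _ in range(iterations):
            t = [sum(w * col[i] for w, col in zip(weights, residual))
                 for i in range(n_rows)]
            weights = [dot(col, t) for col in residual]
            norm = math.sqrt(dot(weights, weights))
            weights = [w / norm for w in weights]
        t = [sum(w * col[i] for w, col in zip(weights, residual))
             for i in range(n_rows)]
        tt = dot(t, t)
        # Deflate: remove the part of every column explained by this component.
        deflated = []
        for col in residual:
            loading = dot(col, t) / tt
            deflated.append([x - loading * ti for x, ti in zip(col, t)])
        residual = deflated
        all_scores.append(t)
    return all_scores

columns = [standardize([row[k] for row in data]) for k in range(len(attributes))]
scores = component_scores(columns)
loadings = {name: (correlation(col, scores[0]), correlation(col, scores[1]))
            for name, col in zip(attributes, columns)}
explained = {name: r1 ** 2 + r2 ** 2 for name, (r1, r2) in loadings.items()}

for name in attributes:
    print(name, loadings[name], explained[name])
```

A variable plotted between the two circles thus has between 50% and 100% of its variance explained by the two components shown.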
RESULTS
random effects of assessors were also significant at the 5% level for all three
parameters (P = 0.01). The interaction effect (product × assessor) for flavor
was not significant (P = 0.74), whereas the interaction effect for consistency
had a P-value of 0.05 and the interaction effect for overall score had a
P-value of 0.01. These P-values indicated some differences between assessors in
how they scored the perceived differences between samples for consistency
and overall score.
Figure 1 shows average scores for the 12 cheese samples, with letters
indicating significantly different samples at the 5% level. The average scores
for flavor ranged from 3.0 to 4.0 (Fig. 1a), for consistency from 2.4 to 4.1
(Fig. 1b) and for overall score from 2.6 to 3.8 (Fig. 1c). Figure 1a shows
that sample number 1 got the lowest average score for flavor (3.0), closely
followed by sample number 3 (3.0). Sample number 5 got the highest average score
for flavor (4.0). Figure 1b shows that sample number 3 got the lowest average
score for consistency (2.4), while sample number 4 got the highest average
score for consistency (4.1). Figure 1c shows that sample number 3 got the
lowest average overall score (2.6), while sample numbers 4 and 10 got the
highest average overall scores (3.8 for both samples). The raw data (not
shown) revealed that the assessors used a relatively small part of the scale;
the individual scores given ranged from 2.0 to 4.5.
The expert assessors' performance was evaluated individually by use of
graphical procedures (Lea et al. 1995). The plots demonstrated that the MSE
values, that is, the repeatability of the expert assessors, varied to some extent.
One of the five expert assessors had P = 0.47 for flavor and P = 0.19 for overall
score (based on individual F-values). The assessor with best performance had
the following P-values: flavor P = 0.02, consistency P = 0.01 and overall score
P = 0.01.
The samples chosen for consumer testing were 3, 4, 7, 10 and 11. Average
scores from the expert assessors and reported defect terms for these samples
are given in Table 1. The table shows that the expert assessors reported defects
for all samples, even though the overall score ranged from 2.6 to 3.8. The total
number of terms given was considerably higher for sample 3 (overall score 2.6)
than for the other cheese samples.
FIG. 1. AVERAGE SCORES FOR (a) FLAVOR (ODOR AND TASTE), (b) CONSISTENCY
(BODY AND TEXTURE) AND (c) OVERALL SCORE GIVEN BY EXPERT ASSESSORS
Different letters mean different ratings at the 5% level of significance. n = 12 samples.
TABLE 1.
AVERAGE SCORES AND DEFECT TERMS FROM EXPERT ASSESSORS
Sample | Flavor (odor and taste) | Defect terms, flavor | Consistency (body and texture) | Defect terms, consistency | Overall score | Total number of defect terms
Samples selected for consumer testing, flavor, consistency and overall score.
attributes along PC1, while PC2 was mainly described by variation in saltiness
and bitter flavor. When interpreting the results from the descriptive profiling,
it is important to note that the cheese samples were quite similar; any
differences consisted of minor deviations from the specified quality.
The performance of the descriptive panel was evaluated by use of graphical
procedures, especially the “egg-shell plot” (Lea et al. 1995). This plot
demonstrated the variation in agreement between selected assessors for
different attributes. The plots revealed a relatively large variation in some of
the odor and flavor attributes, that is, odor intensity, flavor intensity, mature
odor and mature flavor. The reason was probably that the samples were perceived
as quite similar for these attributes, and possibly that the assessors were not
calibrated to the same degree. This variation also caused relatively high MSE
values and high P-values for some attributes and some individual assessors.
This weakness was to a certain extent compensated for by the relatively high
number of selected assessors in the trained panel.
342 M. HERSLETH ET AL.
The samples selected for the consumer test were 3, 4, 7, 10 and 11, and Fig. 2
shows that these samples represented the variation among the cheese samples.
Cheese number 3 was characterized by a relatively high degree of fermented
flavor, acidity, flavor intensity and odor intensity. In addition, this sample
had a relatively high degree of doughiness and solubility. Cheese numbers 7 and
11 were, on the other hand, characterized by a relatively high degree of mature
odor and mature flavor. These samples were also firmer, drier, grainier
and more elastic than the other samples presented to the consumers.
Consumer Testing
The ANOVA showed that the effect of sample on reported
hedonic liking was close to significance at the 5% level (P = 0.07). The
cheeses got very similar average ratings, ranging from 4.4 (sample number 11)
to 4.8 (sample number 3). It was interesting to note that cheese number 3,
which the dairy experts gave the lowest score, was the cheese that got the
highest average score from the consumers. This is commented on further in
the discussion of the results from descriptive profiling (Figs. 3 and 4).
Figure 5 shows the average ratings for intensity of (a) flavor and (b)
texture given by the consumers. The consumers were able to distinguish
between samples for intensity of flavor (P = 0.01) and texture (P = 0.01), and
the effect of consumer was also significant at the 5% level in both analyses
(P = 0.01). Sample 3 was perceived to have a significantly higher intensity of
flavor than the other samples. Sample 3 was also perceived as the softest
sample and sample 11 as the firmest sample. These results from the consumer
test agree well with the results from the quality scoring and the descriptive
testing, and show that the consumers perceived the cheeses as different
with regard to flavor and texture properties.
[Correlation loadings plot (not reproduced): sensory descriptive attributes, expert scores (flavour, consistency, overall score) and cheese samples 1–12; both axes run from –1 to 1.]
was also relatively high in this case; the first two significant components
described 54% of the variation in Y. Figure 4 shows an even distribution of
samples and consumers; for example, some consumers rated sample number
11 highest while some rated sample number 3 highest. This distribution in
hedonic liking illustrates the reason why we found only minor differences in
average hedonic liking of the cheese samples.
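The way opposing individual preferences can cancel out in the averages is easy to show with a toy calculation. The Python sketch below uses made-up liking scores for six hypothetical consumers, not data from the study; the variable names are chosen for illustration only.

```python
from collections import Counter

# Made-up liking scores on a seven-point scale; NOT the study's data.
# Keys of the inner dictionaries are sample numbers.
liking = {
    "consumer_a": {3: 7, 7: 5, 11: 2},
    "consumer_b": {3: 6, 7: 4, 11: 3},
    "consumer_c": {3: 6, 7: 5, 11: 2},
    "consumer_d": {3: 2, 7: 4, 11: 7},
    "consumer_e": {3: 3, 7: 5, 11: 6},
    "consumer_f": {3: 3, 7: 4, 11: 7},
}

samples = [3, 7, 11]

# Average liking per sample across all consumers.
mean_liking = {s: sum(scores[s] for scores in liking.values()) / len(liking)
               for s in samples}

# The sample each consumer rated highest.
favourites = Counter(max(scores, key=scores.get) for scores in liking.values())

print(mean_liking)
print(favourites)
```

In this toy data set all three samples have the same average liking, although half of the consumers clearly prefer sample 3 and the other half sample 11.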
DISCUSSION
The expert assessors discriminated significantly between the samples for all
three quality parameters scored. The random effects of expert assessors were
significant, which demonstrated a different use of the scale. Additionally, we
observed interaction effects between expert assessors and products for
consistency and overall score. However, in view of the relatively small number
of expert assessors, we may conclude that this panel scored the presented
cheeses consistently. When looking at the reported terms for flavor and
consistency in Table 1, the impression may be that all the cheeses had a similar
degree of defects. However, the total number of terms shows that sample
number 3 had considerably more defects than the other cheeses, which probably
caused the relatively low scores.
[Figure 4 (plot not reproduced): correlation loadings plot showing the sensory descriptive attributes, the cheese samples and the individual consumers (numbered 1–110); both axes run from –1 to 1.]
It is important to point out that the principles of sensory evaluation of
cheese (expert assessment), which use integrated parameters like “flavor,”
“consistency” and “overall score” and in addition report defect terms, are very
different from descriptive profiling. It is obvious that scores given by the
expert assessors do not describe the attributes in cheeses that can be relevant in
a product development project. Current dairy technology is often aimed at
development of alternative ingredients and processes to achieve desired
nutritional profiles. New ingredients may give rise to unexpected flavors and
textural changes during storage, which do not correspond to existing
specifications, the basis for the expert assessments. For example, a newly
developed low fat cheese may be an excellent example of low fat cheese, and yet
receive low scores in traditional dairy scoring systems. Therefore, descriptive
profiling
[Figure 5 (plot not reproduced): panels (a) flavor intensity and (b) texture (soft – firm), rated on a scale from 1 to 7 for samples 3, 4, 7, 10 and 11.]
those aged 21 years and over preferred the cheeses considered by graders to be
second quality. These cheeses were assigned lower scores because of various
flavor defects and tended to have stronger flavors. McBride and Hall (1979)
summed up their study as follows: “Thus it is impossible to deduce if, for
example, the adult consumers liked the cheese graded 86 points specifically
because of its ‘unclean, fermented, bitter’ character, or simply because it had
more flavor.” Another study which also demonstrates the importance of
looking for different clusters of consumers within a sampled population was
performed by Lawlor and Delahunty (2000). This study identified different
sensory segments with regard to preference for each of the speciality cheeses
investigated. Examination of overall preference alone would have missed this
important point.
An important principle for food producers in quality control is not to
tolerate too much product variation, but to keep within the limits of tolerance
given in the specification. In the dairy industry, the critical expert assessors
probably want to be on the conservative side in demanding the highest quality
obtainable. Possibly they also evaluate products based on anticipated shelf-life
or aging potential. However, this study shows that mild levels of sensory
defects in dairy products may not always be objectionable to consumers. It is
important for a food producer to regularly compare the approved sensory
specifications with consumer preference data. If a food company identifies
changes in the consumer target group or the formation of new segments within
this group, this may open possibilities for new market strategies and increased
sales.
It is important to do more research on methods for obtaining sensory
specifications with consumer input. An interesting article that investigated
how large the intensity of a sensory defect could be before a consumer rejected
the product was recently published by Hough et al. (2004). This article
presented the use of “survival analysis statistic” as a tool to answer this
question. It is, however, likely that the food industry performs many studies
where the aim is to identify consumer tolerance limits for food products without
publishing the results. We encourage product developers and sensory scientists
from the industry to publish their results more often, to increase the knowledge
about this topic in the area of food science.
ACKNOWLEDGMENTS
REFERENCES