Professional Documents
Culture Documents
Correspondence
steven.maere@psb.vib-ugent.be (S.M.),
kevin.verstrepen@biw.vib-kuleuven.be
(K.J.V.)
In Brief
The history and domestication of yeast
used for making beer and other types of
alcohol are revealed through genomic
and phenotypic analyses.
Highlights
d We sequenced and phenotyped 157 S. cerevisiae yeasts
Belgium
2Laboratory for Systems Biology, VIB, Bio-Incubator, Gaston Geenslaan 1, 3001 Leuven, Belgium
3Department of Plant Systems Biology, VIB, 9052 Gent, Belgium
4Department of Plant Biotechnology and Bioinformatics, Ghent University, 9052 Gent, Belgium
5White Labs, 9495 Candida Street, San Diego, CA 92126, USA
6Synthetic Genomics, 11149 North Torrey Pines Road, La Jolla, CA 92037, USA
7Department of Microbiology and Immunology, Rega Institute, KU Leuven, 3000 Leuven, Belgium
8Encinitas Brewing Science, 141 Rodney Avenue, Encinitas, CA 92024, USA
9Illumina, 5200 Illumina Way, San Diego, CA 92122, USA
10Biological & Popular Culture (BioPop), 2205 Faraday Avenue, Suite E, Carlsbad, CA 92008, USA
11Co-first author
12Lead Contact
SUMMARY due to the presence of ethanol (Michel et al., 1992; Steensels and
Verstrepen, 2014). Whereas the use of pure cultures started well
Whereas domestication of livestock, pets, and crops after the pioneering work of Pasteur and Hansen in the 19th cen-
is well documented, it is still unclear to what extent tury, early brewers, winemakers, and bakers had already learned
microbes associated with the production of food that inoculating unfermented foods with a small portion of fer-
have also undergone human selection and where mented product resulted in fast and more predictable fermenta-
the plethora of industrial strains originates from. tions. This so-called ‘‘backslopping’’ might have resulted in
yeast lineages that grew continuously in these man-made envi-
Here, we present the genomes and phenomes of
ronments and lost contact with their natural niches, providing a
157 industrial Saccharomyces cerevisiae yeasts. perfect setting for domestication. However, strong evidence
Our analyses reveal that today’s industrial yeasts for this hypothesis is still missing and it remains unclear whether
can be divided into five sublineages that are geneti- industrial yeast diversity is shaped by selection and niche
cally and phenotypically separated from wild strains adaptation (domestication) or neutral divergence caused by
and originate from only a few ancestors through geographic isolation and limited dispersal (Goddard and Greig,
complex patterns of domestication and local diver- 2015; Warringer et al., 2011).
gence. Large-scale phenotyping and genome anal- Domestication is defined as human selection and breeding of
ysis further show strong industry-specific selection wild species to obtain cultivated variants that thrive in man-made
for stress tolerance, sugar utilization, and flavor pro- environments, but behave suboptimally in nature. Typical signs
duction, while the sexual cycle and other phenotypes of domestication, including genome decay, polyploidy, chromo-
somal rearrangements, gene duplications, and phenotypes re-
related to survival in nature show decay, particularly
sulting from human-driven selection, have been reported in
in beer yeasts. Together, these results shed light on crops, livestock, and pets (Driscoll et al., 2009; Purugganan
the origins, evolutionary history, and phenotypic di- and Fuller, 2009). Several studies have recently investigated
versity of industrial yeasts and provide a resource the S. cerevisiae population by sequencing the genomes of hun-
for further selection of superior strains. dreds of different strains, providing a first glimpse of the complex
evolution of this species (Almeida et al., 2015; Borneman et al.,
INTRODUCTION 2011, 2016; Liti et al., 2009; Magwene et al., 2011; Schacherer
et al., 2009; Strope et al., 2015). However, most of these studies
Since prehistoric times, humans have exploited the capacity of focused primarily on yeasts from wild and clinical habitats and
the common baker’s yeast Saccharomyces cerevisiae to convert often include only a limited set of industrial strains, mainly origi-
sugars into ethanol and desirable flavor compounds to obtain nating from wine. Moreover, most studies use haploid deriva-
foods and beverages with a prolonged shelf-life, enriched tives instead of natural strains and can therefore not explore
sensorial palate, improved digestibility, and an euphoriant effect typical patterns of domestication like polyploidy, aneuploidy,
Cell 166, 1397–1410, September 8, 2016 ª 2016 The Author(s). Published by Elsevier Inc. 1397
This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
and heterozygosity. The use of haploids also excludes a large exclusively produced by strains of the genetically related
fraction of industrial S. cerevisiae strains that have lost the ability Saccharomyces pastorianus. After de novo assembly of each
to sporulate, such as the vast majority of beer yeasts. Neverthe- of the genomes, we inferred a maximum-likelihood phylogenetic
less, some studies already revealed signs of domestication in tree based on codon alignments for 2,020 concatenated single-
wine strains, such as an increased resistance to copper copy nuclear genes shared by each of the 157 isolates and
(present in grapevine pesticides) and sulfite (used as a preserva- the outgroup species Saccharomyces paradoxus (Figure S1A).
tive in wine) (Pérez-Ortı́n et al., 2002; Warringer et al., 2011). An Additionally, we included a representative set of 24 previously
in-depth investigation of strains originating from other industrial sequenced strains belonging to the main established lineages
niches is still lacking. of the S. cerevisiae phylogeny (Liti et al., 2009; Strope et al.,
Here, we describe the high-quality sequencing, de novo 2015), extending the number of strains to 181 (Figure 1A). Trees
assembly, annotation, and extensive phenotyping of 157 constructed from the original and extended datasets are
S. cerevisiae strains used for the industrial production of beer, congruent and show five main lineages that contain the majority
wine, bread, spirits, saké, and bioethanol, in their natural ploidy. of industrial yeasts: Wine (bootstrap support 100%), Beer 1
Our data reveal that industrial yeasts are genetically and pheno- (86%), Beer 2 (56%), Asia (100%), and a Mixed lineage (99%)
typically distinct from wild strains and stem from only a limited containing yeasts used in different industries. Three of these lin-
set of ancestral strains that have been adapting to man-made eages (Beer 1, Beer 2, and Mixed) were not previously described.
environments. They further diversified into five clades: one Next, we studied the population structure in a filtered set of
including Asian strains such as saké yeasts, one mostly contain- 53,929 polymorphic sites accounting for 2,454,052 SNPs across
ing wine yeasts, a mixed clade that contains bread and other all strains, using the Bayesian model-based clustering approach
yeasts, and two separate families of beer yeasts. While most implemented in fastStructure (Raj et al., 2014) (Figures 1B and
clades lack strong geographical substructure, one of the beer S1B). This analysis yields a population structure that is highly
clades contains geographically isolated subgroups of strains consistent with the major lineages defined in the phylogeny
used in continental Europe (Belgium/Germany), the United and identifies mosaicism in 17% of the strains (in which the esti-
Kingdom, and a recent sublineage of United States beer yeasts mated ancestry Q < 0.8 for K = 8 ancestral populations). The
that diverged from the British subclade during colonization. population structure is further supported by a principal compo-
Interestingly, these beer yeast lineages exhibit clear and pro- nent analysis (PCA) on the same SNP data (Figure 1C).
found hallmarks of domestication, more so than the other line- Further analysis of the phylogeny and population structure
ages. The shift from variable, complex, and often harsh environ- reveals that the evolutionary divergence of industrial yeasts is
ments encountered in nature to more stable and nutrient-rich shaped by both their industrial application and geographical
beer medium favored specialized adaptations in beer yeasts, origin. First, most yeasts cluster together according to the indus-
but also led to genome decay, aneuploidy, and loss of a func- try in which they are used and are clearly separated from the wild
tional sexual cycle. Specifically, we find evidence for active or clinical yeasts that have previously been sequenced. This was
human selection, demonstrated by convergent evolution for effi- further confirmed by constructing a larger phylogeny, based
cient fermentation of beer-specific carbon sources, mainly on nine genomic regions, that includes the vast majority of all
through mutations and duplications of the MAL (maltose) genes, sequenced S. cerevisiae strains, 450 isolates in total (Figure S1C;
as well as nonsense mutations in PAD1 and FDC1, which are Table S2). Wine and saké yeasts cluster in the previously identi-
involved in the production of 4-vinyl guaiacol (4-VG), an unde- fied Wine and Asia lineages (Liti et al., 2009). The majority of beer
sirable off-flavor in beer. Our results further suggest that beer yeasts (85.3%) are found in two main lineages (Beer 1 and Beer
yeast domestication was initiated hundreds of years ago, well 2) that are only distantly related. The Mixed clade harbors 7.8%
after the first reported beer production, but before the discovery of all beer strains (most of which are atypical beer yeasts that
of microbes. Together, our results reveal how today’s industrial are used for bottle refermentation of strong Belgian ales) and
yeasts are the outcome of centuries of human domestication contains all bread strains. Interestingly, spirit strains lack this
and provide a new resource for further selection and breeding clear phylogenetic relationship, as they are highly mosaic and
of superior variants. scattered throughout the tree, suggesting that these strains
might be the result of breeding by modern-day yeast companies
RESULTS that sell yeasts for spirits production. Moreover, because spirit
yeasts are typically not re-used after fermentation, they likely
Niche and Geography Drive Diversification had less opportunity to diverge into a separate clade.
To examine the evolutionary history of industrial yeasts, we Within and between the lineages, we also observed geograph-
sequenced the genomes of 157 S. cerevisiae isolates originating ical patterns. For example, most saké yeasts form a monophy-
from various sources in their natural ploidy to a median coverage letic group and cluster together with wild isolates and bioethanol
of 1353 (min = 263, max = 4033) (for details on data analysis, strains from China, while South American bioethanol strains are
see STAR Methods). This collection includes 102 industrial closely related to strains used to produce cachaça, a Brazilian
beer strains, 19 wine strains, 11 spirit strains, 7 saké strains, 7 sugarcane spirit. Moreover, the Beer 1 clade consists of three
strains isolated from spontaneous fermentations, 5 bioethanol separate subpopulations, each reflecting geographically distinct
strains, 4 bread strains, and 2 laboratory strains (Table S1). Inter- groups: Belgium/Germany, Britain, and the United States. The
estingly, ten of these S. cerevisiae beer strains are used for com- absence of genetic admixture among these subpopulations indi-
mercial production of lager beers, which were believed to be cates that these strains diverged allopatrically after the initial split
LA YC3467
BE003
BE039
BE063
BE040
BE080
BE011
BE021
BE085
BE059
BE027
BE0321
WI01 4
BE09 2
BE00 9
Origin
BE0692
NC 2
BE00
BE0 86
BE0013
BE0002
BE 034
W 18
yjmP010 50
BE 083
yjmI016
W
BE 084
S 14
BE 004
Beer A
W m6 20
SP 06
W L0 8 2
SP I002 5
yj 3
W L00 01
B I00 0
Wine
L L0 3
0
B 03 4
32 574 6
N A0 02
BE 01 4
YC 3 6
BE I01 07
C C 2
N C 01
Spirits
N CY YC3
65
W I0 18
W I0
Saké
W
NC W 5
YC WI I01 00 03
As
Wild 3 0 0
W 44 06 SA A0 04
NC 7 S A 0 07
ia
I 0
Bio-ethanol YC WI0 17 S A0 01
3 0 S A0 06
B 56 3 S A0 3
Bread NC SAE0208 S I00
e
YC 00 B I001
Win
NA
S.paradoxus yjm I015
12 NCY1273
WI0144 yjm C3448
SP01 1 NCY1342
Lineage WI0041 yjm 3449
WI005 NCYC
BE088 WL004
WI002
Beer 1 BE024 yjm1418
M
WI009 yjm1447
Britain SP002 spar
WI013 BE054
US BE033 BE049
WI001 BE101
SP007 BE037
Belgium/Germany SP008 BE048
WI008 BE03
Mixed 6 BE01 6
BE00 1 BE0 0
SP0029 BE0 79
Wine BE0 28 BE0 89
E
B 05 0 BE 64
Beer 2 BE0009 BE 100
SP 038 BE 047
BE 003 WI 009
West Africa (WA) SP 061 B 012
B E07
Mix
tain
BE 023 4 B E05 0
Asia BE 0 0 2 BEE05 3
BRR00 5 B 0 6
ed
B E0 55
Bri
North America (NA) B E02 0 1 B E0 69
B R0 05 B E0 01
B L0 07 BE E0 99
Malaysia (M) W L0 006 04 50
B E0 0 3
B E0 066
W L 5
E0 94
B E 0 72
B E 60
B R0
W
BE E09 7 4
B E0 35
90
B
B 01 3
B 0 7
B E04 7
BEE05 9
Mosaic
BEE00 6
B E01 2
BE 0418
B E10 8
B 09
BE 012
BEE007
BE 073
B 071
BE 087
BE 067
BE 015
BE 044
BE0043
BE 068
BE0 81
BE 82
SP0016
BE0 5
BE02 5
BE0651
BE0226
BE0 6
BE09
BE077
BE097
BE076
BE095
BE075
BE052
BE078
BE042
BE031 BE058
1
er
Be
Be
lgiu
m/G US
erma
ny
0.005
B C
1.00
0.3
Britain
0.50 US
K=2 Belgium/Germany
Estimated ancestry (Q)
0.00 Beer 2
1.00 0.2 Mixed
Wine
Asia
PC2 (9.9%)
0.50
K=4 Mosaic
NS
0.00
1.00
0.1
0.50
K=6
0.00 0.0
1.00
0.50
K=8
0.00
-0.1
(Figure 1B). Moreover, the high nucleotide diversity within each incidence of polyploidy and aneuploidy (R2 0.14, p < 0.001;
of the Beer 1 sublineages exceeds that within the Wine popula- average genome content of 3.52, SD = 0.67, Figures 2A, 2F,
tion, suggesting that the split did not happen recently (Table 1). and 2G), which is linked to extensive chromosomal loss and
Compared to Beer 1, Beer 2 is more closely related to the general genome instability (Sheltzer et al., 2011).
Wine lineage and includes 20.6% of all brewing strains. Howev- CNVs are not uniformly spread across the genome. Consid-
er, in contrast to the Beer 1 group, the Beer 2 lineage lacks ering subtelomere lengths of 33 kb (Brown et al., 2010), on
geographic structure and contains yeasts originating from average 39.7% of subtelomeric nucleotide positions are affected
Belgium, the United Kingdom, the United States, Germany, by CNV events compared to 9.54% of non-subtelomeric nucle-
and Eastern Europe. The presence of two major genetically otide positions (4.1-fold difference, Wilcoxon signed-rank test,
distinct sources of beer yeasts hints toward two independent p < 0.001). However, not all subtelomeres are equally prone to
European domestication events, one of which is at the origin of CNV: most variability is detected in ChrI, ChrVII, ChrVIII, ChrIX,
both the Wine and Beer 2 clade. ChrX, ChrXII, ChrXV, and ChrXVI (Figure 2A). Gene ontology
(GO) enrichment analysis reveals that genes involved in nitrogen
Remarkable Structural Variation in Beer Yeasts and carbon metabolism, ion transport, and flocculation are
Variation in genome structure, such as polyploidy, aneuploidy, most heavily influenced by CNVs (Table S3), which is in line
large segmental duplications, and copy-number variations with previous results (Bergström et al., 2014; Dunn et al.,
(CNVs), have repeatedly been found in association with domes- 2012). Interestingly, some CNVs seem linked to specific environ-
tication and adaptation to specific niches in experimentally ments (Table S4), suggesting that CNVs may underlie niche
evolved microbes (Bergström et al., 2014; Borneman et al., adaptation. For example, many genes involved in uptake and
2011; Dunham et al., 2002; Dunn et al., 2012; Pavelka et al., breakdown of maltose (present in saké medium, main carbon
2010; Rancati et al., 2008; Selmecki et al., 2009; Voordeckers source in beer, but absent from grape must) are amplified in
et al., 2015) and in association with domestication of higher beer and saké-related subpopulations, while they are often lost
organisms (Purugganan and Fuller, 2009). in strains from the Wine subpopulation (false discovery rate
Sequencing the yeast strains in their natural ploidy allowed [FDR] q value < 0.001).
analysis of gross chromosomal rearrangements and aneu-
ploidies (Figure 2A). We detected a staggering 15,288 deletion Relaxed Selection on Sex and Survival in Nature
and amplification events across all strains, covering on average Apart from selection for industrial traits, domestication is also
1.57 Mb per strain. The size of the regions ranges from complete characterized by relaxed selection and potential loss of costly
chromosomes (resulting in aneuploidies) to small local variations traits that are not beneficial in the man-made environment (Dris-
of a few kilobases (kb), all of which we will refer to as ‘‘CNVs.’’ coll et al., 2009). In order to chart the phenome of our collection
The extent of deletions significantly exceeds that of amplifica- and investigate signs of selection for some traits and loss of
tions, respectively 1.07 Mb and 0.50 Mb on average per strain others, 82 phenotypes, such as aroma production, sporulation
(2.15-fold difference, Wilcoxon signed rank test, p < 0.001). We characteristics, and tolerance to osmolytes, acids, ethanol,
observed significant variation among strains originating from and low and high temperatures, were measured in all strains (Fig-
different industries in the total frequency of CNV events (ANOVA ures 3A and S3; Table S5). Hierarchical clustering of the pheno-
F test, p < 0.001) and the fraction of the genome affected types resolves the main phylogenetic lineages and reveals a
(ANOVA F-test, p < 0.001) (Figure S2). Pairwise comparisons of moderate correlation between genotype and phenotype dis-
subpopulations and industries show no significant differences tances between strains (Spearman correlation 0.33), which is
in the load of amplifications between strains from different indus- further increased (Spearman correlation 0.36) when mosaic
tries or subpopulations, but we detected significant differences strains, for which genetic distance has no straightforward evolu-
in the load of deletions between strains from the wine (median = tionary interpretation, are omitted (Figure 3A). Moreover, the
0.51 Mb) and beer (median = 0.94 Mb) industry (Tukey honest clustering splits the collection into two main phenotypic sub-
significant difference [HSD], p < 0.05) (Figures 2B–2E). This groups: one largely overlapping with the Beer 1 clade that con-
high incidence of CNV in beer strains goes together with a high tains the majority of the Belgium/Germany, United States, and
Asia
BI001
SA001
SA006 Saké
SA003
SA005
SA007
Wild
SA004
WL002
WL003
Bioethanol
WL001
LA001
SP010
Bread
WI016
BE018
LA002
Laboratory
BE002
WI019 S.paradoxus
BE004
BE027
BE085
BE040 >= 2-fold amplification
Beer 2
BE011
BE039 Amplification
BE003 < 2-fold amplification
BE086
BE013
BE034
BE084
BE080
BE021 1 =< deletion < n
BE032 Deletion
BE062
SP006 complete deletion
SP004
BI005
BI002
BE014
BE030
WI014
WI007
WI018
WI010
WI017
WI003
WI006
SA002
Wine
WI011
WI015
BE020
WI004
SP011
WI005
WI009
BE088
BE024
WI001
SP002
WI013
BE033
SP008
SP007
WI008
BE006
SP001
BE028
BE029
BE005
Mixed
BR004
BE023
BE038
SP003
BE025
BR001
BR003
BR002
WL007
WL005
BE074
BE093
BE017
BE046
BE008
BE041
BE087
BE073
Bel/Ger
BE012
BE015
BE043
BE081
BE016
SP005
BE022
BE026
BE077
BE076
BE075
BE078
BE031
BE058
BE042
BE052
BE097
BE096
BE095
BE051
BE065
BE071
BE067
US
BE007
BE102
BE098
BE082
BE068
BE044
BE060
BE035
BE019
BE057
BE066
BE094
BE090
BE099
BE050
BE045
BE069
BE001
BE055
BE056
Britain
BE053
BE047
WI012
BE009
BE089
BE079
BE036
BE010
BE100
BE064
BE048
BE037
BE101
BE054
BE049
1 2 3 4 5
Estimated
ploidy (n)
B C
Beer Bioethanol Bread Saké Spirits Wild Wine Britain US Bel/Ger Beer 2 Mixed Wine Asia Mosaic
2.5 2.5
Amplifications (Mb)
Amplifications (Mb)
2 2
1.5 1.5
1 1
0.5 0.5
0 0
D E
Beer Bioethanol Bread Saké Spirits Wild Wine Britain US Bel/Ger Beer 2 Mixed Wine Asia Mosaic
Deletions (Mb)
Deletions (Mb)
6 6
4 4
2 2
0 0
F G Britain
Estimated ploidy (n)
Beer
Bioethanol US
4 4
Bread Bel/Ger
Lab Beer 2
3 Saké 3 Mixed
Spirits Wine
Wild Asia
2 Wine 2
R2 = 0.14, p < 0.001 R2 = 0.14, p < 0.001 Mosaic
0 2 4 6 8 0 2 4 6 8
Glycerol (20°C)
Melibiose (20°C)
Sorbitol (20°C)
Ethanol (20°C)
Galactose (20°C)
Nutrient stress
Fructose (20°C)
Sucrose (20°C)
Maltose (20°C)
Maltose (liquid)
Maltotriose (liquid)
Glucose (10°C)
Ethanol (10°C)
Fructose (10°C)
Sucrose (10°C)
Maltose (10°C)
Glucose (39°C)
Ethanol (39°C)
Fructose (39°C)
Sucrose (39°C)
Maltose (39°C)
Lysine
Ethyl propionate
Ehtyl butyrate
Ethyl hexanoate
Ethyl octanoate
Ethyl decanoate
Aroma
Propyl acetate
Isobutyl acetate
Ethyl acetate
Isoamyl acetate
Phenyl ethyl acetate
Phenyl ethanol
Butanol
Isoamyl alcohol
Isobutanol
POF
Acetaldehyde
Other
Flocculation
Ethanol production
Sporulation efficiency
Spore viability
Subpopulation
B C
Britain US Bel/Ger Beer 2 Mixed Wine Asia Mosaic Britain US Bel/Ger Beer 2 Mixed Wine Asia Mosaic
Ethanol production (% v/v)
15.0 100
Copper tolerance (au)
12.5 75
10.0 50
7.5 25
5.0 0
D E
Maltotriose fermentation (au)
Britain US Bel/Ger Beer 2 Mixed Wine Asia Mosaic Britain US Bel/Ger Beer 2 Mixed Wine Asia Mosaic
100 100
Sulfite tolerance (au)
75 75
50 50
25 25
0 0
(C) Growth of all strains from different subpopulations on medium supplemented with 0.075 mM copper, relative to growth on medium without copper.
(D) Growth of all strains from different subpopulations on medium supplemented with 2.25 mM sulfite, relative to growth on medium without sulfite.
(E) Growth of all strains from different subpopulations in medium containing 1% w v1 maltotriose as the sole carbon source, relative to growth on medium with
1% w/v1 glucose. au, arbitrary units; Bel/Ger, Belgium/Germany.
See also Figure S3 and Tables S5, S6, and S7.
50 50
25 25
0 0
C
I II III IV V VI VII VIII IX X XI XII XIII XIV XV XVI
1.00
Britain
0.50
0.00
1.00
US
0.50
0.00
1.00
Bel/Ger
Level of heterozygosity
0.50
0.00
1.00
Beer 2
0.50
0.00
1.00
Mixed
0.50
0.00
1.00
Wine
0.50
0.00
1.00
Asia
0.50
0.00
D E
80 R2 = 0.17, p < 0.001 80 R2 = 0.03, p = 0.022
Nr. of heterozygous SNPs (x103)
Nr of heterozygous SNPs (x103)
Britain Britain
60 US 60 US
Bel/Ger Bel/Ger
Beer 2 Beer 2
40 40
Mixed Mixed
Wine Wine
20 Asia 20 Asia
Mosaic Mosaic
0 0
0 25 50 75 100 0 25 50 75 100
Spore viability (%) Sporulation efficiency (%)
F G
Amplified/Deleted genome fraction (%)
confirmed by the average per-site nucleotide divergence (dxy), signed-rank test, p < 0.001) (Table S8). This suggests that the
which is significantly lower between Britain and the United States origin of the United States brewing strains can be traced back
(average dxy = 1.97 3 103) than between Belgium/Germany and to the introduction of beer culture in the United States by early
the United States strains (average dxy = 2.26 3 103) (Wilcoxon 17th century British settlers (Van Wieren, 1995). Third, in contrast
c.495_496insA
4-VG-
4-VG+
c.864delA
4-VG+
75% 75%
W102*
W497*
Q154*
R309*
Q86*
Y98*
K54*
BE049
BE054
50% 50%
BE101
BE037
BE048
BE064
BE100
25% 25%
BE010
BE036
BE079
BE089
BE009
WI012
0% 0%
Wine
Spirits
Saké
Wild
Bioethanol
Bread
Laboratory
Beer
Britain
US
Beer 2
Mixed
Wine
Asia
Mosaic
Bel/Ger
BE047
BE053
BE070
BE056
BE055
BE001
BE069
BE045
BE050
BE099
BE090
BE094
BE066
BE057
C
BE019
BE035
BE060
BE068
BE044 PAD1 FDC1
WL003
BE082
SP006
WI004
WI011
B E076
BE086
WI015
WI005
WI018
BE081
WI005
SP010
BE031
WI017
BE058
BE078
WI014
WI016
7
BE01
BE061
WI007
1
6
WI01
LA00
SP00
BE098
SP00 8
BE02
BE06
SP00
8
BE01
SP00 3
8
BE03
WI0
WI00
BI0
1
WL 02
BE03
SP00 9
WI0 3
BE102 BE 02
B R 09
SP00
WI0 17
8
005
BR 01
SP 88
2
BE 03
05
01
004
0
4
4
BE 02
BE 001
09
03
WL 003
BE007
SP
0
WI0 02
BE 06
SA 021
W 005
0
SP
BE 014
0
W 028
BE 007
BE 080
SP
0
BE 011
BE067
BR 007
0
0
BE 084
B E 07 2
SP 004
B 001
L
BE 07
BE 010 3
4
L
BE071
BR 00 3
6
SP 06 6
B E09 3
BE E 0 2
BR E06 8
9
B E07 9
BE065
BE 00
0
B 02
E0 2
B I00
00 7
W
43
B
2
W
BE051
L
3
B
B E03 3
B
2
02 5 BE E03 00 9
W E08 2
R
BE095 BE 026 0 BE I01 6
L
BE 003 3 BE E07 1
BE096 B 08 BE 077 W I00
BE 062 BE I002 BE 022 W 007
BE097 BE 030 SA 059
W 003 BE 075
BE052 BE 017 S A 059 0 BE 003
BE 7
BE042 BE 027 BE 001 0
8 SA 001
0 BE 58 SA 004
BE058 BE 04 SA 001 0
0 LA 0 5 BE 76 SA 0 5
BE 20 0 03
SA
0
BE031 0 SA 6 BE 1
006
BE 11 00 01
SA
BE078 03 SA SP00 6
BE04 9 4
SA00 7 5 BI0
03
BE075 0 BE0
WI0
11 SA00 BE03
04 BI004
BE076 1
BE07 BI00
1 9 BI00
BE04
BE077 BE08
4
WL0
01 0 WL002
6 BE011
BE026 BE043 BI003 WL001
WI014 BE023
BE022 WI010 WL002
WI007 LA001
SP005 BR002 BI004
spar BE020 spar
BE016 SP011
BI002 WI018 BI002
BE081 WI006
BE096 WI010 BE094
BE043 BE034 BE036
BE036 WI013
BE015 BE003 BE037 WI012
BE088
BE091 BE057 WI009 BE099
BE012
SP007 BE049 WI001 BE079
BE073
BE002 BE045 BE033
BE070
BE087 BE01
4 BE08 4 BE06
BE041 3 2 BE02 6
BE09 BE05 2 BE04
SP00 7
BE008 WI0
04 BE0 0 83 BE0
4 55 BE0 9 64
BE01 BE01
BE046 BR00 5 BE02
08 BE 0 BE 0
BE017 BE 097 17
WI0 13
051
005 BE BE
BE093 WL 15 BE 0 6 0
0
B E 27 BE
090
WI0 72 1
BE 00 0
BE 06 BE 054
BE074 0
BE 13 BE 035 0
WL 025 BE 089
BE072 0
BE 03 BE 067 BE 009
I0 6 BE 062
WL006 W BE 056
B E 021 BE 053
00 B 04 B 04
WL005 SP I019 BE E01 1 BE 091 4
W 092 3 BE E05 8
WL007 W 08 5 BE 08 2 BE 04 0
BE E02 5 I0 9
12 BE 03 00 5
BR002 B 02
BE 00 8
BE
B E06 4
B E06 0
1
S A 02
E0 9
E0 8
8
BE 0 8 0 2
B 06
B 10 5
LA 01
E
BR003 B
56
46
W I01 2
B
BE 09
BE 05
BE
W 00
6
BE 006
BE 095
BE 015
BR001
SP 004
BE 102
BE 034
BE 067
4
BE 085
BE 069
BE 102
0
BE025
L
4
SP 0 0 5
BE 42
BE 05
0
BE 097
BE 08
BE 048
BE 19
BE 95
0
BE 01
BE
BE 60
SP003
BE
BE 82
0
BE 47
BE
057
0
BE
0
BE
BE09 4
BE0 5
0
BE
9
BE0
87
0
0
04
03
BE038
BE0 6
BE01
012
1
2
0
BE09
BE09
6
087
8
BE07
BE04
BE04
BE05
037
8
BE
BE066
BE009
BE073
BE10
BE04
BE070
BE09
BE007
BE012
BE071
BE053
BE008
B E079
SP008
BE07
BE00
BE061
BE065
BE054
BE052
BE068
BE051
65
BE101
BE007
BE098
BE042
44
9
0
BE023
3
9
1
BR004
SP009
BE005
BE029 6.0E-4 5.0E-4
BE028
SP001
BE006
WI008
BE027
SP007
SP008
BE033
D
WI013
SP002
WI001
BE024
BE088
WI009
WI005
SP011
WI004
BE020
WI015
WI011
(1)
SA002
WI006
WI003
WI017
WI010
Mating type a Mating type α
WI018
WI007
WI014
BE030
BE014 (1)
BI002
BI005
SP004
SP006
BE091
BE062
BE032
BE059
BE021
BE080
BE084
BE083
BE034
(2)
BE013 SA005
BE086
BE092
BE003
BE063 120
BE039 BE027-a SA005-α Hybrid 1
BE011
4-VG production (au)
BE040 100
BE085
BE027
x
BE004
WI019 80
BE002 SA005-a BE027-α Hybrid 2 Parental
LA002
BE018
WI016 x
60 Haploid
SP010 (3) Hybrid
LA001
WL001 40
WL003 BE027-a BE027-α Hybrid 3
WL002
SA004 20
SA007 x
SA005
SA003
SA006
0
SA001 SA005-a SA005-α Hybrid 4
BE027
BE027-a
BE027-α
SA005-a
SA005-α
Hybrid 1
Hybrid 2
Hybrid 3
Hybrid 4
SA005
BI001
BI003
BI004 x
WL004
WI002
spar
(B) Percentage of strains within each origin (left) and population (right) capable of producing 4-vinyl guaiacol (4-VG). Red, 4-VG; turquoise, 4-VG+.
(C) Phylogenetic trees and ancestral trait reconstruction of PAD1 and FDC1 genes. Branches are colored according to the most probable state of their ancestral
nodes, turquoise (4-VG+) or red (4-VG). Pie charts indicate probabilities of each state at specific nodes, turquoise (4-VG+) or red (4-VG); posterior probability for
the same nodes is indicated by a dot: black dot, 90%–100%; gray, 70%; white, 42%. Branch lengths reflect the average numbers of substitutions per site
(compare scale bars).
(D) Development of new yeast variants with specific phenotypic features by marker-assisted breeding. Two parent strains (BE027 and SA005) were sporulated
and, using genetic markers, segregants with the desired genotype were selected (1). Next, breeding between segregants from different parents (outbreeding) or
the same parent (inbreeding) were performed (2). This breeding scheme yields hybrids with altered aromatic properties that can directly be applied in industrial
fermentations (3). 4-VG production is shown relative to the production of BE027. Yeast genomes are represented by gray bars, loss-of-function mutations in
FDC1 as red (W497*) and blue (K54*) boxes within the gray bars. Error bars represent one SD from the mean.
See also Table S5.
Detailed methods are provided in the online version of this paper REFERENCES
Cingolani, P., Platts, A., Wang, L., Coon, M., Nguyen, T., Wang, L., Land, S.J., Legras, J.L., and Karst, F. (2003). Optimisation of interdelta analysis for
Lu, X., and Ruden, D.M. (2012). A program for annotating and predicting the Saccharomyces cerevisiae strain characterisation. FEMS Microbiol. Lett.
effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of 221, 249–255.
Drosophila melanogaster strain w1118; iso-2; iso-3. Fly (Austin) 6, 80–92. Lemey, P., Rambaut, A., Drummond, A.J., and Suchard, M.A. (2009). Bayesian
Danecek, P., Auton, A., Abecasis, G., Albers, C.A., Banks, E., DePristo, M.A., phylogeography finds its roots. PLoS Comput. Biol. 5, e1000520.
Handsaker, R.E., Lunter, G., Marth, G.T., Sherry, S.T., et al.; 1000 Genomes Li, H., and Durbin, R. (2009). Fast and accurate short read alignment with Bur-
Project Analysis Group (2011). The variant call format and VCFtools. Bioinfor- rows-Wheeler transform. Bioinformatics 25, 1754–1760.
matics 27, 2156–2158. Liti, G., Carter, D.M., Moses, A.M., Warringer, J., Parts, L., James, S.A., Davey,
Dittmar, J.C., Reid, R.J., and Rothstein, R. (2010). ScreenMill: a freely available R.P., Roberts, I.N., Burt, A., Koufopanou, V., et al. (2009). Population genomics
software suite for growth measurement, analysis and visualization of high- of domestic and wild yeasts. Nature 458, 337–341.
throughput screen data. BMC Bioinformatics 11, 353. Liu, Y., Schröder, J., and Schmidt, B. (2013). Musket: a multistage k-mer spec-
Driscoll, C.A., Macdonald, D.W., and O’Brien, S.J. (2009). From wild animals to trum-based error corrector for Illumina sequence data. Bioinformatics 29,
domestic pets, an evolutionary view of domestication. Proc. Natl. Acad. Sci. 308–315.
USA 106 (Suppl 1 ), 9971–9978. Loewe, L., Textor, V., and Scherer, S. (2003). High deleterious genomic muta-
Drummond, A.J., and Suchard, M.A. (2010). Bayesian random local clocks, or tion rate in stationary phase of Escherichia coli. Science 302, 1558–1560.
one rate to rule them all. BMC Biol. 8, 114. Lynch, M., Sung, W., Morris, K., Coffey, N., Landry, C.R., Dopman, E.B., Dick-
Drummond, A.J., Suchard, M.A., Xie, D., and Rambaut, A. (2012). Bayesian inson, W.J., Okamoto, K., Kulkarni, S., Hartl, D.L., and Thomas, W.K. (2008). A
phylogenetics with BEAUti and the BEAST 1.7. Mol. Biol. Evol. 29, 1969–1973. genome-wide view of the spectrum of spontaneous mutations in yeast. Proc.
Natl. Acad. Sci. USA 105, 9272–9277.
Dunham, M.J., Badrane, H., Ferea, T., Adams, J., Brown, P.O., Rosenzweig,
F., and Botstein, D. (2002). Characteristic genome rearrangements in experi- Magwene, P.M., Kayıkçı, Ö., Granek, J.A., Reininga, J.M., Scholl, Z., and Mur-
mental evolution of Saccharomyces cerevisiae. Proc. Natl. Acad. Sci. USA ray, D. (2011). Outcrossing, mitotic recombination, and life-history trade-offs
99, 16144–16149. shape genome evolution in Saccharomyces cerevisiae. Proc. Natl. Acad.
Sci. USA 108, 1987–1992.
Dunn, B., Richter, C., Kvitek, D.J., Pugh, T., and Sherlock, G. (2012). Analysis
of the Saccharomyces cerevisiae pan-genome reveals a pool of copy number McDonald, M.J., Rice, D.P., and Desai, M.M. (2016). Sex speeds adaptation by
variants distributed in diverse yeast strains from differing industrial environ- altering the dynamics of molecular evolution. Nature 531, 233–236.
ments. Genome Res. 22, 908–924. McKenna, A., Hanna, M., Banks, E., Sivachenko, A., Cibulskis, K., Kernytsky,
Eden, E., Navon, R., Steinfeld, I., Lipson, D., and Yakhini, Z. (2009). GOrilla: a A., Garimella, K., Altshuler, D., Gabriel, S., Daly, M., and DePristo, M.A. (2010).
tool for discovery and visualization of enriched GO terms in ranked gene lists. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-
BMC Bioinformatics 10, 48. generation DNA sequencing data. Genome Res. 20, 1297–1303.
Michel, R.H., McGovern, P.E., and Badler, V.R. (1992). Chemical evidence for
Filteau, M., Hamel, V., Pouliot, M.C., Gagnon-Arsenault, I., Dubé, A.K., and
ancient beer. Nature 360, 24.
Landry, C.R. (2015). Evolutionary rescue by compensatory mutations is con-
strained by genomic and environmental backgrounds. Mol. Syst. Biol. 11, 832. Mukai, N., Masaki, K., Fujii, T., Kawamukai, M., and Iefuji, H. (2010). PAD1 and
FDC1 are essential for the decarboxylation of phenylacrylic acids in Saccharo-
Fu, L., Niu, B., Zhu, Z., Wu, S., and Li, W. (2012). CD-HIT: accelerated for clus-
myces cerevisiae. J. Biosci. Bioeng. 109, 564–569.
tering the next-generation sequencing data. Bioinformatics 28, 3150–3152.
Muller, H.J. (1964). The relation of recombination to mutational advance.
Goddard, M.R., and Greig, D. (2015). Saccharomyces cerevisiae: a nomadic
Mutat. Res. 106, 2–9.
yeast with no niche? FEMS Yeast Res. 15, fov009.
Pattengale, N.D., Alipour, M., Bininda-Emonds, O.R., Moret, B.M., and Stama-
Goddard, M.R., Godfray, H.C.J., and Burt, A. (2005). Sex increases the effi-
takis, A. (2010). How many bootstrap replicates are necessary? J. Comput.
cacy of natural selection in experimental yeast populations. Nature 434,
Biol. 17, 337–354.
636–640.
Pavelka, N., Rancati, G., Zhu, J., Bradford, W.D., Saraf, A., Florens, L.,
Hornsey, I.S. (2003). A History of Beer and Brewing (Cambridge: The Royal
Sanderson, B.W., Hattem, G.L., and Li, R. (2010). Aneuploidy confers quanti-
Society of Chemistry).
tative proteome changes and phenotypic variation in budding yeast. Nature
Hutter, S., Vilella, A.J., and Rozas, J. (2006). Genome-wide DNA polymor- 468, 321–325.
phism analyses using VariScan. BMC Bioinformatics 7, 409. Peng, Y., Leung, H., Yiu, S., and Chin, F. (2010). IDBA – a practical iterative de
Huxley, C., Green, E.D., and Dunham, I. (1990). Rapid assessment of S. cere- Bruijn Graph de novo assembler. In Research in Computational Molecular
visiae mating type by PCR. Trends Genet. 6, 236. Biology, B. Berger, ed. (Springer), pp. 426–440.
Further information and requests may be directed to, and will be fulfilled by the corresponding author Kevin J. Verstrepen (kevin.
verstrepen@biw.vib-kuleuven.be).
Strain Collection
For this study, a set of 157 Saccharomyces cerevisiae strains was sequenced, phenotyped and analyzed. Strains were obtained from
historical yeast collections of the VIB Laboratory for Systems Biology (KU Leuven, Belgium) and White Labs (USA). While detailed
background information on many of these strains is limited, their geographical origin was in most cases documented and is listed
in Table S1. Beer strains mainly originate from the main fermentation (82/102) or bottle conditioning (10/102) of ale beers. While lager
beer fermentations are usually carried out by S. pastorianus, an interspecific hybrid of S. cerevisiae and S. eubayanus, we identified
10 S. cerevisiae strains typically employed in lager fermentations and included them in the selection.
All strains are long-term stored in 80 C using a glycerol-based standard storage medium (peptone 1% w v-1, yeast extract
0.5% w v-1, glucose 1% w v-1, glycerol 25% v v-1).
METHOD DETAILS
DNA Extraction
For strains BE001-043, BI001-005, BR001-004, LA001, SA001-007, SP001-007, NA001-004 and WI001-018, genomic DNA was pre-
pared using the QIAGEN genomic tip kit (QIAGEN, Germany) according to recommended protocols. Final DNA concentrations were
measured using Qubit (Thermo Fisher Scientific, USA). For the other strains, genomic DNA was extracted with the MasterPureTM
Yeast DNA Purification Kit (Epicenter, USA), but with some modifications to the recommended protocols. Three mL of each of the
liquid yeast cultures were pelleted by centrifugation at 20,800 x g for 5min in 1.7mL micro centrifuge tubes. 300 ml of yeast cell lysis
solution was added to each micro centrifuge tube along with 1 ml of 5g L-1 RNase A. The cells were resuspended by vortex mixing
each micro centrifuge tube for 10 s. Each tube was incubated at 65 C for 15 min and was then chilled on ice for 5 min. Next, 150 ml of
protein precipitation reagent was added to each tube and the tubes were vortexed for 10 s. The suspensions were then centrifuged
for 10 min at 20,800 x g to pellet the cellular debris. The supernatants were transferred to clean 1.7mL micro centrifuge tubes. Next,
500 ml of isopropanol was added to each tube. Each tube was then inverted several times to precipitate the DNA, which was then
pelleted by centrifugation at 20,800 x g for 10 min. The supernatants were discarded and the pellets were washed with 500 ml of
ice-cold 70% ethanol and briefly centrifuged. The ethanol was removed by pipetting. The DNA pellets were centrifuged again for
10 min at 20,800 x g to remove any remaining ethanol. Each DNA pellet was then suspended in 35 ml of TE buffer and stored at
De Novo Assembly
For each library, low-quality and ambiguous reads were trimmed using Trimmomatic (v0.30) (Bolger et al., 2014). After k-mer based
read correction with musket (Liu et al., 2013), reads were assembled using idba_ud (Peng et al., 2010). De novo assemblies were
evaluated by mapping back the reads and also by checking BLAST matches of assembled contigs to the SILVA database for
rDNA classification. Each de novo assembly was scaffolded against the S. cerevisiae S288c reference genome assembly (http://
downloads.yeastgenome.org/sequence/S288C_reference/genome_releases/S288C_reference_genome_R64-1-1_20110203.tgz).
The liftOver workflow (Kuhn et al., 2007) was used to determine the coordinates of contigs from each newly assembled strain relative
to the reference strain (http://genomewiki.ucsc.edu/index.php/Minimal_Steps_For_LiftOver). Scaffolded contigs mapped to each
reference strain chromosome were combined into a ‘‘pseudo-molecule,’’ with the placed contigs stitched together with gaps indi-
cated by ‘‘N.’’ Unplaced contigs (including alternative lower scoring matches) were kept. Unplaced contigs less than 300 nucleotides
were not included in the final assembly (Table S1).
Annotation
The genome annotation of the S. cerevisiae S288c reference genome (nr. of genes = 6,692) was downloaded from the UCSC (version
Apr2011/sacCer3) Table Browser in GPR format. FASTA records were renamed to match the chromosome naming convention in the
GPR file. The liftOver workflow was used to create a coordinate conversion file (chain file) between the S. cerevisiae S288c genome
and each newly scaffolded assembly. Using the chain file, the coordinates of the S. cerevisiae S288c genes were ‘‘lifted’’ to each
new genome assembly. Lifted genes were considered valid if they did not contain internal stop codons. Independently, the gene pre-
diction tool AUGUSTUS v2.5 (Stanke and Morgenstern, 2005) was used to predict genes for each new strain using the provided
training set/model for S. cerevisiae S288c with the following parameters (–noInFrameStop = true–maxDNAPieceSize = 1000000–
progress = false–uniqueGeneId = true–keep_viterbi = false). The annotated and predicted genes, using liftOver and AUGUSTUS
respectively, were combined with priority given to the liftOver annotation when the predictions overlapped (Table S1).
Phylogenetic Analyses
Phylogenetic Tree for the Sequenced Collection - Figure S1A
Multiple sequence alignments (MSAs) for the 2,020 amino acid sequences identified above were generated using MAFFT (v7.187),
with default settings and 1,000 refinement iterations (Katoh and Standley, 2013). Codon alignments were obtained from MSAs of pre-
dicted amino acid sequences and the corresponding DNA sequences by the PAL2NAL program (v14) (Suyama et al., 2006). Quality
checks and format conversions were performed using trimAl (v1.2) (Capella-Gutiérrez et al., 2009). The full set of codon alignments
were concatenated into a supermatrix using FASconCAT (v1.0) (Kück and Meusemann, 2010). The resulting supermatrix included
158 taxa and 2,782,494 positions, 99.174% nucleotides, 0.826% gaps and 0% ambiguities. The matrix was partitioned based on
all 2,020 gene blocks and all three codon positions within each block, resulting in 6,060 distinct data partitions accounting for
144,171 distinct alignment patterns. Twenty completely random starting trees and 20 randomized stepwise-addition parsimony
starting trees were obtained using RAxML (v8.1.3) (Stamatakis, 2014). Robinson-Foulds (RF) distances were computed between
all trees in both the fully random and parsimony tree sets, to avoid systematic bias due to low diversity in starting trees. Because
the stepwise addition algorithm generated a set of starting trees with low diversity, all the subsequent analyses were conducted
with fully random starting trees. Twenty maximum-likelihood (ML) tree searches were performed on each of the 20 fully random start-
ing trees under the GTRGAMMA model (4 discrete rate categories) using ExaML (v3) and the rapid hill climbing algorithm (-f d) (Kozlov
et al., 2015). During the ML search, the alpha parameter of the model of rate heterogeneity and the rates of the GTR model of nucle-
otide substitutions were optimized independently for each partition. The branch lengths were optimized jointly across all partitions.
For each starting tree, the best tree was selected based on the highest log-likelihood score. Parameters and branch lengths were re-
optimized on the best 20 topologies with ExaML (-f E) using the median of the four categories for the discrete approximation of the
GAMMA model of rate heterogeneity (-a). The tree with the best overall log-likelihood score of all 20 tree inferences was considered
the final ML tree. Non-parametric bootstrap analysis was performed on the concatenated matrix using RaxML (v8.1.3). The a pos-
teriori boot-stopping criterion (Pattengale et al., 2010) (MRE bootstrapping convergence criterion) was applied to define the number
of replicates. After every 50 replicates, the set of bootstrapped trees generated so far is repeatedly (100 permutations) split in two
equal subsets, and the Weighted Robinson-Foulds (WRF) distance is calculated between the majority-rule consensus trees of
both subsets (for each permutation). Low WRF distances (< 3%) for > = 99% of permutations were used to indicate bootstrapping
convergence. Convergence was reached after 250 replicates: average weighted Robinson-Foulds distance (WRF) = 1.86%, percent-
age of permutations in which the WRF was % 3.00 = 100%. The tree was visualized and rooted in FigTree (v1.4.2) using S. paradoxus
as the outgroup (http://tree.bio.ed.ac.uk/software/figtree/).
Phylogenetic Tree for the Extended Collection - Figure 1A
In order to compare our strain collection with previously sequenced strains, we included 24 additional isolates, previously described
in Liti et al. (2009) (re-sequenced in Bergström et al. (2014)) and Strope et al. (2015) (Table S1). Generation of MSAs and construction
of the concatenation matrix was performed as described earlier. The resulting supermatrix included 2,785,239 positions, 99.077%
nucleotides, 0.922% gaps and 0.001% ambiguities. The matrix was partitioned based on all 2,020 gene blocks and all three codon
positions within each block, resulting in 6,060 distinct data partitions, accounting for 163,920 distinct alignments patterns. The ML
searches and re-optimization were run on 30 fully random starting trees as described above, using RAxML (v8.1.3) and ExaML (v3).
Non-parametric bootstrap analysis was performed as described above. Convergence was reached after 250 replicates: average
weighted Robinson-Foulds distance (WRF) = 2.10%, percentage of permutations in which the WRF was % 3.00 = 99%. The tree
was visualized and rooted in FigTree using S. paradoxus as the outgroup.
Multi-locus Phylogeny - Figure S1C
Nine partial genes previously used to genetically characterize 99 Chinese isolates of S. cerevisiae (Wang et al., 2012) were recovered
from 194 previously sequenced genomes (Table S2) and from the 157 isolates sequenced in this study, for a total of 450 strains. Each
gene was aligned with MAFFT (v7.187) and the final MSAs were concatenated with FASconCAT (v1.0). The concatenated alignment
was trimmed using trimAl (v1.2) with the automated1 option, optimized for ML tree reconstruction. The resulting supermatrix included
19,254 positions, 83.348% nucleotides and 16.652% gaps. The matrix was partitioned based on gene blocks in nine distinct parti-
tions with joint branch length optimization. ML search on 30 fully random starting trees and non-parametric bootstrap analysis were
Phenotypic Analysis
Flavor Production and Flocculation in Fermentation Conditions
To assess the production of aroma-active compounds, lab-scale fermentation experiments were performed. These fermentations
were performed in rich growth medium (YPGlu 10%; peptone 2% w v-1, yeast extract 1% w v-1, glucose 10% w v-1). Yeast
precultures were shaken overnight at 30 C in test tubes containing 5mL of yeast extract (1% w v-1), peptone (2% w v-1) and glucose
(4% w v-1) medium (YPGlu 4%). After 16 hr of growth, 0.5mL of the preculture was used to inoculate 50mL of YPGlu 4% medium in
250mL Erlenmeyer flasks, and this second preculture was shaken at 30 C for 16 hr. This second preculture was used for inoculation
of the fermentation medium (YPGlu 10%) at an initial optical density (at 600nm; OD600) of 0.5, roughly equivalent to 107 cells mL-1. The
fermentations, performed in 250mL Schott bottles with a water lock placed on each bottle, were incubated statically for 7 days at
20 C. Weight loss was measured daily to estimate fermentation progress. After 7 days, the fermentations were stopped, filtered
(0.15 mm paper filter) and samples for chromatographic analysis, density and ethanol measurements were taken.
Headspace gas chromatography coupled with flame ionization detection (HS-GC-FID) (Agilent Technologies, USA) was used for
the quantification of yeast aroma production. The GC was calibrated for 16 important aroma compounds, including esters (ethyl ac-
etate, isobutyl acetate, propyl acetate, isoamyl acetate, phenyl ethyl acetate, ethyl propionate, ethyl butyrate, ethyl hexanoate, ethyl
octanoate, ethyl decanoate), higher alcohols (isoamyl alcohol, isobutanol, butanol, phenyl ethanol), acetaldehyde and 4-VG. The GC
was equipped with a headspace autosampler (PAL system, CTC analytics, Switzerland) and contained a DB-WAXETER column
(length, 30 m; internal diameter, 0.25 mm; layer thickness, 0.5 mm, Agilent Technologies, USA) and N2 was used as the carrier
gas. Samples were heated for 25 min at 70 C in the autosampler. The injector block and FID temperatures were both kept constant
at 250 C. Samples of 5mL filtered fermentation medium were collected in 15 ml glass tubes containing 1.75 g of sodium chloride
each. These tubes were immediately closed and cooled to minimize evaporation of volatile compounds. The oven temperature
Standard statistical analyses were conducted in RStudio (v.0.98.994) (https://www.rstudio.com/) with custom scripts.
Preprocessing of the phenotypic data to produce Figure 3A and Table S5 consisted of a conversion to z-scores, calculated as fol-
lows: z-score = (Xi - m)/s, where Xi is the data value for strain i, m the mean of all strains and s the standard deviation across all strains.
To facilitate direct comparison between different strains for a specific trait, a reference condition for each environmental stressor-
related trait was determined. This reference condition is defined as the most stringent condition (i.e., the condition with the highest
stressor concentration or most extreme temperature) where around 50% of the investigated strains still managed to reach a colony
area greater than 10% of their colony area in the control condition (YPGlu 2% agar, 20 C).
The phenotypic variability explained by the set of mutations identified in MAL11 gene was computed as following: first, all SNPs
and InDels were searched in pairwise comparison for high correlations (> 0.90). A Ward clustering was additionally performed to
retain or exclude mutations according to the clusters identified. Second, a linear regression analysis was performed on individual
mutations; the mutation was retained if significantly associated with the phenotype, with p < 0.001 considered as significant.
Last, REML (restricted maximum likelihood) analysis with completely random effects was carried out on the final set of SNPs in
SAS (v9.4).
Data Resources
The accession numbers for the de novo assembly data reported in this paper are DDBJ/ENA/GenBank: Bioproject ID,
PRJNA323691, Biosample ID SAMN05190362-SAMN05190518, and MBUB00000000-MCAB00000000 (Table S1).
A B
Bee
BE032
BE059
r2
BE062
BE021
BE091
BE080
BE084
SP006
BE083
SP004
4
BE0 3
BI005
BE03
BE0 0
BE0986
BI00
BE01
BE0314
BE 2
Origin
BE 0 3
WI0 07
1.00
063
WI0 18
0
14
BE 0 3 9
2
WI0 10
BE 011
WI0 17
BE 040
WI0 03
BE
BE 085
W I006
Beer e
W E00 7
W 002
I0
B 02
in 0.50
SA I01 5
B I01 4
K=2
W I01 0
LA E0 9
Wine W
00 02
W 02
B
2
1
W
E
8
SP I00 01 6
WA
BE 0 8 8
9 LA L00 3
W 00
WI0 024 WL 002
Wild SP 01
0 WL 004 0.50
K=4
WI0 02 SA 007
Bio-ethanol BE 13 SA 05
Asi
0 0
SP0033 SA 003
Bread SP00 8 SA 6
0.00
a
WI0087 SA00 1 1.00
SA00
Laboratory BE00
SP001
6 BI001
BI003
S.paradoxus BE028
BE029
BI004 0.50
K=6
WL004
BE005 WI002
SP009 spar
0.00
Lineage BR004 BE049
Mixed
BE047
BE048
BE049
BE053
BE054
BE055
BE056
BE064
BE069
BE070
BE079
BE089
BE100
BE101
BE001
BE009
BE010
BE037
BE090
BE066
BE050
BE036
BE045
BE099
BE094
WI012
BE065
BE067
BE071
BE082
BE098
BE102
BE007
BE068
BE044
BE051
BE046
BE073
BE075
BE076
BE077
BE078
BE081
BE087
BE008
BE012
BE016
BE022
BE026
BE041
SP005
BE031
BE015
BE043
BE061
WL005
WL006
WL007
BE025
SP003
BR002
BR003
BR004
BR001
BE028
BE023
BE006
BE029
SP001
BE038
BE005
SP011
BE088
BE014
BE020
BE024
BE030
BE033
WI001
WI003
WI004
WI005
WI006
WI007
WI009
WI010
WI011
WI013
WI014
WI015
WI017
WI018
SA002
SP002
BE086
BE092
BE002
BE032
BE003
BE063
BE021
WI019
BE083
BE084
BE085
BE091
BE004
BE011
BE027
BE034
BE039
BE040
BE080
BE013
BE062
BE059
SA001
SA003
SA004
SA005
SA006
SA007
BI001
BI003
BI004
WL004
WI002
WL002
WL003
WL001
LA001
SP004
SP006
WI016
BI005
BI002
LA002
BE018
SP010
BE017
BE074
BE072
BE093
BE035
BE060
BE019
BE057
WI008
SP008
SP007
SP009
BE058
BE042
BE052
BE095
BE096
BE097
00
WL 05 BE 9
0
WL 006 BE 089
Mixed WL 072 WI0 009
BE 074 BE 12
BE 047 Belgium/
in
Wine BE 093 BE 053 Britain US Mixed Wine Beer 2 Asia Mosaic NS
BE 0 1 7
i ta
Beer 2 BE 046
BE 070
B 05
Germany
Br
8 B E0 6
BE 00 1
B E E04 7 BE E00 55
West Africa B E08 06 1
BE E01 3
B 0 9
9
B E07
E0 50
B
BE 01 2
BE 09 0
Asia
45
B E 04 5
BE E09 4
B
B 09
BE 0 8 1
BE 066
SP 016
BE 057
BE 005
BE 019
B
lg
BE 022
BE 035
e
BE 026
BE 60
iu
B E 77
BE 4 4
BE0776
BE 8
BE0 5
BE0682
m
0
78
BE0 8
0
0
BE09
/G
8
BE10
BE042
BE007
BE03
BE052
BE067
BE097
BE071
BE096
BE065
BE095
BE051
BE05
e rm
2
an r1
x1
y Bee
*
US
0.005
C
UWOPS83-787.3
UWOPS87-2
Origin
OakBor21-1
OakArd11
YJM1443
EXF6780
YJM1444
YJM1311
OakGri7
YJM456
YJM681
WL004
YJM1 479
ZP611
YJM4 400
ZP779
ZP785
SDO2 419
YJM1 51
YJM1 527
ZP793
SD4
YJM1M428
SDO3 s1
XJ1
SD 1573
BJ16
421
BJ20
SX9
s1
YJM1
YN4
YN5
T7
YJMM689
YN1
SDO 1402
BJ19
YJ
XJ6
YJMO8s1
BJ18
BJ9
BJ14
BJ108
YJ I008
6s1
s1
YJM
Wild
33
ZP53 50
W 52
11
BE0096
BE 095
SM69
YP M14 73
0
BE 097
ZP10
Y S12 34
90
ZP 0
BE N17
YJM651
YJ 12
ZP68
ZPPS1 8
H 19
HN 18
Z 65663
HNN16
H N9
Clinical
FJP653
H J2
Z 4
HN12
FJ 10
1
FJ N8
X X3
X J7
H 7
H LJ2
N
H J3
HNJ3
J LJ3
F 11
FJ N 6
Fermented source
SAA00 05
F L4
H N2
S J9
S 0 4
H N5
S A0
S A0 03
H N3
H J1
Z A0 06
S TW 01
Z J8
HNN10 1
F 12
U A0 1
K C 07
Laboratory
H HN 11
Y JM 1 i7
Y1 JM 15 389
Y JM ka
HN N1 14
Y yo 5
YJ 2 13 92
H N 13
88
H N J2
H F J7
60
NA
14
F J5
M
FJ 2 1
F
B W FJ 4 YN I00 03
L E0 30 6 B I0 04
CE s A0 18 3 B I0
N.P288 02
Y1K2
SK 0
c
B J1
B L2 14
J JM 86
01
Y P7 23
Z P8
S.paradoxus
Y LA Y55 1 Z N6 2
Y D3 134
YJJM6 001 S JM
M 27 Y D1 3
Y P 19
DB YJ JM1 W5 5 S J1
B X8
VP M1 43 S D2 5
Y G6 24 9 S P67 2
ANJM1 044 8 Z P65 7
3 38 Z 81 1
SO e1s 1 ZPP78 6
N4 1 .4
Y c Z P79 447 3-461
YJ JM XJ4 Z M1 S0
YJ M14470 YJWOP
YJ M16133 U 3
M
YJ 120 5 YNX7
S 5
YJMM6838 J
B 3
YJM 32 BJ 6
6
YJM1549 BJ2 7
YJM 555 BJ1 1
YJM 320 BJ2 2
YJM 554 BJ1 5
BE0541 BJ1
BE0565 BJ2
BE04 1 BJ8
BE08 4 BJ24
BE10 2 BJ725
BJ
BE00 2 BJ6 2
BE07 7 WI00 418
BE06 1 YJM1
BE09 8 HLJ1
8
BE06 JL5
BE0667 BJ28
BE045 BJ27
BE094 BJ4
BE050 BJ22
BE090 BJ23
BE099 SX2
BE069 SX11
BE056 SX5
BE058 SX10
BE047 SX3
BE037 SX1
WI012 SX6
BE009 SX4
spar
BE100 BR001
BE064 WL006
BE010 WL005
BE036 YJM1307
BE089 WL007
BE070 BE025
BE053 BR003
BE048 BR002
FostersB YJM693
BE001 Sigma127
BE079 YJM138 8b
BE055 YJM119 7
BE049 YJM120 9
BE101 YJM130 2
BE0544 YJM1 4
BE07 2 EC9-8 083
BE07 3 SP01
BE09 5 BE02 0
BE01 6 BR00 3
BE04 2 WL0 4
BE01 8 WL0 03
01
BE00 5 WL0
BE03 YJ 02
0
BE0619 RedM339
YJM Star
BE0057 BE 450
BE sO YJM063
er
FostBE0813 YJM 1478
04 YJM 1385
BE 031 YJM 1386
BE 078 YS 1355
BE 0 7 5 BE 9
B E 077 BE 006
BE 005 B 028
SP 076 SPE02
BE 017 B 00 9
BEE0262 B E08 1
B E02 5 BEE0343
B 00 2 B 08
BEE04 6 Y E01 4
B E01 1 B JM 3
B E04 7 B E08 193
Y E0 0
B E0873 Y JM 86
B E0 J2 S JM 145
B X s1 S P0 27 0
2
1g 061 3 B P0 08 1
AN BE 00 8 BEE09 07
B 0 1
SP E03 03 B E0 92
B I0 39 S E0 32
W E0 04 B P0 62
B E0 40 1 E I0 0
B E0 01 7 B X 02 4
Y E F7
B E 2 5
B E0 08 9 Y J 00 14
B E 00 W JM M1 3 5
I0 13 52
SP 14 JL1 3
B P 13 3 6
B I0 002 41
JL
S
E0 01
2
W P 10 4
C W BI0 006 3
S P 02
33
6
Z E 09 8 106
C BS I01 05
B I0 08 G1
M
YJLIB 796 6
W E P
YJ
3 0
B BV 17
ZPM24 24
D I0 41
W 8 4
W P6 02 0
YJ ZP I00 62
Z A0 99 4
E M 86 6
S
YJ XF 135 1
YJ JM 981 6
YJ M16719 6
Y JM 99
M
Y JM 975
AWYJM 141 17
Y JM 0
RI1112 5
Y E02 78
M 4
9
63 9
B JM9 93
1
8
Y
YJ NXX2
YJJM9787
BE 4531
Y JM9 69
N
Y M9
M
lvinBE0030
YJP577
Q 14
Z 578
9 2
WI0A23
ZPP579
M
Y WI0 0
Z 528 6765
YJMJM2714
1
L1 VPG
13 0
DBI005
Dav 83
W 1244
1e
YJM 1242
IO WI018
YJM 1477
YJMC9002
YJM 189
W 74
YJMF6761
EC11I007
EX 15
PY 18
WI0 3
EXF7 R4b
La
15
Vin111
EXF7 179
WI0
200
VL3M1341
ZP56 9
YJ 4
ZP86 8
WI00 1
ZP85 0
39
SP01 G1373
ZP56 0
DBVP
ZP5906
XJ5 336
YJM1 96
ZP633
AWRI7
ZP560
BE059
HUN9.1s 3
1
YJM1
BE021
BE002
ZP561
YJM682
DBVPG101 1
YJM1326
YJM1133
00
OakRom
ZP570
YJM1250
WI019
ZP636
ZP575
ZP541
ZP742
CLIB215
ZP565
ZP848
ZP736
ZP563
ZP562
ZP85
0.002
Figure S1. Phylogeny and Population Structure of the Industrial S. cerevisiae Strains, Related to Figures 1A–1C
(A) Phylogenetic tree of strains sequenced in this study, using Saccharomyces paradoxus as an outgroup. The tree was inferred from the concatenation matrix of
2,020 single copy orthologs. Black dots on nodes indicate bootstrap support values < 70%. Color codes indicate origin (names) and lineage (circular bands). The
basal splits of the five industrial lineages are indicated with a colored dot. Branch lengths reflect the average number of substitutions per site (scale bar = 0.005
substitutions per site).
(B) Population structure plot, with strain codes indicated.
Distribution of amplifications (red) and deletions (blue) in each strain, expressed in percentage of the genome affected (top) and in number of CNV events (bottom).
Deletion
WI002
WL004
BI004
BI003
BI001
SA001
SA006
SA003
SA005
SA007
SA004
WL002
WL003
WL001
LA001
SP010
WI016
BE018
LA002
BE002
WI019
BE004
BE027
BE085
BE040
BE011
BE039
BE003
BE086
BE013
BE034
BE084
BE080
BE021
BE032
BE062
SP006
SP004
BI005
BI002
BE014
BE030
WI014
WI007
WI018
WI010
WI017
WI003
WI006
Figure S2. Copy-Number Variability across the Phylogenetic Tree, Related to Figure 2
SA002
WI011
WI015
BE020
WI004
SP011
WI005
WI009
BE088
BE024
WI001
SP002
WI013
BE033
SP008
SP007
WI008
BE006
SP001
BE028
BE029
BE005
BR004
BE023
BE038
SP003
BE025
BR001
BR003
BR002
WL007
WL005
BE074
BE093
BE017
BE046
BE008
BE041
BE087
BE073
BE012
BE015
BE043
BE081
BE016
SP005
BE022
BE026
BE077
BE076
BE075
BE078
20%
10%
10%
20%
30%
40%
50%
60%
60
40
20
0
20
40
60
80
100
120
140
0%
CNV-(% of genome length) CNV-frequency
Figure S3. Trait Variation in Industrial S. cerevisiae Strains, Related to Figure 3
Graphic representation (violin plots) of trait variation within and between the subpopulations for different environmental stressors. Triangles represent median
values for each subpopulation. All values are depicted as relative growth compared to growth on medium without the stressor. Statistical analysis for each trait is
given in Table S6. au = arbitrary units.
B
A
0.1
0.01
Number of heterozygous sites Amount of mutations (%) Amount of mutations (%)
Significance at
Significance at
0%
25%
50%
75%
100%
0%
25%
50%
75%
100%
0
10000
30000
50000
70000
BE005 BE001
BE009
BE006 BE010
BE036
BE023 BE037
BE025 BE045
BE047
E
C
BE028 BE048
Britain
BE029 BE049
BE050
BE038 BE053
BE061 BE054
BE055
BR001 BE056
BR002 BE064
Mixed
Britain
BE066
BR003 BE069
BR004 BE070
BE079
SP001 BE089
D
C
C
BE090
US
SP003
BE094
WL005 BE099
WL006 BE100
BE101
WL007 WI012
BE014
BE020 BE007
BE024
BE030 BE044
BE033
BE088
E
BE051
D
C
SA002
SP002
Bel/Ger
SP004 BE065
SP011
WI001 BE067
WI003
WI004
US
BE068
Wine
WI005
WI006
WI007 BE071
WI009
WI010
B
A
BE082
WI011
WI013
Beer 2
WI014 BE098
WI015
WI017 BE102
WI018
BE008
BI001
BE012
BE015
BI003
BE016
F
D
BI004 BE022
Mixed
BE026
SA001 BE031
BE041
SA003 BE043
BE046
Asia
SA004
BE073
Bel/Ger
SA005 BE075
BE076
A
A
SA006 BE077
Wine
BE078
SA007 BE081
BE087
WL004 SP005
BE017 BE002
BE018 BE003
BE019
BE035 BE004
BE011
A
BE042
A
BE057 BE013
Asia
BE058 BE021
BE060
BE072 BE027
BE074 BE032
BE093 BE034
BI002 BE039
BI005
LA001 BE040
LA002 BE059
Beer 2
Mosaic
SP006 BE062
SP007 BE063
SP008 BE080
B
SP009
SP010 BE083
WI002 BE084
Mosaic
WI008 BE085
WI016 BE086
WI019
WL001 BE091
WL002 BE092
US
Asia
Wine
Mixed
Britain
Beer 2
Mosaic
Bel/Ger
Homozygous