
Article

Domestication and Divergence of Saccharomyces cerevisiae Beer Yeasts

Authors
Brigida Gallone, Jan Steensels,
Troels Prahl, ..., Guy Baele, Steven Maere,
Kevin J. Verstrepen

Correspondence
steven.maere@psb.vib-ugent.be (S.M.),
kevin.verstrepen@biw.vib-kuleuven.be
(K.J.V.)

In Brief
The history and domestication of yeast
used for making beer and other types of
alcohol are revealed through genomic
and phenotypic analyses.

Highlights
• We sequenced and phenotyped 157 S. cerevisiae yeasts
• Present-day industrial yeasts originate from only a few domesticated ancestors
• Beer yeasts show strong genetic and phenotypic hallmarks of domestication
• Domestication of industrial yeasts predates microbe discovery

Gallone et al., 2016, Cell 166, 1397–1410
September 8, 2016 © 2016 The Author(s). Published by Elsevier Inc.
http://dx.doi.org/10.1016/j.cell.2016.08.020
Brigida Gallone,1,2,3,4,11 Jan Steensels,1,2,11 Troels Prahl,5 Leah Soriaga,6 Veerle Saels,1,2 Beatriz Herrera-Malaver,1,2
Adriaan Merlevede,1,2 Miguel Roncoroni,1,2 Karin Voordeckers,1,2 Loren Miraglia,8 Clotilde Teiling,9 Brian Steffy,9
Maryann Taylor,10 Ariel Schwartz,6 Toby Richardson,6 Christopher White,5 Guy Baele,7 Steven Maere,3,4,*
and Kevin J. Verstrepen1,2,12,*
1Laboratory for Genetics and Genomics, Centre of Microbial and Plant Genetics (CMPG), KU Leuven, Kasteelpark Arenberg 22, 3001 Leuven, Belgium
2Laboratory for Systems Biology, VIB, Bio-Incubator, Gaston Geenslaan 1, 3001 Leuven, Belgium
3Department of Plant Systems Biology, VIB, 9052 Gent, Belgium
4Department of Plant Biotechnology and Bioinformatics, Ghent University, 9052 Gent, Belgium
5White Labs, 9495 Candida Street, San Diego, CA 92126, USA
6Synthetic Genomics, 11149 North Torrey Pines Road, La Jolla, CA 92037, USA
7Department of Microbiology and Immunology, Rega Institute, KU Leuven, 3000 Leuven, Belgium
8Encinitas Brewing Science, 141 Rodney Avenue, Encinitas, CA 92024, USA
9Illumina, 5200 Illumina Way, San Diego, CA 92122, USA
10Biological & Popular Culture (BioPop), 2205 Faraday Avenue, Suite E, Carlsbad, CA 92008, USA
11Co-first author
12Lead Contact

*Correspondence: steven.maere@psb.vib-ugent.be (S.M.), kevin.verstrepen@biw.vib-kuleuven.be (K.J.V.)

SUMMARY

Whereas domestication of livestock, pets, and crops is well documented, it is still unclear to what extent microbes associated with the production of food have also undergone human selection and where the plethora of industrial strains originates from. Here, we present the genomes and phenomes of 157 industrial Saccharomyces cerevisiae yeasts. Our analyses reveal that today’s industrial yeasts can be divided into five sublineages that are genetically and phenotypically separated from wild strains and originate from only a few ancestors through complex patterns of domestication and local divergence. Large-scale phenotyping and genome analysis further show strong industry-specific selection for stress tolerance, sugar utilization, and flavor production, while the sexual cycle and other phenotypes related to survival in nature show decay, particularly in beer yeasts. Together, these results shed light on the origins, evolutionary history, and phenotypic diversity of industrial yeasts and provide a resource for further selection of superior strains.

INTRODUCTION

Since prehistoric times, humans have exploited the capacity of the common baker’s yeast Saccharomyces cerevisiae to convert sugars into ethanol and desirable flavor compounds to obtain foods and beverages with a prolonged shelf-life, enriched sensorial palate, improved digestibility, and a euphoriant effect due to the presence of ethanol (Michel et al., 1992; Steensels and Verstrepen, 2014). Whereas the use of pure cultures started well after the pioneering work of Pasteur and Hansen in the 19th century, early brewers, winemakers, and bakers had already learned that inoculating unfermented foods with a small portion of fermented product resulted in fast and more predictable fermentations. This so-called “backslopping” might have resulted in yeast lineages that grew continuously in these man-made environments and lost contact with their natural niches, providing a perfect setting for domestication. However, strong evidence for this hypothesis is still missing and it remains unclear whether industrial yeast diversity is shaped by selection and niche adaptation (domestication) or neutral divergence caused by geographic isolation and limited dispersal (Goddard and Greig, 2015; Warringer et al., 2011).
Domestication is defined as human selection and breeding of wild species to obtain cultivated variants that thrive in man-made environments, but behave suboptimally in nature. Typical signs of domestication, including genome decay, polyploidy, chromosomal rearrangements, gene duplications, and phenotypes resulting from human-driven selection, have been reported in crops, livestock, and pets (Driscoll et al., 2009; Purugganan and Fuller, 2009). Several studies have recently investigated the S. cerevisiae population by sequencing the genomes of hundreds of different strains, providing a first glimpse of the complex evolution of this species (Almeida et al., 2015; Borneman et al., 2011, 2016; Liti et al., 2009; Magwene et al., 2011; Schacherer et al., 2009; Strope et al., 2015). However, most of these studies focused primarily on yeasts from wild and clinical habitats and often include only a limited set of industrial strains, mainly originating from wine. Moreover, most studies use haploid derivatives instead of natural strains and can therefore not explore typical patterns of domestication like polyploidy, aneuploidy,

This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
and heterozygosity. The use of haploids also excludes a large fraction of industrial S. cerevisiae strains that have lost the ability to sporulate, such as the vast majority of beer yeasts. Nevertheless, some studies already revealed signs of domestication in wine strains, such as an increased resistance to copper (present in grapevine pesticides) and sulfite (used as a preservative in wine) (Pérez-Ortín et al., 2002; Warringer et al., 2011). An in-depth investigation of strains originating from other industrial niches is still lacking.
Here, we describe the high-quality sequencing, de novo assembly, annotation, and extensive phenotyping of 157 S. cerevisiae strains used for the industrial production of beer, wine, bread, spirits, saké, and bioethanol, in their natural ploidy. Our data reveal that industrial yeasts are genetically and phenotypically distinct from wild strains and stem from only a limited set of ancestral strains that have been adapting to man-made environments. They further diversified into five clades: one including Asian strains such as saké yeasts, one mostly containing wine yeasts, a mixed clade that contains bread and other yeasts, and two separate families of beer yeasts. While most clades lack strong geographical substructure, one of the beer clades contains geographically isolated subgroups of strains used in continental Europe (Belgium/Germany), the United Kingdom, and a recent sublineage of United States beer yeasts that diverged from the British subclade during colonization. Interestingly, these beer yeast lineages exhibit clear and profound hallmarks of domestication, more so than the other lineages. The shift from variable, complex, and often harsh environments encountered in nature to more stable and nutrient-rich beer medium favored specialized adaptations in beer yeasts, but also led to genome decay, aneuploidy, and loss of a functional sexual cycle. Specifically, we find evidence for active human selection, demonstrated by convergent evolution for efficient fermentation of beer-specific carbon sources, mainly through mutations and duplications of the MAL (maltose) genes, as well as nonsense mutations in PAD1 and FDC1, which are involved in the production of 4-vinyl guaiacol (4-VG), an undesirable off-flavor in beer. Our results further suggest that beer yeast domestication was initiated hundreds of years ago, well after the first reported beer production, but before the discovery of microbes. Together, our results reveal how today’s industrial yeasts are the outcome of centuries of human domestication and provide a new resource for further selection and breeding of superior variants.

RESULTS

Niche and Geography Drive Diversification
To examine the evolutionary history of industrial yeasts, we sequenced the genomes of 157 S. cerevisiae isolates originating from various sources in their natural ploidy to a median coverage of 135× (min = 26×, max = 403×) (for details on data analysis, see STAR Methods). This collection includes 102 industrial beer strains, 19 wine strains, 11 spirit strains, 7 saké strains, 7 strains isolated from spontaneous fermentations, 5 bioethanol strains, 4 bread strains, and 2 laboratory strains (Table S1). Interestingly, ten of these S. cerevisiae beer strains are used for commercial production of lager beers, which were believed to be exclusively produced by strains of the genetically related Saccharomyces pastorianus. After de novo assembly of each of the genomes, we inferred a maximum-likelihood phylogenetic tree based on codon alignments for 2,020 concatenated single-copy nuclear genes shared by each of the 157 isolates and the outgroup species Saccharomyces paradoxus (Figure S1A). Additionally, we included a representative set of 24 previously sequenced strains belonging to the main established lineages of the S. cerevisiae phylogeny (Liti et al., 2009; Strope et al., 2015), extending the number of strains to 181 (Figure 1A). Trees constructed from the original and extended datasets are congruent and show five main lineages that contain the majority of industrial yeasts: Wine (bootstrap support 100%), Beer 1 (86%), Beer 2 (56%), Asia (100%), and a Mixed lineage (99%) containing yeasts used in different industries. Three of these lineages (Beer 1, Beer 2, and Mixed) were not previously described.
Next, we studied the population structure in a filtered set of 53,929 polymorphic sites accounting for 2,454,052 SNPs across all strains, using the Bayesian model-based clustering approach implemented in fastStructure (Raj et al., 2014) (Figures 1B and S1B). This analysis yields a population structure that is highly consistent with the major lineages defined in the phylogeny and identifies mosaicism in 17% of the strains (in which the estimated ancestry Q < 0.8 for K = 8 ancestral populations). The population structure is further supported by a principal component analysis (PCA) on the same SNP data (Figure 1C).
Further analysis of the phylogeny and population structure reveals that the evolutionary divergence of industrial yeasts is shaped by both their industrial application and geographical origin. First, most yeasts cluster together according to the industry in which they are used and are clearly separated from the wild or clinical yeasts that have previously been sequenced. This was further confirmed by constructing a larger phylogeny, based on nine genomic regions, that includes the vast majority of all sequenced S. cerevisiae strains, 450 isolates in total (Figure S1C; Table S2). Wine and saké yeasts cluster in the previously identified Wine and Asia lineages (Liti et al., 2009). The majority of beer yeasts (85.3%) are found in two main lineages (Beer 1 and Beer 2) that are only distantly related. The Mixed clade harbors 7.8% of all beer strains (most of which are atypical beer yeasts that are used for bottle refermentation of strong Belgian ales) and contains all bread strains. Interestingly, spirit strains lack this clear phylogenetic relationship, as they are highly mosaic and scattered throughout the tree, suggesting that these strains might be the result of breeding by modern-day yeast companies that sell yeasts for spirits production. Moreover, because spirit yeasts are typically not re-used after fermentation, they likely had less opportunity to diverge into a separate clade.
Within and between the lineages, we also observed geographical patterns. For example, most saké yeasts form a monophyletic group and cluster together with wild isolates and bioethanol strains from China, while South American bioethanol strains are closely related to strains used to produce cachaça, a Brazilian sugarcane spirit. Moreover, the Beer 1 clade consists of three separate subpopulations, each reflecting geographically distinct groups: Belgium/Germany, Britain, and the United States. The absence of genetic admixture among these subpopulations indicates that these strains diverged allopatrically after the initial split



[Figure 1 image not reproduced. (A) Circular maximum-likelihood tree with strain codes colored by origin (Beer, Wine, Spirits, Saké, Wild, Bio-ethanol, Bread, Laboratory, Clinical, S. paradoxus) and bands marking lineage (Beer 1: Britain, US, Belgium/Germany; Mixed; Wine; Beer 2; West Africa; Asia; North America; Malaysia; Mosaic); scale bar, 0.005. (B) Estimated ancestry (Q) per strain for K = 2, 4, 6, and 8, grouped as Britain, US, Belgium/Germany, Mixed, Wine, Beer 2, Asia, Mosaic, and NS. (C) PCA plot, PC1 (11.7%) versus PC2 (9.9%).]
Figure 1. Phylogeny and Population Structure of Industrial S. cerevisiae Strains


(A) Maximum likelihood phylogenetic tree of all S. cerevisiae strains sequenced in this project supplemented with a representative set of 24 previously sequenced
strains (Liti et al., 2009; Strope et al., 2015) and using Saccharomyces paradoxus as an outgroup. Black dots on nodes indicate bootstrap support values <70%.
Color codes indicate origin (names) and lineage (circular bands). The basal splits of the five industrial lineages are indicated with colored dots. Mosaic strains
identified in this study are indicated with black dots next to the strain codes. Branch lengths reflect the average number of substitutions per site. Scale bar, 0.005
substitutions per site.
(B) Population structure identified in the 157 surveyed strains. The vertical axis depicts the fractional representation of resolved populations (colors) within each
strain (horizontal axis, strains listed in Figure S1C) for K = 2, 4, 6, and 8 assumed ancestral populations (where K = 8 maximizes the marginal likelihood and best
explains the data structure). Mosaic strains (i.e., strains that possess <80% ancestry from a single population) are visualized as a separate group.
(C) Principal component projection, using the same set of SNPs as in Figure 1B. Colors represent different populations. WA, West Africa; NA, North America; M,
Malaysia; NS, not specified.
See also Figure S1 and Tables S1, S2, and S8.
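The mosaic criterion used in (B), less than 80% ancestry from any single population, can be illustrated with a minimal Python sketch. The function name `classify_strain`, the strain names, and the ancestry fractions below are hypothetical and chosen only for illustration; they are not data or code from the study, and the real ancestry estimates come from fastStructure.

```python
def classify_strain(ancestry, threshold=0.8):
    """Return the dominant population, or 'Mosaic' if no population reaches the threshold.

    `ancestry` maps population names to estimated ancestry fractions (Q).
    """
    label, q_max = max(ancestry.items(), key=lambda item: item[1])
    return label if q_max >= threshold else "Mosaic"


# Hypothetical ancestry fractions for two made-up strains (illustration only).
example = {
    "strain_a": {"Wine": 0.95, "Beer 2": 0.05},
    "strain_b": {"Britain": 0.55, "Mixed": 0.45},
}
labels = {name: classify_strain(q) for name, q in example.items()}
print(labels)
```

Under this rule, a strain is assigned to a population only when a single ancestral component dominates; everything else is pooled into the Mosaic group.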



Table 1. Genetic Diversity within Each Subpopulation of Industrial S. cerevisiae Strains
Subpopulation      Number of Strains   Analyzed Sites   Segregating Sites   π          θw
Britain            26                  12,018,937       101,881             3.13E-03   1.88E-03
United States      10                  11,973,239       72,559              2.31E-03   1.72E-03
Belgium/Germany    18                  12,017,007       108,560             3.12E-03   2.19E-03
Mixed              17                  12,043,532       132,188             4.35E-03   2.69E-03
Wine               24                  12,052,956       114,133             1.59E-03   2.15E-03
Beer 2             21                  12,063,361       142,745             2.95E-03   2.77E-03
Asia               10                  12,035,745       99,879              2.39E-03   2.36E-03
The number of strains per subpopulation, the number of analyzed and segregating sites, as well as nucleotide diversity (π) and population mutation rate (Watterson’s θ, θw) are indicated.
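As a rough check on how the θw column relates to the other columns, the textbook Watterson estimator divides the number of segregating sites by the analyzed sites and by the harmonic number of the sample size minus one. The sketch below applies it to the Britain row. It assumes two sampled sequences per strain, which is our assumption and not something stated in this section (many of these strains are polyploid), so the result should be read as an approximation rather than a reproduction of the authors' calculation.

```python
def harmonic_number(m):
    """Sum of 1/i for i = 1..m."""
    return sum(1.0 / i for i in range(1, m + 1))


def watterson_theta(segregating_sites, analyzed_sites, n_sequences):
    """Per-site Watterson estimator: S / (a_n * L), with a_n = H(n - 1)."""
    a_n = harmonic_number(n_sequences - 1)
    return segregating_sites / (a_n * analyzed_sites)


# Britain row of Table 1: 26 strains, 12,018,937 analyzed sites, 101,881 segregating sites.
# Assumption (not from the source): two sequences sampled per strain.
theta_britain = watterson_theta(101_881, 12_018_937, n_sequences=2 * 26)
print(f"{theta_britain:.2e}")
```

With this assumption the estimate lands close to the tabulated 1.88E-03 for Britain; other rows agree only to within rounding, which suggests the exact sample-size convention may differ slightly from the one assumed here.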

(Figure 1B). Moreover, the high nucleotide diversity within each of the Beer 1 sublineages exceeds that within the Wine population, suggesting that the split did not happen recently (Table 1).
Compared to Beer 1, Beer 2 is more closely related to the Wine lineage and includes 20.6% of all brewing strains. However, in contrast to the Beer 1 group, the Beer 2 lineage lacks geographic structure and contains yeasts originating from Belgium, the United Kingdom, the United States, Germany, and Eastern Europe. The presence of two major genetically distinct sources of beer yeasts hints toward two independent European domestication events, one of which is at the origin of both the Wine and Beer 2 clade.

Remarkable Structural Variation in Beer Yeasts
Variation in genome structure, such as polyploidy, aneuploidy, large segmental duplications, and copy-number variations (CNVs), has repeatedly been found in association with domestication and adaptation to specific niches in experimentally evolved microbes (Bergström et al., 2014; Borneman et al., 2011; Dunham et al., 2002; Dunn et al., 2012; Pavelka et al., 2010; Rancati et al., 2008; Selmecki et al., 2009; Voordeckers et al., 2015) and in association with domestication of higher organisms (Purugganan and Fuller, 2009).
Sequencing the yeast strains in their natural ploidy allowed analysis of gross chromosomal rearrangements and aneuploidies (Figure 2A). We detected a staggering 15,288 deletion and amplification events across all strains, covering on average 1.57 Mb per strain. The size of the regions ranges from complete chromosomes (resulting in aneuploidies) to small local variations of a few kilobases (kb), all of which we will refer to as “CNVs.” The extent of deletions significantly exceeds that of amplifications, respectively 1.07 Mb and 0.50 Mb on average per strain (2.15-fold difference, Wilcoxon signed-rank test, p < 0.001). We observed significant variation among strains originating from different industries in the total frequency of CNV events (ANOVA F test, p < 0.001) and the fraction of the genome affected (ANOVA F test, p < 0.001) (Figure S2). Pairwise comparisons of subpopulations and industries show no significant differences in the load of amplifications between strains from different industries or subpopulations, but we detected significant differences in the load of deletions between strains from the wine (median = 0.51 Mb) and beer (median = 0.94 Mb) industry (Tukey honest significant difference [HSD], p < 0.05) (Figures 2B–2E). This high incidence of CNV in beer strains goes together with a high incidence of polyploidy and aneuploidy (R² 0.14, p < 0.001; average genome content of 3.52, SD = 0.67, Figures 2A, 2F, and 2G), which is linked to extensive chromosomal loss and general genome instability (Sheltzer et al., 2011).
CNVs are not uniformly spread across the genome. Considering subtelomere lengths of 33 kb (Brown et al., 2010), on average 39.7% of subtelomeric nucleotide positions are affected by CNV events compared to 9.54% of non-subtelomeric nucleotide positions (4.1-fold difference, Wilcoxon signed-rank test, p < 0.001). However, not all subtelomeres are equally prone to CNV: most variability is detected in ChrI, ChrVII, ChrVIII, ChrIX, ChrX, ChrXII, ChrXV, and ChrXVI (Figure 2A). Gene ontology (GO) enrichment analysis reveals that genes involved in nitrogen and carbon metabolism, ion transport, and flocculation are most heavily influenced by CNVs (Table S3), which is in line with previous results (Bergström et al., 2014; Dunn et al., 2012). Interestingly, some CNVs seem linked to specific environments (Table S4), suggesting that CNVs may underlie niche adaptation. For example, many genes involved in uptake and breakdown of maltose (present in saké medium, main carbon source in beer, but absent from grape must) are amplified in beer and saké-related subpopulations, while they are often lost in strains from the Wine subpopulation (false discovery rate [FDR] q value < 0.001).

Relaxed Selection on Sex and Survival in Nature
Apart from selection for industrial traits, domestication is also characterized by relaxed selection and potential loss of costly traits that are not beneficial in the man-made environment (Driscoll et al., 2009). In order to chart the phenome of our collection and investigate signs of selection for some traits and loss of others, 82 phenotypes, such as aroma production, sporulation characteristics, and tolerance to osmolytes, acids, ethanol, and low and high temperatures, were measured in all strains (Figures 3A and S3; Table S5). Hierarchical clustering of the phenotypes resolves the main phylogenetic lineages and reveals a moderate correlation between genotype and phenotype distances between strains (Spearman correlation 0.33), which is further increased (Spearman correlation 0.36) when mosaic strains, for which genetic distance has no straightforward evolutionary interpretation, are omitted (Figure 3A). Moreover, the clustering splits the collection into two main phenotypic subgroups: one largely overlapping with the Beer 1 clade that contains the majority of the Belgium/Germany, United States, and



[Figure 2 image not reproduced. (A) Heat map of CNV profiles across chromosomes I–XVI for each strain, grouped by population (Asia, Beer 2, Wine, Mixed, Bel/Ger, US, Britain), with strain codes colored by origin and a bar chart of estimated ploidy (n, scale 1–5); legend categories: ≥2-fold amplification, <2-fold amplification, deletion with 1 ≤ copies < n, and complete deletion. (B and C) Amplifications (Mb) by industry and by subpopulation. (D and E) Deletions (Mb) by industry and by subpopulation. (F and G) Estimated ploidy (n) versus CNV load (Mb) by industry and by subpopulation; R² = 0.14, p < 0.001.]



Britain beer yeasts as well as mosaic strains containing major genome fractions of these subpopulations and a second one where the remaining genetic subpopulations are strongly overrepresented (Fisher’s exact test, Bonferroni corrected p < 0.001). Overall, strains from the Beer 1 clade perform poorly in general stress conditions that are not usually encountered in the brewing environment (Figure S3; Table S6). In contrast, strains from the Wine subpopulation show superior performance in general stress conditions, which likely reflects the high-sugar and high-alcohol environments encountered in wine-making, as well as survival in potentially nutrient-poor and harsh natural environments in between the grape harvest seasons.
Saccharomyces cerevisiae is a facultative sexual organism. While its main mode of reproduction is clonal, sporadic sporulation can help to survive periods of stress (Briza et al., 1990). It has been shown that in yeast, sexual reproduction is beneficial when adapting to new, harsh niches, but plays a lesser role in more favorable environments (Goddard et al., 2005; McDonald et al., 2016). Our data show that there are large systematic differences in the reproductive lifestyle of yeasts inhabiting different industrial niches: 44.4% of the Beer 1 population is obligate asexual, while this trait ranges between 0% and 21% in the other populations (Figure 4A) and is absent in wild strains. Furthermore, over 80% of the non-mosaic Beer 1 strains that are able to sporulate show little or no spore viability (Figure 4B). Additionally, beer yeast lineages generally show a high level of heterozygosity, especially Beer 1. Compared to the Wine clade for example, strains from the Beer 1 and Beer 2 clade have on average 5.10-fold (Tukey HSD, p < 0.001) and 2.04-fold (Tukey HSD, p = 0.06) more heterozygous sites, respectively (Figures 4C, S4A, and S4B). The lack of genetic admixture suggests that this heterozygosity was acquired during long periods of asexual reproduction, rather than through outbreeding. Further analysis of the correlation between sexual lifestyle and genome structure shows that spore viability is weakly anticorrelated with the heterozygosity level (R² 0.17; p < 0.001) and the fraction of the genome associated with large (>20 kb) amplifications and deletions (R² 0.16; p < 0.001), while sporulation efficiency is only significantly anticorrelated with the latter (R² 0.19; p < 0.001) (Figures 4D–4G).
Together, this indicates that the genome of beer yeasts, but not wine yeasts, shows signs of decay and loss of survival skills outside a specific man-made environment, probably caused by their long (estimated >75,000 generations) and uninterrupted growth in rich medium.

Selection for Industrial Phenotypes
A key hallmark of domestication is phenotypic adaptation to artificial, man-made niches and accentuation of traits desirable for humans. Phenotypic evaluation of the strains for industrially relevant traits (including aroma production, ethanol production, and fermentation performance) shows that many strains harbor phenotypic signatures linked to their industrial application. The ability to accumulate high concentrations of ethanol, for example, seems tightly linked to industrial niche. Beer 1 strains typically generate only 7.5%–10% v/v of ethanol, while strains used for the production of high-alcohol products like saké, spirits, wine, and especially bioethanol, can produce up to 14.5% v/v (Figure 3B; Table S6).
With the exception of a few wine yeast characteristics (see earlier) (Figures 3C and 3D), it remains unclear whether genetic and phenotypic variation between S. cerevisiae lineages is primarily caused by human-driven selection and domestication, or if neutral genetic drift or non-human selection are involved. To assess this further, we compared the phenotypic behavior of different subpopulations for two industrially relevant traits for which the genetic underpinnings are largely known, namely maltotriose fermentation and the production of 4-vinyl guaiacol (4-VG), the main compound responsible for phenolic off-flavors (POF). Beer yeasts show a significantly higher capacity to metabolize maltotriose, a carbon source specifically found in beer medium (Figure 3E; Table S6). Efficient utilization of maltotriose correlates with the presence of a specific allele (AGT1) of the sugar transporter MAL11, known to show high affinity for maltotriose (phenotypic variability explained by SNPs in MAL11 77.40%, SE 0.5%). This allele is only present in Beer 1 subpopulations and some mosaic strains, while the complete MAL1 locus (including the MAL11 gene) is absent in the Wine subpopulation (Table S7). Interestingly, strains of the Beer 2 subpopulation are generally able to ferment maltotriose but contain various frameshift mutations in MAL11 and show a reduced CNV for the complete MAL1 locus, suggesting that other, yet unknown mechanisms facilitate maltotriose uptake in this lineage, and maltotriose metabolism evolved convergently in the Beer 1 and Beer 2 lineages.
Yeasts used for the production of alcoholic beverages ideally should not produce undesirable aromas. Although tolerated in some specialty beers, the presence of 4-VG, a compound with a spicy, clove-like aroma, is generally undesired in saké, wine, and most beer styles. Two genes, phenylacrylic acid decarboxylase (PAD1) and ferulic acid decarboxylase (FDC1), both

Figure 2. Ploidy and Copy-Number Variation in Industrial S. cerevisiae Strains


(A) Genome-wide visualization of copy-number variation (CNV) profiles, with the aggregate profile across all strains depicted on the top. Estimates for the nominal
ploidy (n) values of the strains are represented by a bar chart next to the strain codes. Heat map colors reflect amplification (red shades) or deletion (blue shades)
of genomic fragments. A distinction is made between completely deleted fragments (dark blue) and fragments of which at least one copy is still present (light blue).
Similarly, highly amplified fragments (copy number ≥2-fold the basal ploidy) are depicted in dark red, while low and moderately amplified fragments (copy
number <2-fold the basal ploidy) are depicted in orange. For strains with no estimated ploidy available, colors are only indicative of the presence of amplifications
(orange) or deletions (light blue). Roman numbers indicate chromosome number. Strains are clustered according to their genetic relatedness as determined in
Figure S1A. Origin (name colors) and population (colored rectangles) are indicated on the figure.
(B–E) Violin plots describing the density of amplifications and deletions across different industries and subpopulations. Triangles indicate the median within each
group.
(F and G) Correlations between levels of CNV load (Mb) and estimated ploidy (n), by industry and subpopulations.
See also Figure S2 and Tables S3 and S4.
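The color scheme described for panel (A) amounts to a simple rule on a fragment's copy number relative to the strain's basal ploidy. The sketch below spells that rule out in Python. The function name `cnv_category` and the category labels are our own, and the sketch assumes integer copy-number calls, which may not match how the authors' pipeline represents its estimates.

```python
def cnv_category(copy_number, basal_ploidy):
    """Map a fragment's copy number to the heat-map categories described in the legend."""
    if copy_number == 0:
        return "complete deletion"           # dark blue
    if copy_number < basal_ploidy:
        return "partial deletion"            # light blue: at least one copy still present
    if copy_number == basal_ploidy:
        return "no change"
    if copy_number >= 2 * basal_ploidy:
        return "high amplification"          # dark red: at least 2-fold the basal ploidy
    return "low/moderate amplification"      # orange: below 2-fold the basal ploidy


# Illustrative copy numbers for a hypothetical tetraploid strain (basal ploidy n = 4).
calls = [cnv_category(c, 4) for c in (0, 2, 4, 6, 8)]
print(calls)
```

For strains without a ploidy estimate, the legend notes that only the presence of an amplification or deletion is shown, so a rule like this one would not apply to them.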



[Figure 3 image. (A) Heat map of Z scores (color scale from -1.5 to 1.5; gray, no data) for all strains across four trait categories: environmental stress (KCl, NaCl, temperature, ethanol, sulfite, glucose, sorbitol, copper, cadmium, actidione, acetic acid, levulinic acid, and formic acid), nutrient stress (growth on glycerol, melibiose, sorbitol, ethanol, galactose, fructose, sucrose, maltose, maltotriose, glucose, and lysine under various conditions), aroma (ethyl esters, acetate esters, higher alcohols, POF, and acetaldehyde), and other (flocculation, ethanol production, sporulation efficiency, and spore viability). (B–E) Ethanol production (% v/v), copper tolerance (au), sulfite tolerance (au), and maltotriose fermentation (au) for the Britain, US, Bel/Ger, Beer 2, Mixed, Wine, Asia, and Mosaic subpopulations.]

Figure 3. Trait Variation of Industrial S. cerevisiae Strains


(A) Heat map representation of phenotypic diversity within industrial S. cerevisiae strains. Phenotypic values are calculated as Z scores (normalized values) and
colored according to the scale on the right. Missing values are represented by gray shadings. Strains are hierarchically clustered based on phenotypic behavior.
Strain names are colored according to geographical origin, as in Figure 1A. The corresponding subpopulation of each strain is indicated by the colored bar below
the figure, according to the color code of Figure 1B.
(B) Ethanol production (depicted as % v/v) of all strains from different subpopulations in fermentation medium containing 35% glucose.
(legend continued on next page)



located as a cluster in the subtelomeric region of ChrIV, control 4-VG production. PAD1 encodes a flavin prenyltransferase that catalyzes the formation of a flavin-derived cofactor, which is required by Fdc1 for decarboxylation of the precursor ferulic acid (White et al., 2015). Pad1 and Fdc1 help to detoxify phenylacrylic acids found in plant cell walls (Mukai et al., 2010). Therefore, it would be expected that, unless there is counterselection, activity of these genes is retained. Interestingly, phenotypic profiling reveals that many industrial strains have lost the ability to produce 4-VG, while it is generally retained in wild strains, as well as in bakery and bioethanol strains (Figure 5B). In these cases, 4-VG production is likely less detrimental, either because the flavor disappears during baking, or the product is not destined for consumption. Sequence analysis shows that many industrial strains, especially beer and saké strains, acquired loss-of-function mutations (SNPs and/or frameshift InDels) in PAD1 and/or FDC1, while this was never observed in strains from natural environments or bioethanol production (Figure 5A). Moreover, different sublineages acquired different disruptive mutations, hinting at the presence of diverse convergent adaptive strategies in response to human selection against 4-VG production.

To investigate the origin and the maintenance of the phenotypic diversity in 4-VG production, we used Bayesian inference to reconstruct the ancestral phenotypic state in the two key genes PAD1 and FDC1 using BEAST (Drummond et al., 2012) (Figure 5C). Shifts from 4-VG+ to 4-VG− and vice versa occurred frequently after the initial split from S. paradoxus. In both the PAD1 and the FDC1 trees, an early subclade containing most Beer 1 strains acquired loss-of-function mutations at the base of the clade, suggesting that already very early during domestication of the Beer 1 lineage, a 4-VG− variant was derived from the 4-VG+ ancestor. Several other loss- and gain-of-function mutations occurred across both trees, most notably the loss-of-function mutation in FDC1 of the Asian saké (but not bioethanol) strains.

Interestingly, a strong incongruence between single gene trees and the strain phylogeny is present for three beer strains used in the production of German Hefeweizen beers (BE072, BE074, and BE093). Hefeweizen (wheat) beer is a traditional German beer style and one of the few styles where a high 4-VG level is desirable because it contributes to the typical smoky, spicy aroma of these beers. Phylogenetically, Hefeweizen yeasts cluster within the Beer 1 lineage, but they are shown to be highly mosaic, containing genomic fragments of all three Beer 1 subclades (mainly from Belgium/Germany). Only a small fraction (8%–13%) of the genome originates from the Wine subpopulation, but this fraction includes the subtelomeric region of ChrIV, containing a functional PAD1 and FDC1 allele. This suggests that hybridization between different domesticated subpopulations yielded variants combining the typical traits of beer yeasts, including maltotriose fermentation, with a particular trait from a wine strain (4-VG production) that is only desirable in special beer styles.

Creating Superior Hybrid Yeasts through Marker-Assisted Breeding
Apart from yielding insight into the origins of today’s industrial yeasts, our results also open new routes for the creation of new superior strains. The availability of genomic data and the increasing number of polymorphisms that are known to contribute to industrially relevant phenotypes enables rapid DNA-based selection of superior segregants and hybrids in large-scale breeding schemes. Such marker-assisted breeding is already intensively used for crop and livestock breeding, because it circumvents labor-intensive and time-consuming phenotyping. As proof-of-concept, we combined our genomic and phenotypic data to obtain new hybrids with altered aromatic properties using marker-assisted breeding. Specifically, a 4-VG producing beer strain harboring a heterozygous loss-of-function mutation in FDC1 (strain BE027) was selected and sporulated to obtain segregants. Next, the FDC1 allele of the segregants was genotyped using mismatch PCR. Two segregants, one harboring the loss-of-function allele and one harboring the functional allele, were crossed with segregants of SA005, an Asian saké strain with a homozygous nonfunctional FDC1 allele, resulting in hybrids with good beer fermentation characteristics but drastically different aroma profiles (4-VG+ versus 4-VG−) that suit specific beer styles (Figure 5D).

Domestication Predates Microbe Discovery
Despite its wide use in industry and as a model organism, little is known about the ecology and evolutionary history of S. cerevisiae. Moreover, because early brewers, winemakers, and bakers were unaware of the existence of yeast, there is no record of how yeasts made their way into these processes, nor how yeasts were propagated and shared. As a result, it has proven difficult to estimate when specific industrial lineages originated. Moreover, current demographic and molecular clock models of S. cerevisiae employ the experimentally determined mutation rate of the haploid lab strain S288c in rich growth medium (Lynch et al., 2008), while it is known that the mutation rate is heavily influenced by the genetic background (Filteau et al., 2015), ploidy (Sheltzer et al., 2011), growth speed (van Dijk et al., 2015), and environmental stress (Voordeckers et al., 2015), factors that are likely very different for industrial, wild, and lab yeasts. However, our dataset, and specifically the Beer 1 clade, provides a strong tool for dating beer yeast divergence. First, given the absence of a functional sexual cycle and lack of admixture, exclusively clonal reproduction can be assumed. Second, our data show that United States beer yeasts are related closest to European beer yeasts, suggesting that they were imported from Europe during colonization, rather than stemming from indigenous wild United States yeasts. More specifically, United States beer yeasts seem phylogenetically most closely related to British beer yeasts (Figure 1A), which is

(C) Growth of all strains from different subpopulations on medium supplemented with 0.075 mM copper, relative to growth on medium without copper.
(D) Growth of all strains from different subpopulations on medium supplemented with 2.25 mM sulfite, relative to growth on medium without sulfite.
(E) Growth of all strains from different subpopulations in medium containing 1% w/v maltotriose as the sole carbon source, relative to growth on medium with
1% w/v glucose. au, arbitrary units; Bel/Ger, Belgium/Germany.
See also Figure S3 and Tables S5, S6, and S7.
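
As an illustration of the normalization mentioned in (A), the sketch below computes Z scores for a single trait across strains while leaving missing measurements as missing. It is a minimal example under assumptions: normalization per trait, the sample standard deviation, and the function name `z_scores` are illustrative choices, since the legend only states that values are expressed as Z scores.

```python
from statistics import mean, stdev


def z_scores(values):
    """Z-score one trait across strains; None marks a missing measurement."""
    observed = [v for v in values if v is not None]
    if len(observed) < 2:
        # The spread is undefined with fewer than two measurements.
        return [None] * len(values)
    mu = mean(observed)
    sd = stdev(observed)  # sample standard deviation (assumption)
    if sd == 0:
        # All strains behave identically for this trait.
        return [None if v is None else 0.0 for v in values]
    return [None if v is None else (v - mu) / sd for v in values]
```

Missing values stay `None`, which corresponds to the gray cells of the heat map.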



[Figure 4 image. (A and B) Sporulation efficiency (%) and spore viability (%) by subpopulation (Britain, US, Bel/Ger, Beer 2, Mixed, Wine, Asia, and Mosaic). (C) Level of heterozygosity (0.00–1.00) along chromosomes I–XVI for each subpopulation. (D and E) Number of heterozygous SNPs (×10³) versus spore viability (R² = 0.17, p < 0.001) and sporulation efficiency (R² = 0.03, p = 0.022). (F and G) Amplified/deleted genome fraction (%) versus spore viability (R² = 0.16, p < 0.001) and sporulation efficiency (R² = 0.19, p < 0.001).]

Figure 4. The Reproductive Lifestyle of Industrial S. cerevisiae Strains


(A) Violin plots depicting sporulation efficiency of all strains from different subpopulations.
(B) Violin plots depicting spore viability of all sporulating strains from different subpopulations.
(C) Visualization of the level of heterozygosity across the genome of the different subpopulations, calculated as the ratio of heterozygous/homozygous SNPs in
10 kb windows.
(D–G) Scatter plots depicting the correlation between the number of heterozygous loci and spore viability (D) or sporulation efficiency (E), and the correlation
between the fraction of the genome subjected to large (>20 kb) structural variation and spore viability (F) or sporulation efficiency (G). Dot colors indicate
subpopulations similar to the color code of Figure 1B.
See also Figure S4.
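
The windowed measure in (C) can be sketched as follows. This is a minimal Python illustration, assuming 1-based SNP positions on a single chromosome and a per-SNP flag for heterozygosity; the function name, the input layout, and the handling of windows without homozygous SNPs (returned as `None`) are assumptions, not the authors' implementation.

```python
from collections import defaultdict

WINDOW_SIZE = 10_000  # 10 kb windows, as stated in the legend


def windowed_het_ratio(snps, window=WINDOW_SIZE):
    """Ratio of heterozygous to homozygous SNPs per window.

    snps: iterable of (position, is_heterozygous) pairs, positions 1-based.
    Returns {window_start: ratio}; ratio is None when the window contains
    no homozygous SNP, because the ratio is then undefined.
    """
    counts = defaultdict(lambda: [0, 0])  # window index -> [het, hom]
    for position, is_het in snps:
        index = (position - 1) // window
        counts[index][0 if is_het else 1] += 1

    ratios = {}
    for index in sorted(counts):
        het, hom = counts[index]
        ratios[index * window + 1] = het / hom if hom else None
    return ratios


example = [(5, True), (120, False), (9_999, False),
           (10_001, True), (15_000, True), (19_000, False),
           (25_000, True)]
ratios = windowed_het_ratio(example)
```

Windows without any SNP are simply absent from the result.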

confirmed by the average per-site nucleotide divergence (dxy), which is significantly lower between Britain and the United States (average dxy = 1.97 × 10⁻³) than between Belgium/Germany and the United States strains (average dxy = 2.26 × 10⁻³) (Wilcoxon signed-rank test, p < 0.001) (Table S8). This suggests that the origin of the United States brewing strains can be traced back to the introduction of beer culture in the United States by early 17th century British settlers (Van Wieren, 1995). Third, in contrast



[Figure 5 image. (A) Loss-of-function mutations in PAD1 and FDC1 per strain (labels shown: c.495_496insA, c.864delA, W102*, W497*, Q154*, R309*, Q86*, Y98*, and K54*). (B) Percentage of 4-VG+ and 4-VG− strains by origin (Wine, Spirits, Saké, Wild, Bioethanol, Bread, Laboratory, and Beer) and by population (Britain, US, Bel/Ger, Beer 2, Mixed, Wine, Asia, and Mosaic). (C) PAD1 and FDC1 gene trees, including S. paradoxus (spar); scale bars, 6.0E-4 and 5.0E-4. (D) Breeding scheme, steps (1)–(3), crossing segregants of mating type a and mating type α: Hybrid 1, BE027-a × SA005-α; Hybrid 2, SA005-a × BE027-α; Hybrid 3, BE027-a × BE027-α; Hybrid 4, SA005-a × SA005-α; with a bar chart of 4-VG production (au) for parental strains, haploid segregants, and hybrids.]

Figure 5. Production of 4-Vinyl Guaiacol by Industrial S. cerevisiae Strains


(A) Distribution of loss-of-function SNPs and frame-shift mutations in FDC1 and PAD1 of industrial S. cerevisiae strains. Gray boxes indicate the presence of the
loss-of-function mutation, diagonal bars indicate heterozygosity at this site. Strains are clustered according to the strain phylogeny and strain names are colored
according to their origin. Basal splits of the five industrial lineages are indicated with colored dots (see Figure 1A).
(legend continued on next page)



to wine, beer is not produced seasonally but throughout the whole year, which provides fermenting yeast with a predictable and stable growth environment. Yeast cells undergo about three doublings during one batch of beer fermentation, which takes 1 week. Moreover, brewers typically recycle yeasts from a finished fermentation to inoculate a new batch, which implies that beer yeasts are continuously growing in their industrial niche. Together, these facts make it possible to estimate the number of generations to be about 150/year.

Based on estimates of the number of generations per year and the divergence time between United Kingdom and United States beer strains, we calculated the average mutation rate in a brewing environment to be 1.61–1.73E-08/bp/generation. While this value differs from previous assumptions, it is similar to the measured mutation rate in a diploid yeast strain that was subjected to 2 years of artificial evolution in a high-ethanol environment (Voordeckers et al., 2015). Moreover, mutations likely also occur in the second phase of beer fermentations, when cells are no longer dividing, which implies that the mutation rate per generation in industrial conditions should be higher than what is measured under conditions where the cells are dividing frequently, as is usually the case in laboratory experiments (Loewe et al., 2003). Using these data, the last common ancestor of the three major Beer 1 subclades (Belgium/Germany, United Kingdom, and the United States) is estimated to date from AD 1573–1604, suggesting that domestication started around this time. Interestingly, this coincides with the gradual switch from home-centered beer brewing where every family produced their own beer, to more professional large-scale brewing, first in pubs and monasteries and later also in breweries (Hornsey, 2003). The last common ancestor of Beer 2 is estimated to be more recent, between AD 1645 and 1671. This suggests that beer yeast domestication started before the discovery of microbes and the isolation of the first pure yeast cultures by Emil Hansen in the Carlsberg brewery in 1883, but well after the invention of beer production, estimated to have occurred as early as 3000 BC (Michel et al., 1992). Although it is difficult to assess how many different yeast strains were domesticated and in which industrial context these domestications occurred, the limited number of clades of industrial yeasts and the clear segregation of wild and industrial yeasts suggest that today’s industrial yeasts originated from a limited set of ancestral strains, or closely related groups of ancestral strains.

DISCUSSION

Together, our results show that today’s industrial S. cerevisiae yeasts are genetically and phenotypically separated from wild stocks due to human selection and trafficking. Specifically, the thousands of industrial yeasts that are available today seem to stem from only a few ancestral strains that made their way into food fermentations and subsequently evolved into separate lineages, each used for specific industrial applications. Within each cluster, strains are sometimes further subdivided along geographical boundaries, as is the case for the Beer 1 clade, which is divided into three main subgroups. However, further subclustering of beer yeasts according to beer style was generally not observed, which may not be surprising as it is common practice for brewers to use only one yeast strain within their brewery for the production of a wide array of different beers. Notable exceptions are yeasts associated with the few beers that largely depend on very specific yeast characteristics, such as Hefeweizen beers. Another exception may include those beers for which production is restricted to a specific geographic area, such as Belgian Saisons or British Stouts.

We further show that industrial yeasts were clearly subjected to domestication, which is reflected in their genomes and phenomes. Interestingly, domestication seems strongest in beer yeasts, which demonstrate domestication hallmarks such as decay of sexual reproduction and general stress resistance, as well as convergent evolution of desirable traits like maltotriose utilization. Yeasts from the Beer 1 clade show the clearest signs of domestication, possibly because Beer 2 only diverged more recently from other sublineages. Many of these domestication features may have simply been the result of the yeasts’ adaptation to their new industrial niches. However, for some traits, it is likely that humans actively intervened, e.g., by selecting strains that do not produce undesirable off-flavors, which our analysis identifies as PAD1 or FDC1 nonsense mutants.

The presence of a strong domestication signature in beer yeast genomes agrees well with the common practices in the brewing industry. Beer yeasts are typically recycled after each fermentation batch, and because beer is produced throughout the year, this implies that beer yeasts are continuously growing in their industrial niche. By contrast, wine yeasts can only grow in wine must for a short period every year, spending the rest of their lives in and around the vineyards or in the guts of insects (Bokulich et al., 2014; Christiaens et al., 2014; Stefanini et al., 2016). During these nutrient-poor periods, wine yeasts likely undergo few mitotic doublings, yet they may undergo sexual cycles and even hybridize with wild yeasts (Stefanini et al., 2016). Moreover, only a very small portion of the yeasts may find their way back into the grape must when the next harvest season arrives, while trillions of cells are being transferred to the next batch during backslopping in beer production. This results in large

(B) Percentage of strains within each origin (left) and population (right) capable of producing 4-vinyl guaiacol (4-VG). Red, 4-VG−; turquoise, 4-VG+.
(C) Phylogenetic trees and ancestral trait reconstruction of PAD1 and FDC1 genes. Branches are colored according to the most probable state of their ancestral
nodes, turquoise (4-VG+) or red (4-VG−). Pie charts indicate probabilities of each state at specific nodes, turquoise (4-VG+) or red (4-VG−); posterior probability for
the same nodes is indicated by a dot: black dot, 90%–100%; gray, 70%; white, 42%. Branch lengths reflect the average numbers of substitutions per site
(compare scale bars).
(D) Development of new yeast variants with specific phenotypic features by marker-assisted breeding. Two parent strains (BE027 and SA005) were sporulated
and, using genetic markers, segregants with the desired genotype were selected (1). Next, breeding between segregants from different parents (outbreeding) or
the same parent (inbreeding) was performed (2). This breeding scheme yields hybrids with altered aromatic properties that can directly be applied in industrial
fermentations (3). 4-VG production is shown relative to the production of BE027. Yeast genomes are represented by gray bars, loss-of-function mutations in
FDC1 as red (W497*) and blue (K54*) boxes within the gray bars. Error bars represent one SD from the mean.
See also Table S5.
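
The selection logic behind (D) can be illustrated with a short Python sketch. It assumes that segregants carry a single FDC1 allele, that one functional FDC1 allele is sufficient for 4-VG production (consistent with BE027 being heterozygous yet 4-VG producing), and that PAD1 is functional throughout; the segregant labels and the function name are hypothetical.

```python
FUNCTIONAL = "functional"
LOSS_OF_FUNCTION = "loss-of-function"


def predict_4vg(allele_a: str, allele_b: str) -> str:
    """Predict the hybrid phenotype from the FDC1 alleles of two segregants.

    Assumption: a single functional FDC1 allele restores 4-VG production.
    """
    if FUNCTIONAL in (allele_a, allele_b):
        return "4-VG+"
    return "4-VG-"


# Hypothetical labels for the two genotyped BE027 segregants.
be027_segregants = {
    "BE027 segregant (functional FDC1)": FUNCTIONAL,
    "BE027 segregant (nonfunctional FDC1)": LOSS_OF_FUNCTION,
}
sa005_allele = LOSS_OF_FUNCTION  # SA005 is homozygous nonfunctional

predictions = {
    name: predict_4vg(allele, sa005_allele)
    for name, allele in be027_segregants.items()
}
```

Under these assumptions, genotyping the segregants before crossing is enough to decide which hybrid is expected to produce 4-VG, which is the point of marker-assisted selection.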



effective population sizes for beer, but not wine, yeasts. The differences in industrial practices between beer brewing and winemaking likely had three important consequences. First, beer yeasts evolved faster than wine strains (Figure 1A). This resulted in a large genetic diversity within beer yeasts, while wine yeasts are genetically more homogeneous (Table 1). Second, after the initial domestication event, some beer yeasts were contained in the brewery and diverged allopatrically, leading to geographically defined subpopulations mirroring human traffic and colonization. Third, beer strains generally lost their ability to reproduce sexually. This, combined with continuous cultivation in a mild growth environment, made them susceptible to genetic drift and fixation of deleterious alleles that would otherwise be purged by evolutionary competition in harsh conditions. Hence, these asexual populations continuously accumulated deleterious mutations in an irreversible manner, a process known as Muller’s ratchet (Muller, 1964). We propose that continuous clonal reproduction and relaxed selection for general stress resistance and famine likely allowed genome decay in beer yeasts and resulted in yeasts specialized in thriving in a man-made niche like beer fermentations, but not in natural environments. Both these characteristics (genome decay and niche specialization) are considered to be key characteristics of domestication.

Our study not only provides insight into the domestication origin of industrial yeasts, but may also help to select and breed new superior strains. The genome sequences, phylogenetic tree, and phenome data can be used to set up marker-assisted breeding schemes similar to those routinely used for the breeding of superior crops and livestock (Takeda and Matsuoka, 2008).

STAR★METHODS

Detailed methods are provided in the online version of this paper and include the following:

• KEY RESOURCES TABLE
• CONTACT FOR REAGENT AND RESOURCE SHARING
• EXPERIMENTAL MODEL AND SUBJECT DETAILS
  ◦ Strain Collection
• METHOD DETAILS
  ◦ DNA Extraction
  ◦ Library Prep and Whole Genome Sequencing
  ◦ De Novo Assembly
  ◦ Annotation
  ◦ Core Genome Analysis and Identification of Single Copy Genes
  ◦ Reference-Based Alignments and Variant Calling
  ◦ Phylogenetic Analyses
  ◦ Population Structure and Diversity Analysis
  ◦ Time Divergence Estimate
  ◦ Copy-Number Variation Analysis
  ◦ Character Evolution Analysis
  ◦ Determination of Cell Ploidy
  ◦ Phenotypic Analysis
  ◦ Development of Artificial Hybrids
• QUANTIFICATION AND STATISTICAL ANALYSIS
• DATA AND SOFTWARE AVAILABILITY
  ◦ Data Resources

SUPPLEMENTAL INFORMATION

Supplemental Information includes four figures and eight tables and can be found with this article online at http://dx.doi.org/10.1016/j.cell.2016.08.020.
An audio PaperClip is available at http://dx.doi.org/10.1016/j.cell.2016.08.020#mmc9.

AUTHOR CONTRIBUTIONS

Conceptualization, B.G., J.S., T.P., L.M., S.M., and K.J.V.; Formal Analysis, B.G., J.S., L.S., M.R., A.M., and K.V.; Investigation, B.G., J.S., T.P., A.M., L.S., V.S., B.H.-M., and M.T.; Resources, L.M., B.S., C.T., C.W., and K.J.V.; Writing, B.G., J.S., S.M., and K.J.V.; Supervision, T.P., T.R., A.S., C.W., G.B., S.M., and K.J.V.

ACKNOWLEDGMENTS

We thank all K.J.V. and S.M. laboratory members for their help and suggestions. We thank K. Wolfe, S. Oliver, and P. Malcorps for their valuable feedback on the manuscript. We acknowledge D. Brami, A. Bass, J. Kurowski, K. Fortmann, and N. Parker for data collection and data review. Additionally we acknowledge S. Rombauts, Y. Lin, R. de Jonge, and V. Storme for fruitful discussions on data analysis. B.G. acknowledges funding from the VIB International PhD Program in Life Sciences. J.S. acknowledges funding from IWT and KU Leuven. K.J.V. acknowledges funding from an ERC Consolidator grant CoG682009, HFSP program grant RGP0050/2013, KU Leuven NATAR Program Financing, VIB, EMBO YIP program, FWO, and IWT. S.M. acknowledges funding from VIB and Ghent University. The funders had no role in study design, data collection and analysis, the decision to publish, or preparation of the manuscript.

Received: March 24, 2016
Revised: June 8, 2016
Accepted: August 8, 2016
Published: September 8, 2016

REFERENCES

Almeida, P., Barbosa, R., Zalar, P., Imanishi, Y., Shimizu, K., Turchetti, B., Legras, J.L., Serra, M., Dequin, S., Couloux, A., et al. (2015). A population genomics insight into the Mediterranean origins of wine yeast domestication. Mol. Ecol. 24, 5412–5427.
Baele, G., and Lemey, P. (2013). Bayesian evolutionary model testing in the phylogenomics era: matching model complexity with computational efficiency. Bioinformatics 29, 1970–1979.
Baele, G., Lemey, P., Bedford, T., Rambaut, A., Suchard, M.A., and Alekseyenko, A.V. (2012). Improving the accuracy of demographic and molecular clock model comparison while accommodating phylogenetic uncertainty. Mol. Biol. Evol. 29, 2157–2167.
Bergström, A., Simpson, J.T., Salinas, F., Barré, B., Parts, L., Zia, A., Nguyen Ba, A.N., Moses, A.M., Louis, E.J., Mustonen, V., et al. (2014). A high-definition view of functional genetic variation from natural yeast genomes. Mol. Biol. Evol. 31, 872–888.
Bokulich, N.A., Thorngate, J.H., Richardson, P.M., and Mills, D.A. (2014). Microbial biogeography of wine grapes is conditioned by cultivar, vintage, and climate. Proc. Natl. Acad. Sci. USA 111, E139–E148.
Bolger, A.M., Lohse, M., and Usadel, B. (2014). Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30, 2114–2120.
Borneman, A.R., Desany, B.A., Riches, D., Affourtit, J.P., Forgan, A.H., Pretorius, I.S., Egholm, M., and Chambers, P.J. (2011). Whole-genome comparison reveals novel genetic elements that characterize the genome of industrial strains of Saccharomyces cerevisiae. PLoS Genet. 7, e1001287.
Borneman, A.R., Forgan, A.H., Kolouchova, R., Fraser, J.A., and Schmidt, S.A. (2016). Whole genome comparison reveals high levels of inbreeding and strain



redundancy across the spectrum of commercial wine strains of Saccharomyces cerevisiae. G3 (Bethesda) 6, 957–971.
Briza, P., Breitenbach, M., Ellinger, A., and Segall, J. (1990). Isolation of two developmentally regulated genes involved in spore wall maturation in Saccharomyces cerevisiae. Genes Dev. 4, 1775–1789.
Brown, C.A., Murray, A.W., and Verstrepen, K.J. (2010). Rapid expansion and functional divergence of subtelomeric gene families in yeasts. Curr. Biol. 20, 895–903.
Capella-Gutiérrez, S., Silla-Martínez, J.M., and Gabaldón, T. (2009). trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics 25, 1972–1973.
Christiaens, J.F., Franco, L.M., Cools, T.L., De Meester, L., Michiels, J., Wenseleers, T., Hassan, B.A., Yaksi, E., and Verstrepen, K.J. (2014). The fungal aroma gene ATF1 promotes dispersal of yeast cells through insect vectors. Cell Rep. 9, 425–432.
Cingolani, P., Platts, A., Wang, L., Coon, M., Nguyen, T., Wang, L., Land, S.J., Lu, X., and Ruden, D.M. (2012). A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3. Fly (Austin) 6, 80–92.
Danecek, P., Auton, A., Abecasis, G., Albers, C.A., Banks, E., DePristo, M.A., Handsaker, R.E., Lunter, G., Marth, G.T., Sherry, S.T., et al.; 1000 Genomes Project Analysis Group (2011). The variant call format and VCFtools. Bioinformatics 27, 2156–2158.
Dittmar, J.C., Reid, R.J., and Rothstein, R. (2010). ScreenMill: a freely available software suite for growth measurement, analysis and visualization of high-throughput screen data. BMC Bioinformatics 11, 353.
Driscoll, C.A., Macdonald, D.W., and O'Brien, S.J. (2009). From wild animals to domestic pets, an evolutionary view of domestication. Proc. Natl. Acad. Sci. USA 106 (Suppl 1), 9971–9978.
Drummond, A.J., and Suchard, M.A. (2010). Bayesian random local clocks, or one rate to rule them all. BMC Biol. 8, 114.
Drummond, A.J., Suchard, M.A., Xie, D., and Rambaut, A. (2012). Bayesian phylogenetics with BEAUti and the BEAST 1.7. Mol. Biol. Evol. 29, 1969–1973.
Dunham, M.J., Badrane, H., Ferea, T., Adams, J., Brown, P.O., Rosenzweig, F., and Botstein, D. (2002). Characteristic genome rearrangements in experimental evolution of Saccharomyces cerevisiae. Proc. Natl. Acad. Sci. USA 99, 16144–16149.
Dunn, B., Richter, C., Kvitek, D.J., Pugh, T., and Sherlock, G. (2012). Analysis of the Saccharomyces cerevisiae pan-genome reveals a pool of copy number variants distributed in diverse yeast strains from differing industrial environments. Genome Res. 22, 908–924.
Eden, E., Navon, R., Steinfeld, I., Lipson, D., and Yakhini, Z. (2009). GOrilla: a tool for discovery and visualization of enriched GO terms in ranked gene lists. BMC Bioinformatics 10, 48.
Filteau, M., Hamel, V., Pouliot, M.C., Gagnon-Arsenault, I., Dubé, A.K., and Landry, C.R. (2015). Evolutionary rescue by compensatory mutations is constrained by genomic and environmental backgrounds. Mol. Syst. Biol. 11, 832.
Fu, L., Niu, B., Zhu, Z., Wu, S., and Li, W. (2012). CD-HIT: accelerated for clustering the next-generation sequencing data. Bioinformatics 28, 3150–3152.
Goddard, M.R., and Greig, D. (2015). Saccharomyces cerevisiae: a nomadic yeast with no niche? FEMS Yeast Res. 15, fov009.
Goddard, M.R., Godfray, H.C.J., and Burt, A. (2005). Sex increases the efficacy of natural selection in experimental yeast populations. Nature 434, 636–640.
Hornsey, I.S. (2003). A History of Beer and Brewing (Cambridge: The Royal Society of Chemistry).
Hutter, S., Vilella, A.J., and Rozas, J. (2006). Genome-wide DNA polymorphism analyses using VariScan. BMC Bioinformatics 7, 409.
Huxley, C., Green, E.D., and Dunham, I. (1990). Rapid assessment of S. cerevisiae mating type by PCR. Trends Genet. 6, 236.
Katoh, K., and Standley, D.M. (2013). MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol. Biol. Evol. 30, 772–780.
Kozlov, A.M., Aberer, A.J., and Stamatakis, A. (2015). ExaML version 3: a tool for phylogenomic analyses on supercomputers. Bioinformatics 31, 2577–2579.
Kück, P., and Meusemann, K. (2010). FASconCAT: convenient handling of data matrices. Mol. Phylogenet. Evol. 56, 1115–1118.
Kuhn, R.M., Karolchik, D., Zweig, A.S., Trumbower, H., Thomas, D.J., Thakkapallayil, A., Sugnet, C.W., Stanke, M., Smith, K.E., Siepel, A., et al. (2007). The UCSC genome browser database: update 2007. Nucleic Acids Res. 35, D668–D673.
Lanfear, R., Calcott, B., Ho, S.Y.W., and Guindon, S. (2012). Partitionfinder: combined selection of partitioning schemes and substitution models for phylogenetic analyses. Mol. Biol. Evol. 29, 1695–1701.
Legras, J.L., and Karst, F. (2003). Optimisation of interdelta analysis for Saccharomyces cerevisiae strain characterisation. FEMS Microbiol. Lett. 221, 249–255.
Lemey, P., Rambaut, A., Drummond, A.J., and Suchard, M.A. (2009). Bayesian phylogeography finds its roots. PLoS Comput. Biol. 5, e1000520.
Li, H., and Durbin, R. (2009). Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 1754–1760.
Liti, G., Carter, D.M., Moses, A.M., Warringer, J., Parts, L., James, S.A., Davey, R.P., Roberts, I.N., Burt, A., Koufopanou, V., et al. (2009). Population genomics of domestic and wild yeasts. Nature 458, 337–341.
Liu, Y., Schröder, J., and Schmidt, B. (2013). Musket: a multistage k-mer spectrum-based error corrector for Illumina sequence data. Bioinformatics 29, 308–315.
Loewe, L., Textor, V., and Scherer, S. (2003). High deleterious genomic mutation rate in stationary phase of Escherichia coli. Science 302, 1558–1560.
Lynch, M., Sung, W., Morris, K., Coffey, N., Landry, C.R., Dopman, E.B., Dickinson, W.J., Okamoto, K., Kulkarni, S., Hartl, D.L., and Thomas, W.K. (2008). A genome-wide view of the spectrum of spontaneous mutations in yeast. Proc. Natl. Acad. Sci. USA 105, 9272–9277.
Magwene, P.M., Kayıkçı, Ö., Granek, J.A., Reininga, J.M., Scholl, Z., and Murray, D. (2011). Outcrossing, mitotic recombination, and life-history trade-offs shape genome evolution in Saccharomyces cerevisiae. Proc. Natl. Acad. Sci. USA 108, 1987–1992.
McDonald, M.J., Rice, D.P., and Desai, M.M. (2016). Sex speeds adaptation by altering the dynamics of molecular evolution. Nature 531, 233–236.
McKenna, A., Hanna, M., Banks, E., Sivachenko, A., Cibulskis, K., Kernytsky, A., Garimella, K., Altshuler, D., Gabriel, S., Daly, M., and DePristo, M.A. (2010). The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20, 1297–1303.
Michel, R.H., McGovern, P.E., and Badler, V.R. (1992). Chemical evidence for ancient beer. Nature 360, 24.
Mukai, N., Masaki, K., Fujii, T., Kawamukai, M., and Iefuji, H. (2010). PAD1 and FDC1 are essential for the decarboxylation of phenylacrylic acids in Saccharomyces cerevisiae. J. Biosci. Bioeng. 109, 564–569.
Muller, H.J. (1964). The relation of recombination to mutational advance. Mutat. Res. 106, 2–9.
Pattengale, N.D., Alipour, M., Bininda-Emonds, O.R., Moret, B.M., and Stamatakis, A. (2010). How many bootstrap replicates are necessary? J. Comput. Biol. 17, 337–354.
Pavelka, N., Rancati, G., Zhu, J., Bradford, W.D., Saraf, A., Florens, L., Sanderson, B.W., Hattem, G.L., and Li, R. (2010). Aneuploidy confers quantitative proteome changes and phenotypic variation in budding yeast. Nature 468, 321–325.
Peng, Y., Leung, H., Yiu, S., and Chin, F. (2010). IDBA – a practical iterative de Bruijn Graph de novo assembler. In Research in Computational Molecular Biology, B. Berger, ed. (Springer), pp. 426–440.



Pérez-Ortín, J.E., Querol, A., Puig, S., and Barrio, E. (2002). Molecular characterization of a chromosomal rearrangement involved in the adaptive evolution of yeast strains. Genome Res. 12, 1533–1539.
Pfeifer, B., Wittelsbürger, U., Ramos-Onsins, S.E., and Lercher, M.J. (2014). PopGenome: an efficient Swiss army knife for population genomic analyses in R. Mol. Biol. Evol. 31, 1929–1936.
Purcell, S., Neale, B., Todd-Brown, K., Thomas, L., Ferreira, M.A.R., Bender, D., Maller, J., Sklar, P., de Bakker, P.I.W., Daly, M.J., and Sham, P.C. (2007). PLINK: a tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 81, 559–575.
Purugganan, M.D., and Fuller, D.Q. (2009). The nature of selection during plant domestication. Nature 457, 843–848.
R Development Core Team (2011). R: A language and environment for statistical computing (R Foundation for Statistical Computing).
Raj, A., Stephens, M., and Pritchard, J.K. (2014). fastSTRUCTURE: variational inference of population structure in large SNP data sets. Genetics 197, 573–589.
Rambaut, A., Suchard, M.A., Xie, D., and Drummond, A.J. (2014). Tracer v1.6. http://beast.bio.ed.ac.uk/Tracer.
Rancati, G., Pavelka, N., Fleharty, B., Noll, A., Trimble, R., Walton, K., Perera, A., Staehling-Hampton, K., Seidel, C.W., and Li, R. (2008). Aneuploidy underlies rapid adaptive evolution of yeast cells deprived of a conserved cytokinesis motor. Cell 135, 879–893.
Ranwez, V., Harispe, S., Delsuc, F., and Douzery, E.J.P. (2011). MACSE: Multiple Alignment of Coding SEquences accounting for frameshifts and stop codons. PLoS ONE 6, e22594.
Schacherer, J., Shapiro, J.A., Ruderfer, D.M., and Kruglyak, L. (2009). Comprehensive polymorphism survey elucidates population structure of Saccharomyces cerevisiae. Nature 458, 342–345.
Selmecki, A.M., Dulmage, K., Cowen, L.E., Anderson, J.B., and Berman, J. (2009). Acquisition of aneuploidy provides increased fitness during the evolution of antifungal drug resistance. PLoS Genet. 5, e1000705.
Sheltzer, J.M., Blank, H.M., Pfau, S.J., Tange, Y., George, B.M., Humpton, T.J., Brito, I.L., Hiraoka, Y., Niwa, O., and Amon, A. (2011). Aneuploidy drives genomic instability in yeast. Science 333, 1026–1030.
Stamatakis, A. (2014). RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics 30, 1312–1313.
Stanke, M., and Morgenstern, B. (2005). AUGUSTUS: a web server for gene prediction in eukaryotes that allows user-defined constraints. Nucleic Acids Res. 33 (Web Server issue, suppl 2), W465–W467.
Steensels, J., and Verstrepen, K.J. (2014). Taming wild yeast: potential of conventional and nonconventional yeasts in industrial fermentations. Annu. Rev. Microbiol. 68, 61–80.
Steensels, J., Meersman, E., Snoek, T., Saels, V., and Verstrepen, K.J. (2014). Large-scale selection and breeding to generate industrial yeasts with superior aroma production. Appl. Environ. Microbiol. 80, 6965–6975.
Stefanini, I., Dapporto, L., Berná, L., Polsinelli, M., Turillazzi, S., and Cavalieri, D. (2016). Social wasps are a Saccharomyces mating nest. Proc. Natl. Acad. Sci. USA 113, 2247–2251.
Strope, P.K., Skelly, D.A., Kozmin, S.G., Mahadevan, G., Stone, E.A., Magwene, P.M., Dietrich, F.S., and McCusker, J.H. (2015). The 100-genomes strains, an S. cerevisiae resource that illuminates its natural phenotypic and genotypic variation and emergence as an opportunistic pathogen. Genome Res. 25, 762–774.
Suyama, M., Torrents, D., and Bork, P. (2006). PAL2NAL: robust conversion of protein sequence alignments into the corresponding codon alignments. Nucleic Acids Res. 34 (Web Server issue, suppl 2), W609–W612.
Takeda, S., and Matsuoka, M. (2008). Genetic approaches to crop improvement: responding to environmental and population changes. Nat. Rev. Genet. 9, 444–457.
van Dijk, D., Dhar, R., Missarova, A.M., Espinar, L., Blevins, W.R., Lehner, B., and Carey, L.B. (2015). Slow-growing cells within isogenic populations have increased RNA polymerase error rates and DNA damage. Nat. Commun. 6, 7972.
Van Wieren, D.P. (1995). American Breweries II (Eastern Coast Breweriana Association).
Voordeckers, K., Kominek, J., Das, A., Espinosa-Cantú, A., De Maeyer, D., Arslan, A., Van Pee, M., van der Zande, E., Meert, W., Yang, Y., et al. (2015). Adaptation to high ethanol reveals complex evolutionary pathways. PLoS Genet. 11, e1005635.
Wang, Q.-M., Liu, W.-Q., Liti, G., Wang, S.A., and Bai, F.Y. (2012). Surprisingly diverged populations of Saccharomyces cerevisiae in natural environments remote from human activity. Mol. Ecol. 21, 5404–5417.
Wapinski, I., Pfeffer, A., Friedman, N., and Regev, A. (2007). Natural history and evolutionary principles of gene duplication in fungi. Nature 449, 54–61.
Warringer, J., Zörgö, E., Cubillos, F.A., Zia, A., Gjuvsland, A., Simpson, J.T., Forsmark, A., Durbin, R., Omholt, S.W., Louis, E.J., et al. (2011). Trait variation in yeast is defined by population history. PLoS Genet. 7, e1002111.
White, M.D., Payne, K.A.P., Fisher, K., Marshall, S.A., Parker, D., Rattray, N.J.W., Trivedi, D.K., Goodacre, R., Rigby, S.E.J., Scrutton, N.S., et al. (2015). UbiX is a flavin prenyltransferase required for bacterial ubiquinone biosynthesis. Nature 522, 502–506.
Zheng, X., Levine, D., Shen, J., Gogarten, S.M., Laurie, C., and Weir, B.S. (2012). A high-performance computing toolset for relatedness and principal component analysis of SNP data. Bioinformatics 28, 3326–3328.



STAR★METHODS

KEY RESOURCES TABLE

REAGENT or RESOURCE SOURCE IDENTIFIER


Chemicals, Peptides, and Recombinant Proteins
2-methoxy-4-vinylphenol (4-VG) Sigma-Aldrich Cat#W267511; CAS: 7786-61-0
3-Methylbutanol Sigma-Aldrich Cat#I9392; CAS: 123-51-3
Acetic acid VWR Cat#VELC1005.2500; CAS: 64-19-7
Agar powder bacteriological VWR Cat#89132-792; CAS: 9002-18-0
Antimycin A Sigma-Aldrich Cat#A8674; CAS: 1397-94-0
Bacto Peptone BD Biosciences Cat#211820
Cadmium sulfate 8/3-hydrate Sigma-Aldrich Cat#20920; CAS: 7790-84-3
Chloroform VWR Cat#PROL22711.324; CAS: 67-66-3
Copper(II) sulfate Sigma-Aldrich Cat#451657; CAS: 7758-98-7
Cycloheximide Sigma-Aldrich Cat#C7698; CAS: 66-81-9
D-(−)-Fructose Sigma-Aldrich Cat#47740; CAS: 57-48-7
D-(+)-Glucose monohydrate Sigma-Aldrich Cat#49159; CAS: 14431-43-7
D-(+)-Maltose monohydrate Sigma-Aldrich Cat#M5885; CAS: 6363-53-7
D-(+)-Melibiose Sigma-Aldrich Cat#63630; CAS: 585-99-9
Diethyl ether VWR Cat#PROL23811.292; CAS: 60-29-7
D-Sorbitol Sigma-Aldrich Cat#S1876; CAS: 50-70-4
Ethanol absolute VWR Cat#PROL20802.321; CAS: 64-17-5
Ferulic acid Sigma-Aldrich Cat#128708; CAS: 537-98-4
Formic acid Sigma-Aldrich Cat#6440; CAS: 64-18-6
Galactose Fisher Cat#15061-0010; CAS: 59-23-4
Glycerol Sigma-Aldrich Cat#G5516; CAS: 56-81-5
Levulinic acid Sigma-Aldrich Cat#L2009; CAS: 123-76-2
L-Lysine Sigma-Aldrich Cat#L5501; CAS: 56-87-1
Maltotriose hydrate Sigma-Aldrich Cat#851493; CAS: 207511-08-8
Potassium acetate VWR Cat#PROL26667.293; CAS: 127-08-2
Potassium chloride VWR Cat#26764-298; CAS: 7447-40-7
Propidium iodide solution Sigma-Aldrich Cat#P4864; CAS: 25535-16-4
Roti phenol Carl Roth Cat#38.3; CAS: 108-95-2
SC Amino acid mixture MP biomedicals Cat#114400022
Sodium chloride Fisher scientific Cat#S/3160/60; CAS: 7647-14-5
Sodium sulfite Sigma-Aldrich Cat#S0505; CAS: 7757-83-7
Sucrose Sigma-Aldrich Cat#84100; CAS: 57-50-1
TRIS acetate – EDTA buffer solution Sigma-Aldrich Cat#93296
Yeast Extract powder LabM Cat#MC001; CAS: 8013-01-2
Yeast Nitrogen Base Without Amino Acids Sigma-Aldrich Cat#Y0626
Yeast Nitrogen Base Without Amino Acids and Ammonium Sulfate Sigma-Aldrich Cat#Y1251
Zymolyase 100T AMSBIO Cat#120493-1; CAS: 37340-57-1
Critical Commercial Assays
Genomic-tip 100/G QIAGEN Cat#10243
Genomic DNA buffer set QIAGEN Cat#19060
MasterPure Yeast DNA Purification Kit Epicenter Cat#MPY80200
Nextera XT DNA Library Preparation Kit Illumina Cat#FC-131-1024
SPRI works HT Beckman Coulter Cat#B06938
Deposited Data
De novo assembly DDBJ/ENA/GenBank BioProject PRJNA323691
Experimental Models: Organisms/Strains
See Table S1 for list of strains sequenced This paper N/A
Sequence-Based Reagents
FDC1-W497-FW: 5′-TGCAGATCAGATGGCTTTTG-3′ This study N/A
FDC1-W497-RV-STOP: 5′-GCAATTATTTATATCCGTACCTTTTT-3′ This study N/A
FDC1-W497-RV-ALT: 5′-GCAATTATTTATATCCGTACCTTTTC-3′ This study N/A
Delta12: 5′-TCAACAATGGAATCCCAAC-3′ Legras and Karst, 2003 N/A
Delta21: 5′-CATCTTAACACCGTATATGA-3′ Legras and Karst, 2003 N/A
P86: 5′-ACTCCACTTCAAGTAAGAGTT-3′ Huxley et al., 1990 N/A
P87: 5′-GCACGGAATATGGGACATACTT-3′ Huxley et al., 1990 N/A
P88: 5′-AGTCACATCAAGATGGTTTAT-3′ Huxley et al., 1990 N/A
Software and Algorithms
Agilent Chemstation software Agilent Technologies, USA https://www.agilent.com/en-us/products/software-informatics/massspec-workstations/gc-msd-chemstation-software
AUGUSTUS (v2.5) Stanke and Morgenstern, 2005 http://bioinf.uni-greifswald.de/augustus/; RRID: SCR_008417
BEAST (v1.8.2) Drummond et al., 2012 http://beast.bio.ed.ac.uk/; RRID: SCR_010228
Burrows-Wheeler Aligner (BWA) (v0.6.1) Li and Durbin, 2009 http://bio-bwa.sourceforge.net/; RRID: SCR_010910
CD-HIT (v4.6) Fu et al., 2012 http://weizhongli-lab.org/cd-hit/; RRID: SCR_007105
ExaML (v3) Kozlov et al., 2015 http://sco.h-its.org/exelixis/web/software/examl/index.html
FASconCAT (v1.0) Kück and Meusemann, 2010 https://www.zfmk.de/en/research/research-centres-and-groups/fasconcat
fastStructure (v1.0) Raj et al., 2014 https://rajanil.github.io/fastStructure/
FigTree (v1.4.2) http://tree.bio.ed.ac.uk/software/figtree/; RRID: SCR_008515
Gene-E Broad Institute http://www.broadinstitute.org/cancer/software/GENE-E/
Genome Analysis Toolkit (GATK) (v2.7.2) Broad Institute, McKenna et al., 2010 https://www.broadinstitute.org/gatk/; RRID: SCR_001876
Idba_ud (v1.1.1) Peng et al., 2010 http://i.cs.hku.hk/alse/hkubrg/projects/idba_ud/
liftOver Kuhn et al., 2007 http://genomewiki.ucsc.edu/index.php/Minimal_Steps_For_LiftOver
LogCombiner Drummond et al., 2012 http://beast.bio.ed.ac.uk/logcombiner
MACSE (v1.01b) Ranwez et al., 2011 http://bioweb.supagro.inra.fr/macse/index.php?menu=intro&option=intro
MAFFT (v7.187) Katoh and Standley, 2013 http://mafft.cbrc.jp/alignment/software/; RRID: SCR_011811
Musket Liu et al., 2013 http://musket.sourceforge.net/homepage.htm#latest
PAL2NAL (v14) Suyama et al., 2006 http://www.bork.embl.de/pal2nal/
PartitionFinder (v1.1.1) Lanfear et al., 2012 http://www.robertlanfear.com/partitionfinder/
Picard Tools (v1.56) Broad Institute http://broadinstitute.github.io/picard/; RRID: SCR_006525
PLINK (v1.07) Purcell et al., 2007 http://pngu.mgh.harvard.edu/purcell/plink/; RRID: SCR_001757
PopGenome (v2.1.6) Pfeifer et al., 2014 https://cran.r-project.org/web/packages/PopGenome/index.html
RAxML (v8.1.3) Stamatakis, 2014 http://sco.h-its.org/exelixis/web/software/raxml/index.html; RRID: SCR_006086
SAS (v9.4) http://www.sas.com/en_us/software/sas9.html; RRID: SCR_008567
ScreenMill macro Dittmar et al., 2010 http://www.rothsteinlab.com/tools/
SnpEff (v3.3) Cingolani et al., 2012 http://snpeff.sourceforge.net/; RRID: SCR_005191
SNPRelate (v1.6.4) Zheng et al., 2012 http://bioconductor.org/packages/release/bioc/html/SNPRelate.html
Splint This paper Available upon request
Tracer (v1.6) Rambaut et al., 2014 http://tree.bio.ed.ac.uk/software/tracer/
trimAl (v1.2) Capella-Gutiérrez et al., 2009 http://trimal.cgenomics.org/introduction
Trimmomatic (v0.30) Bolger et al., 2014 http://www.usadellab.org/cms/?page=trimmomatic; RRID: SCR_011848
Variscan (v2.0.3) Hutter et al., 2006 http://www.ub.edu/softevol/variscan/
VcfTools (v0.1.10; v0.1.14) Danecek et al., 2011 http://vcftools.sourceforge.net/; RRID: SCR_001235

CONTACT FOR REAGENT AND RESOURCE SHARING

Further information and requests may be directed to, and will be fulfilled by, the corresponding author Kevin J. Verstrepen
(kevin.verstrepen@biw.vib-kuleuven.be).

EXPERIMENTAL MODEL AND SUBJECT DETAILS

Strain Collection
For this study, a set of 157 Saccharomyces cerevisiae strains was sequenced, phenotyped and analyzed. Strains were obtained from
historical yeast collections of the VIB Laboratory for Systems Biology (KU Leuven, Belgium) and White Labs (USA). While detailed
background information on many of these strains is limited, their geographical origin was in most cases documented and is listed
in Table S1. Beer strains mainly originate from the main fermentation (82/102) or bottle conditioning (10/102) of ale beers. While lager
beer fermentations are usually carried out by S. pastorianus, an interspecific hybrid of S. cerevisiae and S. eubayanus, we identified
10 S. cerevisiae strains typically employed in lager fermentations and included them in the selection.
All strains are stored long-term at −80°C in a glycerol-based standard storage medium (peptone 1% w v⁻¹, yeast extract
0.5% w v⁻¹, glucose 1% w v⁻¹, glycerol 25% v v⁻¹).

METHOD DETAILS

DNA Extraction
For strains BE001-043, BI001-005, BR001-004, LA001, SA001-007, SP001-007, NA001-004 and WI001-018, genomic DNA was
prepared using the QIAGEN genomic tip kit (QIAGEN, Germany) according to recommended protocols. Final DNA concentrations were
measured using Qubit (Thermo Fisher Scientific, USA). For the other strains, genomic DNA was extracted with the MasterPure™
Yeast DNA Purification Kit (Epicenter, USA), but with some modifications to the recommended protocols. Three mL of each of the
liquid yeast cultures were pelleted by centrifugation at 20,800 × g for 5 min in 1.7 mL microcentrifuge tubes. 300 µl of yeast cell lysis
solution was added to each microcentrifuge tube along with 1 µl of 5 g L⁻¹ RNase A. The cells were resuspended by vortex mixing
each microcentrifuge tube for 10 s. Each tube was incubated at 65°C for 15 min and was then chilled on ice for 5 min. Next, 150 µl of
protein precipitation reagent was added to each tube and the tubes were vortexed for 10 s. The suspensions were then centrifuged
for 10 min at 20,800 × g to pellet the cellular debris. The supernatants were transferred to clean 1.7 mL microcentrifuge tubes. Next,
500 µl of isopropanol was added to each tube. Each tube was then inverted several times to precipitate the DNA, which was then
pelleted by centrifugation at 20,800 × g for 10 min. The supernatants were discarded and the pellets were washed with 500 µl of
ice-cold 70% ethanol and briefly centrifuged. The ethanol was removed by pipetting. The DNA pellets were centrifuged again for
10 min at 20,800 × g to remove any remaining ethanol. Each DNA pellet was then suspended in 35 µl of TE buffer and stored at
4°C. After isolation, the purified DNA was quantified using fluorimetric methods and diluted to the optimal concentration for library
construction.

Library Prep and Whole Genome Sequencing


For strains BE001-043, BI001-005, BR001-004, LA001, SA001-007, SP001-007, NA001-004 and WI001-018, paired-end sequencing
libraries (100 bp) with a mean insert size of 300 bp were prepared and run according to the manufacturer's instructions on an Illumina
HiSeq2000 at the EMBL GeneCore facility, Heidelberg (http://genecore3.genecore.embl.de/genecore3/). For the other strains,
libraries were prepared using the Nextera XT sample preparation kit. A total of 50 ng of yeast DNA was fragmented and tagged
with DNA adapters by the Nextera transposome, resulting in adaptor-ligated DNA fragments. The DNA was purified and PCR-amplified
to add the dual indexes as well as the common adapters required for cluster generation and sequencing. All samples were pooled
together and clustered on board a HiSeq 2500 instrument at Illumina (San Diego, USA). Samples were sequenced in both Rapid Run
Mode and High Output Mode using 2 × 100 bp paired-end reads.

De Novo Assembly
For each library, low-quality and ambiguous reads were trimmed using Trimmomatic (v0.30) (Bolger et al., 2014). After k-mer based
read correction with musket (Liu et al., 2013), reads were assembled using idba_ud (Peng et al., 2010). De novo assemblies were
evaluated by mapping back the reads and also by checking BLAST matches of assembled contigs to the SILVA database for
rDNA classification. Each de novo assembly was scaffolded against the S. cerevisiae S288c reference genome assembly (http://
downloads.yeastgenome.org/sequence/S288C_reference/genome_releases/S288C_reference_genome_R64-1-1_20110203.tgz).
The liftOver workflow (Kuhn et al., 2007) was used to determine the coordinates of contigs from each newly assembled strain relative
to the reference strain (http://genomewiki.ucsc.edu/index.php/Minimal_Steps_For_LiftOver). Scaffolded contigs mapped to each
reference strain chromosome were combined into a ‘‘pseudo-molecule,’’ with the placed contigs stitched together with gaps indi-
cated by ‘‘N.’’ Unplaced contigs (including alternative lower scoring matches) were kept. Unplaced contigs less than 300 nucleotides
were not included in the final assembly (Table S1).
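
The stitching and length filter described above can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the function names and the fixed gap length (`gap_len`) are assumptions made for illustration, since the text only states that gaps are indicated by "N" and that unplaced contigs shorter than 300 nucleotides were dropped.

```python
from typing import List, Tuple

MIN_UNPLACED_LEN = 300  # nucleotides; threshold stated in the text


def build_pseudomolecule(placed: List[Tuple[int, str]], gap_len: int = 100) -> str:
    """Join contigs placed on one reference chromosome, ordered by start.

    `placed` holds (reference_start, contig_sequence) pairs. `gap_len` is a
    hypothetical fixed gap size; the source only says gaps are written as "N".
    """
    ordered = [seq for _, seq in sorted(placed, key=lambda item: item[0])]
    return ("N" * gap_len).join(ordered)


def keep_unplaced(contigs: List[str], min_len: int = MIN_UNPLACED_LEN) -> List[str]:
    """Drop unplaced contigs shorter than `min_len` nucleotides."""
    return [seq for seq in contigs if len(seq) >= min_len]
```

For example, `build_pseudomolecule([(500, "GG"), (10, "AA")], gap_len=3)` orders the two contigs by reference position and joins them with three "N" characters.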

Annotation
The genome annotation of the S. cerevisiae S288c reference genome (nr. of genes = 6,692) was downloaded from the UCSC (version
Apr2011/sacCer3) Table Browser in GPR format. FASTA records were renamed to match the chromosome naming convention in the
GPR file. The liftOver workflow was used to create a coordinate conversion file (chain file) between the S. cerevisiae S288c genome
and each newly scaffolded assembly. Using the chain file, the coordinates of the S. cerevisiae S288c genes were ‘‘lifted’’ to each
new genome assembly. Lifted genes were considered valid if they did not contain internal stop codons. Independently, the gene
prediction tool AUGUSTUS v2.5 (Stanke and Morgenstern, 2005) was used to predict genes for each new strain using the provided
training set/model for S. cerevisiae S288c with the following parameters (--noInFrameStop=true --maxDNAPieceSize=1000000
--progress=false --uniqueGeneId=true --keep_viterbi=false). The annotated and predicted genes, using liftOver and AUGUSTUS
respectively, were combined with priority given to the liftOver annotation when the predictions overlapped (Table S1).
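
As an illustration of the two rules above (validity of lifted genes and liftOver priority on overlap), a minimal Python sketch follows. The `Gene` record, the function names, and the half-open coordinate convention are hypothetical; coding sequences are assumed to be supplied in coding orientation, and the standard nuclear stop codons are assumed.

```python
from typing import List, NamedTuple

STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard nuclear genetic code (assumed)


class Gene(NamedTuple):
    chrom: str
    start: int   # 0-based, inclusive (illustrative convention)
    end: int     # 0-based, exclusive
    source: str  # "liftOver" or "AUGUSTUS"


def has_internal_stop(cds: str) -> bool:
    """True if any codon other than the last complete one is a stop codon."""
    cds = cds.upper()
    codons = [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]
    return any(codon in STOP_CODONS for codon in codons[:-1])


def overlaps(a: Gene, b: Gene) -> bool:
    """True if two genes share at least one position on the same chromosome."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def merge_annotations(lifted: List[Gene], predicted: List[Gene]) -> List[Gene]:
    """Keep every lifted gene; add predicted genes that overlap none of them."""
    extra = [p for p in predicted if not any(overlaps(p, l) for l in lifted)]
    return sorted(lifted + extra)
```

The quadratic overlap check is kept for readability; with roughly 6,000 genes per genome an interval index would be the more usual choice.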

Core Genome Analysis and Identification of Single Copy Genes


Across the 157 annotated S. cerevisiae genomes, 986,179 genes were annotated and predicted in total, with an average of 6,281
genes per genome (min = 6,099 genes, max = 6,655 genes). CD-HIT (v4.6) was used to approximate a non-redundant set of putative
translations across the 157 genomes (parameters, -c 0.7 -M 3200 -T 0 -d 60) (Fu et al., 2012). Using a 70% amino acid identity
threshold, the collection of 986,179 translated genes was reduced to 8,410 clusters. A total of 3,519 clusters contained exactly
one gene from each of the 157 assembled genomes. The 3,519 clusters represent an approximation of the S. cerevisiae core genome
across the 157 genomes evaluated. A conservative set of single copy genes was identified across the 157 genome assemblies, the
Saccharomyces paradoxus genome and an additional set of 24 S. cerevisiae strains assembled in recent studies (Table S1) (Berg-
ström et al., 2014; Liti et al., 2009; Strope et al., 2015). First, one-to-one ortholog pairs were extracted from previously identified or-
thologs between S. paradoxus (NRRL-Y17217) and S. cerevisiae S288c (https://portals.broadinstitute.org/regev/orthogroups/
orthologs/Scer-Spar-orthologs.txt) (Wapinski et al., 2007). A total of 5,096 one-to-one ortholog pairs were identified, with a subset
of 5,084 annotations mapped to the S. cerevisiae S288c reference gene set used. The set of 5,084 genes was filtered to a smaller
subset of 2,417 genes based on i) inclusion in the set of 3,333 S. cerevisiae S288c ORFs that could be mapped by liftOver across
all 157 genomes, and ii) inclusion in the set of 3,519 clusters uniquely represented in each of the 157 strains based on CD-HIT results.
Lastly, the presence and the single-copy status of the 2,417 genes were investigated in the additional 24 previously sequenced
S. cerevisiae strains (http://www.moseslab.csb.utoronto.ca/sgrp/download.html and http://www.ncbi.nlm.nih.gov/genbank with
accession numbers as reported in Strope et al., 2015; last accessed June 2015), further reducing the selection to a conservative
set of 2,026 genes. The final set included 2,020 single-copy genes after removing six highly fragmented sequences.
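
The core-genome criterion above (clusters containing exactly one gene from each genome) can be sketched in Python as follows. The input structure is an assumption: a mapping from cluster ID to (genome, gene) members, which would have to be parsed from the CD-HIT cluster output beforehand; the names are illustrative.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple


def single_copy_core(clusters: Dict[str, List[Tuple[str, str]]],
                     genomes: Set[str]) -> List[str]:
    """Return IDs of clusters holding exactly one gene from every genome.

    `clusters` maps a cluster ID to its (genome_id, gene_id) members.
    """
    core = []
    for cluster_id, members in clusters.items():
        counts = Counter(genome for genome, _ in members)
        # Every genome present, and none represented more than once.
        if set(counts) == genomes and all(n == 1 for n in counts.values()):
            core.append(cluster_id)
    return sorted(core)
```

Applied to the 8,410 clusters and 157 genomes described above, a filter of this kind corresponds to the step that yields the 3,519 single-copy clusters.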

Reference-Based Alignments and Variant Calling


To identify single nucleotide polymorphisms (SNPs) and short insertions and deletions (InDels), reads were pre-processed by filtering
low quality and ambiguous reads, adapters and PhiX contaminations, using Trimmomatic (v0.30) (Bolger et al., 2014). Clean reads
were mapped to the S. cerevisiae reference genome S288c (R64-1-1, EF4-Ensembl Release 74) with the Burrows-Wheeler Aligner
(BWA, v0.6.1) using default parameters except for -q 10 (Li and Durbin, 2009). Non-primary alignments and non-properly paired
reads were filtered out and duplicate reads were marked using Picard Tools (v1.56) (http://picard.sourceforge.net). Before variant
calling, reads were locally realigned in order to eliminate false positives due to misalignment of reads, which was followed by a
base quality score recalibration step, using the Broad Institute Genome Analysis Toolkit (GATK v2.7.2) (McKenna et al., 2010).
SNP and InDel discovery and genotyping was performed across all 157 strains simultaneously, to minimize false positive calls,
with a minimum base quality score of 20, a standard minimum confidence threshold for calling of 50 and a standard emitted confi-
dence of 20. Sites with total quality by depth < 2.00 and mapping quality < 40, genotype quality < 30 and genotype depth < 5 were
filtered out using GATK Variant Filtration. For SNP calling, sites overlapping InDels, sites with more than 50% missing genotypes and
multiallelic sites were filtered out using VcfTools (v0.1.10;v0.1.14) (Danecek et al., 2011). The final set of SNPs included a total of
421,361 biallelic segregating sites accounting for a total of 10,576,934 SNPs across all strains. SnpEff (v3.3) (Cingolani et al.,
2012) was used to annotate and predict the effect of the variants.
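
The thresholds listed above can be summarized as simple predicates. The Python sketch below is illustrative only and does not call GATK or VcfTools; the function names are hypothetical, and it assumes that a site or genotype is rejected as soon as any one threshold is violated, which the text does not state explicitly.

```python
from typing import List, Optional


def site_passes(qd: float, mq: float) -> bool:
    """Site-level filter: quality by depth >= 2.00 and mapping quality >= 40."""
    return qd >= 2.0 and mq >= 40.0


def genotype_passes(gq: int, dp: int) -> bool:
    """Genotype-level filter: genotype quality >= 30 and genotype depth >= 5."""
    return gq >= 30 and dp >= 5


def keep_snp_site(alt_alleles: List[str],
                  genotypes: List[Optional[str]],
                  overlaps_indel: bool) -> bool:
    """SNP-set filter: biallelic, not overlapping an InDel, <= 50% missing.

    `genotypes` holds one call per strain, with None marking a missing call.
    """
    missing = sum(g is None for g in genotypes) / len(genotypes)
    return (not overlaps_indel) and len(alt_alleles) == 1 and missing <= 0.5
```

A site with exactly half of its genotypes missing is retained here, because the text removes only sites with more than 50% missing genotypes.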

Phylogenetic Analyses
Phylogenetic Tree for the Sequenced Collection - Figure S1A
Multiple sequence alignments (MSAs) for the 2,020 amino acid sequences identified above were generated using MAFFT (v7.187),
with default settings and 1,000 refinement iterations (Katoh and Standley, 2013). Codon alignments were obtained from MSAs of pre-
dicted amino acid sequences and the corresponding DNA sequences by the PAL2NAL program (v14) (Suyama et al., 2006). Quality
checks and format conversions were performed using trimAl (v1.2) (Capella-Gutiérrez et al., 2009). The full set of codon alignments
were concatenated into a supermatrix using FASconCAT (v1.0) (Kück and Meusemann, 2010). The resulting supermatrix included
158 taxa and 2,782,494 positions, 99.174% nucleotides, 0.826% gaps and 0% ambiguities. The matrix was partitioned based on
all 2,020 gene blocks and all three codon positions within each block, resulting in 6,060 distinct data partitions accounting for
144,171 distinct alignment patterns. Twenty completely random starting trees and 20 randomized stepwise-addition parsimony
starting trees were obtained using RAxML (v8.1.3) (Stamatakis, 2014). Robinson-Foulds (RF) distances were computed between
all trees in both the fully random and parsimony tree sets, to avoid systematic bias due to low diversity in starting trees. Because
the stepwise addition algorithm generated a set of starting trees with low diversity, all the subsequent analyses were conducted
with fully random starting trees. Twenty maximum-likelihood (ML) tree searches were performed on each of the 20 fully random start-
ing trees under the GTRGAMMA model (4 discrete rate categories) using ExaML (v3) and the rapid hill climbing algorithm (-f d) (Kozlov
et al., 2015). During the ML search, the alpha parameter of the model of rate heterogeneity and the rates of the GTR model of nucle-
otide substitutions were optimized independently for each partition. The branch lengths were optimized jointly across all partitions.
For each starting tree, the best tree was selected based on the highest log-likelihood score. Parameters and branch lengths were re-
optimized on the best 20 topologies with ExaML (-f E) using the median of the four categories for the discrete approximation of the
GAMMA model of rate heterogeneity (-a). The tree with the best overall log-likelihood score of all 20 tree inferences was considered
the final ML tree. Non-parametric bootstrap analysis was performed on the concatenated matrix using RAxML (v8.1.3). The a
posteriori bootstopping criterion (Pattengale et al., 2010) (MRE bootstrapping convergence criterion) was applied to define the number
of replicates. After every 50 replicates, the set of bootstrapped trees generated so far is repeatedly (100 permutations) split in two
equal subsets, and the Weighted Robinson-Foulds (WRF) distance is calculated between the majority-rule consensus trees of
both subsets (for each permutation). Low WRF distances (< 3%) for ≥ 99% of permutations were used to indicate bootstrapping
convergence. Convergence was reached after 250 replicates: average weighted Robinson-Foulds distance (WRF) = 1.86%,
percentage of permutations in which the WRF was ≤ 3.00 = 100%. The tree was visualized and rooted in FigTree (v1.4.2) using S. paradoxus
as the outgroup (http://tree.bio.ed.ac.uk/software/figtree/).
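
The convergence decision described above reduces to a threshold test on the per-permutation WRF distances. The following Python sketch shows only that final test, not the tree splitting or consensus building performed by RAxML; the function name and defaults are illustrative, and the cutoff is taken as inclusive (≤ 3%), following the reported statistic.

```python
from typing import Sequence


def bootstrap_converged(wrf_percent: Sequence[float],
                        max_wrf: float = 3.0,
                        min_fraction: float = 0.99) -> bool:
    """Decide convergence from one WRF distance (in %) per random split.

    Convergence is declared when at least `min_fraction` of the splits
    have a WRF distance no larger than `max_wrf`.
    """
    below = sum(d <= max_wrf for d in wrf_percent)
    return below / len(wrf_percent) >= min_fraction
```

With 100 permutations, as used here, the test passes when at most one permutation exceeds the cutoff.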
Phylogenetic Tree for the Extended Collection - Figure 1A
In order to compare our strain collection with previously sequenced strains, we included 24 additional isolates, previously described
in Liti et al. (2009) (re-sequenced in Bergström et al. (2014)) and Strope et al. (2015) (Table S1). Generation of MSAs and construction
of the concatenation matrix was performed as described earlier. The resulting supermatrix included 2,785,239 positions, 99.077%
nucleotides, 0.922% gaps and 0.001% ambiguities. The matrix was partitioned based on all 2,020 gene blocks and all three codon
positions within each block, resulting in 6,060 distinct data partitions, accounting for 163,920 distinct alignment patterns. The ML
searches and re-optimization were run on 30 fully random starting trees as described above, using RAxML (v8.1.3) and ExaML (v3).
Non-parametric bootstrap analysis was performed as described above. Convergence was reached after 250 replicates: average
weighted Robinson-Foulds (WRF) distance = 2.10%, percentage of permutations in which the WRF distance was ≤ 3.00% = 99%. The tree
was visualized and rooted in FigTree using S. paradoxus as the outgroup.
Multi-locus Phylogeny - Figure S1C
Nine partial genes previously used to genetically characterize 99 Chinese isolates of S. cerevisiae (Wang et al., 2012) were recovered
from 194 previously sequenced genomes (Table S2) and from the 157 isolates sequenced in this study, for a total of 450 strains. Each
gene was aligned with MAFFT (v7.187) and the final MSAs were concatenated with FASconCAT (v1.0). The concatenated alignment
was trimmed using trimAl (v1.2) with the automated1 option, optimized for ML tree reconstruction. The resulting supermatrix included
19,254 positions, 83.348% nucleotides and 16.652% gaps. The matrix was partitioned by gene block into nine distinct partitions
with joint branch length optimization. ML search on 30 fully random starting trees and non-parametric bootstrap analysis were
performed as described above. Convergence was reached after 550 replicates: average weighted Robinson-Foulds (WRF) distance
= 2.10%, percentage of permutations in which the WRF distance was ≤ 3.00% = 99%. The tree was visualized and rooted in FigTree
using S. paradoxus as the outgroup.

Population Structure and Diversity Analysis


The model-based Bayesian algorithm fastSTRUCTURE (v1.0) was used to detect and quantify the number of populations and the
degree of admixture in the 157 sequenced genomes (Raj et al., 2014). The set of 421,361 biallelic segregating sites identified above
was filtered further by removing SNPs with a minor allele frequency (MAF) < 0.05 and SNPs in linkage disequilibrium, using PLINK
(v1.07) (Purcell et al., 2007) with a window of 50 SNPs advanced by 5 SNPs at a time and an r² threshold of 0.5. fastSTRUCTURE
was run on a filtered set of 53,929 segregating sites, varying the number of ancestral populations (K) between 1 and 10 using the
simple prior implemented in fastSTRUCTURE. The number of iterations varied from 10 at K = 1 up to 80 at K = 10. K = 8 was found
to be optimal, i.e., scoring the highest marginal likelihood (log-marginal likelihood: 0.7298459788, total iterations: 60). Analysis of
estimated ancestry (Q) matrices and plotting were performed in R (v3.1) (R Development Core Team, 2011). The same set of 53,929
SNPs was used to perform a principal component analysis (PCA) as implemented in the SNPRelate package (v1.6.4) (Zheng et al.,
2012). Whole-genome levels of polymorphism were estimated using Variscan (v2.0.3) (Hutter et al., 2006), considering only sites
with valid high-quality alleles (> Q40) in at least 80% of strains within the group (defined with the NumNuc parameter adjusted for
each group together with CompleteDeletion = 0 and FixNum = 0).
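The minor allele frequency filter mentioned above can be illustrated with a short sketch. The study itself used PLINK; the code below is only an assumption-laden illustration in which genotypes are encoded as alternate-allele counts per strain (0, 1 or 2, i.e., diploid calls with no missing data), and the function name `maf_filter` is hypothetical.

```python
import numpy as np


def maf_filter(genotypes, min_maf=0.05, ploidy=2):
    """Flag sites whose minor allele frequency is at least min_maf.

    genotypes -- array of shape (n_strains, n_sites) holding
                 alternate-allele counts (assumed encoding)
    Returns (keep_mask, maf_per_site).
    """
    g = np.asarray(genotypes, dtype=float)
    alt_freq = g.sum(axis=0) / (ploidy * g.shape[0])
    # The minor allele is whichever allele is rarer at that site.
    maf = np.minimum(alt_freq, 1.0 - alt_freq)
    return maf >= min_maf, maf
```

Sites with `keep_mask` set to `False` correspond to the SNPs with MAF < 0.05 that were removed before the linkage-disequilibrium pruning step.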

Time Divergence Estimate


In order to estimate the timing of the split leading to the Beer 1 and Beer 2 clades, the mutation rate of yeasts in a beer fermentation
environment was calculated. This calculation was based on four assumptions:
US Beer Yeasts Originate from UK Beer Yeasts
Phylogenetic analysis of strains sequenced in this study and previously sequenced wild isolates reveals that US beer yeasts are
genetically closely related to European beer yeasts, but not to strains isolated from natural sources in the US. This strongly suggests
migration of strains from Europe to the US after colonization. Moreover, the average per-site nucleotide divergence (dXY) further in-
dicates that these US strains likely originate from the UK (Table S8). The per-site nucleotide diversity between subpopulations was
calculated on the set of 2,020 genes used for the inference of the strain phylogeny using the R package PopGenome (Pfeifer et al.,
2014).
The Split between US and UK Beer Yeasts Happened between 1607 and 1637
The first permanent English settlement in North America was established in 1607 in Jamestown, Virginia. In 1609, American “help
wanted” advertisements appeared in London seeking brewers for this colony, indicating the importance of beer brewing in early colonial
America. In 1637, the well-known colonist Captain Sedgwick founded the first authoritatively recorded brewery in the Massachusetts
Bay Colony. Therefore, it is likely that even in early colonial America, beer was produced using yeasts that were brought in by English
settlers.
Beer Yeasts Reproduce Clonally during Beer Brewing
During beer brewing, yeasts do not face long periods of nutrient starvation. Nutrient starvation is generally required to induce spor-
ulation and thus initiate sexual reproduction. Indeed, our analyses of the sexual lifestyle of beer yeasts revealed that most beer
strains, especially from Beer 1, lost the ability to produce viable spores (only observed in 6% of the strains, and only in response
to severe nutrient-poor conditions), further indicating that sexual reproduction is not favorable, and definitely not common, for beer
yeasts during brewing.
Beer Yeasts Undergo around 150 Doublings per Year
As beer fermentations take around one week, and yeasts on average undergo a bit under three doublings per fermentation, it can be
estimated that they undergo about 150 generations per year.
Using these parameters, an estimated mutation rate per site per generation was calculated from the formula k = 2μt, where k is the
average per-site nucleotide divergence between US and UK strains (1.97E-03, see Table S8), μ is the mutation rate per site per
generation and t is the time in generations, assuming 150 generations per year and divergence of US and UK strains between 1607 and
1637. This calculation yields a mutation rate of 1.61-1.73E-08/bp/generation, a value approximately 50× greater than the value
typically calculated in haploid laboratory strains in non-stressful conditions (Lynch et al., 2008). While this value might seem high, it is not
unreasonable for several reasons.
First, the mutation rate estimates obtained here are comparable to those measured in a directed evolution experiment performed in
6%–12% ethanol (Voordeckers et al., 2015). Although the conditions that were used in this directed evolution experiment differ from
real beer fermentations, they do show that ethanol has a drastic effect on the mutation rate. Moreover, a 6%–12% ethanol concen-
tration should be fairly comparable to the ethanol concentrations encountered in beer brewing over the past 400 years. It is a common
misunderstanding that the alcohol percentage of ale beer in the past centuries used to be much lower than it is now. Indeed, in the
time span that we considered in our calculations (the past 400 years), high-alcohol beers were being produced (Stan Hieronymus,
Anders Kissmeyer, Martyn Cornell and Ron Pattinson, personal communication). The generally low ethanol tolerance of beer yeasts
(as compared to wine and spirits yeasts, for example) and the presence of other stress factors during industrial fermentation (nutrient
starvation for example) might further contribute to an increased mutation rate.

Second, yeast only undergoes about three cell divisions during beer fermentations, which generally take place in the first 48 hr of
the fermentation. After this, the yeast cells are further exposed to high ethanol concentrations for several days, and it has been shown
for several microbes that in this state of quiescence, mutations can still accumulate (Loewe et al., 2003). Hence, mutations can also
occur in the second phase of fermentation, when the cells are not dividing, which implies that the mutation rate (per generation) in
industrial growing conditions should be higher than what is measured under conditions where the cells are dividing frequently, as is
usually the case in laboratory experiments.
Given the mutation rate estimate μ = 1.61-1.73E-08/bp/generation, an average of 150 generations/year and an average divergence
dXY = 2.14E-03 substitutions/site between the UK/US and Belgium/Germany subclades in the Beer 1 lineage, the last common
ancestor of the major Beer 1 subclades is calculated to have existed until dXY / (2μ × 150) = 443-412 years ago. A similar calculation
for Beer 2 (dXY = 1.79E-03 substitutions/site between the earliest diverging Beer 2 subclades) suggests that the last common
ancestor of Beer 2 existed until 371-345 years ago. Given the limited amount of information that could be used for dating, both
ages should be considered only rough approximations.
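The arithmetic behind these estimates can be retraced with a few lines of code. The sketch below applies the relation used above (divergence = 2 × mutation rate × number of generations) in both directions. The reference year 2016 (the publication year) is an assumption, since the text does not state the endpoint used to count years, and the function names are illustrative.

```python
def mutation_rate(k, years, generations_per_year=150):
    """Solve k = 2 * mu * t for mu, with t = years * generations_per_year."""
    return k / (2 * years * generations_per_year)


def divergence_years(d_xy, mu, generations_per_year=150):
    """Years since the last common ancestor: d_xy / (2 * mu * generations/year)."""
    return d_xy / (2 * mu * generations_per_year)


REFERENCE_YEAR = 2016   # assumption: year of publication
K_US_UK = 1.97e-3       # per-site divergence between US and UK strains

# Mutation rate bounds for a split in 1607 or in 1637,
# rounded as reported in the text (1.61E-08 and 1.73E-08).
mu_low = round(mutation_rate(K_US_UK, REFERENCE_YEAR - 1607), 10)
mu_high = round(mutation_rate(K_US_UK, REFERENCE_YEAR - 1637), 10)

for clade, d_xy in (("Beer 1", 2.14e-3), ("Beer 2", 1.79e-3)):
    oldest = divergence_years(d_xy, mu_low)
    youngest = divergence_years(d_xy, mu_high)
    print(f"{clade}: {oldest:.0f}-{youngest:.0f} years ago")
```

With the rounded rates, this reproduces the reported ranges of 443-412 years for Beer 1 and 371-345 years for Beer 2.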

Copy-Number Variation Analysis


Copy-number variations (CNVs) were identified on the reference-based alignments. Initial read depth profiles were obtained for each
isolate based on the average read depth calculated in non-overlapping windows of 1000bp. In 68 samples (BE044-BE102, LA002,
SP008-SP011, NA005-NA007, WI019), a deviation in read depth was detected: instead of fluctuating around a constant line, the read
depth profile showed a convex trend with high depth at the terminal regions of the chromosomes that gradually decreased toward the
center. These samples also showed high local variance. This bias in coverage is hereafter referred to as a “smiley pattern.” Since
conventional methods for CNV detection rely on read depth as a proxy for copy number, these methods were not applicable to the
“smiley pattern” strains. To tackle this problem, a custom-built algorithm was developed, dubbed Splint (available upon request),
which instead measures the size of discontinuities in read depth by using a discontinuous spline regression technique. In Splint,
the data were modeled as the product of the bias and the copy number of each region, plus error. Here, the bias was assumed to
be a continuous curve (expected depth as a function of chromosomal location), modeled as a smoothing spline. The copy number
on the other hand is a piecewise constant function, with discontinuities at breakpoints in between regions of constant copy number.
This was modeled as a sum of indicator functions, one for each region. After regression, the fitted value of the coefficient of each
indicator function is proportional to the copy number in the corresponding region. The regression method requires the locations
of the discontinuities as input values. Initially, these are located in a rough manner by comparing the 50kb regions to the left and right
of each 1000 bp window. If the difference between the median depth in the left and right regions is small, the frame is not likely to
contain a copy-number breakpoint; if the difference is large, it may contain a breakpoint. This measure is smoothed (by moving
average) and corrected for linear bias by subtracting a linear trendline. Peaks that exceed 2.5 times the sample-wide median, in ab-
solute value, are annotated as breakpoints. However, this method only gives rough coordinates of discontinuities, delimiting large
regions of constant copy number. After this rough estimation, an initial regression was run, and a hidden Markov model (HMM)
was used to find regions where the regressed values are significantly different from the data. The HMM accepts the deviation of the
estimated curve from the data as input signals (15% greater than, 15% less than, or approximately equal), and aggregates high
densities of deviant signals into output states (under-estimation, over-estimation or correct estimation of copy number; better results
were obtained when a special state was reserved for total deletions). The windows where the state changes are seen as likely
breakpoints. The regression and HMM were re-evaluated until no more deviating regions could be found. The regression coefficients of the
piecewise constant function in the final regression are proportional to the copy number in the corresponding regions, but the propor-
tionality constant depends on the shape and scale of the continuous (spline) factor in the regression, which is different for each chro-
mosome. The form of the spline is such that its value is always 1 in the left telomere for each chromosome. Using the regression co-
efficients of the piecewise function as a proportional proxy for the copy numbers implicitly assumes that the bias is the same for each
chromosome at the left telomere. We observe that the smiley pattern is generally similar on both sides of the chromosomes, so we
repeat the regression setting the spline value at the right telomere to 1, and instead use the means of the two sets of regression
coefficients to estimate the copy number. Splint was run using frames of 1000bp and 500bp. Shorter frames result in higher
resolution of the CNV calls at the cost of an increased rate of false positive calls. Because results that depend on the window size were
not deemed robust, only CNVs found in both 1000bp and 500bp window analyses were used in the final results. The functional
enrichment analysis of CNV-driven genes was carried out with GOrilla (Eden et al., 2009), using the complete set of S. cerevisiae
genes as the reference. Terms with a Benjamini-Hochberg false discovery rate (FDR)-adjusted q ≤ 0.05 were considered significant.
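The initial, rough breakpoint scan described above (flanking medians, smoothing, detrending and thresholding) can be sketched as follows. This is not the Splint code, which is available from the authors on request, but a minimal illustration under stated assumptions: depth is given per 1000 bp window, the flank of 50 windows corresponds to 50 kb, the "sample-wide median" is taken to be the median of the absolute detrended score, and each contiguous run above the threshold is reported by its midpoint. All names are hypothetical.

```python
import numpy as np


def rough_breakpoints(depth, flank=50, smooth=5, factor=2.5):
    """Return approximate breakpoint positions (window indices)."""
    depth = np.asarray(depth, dtype=float)
    n = depth.size
    score = np.zeros(n)
    # Difference between the median depth right and left of each window.
    for i in range(flank, n - flank):
        left = np.median(depth[i - flank:i])
        right = np.median(depth[i + 1:i + 1 + flank])
        score[i] = right - left
    # Moving-average smoothing.
    score = np.convolve(score, np.ones(smooth) / smooth, mode="same")
    # Remove linear bias by subtracting a fitted trendline.
    x = np.arange(n)
    slope, intercept = np.polyfit(x, score, 1)
    magnitude = np.abs(score - (slope * x + intercept))
    # Assumed threshold: 2.5 times the median absolute score.
    above = magnitude > factor * np.median(magnitude)
    breakpoints, start = [], None
    for i, flag in enumerate(above):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            breakpoints.append((start + i - 1) // 2)
            start = None
    if start is not None:
        breakpoints.append((start + n - 1) // 2)
    return breakpoints
```

Because the flanking medians change only once half of a flank has crossed a discontinuity, the score forms a plateau around each breakpoint rather than a sharp peak, which is consistent with the text's remark that this step yields only rough coordinates.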

Character Evolution Analysis


Ancestral character states for the production of the phenolic off-flavor 4-vinyl guaiacol (4-VG) were estimated based on the se-
quences of the two key genes PAD1 and FDC1. The protein-coding nucleotide sequences of PAD1 and FDC1 were retrieved
from the 157 de novo assemblies obtained in this study and from the outgroup species S. paradoxus. Because the annotation pro-
cedure described above excluded all genes including internal stop codons, a local BLAST database was set up for all the genomes
and BLASTN searches were performed (1E-04 E-value cut-off) using the PAD1 and FDC1 coding sequences from the S. cerevisiae
strain S288c reference genome (R64-2-1). We found that in one bioethanol strain (BI002), both genes resulted from an introgression
event from S. paradoxus. S. paradoxus introgression events involving PAD1 and FDC1 have been previously reported for the Brazilian
bioethanol strain BG1 (Dunn et al., 2012). In another bioethanol strain (BI006), PAD1 was shown to be a chimeric gene between the
S. cerevisiae and S. paradoxus allele. This latter strain was therefore excluded from the analysis. Considering the presence of stop
codons and frameshift mutations in some of the sequences, the sets of protein-coding sequences for either PAD1 or FDC1 were
aligned with MACSE (v1.0b), a tool that prevents the disruption of the underlying codon structure when aligning non-functional se-
quences (Ranwez et al., 2011). Each MSA was partitioned based on codon positions according to three schemes: i) CP111 – all codon
positions are combined; ii) CP112 – first and second codon positions are combined and third position is independent; iii) CP123 – all
codon positions are independent. The optimal partitioning scheme and the best-fit nucleotide substitution model for each partition
of the two MSAs were estimated using the software PartitionFinder (v1.1.1) (Lanfear et al., 2012). For all analyses, branch lengths
were linked between partitions and 24 substitution models were considered for each partition. Model and partitioning scheme
were selected based on the Bayesian information criterion (BIC). For PAD1, the best scheme was obtained with CP112, and K80
and HKY+G (gamma-distributed rate heterogeneity across sites using four rate categories) were the best-fit nucleotide substitution
models assigned for the two partitions respectively. For FDC1 the best scheme was obtained with CP123, and HKY was selected as
the best-fit nucleotide substitution model for each of the three partitions. All phylogenetic analyses and ancestral state
reconstructions were performed in BEAST (v1.8.2) (Drummond et al., 2012). The trait was treated as discrete and one of two character states
was assigned to each isolate based on GC analysis (see further): production of 4-VG, state = 1; no production of 4-VG, state = 0.
Monophyly was imposed on all the S. cerevisiae strains with the exception of strain BI002. The phylogenetic tree and ancestral state
for all internal nodes were simultaneously inferred for each gene, to account for phylogenetic uncertainty. The partitioned MSAs were
examined under three different clock models: i) a global molecular clock with fixed evolutionary rates; ii) an uncorrelated relaxed mo-
lecular clock with an underlying lognormal distribution (UCLD) on the evolutionary rates; iii) a random local molecular clock (RLC) that
allows different evolutionary rates in sub-regions of the phylogenetic tree (Drummond and Suchard, 2010). Additionally, each clock
model was tested in combination with an asymmetric versus a symmetric model of trait evolution (Lemey et al., 2009). A pure-birth
Yule speciation prior and a random starting tree were used for all the analyses. The suitable number of iterations to allow convergence
and proper mixing of the Markov chain Monte Carlo (MCMC) runs was determined for each MSA-molecular clock-trait model com-
bination using Tracer (v1.6) (Rambaut et al., 2014). Effective Sample Size (ESS) > 100 was reached for all parameters in each run.
Model selection was performed in BEAST by comparing marginal likelihoods estimated using path sampling and stepping-stone
sampling with a chain length of 2 million generations sampling every 200 steps (Baele and Lemey, 2013; Baele et al., 2012). Model
comparison showed strong support for the RLC model and asymmetry in the evolution of the trait for both PAD1 and FDC1. A second
independent run for both PAD1 and FDC1 was performed under the most supported scheme to ensure convergence to the same
topology. LogCombiner (Drummond et al., 2012) was used to remove burn-in trees (10%) and to resample to a frequency of
10,000. The final Maximum Clade Credibility (MCC) trees were obtained in TreeAnnotator (Drummond et al., 2012) on 19,998 trees
in total for each gene. MCC trees were visualized in FigTree (v1.4.2).
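As an illustration of the three codon-position partitioning schemes compared above (CP111, CP112 and CP123), the sketch below assigns alignment columns to partitions. It assumes an in-frame alignment whose first column is a first codon position and uses 0-based column indices; the names `SCHEMES` and `partition_columns` are hypothetical and not part of PartitionFinder or BEAST.

```python
# Each scheme maps codon positions 1, 2, 3 to a partition label.
SCHEMES = {
    "CP111": (0, 0, 0),  # all codon positions combined
    "CP112": (0, 0, 1),  # positions 1+2 combined, position 3 separate
    "CP123": (0, 1, 2),  # every codon position separate
}


def partition_columns(n_columns, scheme):
    """Group 0-based alignment columns into partitions for a scheme."""
    labels = SCHEMES[scheme]
    partitions = {}
    for column in range(n_columns):
        # column % 3 is the codon position (0, 1 or 2) of this column.
        partitions.setdefault(labels[column % 3], []).append(column)
    return [partitions[key] for key in sorted(partitions)]


for name in SCHEMES:
    print(name, partition_columns(6, name))
```

Under this encoding, the best scheme reported for PAD1 (CP112) yields two partitions and the best scheme for FDC1 (CP123) yields three, matching the number of substitution models assigned to each gene.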

Determination of Cell Ploidy


DNA content of the sequenced strains (a measure for the ploidy level) was determined by staining cells with propidium iodide (PI) and
analyzing 50,000 stained cells by flow cytometry on a BD Influx (BD Biosciences, USA). The fluorescent signal of previously
established haploid (BY4742), diploid (BY4743) and tetraploid (BR001) strains was used to generate a calibration curve. Highly flocculent
strains were excluded from the analysis (missing bars in Figure 2A).

Phenotypic Analysis
Flavor Production and Flocculation in Fermentation Conditions
To assess the production of aroma-active compounds, lab-scale fermentation experiments were performed. These fermentations
were performed in rich growth medium (YPGlu 10%; peptone 2% w v⁻¹, yeast extract 1% w v⁻¹, glucose 10% w v⁻¹). Yeast
precultures were shaken overnight at 30°C in test tubes containing 5mL of yeast extract (1% w v⁻¹), peptone (2% w v⁻¹) and glucose
(4% w v⁻¹) medium (YPGlu 4%). After 16 hr of growth, 0.5mL of the preculture was used to inoculate 50mL of YPGlu 4% medium in
250mL Erlenmeyer flasks, and this second preculture was shaken at 30°C for 16 hr. This second preculture was used for inoculation
of the fermentation medium (YPGlu 10%) at an initial optical density (at 600nm; OD600) of 0.5, roughly equivalent to 10⁷ cells mL⁻¹. The
fermentations, performed in 250mL Schott bottles with a water lock placed on each bottle, were incubated statically for 7 days at
20°C. Weight loss was measured daily to estimate fermentation progress. After 7 days, the fermentations were stopped, filtered
(0.15 mm paper filter) and samples for chromatographic analysis, density and ethanol measurements were taken.
Headspace gas chromatography coupled with flame ionization detection (HS-GC-FID) (Agilent Technologies, USA) was used for
the quantification of yeast aroma production. The GC was calibrated for 16 important aroma compounds, including esters (ethyl
acetate, isobutyl acetate, propyl acetate, isoamyl acetate, phenyl ethyl acetate, ethyl propionate, ethyl butyrate, ethyl hexanoate, ethyl
octanoate, ethyl decanoate), higher alcohols (isoamyl alcohol, isobutanol, butanol, phenyl ethanol), acetaldehyde and 4-VG. The GC
was equipped with a headspace autosampler (PAL system, CTC analytics, Switzerland) and contained a DB-WAXETER column
(length, 30 m; internal diameter, 0.25 mm; layer thickness, 0.5 µm, Agilent Technologies, USA) and N2 was used as the carrier
gas. Samples were heated for 25 min at 70°C in the autosampler. The injector block and FID temperatures were both kept constant
at 250°C. Samples of 5mL filtered fermentation medium were collected in 15 ml glass tubes containing 1.75 g of sodium chloride
each. These tubes were immediately closed and cooled to minimize evaporation of volatile compounds. The oven temperature
was held at 50°C for 5 min, after which it increased to 80°C at 4°C min⁻¹. Next, it increased to 200°C at 5°C min⁻¹ and was held at
200°C for 3 min. Results were analyzed with the Agilent Chemstation software (Agilent Technologies, USA).
Additionally, after fermentation, the flocculation character of each strain was scored visually from 1 (not flocculent) to 6 (extremely
flocculent, big flocs).
Ethanol Accumulation Capacity
To assess the maximal ethanol accumulation capacity of all strains, fermentation tests were performed in rich medium containing
35% w v⁻¹ glucose. Yeast precultures were shaken overnight at 30°C in test tubes containing 5mL YPGlu 4%. After 16 hr of growth,
0.5mL of the preculture was used to inoculate 50mL of YPGlu 4% medium in 250mL Erlenmeyer flasks, and this second preculture
was shaken at 30°C for 16 hr. This preculture was used for inoculation of the fermentation medium (peptone 2% w v⁻¹, yeast
extract 1% w v⁻¹, glucose 35% w v⁻¹; YPGlu 35%) at an initial OD600 of 0.5, roughly equivalent to 10⁷ cells mL⁻¹. The fermentations,
performed in 250 ml Schott bottles with a water lock placed on each bottle, were incubated statically for 14 days at 30°C. Weight
loss was measured daily to estimate fermentation progress. After 14 days, the fermentations were stopped, filtered (0.15 mm
paper filter) and samples for ethanol measurements [performed with the Alcolyzer Beer DMA 4500M (Anton Paar, Austria)] were
taken.
Screening for Environmental and Nutrient Stress Tolerance
All strains were tested in robot-assisted, high-throughput spotting assays in several conditions. All isolates were evaluated on
YPGlu 2% agar (Yeast Extract 1% w v⁻¹, Peptone 2% w v⁻¹, Glucose 2% w v⁻¹, agar 2% w v⁻¹) for (i) temperature tolerance
(4°C - 16°C - 30°C - 40°C), (ii) sugar- and/or osmotolerance using increasing concentrations of glucose and sorbitol (final
osmolyte concentration of 46 - 48 - 50% w v⁻¹), (iii) acid tolerance using increasing concentrations of acetic (50 - 75 - 100mM),
levulinic (25 - 50 - 75mM) and formic acid (50 - 75mM), (iv) sulphite tolerance using increasing concentrations of SO2
(1.50 - 2.25 - 3.00mM), (v) ethanol tolerance using increasing concentrations of ethanol (5 - 7 - 9 - 10 - 11 - 12 - 13% v v⁻¹), (vi)
actidione ( = cycloheximide) tolerance using increasing concentrations of actidione (0.2 - 0.4 mg L⁻¹), (vii) halotolerance using
increasing concentrations of NaCl (250 - 500 - 1000mM) and KCl (500 - 1000 - 1500mM) and (viii) metal tolerance using
0.075mM copper and increasing concentrations of cadmium (0.3 - 0.4 - 0.5mM). Stressor concentrations were selected
based on previous pilot experiments with a broader concentration range (data not shown). Additionally, temperature tolerance
(10°C - 39°C) on five different carbon sources (glucose, fructose, sucrose, ethanol and maltose) was assessed on YP agar
supplemented with 2% w v⁻¹ of one of the carbon sources (or, in case of ethanol, 2% v v⁻¹). For each of these experiments, growth on
YPGlu 2% agar at 20°C was used as a control condition.
For utilization of different carbon sources, experiments were performed on SC (Synthetic Complete) agar containing 2% w v⁻¹
galactose, glycerol, melibiose, sorbitol, ethanol, fructose, sucrose or maltose as sole carbon source. Additionally, maltose and
maltotriose were assessed in liquid medium (see further). Growth on SC 2% glucose at 20°C was used as a control.
Prior to the experiment, the 96-well microtiter plates containing the isolates (stored at −80°C) were thawed and spotted on YPGlu
2% agar and incubated at 30°C for 48 hr. Next, 96-well plates containing 150 µL of YPGlu 2% in each well were inoculated with the
isolates and incubated overnight at 30°C on a microtiter plate shaking platform (Heidolph Instruments, Germany) at 600rpm, allowing
the cells to reach stationary phase. Then, the OD600 of all wells was measured using a microtiter plate reader (Molecular Devices,
Sunnyvale, USA). Subsequently, the cell density was manually adjusted to OD600 ≈ 0.1 in a second 96-well microtiter plate using
sterile deionized water in order to standardize the starting cell density for all isolates. This plate was used as the source plate for spotting
the test media. In order to maximize throughput and reproducibility, a high-density array robot (Singer Instruments, UK) was used for
all spotting or replication steps. After spotting, all plates were sealed using plastic paraffin film and all plates (except for plates used in
the thermotolerance assays) were incubated at 20°C. After 4-14 days of incubation (depending on the experiment), all plates were
scanned using a high-definition scanner (Seiko Epson, Japan). Scanned images were processed using ImageJ combined with the
ScreenMill macro (Dittmar et al., 2010). Data were processed by calculating relative growth compared to the control condition,
and subsequent normalization by conversion to z-scores (Table S5). Heat maps were obtained using the Gene-E software
(http://www.broadinstitute.org/cancer/software/GENE-E/). Strains were hierarchically clustered based on phenotypic behavior using a
centered Pearson correlation metric and average linkage mapping.
Maltose and Maltotriose Fermentation Capacity in Liquid Medium
For maltose and maltotriose fermentation capacity, experiments were performed in 96-well plates with 150 µL SC liquid medium
containing 1% (w v⁻¹) of maltose or maltotriose, supplemented with 3 mg L⁻¹ antimycin to block respiration. Pregrowth was performed as
described above, and cells were inoculated at OD600 ≈ 0.1. OD600 was assessed after 4 days of growth at 20°C (shaken, 900rpm).
Growth in SC liquid medium containing 1% glucose at 20°C was used as a control.
4-VG Production
4-VG production was screened by measuring the amount of 4-VG produced by each strain in medium enriched in the 4-VG
precursor, ferulic acid. Strains were pregrown on YPGlu 2% agar, after which a single colony was picked and directly inoculated in a GC vial filled with
5mL of test medium (YPGlu 2% supplemented with 100mg L⁻¹ ferulic acid). Vials were capped (but not completely closed),
wrapped with plastic paraffin film and statically incubated at 30°C for 3 days. Next, 4-VG concentration was measured using GC-FID as
described earlier. Strains were scored as 4-VG+ if the concentration produced was significantly higher than that in the
non-inoculated fermentation medium.

Investigation of the Yeast’s Sexual Life Cycle
Sporulation was induced on minimal sporulation medium [1% (w v⁻¹) KAc, 0.05% (w v⁻¹) amino acids, 2% (w v⁻¹) agar] at 23°C after
pre-growth in YPGlu 2%. Tetrad dissection of four tetrads of each strain was carried out using a Singer micromanipulator (Singer
Instruments, UK), and mating-type determination of all germinated spores was carried out by mating-type PCR.

Development of Artificial Hybrids


Sporulation, tetrad dissection, and mating type characterizations were performed as described earlier. For SNP genotyping, PCR
primers were developed for the W497* stop-gained mutation in FDC1. To detect segregants carrying the stop-gained mutation, the
following primers were used: W497-FW (5′-TGCAGATCAGATGGCTTTTG-3′), W497-RV-STOP
(5′-GCAATTATTTATATCCGTACCTTTTT-3′). To detect the alternative allele (without the stop-gained mutation), the following primers were
used: W497-FW (5′-TGCAGATCAGATGGCTTTTG-3′), W497-RV-ALT (5′-GCAATTATTTATATCCGTACCTTTTC-3′).
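A quick comparison of the two reverse primers listed above shows where they differ. The sketch below uses the primer sequences exactly as given (joined across the line break in the printed text); the helper name `mismatch_positions` is hypothetical.

```python
# Primer sequences as listed in the text (5' to 3').
FORWARD = "TGCAGATCAGATGGCTTTTG"
REVERSE_STOP = "GCAATTATTTATATCCGTACCTTTTT"  # W497-RV-STOP
REVERSE_ALT = "GCAATTATTTATATCCGTACCTTTTC"   # W497-RV-ALT


def mismatch_positions(first, second):
    """Return the 0-based positions at which two equal-length primers differ."""
    if len(first) != len(second):
        raise ValueError("primers must have the same length")
    return [i for i, (a, b) in enumerate(zip(first, second)) if a != b]


positions = mismatch_positions(REVERSE_STOP, REVERSE_ALT)
print(len(REVERSE_STOP), positions)
```

The two reverse primers are identical except for their final (3′-terminal) base, so each primer pair is expected to amplify only the allele whose sequence matches that base.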
To hybridize haploid segregants, a direct mating approach as described in Steensels et al. (2014) was performed. The segregants
were first streaked to single colonies on a YPGlu 2% agar plate. One colony of each segregant was picked, and both were mixed on a
second YPGlu 2% agar plate. Ten microliters of distilled water were added to the mixed cell cultures to increase mixing efficiency.
The plates were dried and incubated at room temperature for 10-12 hr. A small fraction of the spot was picked with a toothpick and
streaked to single colonies on a fresh YPGlu 2% agar plate. After 48 hr of incubation, the diploid status of the resulting colonies was
verified by mating type PCR, and the presence of both genomes in the hybrid by interdelta analysis (Legras and Karst, 2003).

QUANTIFICATION AND STATISTICAL ANALYSIS

Standard statistical analyses were conducted in RStudio (v.0.98.994) (https://www.rstudio.com/) with custom scripts.
Preprocessing of the phenotypic data to produce Figure 3A and Table S5 consisted of a conversion to z-scores, calculated as
follows: z-score = (Xi - μ)/σ, where Xi is the data value for strain i, μ the mean of all strains and σ the standard deviation across all strains.
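The z-score conversion defined above can be written out in a few lines. The original analysis was done with custom R scripts; this sketch is only an illustration, and it assumes the sample standard deviation (n - 1 denominator), which the text does not specify.

```python
from statistics import mean, stdev


def z_scores(values):
    """Convert per-strain trait values to z-scores: (x - mean) / sd."""
    mu = mean(values)
    sigma = stdev(values)  # assumption: sample standard deviation
    return [(x - mu) / sigma for x in values]


# Example with three hypothetical relative growth values.
print(z_scores([1.0, 2.0, 3.0]))
```

Here `values` would hold the relative growth of every strain for one trait, so that each trait is centered on the mean of all strains and scaled by their spread.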
To facilitate direct comparison between different strains for a specific trait, a reference condition for each environmental stressor-
related trait was determined. This reference condition is defined as the most stringent condition (i.e., the condition with the highest
stressor concentration or most extreme temperature) where around 50% of the investigated strains still managed to reach a colony
area greater than 10% of their colony area in the control condition (YPGlu 2% agar, 20°C).
The phenotypic variability explained by the set of mutations identified in the MAL11 gene was computed as follows: first, all SNPs
and InDels were searched in pairwise comparison for high correlations (> 0.90). A Ward clustering was additionally performed to
retain or exclude mutations according to the clusters identified. Second, a linear regression analysis was performed on individual
mutations; the mutation was retained if significantly associated with the phenotype, with p < 0.001 considered as significant.
Last, REML (restricted maximum likelihood) analysis with completely random effects was carried out on the final set of SNPs in
SAS (v9.4).

DATA AND SOFTWARE AVAILABILITY

Data Resources
The accession numbers for the de novo assembly data reported in this paper are DDBJ/ENA/GenBank: BioProject ID
PRJNA323691, BioSample IDs SAMN05190362-SAMN05190518, and MBUB00000000-MCAB00000000 (Table S1).



Supplemental Figures

[Figure S1 image not reproduced. Panel A: circular phylogenetic tree of the sequenced strains, labeled by strain code and colored by origin (beer, wine, spirits, saké, wild, bio-ethanol, bread, laboratory, S. paradoxus) and lineage (Beer 1 with Britain, US, and Belgium/Germany subgroups; Mixed; Wine; Beer 2; West Africa; Asia); scale bar = 0.005. Panel B: estimated ancestry (Q, 0.00 to 1.00) per strain for K = 2, 4, 6, and 8. Panel C: phylogenetic tree of 450 isolates colored by origin (wild, clinical, fermented source, laboratory, NA, S. paradoxus); scale bar = 0.002.]

Figure S1. Phylogeny and Population Structure of the Industrial S. cerevisiae Strains, Related to Figures 1A–1C
(A) Phylogenetic tree of strains sequenced in this study, using Saccharomyces paradoxus as an outgroup. The tree was inferred from the concatenation matrix of
2,020 single copy orthologs. Black dots on nodes indicate bootstrap support values < 70%. Color codes indicate origin (names) and lineage (circular bands). The
basal splits of the five industrial lineages are indicated with a colored dot. Branch lengths reflect the average number of substitutions per site (scale bar = 0.005
substitutions per site).
(B) Population structure plot, with strain codes indicated.

(C) Maximum likelihood phylogenetic tree inferred from a concatenated alignment of nine partial genes of 450 S. cerevisiae isolates, using S. paradoxus as an
outgroup. Strains are colored according to origin. For a list of the included strains and corresponding references, see Table S2. Branch lengths reflect the average
number of substitutions per site (scale bar = 0.002 substitutions per site). Dots indicate nodes with bootstrap support values > 50%. Font color codes indicate
origin: wild (blue), clinical (orange), fermented source (green), laboratory (gray), not available (NA) (dark gray), S. paradoxus (black).

[Figure S2 image not reproduced. Per-strain bars of amplifications and deletions, ordered along the phylogenetic tree; axes: CNV (% of genome length) and CNV frequency.]

Figure S2. Copy-Number Variability across the Phylogenetic Tree, Related to Figure 2
Distribution of amplifications (red) and deletions (blue) in each strain, expressed in percentage of the genome affected (top) and in number of CNV events (bottom).
The phylogenetic tree is described in Figure S1A.

Figure S3. Trait Variation in Industrial S. cerevisiae Strains, Related to Figure 3
Graphic representation (violin plots) of trait variation within and between the subpopulations for different environmental stressors. Triangles represent median
values for each subpopulation. All values are depicted as relative growth compared to growth on medium without the stressor. Statistical analysis for each trait is
given in Table S6. au = arbitrary units.

[Figure S4 image not reproduced. Panel A: per-strain bars showing the amount of mutations (%) that are homozygous or heterozygous, grouped by subpopulation (Britain, US, Bel/Ger, Mixed, Wine, Beer 2, Asia, Mosaic). Panel B: box plots of the number of heterozygous sites per subpopulation, with letters marking significance groups at the 0.1 and 0.01 levels.]

Figure S4. Heterozygosity of Industrial Yeasts, Related to Figure 4
(A) Percentage of total SNPs identified as heterozygous or homozygous in each strain. Boxes depict subpopulations and bar colors indicate the percentage of
homozygous (red) and heterozygous (blue) SNPs.
(B) Box plots depicting the total number of heterozygous sites per subpopulation. The mean number of heterozygous sites for each comparison group is indicated
by a triangle and the median by a horizontal line. Groups sharing the same letter (top) are not significantly different at the 10% or 1% level.
