
The Caucasus as an Asymmetric Semipermeable Barrier to Ancient Human Migrations

Bayazit Yunusbaev^1,2,3*, Mait Metspalu^1*, Mari Järve^1*, Ildus Kutuev^1,2, Siiri Rootsi^1, Ene Metspalu^1, Doron M. Behar^1, Kärt Varendi^1, Hovhannes Sahakyan^1,4, Rita Khusainova^2,3, Levon Yeppiskoposyan^4, Elza K. Khusnutdinova^2,3, Peter A. Underhill^5, Toomas Kivisild^1,6, Richard Villems^1

^1 Department of Evolutionary Biology, University of Tartu and the Estonian Biocentre, Tartu, Estonia
^2 Institute of Biochemistry and Genetics, Ufa Research Center, Russian Academy of Sciences, Ufa, Russia
^3 Department of Genetics and Fundamental Medicine, Bashkir State University, Ufa, Russia
^4 Institute of Molecular Biology of the Academy of Sciences of Armenia, Yerevan, Armenia
^5 Department of Psychiatry and Behavioral Sciences, Stanford University School of Medicine, Stanford, USA
^6 Leverhulme Centre for Human Evolutionary Studies, University of Cambridge, Cambridge, UK
*These authors contributed equally to this work

One sentence summary
The first genetic survey of the Caucasus that combines genomic, Y chromosomal and
mitochondrial DNA variation reveals significant genetic uniformity among its ethnically and
linguistically diverse populations, as well as their largely Near/Middle Eastern origin with
visible, but minor gene flow from the East European Plain and elsewhere, thus suggesting
that movement of peoples between the Near/Middle East and Europe has occurred
predominantly along the western shores of the Black Sea.

The Caucasus, inhabited by modern humans since the Early Upper Paleolithic and
known for its linguistic diversity, is considered to be important for understanding
human dispersals and genetic diversity in Eurasia. We report a synthesis of autosomal,
Y chromosome and mtDNA variation in populations from all major subregions and
linguistic phyla of the area. Signals of regional patrilineal founder effects distinguish the
eastern from western North Caucasians. Variation within the autosomal genome is
consistent with predominantly Near/Middle Eastern origin of Caucasians, with minor
external impacts. Genetic discontinuity between North Caucasus and the East European
Plain contrasts with continuity through Anatolia and Balkans, suggesting major routes
of ancient gene flows and admixture.


The Caucasus is a mountainous region between the Black and Caspian Seas, divided by the
High Caucasus Mountain Range into the North and South Caucasus. The earliest evidence of
the dispersal of the genus Homo outside Africa comes from the Caucasus (1, 2); anatomically
modern humans appeared there at least 42,000 years ago (3). The linguistic diversity in the
Caucasus is remarkably high (4). The Abkhazian-Adyghe (Northwest (NW) Caucasian),
Nakh-Dagestanian (Northeast (NE) Caucasian), and Kartvelian (South Caucasian) language
families are indigenous; additionally, several languages of the Indo-European and Turkic
families are spoken in the Caucasus. To test whether the unique cultural and linguistic
diversity of the Caucasus is reflected in the genetic diversity of the extant populations in the
region, we have analyzed DNA samples from individuals of all the subregions and major
language groups of the Caucasus (Figure 1), for the first time using a synthesis of autosomal
[214 samples analyzed with Illumina 610 K SNP, together with relevant reference data (5, 6)
(Table S1)], Y chromosome [1952 samples (Table S2)] and mtDNA [2262 samples (Table
S3)] data. Our aim is to reveal how much the genetic structure of the Caucasian populations
reflects their linguistic heterogeneity and to understand the demographic history of the
Caucasus in relation to the genetic variation of the populations inhabiting the Near/Middle
East and the East European Plain.

Irrespective of their languages, in our heatmap plot of pairwise F_ST, the Caucasus populations
show the lowest genetic distances to one another (7), followed closely by their distance to the
populations of the Near/Middle East, Turks in particular (Figure 2A). Meanwhile, a sharp
increase of genetic distance progressing from the Caucasus to the East European Plain is evident
(Figure 2A). The Indo-European-speaking Armenians and Ossetians follow the same pattern
and do not show higher genetic similarity to Indo-European-speaking populations from
Europe or the Near/Middle East. Similarly, the populations of the Caucasus cluster together
between their neighbors according to geography on the two-dimensional plots of the principal
components (PC) of autosomal variation (Figure 2B; Figure S1). Importantly, in contrast to
the continuous transition from the Near/Middle East to the Caucasus, there is a noticeable gap
between the Caucasus and the East European Plain (Figure 2B). Geography rather than
language based clustering can also be observed in the PC analysis of Y chromosome data
(Figure S2). Indeed, we see a clustering of Georgians, who belong to a Kartvelian family of
languages, and Indo-European-speaking Armenians from the South Caucasus, together with
the peoples of the NW Caucasus, of which the Karachays and Balkars speak Turkic, the
Ossetians Indo-Iranian, and the rest Abkhazian-Adyghe languages (Figure S2). In a more
detailed phylogenetic analysis (supporting online text) we discuss and illustrate the unique
structure of Y chromosome haplogroup frequencies in NE Caucasian populations, where the
Nakh and Dagestanian language speakers differ considerably. At the level of resolution of
common mtDNA haplogroup frequencies, the differences among the Caucasian populations
do not allow for the separation of geographic subregions or language groups of the Caucasus
(Figure S3). Our results concur with those of a study that showed a considerable difference
between the Y chromosome versus mtDNA and autosomal genetic heritage among the
highland Dagestanian (NE Caucasian) populations (8).

We analyzed the autosomal data of the Caucasus and reference populations (5, 6) using a
structure-like (9) clustering approach (10). At K=7, the major ancestry component of the
Caucasus populations (shown in blue) has comparable presence in the Near/Middle East, but
is almost absent among the immediate northern neighbors of the Caucasus, namely the populations
of the East European Plain (Figure 3; Figure S4). Similarly to the blue ancestry component,
the green component is also ubiquitously present among the Caucasus populations,
irrespective of their linguistic affinities, but at much lower frequencies than blue. The green
component is most frequent in the Indus basin (Pakistan), extending to Central Asia and the
Near/Middle East, while fading away in Europe. Although structure-like clustering cannot be
readily interpreted in terms of human migrations, this pattern might suggest gene flow from
South Asia to the west and northwest. We cannot point to any well documented evidence of
such events during the historic period. However, the noticeable presence in NE Caucasus of
the Y chromosome haplogroup L3, so far found only in Pakistan (11) and North Iran (12),
testifies that at least some patrilineal gene flows of such extent have taken place. The ancestry
component depicted with orange, ubiquitous in East Asia, is almost absent in the South
Caucasus and remains at very low frequencies among the North Caucasian populations as
well, with an only marginally stronger presence among the Balkars. Due to their distinct
population history (see below), the Kuban Nogays, living on the northern slopes of the
Caucasus, form an exception. The only populations in the Caucasus that share the other major
Near/Middle Eastern, North and East African ancestry component, depicted in light blue
(Figure 3), are the Armenians and the Caucasian Jewish communities [see also (6)], indicating
the introgression of a genetic component novel to the region. As far as the Armenians are
concerned, this novel component is perhaps linked to the abrupt linguistic change in this part
of South Caucasus at the end of Urartian-dominated period some 2600 years BP (13). On the
other hand, since there is virtually nothing known about the genetic ancestry of the extinct
Hurrians-Urartians, the gene flow postulated here may equally well antedate the arrival of the
Armenian language to the area.

The mountain range that divides the Caucasus into the North and South Caucasus has
apparently not been an impenetrable barrier for gene flows. This is illustrated by the relative
similarity of the ancestry component patterns of the Caucasus populations on either side of the
High Caucasus Mountain Range (Figure 3). However, the dark blue ancestry component,
dominant among the Slavic-, Turkic-, and Finnic-speaking East European Plain populations,
reaches the North Caucasus (10-20%), but just barely (~5%) crosses the High Caucasus to the
three linguistically distinct South Caucasus populations: Armenians, Georgians and
Abkhazians (Figure 3). Remarkably, the decrease of Y chromosome haplogroup G and J1
frequencies towards the Eastern European populations inhabiting the area adjacent to the
North Caucasus, such as southern Russians and Ukrainians (14, 15), forms an abrupt
boundary (Figure S5; Figure S6), indicating that gene flow from the Caucasus in the northern
direction has been negligible. This sets a frontier of genetic discontinuity not at the mountain
range itself, but between the North Caucasus and East European Plain. However, similarly to
autosomal data, there is some evidence of opposite gene flow from East Europe into the
populations of NW Caucasus, as demonstrated by the presence of Y chromosome haplogroups
R1a1, including the European-specific lineage R1a1-M458 (16), and I2a (Table S2), and of
certain sublineages of the mtDNA haplogroup H (17). Equally importantly, these Y
chromosome data also show a genetic continuity between the NW Caucasus and South
Caucasus populations that sets them apart from the populations inhabiting the NE Caucasus.

Thus several lines of evidence suggest a genetic discontinuity between the Caucasus and the
East European Plain. To provide a quantitative estimate for this, we used multiple regression
on distance matrices (18, 19) to analyze our whole genome data in order to test whether
factors other than geographic distance can explain the observed variation in genetic distances
between populations. Considering pairwise F_ST distances between populations, we tested
independent variables which are expected to increase genetic differentiation and thus impact
the linear relationship between geographic and genetic distance, defining them as putative
barriers (7, Table S4, supporting online text). Three of the putative barriers we tested first
were geographic: the Caucasus barrier between North Caucasus and Eastern Europe, the
Balkans barrier between Anatolia and Europe, and the South Asian barrier between South
Asia and the Near/Middle East. The other barriers tested separated populations that are known
isolates/outliers due to religion, language, or different origin (the Jewish groups, Kuban
Nogays, French Basques, Druze, and Burusho) from their respective surrounding
populations. Geographic distance by itself explained only 43% (coefficient of determination
r^2 = 0.43) of the variation in genetic distances between populations, measured by F_ST. The
putative South Asian, Kuban Nogay, French Basque, Druze, Iraqi Jewish, Georgian Jewish,
and Burusho barriers did not prove to be statistically significant (Table S4). The largest
improvement in the fit of the observed F_ST distances between populations to the model was
achieved with the assumption of a Caucasus barrier: r^2 increased by 0.12 (Table S4). It
should be noted that the effect of the Caucasus barrier is observed between the populations of
the northern flank of the High Caucasus Mountain Range and the East European Plain, not at
the mountain range itself, since the populations of North and South Caucasus are autosomally
more similar to each other than either group is to Eastern European populations. A putative
post-LGM northward progression of people (recolonization?) could have been halted due to
preexisting inhabitation of the East European Plain, possibly at higher population densities
than in the mountains, or because of the Khvalynian transgression, a connection between the
Black and Caspian Seas dated 12,000-14,000 years BP (20, 21), which may have served as a
natural barrier (supporting online text).

The Kuban Nogays and the Kara Nogays (Figure 1) have a special status among the
Caucasian populations due to their recent, late 18th to early 19th century arrival from the
Pontocaspian steppes (22), evident from both Y chromosome and autosomal PC plots (Figure
S2; Figure 2B) as well as ADMIXTURE analysis (Figure 3) (only the Kuban Nogays were
included in the autosomal analyses). It has been shown that the Nogays possess 40% of East
Eurasian mtDNA lineages (23). Comparing the two subpopulations with respect to
proportions of typical western Eurasian (G, J, R1a1) and eastern Y chromosome lineages (C,
D, N, O), it becomes apparent that the Kara Nogays have more (~35%) typical eastern Y
chromosome lineages, while among the Kuban Nogays the percentage is around 17% (Table
S2). Perhaps more interestingly, we have found that both the Kuban Nogays and the Kara
Nogays preserve a certain combination of STR haplotypes in the Y chromosome haplogroup
C, the so-called Genghis Khan modal haplotype (24). Because the historic Nogay Khan, a
powerful late 13th century general of the Golden Horde, was indeed a great-grandson of
Genghis Khan, such a coincidence is intriguing, even though the Nogay Horde was
established much later and was, typically for that time, a confederation of people of different
ethnic origins (22). Although the introgression of the East Asian ancestry profile is well
visible, about 70-80% of the present-day genetic structure of the Kuban Nogays is closest to
their neighboring populations of NW Caucasus (Figure 3; Figure S4).

Overall, populations of the Caucasus exhibit a much more uniform genetic structure than
might be expected from the remarkable ethnic and linguistic diversity within the region. A
previous study based on 8 Alu insertion loci found the Caucasus populations to exhibit high
levels of between-population differentiation, with an average F_ST of 0.113, a value which is
almost as large as the F_ST of 0.157 for worldwide populations (25). Our results, based on
genome-wide data, reveal instead that the populations of the Caucasus region show
between-population differentiation (average F_ST = 0.004) that is slightly lower than that of the Near
East (0.006) and of Europe (0.006), and are thus more consistent with the results of Bulayeva
et al. (26). Whereas the Y chromosome haploid system reveals some cases of high
differentiation between regions and/or individual populations in the Caucasus, such as the
separation of the Dagestanian-speaking NE Caucasian populations from the rest of the region
(Figure S2), the autosomal variation in the Caucasus matches geography rather than linguistic
divisions. While the variation of all three genetic systems analyzed here (autosomal, Y
chromosome, and mtDNA) shows a genetic continuity between the Caucasus and the
Near/Middle East, there is a clear discontinuity, supported by principal component,
ADMIXTURE, and multiple regression analyses, between the North Caucasus and the East
European Plain. There is no evidence for any sizable genetic contribution of the Caucasus
genetic heritage to the gene pool of the populations living in East Europe at present. In
contrast, we were able to detect elements of gene flow from the East European Plain to the
North Caucasus, consistent with historical narratives as well as linguistic evidence. Another
direction of gene flow to the Caucasus populations, mainly affecting North Caucasians, is
from East Eurasia, but it may well be indirect, carried to the area by prehistoric and historic
nomadic people of the steppe belt. With the exception of the Kuban Nogays, its presence
among the Caucasians is modest (Figure 3). A final clear line of evidence of admixture/shared
ancestry, more prominent among the NE Caucasian populations, describes a component that
reaches near saturation in almost all Pakistani populations, but also extends to Iranian and
Central Asian Indo-European and Turkic speakers, and reaches East Europeans as well as
Semitic speakers of the Near East and the Arabian Peninsula (Figure 3), having been carried
to the Caucasian populations by route or routes unknown at present.

We conclude that irrespective of the Early Upper Paleolithic presence of anatomically modern
humans both south and north of the Caucasus, the combined high-resolution autosomal and
sex-specific genetic variation of the Caucasian populations testifies to their predominantly
southern, Near/Middle Eastern descent. Y chromosomal variants under strong founder events,
seen in particular among populations inhabiting the northern flank of the High Caucasus
Mountain Range, appear to never have expanded to the East European Plain, while the
nomadic people of the latter, once settled down predominantly on the northern slopes of the
Caucasus, have preserved, to different extent, some of their earlier genetic heritage. In sum,
though the Caucasus may well have served as a corridor for invasive expeditions in the past,
this has had only a minor influence on the largely sedentary core populations of the region,
characterized by much greater autosomal uniformity than might be expected from a
region of deep linguistic diversity.


Figure legends

Figure 1. Geographical map of the populations of the Caucasus included in this study.
The language family affiliation of each population is given. Adapted from Wikipedia.

Figure 2. Pairwise F_ST distances and principal component analysis of the Caucasus and
neighboring populations.
(A) Pairwise F_ST distances between populations, ranging from red (low) to blue (high), based
on autosomal data. The populations [data from this study and the literature (5, 6)] are divided
into regional groups. (B) Plot of the first and second components of the principal component
analysis (27) of the Caucasus and neighboring populations based on autosomal data, with the
clustering of populations approximating geography. The thick lines denote probable directions
of movements of people, interrupted between the North Caucasus and Eastern Europe.
Despite the genetic discontinuity between the Caucasus and the East European Plain, some
admixture (denoted by the thin two-pointed arrow) has occurred, apparent from the several
samples occupying the gap between the regions. For population abbreviations see Table S1.

Figure 3. Population structure inferred by ADMIXTURE analysis of the autosomal data
at K=7.
Each individual is represented by a vertical (100%) stacked column of ancestry probabilities
in the seven constructed ancestral populations. Populations introduced in this study and
analyzed together with data from Li et al. (5) and Behar et al. (6) are labeled in color.
Language families of the Caucasus populations are also denoted: AA, Abkhazian-Adyghe;
ND, Nakh-Dagestanian; KV, Kartvelian; IE, Indo-European; TU, Turkic.


References
1. D. Lordkipanidze et al., Postcranial evidence from early Homo from Dmanisi, Georgia.
Nature 449, 305-310 (2007).
2. D. E. Lieberman, Palaeoanthropology: homing in on early Homo. Nature 449, 291-292
(2007).
3. D. S. Adler et al., Dating the demise: neandertal extinction and the establishment of
modern humans in the southern Caucasus. J. Hum. Evol. 55, 817-833 (2008).
4. B. Comrie, Linguistic Diversity in the Caucasus. Annu. Rev. Anthropol. 37, 131-143
(2008).
5. J. Z. Li et al., Worldwide human relationships inferred from genome-wide patterns of
variation. Science 319, 1100-1104 (2008).
6. D. M. Behar et al., The genome-wide structure of the Jewish people. Nature 466, 238-242
(2010).
7. Materials and methods are available as supporting material on Science Online.
8. E. E. Marchani, W. S. Watkins, K. Bulayeva, H. C. Harpending, L. B. Jorde, Culture
creates genetic structure in the Caucasus: autosomal, mitochondrial, and Y-chromosomal
variation in Daghestan. BMC Genet. 9, 47 (2008).
9. J. K. Pritchard, M. Stephens, P. Donnelly, Inference of population structure using
multilocus genotype data. Genetics 155, 945-959 (2000).
10. D. H. Alexander, J. Novembre, K. Lange, Fast model-based estimation of ancestry in
unrelated individuals. Genome Res. 19, 1655-1664 (2009).
11. S. Sengupta et al., Polarity and temporality of high-resolution y-chromosome distributions
in India identify both indigenous and exogenous expansions and reveal minor genetic
influence of Central Asian pastoralists. Am. J. Hum. Genet. 78, 202-221 (2006).
12. M. Regueiro, A. M. Cadenas, T. Gayden, P. A. Underhill, R. J. Herrera, Iran:
tricontinental nexus for Y-chromosome driven migration. Hum. Hered. 61, 132-143
(2006).
13. G. Gnoli, in History of Humanity. Scientific and Cultural Development. From the Seventh
Century BC to the Seventh Century AD, J. Herrmann, E. Zürcher, Eds. (UNESCO, Paris,
1996), pp. 120-124.
14. O. Balanovsky et al., Two sources of the Russian patrilineal heritage in their Eurasian
context. Am. J. Hum. Genet. 82, 236-250 (2008).
15. V. N. Kharkov et al., Gene pool structure of Eastern Ukrainians as inferred from the Y-
chromosome haplogroups. Russ. J. Genet. 40, 326-331 (2004).
16. P. A. Underhill et al., Separating the post-Glacial coancestry of European and Asian Y
chromosomes within haplogroup R1a. Eur. J. Hum. Genet. 18, 479-484 (2010).
17. U. Roostalu et al., Origin and expansion of haplogroup H, the dominant human
mitochondrial DNA lineage in West Eurasia: the Near Eastern and Caucasian perspective.
Mol. Biol. Evol. 24, 436-448 (2007).
18. S. C. Goslee, D. L. Urban, The ecodist package for dissimilarity-based analysis of
ecological data. J. Stat. Soft. 22, 1-19 (2007).
19. P. Legendre, F.-J. Lapointe, P. Casgrain, Modeling Brain Evolution from Behavior: A
Permutational Regression Approach. Evolution 48, 1487-1499 (1994).
20. E. N. Badyukova, Age of Khvalynian transgressions in the Caspian Sea region.
Oceanology 47, 400-405 (2007).
21. A. A. Svitoch, Khvalynian transgression of the Caspian Sea was not a result of water
overflow from the Siberian Proglacial lakes, nor a prototype of the Noachian flood.
Quatern. Int. 197, 115-125 (2009).
22. M. Kolga, I. Tõnurist, L. Vaba, J. Viikberg, The Red Book of the Peoples of the Russian
Empire. (NGO Red Book, Tallinn, ed. 2, 2001).
23. M. A. Bermisheva et al., Phylogeographic analysis of mitochondrial DNA in the Nogays:
the high level of mixture of maternal lineages from Eastern and Western Eurasia. Mol.
Biol. (Mosk.) 38, 617-624 (2004).
24. T. Zerjal et al., The genetic legacy of the Mongols. Am. J. Hum. Genet. 72, 717-721
(2003).
25. I. Nasidze et al., Alu insertion polymorphisms and the genetic structure of human
populations from the Caucasus. Eur. J. Hum. Genet. 9, 267-272 (2001).
26. K. Bulayeva et al., Genetics and population history of Caucasus populations. Hum. Biol.
75, 837-853 (2003).
27. N. Patterson, A. L. Price, D. Reich, Population structure and eigenanalysis. PLoS Genet.
2, 2074-2093 (2006).
28. We thank the individuals who provided DNA samples for this study, and Mari Nelis,
Georgi Hudjashov and Viljo Soo for conducting the autosomal genotyping. R.V. and
D.M.B. thank the European Commission, Directorate-General for Research for FP7
Ecogene grant 205419. R.V. thanks the European Union Regional Development Fund for
support through the Centre of Excellence in Genomics, the Estonian Ministry of
Education and Research for the Basic Research grant SF 0270177As08, and the Swedish
Collegium for Advanced Studies for support during the initial stage of this work. S.R.
thanks the Estonian Science Foundation for grant 7445. E.M. thanks the Estonian Ministry
of Education and Research for the Basic Research grant SF 0270177Bs08 and the
Estonian Science Foundation for grant 7858. E.K.K. thanks the Russian Academy of
Sciences Program for Fundamental Research "Biodiversity and dynamics of gene pools",
the Ministry of Education and Science of the Russian Federation for state contracts P-325
and 02.740.11.07.01, and the Russian Foundation for Basic Research for grants 04-04-
48678- and 07-04-01016-. I.K. thanks the Russian Foundation for Basic Research for
grant 08-06-97011 and the Grant of the President of the Russian Federation of state
support for young Russian scientists MK-488.2006.4. P.A.U. thanks the Sorenson
Molecular Genealogy Foundation.

Supporting Online Material
www.sciencemag.org
Materials and Methods
Supporting online text
Figs. S1-S8
Tables S1-S5


[Figure 1 image: map of the Caucasus (Russia, Georgia, Armenia, Azerbaijan, Turkey, Iran; Black Sea and Caspian Sea) with 24 numbered sampling locations, colored by language group: Caucasian languages (Abkhaz-Adyghe, Kartvelian, Nakh-Dagestanian), Indo-European languages (Slavic, Iranian, Armenian), Turkic languages, and Other. Populations labeled: Abkhazians, Abazins, Adyghe, Cherkessians, Kabardins, Georgians, Andis, Avars, Dargins, Chamalals, Lezgins, Bagvalals, Tabasarans, Chechens, Ingush, Armenians, N. Ossetians, S. Ossetians, Russians, Azeris, Balkars, Karachays, Kumyks, Kara Nogays, Kuban Nogays, Mountain Jews. Sparsely populated or uninhabited areas are shown in white.]
[Figure 2 image. Panel A: heatmap of pairwise F_ST between populations, grouped as Near/Middle East, Caucasus, East Europe, and West & South Europe. Panel B: plot of PC1 versus PC2 of autosomal variation, with population clusters placed over a schematic geography (Europe, Near East, Central Asia, Indus basin; Mediterranean, Black, Caspian and Arabian Seas). Population abbreviations are listed in Table S1.]
[Figure 3 image: ADMIXTURE bar plot at K = 7. Populations are grouped as Africa, Near East & Europe, the Caucasus (South and North), Central Asia, South Asia, and East Asia; language families of the Caucasus populations are labeled IE, KV, AA, ND and TU.]
Supporting Online Material


Materials and Methods

DNA samples from all the subregions and major language groups of the Caucasus were
analyzed, using whole genome, Y chromosome and mtDNA markers. The geographic
locations and language affiliations of the Caucasus populations studied are presented in
Figure 1. DNA samples were obtained from unrelated male volunteers after getting informed
consent in accordance with the guidelines of the ethical committees of the institutions
involved. DNA was purified from blood by the phenol/chloroform extraction method. DNA
concentrations were determined by spectrometry (NanoDrop products, Wilmington, DE,
USA).

Autosomal analyses
214 samples from this study were genotyped with the Illumina 610 K SNP array and used for
whole genome analysis together with 906 samples from the literature (1, 2) (Table S1).

Genetic clustering analysis
We used ADMIXTURE (3) implementing a structure-like (4) model-based maximum
likelihood (ML) clustering algorithm to assess population structure. Given the genotype data
of a set of individuals, ADMIXTURE gives an ML estimate for allele frequencies in a
predefined number (K) of constructed ancestral populations and the probabilities for ancestry
in each such population for each individual.
We used PLINK software 1.05 (5) to filter the combined dataset to include only SNPs on the
22 autosomal chromosomes with minor allele frequency >1% and genotyping success >97%.
Because background linkage disequilibrium (LD) can affect both principal component and
structure-like analysis, we thinned the marker set by excluding SNPs in strong LD (pairwise
genotypic correlation r^2 > 0.4) in a window of 200 SNPs (sliding the window by 25 SNPs at a
time). The final data set consisted of 210,575 SNPs and 1119 individuals that were used in
subsequent analyses.
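The filtering and thinning steps described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not PLINK's implementation: genotypes are assumed to be coded as 0/1/2 allele counts with `None` for a missing call, the function names are hypothetical, and the pruning rule (drop the SNP with the lower minor allele frequency from each correlated pair) is an assumption made for the example.

```python
from statistics import mean

def maf(genotypes):
    """Minor allele frequency from 0/1/2 allele counts (None = missing call)."""
    called = [g for g in genotypes if g is not None]
    freq = sum(called) / (2 * len(called))
    return min(freq, 1 - freq)

def call_rate(genotypes):
    """Fraction of individuals with a non-missing genotype."""
    return sum(g is not None for g in genotypes) / len(genotypes)

def r_squared(a, b):
    """Squared Pearson correlation of allele counts over jointly called individuals."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if len(pairs) < 2:
        return 0.0
    mx = mean(x for x, _ in pairs)
    my = mean(y for _, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy * sxy / (sxx * syy)

def filter_and_prune(snps, min_maf=0.01, min_call=0.97,
                     window=200, step=25, max_r2=0.4):
    """snps: per-SNP genotype lists in chromosomal order; returns kept SNP indices."""
    # Step 1: keep SNPs with genotyping success > 97% and MAF > 1%.
    kept = [i for i, g in enumerate(snps)
            if call_rate(g) > min_call and maf(g) > min_maf]
    # Step 2: slide a window over the kept SNPs and thin correlated pairs.
    removed = set()
    start = 0
    while start < len(kept):
        win = [i for i in kept[start:start + window] if i not in removed]
        for pos, i in enumerate(win):
            if i in removed:
                continue
            for j in win[pos + 1:]:
                if j in removed or r_squared(snps[i], snps[j]) <= max_r2:
                    continue
                # Assumed rule: drop the SNP with the lower MAF (ties drop j).
                drop = i if maf(snps[i]) < maf(snps[j]) else j
                removed.add(drop)
                if drop == i:
                    break
        start += step
    return [i for i in kept if i not in removed]
```

The default arguments mirror the thresholds quoted in the text (MAF > 1%, genotyping success > 97%, r^2 > 0.4, window of 200 SNPs, step of 25 SNPs); real analyses of hundreds of thousands of SNPs would rely on PLINK itself rather than a pure-Python loop.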
We explored the population structure at K=3 to K=10. To monitor convergence between
individual runs at each K we ran ADMIXTURE one hundred times and examined the
loglikelihood (LL) values of a ten percent fraction of runs with the highest LL yield at each K.
We assumed that the global maximum had been reached if the maximum difference between
those LL values was negligible (less than one LL unit) (2, 6). Since this was the case for all
tested values of K, we deemed the results to be usable and plotted results from one run in the
ten percent fraction for each K (Figure S4). We also verified that all 10 of these runs
at each K did indeed produce very similar (indistinguishable) patterns of ancestry proportions.
ADMIXTURE provides a cross-validation option to help define the best K (7). In our
analysis, we obtained the lowest CV index at K=8, but this was not statistically significantly
different from K=7 or K=9. In applications for reconstructing population demographic
history, there are additional considerations in choosing the K to present (2). For example, if a
new ancestry component introduced at the next K value differentiates only a single
population, it is not very informative for the hierarchical comparison of populations. Taking
these considerations into account, we chose K=7 as the best level of representation for genetic
structure in our sample set (Figure 3). Nevertheless, we stress that, at every K, the ten
percent fraction of runs that yielded the highest LL scores, which lie on a plateau of
virtually identical values, likely converged at the respective global maxima and are thus usable
representations of genetic structure at different levels (Figure S4).
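The convergence criterion used above (the top ten percent of runs by log-likelihood must differ by less than one LL unit) can be written as a small check. The sketch below assumes the log-likelihoods of the individual runs have already been collected into a list; the function name and the example values are hypothetical and are not taken from the study.

```python
def top_fraction_converged(loglikelihoods, top_percent=10, tolerance=1.0):
    """Return (converged, spread, top_runs) for the best-scoring runs.

    converged is True when the highest and lowest log-likelihood within
    the top `top_percent` of runs differ by less than `tolerance` units.
    """
    ranked = sorted(loglikelihoods, reverse=True)
    n_top = max(1, len(ranked) * top_percent // 100)
    top_runs = ranked[:n_top]
    spread = top_runs[0] - top_runs[-1]
    return spread < tolerance, spread, top_runs

# Hypothetical log-likelihoods from 100 runs at one value of K:
# ten runs sit on a plateau, the remaining ninety are clearly worse.
example = [-1000.0 - 0.05 * i for i in range(10)]
example += [-1050.0 - i for i in range(90)]
converged, spread, top_runs = top_fraction_converged(example)
```

With one hundred runs per K, the top ten percent corresponds to the ten runs mentioned in the text.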

Principal component analysis and F_ST

Since the principal component analysis (PCA) method assumes that markers are unlinked, we
thinned our marker set for this analysis with PLINK software 1.05 (5) according to the same
parameters used for genetic clustering analysis in order to mitigate background LD. However,
LD pruning was carried out after the exclusion of the populations from Africa (except
Egyptians), East Asia and Siberia, and also Hazaras, Kurds, Uygurs and Altaians, resulting in
a final data set of 189,747 markers and 838 individuals. PCA was carried out in the smartpca
program (8) using outlier removal procedure (18 outliers were removed, leaving 820
individuals). Pairwise genetic differentiation indices (FST values) were also estimated using
smartpca software based on the same thinned marker set.

Geodesic distances between populations
For each pair of populations, we calculated geodesic distances in kilometers using the
distonearth.R R function (9). These distances correspond to the shortest path between two
points along the ellipsoid of the Earth at sea level. Geographic coordinates for populations in
the HGDP dataset are given as ranges of longitude and latitude in Cann et al. (10).
Geographic coordinates for HGDP populations were computed as a central point in a range of
longitude and latitude values. This yielded the same coordinates for the Brahui and Balochi
populations. In order to avoid zero distance between them, we changed the coordinates for the
Brahui people to the center of the Kalat district in Pakistan where the majority of that people
live.
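
As an illustration of the distance step, the following is a minimal sketch only. It is not the distonearth.R function used in the study: it substitutes the haversine formula on a spherical Earth for the ellipsoidal computation described above, and the function names and the mean Earth radius constant are assumptions introduced here.

```python
import math

# Assumption: mean Earth radius in km (spherical approximation, not an ellipsoid).
EARTH_RADIUS_KM = 6371.0088


def range_midpoint(low, high):
    """Central point of a coordinate range, as used for the HGDP populations."""
    return (low + high) / 2.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # Clamp against floating-point overshoot before taking the arcsine.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))
```

Two populations assigned identical midpoint coordinates would receive a distance of exactly zero under this function, which is the situation the Brahui adjustment above avoids.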

Multiple regression on distance matrices
We used multiple regression on distance matrices (MRM) (11, 12) to explore various
explanatory variables (geographic distance, barriers to gene flow) predicting the genetic distances
between populations. In this method a single dependent distance matrix Y is considered as a
function of multiple independent distance matrices Xi (independent variables), and the
statistical significance of regression coefficients for each independent variable Xi is tested
based on matrix permutations (13). The corresponding permutation procedure is described in
Legendre et al. (13) and implemented in the ecodist R package (14).
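
The idea of the method can be sketched as follows. This is a simplified illustration under stated assumptions, not the ecodist implementation used in the study: the function names are hypothetical, the test statistic is the absolute value of the raw regression coefficient, and significance is assessed by jointly permuting the rows and columns of the response matrix Y.

```python
import numpy as np


def _lower(m):
    """Unfold the strict lower triangle of a square matrix into a vector."""
    i, j = np.tril_indices(m.shape[0], k=-1)
    return m[i, j]


def _fit(y, xs):
    """Ordinary least squares of y on an intercept plus the predictors xs."""
    design = np.column_stack([np.ones(len(y))] + xs)
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return coef, r2


def mrm(Y, Xs, n_perm=999, seed=0):
    """Return (slopes, r2, permutation p-values) for Y ~ X1 + X2 + ..."""
    Y = np.asarray(Y, dtype=float)
    xs = [_lower(np.asarray(X, dtype=float)) for X in Xs]
    coef, r2 = _fit(_lower(Y), xs)
    rng = np.random.default_rng(seed)
    exceed = np.ones(len(xs))  # the observed data count as one permutation
    for _ in range(n_perm):
        p = rng.permutation(Y.shape[0])
        # Permute populations, i.e. rows and columns of Y together.
        coef_p, _ = _fit(_lower(Y[np.ix_(p, p)]), xs)
        exceed += np.abs(coef_p[1:]) >= np.abs(coef[1:])
    return coef[1:], r2, exceed / (n_perm + 1)
```

Permuting whole populations rather than individual matrix cells preserves the dependence among the pairwise distances that share a population, which is why ordinary regression p-values are not used here.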
In order to test whether factors other than geographic distance can explain the observed
variation in genetic distances between populations and whether the contribution of each
variable is statistically significant, we considered a matrix of pairwise FST distances between
populations as a response matrix and included explanatory variables into the regression model
either separately or in combinations. A variety of independent variables such as assortative
mating, genetic isolation, physical barriers, and admixture with a diverged population lead to
higher genetic differentiation between adjacent populations. These variables can thus be
formally defined as putative geographic barriers, since their impact on genetic differentiation
is similar. For each barrier, pairs of populations have zero distance if located on the same side
of the barrier, and a distance of 1 if located on the opposite sides of the barrier. The barriers
defined were:
X1: Putative Caucasus barrier: all East European populations located north of the Caucasus
mountain range have been assigned zero distances to each other and 1 to populations located
to the south of the mountain range. Note that there is no restriction to gene flow between West
Europe, the Balkans and the Near/Middle East.
X2: Putative Anatolian barrier separating West Europe and the Balkan region from the
Near/Middle East.
X3: Putative barrier separating South Asia from the Near/Middle East.
X4: Zero distances between the isolated Jewish communities and distances of 1 between the
Jews and the populations surrounding them.
X5: Putative barrier separating the Kuban Nogays and the French Basques from other
populations.
X6: Putative barrier separating the Druze from other populations.
X7: Putative barrier separating the Burusho from other populations.
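
The 0/1 coding described above can be sketched as below. This is a minimal illustration: the function name and the side labels in the usage note are hypothetical and do not reproduce the actual population assignments used for X1 to X7.

```python
import numpy as np


def barrier_matrix(sides):
    """Binary barrier distance matrix.

    sides holds one label per population. Two populations get distance 0
    if they carry the same label (same side of the barrier) and 1 otherwise.
    """
    s = np.asarray(sides)
    return (s[:, None] != s[None, :]).astype(float)
```

For example, barrier_matrix(["north", "north", "south"]) assigns zero distance to the first pair and a distance of 1 between each of them and the third population. A matrix built this way has the same shape as the FST and geographic distance matrices and can enter the regression as one of the Xi.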

Y chromosome analyses
A total of 1952 samples from 24 populations from the Caucasus were analyzed for Y
chromosome markers. The samples were typed for 51 Y chromosome SNP markers, 2 of
which [M81 and M128 (15)] were found to have the ancestral state in all of the samples. The
rest of the markers 12f2 (16), YAP (17), SRY10831.2 (18), Tat (19), 92R7 (20), M9, M12,
M20, M35, M40, M48, M52, M67, M70, M73, M76, M78, M89, M92, M123, M124,
M130=RPS4Y711, M170, M172, M173, M174, M175, M198, M201, M207, M214, M223
(15), M231, M242, M253, M267, M285, M343 (21), M269 (22), M317, M357, M410 (23),
M436, M438=P215 (24), M458 (25), P15 (26), P37.2, P43 (27), and P58=Page8 (28) each
had the derived state in at least one of the samples. The SNP markers were analyzed by
PCR/AFLP, PCR/RFLP or PCR/sequencing methods. The haplogroup designation in this
study follows the most recent YCC nomenclature presented in Karafet et al. (28). Genotyping
results are presented in Table S2. These data and data about neighboring populations from the
literature were used for principal component analysis using the POPSTR software
(http://harpending.humanevo.utah.edu/popstr/).
In addition, a subset of the samples was analyzed for 19 Y chromosome STR markers: the 17
markers of the Applied Biosystems AmpFlSTR Yfiler Kit, and two additional markers,
DYS388 (29, 30) and DYS461 (31). The phylogenetic network of the data obtained was
constructed with the program Network 4.5.0.0 (Fluxus-Engineering), using the median joining
algorithm. Spatial frequency maps were drawn with the program Surfer 8 (Golden Software
Inc., Cold Spring Harbor, NY, USA). Coalescence ages were calculated according to the
ASD0 method (32).
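
A minimal sketch of an ASD0-type age estimate is given below. It rests on several assumptions that are not stated in this text: the founder haplotype is taken as the per-locus median, the default effective mutation rate of 6.9e-4 per locus per generation and the 25-year generation time are illustrative defaults attributed here to reference 32, and the function name is hypothetical. It does not compute the standard errors reported with the estimates.

```python
import numpy as np


def asd0_age_years(haplotypes, founder=None, rate_per_gen=6.9e-4, gen_years=25.0):
    """Coalescence age from the average squared difference to a founder haplotype.

    haplotypes: rows are chromosomes, columns are STR loci (repeat counts).
    founder: assumed founder haplotype; defaults to the per-locus median.
    rate_per_gen, gen_years: assumed mutation rate and generation time.
    """
    h = np.asarray(haplotypes, dtype=float)
    if founder is None:
        founder = np.median(h, axis=0)
    # ASD0: squared repeat-count differences averaged over chromosomes and loci.
    asd0 = np.mean((h - np.asarray(founder, dtype=float)) ** 2)
    return asd0 / rate_per_gen * gen_years
```

Because the estimate is proportional to the observed variance around the founder, a sample of identical haplotypes yields an age of zero, and a few divergent chromosomes in a small sample can shift the estimate considerably.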

mtDNA analyses
The haplogroups of 2262 mtDNA samples from 24 Caucasus populations were determined by
typing HVSI and coding region markers according to the nomenclature presented in Richards
et al. (33) (Table S4). The data obtained were used to generate a PC plot of the Caucasus
populations in the context of populations from neighboring regions, using the POPSTR
software (http://harpending.humanevo.utah.edu/popstr/).


The paternal gene pool of the Caucasus

In some instances, the Y chromosome heritage of the Caucasus populations exhibits sharp
differences between populations and subregions, in contrast to that of the autosomes and
mtDNA. Therefore, based on Y chromosome haplogroup frequencies, we could divide the
region into Northeast (NE), Northwest (NW), and South (S) Caucasus; in addition to this, the
Kuban Nogays and the Kara Nogays were considered separately due to their different
population history reflected in their paternal heritage (Table S2). The most obvious example
of local differentiation is the NE Caucasus, where a unique structure of Y chromosome
haplogroup frequencies sets them apart from the other Caucasus populations, although no
such distinction is visible from the autosomal and mtDNA data. The speakers of Dagestanian
languages of the NE Caucasus cluster apart from the rest of the Caucasus peoples who are
genetically similar despite language family barriers, whereas the Nakh-speaking NE
Caucasian Chechens and Ingushes do not fall into either group, being set apart by a high
frequency of haplogroup J2a2* (Figure S2; Table S2). But this single example of concordance
of genetic and linguistic data in NE Caucasus is only observable in the case of the Y
chromosome; neither autosomal nor mtDNA data support the distinctness of the Dagestanian
language group populations (Figure 2B; Figure S1; Figure S3). NE Caucasian Y
chromosomes (n=640) mostly belong to haplogroups J1* (35.3%), J2a2* (27.8%), and
R1b1b2 (9.8%); while those from NW Caucasus (n=844) mostly belong to haplogroups G2a
(45.4%), R1a1* (14.9%), and J2a* (9.1%), and those from S Caucasus (n=305) also mostly to
haplogroups G2a (41.3%), J2a* (12.1%), and R1a1* (7.9%) (Table S2). In summary, the NE
Caucasian populations are distinguished mainly by a high frequency of haplogroup J1, more
specifically J1*, which is distinct from the J1e* lineages common in the Near/Middle East
(34).

The Y chromosome haplogroup J most probably arose in the Near East, where it has the
highest frequency and diversity, exhibiting a decreasing clinal pattern from the Near East to
Mediterranean Europe, North Africa, Caucasus, Iran, Central Asia, and India (21, 23, 35-38).
Haplogroup J with its two subclades J1 and J2 is a major component of the paternal gene pool
in the Caucasus.

Haplogroup J1 is divided into J1*(xP58) and J1e* by the downstream marker P58 (28, 34).
Haplogroup-specific Y-STR data were used to estimate the upper bounds of the coalescence
age for haplogroups. The J1* clade is naturally older, with the highest coalescence age of
22 800 ± 6 000 years in the Near/Middle East, which appears also to be the region where the
P58 mutation first arose, since the coalescence ages of the J1e* clade are highest in the
Near/Middle East, the estimate for the whole region being 12 000 ± 2 600 years (Table S5).
The J1e* subclade has spread to the Caucasus relatively recently, as evidenced by a low
coalescence age, the limited divergence of Caucasian J1e* STR haplotypes both from
Near/Middle Eastern STR haplotypes and from each other (Figure S7), and the generally low
frequency of this subhaplogroup in the Caucasus. The J1e* coalescence age of 5 600 ± 1 400
years, considerably lower than the estimates for other J subhaplogroups in the Caucasus,
possibly reflects a migration of Neolithic farmers from the Near East and is consistent with
the scenario proposed by Tofanelli et al. (39). The J1* clade is not uniform, but divided into
two clusters according to the number of repeats of the STR marker DYS388 (10-14 versus
15-17 repeats), clearly distinct on median joining networks (Figure S7). The "long" DYS388
group is the older of the two, apparently having arisen in the Near/Middle East, where it has a
coalescence age of 21 400 ± 8 900 years. According to coalescence age estimates, the
J1*(xP58) "long" DYS388 variant mostly present in Near/Middle East populations is the
oldest cluster, but it must be borne in mind that coalescence age cannot be reliably estimated
for the low frequency remnants of an ancient cluster.
The J1*(xP58) "short" DYS388 group appears to have originated in the Caucasus
(coalescence age 11 900 ± 1 900 years) and spread from there to several neighboring
populations, including Turks [coalescence age 9 100 ± 4 400 years, the unique "short"
13-repeat DYS388 allele samples reported in northern Turkey by Cinnioglu et al. (21)],
Assyrians (6 900 ± 2 800 years, all J1* samples have the 13-repeat DYS388 allele) (34), and
Kazakhs (1 800 ± 600 years, all J1* samples have the 13-repeat DYS388 allele). The Central
Asian Kazakh STR haplotypes are derived in just one step from typical Caucasian ones and
their star-like clustering shows a strong and relatively recent founder effect.

The lineages of the J1 sister haplogroup J2 have spread from their Near/Middle Eastern
homeland in both the eastern and western direction, and, with some exceptions, also to the
Caucasus in the north. The haplogroup J2 subclades J2a* and J2a2* have both spread
throughout the Caucasus, although remaining at lower frequencies in NE Caucasus, with the
exception of the Chechens and the Ingush, who have a high frequency of J2a2* (Figure S8).
On the other hand, the subclades J2a2a and J2b, the latter otherwise spread from southern
Europe to India (23, 37), are practically absent in the Caucasus. Our coalescence age
estimates, probably strongly influenced by the high frequency and diversity of subclade J2a2*
among the Chechens and the Ingush, set the time of the expansion of J2a2* to the Caucasus
into the distant past, at about 12,000-14,000 years (Table S5).

Both the J1*(xP58) "short" DYS388 group and J2a2* exhibit founder effects about 12,000
years ago in the NE Caucasus, possibly related to the Khvalynian transgression, a connection
between the Black and Caspian Seas dated approximately to this time (40, 41), which may
have served as a natural barrier to the further northward expansion of people for some time.

Similarly to haplogroup J1 lineages, haplogroup R1 displays notable divergence in frequency
in different subregions. Namely, the occurrence of haplogroup R1a1* is considerably greater
in NW Caucasus, while haplogroup R1b1b2 frequency is highest in NE Caucasus (Table S2).

Another Y chromosome lineage very common in the Caucasus besides J1 and J2 is
haplogroup G. This haplogroup is widespread in Near/Middle
Eastern populations (Figure S5), with an average frequency ranging from 5 to 15%, and also in
southern European countries (especially Italy and Greece), with a decreasing frequency
gradient towards the Balkans and rapidly disappearing in the northern direction. Although the
presence of this lineage in the Caucasus was shown by Nasidze et al. (42), it was intriguing to
find it to be present so widely and with such a high frequency. Similarly to Anatolia (21), the
absolute majority of haplogroup G samples in the Caucasus belong to subhaplogroup G2, with
only a few (mostly Armenian) samples falling into G1. The major subclade G2 is unevenly
distributed, being very frequent in NW Caucasus and S Caucasus (covering about 45% of the
paternal lineages in both regions, with the highest incidence detected in North Ossetians, at
70%), while present in NE Caucasus with an average frequency of only 5%, ranging from 19
to 0%.

Interestingly, the decrease of both haplogroup J1 and G frequencies (the two major lineages in
the Caucasus) towards the eastern European populations inhabiting the area adjacent to NW
Caucasus, such as southern Russians and Ukrainians (43, 44), is very rapid and the borderline
very sharp (Figure S5; Figure S6), indicating that gene flow from the Caucasus in the northern
direction has been negligible.


Multiple regression analysis

In this study we used a simple model which assumes that genetic distances between
populations are a function of geographic distances. Factors other than geographic distance
can, in principle, influence that relationship, causing closely located populations to be more
distant genetically because of a barrier to gene flow. Multiple regression analysis was used to
provide indirect but quantitative evidence on independently acting factors leading to
deviations from this simple model.

In addition to physical barriers like mountains and rivers, factors such as linguistic, ethnic and
religious restrictions should be considered as potential barriers to gene flow when analyzing
human populations. Deviations from panmixia, such as assortative mating between members
of specific social groups, or admixture with highly divergent immigrant populations, can also
lead to higher genetic distances between neighboring human populations. Because all these
forces lead to higher differentiation between neighboring populations, they can be considered
as barriers to gene flow. In terms of multiple regression analysis, not considering such factors
can lead to decreased explanatory power of the simpler model if that includes only geographic
distance as a predictor.

Multiple regression analysis on distance matrices provides a convenient framework to
consider multiple, independently acting factors. Whenever such factors can be formulated as
distance matrices, the multiple regression method (MRM) can be used to test their relevance
for the model and statistical significance. In the present study we built a series of distance
matrices (see Materials and Methods for details) formalizing our putative barriers and used
MRM to estimate their relative importance in predicting genetic distances and statistical
significance. For example, geographic distances alone explained roughly 43% of variation in
genetic distances (coefficient of determination) in our study, leaving much of the variation
unexplained (Table S3). When we considered additional factors, such as the Caucasus barrier,
we observed an increase in the coefficient of determination (r^2) up to 55%, and matrix
permutations indicated that the regression coefficient for this predictor is statistically
significant (p = 0.001). As a control of the method, we also intentionally considered a
presumably nonexistent factor, the South Asian barrier, and observed only a slight increase in
the coefficient of determination (r^2 increased by 0.1%), most likely by chance since the
regression coefficient for this factor is not statistically significant (p = 0.657).


FST analysis

The heatmap plot (Figure 2A) reveals three clusters of low genetic distance, encompassing
geographically nearby populations: the Near/Middle East, the Caucasus, and Europe.
However, Europe is not as homogeneous as the other clusters. French Basques and Volga
Basin Turkic-speaking Chuvashes are clearly more distant from their immediate neighbor
populations, while geographically somewhat southern populations, from the Atlantic to the
Black Sea (French, Italians, Bulgarians and Romanians), exhibit particularly low inter-
population genetic distances. As already mentioned, the smooth transition from the Caucasus
to Anatolia (Turks) and Iran, and from the latter to Syrians, Lebanese, Jordanians and further
southwards, contrasts with the sharp border between the Caucasians and the Slavic, Finno-
Ugric and Turkic speaking populations of the East European Plain. However, the same
heatmap plot reveals an array of low genetic distances starting from the populations of the
Levant and Syria, extending to the East European Plain. Importantly, this array proceeds
around the western flank of the Black Sea: from Turks to Bulgarians, to Romanians, to
Ukrainians, to Russians, ending with Mordvins, and not through the Caucasus. We suggest
that this similarity gradient reflects one or rather a combination of several ancient human
migrations, perhaps also including elements of the spread of the Neolithic.





Figure legends

Figure S1
Plots of the first and the second, the first and the third, and the second and the third
components of the principal component analysis of the Caucasus and neighboring populations
based on autosomal data [data from this study and the literature (1, 2)]. For population
abbreviations see Table S1.

Figure S2
Plot of the first and second principal components of Y chromosome variation in the Caucasus
and neighboring regions. Populations [data from this study and the literature (21, 23, 34, 36,
43, 45-49)] are colored according to their language group affiliations. Populations of the
Caucasus are denoted by diamond shapes, neighboring populations by circles. For population
abbreviations see Table S1.

Figure S3
Plot of the first and second principal components of mtDNA variation in the Caucasus and
neighboring regions. Populations [data from this study and the literature (50-61)] are colored
according to their language group affiliations. Populations of the Caucasus are denoted by
diamond shapes, neighboring populations by circles. For population abbreviations see Table
S1.

Figure S4
Population structure inferred by ADMIXTURE analysis of the autosomal data. Each
individual is represented by a vertical (100%) stacked column of genetic components
proportions shown in color from K=3 to K=10. Populations introduced for the first time in
this study and analyzed together with data from Li et al. (1) and Behar et al. (2) are labeled in
color.

Figure S5
Spatial frequency distribution of the Y chromosome haplogroup G. Frequency data from this
study and the literature (21, 23, 36, 43, 45-49, 62-76) were converted into a spatial frequency
map using the Surfer software (version 8, Golden Software Inc., Cold Spring Harbor, NY,
USA), applying the kriging algorithm.

Figure S6
Spatial frequency distribution of the Y chromosome haplogroup J1 and its subclades: (A) all of
J1, (B) J1e*, and (C) J1*(xP58). Frequency data from this study and the literature (21-23, 34, 36,
37, 43-46, 48, 68, 77-84) were converted into spatial frequency maps using the Surfer
software (version 8, Golden Software Inc., Cold Spring Harbor, NY, USA), applying the
kriging algorithm.

Figure S7
Median joining network of Y chromosome haplogroup J1 STR haplotypes from the Caucasus
and neighboring areas. Haplogroup J1 is divided into J1*(xP58) and J1e* by the downstream
marker P58 (34). The J1* clade is divided into the "long" and "short" DYS388 clusters
according to the number of repeats of the STR marker (15-17 versus 10-14 repeats, respectively).



Figure S8
Spatial frequency distribution of the Y chromosome haplogroup J2 and its subclades: (A) all of
J2, (B) J2a*, and (C) J2a2*. Frequency data from this study and the literature (21-23, 36, 37, 43,
45, 46, 48, 68, 78-80, 84) were converted into spatial frequency maps using the Surfer
software (version 8, Golden Software Inc., Cold Spring Harbor, NY, USA), applying the
kriging algorithm.

References
1. J. Z. Li et al., Worldwide human relationships inferred from genome-wide patterns of
variation. Science 319, 1100-1104 (2008).
2. D. M. Behar et al., The genome-wide structure of the Jewish people. Nature 466, 238-242
(2010).
3. D. H. Alexander, J. Novembre, K. Lange, Fast model-based estimation of ancestry in
unrelated individuals. Genome Res. 19, 1655-1664 (2009).
4. J. K. Pritchard, M. Stephens, P. Donnelly, Inference of population structure using
multilocus genotype data. Genetics 155, 945-959 (2000).
5. S. Purcell et al., PLINK: A tool set for whole-genome association and population-based
linkage analyses. Am. J. Hum. Genet. 81, 559-575 (2007).
6. M. Rasmussen et al., Ancient human genome sequence of an extinct Palaeo-Eskimo.
Nature 463, 757-762 (2010).
7. D. H. Alexander, J. Novembre, K. Lange, ADMIXTURE 1.04 Software Manual (2010;
http://www.genetics.ucla.edu/software/admixture/admixture-manual.pdf).
8. N. Patterson, A. L. Price, D. Reich, Population structure and eigenanalysis. PLoS Genet.
2, 2074-2093 (2006).
9. S. Banerjee, On geodetic distance computations in spatial modeling. Biometrics 61, 617-
625 (2005).
10. H. M. Cann et al., A human genome diversity cell line panel. Science 296, 261-262
(2002).
11. B. F. Manly, Randomization and regression methods for testing for associations with
geographical, environmental and biological distances between populations. Res. Popul.
Ecol. 28, 201-218 (1986).
12. P. E. Smouse, J. C. Long, R. R. Sokal, Multiple regression and correlation extensions of
the Mantel test of matrix correspondence. Syst. Zool. 35, 627-632 (1986).
13. P. Legendre, F.-J. Lapointe, P. Casgrain, Modeling Brain Evolution from Behavior: A
Permutational Regression Approach. Evolution 48, 1487-1499 (1994).
14. S. C. Goslee, D. L. Urban, The ecodist package for dissimilarity-based analysis of
ecological data. J. Stat. Soft. 22, 1-19 (2007).
15. P. A. Underhill et al., The phylogeography of Y chromosome binary haplotypes and the
origins of modern human populations. Ann. Hum. Genet. 65, 43-62 (2001).
16. M. Casanova et al., A human Y-linked DNA polymorphism and its potential for
estimating genetic and evolutionary distance. Science 230, 1403-1406 (1985).
17. M. F. Hammer, S. Horai, Y chromosomal DNA variation and the peopling of Japan. Am.
J. Hum. Genet. 56, 951-962 (1995).
18. L. S. Whitfield, J. E. Sulston, P. N. Goodfellow, Sequence variation of the human Y
chromosome. Nature 378, 379-380 (1995).
19. T. Zerjal et al., Genetic relationships of Asians and Northern Europeans, revealed by Y-
chromosomal DNA analysis. Am. J. Hum. Genet. 60, 1174-1183 (1997).
20. N. Mathias, M. Bayes, C. Tyler-Smith, Highly informative compound haplotypes for the
human Y chromosome. Hum. Mol. Genet. 3, 115-123 (1994).
21. C. Cinnioglu et al., Excavating Y-chromosome haplotype strata in Anatolia. Hum. Genet.
114, 127-148 (2004).
22. F. Cruciani et al., A back migration from Asia to sub-Saharan Africa is supported by high-
resolution analysis of human Y-chromosome haplotypes. Am. J. Hum. Genet. 70, 1197-
1214 (2002).
23. S. Sengupta et al., Polarity and temporality of high-resolution y-chromosome distributions
in India identify both indigenous and exogenous expansions and reveal minor genetic
influence of Central Asian pastoralists. Am. J. Hum. Genet. 78, 202-221 (2006).
24. P. A. Underhill et al., in Rethinking the Human Revolution: New Behavioural and
Biological Perspectives on the Origin and Dispersal of Modern Humans (Mcdonald
Institute Monographs) P. Mellars, K. Boyle, O. Bar-Yosef, C. Stringer, Eds. (McDonald
Institute for Archaeological Research, Cambridge, 2007), pp. 33-42.
25. P. A. Underhill et al., Separating the post-Glacial coancestry of European and Asian Y
chromosomes within haplogroup R1a. Eur. J. Hum. Genet. 18, 479-484 (2010).
26. M. F. Hammer et al., Jewish and Middle Eastern non-Jewish populations share a common
pool of Y-chromosome biallelic haplotypes. Proc. Natl. Acad. Sci. U. S. A. 97, 6769-6774
(2000).
27. N. Ellis et al., A nomenclature system for the tree of human Y-chromosomal binary
haplogroups. Genome Res. 12, 339-348 (2002).
28. T. M. Karafet et al., New binary polymorphisms reshape and increase resolution of the
human Y chromosomal haplogroup tree. Genome Res. 18, 830-838 (2008).
29. P. de Knijff et al., Chromosome Y microsatellites: population genetic and evolutionary
aspects. Int. J. Legal Med. 110, 134-149 (1997).
30. M. Kayser et al., Evaluation of Y-chromosomal STRs: a multicenter study. Int. J. Legal
Med. 110, 125-133 (1997).
31. P. S. White, O. L. Tatum, L. L. Deaven, J. L. Longmire, New, male-specific microsatellite
markers from the human Y chromosome. Genomics 57, 433-437 (1999).
32. L. A. Zhivotovsky et al., The effective mutation rate at Y chromosome short tandem
repeats, with application to human population-divergence time. Am. J. Hum. Genet. 74,
50-61 (2004).
33. M. B. Richards, V. A. Macaulay, H. J. Bandelt, B. C. Sykes, Phylogeography of
mitochondrial DNA in western Europe. Ann. Hum. Genet. 62, 241-260 (1998).
34. J. Chiaroni et al., The emergence of Y-chromosome haplogroup J1e among Arabic-
speaking populations. Eur. J. Hum. Genet. 18, 348-353 (2010).
35. A. Nebel et al., The Y chromosome pool of Jews as part of the genetic landscape of the
Middle East. Am. J. Hum. Genet. 69, 1095-1112 (2001).
36. M. Regueiro, A. M. Cadenas, T. Gayden, P. A. Underhill, R. J. Herrera, Iran:
tricontinental nexus for Y-chromosome driven migration. Hum. Hered. 61, 132-143
(2006).
37. O. Semino et al., Origin, diffusion, and differentiation of Y-chromosome haplogroups E
and J: inferences on the neolithization of Europe and later migratory events in the
Mediterranean area. Am. J. Hum. Genet. 74, 1023-1034 (2004).
38. R. S. Wells et al., The Eurasian heartland: a continental perspective on Y-chromosome
diversity. Proc. Natl. Acad. Sci. U. S. A. 98, 10244-10249 (2001).
39. S. Tofanelli et al., J1-M267 Y lineage marks climate-driven pre-historical human
displacements. Eur. J. Hum. Genet. 17, 1520-1524 (2009).
40. E. N. Badyukova, Age of Khvalynian transgressions in the Caspian Sea region.
Oceanology 47, 400-405 (2007).
41. A. A. Svitoch, Khvalynian transgression of the Caspian Sea was not a result of water
overflow from the Siberian Proglacial lakes, nor a prototype of the Noachian flood.
Quatern. Int. 197, 115-125 (2009).
42. I. Nasidze et al., Mitochondrial DNA and Y-chromosome variation in the Caucasus. Ann.
Hum. Genet. 68, 205-221 (2004).
43. O. Balanovsky et al., Two sources of the Russian patrilineal heritage in their Eurasian
context. Am. J. Hum. Genet. 82, 236-250 (2008).
44. V. N. Kharkov et al., Gene pool structure of Eastern Ukrainians as inferred from the Y-
chromosome haplogroups. Russ. J. Genet. 40, 326-331 (2004).
45. A. M. Cadenas, L. A. Zhivotovsky, L. L. Cavalli-Sforza, P. A. Underhill, R. J. Herrera, Y-
chromosome diversity characterizes the Gulf of Oman. Eur. J. Hum. Genet. 16, 374-386
(2008).
46. C. Flores et al., Isolates in a corridor of migrations: a high-resolution analysis of Y-
chromosome variation in Jordan. J. Hum. Genet. 50, 435-441 (2005).
47. J. R. Luis et al., The Levant versus the Horn of Africa: Evidence for bidirectional
corridors of human migrations. Am. J. Hum. Genet. 74, 532-544 (2004).
48. M. Pericic, L. B. Lauc, I. M. Klaric, B. Janicijevic, P. Rudan, Review of Croatian genetic
heritage as revealed by mitochondrial DNA and Y chromosomal lineages. Croat. Med. J.
46, 502-513 (2005).
49. P. A. Zalloua et al., Y-chromosomal diversity in Lebanon is structured by recent historical
events. Am. J. Hum. Genet. 82, 873-882 (2008).
50. N. Al-Zahery et al., Y-chromosome and mtDNA polymorphisms in Iraq, a crossroad of
the early human dispersal and of post-Neolithic migrations. Mol. Phylogenet. Evol. 28,
458-472 (2003).
51. D. M. Behar et al., Counting the founders: the matrilineal genetic ancestry of the Jewish
Diaspora. PLoS One 3, e2062 (2008).
52. M. Bermisheva, K. Tambets, R. Villems, E. Khusnutdinova, Diversity of mitochondrial
DNA haplotypes in ethnic populations of the Volga-Ural region of Russia. Mol. Biol.
(Mosk.) 36, 990-1001 (2002).
53. S. Cvjetan et al., Frequencies of mtDNA haplogroups in southeastern Europe--Croatians,
Bosnians and Herzegovinians, Serbians, Macedonians and Macedonian Romani. Coll.
Antropol. 28, 193-198 (2004).
54. T. Kivisild et al., Ethiopian mitochondrial DNA heritage: tracking gene flow across and
around the gate of tears. Am. J. Hum. Genet. 75, 752-770 (2004).
55. V. Macaulay et al., The emerging tree of West Eurasian mtDNAs: a synthesis of control-
region sequences and RFLPs. Am. J. Hum. Genet. 64, 232-249 (1999).
56. B. A. Malyarchuk et al., Mitochondrial DNA variability in Poles and Russians. Ann. Hum.
Genet. 66, 261-283 (2002).
57. M. Metspalu et al., Most of the extant mtDNA boundaries in South and Southwest Asia
were likely shaped during the initial settlement of Eurasia by anatomically modern
humans. BMC Genet. 5, (2004).
58. L. Quintana-Murci et al., Where west meets east: The complex mtDNA landscape of the
southwest and Central Asian corridor. Am. J. Hum. Genet. 74, 827-845 (2004).
59. M. Richards et al., Tracing European founder lineages in the Near Eastern mtDNA pool.
Am. J. Hum. Genet. 67, 1251-1276 (2000).
60. D. J. Rowold, J. R. Luis, M. C. Terreros, R. J. Herrera, Mitochondrial DNA geneflow
indicates preferred usage of the Levant Corridor over the Horn of Africa passageway. J.
Hum. Genet. 52, 436-447 (2007).
61. K. Tambets et al., in Archaeogenetics: DNA and the population prehistory in Europe, C.
Renfrew, K. Boyle, Eds. (Cambridge Univ. Press, Cambridge, 2000), pp. 219-235.
62. D. M. Behar et al., Contrasting patterns of Y chromosome variation in Ashkenazi Jewish
and host non-Jewish European populations. Hum. Genet. 114, 354-365 (2004).
63. E. Bosch et al., High-resolution analysis of human Y-chromosome variation shows a
sharp discontinuity and limited gene flow between northwestern Africa and the Iberian
Peninsula. Am. J. Hum. Genet. 68, 1019-1029 (2001).
64. C. Di Gaetano et al., Differential Greek and northern African migrations to Sicily are
supported by genetic evidence from the Y chromosome. Eur. J. Hum. Genet. 17, 91-99
(2009).
65. F. Di Giacomo et al., Clinal patterns of human Y chromosomal diversity in continental
Italy and Greece are dominated by drift and founder effects. Mol. Phylogenet. Evol. 28,
387-395 (2003).
66. R. Goncalves et al., Y-chromosome lineages from Portugal, Madeira and Acores record
elements of Sephardim and Berber ancestry. Ann. Hum. Genet. 69, 443-454 (2005).
67. M. F. Hammer et al., Dual origins of the Japanese: common ground for hunter-gatherer
and farmer Y chromosomes. J. Hum. Genet. 51, 47-58 (2006).
68. T. M. Karafet et al., High levels of Y-chromosome differentiation among native Siberian
populations and the genetic signature of a boreal hunter-gatherer way of life. Hum. Biol.
74, 761-789 (2002).
69. A. O. Karlsson, T. Wallerstrom, A. Gotherstrom, G. Holmlund, Y-chromosome diversity
in Sweden - A long-time perspective. Eur. J. Hum. Genet. 14, 963-970 (2006).
70. R. J. King et al., Differential Y-chromosome Anatolian influences on the Greek and
Cretan Neolithic. Ann. Hum. Genet. 72, 205-214 (2008).
71. I. Nasidze et al., Genetic evidence concerning the origins of South and North Ossetians.
Ann. Hum. Genet. 68, 588-599 (2004).
72. I. Nasidze, T. Sarkisian, A. Kerimov, M. Stoneking, Testing hypotheses of language
replacement in the Caucasus: evidence from the Y-chromosome. Hum. Genet. 112, 255-
261 (2003).
73. V. N. Pimenoff et al., Northwest Siberian Khanty and Mansi in the junction of West and
East Eurasian gene pools as revealed by uniparental markers. Eur. J. Hum. Genet. 16,
1254-1264 (2008).
74. Z. H. Rosser et al., Y-chromosomal diversity in Europe is clinal and influenced primarily
by geography, rather than by language. Am. J. Hum. Genet. 67, 1526-1543 (2000).
75. O. Semino et al., The genetic legacy of Paleolithic Homo sapiens sapiens in extant
Europeans: a Y chromosome perspective. Science 290, 1155-1159 (2000).
76. O. Semino, A. S. Santachiara-Benerecetti, F. Falaschi, L. L. Cavalli-Sforza, P. A.
Underhill, Ethiopians and Khoisan share the deepest clades of the human Y-chromosome
phylogeny. Am. J. Hum. Genet. 70, 265-268 (2002).
77. M. C. Bortolini et al., Y-chromosome evidence for differing ancient demographic
histories in the Americas. Am. J. Hum. Genet. 73, 524-539 (2003).
78. C. Capelli et al., A Y chromosome census of the British Isles. Curr. Biol. 13, 979-984
(2003).
79. M. F. Hammer et al., Extended Y chromosome haplotypes resolve multiple and unique
lineages of the Jewish priesthood. Hum. Genet. 126, 707-717 (2009).
80. T. Lappalainen et al., Regional differences among the Finns: A Y-chromosomal
perspective. Gene 376, 207-215 (2006).
81. M. Raitio et al., Y-chromosomal SNPs in Finno-Ugric-speaking populations analyzed by
minisequencing on microarrays. Genome Res. 11, 471-482 (2001).
82. S. Rootsi et al., A counter-clockwise northern route of the Y-chromosome haplogroup N
from Southeast Asia towards Europe. Eur. J. Hum. Genet. 15, 204-211 (2007).
83. A. Rosa, C. Ornelas, M. A. Jobling, A. Brehm, R. Villems, Y-chromosomal diversity in
the population of Guinea-Bissau: a multiethnic perspective. BMC Evol. Biol. 7, (2007).
84. P. A. Underhill et al., Y chromosome sequence variation and the history of human
populations. Nature Genet. 26, 358-361 (2000).


Table S1. Samples used for whole genome analysis.

Geographic region  Population  Abbreviation  Li et al. (1)  Behar et al. (2)  This study  Total
Africa NE Bantu BN 11 11
S Bantu BS 8 8
Mandenka Mnd 22 22
Yoruba Yor 21 21
North Africa Egypt Egy 12 12
Ethiopians Eth 19 19
Near/Middle East Saudis Sdi 20 20
Bedouin Bdn 45 45
Druze Drz 42 42
Jordanians Jor 20 20
Lebanese Leb 7 7
Palestinian Pal 46 46
Syrians Syr 16 16
Iranians Irn 20 20
Turks Tur 19 19
Europe French Fre 28 28
French Basques FrB 24 24
N Italians NIt 13 13
Tuscans Tus 7 7
Cypriots Cyp 12 12
Romanians Rmn 16 16
Bulgarians Bul 13 13
Ukrainians Ukr 20 20
Belorussians Blr 9 9
Lithuanians Lit 10 10
Russians Rus 25 2 27
Chuvashs Chv 17 17
Caucasus Armenians Arm 16 16
Georgians Grg 20 20
Abkhasians Abh 20 20
Balkars Blk 20 20
Adyghe Ady 17 17
N Ossetians NOs 15 15
Chechens Che 20 20
Lezgins Lzg 18 18
Kuban Nogays Nog 16 16
Kumyks Kum 13 13
Jews Georgian Jews GrJ 8 8
Mountain Jews MnJ 4 4
Iranian Jews InJ 4 4
Iraq Jews IqJ 11 11
Central Asia Tajiks Tjk 15 15
Turkmens Trm 15 15
Kurds (sampled in Kazakhstan) Krd 6 6
Uzbeks Uzb 15 15
Uygur Uyg 10 10
Altaians Alt 13 13
South Asia Hazara Haz 22 22
Pathan Ptn 22 22
Burusho Bur 25 25
Balochi Blo 24 24
Brahui Brh 25 25
Makrani Mak 25 25
Sindhi Sin 24 24
Siberia Yakuts Yak 25 25
East Asia Cambodians Cam 10 10
Dai Dai 10 10
Lahu Lah 8 8
Han Han 44 44
Daur Dau 9 9
Oroqen Oro 9 9
Mongolas Mng 10 10
Japanese Jap 28 28
Total 639 267 214
Grand Total 1120

Population  N

NE Caucasus
Andis  49
Avars  42
Bagvalals  28
Chamalals  27
Chechens  165
Dargins  67
Ingush  105
Kumyks  73
Lezgins  31
Tabasarans  43
Mountain Jews  10
NEC Total  640

Kuban Nogays  87
Kara Nogays  76
Nogays & Kara Nogays Total  163

NW Caucasus
Abazins  88
Adyghe  154
Balkars  135
Cherkessians  126
Kabardin  140
Karachays  69
N Ossetians  132
NWC Total  844

South Caucasus
Abkhazians  162
Armenians  57
Georgians  65
S Ossetians  21
SC Total  305

Markers typed (as legible): M20, M436, SRY10831.2, M458, P37.2, M223, M92, P43, M124, M343, M198, M73, M269, M175, M242, M207, Tat, M173, M357, M67, M12, M70, M214, M231, M76, M317, M52, 92R7, M9, M438, 12f2, M170, M253, M410, M78, M123, P58, M130, YAP, M267, M48, M174, M172, M40, M201, M168, M35, M285, P15, M89.

[Per-haplogroup counts and the marker phylogeny are not recoverable from the extracted text.]
Table S2. Y chromosome data of the Caucasus populations and the phylogenetic relationships of the 51 Y chromosome markers typed.
Haplogroup columns: A, B, C, D, E, F, G, H, HV*, HV0, HV1, HV2, I, J, K, L, M, M1, N, N1a, N1b, N1c, N1d, N2a, N9, R, R0a, T*(xT1), T1, U, U1, U2, U3, U4, U5, U6, U7, U8, U9, V, W, X, Y, Z

Population  N

NE Caucasus
Andis  62
Avars  61
Bagvalals  33
Chamalals  34
Chechens  176
Dargins  110
Ingush  103
Kumyks  112
Lezgins  46
Tabasarans  52
Mountain Jews  23
NEC Total  812

Kuban Nogays  131
Kara Nogays  130
Nogays & Kara Nogays Total  261

NW Caucasus
Abazins  105
Adyghe  155
Balkars  140
Cherkessians  123
Kabardin  150
Karachays  106
N Ossetians  138
NWC Total  917

South Caucasus
Abkhazians  136
Armenians  36
Georgians  76
S Ossetians  24
SC Total  272

Nogays & Kara Nogays Total, counts by haplogroup: A 13, B 6, C 20, D 18, E 0, F 7, G 13, H 54, HV* 6, HV0 0, HV1 1, HV2 3, I 7, J 6, K 5, L 1, M 5, M1 1, N 0, N1a 0, N1b 1, N1c 0, N1d 0, N2a 0, N9 0, R 0, R0a 0, T*(xT1) 13, T1 3, U 0, U1 8, U2 10, U3 12, U4 6, U5 8, U6 0, U7 1, U8 3, U9 0, V 3, W 10, X 11, Y 0, Z 6

[Per-haplogroup counts for the remaining populations are not recoverable from the extracted text.]
Table S3. mtDNA data of the Caucasus populations.

Table S4. The power of various models to predict the observed genetic distances (pairwise F_ST distances) between the populations studied. The coefficient of determination r², its increase relative to the default model in which only geographic distance explains genetic distance, the marginal regression coefficient, and the p-value are given. Statistically significant predictors are printed in bold.

Predictors (putative barriers)  r²  Increment in r²  Marginal regression coefficient  p-value
only geographic distance 0.4316 * 0.0347 0.0001
Caucasus barrier 0.5522 0.1206 0.0055 0.0001
Balkans barrier 0.5058 0.0742 -0.0048 0.0001
Mountain Jews 0.4751 0.0435 0.0063 0.0030
Iranian Jews 0.4693 0.0377 0.0058 0.0247
Burusho 0.4523 0.0207 0.0044 0.0596
Georgian Jews 0.4481 0.0165 0.0039 0.0959
Iraqi Jews 0.4467 0.0151 0.0037 0.1247
Druze 0.4376 0.0059 0.0023 0.4500
French Basque 0.4341 0.0025 0.0016 0.7090
South Asian barrier 0.4328 0.0012 0.0006 0.6570
Kuban Nogays 0.4318 0.0002 -0.0004 0.9531
All predictors 0.7695 0.3378 * *
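
The "Increment in r²" column can be illustrated with a minimal sketch. It assumes that each model is an ordinary least-squares regression of pairwise F_ST on geographic distance plus a 0/1 indicator marking population pairs separated by the putative barrier; the fitting and significance-testing procedure actually used is not spelled out in this table. The function names and the toy data are hypothetical, and no p-value is computed here.

```python
import numpy as np


def r_squared(y, predictors):
    """Coefficient of determination of an OLS fit with an intercept."""
    X = np.column_stack([np.ones(len(y))] + list(predictors))
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ coef
    ss_res = float(residuals @ residuals)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot


def r2_increment(fst, geo, barrier):
    """r² of the distance-only model, of the model with the barrier added,
    and the difference between the two."""
    base = r_squared(fst, [geo])
    full = r_squared(fst, [geo, barrier])
    return base, full, full - base


# Hypothetical toy data: one value per population pair (not from the study).
rng = np.random.default_rng(0)
n_pairs = 200
geo = rng.uniform(0.0, 5000.0, n_pairs)               # geographic distance, km
barrier = rng.integers(0, 2, n_pairs).astype(float)   # 1 if the pair spans the barrier
fst = 1e-5 * geo + 0.005 * barrier + rng.normal(0.0, 0.003, n_pairs)

base, full, increment = r2_increment(fst, geo, barrier)
print(f"distance only r2 = {base:.4f}, with barrier r2 = {full:.4f}, "
      f"increment = {increment:.4f}")
```

Because the distance-only model is nested in the model with the barrier, the increment cannot be negative; the table's first row (0.4316) plays the role of `base` for every other row.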

Table S5. Coalescence times with standard errors for the Y chromosome haplogroups J1 (and subhaplogroups) and J2a2* in the Caucasus and neighboring regions/populations. Median haplotypes of the 10 Y chromosome STR markers on which the coalescence times were calculated [using the ASD_0 method (32)] are also given.

Haplogroup  Region/population  N  T_C (ky)  SE (ky)  DYS19 DYS388 DYS389I DYS389II DYS390 DYS391 DYS392 DYS393 DYS439 DYS461
J1*  All J1* (Caucasus, Near/Middle East, Central Asia)  121  18.6  4.4  14 13 13 16 23 10 11 12 11 11
J1*  Caucasus  72  14.6  4.2  14 13 13 16 23 10 11 12 12 11
J1*  Near/Middle East  33  22.8  6.0  14 13 14 17 23 10 11 12 12 11
J1*, DYS388<15  Caucasus  66  11.9  1.9  14 13 13 16 23 10 11 12 12 11
J1*, DYS388<15  Near/Middle East (Turks, Assyrians, Egyptians, Iranians)  22  9.6  3.5  14 13 14 18 23 10 11 12 11 11
J1*, DYS388<15  Turks  8  9.1  4.4  14 13 14 18 23 10 11 12 11 11
J1*, DYS388<15  Assyrians (all DYS388<15)  12  6.9  2.8  14 13 14 18 23 10 11 12 11 12
J1*, DYS388<15  Central Asia (Kazakhs, all DYS388<15)  16  1.8  0.6  14 13 13 15 23 10 11 12 11 10
J1*, DYS388≥15  Near/Middle East  11  21.4  8.9  14 15 13 17 25 10 11 12 12 10
J1*, DYS388≥15  Caucasus  6  8.2  2.0  14 16 13 16 23 10 11 12 12 11
J1e*  All J1e* (Caucasus, Central Asia, Near/Middle East)  130  11.4  2.6  14 16 13 17 23 10 11 12 11 10
J1e*  Caucasus  20  5.6  1.4  14 17 13 16 23 10 11 12 11 10
J1e*  Near/Middle East  109  12.0  2.6  14 16 13 17 23 10 11 12 11 10
J2a2*  All Caucasus  70  15.2  3.9  14 15 13 17 23 10 11 12 11 12
J2a2*  Chechens, Ingush  36  12.4  4.2  14 15 13 16 23 10 11 12 11 12
J2a2*  Chechens  18  14.3  5.7  14 15 14 17 23 10 11 12 10 12
J2a2*  Ingush  18  9.1  5.0  14 15 13 16 21 10 11 12 12 12
J2a2*  Near/Middle East (Turks, Assyrians)  27  14.6  3.0  14 15 14 17 23 10 11 12 11 11
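
As a rough illustration of how an ASD_0-type age estimate is obtained from STR haplotypes and a median (founder) haplotype, the sketch below averages the squared repeat-count differences from the median haplotype over haplotypes and loci and divides by a per-locus mutation rate. The mutation rate (6.9e-4 per locus per 25-year generation), the standard error taken across loci, and the toy haplotypes are assumptions made for this example; they are not taken from the table or from reference (32), and the function name is hypothetical.

```python
import numpy as np


def asd0_age(haplotypes, mutation_rate=6.9e-4, years_per_generation=25.0):
    """ASD_0-style coalescence estimate in thousands of years (ky).

    haplotypes: rows are individuals, columns are STR loci (repeat counts).
    mutation_rate: assumed per-locus, per-generation rate (hypothetical default).
    Returns the median haplotype, the age estimate, and a standard error
    computed from the variation of the per-locus values (one convention).
    """
    haps = np.asarray(haplotypes, dtype=float)
    founder = np.median(haps, axis=0)               # median haplotype as founder
    per_locus = ((haps - founder) ** 2).mean(axis=0)  # mean squared difference per locus
    asd0 = per_locus.mean()
    se_asd0 = per_locus.std(ddof=1) / np.sqrt(per_locus.size)
    ky_per_unit = years_per_generation / mutation_rate / 1000.0
    return founder, asd0 * ky_per_unit, se_asd0 * ky_per_unit


# Hypothetical toy sample: 5 haplotypes typed at 3 STR loci.
toy = [
    [14, 13, 16],
    [14, 13, 17],
    [15, 13, 16],
    [14, 13, 16],
    [14, 12, 16],
]
founder, age_ky, se_ky = asd0_age(toy)
print("median haplotype:", founder, "age (ky):", round(age_ky, 2), "SE (ky):", round(se_ky, 2))
```

With an even number of haplotypes the median can fall on a half repeat; a real analysis would need a rule for that case, which this sketch does not address.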


[Figure: three principal component panels (PC2 vs PC1, PC3 vs PC1, PC3 vs PC2) with labeled Near/Middle Eastern, Caucasus, European, Central Asian and South Asian populations.]
Fig S1
[Figure: principal component plot, PC1 (22.8%) vs PC2 (19%). Populations are marked by language group (Nakh-Dagestanian, Abkhaz-Adyghe, Turkic, Indo-European, Afro-Asiatic, Kartvelian) and as Caucasus or other populations.]
Fig S2
[Figure: principal component plot, PC1 (17%) vs PC2 (12.5%). Populations are marked by language group (Nakh-Dagestanian, Abkhaz-Adyghe, Turkic, Indo-European, Afro-Asiatic, Kartvelian) and as Caucasus or other populations.]
Fig S3
[Figure: per-population panels with rows labeled 3 to 10. Populations are grouped as Africa, Near East & Europe, The Caucasus, Central Asia, South Asia and East Asia.]
Fig S4
Fig S5
Fig S6
Fig S7
Fig S8
