
The Russian Journal of Genetic Genealogy ( ): 6, 1, 2014

ISSN: 1920-2997 http://ru.rjgg.org


RJGG

64

___________________________________________________________

Received: May 10 2014; accepted: May 12 2014; published: May 20 2014.
Correspondence: gurianov.vm@gmail.com acgt@yfull.com




The update
of the phylogenetic structure
of Q1b haplogroup
based on full
Y-chromosome sequencing



Vladimir Gurianov
Roman Sychyev
Vladimir Tagankin
Vadim Urasin


¹ Independent Researcher, Russia,
² YFull Research Group, Russia.


Abstract

New full Y-chromosome sequencing data made it possible to update the structure of haplogroup Q1b (Q-L275) and to identify new subclades: Q-Y2990 (downstream of Q-Y2250), Q-Y2225 (downstream of Q-Y2220) and Q-Y3030 (downstream of Q-Y2200). This lays the groundwork for further research into the inner structure of these subclades and for comparing their present ethno-population composition with the migrations of the Indo-European tribes.


Introduction

In the short period since the publication of the article by V. Gurianov et al. (2013)¹, several full Y-chromosome sequencing samples belonging to haplogroup Q1b (Q-L275) and its downstream subclades have become publicly available.

The analysis of the new data made it possible to update the phylogenetic structure of haplogroup Q1b, to identify new subclades, to perform more in-depth typing of a number of scientific samples, and to put forward a plausible hypothesis on the prehistoric migration routes of the Indo-European tribes.

¹ V. Gurianov et al. (2013) Phylogenetic Structure of Q-M378 Subclade Based on Full Y-Chromosome Sequencing. The Russian Journal of Genetic Genealogy, Volume 5, No. 1, 56-74.


Source Data and Methodology

Data Sets for Comparison

The data on the examined samples are
summarised in the table below:


Table 1. Information on the researched samples of full Y-chromosome sequencing.

Sample code  Population  Verified origin   Source of the information
HGDP00100    Hazara      Pakistan          Lippold et al. (2014)²
HGDP00129    Hazara      Pakistan          Lippold et al. (2014)
HGDP00165    Sindhi      Pakistan          Lippold et al. (2014)
PGP193       N/A         N/A³              The Personal Genome Project⁴
Eu1          Italians    Sicilia, Italy    Provided by a volunteer⁵
Eu2          Portugal    Azores, Portugal  Provided by a volunteer⁵

² Sebastian Lippold et al. (2014) Human paternal and maternal demographic histories: insights from high-resolution Y chromosome and mtDNA sequences, doi: 10.1101/001792
³ Current location: California, USA.
⁴ http://www.personalgenomes.org/
⁵ The test was performed by Full Genomes Corporation (FGC) at the Beijing Genomics Institute on an Illumina HiSeq 2000 sequencer, with the following parameters: 50x coverage at a read length of 100 base pairs.




Genotyping

Data sets in BAM format (BAM/SAM Specification⁶) and, in the case of PGP193, in TSV⁷ format were used for the research. The parameters of the next-generation sequencing (NGS) of the Eu1 and Eu2 samples, performed by Full Genomes Corporation at the Beijing Genomics Institute, are the same as those previously described in the article of V. Gurianov et al. (2013).


Data Processing and Analysis

Processing and analysis of the full Y-chromosome sequencing data were performed using software developed by the YFull research group⁸ and VCFtools⁹.

Each sample was analysed both for SNPs discovered during the research and for SNPs included in the ISOGG list under haplogroup Q1b and its downstream subclades.

The criterion for recognising a new SNP was the presence of the mutation in more than two samples from unrelated males, together with consistency between the new SNP and the previously known phylogenetic structure of the respective subclade.
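
The two-part criterion above can be illustrated with a minimal sketch. It is not the YFull software; the `Candidate` record, the `family_of` mapping for known relatives and the `min_unrelated` threshold (3, encoding "more than two") are assumptions introduced only for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str                    # hypothetical candidate label
    positive_samples: frozenset  # samples where the derived allele was read
    negative_samples: frozenset  # samples where the ancestral allele was read


def unrelated_count(samples, family_of):
    """Count samples, treating known relatives (same family label) as one."""
    return len({family_of.get(s, s) for s in samples})


def accept_new_snp(cand, family_of, branch_members, min_unrelated=3):
    """Apply the two-part criterion to one candidate SNP.

    1. The mutation is present in enough unrelated male samples
       (min_unrelated=3 encodes "more than two"; this is an assumption).
    2. The calls agree with the known tree: no sample outside the
       candidate branch is positive and no branch member is negative.
    """
    enough = unrelated_count(cand.positive_samples, family_of) >= min_unrelated
    consistent = (cand.positive_samples <= branch_members
                  and not (cand.negative_samples & branch_members))
    return enough and consistent
```

Samples that were not read at the position appear in neither set, so a no-call neither supports nor contradicts the candidate in this sketch.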

⁶ The specification in force is located here: https://github.com/samtools/hts-specs
⁷ TSV (Tab Separated Values): text format to present table values.
⁸ http://www.yfull.com/
⁹ http://sourceforge.net/projects/vcftools/

The research also refined the phylogenetic position of SNPs previously described in the article of V. Gurianov et al. (2013).


Results

Eu1 Data Analysis

The findings for the Eu1 sample were promptly submitted to ISOGG and, as of the date of this article, have already been included in the current version of the ISOGG SNP Tree. Nevertheless, we consider it necessary to describe in detail the SNPs revealed and the resulting changes to the structure of the subclades downstream of Q-Y2220.

The Q-Y2225 level, made up of the SNPs shared by AJ1, PGP193 and Eu1, was established by comparing the Eu1 sample with the samples in the YFull database.

The SNPs typical for this level are listed in the table below.










Table 2. SNPs of the Q-Y2225 level.

Position (hg19)  Ancestral value  Value positive for SNP  SNP name
23646920         C                T                       Y2196
22471554         A                T                       Y2201
19425984         G                A                       Y2206
19053060         C                T                       Y2207
18207170         A                G                       Y2208
18046486         T                C                       Y2210
18043999         G                A                       Y2211
15834557         G                A                       Y2213
15658212         C                T                       Y2214
14385853         T                G                       Y2215
9892635          C                T                       Y2219
8662585          C                A                       Y2224
6949449          C                T                       Y2225

Consequently, the Q-Y2200 subclade, which can now be regarded as corresponding more accurately to the Jewish cluster of the Q-L245 branch, is currently defined by the following SNPs of a single level:



Table 3. SNPs of the Q-Y2200 level.

Position (hg19)  Ancestral value  Value positive for SNP  SNP name (Y)
22953894         A                G                       Y2197
22825080         A                G                       Y2198
22588598         C                T                       Y2200
21277083         G                A                       Y2203
16994660         T                A                       Y2212
14353022         A                C                       Y2216
14184253         C                A                       Y2218
9401947          C                A                       Y2221
4606181          C                T                       Y2231
3995524          G                A                       Y2232
3148720          A                G                       Y2233
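
The way a former single level separates into Q-Y2225 and Q-Y2200 once Eu1 is added can be sketched with plain set operations. This is only an illustration: the function `split_level` is hypothetical, and the call sets hold a few SNP names taken from Tables 2 and 3 rather than the actual genotype data of the samples.

```python
def split_level(positives_by_sample, old_members, new_sample):
    """Split an old tree level after a new sample is added.

    SNPs positive in every old member and in the new sample move to a
    shared (upstream) level; SNPs positive in every old member but not
    in the new sample stay on the old (now downstream) level.
    """
    old_shared = set.intersection(
        *(set(positives_by_sample[s]) for s in old_members)
    )
    new_calls = set(positives_by_sample[new_sample])
    upstream = old_shared & new_calls
    downstream = old_shared - new_calls
    return upstream, downstream


# Illustrative call sets only (assumption): a few SNP names from
# Tables 2 and 3, not the full calls of each sample.
calls = {
    "AJ1": {"Y2196", "Y2225", "Y2197", "Y2200"},
    "PGP193": {"Y2196", "Y2225", "Y2197", "Y2200"},
    "Eu1": {"Y2196", "Y2225"},
}
upstream, downstream = split_level(calls, ["AJ1", "PGP193"], "Eu1")
print(sorted(upstream))    # SNPs of the shared level (Q-Y2225 here)
print(sorted(downstream))  # SNPs left on the downstream level (Q-Y2200 here)
```

The sketch treats "not positive" in the new sample as negative. A real analysis would have to separate true negative calls from positions that were not read.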

Since, according to the phylogenetic structure built from STR markers, the Eu1 sample is located in the centre of the Q-L245 European cluster (and shows the typical value DYF395S1=15-17, which is ancestral), we may reasonably assume that many of its private SNPs will form branches of the tree once close samples become available for comparison. For this reason the private SNPs of the Eu1 sample are listed separately in Schedule 1.

Eu2 Data Analysis

The Eu2 sample was originally known to be positive for the private SNP L327. Comparison of this sample with the others stored in the YFull database defined a new branch, Q-Y2990, downstream of Q-Y2250 and parallel to the Iranian branch Q-L301.

Table 4. SNPs of the Q-Y2990 branch.

Position (hg19)  Ancestral value  Value positive for SNP  SNP name (Y)
7929100          A                C                       Y2986
5398133          A                T                       Y2987
15540398         G                A                       Y2988
15656595         A                C                       Y2989
17455705         C                G                       Y2990
18205189         C                A                       Y2991
18427622         C                T                       Y2992
21794826         T                C                       Y2993
21824228         C                T                       Y2994
22779292         G                A                       Y2995
23574588         G                T                       Y2996
6675390          A                G                       Y2997


This branch currently includes two samples: Eu2 and Kz1.

It is worth noting that, on the phylogenetic tree built from the values of 67 STR markers, the Eu2 haplotype is located near the root of the tree. It was therefore possible to calculate the time at which Q-M378 began to split actively into two subclades, Q-L245 and Q-Y2250. Two calculations, one made with the MURKA software¹⁰ and the other with the method of random pairs of STR haplotypes¹¹, both gave a time of 5000 years ago.

¹⁰ MURKA: http://sourceforge.net/projects/phylomurka/
¹¹ Adamov et al. (2011) TMRCA assessment through the method of random pairs of STR haplotypes: http://rjgg.org/index.php/RJGGRE/article/view/83/102, http://www.semargl.me/ru/dna/ydna/tools/asd-pairs/

Data Analysis of Samples under Human Genome Diversity Project (HGDP) from Pakistan

The above-mentioned paper by Lippold et al. (2014) describes three samples from two Pakistani ethnic groups: Hazaras and Sindhis.

The HGDP00129 sample, from a Hazara from Northern Pakistan, may be assigned to the Q-L245 level on the basis of the following SNPs:


Table 5. SNPs showing that HGDP00129 sample belongs to Q-L245 level.

Position (hg19)  Ancestral value  Value positive for SNP  SNP name (Y)  SNP name (FGC)
9382621          G                T                       Y2222         FGC1902
17860015         G                T                       Y2139         FGC1849
23733052         A                G                       Y2148         FGC1879

______________________________________

For the avoidance of doubt, we note that a connection between the Hazaras and the population of the Khazar Khaganate is not supported by any sources known to us. The Hazaras (from Persian هزار, hezār, "thousand") are Shia Muslims of Mongol or Iranian origin who speak an Iranian language and live in central Afghanistan (8-10% of the country's total population). They speak the Hazara dialect, or a dialect of the Dari language; some of them speak Mongolian. The historical area of Hazara settlement in Afghanistan is the Hazarajat region, which is divided among several provinces of present-day Afghanistan. Sengupta et al. (2006)¹² identify the following Y-chromosome haplogroup composition of the Hazaras: C-M217 40% (10/25), R1b-M73 32% (8/25), O-M122 8%. Q1b-M378 is reported in that study in only one individual. A detailed analysis of the currently known Hazara haplotypes was presented by Sabitov in his article on the origin of the Hazaras from the point of view of DNA genealogy¹³.

¹² Sanghamitra Sengupta et al., Polarity and Temporality of High-Resolution Y-Chromosome Distributions in India Identify Both Indigenous and Exogenous Expansions and Reveal Minor Genetic Influence of Central Asian Pastoralists, Am J Hum Genet. 2006 February; 78(2): 202-221. http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1380230/
¹³ Sabitov Zhaksalyk, Origin of the Hazara from the point of DNA genealogy, The Russian Journal of Genetic Genealogy, Volume 2, No. 1, 2010, page 38. rjgg.org/index.php/RJGGRE/article/download/42/53

SNPs Y2139 and Y2148 were found to be positive in samples PGP130 and PGP193, as well as in the tested samples of the Q-L245 level (AJ1, AJ2, Ar1). However, they are negative in all Q-L275 (xL245) samples, namely Ir1, Kz1, Eu2, HG03914, HG03652 and HG03864. The situation is similar for SNP Y2222, except that it could not be called for PGP130. Therefore, the tested sample HGDP00129 very likely belongs to the para-subclade Q-L245*. Unfortunately, the quality of sequencing does not allow the position of the sample on the phylogenetic tree to be determined more accurately. The same applies to the two other HGDP samples.

The HGDP00165 sample, from a Sindhi from Southern Pakistan, may be assigned to the Q-Y2250 level on the basis of the following SNPs:

Table 6. SNPs showing that HGDP00165 sample belongs to Q-Y2250.

Position (hg19)  Ancestral value  Value positive for SNP  SNP name (Y)  SNP name (FGC)
6894323          C                T                       Y2245         PR683
24452225         G                C                       Y2270         FGC4676

SNPs Y2056, Y2091 and F1349 are shared by HGDP00129 and HGDP00165 but are negative in HGDP00100, which shows that both samples belong to the subclade Q-M378. All SNPs of Q-Y2990 turned out to be negative in sample HGDP00165; the latter therefore belongs to the para-subclade Q-Y2250 (xQ-Y2990).


Sample HGDP00100, from a Hazara from Northern Pakistan, can be clearly assigned to the Q-L275 branch, both from the information given above on positive calls for previously identified SNPs of the same level as L275 and on the basis of the following properties:

1) All three samples are positive for SNPs L314, F1169, F1337 and F1528, which are of the same level as L275.

2) SNP F753 (hg19 3714320) is positive in all three tested samples and in all Q-L275 samples in the YFull database; the same is true of F1205 (hg19 8440399).

Similarly, SNPs Y1189, Y1209, Y1218, Y1232, Y1263, L68/S329/PF3781 (hg19 18700150) and YP505 (hg19 6388256) are positive in all Q-L275 Y1150+ samples and in HGDP00100.

The above-mentioned paper by Lippold et al. (2014) includes a phylogenetic scheme of haplogroup Q in which the relative placement of samples HGDP00100, HGDP00129 and HGDP00165 supports our conclusions. However, no in-depth SNP analysis of the specific branches was made there; samples HGDP00129 and HGDP00165 were shown on the scheme as belonging to a single level.

We were therefore able to refine the phylogenetic position of the three HGDP samples and to assign them to the following branches:

HGDP00100: Q-L275 -> Q-Y1150 (presumably)

HGDP00129: Q-L275 -> Q-M378 -> Q-L245 (presumably)

HGDP00165: Q-L275 -> Q-M378 -> Q-Y2250 (presumably)
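
The reasoning that leads to these three assignments can be sketched as a walk down the tree. This is a simplified illustration, not the procedure actually used: `TREE`, `DEFINING` and `place` are hypothetical names, only a few of the SNPs named in the text are listed per branch, and a single positive SNP is accepted as enough support for a branch.

```python
# Branches and a few defining SNPs as named in the text (not the full lists).
TREE = {
    "Q-L275": ["Q-Y1150", "Q-M378"],
    "Q-M378": ["Q-L245", "Q-Y2250"],
    "Q-Y2250": ["Q-Y2990"],
}
DEFINING = {
    "Q-Y1150": {"Y1189", "Y1209", "L68", "YP505"},
    "Q-M378": {"Y2056", "Y2091", "F1349"},
    "Q-L245": {"Y2222", "Y2139", "Y2148"},
    "Q-Y2250": {"Y2245", "Y2270"},
    "Q-Y2990": {"Y2990", "Y2991"},
}


def place(positive, root="Q-L275"):
    """Follow the tree from the root while exactly one child is supported."""
    path = [root]
    node = root
    while True:
        hits = [c for c in TREE.get(node, []) if DEFINING[c] & positive]
        if len(hits) != 1:  # no supported child, or conflicting calls
            return path
        node = hits[0]
        path.append(node)


# Positive calls per sample, restricted to SNPs mentioned in the text.
samples = {
    "HGDP00100": {"L314", "F1169", "Y1189", "Y1209", "YP505"},
    "HGDP00129": {"L314", "F1169", "Y2056", "Y2091", "Y2222", "Y2139"},
    "HGDP00165": {"L314", "F1169", "Y2056", "Y2091", "Y2245", "Y2270"},
}
for code, calls in samples.items():
    print(code, " -> ".join(place(calls)))
```

Stopping where no child branch is supported corresponds to the para-subclade assignments in the text, for example Q-Y2250 (xQ-Y2990) for HGDP00165.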

The high genetic diversity within a single population and geographic region indicates that the territory of present-day Pakistan and Afghanistan played a key role in the past spread of haplogroup Q-L275.

We note here that an original presence in Central Asia of a population carrying this haplogroup (a pre-Indo-European substrate) looks more probable than its arrival in the region together with the Indo-Europeans. However, the mixing of the indigenous population with one that came from the north may have produced a new community in which haplogroup L275 was a minor component; its further spread was connected with the migrations of the Indo-Europeans to India and Western Asia.

Since the presence of people belonging to haplogroup Q-L275 in Central Asia by the close of the 1st millennium B.C. is confirmed by paleoDNA research, the territory of present-day Pakistan and Afghanistan can be regarded as a transit zone through which ran the main migration routes of the Indo-European tribes (which also included carriers of haplogroup Q-L275) to Hindustan across the Hindu Kush (Q-Y1150), as well as towards Western Asia (Q-Y2250 and Q-L245). PaleoDNA research performed by Chinese scientists on the findings of archaeological excavations in Central Asia demonstrates the presence of carriers of haplogroup Q in these lands: 6 Q1a and 4 Q1b individuals were found in the Black Gouliang barrow, to the east of the Barkol Basin, at the ruins of Hami (Kumul).¹⁴ Judging by the location of the bodies in the barrow, it may be concluded that the carriers of haplogroup Q1b were of a higher social status.

The Hami oasis was located on the Great Silk Road near Turfan and Khotan (Yarkend). The barrow is dated to the Early (Western) Han period (2nd-1st centuries B.C.).

Part of the present-day Uyghur population may be direct descendants of carriers of haplogroup Q1b who settled in ancient Central Asia. The studies of Hua Zhong et al. (2010)¹⁵ and Wenjuan Shan et al. (2014)¹⁶ show the presence of haplogroup Q1b only among the people of Xinjiang. Unfortunately, only 17-marker haplotypes are at our disposal, which prevents us from drawing any definite conclusions.

PGP193 Data Analysis

We assigned sample PGP193 to the Jewish cluster of the Q-L245 branch (Y2225+ Y2200+). All SNPs of these branches, except for a few that were not read¹⁷, were found to be positive in PGP193.

PGP193 -> Q-L275 -> Q-M378 -> Q-L245 -> Q-Y2220 -> Q-Y2225 -> Q-Y2200

¹⁴ Li Hongjie, Y chromosome genetic diversity of ancient population in the Northern China, Jilin University, 2012. http://cdmd.cnki.com.cn/Article/CDMD-10183-1012365432.htm
¹⁵ Zhong et al., Extended Y-chromosome investigation suggests post-Glacial migrations of modern humans into East Asia via the northern route // Molecular Biology and Evolution, first published online: September 13, 2010, doi: 10.1093/molbev/msq247 (among four Uyghur populations from Xinjiang, one such person was found in each of two populations: 1 out of 71 and 1 out of 18).
¹⁶ Wenjuan Shan et al. (2014) Genetic polymorphism of 17 Y chromosomal STRs in Kazakh and Uighur populations from Xinjiang, China. http://link.springer.com/article/10.1007/s00414-013-0948-y
¹⁷ Not read: Y2114 (level Q-Y2225); Y2232, Y2233 and Y2212 (level Q-Y2200).

Unfortunately, this sample is anonymous, and it is impossible to confirm that it is of Jewish origin.

Nevertheless, the sample was compared against the private SNPs of the AJ1 and AJ2 samples described in the article of V. Gurianov et al. (2013). The results are summarized in the table below:

Table 7. SNPs of the Q-Y3030 branch.

Position (hg19)  Ancestral value  Value positive for SNP  SNP name (Y)         SNP name (FGC)
6985833          G                C                       Y2746 aka YFS028180  FGC4836
7116693          C                G                       Y3026 aka YFS028187  FGC4837
14683323         G                A                       Y3027 aka YFS028303
17842405         G                A                       Y3028 aka YFS028379  FGC4845
18697269         A                G                       Y2750 aka YFS028399  FGC4846
22545510         G                T                       Y3029 aka YFS028485  FGC4850
22989959         T                C                       Y3030 aka YFS028498  FGC4853
23338485         T                C                       Y2751 aka YFS028509  FGC4854

PGP193 -> Q-L275 -> Q-M378 -> Q-L245 -> Q-Y2220 -> Q-Y2225 -> Q-Y2200 -> Q-Y3030

It is currently difficult to judge how the new SNP structure of the Q-L245 subclade overlaps with the previously published phylogenetic structures based on 67 Y-chromosome STR markers, not least because STR data for PGP193 are not available. Moreover, sample AJ1 has DYF395S1=15-19, which is typical for the majority of Q1b Ashkenazi Jews, whereas AJ2 has a unique DYF395S1=15-15 (apparently a consequence of RecLOH).

Final Conclusions

This research resulted in an update of the structure of haplogroup Q1b (Q-L275) and in the identification of new subclades: Q-Y2990 (downstream of Q-Y2250), Q-Y2225 (downstream of Q-Y2220) and Q-Y3030 (downstream of Q-Y2200).

It lays the groundwork for further research into the inner structure of these subclades and for comparing their present ethno-population composition with the migrations of the Indo-European tribes, which contributed to the formation of the ethnic groups concerned.

The updated phylogenetic structure of haplogroup Q1b is shown in the following scheme.





















SNP Phylogenetic Tree of Q1b Haplogroup.



The changes made to the SNP scheme of haplogroup Q1b compared with the one published by V. Gurianov et al. (2013) are listed in Schedule 2.


Acknowledgements

The authors wish to thank the following people, who assisted in conducting the research and in preparing the article:

Alessandro Biondo (Italy)
Leon Kull (Israel)
Justin Allen Loe (USA)
Linda Magellan (USA)
Olga Vasilyeva (United Kingdom)
















































Schedule 1. Private SNPs for Sample Eu1.

Position (hg19)  Ancestral value  Value positive for SNP  SNP name (YFull internal notation)
3131205 T C YFS068595
3232026 C T YFS068596
3232027 A G YFS068597
3403647 C A YFS068599
3704060 A G YFS068605
6702576 C G YFS068611
6881382 C T YFS068612
7139179 A T YFS068614
7222827 C T YFS068615
8356720 T A YFS068618
8467849 G A YFS068619
8592711 T C YFS068620
8990561 G C YFS068621
9415377 T G YFS068622
13828699 C T YFS068627
14545910 T C YFS068629
15269498 T C YFS068630
15455814 T C YFS068631
16255444 C T YFS068634
17402893 A T YFS068639
17722084 G T YFS068640
18148788 G A YFS068641
18158679 T C YFS068642
18394566 C G YFS068643
19060348 C T YFS068644
19130251 A G YFS068645
19130253 T C YFS068646
19166462 C T YFS068647
21329851 C G YFS068654
21555930 T C YFS068655
22519498 G A YFS068659
23064750 C T YFS068660
24365889 A G YFS068663


Schedule 2. Changes made to SNP scheme of the Q1b haplogroup. SNPs under research.

SNP      Belonging to subclade  Notes
CTS4507  Q-Y2250                Reverse SNP of P paragroup level (under research). Updated in terms of specifying the reverse character of the mutation
L68      Q-Y1150                Added (Y-DNA Haplotree, FTDNA 2014)
F753     Q-L275                 Added
F1205    Q-L275                 Added
Y1193                           Excluded from Q-Y1150 level (under research)
F2250    Q-L275                 Added (Y-DNA Haplotree, FTDNA 2014)
Y1200    Q-L275                 Revised: transfer from Q-Y1150 level
Y1220    Q-Y1150                Added (under research)
Y1228                           Excluded from Q-Y1150 level (under research)
Y2118    Q-L245                 Misprint correction: the position confirmed (see Y2218)
Y2218    Q-Y2200                Misprint correction: added in lieu of Y2118
YP505    Q-Y1150                hg19: 6388256 (->T)
Z5901                           Excluded from Q-Y1150 level (under research)
_______________
Note: the SNPs under research are not included in the SNP scheme of Q1b haplogroup until their positions are clearly identified.