Advances in Genetics Research 16


ADVANCES IN GENETICS RESEARCH

VOLUME 16
No part of this digital document may be reproduced, stored in a retrieval system or transmitted in any form or
by any means. The publisher has taken reasonable care in the preparation of this digital document, but makes no
expressed or implied warranty of any kind and assumes no responsibility for any errors or omissions. No
liability is assumed for incidental or consequential damages in connection with or arising out of information
contained herein. This digital document is sold with the clear understanding that the publisher is not engaged in
rendering legal, medical or any other professional services.

Additional books in this series can be found on Nova’s website under the Series tab.
Additional e-books in this series can be found on Nova’s website under the e-book tab.


VOLUME 16
KEVIN V. URBANO
EDITOR
New York

Copyright © 2016 by Nova Science Publishers, Inc.
All rights reserved. No part of this book may be reproduced, stored in a retrieval system or
transmitted in any form or by any means: electronic, electrostatic, magnetic, tape, mechanical
photocopying, recording or otherwise without the written permission of the Publisher.
We have partnered with Copyright Clearance Center to make it easy for you to obtain permissions
to reuse content from this publication. Simply navigate to this publication’s page on Nova’s
website and locate the “Get Permission” button below the title description. This button is linked
directly to the title’s permission page on copyright.com. Alternatively, you can visit
copyright.com and search by title, ISBN, or ISSN.
For further questions about using the service on copyright.com, please contact:
Copyright Clearance Center
Phone: +1-(978) 750-8400 Fax: +1-(978) 750-4470 E-mail: info@copyright.com.
NOTICE TO THE READER

The Publisher has taken reasonable care in the preparation of this book, but makes no expressed or
implied warranty of any kind and assumes no responsibility for any errors or omissions. No
liability is assumed for incidental or consequential damages in connection with or arising out of
information contained in this book. The Publisher shall not be liable for any special,
consequential, or exemplary damages resulting, in whole or in part, from the readers’ use of, or
reliance upon, this material. Any parts of this book based on government reports are so indicated
and copyright is claimed for those parts to the extent applicable to compilations of such works.
Independent verification should be sought for any data, advice or recommendations contained in
this book. In addition, no responsibility is assumed by the publisher for any injury and/or damage
to persons or property arising from any methods, products, instructions, ideas or otherwise
contained in this publication.
This publication is designed to provide accurate and authoritative information with regard to the
subject matter covered herein. It is sold with the clear understanding that the Publisher is not
engaged in rendering legal or any other professional services. If legal or any other expert
assistance is required, the services of a competent person should be sought. FROM A
DECLARATION OF PARTICIPANTS JOINTLY ADOPTED BY A COMMITTEE OF THE
AMERICAN BAR ASSOCIATION AND A COMMITTEE OF PUBLISHERS.
Additional color graphics may be available in the e-book version of this book.
Library of Congress Cataloging-in-Publication Data
ISSN: 2159-1563
ISBN: (eBook)
Published by Nova Science Publishers, Inc. † New York

CONTENTS
Preface
Chapter 1 Recent Insights into the Y Chromosome Role in Human Evolution, Biomedicine and Forensic Genetics Research
Carolina Núñez Domingo
Chapter 2 Male Infertility Associated with TTTY Gene Family Deletions in the Y Chromosome
Yapijakis Christos and Papadimitriou Katerina
Chapter 3 Genetic Diversity Assessment by Random Amplified Polymorphic DNA in Quercus calliprinos, Q. ithaburensis and Q. boissieri, Growing in Israel
Gabriel Schiller
Chapter 4 How Habitat Fragmentation Affects Genetic Diversity? The Case of a Sand Dune Plant (Stachys maritima) in the Iberian Peninsula
S. Massó, C. Blanché, C. Barriocanal, M. C. Martinell and J. López-Pujol
Chapter 5 The Use of HeLa Cells As a Model for Studying DNA Damage and Repair
Fabio Luis Forti
Chapter 6 The Molecular Genetics of Polycythemia Vera
Linda M. Scott
Chapter 7 MicroRNAs in Disease: Recent Advances and Molecular Background
Braoudaki Maria, Tounta Georgia, Lykoudi Alexandra, Kitsiou Tzeli Sofia, Kanavakis Emmanuel, Papantoniou Nikolas and Kolialexi Aggeliki
Chapter 8 Papaya Viral Diseases: Recent Advances and Perspectives
Marcos Fernando Basso, José Albersio de Araújo Lima, Michihito Deguchi, Diogo Manzano Galdeano, Lívio da Silva Amaral and Patricia Machado Bueno Fernandes
Chapter 9 Phylogenetics and Phylogeography of Two Large Neotropical Rodents (Capybara, Hydrochoerus hydrochaeris, Hydrochaeridae and Paca, Cuniculus paca, Agoutidae; Rodentia) by Means of Mitochondrial Genes: Opposite Patterns
Manuel Ruiz-García, Kelly Luengas-Villamil, Leslie Leal, Luz Mery Bernal-Parra and Joseph Mark Shostell
Chapter 10 Omics Technologies Applied to Prokaryotes
Marcus de Barros Braga, Adonney Allan de Oliveira Veras, Pablo Henrique Caracciolo Gomes de Sá, Diego Assis das Graças, Rafael Azevedo Baraúna, Jorianne Thyeska Castro Alves, Kenny da Costa Pinheiro, Vasco Ariston de Carvalho Azevedo, Maria Paula Cruz Schneider, Rommel Thiago Jucá Ramos and Artur Luiz da Costa da Silva
Index

PREFACE
“Advances in Genetics Research” presents original research results on the leading edge of
genetics discovery. Each article has been carefully selected in an attempt to present
substantial research results across a broad spectrum. In this continuing series compilation, the
authors present and discuss the most recent Y chromosome progress within the main fields of
genetics; male infertility associated with TTTY gene family deletions in the Y chromosome;
genetic diversity assessment by random amplified polymorphic DNA; the effect of habitat
fragmentation on genetic diversity; the use of HeLa cells as a model for studying DNA
damage and repair; the molecular genetics of polycythemia vera; recent advances and
molecular background of microRNAs in disease; papaya viral diseases; phylogenetics and
phylogeography of large neotropical rodents by means of mitochondrial genes; and omics
technologies applied to prokaryotes.
Chapter 1 - The human Y chromosome has unique features that seem to work against its
usefulness in genetics research. It is one of the smallest chromosomes in the genome, it is not
essential for life, it contains few genes while the rest consists of highly repeated polymorphic
DNA sequences, and most of it escapes recombination. Nevertheless, the Y chromosome has
become a key tool for investigating Human Evolution and has specific roles in Biomedicine
and Forensic Genetics. Recently, research focused on the Y chromosome has uncovered new
evidence regarding the evolutionary and migratory history of human populations. The largest
study of Native South American individuals based on the analysis of Y chromosome genetic
markers was conducted, yielding interesting hypotheses about the origin of some male
lineages. Additionally, other studies have obtained new clues from the Y chromosome
regarding the first peopling of South America. Moreover, as data from disease association
studies have accumulated, the importance of the Y chromosome in male health has become
more evident. Several conditions associated with the male-specific region of the Y
chromosome (MSY), such as prostate cancer, graft-versus-host disease (GVHD), autism,
non-syndromic speech delay, and gender differences in the pathophysiology of new-onset
heart failure (HF), have been reported in the past few years. Finally, the implementation of
new rapidly mutating Y chromosome short tandem repeat (STR) genetic markers is shaping
the future of Y chromosome forensic analysis for differentiating male relatives and paternal
lineages. This chapter addresses the most recent Y chromosome progress within the main
fields of genetics: Human Evolution, Biomedicine, and Forensic Genetics.

Chapter 2 - Infertility is a significant problem worldwide, affecting about 10-15% of
couples. Male factors account for about 50% of these cases. It is well known that deletions of
human Y-chromosomal regions are responsible for disorders in spermatogenesis. These
regions are known collectively as the azoospermia factor (AZF) region, which comprises at
least three genetic domains in the long arm of the Y chromosome, named AZFa, AZFb, and
AZFc. Within the AZF region there are well-studied genes such as DAZ (Deleted in
azoospermia), as well as some members of the TTTY gene family, which are known to be
transcribed in the testis and linked to sperm quality. Several studies have shown that some
members of this large gene family within the AZF region, including the TTTY2L2A, TTTY3
and TTTY4 genes, have also been associated with both deletions and impaired
spermatogenesis. Recently, deletions in two genes of the TTTY2 multicopy Y-linked gene
family, TTTY2L12A and TTTY2L2A, have been reported in the same patients with
oligozoospermia and azoospermia. Since the former gene is located in the short arm of the Y
chromosome (Yp) and the latter in its long arm (Yq), these observations suggest possible
non-homologous recombination events. The TTTY2L12A and TTTY2L2A genes, in addition
to the TTTY1, TTTY2, TTTY6, TTTY7, TTTY8, TTTY18, TTTY19, TTTY21 and TTTY22
genes, located in both Y chromosome arms, are transcribed into non-coding RNAs (ncRNAs).
Several studies have demonstrated that ncRNAs are essential for the normal production of
spermatozoa. Possible mechanisms of function of long ncRNAs, as in the case of the
TTTY2L12A and TTTY2L2A gene transcripts, are discussed.
Chapter 3 - Holly oak (Q. coccifera L./Q. calliprinos Webb.), Tabor oak (Quercus
ithaburensis Decne) and Aleppo oak (Quercus boissieri Reut.) are the main oak species in
Israel for which the authors lack the genetic knowledge that is needed as a basis for forest
management, in situ and ex situ genetic conservation, and breeding for new plantings of
forests, parks and gardens. Therefore, random amplified polymorphic DNA (RAPD) analysis
was used to determine the genetic diversity within and among 24 spontaneous occurrences of
Quercus calliprinos, 16 of Q. ithaburensis and 14 of Q. boissieri.
Chapter 4 - Stachys maritima is a species typical of coastal dunes with a wide
distribution within the Mediterranean Basin. During the last century, this species was
subjected to severe habitat fragmentation, mainly as a consequence of tourism activities and
urban pressures, with a reduction in area of up to 99% in the Iberian Peninsula and a
remaining total population size of ca. 420 individuals in less than 50 km². Despite some
annual fluctuations, the species shows a clear regression. Allozyme electrophoresis was used
to evaluate the levels and distribution of genetic diversity in Iberian populations of this
threatened coastal sand dune plant. Extremely low levels of genetic variation were detected
(P = 4.0, A = 1.1 and He = 0.014). Of the 19 interpretable loci found, only 4 were
polymorphic (Aco-1, Idh-2, Mdh-2, and 6Pgd-2). The authors also present some conservation
actions focused on maintaining population size and gene flow, in addition to preserving the
habitat of the species.
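The indices P, A and He quoted for Stachys maritima are standard allozyme summary statistics. As a minimal sketch, assuming the usual definitions (P as the percentage of polymorphic loci, A as the mean number of alleles per locus, and He as the mean expected heterozygosity, 1 − Σp² per locus), they could be computed from per-locus allele frequencies as shown below. The function name and the allele frequencies are hypothetical and are not data from the chapter.

```python
def diversity_indices(loci):
    """Return (P, A, He) for one population.

    loci: list of per-locus allele-frequency lists, each summing to 1.
    P  = percentage of loci carrying more than one allele
    A  = mean number of alleles per locus
    He = expected heterozygosity, 1 - sum(p_i^2), averaged over loci
    """
    n_loci = len(loci)
    # Number of alleles actually present (frequency > 0) at each locus.
    allele_counts = [sum(1 for p in freqs if p > 0) for freqs in loci]
    polymorphic = sum(1 for count in allele_counts if count > 1)
    he_total = sum(1.0 - sum(p * p for p in freqs) for freqs in loci)
    return 100.0 * polymorphic / n_loci, sum(allele_counts) / n_loci, he_total / n_loci


# Hypothetical population: 19 loci, 18 fixed and one with a rare second allele.
example = [[1.0]] * 18 + [[0.9, 0.1]]
P, A, He = diversity_indices(example)
print(f"P = {P:.1f}%, A = {A:.2f}, He = {He:.3f}")
```

Studies often report these indices as means across populations, so values computed for a single population need not match a published summary.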
Chapter 5 - Since 1951, HeLa cancer cells, the first human cell line isolated from an
aggressive cervical adenocarcinoma of a thirty-year-old woman, have been serving scientists
around the globe. Since then, approximately eighty-five thousand scientific articles indexed by
the US National Library of Medicine, National Institutes of Health (PubMed), and one
hundred and thirteen thousand articles in the Web of Science (Thomson Reuters) have used
this immortalized cell line in the most diverse fields of the biomedical sciences.
Even with the advent of other immortalized human cancer cell lines in the following
decades, the HeLa line is still used as a good cell model in hundreds of papers published
annually.
In the same decade, the structure of deoxyribonucleic acid (DNA) was determined by
Watson and Crick and, coincidentally or not, since then the HeLa cell line has also been used
as a cellular model for studies of DNA damage by many different agents and of DNA repair
through different biochemical pathways. The wide diversity of physical, chemical and
biological DNA stressors produces a limited number of types of DNA lesions, which are
removed or repaired by a small number of mechanisms: i) homologous recombination (HR),
ii) non-homologous end-joining (NHEJ), iii) nucleotide excision repair (NER), iv) base
excision repair (BER), v) mismatch repair (MMR), and vi) other, less common mechanisms
(such as interstrand cross-link, or ICL, repair).
In this communication, the authors describe some current scientometric analyses of the
use of the HeLa cell line in the DNA damage and repair field, as well as relevant reports
highlighting the importance of HeLa as a cellular model for each DNA repair mechanism
mentioned. Finally, the authors discuss the use of HeLa cells in their laboratory in the
tentative identification and characterization of atypical functions of other enzymes in the
maintenance of genomic stability.
Chapter 6 - The past decade has seen unprecedented improvements in the diagnosis and
management of polycythemia vera (PV), a blood cancer that is characterized by the
predominant expansion of morphologically normal red blood cells. A significantly increased
understanding of the molecular and cellular biology underlying this disorder has in large part
spearheaded these improvements. Numerous genetic studies have revealed that the vast
majority of PV patients have an acquired mutation within Janus kinase 2 (JAK2), a
cytoplasmic tyrosine kinase that is constitutively associated with cytokine receptors that lack
intrinsic kinase activity. Within the erythroid lineage, JAK2 is the predominant mediator of
intracellular signaling following activation of the erythropoietin receptor. Approximately 95%
of patients with PV have a “JAK2V617F” mutation, which is the result of a single nucleotide
change in exon 14 of the JAK2 gene; the remainder are instead positive for one of a series of
mutations collectively known as the JAK2 exon 12 mutations. All of these variants cause
constitutive activation of JAK2-mediated intracellular signaling in vitro and in vivo, and are
associated with the erythropoietin-hypersensitive expansion of erythroid progenitors that is a
hallmark feature of PV. In various transgenic mouse models, expression of mutant JAK2
generates an erythrocytosis phenotype that is remarkably similar to that of PV in humans,
suggesting that the acquisition of a JAK2 mutation alone is sufficient to initiate the
development of this myeloproliferative neoplasm (MPN). Accordingly, several inhibitors of
JAK2 activity have been evaluated in patients with PV, with one recently being approved by
the US Food and Drug Administration.
Chapter 7 - MicroRNAs (miRNAs) are a type of highly conserved, short non-coding
RNAs that play key roles in the post-transcriptional repression of their mRNA targets. The
expression of these regulators is critical for basic cellular mechanisms including cell
development, differentiation, proliferation, migration and apoptosis. Extracellular circulating
miRNAs can be packed in exosomes, microvesicles, or lipoprotein complexes involved in
distant post-transcriptional regulation. The detection of these molecules
can be facilitated by the use of high throughput technologies such as NGS and microarrays.
Recent studies in the field provide clear evidence that changes in the expression profile of
specific miRNAs might be critical for the pathogenesis of many disorders. MiRNAs have
been linked with human diseases such as cancer, cardiac disorders and pregnancy related
complications. Profiling patterns of miRNA expression could facilitate the characterization of
a range of diseases as early as the presymptomatic stage. Thus, these molecules can be
introduced as novel tools in stage-specific genetic diagnosis of pathological conditions.
Chapter 8 - Papaya (Carica papaya) is a perennial herbaceous plant native to tropical
America that belongs to the family Caricaceae. It is one of the most important fruit crops and
is widely distributed in tropical and subtropical countries. Viral diseases are of major
economic importance worldwide because they reduce papaya production, fruit quality and
plant longevity. The main virus species that cause economically important diseases in papaya
worldwide are Papaya ringspot virus biotype Papaya (PRSV-P), genus Potyvirus; Papaya
lethal yellowing virus (PLYV), genus Sobemovirus; Papaya leaf distortion mosaic virus
(PLDMV), genus Potyvirus; Papaya apical necrosis virus (PANV), genus Rhabdovirus;
Papaya mosaic virus (PMV), genus Potexvirus; Tomato spotted wilt virus (TSWV), genus
Tospovirus; Papaya mild yellow leaf virus (PMYLV), genus Tenuivirus; and Papaya meleira
virus (PMeV), not yet classified by the ICTV. They can cause losses of up to 100% in
productivity and strongly decrease the commercial quality of the fruits, and in advanced
stages some of the diseases can lead to plant death. Some of these viruses are highly stable
and/or easily disseminated by insect vectors or mechanically. Infected plants cannot be cured,
and roguing is the main recommendation for eliminating sources of inoculum. Propagation of
healthy material, use of tolerant or resistant plants, weekly inspections and roguing, and
management of insect vectors are considered important measures for the success of this crop;
however, field reality hampers their adoption. Here, the authors summarize the main
characteristics of the papaya viral diseases, with emphasis on PRSV-P, PLYV, PMeV and
PLDMV, which are considered to have the highest incidence worldwide and to be of
economic importance. In addition, recent advances in papaya-virus molecular interactions are
reviewed, with the aim of stimulating critical thinking and the search for novel frontiers.
Chapter 9 - The authors analyzed mitochondrial genes (D-loop and Cyt-b) to compare the
genetic structure and phylogeography of the capybara (Hydrochoerus hydrochaeris, n = 78)
and the paca (Cuniculus paca, n = 120). The two species presented very high levels of gene
diversity for both mitochondrial markers, but the paca yielded higher levels than the
capybara. The capybara showed a noteworthy and significant amount of genetic heterogeneity
among different populations, although the mt D-loop gene was more useful in differentiating
the populations than was mt Cyt-b. In contrast, the paca yielded low levels of gene
heterogeneity among different populations. In this case, both mitochondrial genes had
inconspicuous and similar genetic heterogeneities. Bayesian estimates of female effective
numbers indicated higher values for the paca than for the capybara. For both species, mt
Cyt-b yielded higher effective sizes than did mt D-loop. Similarly, the Bayesian gene flow
estimates were considerably greater among paca populations than among capybara
populations. Different analyses revealed population expansions in both species. Only the
capybara population of Northern Colombia showed some evidence of a population bottleneck.
An isolation-by-distance analysis showed a strongly positive and significant relationship
between genetic and geographic distances for the capybara, whereas among paca populations
there was no significant relationship. The authors’ phylogenetic analyses suggest that the
capybara is affected by geographical barriers. This agrees quite well with the fact that the
dispersion of the capybara is restricted by the existence of rivers. The authors’ results did not
support similar findings for the paca, and therefore could not confirm any putative ESUs or
subspecies for the paca. Furthermore, the authors’ results suggest that the mitochondrial
haplotype splits of both species occurred during the Miocene, but were older in the capybara
than in the paca. In the case of the capybara, the original focus of dispersion seems to be the
Western Amazon, whereas for the paca this origin is not clear. Although many authors
consider the trans-Andean capybara population to be a different species (H. isthmius), the
authors’ molecular results suggest that this population is a geographical subspecies.
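The isolation-by-distance comparison described for the capybara and the paca can be illustrated with a small sketch. This summary does not say which test the authors used; a permutation Mantel test between a genetic and a geographic distance matrix is one common choice and is an assumption here. The function names, sampling positions and distance values below are invented for illustration and are not data from the chapter.

```python
import random
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def mantel(gen, geo, permutations=999, seed=0):
    """One-sided permutation Mantel test for two symmetric distance matrices.

    Returns (r, p): the observed correlation between the upper triangles and
    the proportion of permutations with a correlation at least as large.
    """
    n = len(gen)
    pairs = list(combinations(range(n), 2))
    geo_vec = [geo[i][j] for i, j in pairs]
    observed = pearson([gen[i][j] for i, j in pairs], geo_vec)
    rng = random.Random(seed)
    order = list(range(n))
    hits = 0
    for _ in range(permutations):
        # Relabel populations in the genetic matrix (rows and columns together).
        rng.shuffle(order)
        permuted = [gen[order[i]][order[j]] for i, j in pairs]
        if pearson(permuted, geo_vec) >= observed:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)


# Hypothetical example: five populations sampled along a river (arbitrary units).
positions = [0.0, 1.0, 2.0, 4.0, 7.0]
geo = [[abs(a - b) for b in positions] for a in positions]
# Toy genetic distances that grow with geographic distance plus a small offset.
gen = [[0.0 if i == j else 0.02 * geo[i][j] + 0.001 * ((i + j) % 3)
        for j in range(5)] for i in range(5)]
r, p = mantel(gen, geo)
print(f"Mantel r = {r:.3f}, one-sided p = {p:.3f}")
```

A strongly positive r with a small p would correspond to the pattern reported for the capybara, and an r near zero to the pattern reported for the paca.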
Chapter 10 - Next-generation platforms provide high-throughput sequencing, but the
resulting data must be evaluated for quality to increase the accuracy of analyses. This
considerable amount of data has benefited omics studies in prokaryotes, such as genomics,
metagenomics and transcriptomics. In genomics, it has become possible to assemble whole
genomes, which has allowed biological processes to be understood and the evolution of sets
of genes to be studied using bioinformatics software and pipelines. Another field affected by
next-generation sequencing (NGS) data is the study of microbial ecology, which can be
addressed by the Whole Metagenome Sequencing (WMS) approach or through the
sequencing of molecular markers, such as the 16S rRNA gene, to determine microbial
diversity. Advances have also been made in transcriptomics, allowing functional elements
such as mRNAs, non-coding RNAs and small RNAs to be understood. For transcriptome
analysis, two methods are commonly used: the reference-based approach, to determine the
expression level of each gene, and the de novo approach, which enables the identification of
unknown elements in the genome. Throughout this chapter the authors present the
fundamental concepts behind these omics analyses based on NGS data, emphasizing their
main applications and the most used computational tools.

In: Advances in Genetics Research. Volume 16 ISBN: 978-1-63484-262-4
Editor: Kevin V. Urbano © 2016 Nova Science Publishers, Inc.
Chapter 1
RECENT INSIGHTS INTO THE Y CHROMOSOME
ROLE IN HUMAN EVOLUTION, BIOMEDICINE
AND FORENSIC GENETICS RESEARCH
Carolina Núñez Domingo

BIOMICs Research Group, University of the Basque Country UPV/EHU, Avda.
Miguel de Unamuno, Vitoria-Gasteiz, Spain
ABSTRACT
The human Y chromosome has unique features that seem to work against its usefulness in
genetics research. It is one of the smallest chromosomes in the genome, it is not essential
for life, it contains few genes while the rest consists of highly repeated polymorphic DNA
sequences, and most of it escapes recombination. Nevertheless, the Y chromosome has
become a key tool for investigating Human Evolution and has specific roles in
Biomedicine and Forensic Genetics. Recently, research focused on the Y chromosome
has uncovered new evidence regarding the evolutionary and migratory history of human
populations. The largest study of Native South American individuals based on the
analysis of Y chromosome genetic markers was conducted, yielding interesting
hypotheses about the origin of some male lineages. Additionally, other studies have
obtained new clues from the Y chromosome regarding the first peopling of South
America. Moreover, as data from disease association studies have accumulated, the
importance of the Y chromosome in male health has become more evident. Several
conditions associated with the male-specific region of the Y chromosome (MSY), such as
prostate cancer, graft-versus-host disease (GVHD), autism, non-syndromic speech delay,
and gender differences in the pathophysiology of new-onset heart failure (HF), have been
reported in the past few years. Finally, the implementation of new rapidly mutating Y
chromosome short tandem repeat (STR) genetic markers is shaping the future of Y
chromosome forensic analysis for differentiating male relatives and paternal lineages.
This chapter addresses the most recent Y chromosome progress within the main fields of
genetics: Human Evolution, Biomedicine, and Forensic Genetics.

INTRODUCTION
In humans, the Y chromosome is approximately 60 Mb (million base pairs) long and
contains only 78 protein-coding genes (Skaletsky et al. 2003). The SRY gene
(sex-determining region Y), located on the Y chromosome, encodes a protein that triggers the
development of the testes and, through an extended hormonal pathway, causes a developing
fetus to become male (Sinclair et al. 1990).
Genomic studies have revealed that, apart from two pseudoautosomal regions (PARs)
located at the ends of the Y chromosome, no recombination with the homologous X
chromosome occurs during meiosis. The remaining 95% of the Y chromosome length is
non-recombining and male specific, and it is passed from father to son unchanged, except
when mutations occur. This region has been termed the male-specific region of the Y
chromosome or MSY (Skaletsky et al. 2003). The lack of recombination may be the reason
why there are relatively few genes on the Y chromosome. Without crossing over, mutations
within genes have little chance of being repaired or rectified and hence will be passed on to
the next generation.
In 2003, Skaletsky et al. reported for the first time the nucleotide sequence of the MSY
region. They observed that the MSY is a mosaic of heterochromatic sequences and three
classes of euchromatic sequences: X-transposed, X-degenerate and ampliconic (Figure 1).
The X-transposed sequences are 99% identical to the X chromosome, include only two genes,
and are the result of X-to-Y transposition episodes since the divergence of humans and
chimpanzees (Page et al. 1984). In contrast to the X-transposed sequences, the X-degenerate
segments are scattered along the MSY and include single-copy gene or pseudogene
homologues of 27 different X-linked genes, among them the SRY gene. The X-degenerate
region is a deteriorated version of the ancestral autosome that generated the Y chromosome,
and it includes genes that, unlike those located in the X-transposed regions, are well
conserved among primates. The third class of euchromatic sequences consists of highly
repetitive sequences known as ampliconic segments, which exhibit high similarity to other
MSY sequences. These segments have the highest gene density in the MSY (Table 1), and
their genes are often located in palindromes (Bachtrog 2013).
Figure 1. The male-specific region of the Y chromosome (MSY). a) Schematic representation of the Y
chromosome, including the pseudoautosomal and heterochromatic regions. b) Enlarged view of a
portion of the MSY. Three classes of euchromatic sequences (X-transposed, X-degenerate and
ampliconic), as well as heterochromatic sequences are shown.

The rest of the MSY is composed of heterochromatic regions. The human Y chromosome
contains a large heterochromatin block of approximately 40 Mb that spans most of the distal
arm (Figure 1). The centromeric region also holds heterochromatic sequences, as does a
small part of the proximal arm (Yq11.22), which contains more than 3,000 tandem repeats
(Skaletsky et al. 2003).
All the transcription units identified on the Y chromosome are located in euchromatic
sequences, while no evidence of transcription has been found in heterochromatic regions.
Overall, there are 60 genes on the MSY as reported in Ensembl v68 (Jangravi et al. 2013).
Most of these genes can be classified into two functional classes: those expressed throughout
the body, and those expressed exclusively in the testes (Lahn and Page 1997) (Table 1).
As part of the Chromosome-centric Human Proteome Project (C-HPP), initiated in 2012,
researchers began mapping the MSY with the ultimate aim of increasing the understanding of
the functions associated with all proteins encoded by the Y chromosome. Moreover, this
project could establish a basis for the development of diagnostic, prognostic, therapeutic, and
preventive medical applications (Jangravi et al. 2013).
Recently, Jangravi et al. (2013) presented the most up-to-date research on MSY
protein-encoding genes and their association with health disorders. Researchers believe that
MSY proteins may play important roles in various human traits and diseases, mostly as a
consequence of alterations in the biological processes in which they take part, i.e.,
spermatogenesis, transcription, cell differentiation, miscellaneous processes, gonad
development, metabolic processes, tissue development, nucleosome assembly, chromatin
modification, single fertilization, translation, sex differentiation, cell adhesion, RNA
metabolism, cell proliferation and sex determination.
Y CHROMOSOME INFERTILITY
Y chromosome infertility is a condition that affects the production of sperm, making it
difficult or impossible for an affected man to have children. Sperm cells may be absent
(azoospermia), reduced in number (oligozoospermia), or abnormal in their characteristics.
Y chromosome infertility occurs in approximately 1 in 2,000 to 1 in 3,000 males of all ethnic
groups (www.ghr.nlm.gov). Although the role of the Y chromosome has been widely
investigated (for review, see Matzuk and Lamb 2008; Vogt 2005), all the knowledge acquired
so far is based on microdeletion research. Studies have detected microdeletions in the long
arm (Yq) in up to 18% of men with idiopathic azoospermia and 5-10% of those with
oligozoospermia (Foresta et al. 2001; Krausz and McElreavey 1999). Although Y
chromosome microdeletions in Yq11, which harbours the azoospermia factor (AZF) regions,
represent an important molecular genetic cause of infertility in men, a notable proportion of
idiopathic azoospermic and severely oligozoospermic men do not have deletions (Krausz et
al. 2003). Therefore, other related factors affecting Y-specific genes could contribute to the
infertile phenotype. The expression of several Y chromosome protein-coding genes is related
to the spermatogenesis process, making them potential research targets for investigating male
infertility. VCY2 (variable charge, Y chromosome, 2) is a testis-specific gene located in the
most frequently deleted azoospermia factor region.

Table 1. Information on 22 MSY genes associated with diseases (Skaletsky et al. 2003; www.hupo.ir)
Gene symbol | MSY region | Locus | Nextprot ID | Tissue expression | Description | Association with disease
BPY2 | Ampliconic | Yq11 | NX_O14599 | Testis-specific | Testis-specific basic protein Y2 | Male infertility, prostate cancer
CDY1 | Ampliconic | Yq11.23 | NX_Q9Y6F8 | Testis-specific | Testis-specific chromodomain protein Y 1 | Male infertility
DAZ1, 2, 3 and 4 | Ampliconic | Yq11.23 | NX_Q9NQZ3, NX_Q13117, NX_Q9NR90, NX_Q86SG3 | Testis-specific | Deleted in azoospermia protein 1, 2, 3 and 4 | Male infertility
DDX3Y | X-degenerate | Yq11 | NX_O15523 | Ubiquitous | ATP-dependent RNA helicase DDX3Y | Male infertility, chronic graft-versus-host disease, heart failure, prostate cancer
EIF1AY | X-degenerate | Yq11.222 | NX_O14602 | Ubiquitous | Eukaryotic translation initiation factor 1A, Y-chromosomal (eIF-1A Y isoform) | Male infertility, chronic graft-versus-host disease, heart failure, prostate cancer, secondary recurrent miscarriage
HSFY | Ampliconic | Yq11.221 | NX_Q96LI6 | Testis-specific | Heat shock transcription factor, Y-linked | Male infertility
KDM5D | X-degenerate | Yq11 | NX_Q9BY66 | Ubiquitous | Lysine-specific demethylase 5D | Male infertility, prostate cancer
NLGN4Y | X-degenerate | Yq11.221 | NX_Q8NFZ3 | Fetal and adult brain, prostate, testis | Neuroligin-4, Y-linked | Autism
PCDH11Y | X-transposed | Yp11.2 | NX_Q9BZA8 | Fetal brain, brain | Protocadherin-11 Y-linked | Non-syndromic language delay, prostate cancer
PRKY | X-degenerate | Yp11.2 | NX_O43930 | Ubiquitous | Putative serine/threonine-protein kinase PRKY | Prostate cancer
PRY | Ampliconic | Yq11.222 | NX_O14603 | Testis-specific | PTPN13-like protein, Y-linked | Prostate cancer
RBMY1A1, B, D, E, and F | Ampliconic | Yq11.23 | NX_P0DJD3, NX_A6NDE4, NX_P0C7P1, NX_A6NEQ0, NX_Q15415 | Testis-specific | RNA-binding motif protein, Y chromosome, family 1 members A1, B, D, E and F | Prostate cancer, male infertility, liver cancer
RPS4Y1 | X-degenerate | Yp11.3 | NX_P22090 | Ubiquitous | 40S ribosomal protein S4, Y isoform 1 | Chronic graft-versus-host disease, heart failure, prostate cancer, secondary recurrent miscarriage
SRY | X-degenerate | Yp11.3 | NX_Q05066 | Testis, brain | Sex-determining region Y protein | Prostate cancer, sex reversal, male infertility
TBL1Y | X-degenerate | Yp11.2 | NX_Q9BQ87 | Fetal brain, prostate | F-box-like/WD repeat-containing protein TBL1Y | Autism, non-syndromic coarctation of the aorta
TMSB4Y | X-degenerate | Yq11.221 | NX_O14604 | Ubiquitous | Thymosin beta-4, Y-chromosomal | Chronic graft-versus-host disease, prostate cancer
TSPY1-4, 8 and 10 | Ampliconic | Yp11.2 | NX_Q01534, NX_A6NKD2, NX_P0CV98, NX_P0CV99, NX_P0CW00, NX_P0CW01 | Testis-specific | Testis-specific Y-encoded protein 1-4, 8 and 10 | Male infertility, prostate cancer, gonadoblastoma, hepatocellular carcinoma, testis germ cell tumor
USP9Y | X-degenerate | Yq11.2 | NX_O00507 | Ubiquitous | Probable ubiquitin carboxyl-terminal hydrolase FAF-Y | Male infertility, chronic graft-versus-host disease, heart failure, prostate cancer
UTY | X-degenerate | Yq11 | NX_O14607 | Ubiquitous | Histone demethylase UTY | Chronic graft-versus-host disease, prostate cancer, secondary recurrent miscarriage
VCY1 | Ampliconic | Yq11.221 | NX_O14598 | Testis-specific | Testis-specific basic protein Y1 | Prostate cancer
XKRY | Ampliconic | Yq11.221 | NX_O14609 | Testis-specific | Testis-specific XK-related protein, Y-linked | Prostate cancer
ZFY | X-degenerate | Yp11.3 | NX_P08048 | Ubiquitous | Zinc finger Y-chromosomal protein | Prostate cancer, chronic graft-versus-host disease

A decrease in the expression of VCY2 in spermatogonia and a lack of expression in
spermatocytes have been observed in testicular biopsy specimens with maturation arrest or
hypospermatogenesis (Tse et al. 2003).
This impaired expression of VCY2 in infertile men suggests its involvement in the
pathogenesis of male infertility. Further research has focused on the expression of the deleted
in azoospermia protein (DAZ), testis-specific Y-encoded protein (TSPY), ATP-dependent
RNA helicase (DDX3Y), heat shock transcription factor, Y-linked (HSFY), eukaryotic
translation initiation factor 1A, Y-chromosomal (EIF1AY) and RNA-binding motif protein, Y
chromosome (RBMY) (Ferlin et al. 2010; Kleiman et al. 2007; Kuo et al. 2004; Lardone et al.
2007; Lavery et al. 2007; Sato et al. 2006; Song et al. 2007). The expression of these genes
was diminished or absent in subjects with impaired spermatogenesis, suggesting some degree
of association with male infertility.
Intriguing hypotheses have also been proposed regarding the relationship between
spermatogenesis and different Y chromosome lineages, with controversial results.
Two studies examined the association between Y chromosome haplotypes and deletions
leading to male infertility in individuals from different parts of Europe and found no
statistically significant evidence (Paracchini et al. 2000; Quintana-Murci et al. 2001).
Another study by Carvalho et al. (2003) investigated the association between Y chromosome
haplogroups (groups of geographically-specific haplotypes) and male infertility in Japanese
oligozoospermic and azoospermic men and found no significant differences between cases
and controls. In contrast, Arredi et al. (2007) detected a higher frequency of haplogroup E in
north Italian patients carrying microdeletions. A more recent study of non-obstructive
azoospermia cases and controls in the Han Chinese population revealed a significant
predisposition to the condition in Y chromosome haplogroup K*, while haplogroup O3e*
might have a protective effect (Lu et al. 2013). The authors hypothesized that the
susceptibility of haplogroup K* to azoospermia could be attributed to an increase in the
DAZ gene copy number, and that this over-dosage may be a potential risk factor for
spermatogenic impairment.
These association studies should be interpreted with particular caution, since different
factors may influence their interpretation, such as the geographical structure of Y chromosome
variation in the population investigated, and the sample size and selection criteria of the case
and control groups. Hence, further investigations are required in order to reach more
definitive and accurate conclusions in this respect.
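As a purely illustrative sketch of the kind of case-control comparison discussed above, the
following Python snippet applies a two-sided Fisher's exact test to the carrier counts of a
single haplogroup in cases and controls. The function name and the counts in the usage example
are hypothetical and are not taken from any of the cited studies. The test does not account
for population structure, selection criteria or multiple testing, which is one reason why such
results call for caution.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    a: cases carrying the haplogroup      b: cases not carrying it
    c: controls carrying the haplogroup   d: controls not carrying it
    """
    row1, row2 = a + b, c + d          # number of cases, number of controls
    col1 = a + c                       # total number of carriers
    total_tables = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of observing x carriers among the cases
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Sum the probabilities of all tables no more likely than the observed one
    # (small relative tolerance guards against floating-point ties)
    p_value = sum(p for p in (prob(x) for x in range(lo, hi + 1))
                  if p <= p_obs * (1 + 1e-9))
    return min(1.0, p_value)


# Hypothetical counts: 30 of 200 cases and 15 of 200 controls carry the haplogroup
p = fisher_exact_two_sided(30, 170, 15, 185)
print(f"two-sided p-value: {p:.4f}")
```

With small samples or rare haplogroups, a test of this kind has little power, so a
non-significant result should not be read as evidence of no association.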
PROSTATE CANCER
Prostate cancer is a complex polygenic disorder and one of the leading causes of
cancer-related mortality in men (Siegel et al. 2013). Some Y chromosome genes are involved in
critical functions in male organs such as the testis and prostate gland. Consequently, it is
reasonable to think that these genes might be implicated in some way in male cancers.
Indeed, there is evidence of an association between different forms of tumors and
structural changes and differential gene expression of the Y chromosome (Bianchi et al. 2006).
Generally, loss of Y chromosome (LOY) material has been reported as one of the most
common aberrations in prostate cancer (Sandberg 1992).
Recently, Khosravi et al. (2014) conducted an analysis of candidate genes related to
prostate cancer. They identified 19 genes putatively involved in this type of cancer, 12 of
which had been previously reported (Lau and Zhang 2000; Perinchery et al. 2000). The
remaining 7 genes (USP9X, SF1, BCR, HEXA, TNFRSF25, GPI and OLFM1) were novel
candidates that, based on their observations, might play critical parts in prostate cancer. The
testis-specific protein Y-encoded (TSPY) gene has also been linked to prostate cancer. It has
been demonstrated that it is preferentially expressed in tumor cells in gonadoblastoma and
testicular germ cell tumors, suggesting an involvement in prostatic oncogenesis besides its
possible role in gonadoblastoma and testicular seminoma (Kido and Lau 2014; Lau et al.
2003). Forsberg et al. (2014) reported that the loss of chromosome Y in peripheral
blood is related to shorter survival and higher risk of cancer. They studied peripheral
blood DNA from 1,153 individuals aged 70-83 years and discovered that the most
frequent somatic variant was LOY. When they further studied men with a high degree of LOY, they
observed an increased risk of all-cause mortality as well as cancer-related mortality, and a
survival time 5.5 years shorter than that of controls.
The loss of chromosome Y in normal hematopoietic cells of elderly men was already
reported in 1963 (Jacobs et al. 1963), although its clinical consequences remained unclear.
Since then, little research has been reported on Y chromosome aneuploidy in ageing males.
Only recently have several studies shown that sex chromosome aneuploidy such as
LOY is common in normal aging and often occurs at high frequency in tumors (Jacobs et al.
2013; Veiga et al. 2012). Apart from age, another key factor reported to be related to LOY
is smoking. A comprehensive study disclosed that current smokers had a significantly
increased level of LOY mosaicism compared with nonsmokers and past smokers (Dumanski
et al. 2015). Whether LOY induced by smoking is directly related to cancer is not yet known.
Moreover, some research has determined a higher risk of cancer outside the respiratory tract
in male than in female smokers (Jha and Peto 2014). This, together with the fact that men
have a higher prevalence of and mortality from non-sex-specific cancers, suggests that
LOY, a male-specific, smoking-induced risk factor, might help to explain these sex
differences.
GRAFT-VERSUS-HOST DISEASE (GVHD)
Graft-versus-host disease (GVHD) is a common complication following an allogeneic
transplant of bone marrow or stem cells, in which immune cells (T cells) in the graft recognize
the recipient (host) as foreign and attack the host’s body cells.
Minor histocompatibility antigens (mHAs) are known targets of donor T lymphocytes
after allogeneic hematopoietic stem cell transplantation (HSCT). Among this group are the
male-specific H-Y antigens encoded by genes on the Y chromosome. Although mHAs are also
encoded by autosomal genes, many are known to be H-Y-derived peptides presented by either
HLA class I or class II molecules (Miklos et al. 2005). The H-Y genes have high sequence
identity with their homologues on the X chromosome. While males develop tolerance to these
antigens, female T cells are able to recognize H-Y antigens following an allogeneic
transplantation into male recipients (Wang et al. 1995). The effect of the H-Y antigen system
has been demonstrated in bone marrow transplants, where it has been observed that male
recipients of female bone marrow have a higher rate of graft-versus-host disease, while female
recipients of male grafts have a higher risk of graft rejection than male-to-male or
female-to-female transplants (Gahrton et al. 2005; Miklos et al. 2005). Several studies have
examined the association between the antibody response to H-Y antigens and GVHD, identifying
responses against antigens such as DDX3Y, EIF1AY, RPS4Y1, TMSB4Y, USP9Y, UTY and
ZFY (Miklos et al. 2005; Torikai et al. 2004; Toubai et al. 2012; Vogt et al. 2000).
AUTISM AND NON-SYNDROMIC LANGUAGE DELAY
Males are at least four times more likely to develop autism than females (Skuse 2000).
Autism is a highly heritable disorder, but despite the well-recognized gender difference, male
predisposition to autistic disorder and the role of sex chromosomes remain unexplained.
The role of the Y chromosome in this matter has been investigated because structural
abnormalities and aneuploidy have been reported in boys with pervasive developmental
disorders (PDDs) or autism (Blackman et al. 1991; Nicolson et al. 1998). Additionally, males
with a 47,XYY karyotype exhibit a high incidence of cognitive, language, and behavioral
deficits (Fryns et al. 1995).
Several Y-chromosome genes are expressed in the central nervous system (Table 1),
making them candidates for the establishment of the sexual dimorphism of the human brain.
PCDH11 is a gene of the cadherin superfamily, present on both sex chromosomes and
predominantly expressed in fetal brain and spinal cord. A genetic investigation of a male
child with non-syndromic language delay revealed a 220 kb deletion in the
Protocadherin11X/Y gene pair (Speevak and Farrell 2011). This observation led to
the hypothesis that PCDH11X/Y genes play a part in language development in humans and that
alterations in their copy number are a possible cause of communication disorders (Speevak and
Farrell 2011). The effect of other genes such as Y-linked transducin β-like 1 (TBL1Y) and
neuroligin 4 (NLGN4Y) in autism has also been investigated (Serajee and Mahbubul Huq
2009). In particular, researchers have identified different haplotypes based on single
nucleotide polymorphisms (SNPs) in these genes that were overrepresented or
underrepresented in autistic participants.
Although the human Y chromosome was long considered a wasteland,
it has now become clear that its importance goes beyond its role in sex determination and
reproduction. The implications of the Y chromosome in male disease have been well documented;
however, the molecular mechanisms of several disorders still remain unclear. Consequently,
further genetic research on the male chromosome should be conducted in order to aid the
understanding, diagnosis and prognosis of gender-associated diseases.
Y CHROMOSOME AND HUMAN EVOLUTIONARY HISTORY. PEOPLING OF SOUTH AMERICA
The Y-linked loci in the MSY region are haploid, paternally inherited and escape
recombination with the X chromosome. Due to these particular characteristics, variation
arises through the accumulation of mutations along generations and paternal lineages. The
field of population genetics has broadly studied human male lineages to trace past migrations
and reconstruct human evolutionary history (for review see Jobling and Tyler-Smith 2003).
Several studies have focused on the peopling of the American continent, especially on
the Beringian hypothesis and the subsequent dispersal of humans into North America.
Nevertheless, little research has addressed the later migration to South America. So
far, most of the scientific community agrees that the initial settlement of the Americas was a
relatively fast process, according to studies dating archaeological samples from Paleo-Indian
populations in North and South America. In particular, among the archaeological evidence of
pre-Clovis cultures, Monte Verde in Chile has an estimated age of 14,220-13,980 YBP (Dillehay
et al. 2008), the Manis site in North America dates back to 13,860-13,765 YBP (Waters et al.
2011) and the Paisley Caves in Oregon to ~12,450 YBP (Jenkins et al. 2012). Additionally,
further South American excavation sites have led to the idea that humans had already spread
all over South America by 12,000 YBP (Salzano and Callegari-Jacques 1988; Silverman and
Isbell 2008).
Recent clues on the origin and spread of the first Native Americans have been provided
by the analysis of the entire mitochondrial genome (Bodner et al. 2012; de Saint Pierre et al.
2012; Kumar et al. 2011; Perego et al. 2012), although several studies have also focused on
the Y chromosome as an important parallel source of information.
South American natives are basically represented by a founder Y chromosome lineage
defined by the Y-SNP M3, typically classified as haplogroup Q1a3a (Bortolini et al. 2003;
Karafet et al. 2008). In fact, most chromosomes that belong to paragroup Q* in South
America are Q1a3*-M346 (Bisso-Machado et al. 2011). Haplogroup Q1a3a is widely
distributed throughout South America, while, strikingly, its sub-lineages show a very
restricted distribution (Bisso-Machado et al. 2011).
Battaglia et al. (2013) recently examined in depth the genetic history of Central and South
America using Y chromosome variation and investigating exclusively haplogroup Q. Their
analysis of 463 Y chromosomes revealed two main founding lineages, Q-M3 and
Q-L54(xM3), along with two novel M3 sister sub-clades, Q-PV3 and Q-PV4, and two M3
sub-lineages, Q-M557 and Q-PV2. These new M3 sub-lineages were found in the Andean region
(Peru), while the M3 sister sub-clades were present at a very low frequency in Mexico. With
these findings, the authors supported the hypothesis of the arrival in Mesoamerica of the two Q
founding lineages, where Mexico would have been the recipient of the first migration wave.
Moreover, the Q-M3 and Q-L54 lineages displayed different patterns of evolution in the Mexican
and Andean areas, reflecting local differentiation after their arrival in Mexico and then,
along the Pacific coast, in the Andean region. Similar observations had previously been
reported by Sandoval et al. (2012), who suggested that Mexico might have constituted an area
of transition in the diversification of paternal lineages during the colonization of the
Americas.
In the past few years, the largest population genetic study of South American natives to
date has been reported, also offering interesting insights into South American demographic
history (Roewer et al. 2013). Contrary to the situation in Europe or Asia, genetic variation
among native South American males is not correlated with geography or language. This
decoupling would be the consequence of a rapid colonization of the wide sub-continent
followed by isolation and evolution in small population groups.

Figure 2. Simplified phylogenetic tree of Y-chromosome haplogroup Q including newly described
Q-M3 sub-lineages. Underlined sub-lineage PV2 still has to be verified. Haplogroups downstream
Q1a3*-M346 are called by their mutation nomenclature. References are provided for Q-M3
sub-lineages.
In order to increase the phylogenetic resolution of the major haplogroup Q found in the
Americas by detecting new SNPs, different analyses of large portions of the Y chromosome
have been performed. In this way, novel SNPs have been identified, establishing more
sub-lineages of Q-M3 (Figure 2). Apart from Q-M557, described by Battaglia et al. (2013),
sub-lineages Q-SA01, found in the Peruvian Quechua (Jota et al. 2011), and Q-MG2, detected
among the Kichwa from Ecuador (Geppert et al. 2015), were discovered. Sub-lineage Q-PV2
(Battaglia et al. 2013) still has to be verified since it was only present in one individual. All
these Q-M3 sub-lineages displayed a very restricted distribution, supporting the notion that
they are likely population/tribe/region-specific (Bisso-Machado et al. 2011).
The introduction of new SNPs in the Y chromosome Q phylogeny, such as Q-L53,
Q-L54 and Q-CTS11969 (van Oven et al. 2014) (Figure 2), would slightly alter the haplogroup
nomenclature downstream of Q-M346, and no consensus has yet been reached. Therefore,
the former haplogroup Q1a3a (Q-M3) (Karafet et al. 2008) would become Q1a3a1a and its
sub-lineages would start with Q1a3a1a1 (M19) (Battaglia et al. 2013). The increasing number of
Y-SNPs being discovered and ascertained forces the haplogroup nomenclature to be
continuously updated. Consequently, further high-coverage Y chromosome population studies
are required in order to capture and verify all Y chromosome variability and, eventually,
reach a final consensus nomenclature.
Besides haplogroup Q, Native Americans also present a low frequency of the C3b sub-clade
(Y-SNP P39), found only in North America (Karafet et al. 2008; Zegura et al. 2004). On the
other hand, the more ancient lineage C3* has only been detected in Waorani and Kichwa
populations from Ecuador (Geppert et al. 2011) and in a Tlingit individual from Southeast
Alaska (Schurr et al. 2012). This lineage occurs at high frequency in Asia, but no evidence
of haplogroup C in Central America has so far been gathered (Sandoval et al. 2012;
Zegura et al. 2004).
The discovery of C3* in Ecuadorian Waorani and Kichwa from South America pointed
to three possible sources (Mezzavilla et al. 2015). First, recent admixture with East Asian
populations could have occurred. Another possible interpretation would be the entry
of C3* into America 15-20 kya as another founding lineage. In this scenario it could have been
lost by genetic drift in all populations except in Ecuador, where it is detected at low levels.
Finally, C3* could have been introduced directly into South America from East Asia through
a coastal route at an intermediate time. After whole genome analysis of further Waorani and
Kichwa, as well as Japanese samples, the authors excluded recent admixture and found no
evidence of direct gene flow from Japan to Ecuador. Thus, C3* Y chromosomes in Ecuador might
represent an ancestral lineage present among the first Native American settlers that would
have disappeared by genetic drift elsewhere.
Since next generation sequencing (NGS) technologies became available, their application
to sequencing large parts of the MSY has been very useful for providing unbiased ascertainment
of SNPs and detailed phylogenies. In recent years, different studies have pursued this
goal, yielding thousands of high-confidence SNPs covering all known clades (Francalacci
et al. 2013; Hallast et al. 2015; Poznik et al. 2013; Scozzari et al. 2014; Wei et al. 2013). This
approach will benefit the Y chromosome phylogeny since it allows the direct assessment of
the times-to-most-recent-common-ancestor (TMRCAs) of nodes. Likewise, all this
information will provide new insights into human evolutionary history and worldwide
population relationships from the patrilineal point of view.
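To illustrate why unbiased SNP ascertainment makes TMRCA estimation direct, the following
Python sketch divides the average number of derived mutations accumulated between a node and
its descendant chromosomes by the expected number of mutations per year over the sequenced
region. This is a deliberately simplified estimator, not the method used in the studies cited
above; the function name and all numerical values are hypothetical, and real analyses also
model rate uncertainty, coverage and tree shape.

```python
def tmrca_years(derived_counts, mutation_rate_per_bp_per_year, callable_bp):
    """Rough TMRCA of a node from mutation counts on its descendant lineages.

    derived_counts: number of derived SNPs between the node and each sampled
                    chromosome descending from it
    mutation_rate_per_bp_per_year: assumed point mutation rate (hypothetical input)
    callable_bp: number of base pairs reliably sequenced in every sample
    """
    if not derived_counts:
        raise ValueError("at least one descendant lineage is required")
    mean_mutations = sum(derived_counts) / len(derived_counts)
    # Expected mutations per year across the whole sequenced region
    mutations_per_year = mutation_rate_per_bp_per_year * callable_bp
    return mean_mutations / mutations_per_year


# Hypothetical example: three chromosomes, 10 Mb sequenced, assumed rate of 1e-9
estimate = tmrca_years([10, 12, 8], 1e-9, 10_000_000)
print(f"estimated TMRCA: {estimate:.0f} years")
```

The estimate scales inversely with the assumed mutation rate, so any uncertainty in that
rate carries over proportionally to the inferred node age.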
RECENT PROGRESS OF Y CHROMOSOME ANALYSIS IN FORENSIC GENETICS
Progress in Y chromosome analysis in the field of forensic genetics has
mainly been directed at increasing the discrimination power by developing new multiplex
genotyping systems with highly diverse genetic markers.
The Y chromosome markers typically used in forensic applications are short tandem
repeats (STRs). The analysis of Y-STRs is an excellent tool for detecting the sperm contribution
in swabs collected from sexual assault cases (Honda et al. 1999). It is also valuable for
identifying male lineages in missing persons and paternity cases (Jobling et al. 1997). For all
these purposes, the higher the discrimination power of the markers the better, since it means a
higher probability of differentiating among Y chromosome lineages. Moreover, it might
even be possible to differentiate among close relatives within the same patrilineal lineage using
the appropriate set of Y-STRs.
The sequencing of the complete euchromatic region of the Y chromosome (Skaletsky et
al. 2003), as well as the advent of NGS technologies, has made the search for
good Y-STR candidates more attainable. A study by Ballantyne et al. (2010) identified 167
previously unknown Y-STRs, 13 of which were proposed as rapidly mutating (RM) markers. They
analyzed these 13 RM Y-STRs in a sample set and evaluated their suitability for
distinguishing close and distantly related males. The RM Y-STR set was able to differentiate
70% of father-son pairs, 56% of brothers, and 67% of cousins. Additionally, relatives
separated by more than 11 generations were differentiable by one or more mutations with the
RM Y-STR set. These results are considerably higher than those obtained with other available
Y-STR kits. Thus, this set of 13 RM Y-STRs could shift forensic Y chromosome analysis
from the former male lineage differentiation to future male individual identification
(Ballantyne et al. 2010; 2012) in some forensic applications.
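The reasoning behind rapidly mutating markers can be sketched with a simple probability
calculation: if each locus mutates independently with a given per-generation rate, the chance
that two related males differ at one or more loci grows with both the number of loci and the
number of meioses separating them. The Python snippet below is an idealized illustration
only. The function name is hypothetical, the mutation rates are placeholder values rather
than the rates measured by Ballantyne et al. (2010), and back or parallel mutations are
ignored, so the output should not be expected to reproduce the empirical percentages quoted
above.

```python
from math import prod


def prob_differentiated(mutation_rates, meioses):
    """Probability that two relatives differ at one or more Y-STR loci.

    mutation_rates: per-locus, per-generation mutation rates (assumed independent)
    meioses: number of father-to-son transmissions separating the two males
             (1 for father-son, 2 for brothers, 4 for first cousins)
    """
    # Probability that no locus mutates in any of the separating transmissions
    p_no_mutation = prod((1.0 - mu) ** meioses for mu in mutation_rates)
    return 1.0 - p_no_mutation


# Placeholder rates for a 13-locus set (hypothetical values, not measured ones)
hypothetical_rates = [0.02] * 13
for label, m in [("father-son", 1), ("brothers", 2), ("first cousins", 4)]:
    print(f"{label}: {prob_differentiated(hypothetical_rates, m):.2f}")
```

The same function shows why slowly mutating loci are preferred when the goal is to confirm
membership of a lineage: with low rates, the probability of a mismatch between true
relatives stays small.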
Conversely, the use of RM Y-STRs should be avoided when studying male lineages in
missing person cases and paternity investigations, as it would complicate the interpretation.
For such cases, a combination of Y-STR loci with lower mutation rates would be desirable.
Recently, a 23 Y-STR system was developed, combining 17 loci commonly included in other
commercially available kits (DYS389I, DYS448, DYS389II, DYS19, DYS391, DYS438,
DYS437, DYS635, DYS390, DYS439, DYS392, DYS393, DYS458, DYS385a/b, DYS456,
and YGATAH4) and six new loci (DYS481, DYS549, DYS533, DYS643, DYS576, and
DYS570), with only the last two being RM loci (Thompson et al. 2013). The first studies using
this new tool revealed markedly increased haplotype diversity and discriminatory capacity
in comparison with other marker sets (Coble et al. 2013; Davis et al. 2013; Turrina et al.
2014), proving it to be highly beneficial for male identification in forensic casework.
In fact, a worldwide collaborative effort has allowed the
comprehensive analysis of these 23 Y-STRs in nearly 20,000 Y chromosomes from 129
populations in 51 countries (Purps et al. 2014). This study reaffirmed the high discriminative
resolution of the 23 Y-STR marker set since, remarkably, in almost one third of the
populations studied all haplotypes were unique. Moreover, the huge amount of
genetic information on these 23 Y-STRs obtained through such a large population
study will be of great importance for forensic calculations.
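The two summary statistics mentioned above can be sketched in a few lines of Python. The
snippet assumes the commonly used definitions, namely Nei's unbiased haplotype diversity,
n/(n-1) x (1 - sum of squared haplotype frequencies), and discrimination capacity as the
number of distinct haplotypes divided by the sample size. The function name and the toy
haplotypes are hypothetical and do not come from the cited datasets.

```python
from collections import Counter


def haplotype_stats(haplotypes):
    """Summary statistics for a sample of Y-STR haplotypes.

    haplotypes: list of hashable haplotypes, e.g. tuples of allele calls
    Returns haplotype diversity, discrimination capacity and singleton count.
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("at least two haplotypes are required")
    counts = Counter(haplotypes)
    sum_sq_freq = sum((c / n) ** 2 for c in counts.values())
    return {
        # Nei's unbiased estimator of haplotype diversity
        "haplotype_diversity": n / (n - 1) * (1.0 - sum_sq_freq),
        # Fraction of the sample represented by distinct haplotypes
        "discrimination_capacity": len(counts) / n,
        # Haplotypes observed exactly once
        "unique_haplotypes": sum(1 for c in counts.values() if c == 1),
    }


# Toy sample with two loci per haplotype (hypothetical allele calls)
sample = [(14, 13), (14, 13), (15, 13), (16, 12)]
print(haplotype_stats(sample))
```

When every haplotype in a sample is unique, as reported for almost one third of the
populations in the collaborative study, both statistics equal 1 under these definitions.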
CONCLUSION
The increasing knowledge of the Y chromosome genome and proteome has led
scientists to abandon the idea of the Y chromosome as a functional wasteland. In fact, it is
responsible for important biological roles such as sex determination and male fertility. Its
implication in different dysfunctions and cancers has also been demonstrated, so understanding
the Y chromosome could help to improve men's health. The use of Y chromosome
variability to disentangle recent human evolution and trace human migrations has been
widely exploited. In this sense, recent studies have focused on the South American native
populations, since their history is still not completely understood. The advent of next
generation sequencing technologies has been a major advance for Y chromosome anthropological
investigations.
The identification of novel Y-SNPs by these means has provided more resolution in the
Y chromosome phylogeny, revealing new sub-lineages. Finally, massive Y chromosome
genetic studies have also ascertained new Y-STR loci which could revolutionize the field of
Y chromosome forensic analysis by making possible the differentiation of closely related
males from the same lineage.
REFERENCES
Arredi B, Ferlin A, Speltra E, Bedin C, Zuccarello D, Ganz F, Marchina E, Stuppia L, Krausz
C, and Foresta C. (2007). Y-chromosome haplogroups and susceptibility to azoospermia
factor c microdeletion in an Italian population. Journal of medical genetics 44(3):205-
208.
Bachtrog D. (2013). Y-chromosome evolution: emerging insights into processes of Y-
chromosome degeneration. Nature reviews Genetics 14(2):113-124.
Ballantyne KN, Goedbloed M, Fang R, Schaap O, Lao O, Wollstein A, Choi Y, van Duijn K,
Vermeulen M, Brauer S et al. (2010). Mutability of Y-chromosomal microsatellites:
rates, characteristics, molecular bases, and forensic implications. American journal of
human genetics 87(3):341-353.
Ballantyne KN, Keerl V, Wollstein A, Choi Y, Zuniga SB, Ralf A, Vermeulen M, de Knijff
P, and Kayser M. (2012). A new future of forensic Y-chromosome analysis: rapidly
mutating Y-STRs for differentiating male relatives and paternal lineages. Forensic
science international Genetics 6(2):208-218.
Battaglia V, Grugni V, Perego UA, Angerhofer N, Gomez-Palmieri JE, Woodward SR,
Achilli A, Myres N, Torroni A, and Semino O. (2013). The first peopling of South
America: new evidence from Y-chromosome haplogroup Q. PloS one 8(8):e71390.
Bianchi NO, Richard SM, and Pavicic W. (2006). Y chromosome instability in testicular
cancer. Mutation research 612(3):172-188.
Bisso-Machado R, Jota MS, Ramallo V, Paixao-Cortes VR, Lacerda DR, Salzano FM,
Bonatto SL, Santos FR, and Bortolini MC. (2011). Distribution of Y-chromosome Q
lineages in Native Americans. American journal of human biology : the official journal of
the Human Biology Council 23(4):563-566.
Blackman JA, Selzer SC, Patil S, and Van Dyke DC. (1991). Autistic disorder associated with
an iso-dicentric Y chromosome. Developmental medicine and child neurology 33(2):162-
166.
Bodner M, Perego UA, Huber G, Fendt L, Rock AW, Zimmermann B, Olivieri A, Gomez-
Carballa A, Lancioni H, Angerhofer N et al. (2012). Rapid coastal spread of First
Americans: novel insights from South America's Southern Cone mitochondrial genomes.
Genome research 22(5):811-820.
Bortolini MC, Salzano FM, Thomas MG, Stuart S, Nasanen SP, Bau CH, Hutz MH, Layrisse
Z, Petzl-Erler ML, Tsuneto LT et al. (2003). Y-chromosome evidence for differing
ancient demographic histories in the Americas. American journal of human genetics
73(3):524-539.
Carvalho CM, Fujisawa M, Shirakawa T, Gotoh A, Kamidono S, Freitas Paulo T, Santos SE,
Rocha J, Pena SD, and Santos FR. (2003). Lack of association between Y chromosome
haplogroups and male infertility in Japanese men. American journal of medical genetics
Part A 116A(2):152-158.
Coble MD, Hill CR, and Butler JM. (2013). Haplotype data for 23 Y-chromosome markers in
four U.S. population groups. Forensic science international Genetics 7(3):e66-68.
Davis C, Ge J, Sprecher C, Chidambaram A, Thompson J, Ewing M, Fulmer P, Rabbach D,
Storts D, and Budowle B. (2013). Prototype PowerPlex(R) Y23 System: A concordance
study. Forensic science international Genetics 7(1):204-208.
de Saint Pierre M, Bravi CM, Motti JM, Fuku N, Tanaka M, Llop E, Bonatto SL, and Moraga
M. (2012). An alternative model for the early peopling of southern South America
revealed by analyses of three mitochondrial DNA haplogroups. PloS one 7(9):e43486.
Dillehay TD, Ramirez C, Pino M, Collins MB, Rossen J, and Pino-Navarro JD. (2008).
Monte Verde: seaweed, food, medicine, and the peopling of South America. Science
320(5877):784-786.
Dumanski JP, Rasi C, Lonn M, Davies H, Ingelsson M, Giedraitis V, Lannfelt L, Magnusson
PK, Lindgren CM, Morris AP et al. (2015). Mutagenesis. Smoking is associated with
mosaic loss of chromosome Y. Science 347(6217):81-83.
Ferlin A, Speltra E, Patassini C, Pati MA, Garolla A, Caretta N, and Foresta C. (2010). Heat
shock protein and heat shock factor expression in sperm: relation to oligozoospermia and
varicocele. The Journal of urology 183(3):1248-1252.
Foresta C, Moro E, and Ferlin A. (2001). Y chromosome microdeletions and alterations of
spermatogenesis. Endocrine reviews 22(2):226-239.
Forsberg LA, Rasi C, Malmqvist N, Davies H, Pasupulati S, Pakalapati G, Sandgren J, Diaz
de Stahl T, Zaghlool A, Giedraitis V et al. (2014). Mosaic loss of chromosome Y in
peripheral blood is associated with shorter survival and higher risk of cancer. Nat. Genet.
46(6):624-628.
Francalacci P, Morelli L, Angius A, Berutti R, Reinier F, Atzeni R, Pilu R, Busonero F,
Maschio A, Zara I et al. (2013). Low-pass DNA sequencing of 1200 Sardinians
reconstructs European Y-chromosome phylogeny. Science 341(6145):565-569.
Fryns JP, Kleczkowska A, Kubien E, and Van den Berghe H. (1995). XYY syndrome and
other Y chromosome polysomies. Mental status and psychosocial functioning. Genetic
counseling 6(3):197-206.
Gahrton G, Iacobelli S, Apperley J, Bandini G, Bjorkstrand B, Blade J, Boiron JM, Cavo M,
Cornelissen J, Corradini P et al. (2005). The impact of donor gender on outcome of
allogeneic hematopoietic stem cell transplantation for multiple myeloma: reduced relapse
risk in female to male transplants. Bone marrow transplantation 35(6):609-617.
Geppert M, Ayub Q, Xue Y, Santos S, Ribeiro-dos-Santos A, Baeta M, Nunez C, Martinez-
Jarreta B, Tyler-Smith C, and Roewer L. (2015). Identification of new SNPs in native
South American populations by resequencing the Y chromosome. Forensic science
international Genetics 15:111-114.
Geppert M, Baeta M, Nunez C, Martinez-Jarreta B, Zweynert S, Cruz OW, Gonzalez-
Andrade F, Gonzalez-Solorzano J, Nagy M, and Roewer L. (2011). Hierarchical Y-SNP
assay to study the hidden diversity and phylogenetic relationship of native populations in
South America. Forensic science international Genetics 5(2):100-104.
Hallast P, Batini C, Zadik D, Maisano Delser P, Wetton JH, Arroyo-Pardo E, Cavalleri GL,
de Knijff P, Destro Bisol G, Dupuy BM et al. (2015). The Y-Chromosome Tree Bursts
into Leaf: 13,000 High-Confidence SNPs Covering the Majority of Known Clades.
Molecular biology and evolution 32(3):661-673.
Honda K, Roewer L, and de Knijff P. (1999). Male DNA typing from 25-year-old vaginal
swabs using Y chromosomal STR polymorphisms in a retrial request case. Journal of
forensic sciences 44(4):868-872.
Jacobs PA, Brunton M, Court Brown WM, Doll R, and Goldstein H. (1963). Change of
human chromosome count distribution with age: evidence for a sex difference. Nature
197:1080-1081.
Jacobs PA, Maloney V, Cooke R, Crolla JA, Ashworth A, and Swerdlow AJ. (2013). Male
breast cancer, age and sex chromosome aneuploidy. British journal of cancer 108(4):959-
963.
Jangravi Z, Alikhani M, Arefnezhad B, Sharifi Tabar M, Taleahmad S, Karamzadeh R,
Jadaliha M, Mousavi SA, Ahmadi Rastegar D, Parsamatin P et al. (2013). A fresh look at
the male-specific region of the human Y chromosome. Journal of proteome research
12(1):6-22.
Jenkins DL, Davis LG, Stafford TW, Jr., Campos PF, Hockett B, Jones GT, Cummings LS,
Yost C, Connolly TJ, Yohe RM, 2nd et al. (2012). Clovis age Western Stemmed
projectile points and human coprolites at the Paisley Caves. Science 337(6091):223-228.
Jha P, and Peto R. (2014). Global effects of smoking, of quitting, and of taxing tobacco. The
New England journal of medicine 370(1):60-68.
Jobling MA, Pandya A, and Tyler-Smith C. (1997). The Y chromosome in forensic analysis
and paternity testing. International journal of legal medicine 110(3):118-124.
Jobling MA, and Tyler-Smith C. (2003). The human Y chromosome: an evolutionary marker
comes of age. Nature reviews Genetics 4(8):598-612.
Jota MS, Lacerda DR, Sandoval JR, Vieira PP, Santos-Lopes SS, Bisso-Machado R, Paixao-
Cortes VR, Revollo S, Paz YMC, Fujita R et al. (2011). A new subhaplogroup of native
American Y-Chromosomes from the Andes. American journal of physical anthropology
146(4):553-559.
Karafet TM, Mendez FL, Meilerman MB, Underhill PA, Zegura SL, and Hammer MF.
(2008). New binary polymorphisms reshape and increase resolution of the human Y
chromosomal haplogroup tree. Genome research 18(5):830-838.
Khosravi P, Zahiri J, Gazestani VH, Mirkhalaf S, Akbarzadeh M, Sadeghi M, and Goliaei B.
(2014). Analysis of candidate genes has proposed the role of y chromosome in human
prostate cancer. Iranian journal of cancer prevention 7(4):204-211.
Kido T, and Lau YF. (2014). The Y-located gonadoblastoma gene TSPY amplifies its own
expression through a positive feedback loop in prostate cancer cells. Biochemical and
biophysical research communications 446(1):206-211.
Kleiman SE, Yogev L, Hauser R, Botchan A, Maymon BB, Paz G, and Yavetz H. (2007).
Expression profile of AZF genes in testicular biopsies of azoospermic men. Human
reproduction 22(1):151-158.
Krausz C, Forti G, and McElreavey K. (2003). The Y chromosome and male fertility and
infertility. International journal of andrology 26(2):70-75.
Krausz C, and McElreavey K. (1999). Y chromosome and male infertility. Frontiers in
bioscience : a journal and virtual library 4:E1-8.

Kumar S, Bellis C, Zlojutro M, Melton PE, Blangero J, and Curran JE. (2011). Large scale
mitochondrial sequencing in Mexican Americans suggests a reappraisal of Native
American origins. BMC evolutionary biology 11:293.
Kuo PL, Wang ST, Lin YM, Lin YH, Teng YN, and Hsu CC. (2004). Expression profiles of
the DAZ gene family in human testis with and without spermatogenic failure. Fertility
and sterility 81(4):1034-1040.
Lahn BT, and Page DC. (1997). Functional coherence of the human Y chromosome. Science
278(5338):675-680.
Lardone MC, Parodi DA, Valdevenito R, Ebensperger M, Piottante A, Madariaga M, Smith
R, Pommer R, Zambrano N, and Castro A. (2007). Quantification of DDX3Y, RBMY1,
DAZ and TSPY mRNAs in testes of patients with severe impairment of spermatogenesis.
Molecular human reproduction 13(10):705-712.
Lau YF, Lau HW, and Komuves LG. (2003). Expression pattern of a gonadoblastoma
candidate gene suggests a role of the Y chromosome in prostate cancer. Cytogenetic and
genome research 101(3-4):250-260.
Lau YF, and Zhang J. (2000). Expression analysis of thirty one Y chromosome genes in
human prostate cancer. Molecular carcinogenesis 27(4):308-321.
Lavery R, Glennon M, Houghton J, Nolan A, Egan D, and Maher M. (2007). Investigation of
DAZ and RBMY1 gene expression in human testis by quantitative real-time PCR.
Archives of andrology 53(2):71-73.
Lu C, Wang Y, Zhang F, Lu F, Xu M, Qin Y, Wu W, Li S, Song L, Yang S et al. (2013).
DAZ duplications confer the predisposition of Y chromosome haplogroup K* to non-
obstructive azoospermia in Han Chinese populations. Human reproduction 28(9):2440-
2449.
Matzuk MM, and Lamb DJ. (2008). The biology of infertility: research advances and clinical
challenges. Nature medicine 14(11):1197-1213.
Mezzavilla M, Geppert M, Tyler-Smith C, Roewer L, and Xue Y. (2015). Insights into the
origin of rare haplogroup C3* Y chromosomes in South America from high-density
autosomal SNP genotyping. Forensic science international Genetics 15:115-120.
Miklos DB, Kim HT, Miller KH, Guo L, Zorn E, Lee SJ, Hochberg EP, Wu CJ, Alyea EP,
Cutler C et al. (2005). Antibody responses to H-Y minor histocompatibility antigens
correlate with chronic graft-versus-host disease and disease remission. Blood
105(7):2973-2978.
Nicolson R, Bhalerao S, and Sloman L. (1998). 47,XYY karyotypes and pervasive
developmental disorders. Canadian journal of psychiatry Revue canadienne de
psychiatrie 43(6):619-622.
Page DC, Harper ME, Love J, and Botstein D. (1984). Occurrence of a transposition from the
X-chromosome long arm to the Y-chromosome short arm during human evolution.
Nature 311(5982):119-123.
Paracchini S, Stuppia L, Gatta V, Palka G, Moro E, Foresta C, Mengua L, Oliva R, Ballesca
JL, Kremer JA et al. (2000). Y-chromosomal DNA haplotypes in infertile European
males carrying Y-microdeletions. Journal of endocrinological investigation 23(10):671-
676.
Perego UA, Lancioni H, Tribaldos M, Angerhofer N, Ekins JE, Olivieri A, Woodward SR,
Pascale JM, Cooke R, Motta J et al. (2012). Decrypting the mitochondrial gene pool of
modern Panamanians. PloS one 7(6):e38337.

Perinchery G, Sasaki M, Angan A, Kumar V, Carroll P, and Dahiya R. (2000). Deletion of Y-
chromosome specific genes in human prostate cancer. The Journal of urology
163(4):1339-1342.
Poznik GD, Henn BM, Yee MC, Sliwerska E, Euskirchen GM, Lin AA, Snyder M, Quintana-
Murci L, Kidd JM, Underhill PA et al. (2013). Sequencing Y chromosomes resolves
discrepancy in time to common ancestor of males versus females. Science
341(6145):562-565.
Purps J, Siegert S, Willuweit S, Nagy M, Alves C, Salazar R, Angustia SM, Santos LH,
Anslinger K, Bayer B et al. (2014). A global analysis of Y-chromosomal haplotype
diversity for 23 STR loci. Forensic science international Genetics 12:12-23.
Quintana-Murci L, Krausz C, Heyer E, Gromoll J, Seifer I, Barton DE, Barrett T, Skakkebaek
NE, Rajpert-De Meyts E, Mitchell M et al. (2001). The relationship between Y
chromosome DNA haplotypes and Y chromosome deletions leading to male infertility.
Human genetics 108(1):55-58.
Roewer L, Nothnagel M, Gusmao L, Gomes V, Gonzalez M, Corach D, Sala A, Alechine E,
Palha T, Santos N et al. (2013). Continent-wide decoupling of Y-chromosomal genetic
variation from language and geography in native South Americans. PLoS Genet
9(4):e1003460.
Salzano FM, and Callegari-Jacques SM. 1988. South American Indians: A case study in
evolution. New York: Oxford University Press.
Sandberg AA. (1992). Chromosomal abnormalities and related events in prostate cancer.
Human pathology 23(4):368-380.
Sandoval K, Moreno-Estrada A, Mendizabal I, Underhill PA, Lopez-Valenzuela M,
Penaloza-Espinosa R, Lopez-Lopez M, Buentello-Malo L, Avelino H, Calafell F et al.
(2012). Y-chromosome diversity in Native Mexicans reveals continental transition of
genetic structure in the Americas. American journal of physical anthropology
148(3):395-405.
Sato Y, Yoshida K, Shinka T, Nozawa S, Nakahori Y, and Iwamoto T. (2006). Altered
expression pattern of heat shock transcription factor, Y chromosome (HSFY) may be
related to altered differentiation of spermatogenic cells in testes with deteriorated
spermatogenesis. Fertility and sterility 86(3):612-618.
Schurr TG, Dulik MC, Owings AC, Zhadanov SI, Gaieski JB, Vilar MG, Ramos J, Moss MB,
Natkong F, and Genographic C. (2012). Clan, language, and migration history has shaped
genetic diversity in Haida and Tlingit populations from Southeast Alaska. American
journal of physical anthropology 148(3):422-435.
Scozzari R, Massaia A, Trombetta B, Bellusci G, Myres NM, Novelletto A, and Cruciani F.
(2014). An unbiased resource of novel SNP markers provides a new chronology for the
human Y chromosome and reveals a deep phylogenetic structure in Africa. Genome
research 24(3):535-544.
Serajee FJ, and Mahbubul Huq AH. (2009). Association of Y chromosome haplotypes with
autism. Journal of child neurology 24(10):1258-1261.
Siegel R, Naishadham D, and Jemal A. (2013). Cancer statistics, 2013. CA: a cancer journal
for clinicians 63(1):11-30.
Silverman H, and Isbell W. 2008. The Handbook of South American Archaeology. New
York: Springer.

Sinclair AH, Berta P, Palmer MS, Hawkins JR, Griffiths BL, Smith MJ, Foster JW, Frischauf
AM, Lovell-Badge R, and Goodfellow PN. (1990). A gene from the human sex-
determining region encodes a protein with homology to a conserved DNA-binding motif.
Nature 346(6281):240-244.
Skaletsky H, Kuroda-Kawaguchi T, Minx PJ, Cordum HS, Hillier L, Brown LG, Repping S,
Pyntikova T, Ali J, Bieri T et al. (2003). The male-specific region of the human Y
chromosome is a mosaic of discrete sequence classes. Nature 423(6942):825-837.
Skuse DH. (2000). Imprinting, the X-chromosome, and the male brain: explaining sex
differences in the liability to autism. Pediatric research 47(1):9-16.
Song NH, Yin CJ, Zhang W, Zhuo ZM, Ding GX, Zhang J, Hua LX, and Wu HF. (2007).
AZF gene expression analysis in peripheral leukocytes and testicular cells from
idiopathic infertility. Archives of andrology 53(6):317-324.
Speevak MD, and Farrell SA. (2011). Non-syndromic language delay in a child with
disruption in the Protocadherin11X/Y gene pair. American journal of medical genetics
Part B, Neuropsychiatric genetics : the official publication of the International Society of
Psychiatric Genetics 156B(4):484-489.
Thompson JM, Ewing MM, Frank WE, Pogemiller JJ, Nolde CA, Koehler DJ, Shaffer AM,
Rabbach DR, Fulmer PM, Sprecher CJ et al. (2013). Developmental validation of the
PowerPlex(R) Y23 System: a single multiplex Y-STR analysis system for casework and
database samples. Forensic science international Genetics 7(2):240-250.
Torikai H, Akatsuka Y, Miyazaki M, Warren EH, 3rd, Oba T, Tsujimura K, Motoyoshi K,
Morishima Y, Kodera Y, Kuzushima K et al. (2004). A novel HLA-A*3303-restricted
minor histocompatibility antigen encoded by an unconventional open reading frame of
human TMSB4Y gene. Journal of immunology 173(11):7046-7054.
Toubai T, Tawara I, Sun Y, Liu C, Nieves E, Evers R, Friedman T, Korngold R, and Reddy
P. (2012). Induction of acute GVHD by sex-mismatched H-Y antigens in the absence of
functional radiosensitive host hematopoietic-derived antigen-presenting cells. Blood
119(16):3844-3853.
Tse JY, Wong EY, Cheung AN, O WS, Tam PC, and Yeung WS. (2003). Specific expression
of VCY2 in human male germ cells and its involvement in the pathogenesis of male
infertility. Biology of reproduction 69(3):746-751.
Turrina S, Caratti S, Ferrian M, and De Leo D. (2014). Haplotype data and mutation rates for
the 23 Y-STR loci of PowerPlex(R) Y 23 System in a Northeast Italian population
sample. International journal of legal medicine.
van Oven M, Van Geystelen A, Kayser M, Decorte R, and Larmuseau MH. (2014). Seeing
the wood for the trees: a minimal reference phylogeny for the human Y chromosome.
Human mutation 35(2):187-191.
Veiga LC, Bergamo NA, Reis PP, Kowalski LP, and Rogatto SR. (2012). Loss of Y-
chromosome does not correlate with age at onset of head and neck carcinoma: a case-
control study. Brazilian journal of medical and biological research = Revista brasileira de
pesquisas medicas e biologicas / Sociedade Brasileira de Biofisica [et al] 45(2):172-178.
Vogt MH, de Paus RA, Voogt PJ, Willemze R, and Falkenburg JH. (2000). DFFRY codes for
a new human male-specific minor transplantation antigen involved in bone marrow graft
rejection. Blood 95(3):1100-1105.

Vogt PH. (2005). Azoospermia factor (AZF) in Yq11: towards a molecular understanding of
its function for human male fertility and spermatogenesis. Reproductive biomedicine
online 10(1):81-93.
Wang W, Meadows LR, den Haan JM, Sherman NE, Chen Y, Blokland E, Shabanowitz J,
Agulnik AI, Hendrickson RC, Bishop CE et al. (1995). Human H-Y: a male-specific
histocompatibility antigen derived from the SMCY protein. Science 269(5230):1588-
1590.
Waters MR, Stafford TW, Jr., McDonald HG, Gustafson C, Rasmussen M, Cappellini E,
Olsen JV, Szklarczyk D, Jensen LJ, Gilbert MT et al. (2011). Pre-Clovis mastodon
hunting 13,800 years ago at the Manis site, Washington. Science 334(6054):351-353.
Wei W, Ayub Q, Chen Y, McCarthy S, Hou Y, Carbone I, Xue Y, and Tyler-Smith C.
(2013). A calibrated human Y-chromosomal phylogeny based on resequencing. Genome
research 23(2):388-395.
Zegura SL, Karafet TM, Zhivotovsky LA, and Hammer MF. (2004). High-resolution SNPs
and microsatellite haplotypes point to a single, recent entry of Native American Y
chromosomes into the Americas. Molecular biology and evolution 21(1):164-175.

Chapter 2
MALE INFERTILITY ASSOCIATED WITH TTTY GENE FAMILY DELETIONS IN THE Y CHROMOSOME
Yapijakis Christos* and Papadimitriou Katerina
ABSTRACT
Infertility is a significant problem worldwide, affecting about 10-15% of couples.
Male factors account for about 50% of these cases. It is well known that deletions of
human Y-chromosomal regions are responsible for disorders of spermatogenesis. These
regions are collectively known as the azoospermia factor (AZF) region, which comprises at
least three genetic domains in the long arm of the Y chromosome, named AZFa, AZFb,
and AZFc. Within the AZF region there are well-studied genes such as DAZ (Deleted in
azoospermia), as well as some members of the TTTY gene family, which are known to be
transcribed in the testis and have been linked to sperm quality. Several studies have shown
that some members of this large gene family within the AZF region, including the TTTY2L2A,
TTTY3 and TTTY4 genes, have also been associated with both deletions and impaired
spermatogenesis. Recently, deletions in two genes of the TTTY2 multicopy Y-linked gene
family, TTTY2L12A and TTTY2L2A, have been reported in the same patients with
oligozoospermia and azoospermia. Since the former gene is located in the short arm of the Y
chromosome (Yp) and the latter in its long arm (Yq), these observations suggest possible
non-homologous recombination events. The TTTY2L12A and TTTY2L2A genes, in addition to the
TTTY1, TTTY2, TTTY6, TTTY7, TTTY8, TTTY18, TTTY19, TTTY21 and TTTY22 genes,
located in both Y chromosome arms, are transcribed into non-coding RNAs (ncRNAs).
Several studies have demonstrated that ncRNAs are essential for the normal production
of spermatozoa. Possible mechanisms of function of long ncRNAs, as in the case of the
TTTY2L12A and TTTY2L2A gene transcripts, are discussed.
* Corresponding author: Prof. Christos Yapijakis, DMD, MS, PhD, 1st Department of Neurology, University of
Athens Medical School, Eginition Hospital, Vas. Sofias 74, Athens 11528, Greece. Email:
cyapijakis_ua_gr@yahoo.com.

1. MALE INFERTILITY ASSOCIATED WITH Y CHROMOSOME DELETIONS
1.1. Causes of Male Infertility
About 15% of couples in developed countries are not able to conceive after multiple
attempts within one year and seek treatment for infertility (Ferlin et al., 2007; O’Flynn
O’Brien et al., 2010; Jungwirth et al., 2013). In approximately 50% of these cases, the male
partner has qualitative or quantitative abnormalities of sperm production (Jungwirth et al.,
2013).
A variety of factors, both environmental and genetic, have been found to play a
significant role in male infertility. The first reported factor was a strong association between
the duration of exposure to dibromochloropropane (DBCP) and a decline in sperm count, leading to
oligo- and/or azoospermia (Whorton et al., 1977). Other environmental risk
factors include high-temperature workplaces, noise related to manufacturing, and exposure to
radiation, chemical substances and electromagnetic waves (EMW) (Miyamoto et al., 2012). It
has also been suggested that EMW emitted from cellular phones may adversely alter
sperm parameters such as sperm count, motility, viability and morphology, or may lead to oxidative
stress, which negatively affects motility and vitality and promotes DNA fragmentation in spermatozoa
(Agarwal et al., 2008; De Iuliis et al., 2009). Other proposed risk factors include air
pollution, prolonged daily automobile driving, stress, alcoholism and mumps, while there is good
evidence that obesity is associated with lower sperm concentrations and a higher DNA
fragmentation index (Hammoud et al., 2008; Miyamoto et al., 2012).
Genetic causes such as chromosomal abnormalities, Y chromosome deletions and gene
mutations are found in about 15%-30% of men with infertility problems (Ferlin et al.,
2007; O’Flynn O’Brien et al., 2010). Genetic abnormalities appear to affect essential
physiological processes such as hormonal homeostasis and spermatogenesis (Ferlin et al., 2007;
O’Flynn O’Brien et al., 2010). The most common chromosomal abnormality in infertile men
is Klinefelter syndrome (KS), which is an XXY aneuploidy (Ferlin et al., 2007; O’Flynn
O’Brien et al., 2010; Jungwirth et al., 2013). KS is found in about 5% of oligospermic and
10% of azoospermic men. Another chromosomal
abnormality involves Robertsonian translocations, which occur in 1 out of 1000 male subjects
and are found in about 2% of infertile men (O’Flynn O’Brien et al., 2010). Y chromosome
microdeletions constitute the most frequent source of genetic abnormalities in infertile men
and are detected mostly in severely oligozoospermic and azoospermic men. In particular,
deletions in the azoospermia factor (AZF) regions in the long arm of the Y chromosome (Yq11) are
responsible for impaired spermatogenesis leading to oligozoospermia or azoospermia.
Furthermore, several autosomal genes seem to be essential for normal spermatogenesis,
including the cystic fibrosis transmembrane conductance regulator (CFTR) gene, the sex
hormone-binding globulin (SHBG) gene and the leucine-rich repeat-containing G-protein coupled
receptor 8 (LGR8) gene (Ferlin et al., 2007; O’Flynn O’Brien et al., 2010).

1.2. Deletions in Azoospermia Factor (AZF) Regions of Y Chromosome
The human Y chromosome is much smaller than the X chromosome and consists of two
regions (Figure 1): a) the pseudoautosomal region (PAR) at the ends of its arms, which
pairs and recombines with the X chromosome during meiosis, and b) the non-recombining
(NRY) or male-specific (MSY) region, which differentiates the sexes (Marshall, 2000; Stuppia
et al., 2000).
The latter represents 95% of the Y chromosome’s length and contains genes important not only
for sex determination but also for spermatogenesis, and therefore for male fertility
(Marshall, 2000; Skaletsky et al., 2003).
Tiepolo and Zuffardi (1976) were the first to note the relation between deletions in Yq11
and azoospermia, suggesting that factors on the long arm of the Y chromosome may be
associated with spermatogenesis. Since then, deletions of the AZF region in Yq11 have been
frequently associated with spermatogenic failure: they are observed in 8-12% of
azoospermic and in 3-7% of oligozoospermic men (Vicdan et al., 2004; Ferlin et al., 2005;
Vogt et al., 2008; O’Flynn O’Brien et al., 2010; Jungwirth et al., 2013). Moreover, they are
likely to be related to Y chromosomal instability (Siffroi et al., 2000).
The AZF region in Yq11 encompasses a variety of genes involved in spermatogenesis. In
particular, there are three non-overlapping domains of AZF, known as AZFa, AZFb, and
AZFc, in which microdeletions have been identified in infertile men (Vogt et al., 2008;
O’Flynn O’Brien et al., 2010; Navarro-Costa et al., 2010). The highest frequency of Yq
microdeletions involves the AZFc region (65-70%), followed by AZFb, AZFb+c or AZFa+b+c
(25-30%), while deletions in AZFa comprise only 5% (Jungwirth et al., 2013).
AZFa deletions. AZFa contains two main genes, USP9Y and DDX3Y (Huynh et
al., 2002; Vogt et al., 2008; Navarro-Costa et al., 2010; O’Flynn O’Brien et al., 2010; Vineeth
et al., 2011). Complete deletion of this genetic domain is associated with the severe testicular
pathology of Sertoli cell-only syndrome, with no observed somatic phenotype (Vogt et al., 2005;
Navarro-Costa et al., 2010; O’Flynn O’Brien et al., 2010; Jungwirth et al., 2013). USP9Y is
implicated in spermatogenesis because mutations in this gene lead to oligozoospermia,
oligoasthenozoospermia or azoospermia, while loss of DDX3Y is thought to contribute
to the AZFa deletion phenotype (Huynh et al., 2002; Navarro-Costa et al., 2010; O’Flynn
O’Brien et al., 2010; Vineeth et al., 2011).
AZFb deletions. The AZFb region contains both single-copy (CYorf15, RPS4Y, EIF1AY,
KDM5D) and multicopy genes (XKRY, HSFY, PRY, RBMY1A1) (Navarro-Costa et al., 2010).
Complete deletion of this genetic domain is associated with spermatogenic arrest (Vogt et al., 2005;
Navarro-Costa et al., 2010; O’Flynn O’Brien et al., 2010; Jungwirth et al., 2013). Partial
deletions of genes in AZFb are implicated in male infertility due to severe non-obstructive
oligospermia and azoospermia (Ferlin et al., 2003). The PRY and RBMY1 multicopy genes seem to
be the most important, since spermatogenesis is arrested when both are removed
(O’Flynn O’Brien et al., 2010).

Figure 1. Schematic representation of the human Y chromosome. The loci of the two genes
(TTTY2L2A, TTTY2L12A) of the TTTY2 gene subfamily that are involved in spermatogenesis are
shown, as well as other genes of the three AZF regions. PAR 1, PAR 2: pseudoautosomal regions 1 & 2
at the telomeres of Y chromosome; NRY/MSY: non recombinant (NRY) or male specific (MSY)
region; AZFa, AZFb, AZFc: Azoospermia factor regions a, b, and c.
AZFc deletions. The AZFc region includes protein-coding gene families (BPY2, DAZ,
CDY, TSPY) and non-coding transcription units of the TTTY gene family (TTTY2, TTTY3,
TTTY4, TTTY7, TTTY17, TTTY18) (Makrinou et al., 2001; Skaletsky et al., 2003; Ferlin et al.,
2005; Vogt et al., 2008; Navarro-Costa et al., 2010; Singh et al., 2011). Complete removal of
the AZFc region leads to different phenotypes ranging from oligozoospermia to azoospermia
(Vogt et al., 2005; Navarro-Costa et al., 2010; O’Flynn O’Brien et al., 2010; Jungwirth et al.,
2013). AZFc deletions are the most common in cases of severe male infertility (Ferlin et al.,
2005; Navarro-Costa et al., 2010; O’Flynn O’Brien et al., 2010; Vineeth et al., 2011; Jungwirth
et al., 2013). Several studies have indicated that deletions in the deleted in azoospermia (DAZ)
gene family are the most important in this region. DAZ genes have been associated with the
formation of mature spermatozoa, and their deletions result in a variety of phenotypes
including moderate or severe oligozoospermia, as well as complete azoospermia (Ma et al.,
2000; Fernandes et al., 2002; Vicdan et al., 2004; Ferlin et al., 2005; Aarabi et al., 2009;
Navarro-Costa et al., 2010; Vineeth et al., 2011).

2. THE TESTIS TRANSCRIPT Y LINKED (TTTY) GENE FAMILY AND MALE INFERTILITY
2.1. The TTTY Gene Family
The testis transcript Y linked (TTTY) family is a large gene family of the Y chromosome
that is expressed in testis and is therefore probably associated with spermatogenesis. Little is
known so far about this poorly studied gene family. Lahn and Page (1997) first identified two
members of this family (the TTTY1 and TTTY2 genes), which were strongly expressed in testis.
Both genes consisted of repetitive DNA sequences that were transcribed but not translated
into proteins (Lahn and Page, 1997).
Four additional members of this family (TTTY3, TTTY4, TTTY5 and TTTY6), which were
also spliced, non-coding transcripts, were identified when the complete nucleotide sequence of
the AZFc domain was analyzed (Kuroda-Kawaguchi et al., 2001). Two other gene family
members (TTTY2L2A and TTTY2L12A), located on the long and short arm of the
Y chromosome respectively, were subsequently identified (Makrinou et al., 2001).
Skaletsky et al. (2003) specified more accurately the location and the identity of the
members of the TTTY gene family on the Y chromosome. They indicated that the TTTY family
belongs to one of the three classes of euchromatic sequences of the male-specific (MSY)
or non-recombining region (NRY) of the Y chromosome, called the ampliconic region. This
region encompasses an assortment of tandem arrays, of which the most distinct are the no
long open reading frame (NORF) clusters, containing 622 kb on both the short and the long
arm of the Y chromosome. NORF arrays comprise a great variety of spliced, non-coding
transcription units, such as the TTTY1, TTTY2, TTTY6, TTTY7, TTTY8, TTTY18, TTTY21 and
TTTY22 gene families, all expressed exclusively or partially in testis (Skaletsky et al., 2003). Some
members of the TTTY family (TTTY3, TTTY4, and TTTY2 subfamily members) are located
within the AZFc domain on Yq (Makrinou et al., 2001; Skaletsky et al., 2003; Navarro-Costa
et al., 2010) (Figure 1). Apart from humans, NORF transcripts have also been found to be
exclusively or partially expressed in cat testis (Murphy et al., 2006).
The epigenetic profile of a large number of genes in the euchromatic region of the
Y chromosome was scanned in order to study their functional status (Singh et al.,
2011). The members of the TTTY gene family presented a range of histone modifications.
Notably, the majority of these genes, including TTTY1, TTTY2, TTTY5, TTTY6B, TTTY7,
TTTY8, TTTY13, TTTY16, TTTY17A, TTTY17B, TTTY20, and TTTY21, presented a low level
of H3K9 methylation, while TTTY3, TTTY22, and TTTY23 were slightly more enriched in
H3K9 methylation (Singh et al., 2011). Since histone methylation is usually associated with
gene silencing, it could be suggested that these genes are usually silenced, or that they are
perhaps involved in regulating the suppression of other neighboring or more distant genes (Rajender
et al., 2011). Four genes (TTTY8, TTTY11, TTTY14, and TTTY19) presented both methylation
and acetylation of H3K9 (Singh et al., 2011). Histone acetylation is a chromatin modification
related to increased gene transcription (Rajender et al., 2011). Consequently, the expression of
these genes is probably influenced by other genes. Only two genes (TTTY10 and TTTY15)
were enriched in H3K9 acetylation, indicating that these two genes are constitutively
transcribed (Singh et al., 2011).

2.2. The TTTY2 Gene Subfamily
TTTY2 is a multicopy subfamily of the larger TTTY gene family of the
Y chromosome (Makrinou et al., 2001; Yapijakis et al., 2015). TTTY2 genes are non-coding
transcription units in tandem arrays that are strongly expressed mainly in testis (Lahn and Page,
1997; Makrinou et al., 2001). Like all members of the TTTY family, TTTY2 genes reside in
the ampliconic MSY (Makrinou et al., 2001; Skaletsky et al., 2003) (Figure 1).
The TTTY2 gene subfamily contains about 26 members, divided into 14 subgroups. All
TTTY2 genes consist of seven exons and six introns spanning about 17.9 kb. The size of
exons ranges from 79 bp to 1.8 kb, while introns range from 120 bp to 7.4 kb. Members of
each subgroup are about 93%-99% similar, whereas similarity between different subgroups is
55%-87%. There are two clusters of TTTY2 genes on the Y chromosome, the larger on
Yq within the AZFc region and the smaller on Yp (Makrinou et al., 2001). It has been
proposed that the observed pattern of TTTY2 genes resulted from multistage evolution,
involving one wave of duplications that gave rise to the subfamily founder member, followed
by a series of duplications that expanded the number of certain subgroup members, as well as
a major translocation event between Yq and Yp (Makrinou et al., 2001).
The function of TTTY2 genes is not clearly understood. Since they are abundantly
transcribed but do not possess an open reading frame, it is possible that they are either
non-coding RNAs with various actions or pseudogenes (Makrinou et al., 2001; Yapijakis et
al., 2015). The second possibility seems less likely, since their constitutive gene expression
suggests that their transcripts play a functional role.
The promoters of TTTY2 genes present specific histone methylation marks, especially an
enrichment in H3K9me3 (Singh et al., 2011). This observation suggests that H3K9me3 in
combination with a DNA methylation epigenetic mark is necessary for the suppression of
TTTY2 gene clusters (Singh et al., 2011).
2.3. TTTY2 Gene Subfamily Deletions are Associated with Male Infertility
There is evidence implicating two specific TTTY2 genes in spermatogenesis. Deletions of the
genes TTTY2L2A and TTTY2L12A have been detected in men with oligozoospermia and
azoospermia of unknown etiology.
Specifically, Yapijakis et al. (2015) studied TTTY2L2A and TTTY2L12A deletions in
three groups of infertile patients: a) with idiopathic moderate oligozoospermia, b) with
idiopathic oligozoospermia and azoospermia, and c) with oligozoospermia and azoospermia
of different known causes. The researchers found that up to 30% of patients in the two
idiopathic groups presented deletions in either one of the two genes. Surprisingly, 8% of
patients carried deletions in both the TTTY2L2A and TTTY2L12A genes, although they reside very
far apart, in the long and short arms of the Y chromosome, respectively (Makrinou et al., 2001;
Yapijakis et al., 2015). These data indicate that both genes are strongly associated with
spermatogenesis and imply a possible non-homologous recombination mechanism that may
generate Y chromosome instability leading to infertility.
The genes TTTY2L2A and TTTY2L12A are located on human Yq11 and Yp11, respectively
(Makrinou et al., 2001). In particular, TTTY2L2A spans more than 100 bp within the AZFc
region on Yq11, which is frequently deleted in infertile men (Yapijakis et al., 2015). The two
genes have also been identified at distinct locations on the Y chromosome long and short
arms of male primates such as the chimpanzee (Pan troglodytes), the bonobo (Pan
paniscus) and the gorilla (Gorilla gorilla) (Makrinou et al., 2001).
Both genes consist of seven exons and six introns, like all TTTY2 genes. The level
of exon identity of these two genes with the TTTY2 exons varies from 72% to 82% (Makrinou
et al., 2001). Their exon/intron structure is not consistent with the conventional AG-GT rule.
Moreover, they present about 10 exonic deletions and insertions similar to TTTY2, and span
26.5 kb and 29 kb respectively. Apart from testis, the two genes are also expressed in lung
and kidney (Makrinou et al., 2001). It has been speculated that the two genes are either in
transition to becoming pseudogenes or are transcribed into non-coding RNAs with several
possible functions (Yapijakis et al., 2015).
3. NON-CODING RNAS
3.1. Characteristics of Non-Coding RNAs and Association with
Spermatogenesis
Non-coding RNAs (ncRNAs) are transcribed RNAs which do not encode a protein. They
used to be characterized as “junk” in the genome, but in recent years accumulating evidence
has indicated that they are extremely important for several biological functions, including
regulation of gene expression, translational regulation, normal development, DNA synthesis,
genome stability, and spermatogenesis (Calore et al., 2013; Cech et al., 2014; Mukherjee et
al., 2014). They include two main classes: small non-coding RNAs (sncRNAs) with a length
of 20-30 nucleotides (nt) and long non-coding RNAs (lncRNAs) spanning more than 200 nt.
Some lncRNAs are several kb in length (like the TTTY2 genes), and it
has been proposed to classify as "very long intergenic non-coding" RNAs (vlncRNAs) those
intergenic RNA regions of more than 50 kb (Kapranov et al., 2010). Small non-coding RNAs,
in turn, are divided into three main classes: small interfering RNAs (siRNAs),
microRNAs (miRNAs) and PIWI-interacting RNAs (piRNAs) (Aravin et al., 2006; Calore et
al., 2013; Kitano et al., 2013; Libri et al., 2013; Cech et al., 2014).
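The length thresholds quoted above can be illustrated with a minimal Python sketch. The function name classify_ncrna and the label "unclassified" are hypothetical and introduced here only for illustration. The sketch also assumes that length alone decides the class, which is a simplification: the vlncRNA label, for instance, was proposed for intergenic regions specifically.

```python
def classify_ncrna(length_nt: int) -> str:
    """Assign a non-coding RNA to a size class using the thresholds in the text.

    Hypothetical helper: real classification also depends on biogenesis
    and genomic context, not on length alone.
    """
    if length_nt <= 0:
        raise ValueError("length must be a positive number of nucleotides")
    if 20 <= length_nt <= 30:
        return "sncRNA"        # small non-coding RNA, 20-30 nt
    if length_nt > 50_000:
        return "vlncRNA"       # proposed label for regions of more than 50 kb
    if length_nt > 200:
        return "lncRNA"        # long non-coding RNA, more than 200 nt
    return "unclassified"      # lengths not covered by the ranges in the text


for n in (22, 150, 5_000, 60_000):
    print(n, classify_ncrna(n))
```

Lengths outside the 20-30 nt range and at or below 200 nt are deliberately left unclassified, because the text does not assign them to either class.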
siRNAs. Small interfering RNAs (siRNAs) are about 22 nt in length (Zeng et al., 2003;
He et al., 2009; Kitano et al., 2013; Cech et al., 2014). They are distributed among taxa, but
they are particularly well characterized in nematodes, flies and mammals (Carthew et al.,
2009; Kitano et al., 2013). siRNAs originate from double-stranded RNA
precursors (dsRNAs), which can be introduced into cells by viral infection or transfection (Zeng et
al., 2003; Carthew et al., 2009; Kim et al., 2009; Libri et al., 2013). They are associated with
Dicer enzymes, which cleave their precursors, with RNA-binding Argonaute family proteins
(AGO proteins) and with the RNA-induced silencing complex (RISC), which is a
ribonucleoprotein aggregate. With the aid of siRNAs, the RISC complex attaches to mRNA
molecules through complementary base pairing, resulting in either cleavage or degradation of the
coding transcripts (Zeng et al., 2003; Carthew et al., 2009; Kim et al., 2009; Libri et al., 2013;
Zimmermann et al., 2014). siRNAs have been implicated in gene silencing and
spermatogenesis (Doench et al., 2003; Zeng et al., 2003; Morris et al., 2004; Carthew et al.,
2009; He et al., 2009; Libri et al., 2013; Zimmermann et al., 2014).

miRNAs. MicroRNAs (miRNAs) are endogenous, single-stranded non-coding RNAs with
a length of about 22 nucleotides. They are present in plants, animals, and viruses, exhibiting a
quite conserved sequence across taxa (Pang et al., 2006; Carthew et al., 2009; He et al., 2009;
Papaioannou et al., 2010; Kitano et al., 2013; Yang et al., 2013; Cech et al., 2014; Mukherjee
et al., 2014). They are transcribed in animal cells by RNA polymerase II giving rise to a
primary miRNA transcript (pri-miRNA), which is cleaved producing the precursor miRNA
(pre-miRNA). Pre-miRNA is then transported from the nucleus into the cytoplasm and cleaved
by Dicer enzymes, resulting in a miRNA/miRNA* duplex. The latter is loaded onto the
ribonucleoprotein complex RISC together with RNA-binding AGO proteins and bound to
an mRNA target (Figure 2). There it provokes either translational repression or mRNA
degradation (Winter et al., 2009; Papaioannou et al., 2010; Libri et al., 2013; Zimmermann et
al., 2014). miRNAs play a pivotal role in many cellular processes, such as post-transcriptional
regulation of gene expression at different genomic regions, mammalian spermatogenesis
and male infertility, tissue repair, development and aging, and normal growth and function of
the neural system (He et al., 2009; Papaioannou et al., 2010; Yuan et al., 2011; Huang et al.,
2013; Calore et al., 2013; Godnic et al., 2013; Kitano et al., 2013; Yang et al., 2013;
Zimmermann et al., 2014).
Figure 2. Biogenesis of miRNAs. Gene transcription results in primary miRNA transcripts
(pri-miRNAs). These are cleaved, producing the precursor miRNAs (pre-miRNAs). Pre-miRNAs are
exported from the nucleus to the cytoplasm and cleaved by Dicer, giving rise to a miRNA/miRNA* duplex. This
duplex is then associated with the RISC complex and Ago proteins, leading to either translational repression
or mRNA degradation (based on Winter et al., 2009, with modifications).

piRNAs. PIWI-interacting RNAs (piRNAs), spanning from 24 to 32 nt in length, are
single-stranded endogenous RNAs (He et al., 2009; Kitano et al., 2013; Yang et al., 2013).
They are associated with specific proteins (PIWI proteins), which are found in the germline of
different organisms and protect genomes from transposable elements (Thomson et al., 2009;
Calore et al., 2013; Yang et al., 2013; Cech et al., 2014). They are less conserved across taxa
than miRNAs and are characterized by a uridine at the 5’-end (Kitano et al., 2013). They
are organized in clusters in the genome, and each cluster comprises 10 to 1000 piRNAs
covering 1 to 100 kb (Mukherjee et al., 2014). They are transcribed from uni-directional and
bi-directional active transposons, giving rise to piRNA precursors, which are exported from
the nucleus to the cytoplasm. There, the piRNA precursors either undergo primary processing
(shortening), leading to mature piRNAs, or are loaded onto PIWI proteins and
participate in post-transcriptional “ping-pong” amplification with complementary piRNAs,
resulting in many more mature piRNAs. PIWI-bound mature piRNAs translocate into the
nucleus, where they block transcription of complementary transposon elements (Thomson et
al., 2009; Calore et al., 2013). The piRNAs are also involved in spermatogenesis, egg
activation and fertilization (He et al., 2009; Thomson et al., 2009; Yang et al., 2013;
Zimmermann et al., 2014).
lncRNAs. Long non-coding RNAs (lncRNAs) are single- or multi-exonic transcripts with a poly(A)
tail and span more than 200 nt in length (Moran et al., 2012; Calore et al., 2013; Cech et al.,
2014; Mukherjee et al., 2014). Unlike miRNAs, they are less conserved across taxa
(Pang et al., 2006). The majority are transcribed by RNA polymerase II, spliced,
polyadenylated and capped at the 5’-end (Moran et al., 2012; Nie et al., 2012). A few of them
are transcribed by RNA polymerase III (Nie et al., 2012). The lncRNAs
are associated with a variety of essential biological functions, including transcriptional control
of gene expression, genomic imprinting, dosage compensation, nuclear compartmentalization,
chromatin modification, spermatogenesis, pluripotency of embryonic stem cells,
differentiation of neural stem cells, control of the cell cycle and cancer (Moran et al., 2012; Nie et
al., 2012; Rinn et al., 2012; Calore et al., 2013; Kung et al., 2013; Novikova et al., 2013; Luk
et al., 2014; Mukherjee et al., 2014; Ramos et al., 2015). Among the various biological
processes in which lncRNAs hold a key role, epigenetic regulation, transcriptional control and
post-transcriptional modification are the best studied. Accumulating data have indicated
that lncRNAs influence chromatin remodeling machinery controlling gene expression in
specific genomic loci (Mercer et al., 2009; Nie et al., 2012). Several lncRNAs have been
found to induce transcription of enhancers and promoters of their target genes under specific
stimuli (Mercer et al., 2009; Moran et al., 2012; Nie et al., 2012). In addition, lncRNAs affect
post-transcriptional mRNA processing including splicing, editing, trafficking, translation and
degradation of mRNAs (Mercer et al., 2009; Nie et al., 2012).
3.2. Mechanisms Involving Long Non-Coding RNAs
Long non-coding RNAs (lncRNAs) are located in both nucleus and cytoplasm (Moran et
al., 2012; Nie et al., 2012; Cech et al., 2014). Several lncRNAs exhibit their action in specific
tissue types and in different development stages (Mercer et al., 2008; Wilusz et al., 2009;
Kung et al., 2013; Sun et al., 2013). Since only approximately 200 of the 20,000 lncRNAs
existing in the human genome have been characterized as functional, the question of whether all
lncRNAs are functional remains to be answered in the future (Mercer et al., 2008;
Wilusz et al., 2009; Moran et al., 2012).
Regarding their genomic location, there are three main subgroups of lncRNAs: a) natural
antisense transcripts (NATs), which are transcribed in the antisense direction of protein-coding
genes; b) long intervening non-coding RNAs (lincRNAs), expressed from intergenic regions of
the genome that used to be considered “junk DNA”; c) intronic lncRNAs, which are
encoded within introns of protein-coding genes (Moran et al., 2012; Nie et al., 2012; Rinn et
al., 2012). In particular, NATs play a pivotal role in the regulation of mRNAs. Some of them
lead the epigenetic machineries to specific loci, while some others induce alternative splicing
of the target genes by masking crucial cis-elements in mRNA and forming RNA duplexes
(Mercer et al., 2009; Nie et al., 2012).
Regarding their function, there seem to be three main archetypic actions of lncRNAs
(Figure 3), although a combination of these archetypes could be possible: decoys, guides and
scaffolds.
Figure 3. Mechanisms involving various actions of lncRNAs. (a) Decoys: Some lncRNAs block
attachment of regulatory proteins on DNA; (b) Guides: Some lncRNAs bind to particular epigenetic
machineries leading them to specific genomic regions; (c) Scaffolds: Some lncRNAs bring together
two or more epigenetic machineries assembling them into a distinct unit.

Decoys. lncRNAs of this archetype act by blocking the
attachment of regulatory proteins to DNA. These lncRNAs bind to their target
complementary DNA strand, forming a DNA-RNA hybrid that dissociates the effector
proteins. An example of this model is the interplay between the PANDA lncRNA
and the pro-apoptotic nuclear transcription factor NF-YA, which controls the apoptotic response
upon DNA damage. PANDA inhibits expression of apoptotic genes by blocking NF-YA, and
cell survival is therefore promoted (Wang et al., 2011; Rinn et al., 2012; Mukherjee et al.,
2014).
Guides. When lncRNAs act as guides, they tether to specific RNA-binding proteins of
epigenetic machineries leading them to selective genomic regions and thus controlling gene
expression locally (Wang et al., 2011; Rinn et al., 2012; Mukherjee et al., 2014). Guides can
alter the gene expression either in cis or in trans (Wang et al., 2011; Mukherjee et al., 2014).
A typical example of this model is the inactivation of one of the two X chromosomes in
female mammals. The X inactivation center (Xic) encodes several non-coding RNAs, including the
lncRNA Xist. In particular, the lncRNA RepA recruits the Polycomb repressive complex 2 (PRC2),
which forms a “heterochromatic state” in combination with histone H3 lysine
trimethylation at the Xist gene promoter. That state induces transcription of the Xist gene,
encoding the lncRNA Xist, which in turn “coats” the X chromosome that will eventually
become inactive (Wang et al., 2011; Rinn et al., 2012; Mukherjee et al., 2014).
Scaffolds. Assembly of two or more proteins and epigenetic machineries into a unique
structural complex is effected by lncRNA scaffolds. A classic example of this model refers to
the known lncRNA HOTAIR. HOTAIR brings together two complexes, the PRC2 and the
LSD1-CoREST, in one distinct unit. This unique complex induces simultaneously H3K27
methylation and H3K4me2 demethylation leading to gene suppression (Spitale et al., 2011;
Wang et al., 2011; Rinn et al., 2012; Mukherjee et al., 2014).
In addition, several studies have shown that some other mammalian lncRNAs control
gene expression post-transcriptionally by binding miRNAs and precluding their attachment
to their target mRNAs (Moran et al., 2012). They may act as competitors, inhibiting
miRNA-induced mRNA degradation and translational repression (Luk et al., 2014).
3.3. Possible Function of TTTY2 Transcripts in Spermatogenesis
The involvement of lncRNAs in spermatogenesis remains unclear. Spermatogenesis is an
intricate process that produces male gametes and involves a wide range of mechanisms regulated
by several genes and gene products such as lncRNAs. Only a few lncRNAs related to
mammalian spermatogenesis have been discovered and characterized so far (Anguera et
al., 2011; Sun et al., 2013; Luk et al., 2014). For example, one identified lncRNA is encoded
by the testis-specific X-linked gene (Tsx), which is expressed in pachytene spermatocytes
and may therefore play an essential role in the meiotic division of germ cells
(Anguera et al., 2011). Lee et al. (2012) observed 50, 35 and 24 lncRNAs in type A
spermatogonia, pachytene spermatocytes and round spermatocytes, respectively.
It has been suggested that the two genes of the TTTY2 gene family that were deleted in
infertile men, TTTY2L2A and TTTY2L12A, encode lncRNAs that possibly act as antisense
transcripts (NATs) to neighboring genes, suspending their action, and that loss of this
regulation results in failure of normal spermatogenesis (Yapijakis et al., 2015). One known
possible case of RNA interference
involves two adjacently located genes on Yp, TTTY2 and TTTY1. A 157 bp sequence of
the TTTY2 gene is antisense to the spliced TTTY1 transcript; therefore, there is possibly a regulated
alternate expression of the two genes (Skaletsky et al., 2003). Deletions of the TTTY2L2A gene
might allow the continuous expression of TTTY1, with possible deleterious effects on
spermatogenesis.
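The idea of one transcript being antisense to part of another can be illustrated with a small sketch that searches for the longest stretch of one sequence that is the reverse complement of a stretch of the other. The function names and the toy sequences are hypothetical, the sequences are not TTTY1 or TTTY2 data, and only exact matches on the DNA alphabet are considered.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def longest_antisense_match(gene_a: str, gene_b: str) -> str:
    """Return the longest stretch of gene_b that is exactly antisense to gene_a.

    This is the longest common substring between gene_b and the reverse
    complement of gene_a, found with a simple dynamic-programming table.
    """
    target = reverse_complement(gene_a)
    query = gene_b.upper()
    best_len, best_end = 0, 0
    prev = [0] * (len(query) + 1)
    for i in range(1, len(target) + 1):
        curr = [0] * (len(query) + 1)
        for j in range(1, len(query) + 1):
            if target[i - 1] == query[j - 1]:
                curr[j] = prev[j - 1] + 1  # extend the match ending at (i-1, j-1)
                if curr[j] > best_len:
                    best_len, best_end = curr[j], j
        prev = curr
    return query[best_end - best_len:best_end]


if __name__ == "__main__":
    toy_a = "GGATCCGTACGTTAGC"    # hypothetical sequence, for illustration only
    toy_b = "CCCCAACGTACGGTTTT"   # hypothetical sequence, for illustration only
    match = longest_antisense_match(toy_a, toy_b)
    print(match, len(match))
```

A real comparison of two genes would normally rely on an alignment tool that tolerates mismatches and gaps; the exact-match search is used here only because it keeps the notion of "antisense" explicit.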
Because the TTTY2 genes are very long, spanning more than 50 kb, their
transcripts may be considered very long intergenic non-coding RNAs (vlncRNAs)
(Kapranov et al., 2010). Their size suggests that they (or parts of them) may be involved in
several different mechanisms, including inhibition of DNA-binding proteins (Figure 3),
inhibition of miRNA-induced mRNA degradation and translational repression, as well as site-specific
DNA methylation and histone modification (Luk et al., 2014). Nevertheless, the exact role of
TTTY2 lncRNAs in spermatogenesis is still elusive and remains to be unveiled.
REFERENCES
Aarabi, M; Saliminejad, K; Sadeghi, MR; Soltanghoraee, H; Amirjannati, N; Modarressi,
MH. Deletion and Testicular Expression of DAZ (Deleted in Azoospermia) Gene in
Patients with Non-Obstructive Azoospermia. Iranian J Publ Health, 2009, 38, 17-23.
Agarwal, A; Deepinder, F; Sharma, RK; Ranga, G; Li, J. Effect of cell phone usage on semen
analysis in men attending infertility clinic: an observational study. Fertil Steril, 2008,
89, 124-128.
Agarwal, A; Desai, NR; Makker, K; Varghese, A; Mouradi, R; Sabanegh, E; Sharma, R.
Effects of radiofrequency electromagnetic waves (RF-EMW) from cellular phones on
human ejaculated semen: an in vitro pilot study. Fertil Steril, 2009, 92, 1318-1325.
Anguera, MC; Ma, W; Clift, D; Namekawa, S; Kelleher III, RJ; Lee, JT. Tsx Produces a Long
Noncoding RNA and Has General Functions in the Germline, Stem Cells, and Brain.
PLoS Genet, 2011, 7:e1002248.
Aravin, A; Gaidatzis, D; Pfeffer, S; Lagos-Quintana, M; Landgraf, P; Iovino, N; Morris, P;
Brownstein, MJ; Kuramochi-Miyagawa, S; Nakano, T; Chien, M; Russo, JJ; Ju, J;
Sheridan, R; Sander, C; Zavolan, M; Tuschl, T. A novel class of small RNAs bind to
MILI protein in mouse testes. Nature, 2006, 442, 203-207.
Calore, F; Lovat, F; Garofalo, M. Non-coding RNAs and Cancer. Int J Mol Sci, 2013, 14,
17085-17110.
Carthew, RW; Sontheimer, EJ. Origins and Mechanisms of miRNAs and siRNAs. Cell, 2009,
136, 642-655.
Cech, TR; Steitz, JA. The Noncoding RNA Revolution-Trashing Old Rules to Forge New
Ones. Cell, 2014, 157, 77-94.
De Iuliis, GN; Newey, RJ; King, BV; Aitken, RJ. Mobile Phone Radiation Induces Reactive
Oxygen Species Production and DNA Damage in Human Spermatozoa In Vitro. PLoS
One, 2009, 4:e6446.
Doench, JG; Petersen, CP; Sharp, PA. siRNAs can function as miRNAs. Genes Dev, 2003,
17, 438-442.

Ferlin, A; Moro, E; Rossi, A; Dallapiccola, B; Foresta, C. The human Y chromosome’s
azoospermia factor b (AZFb) region: sequence, structure, and deletion analysis in
infertile men. J Med Genet, 2003, 40, 18-24.
Ferlin, A; Raicu, F; Gatta, V; Zuccarello, D; Palka, G; Foresta, C. Male infertility: role of
genetic background. Reprod Biomed Online, 2007, 14, 734-745.
Ferlin, A; Tessari, A; Ganz, F; Marchina, E; Barlati, S; Garolla, A; Engl, B; Foresta, C.
Association of partial AZFc region deletions with spermatogenic impairment and male
infertility. J Med Genet, 2005, 42, 209-213.
Fernandes, S; Huellen, K; Goncalves, J; Dukal, H; Zeisler, J; Rajpert De Meyts, E;
Skakkebaek, NE; Habermann, B; Krause, W; Sousa, M; Barros, A; Vogt, PH. High
frequency of DAZ1/DAZ2 gene deletions in patients with severe oligozoospermia. Mol
Hum Reprod, 2002, 8, 286-298.
Godnic, I; Zorc, M; Jevsinek Skok, D; Calin, GA; Horvat, S; Dovc, P; Kovac, M; Kunej, T.
Genome-Wide and Species-Wide In Silico Screening for Intragenic MicroRNAs in
Human, Mouse and Chicken. PLoS One, 2013, 8:e65165.
Hammoud, AO; Gibson, M; Peterson, CM; Meikle, AW; Carrell, DT. Impact of male obesity
on infertility: a critical review of the current literature. Fertil Steril, 2008, 90, 897-904.
He, Z; Kokkinaki, M; Pant, D; Gallicano, GI; Dym, M. Small RNA molecules in the
regulation of spermatogenesis. Reproduction, 2009, 137, 901-911.
Huang, TC; Pinto, SM; Pandey, A. Proteomics for understanding miRNA biology.
Proteomics, 2013, 13, 558-567.
Huynh, T; Mollard, R; Trounson, A. Selected genetic factors associated with male infertility.
Hum Reprod Update, 2002, 8, 183-198.
Jungwirth, A; Diemer, T; Dohle, GR; Giwercman, A; Kopa, Z; Tournaye, H; Krausz, C.
European Association of Urology Guidelines on Male Infertility, 2013.
Kapranov, P; St Laurent, G; Raz, T; Ozsolak, F; Reynolds, CP; Sorensen, PH; Reaman, G;
Milos, P; Arceci, RJ; Thompson, JF; Triche, TJ. The majority of total nuclear-encoded
non-ribosomal RNA in a human cell is 'dark matter' un-annotated RNA. BMC Biol,
2010, 8, 149.
Kim, VN; Han, J; Siomi, MC. Biogenesis of small RNAs in animals. Nat Rev Mol Cell Biol,
2009, 10, 126-139.
Kitano, J; Yoshida, K; Suzuki, Y. RNA sequencing reveals small RNAs differentially
expressed between incipient Japanese threespine sticklebacks. BMC Genomics, 2013, 14,
1-13.
Kung, JT; Colognori, D; Lee, JT. Long Noncoding RNAs: Past, Present, and Future.
Genetics, 2013, 193, 651-669.
Kuroda-Kawaguchi, T; Skaletsky, H; Brown, LG; Minx, PJ; Cordum, HS; Waterston, RH;
Wilson, RK; Silber, S; Oates, R; Rozen, S; Page, DC. The AZFc region of the Y
chromosome features massive palindromes and uniform recurrent deletions in infertile
men. Nat Genet, 2001, 29, 279-286.
Lahn, BT; Page, DC. Functional Coherence of the Human Y Chromosome. Science, 1997,
278, 675-680.
Lee, TL; Xiao, A; Rennert, OM. Identification of Novel Long Noncoding RNA Transcripts in
Male Germ Cells. Methods Mol Biol, 2012, 825, 105-114.
Libri, V; Miesen, P; van Rij, RP; Buck, AH. Regulation of microRNA biogenesis and
turnover by animals and their viruses. Cell Mol Life Sci, 2013, 70, 3525-3544.

Luk, AC; Chan, WY; Rennert, OM; Lee, TL. Long noncoding RNAs in spermatogenesis:
insights from recent high-throughput transcriptome studies. Reproduction, 2014, 147,
R131-141.
Makrinou, E; Fox, M; Lovett, M; Haworth, K; Cameron, JM; Taylor, K; Edwards, YH.
TTY2: A Multicopy Y-Linked Gene Family. Genome Res, 2001, 11, 935-945.
Ma, K; Mallidis, C; Bhasin, S. The role of Y chromosome deletions in male infertility. Eur J
Endocrinol, 2000, 142, 418-430.
Marshall Graves, JA. Human Y Chromosome, Sex Determination, and Spermatogenesis-A
Feminist View. Biol Reprod, 2000, 63, 667-676.
Mercer, TR; Dinger, ME; Mattick, JS. Long non-coding RNAs: insights into functions. Nat
Rev Genet, 2009, 10, 155-159.
Mercer, TR; Dinger, ME; Sunkin, SM; Mehler, MF; Mattick, JS. Specific expression of long
noncoding RNAs in the mouse brain. Proc Natl Acad Sci USA, 2008, 105, 716-721.
Miyamoto, T; Tsujimura, A; Miyagawa, Y; Koh, E; Namiki, M; Sengoku, K. Male infertility
and Its Causes in Human. Adv Urol, 2012, 2012, 1-7.
Moran, VA; Perera, RJ; Khalil, AM. Emerging functional and mechanistic paradigms of
mammalian long non-coding RNAs. Nucleic Acids Res, 2012, 40, 6391-6400.
Morris, KV; Chan, SWL; Jacobsen, SE; Looney, DJ. Small Interfering RNA-Induced
Transcriptional Gene Silencing in Human Cells. Science, 2004, 305, 1289-1292.
Mukherjee, A; Koli, S; Reddy, KV. Regulatory non-coding transcripts in spermatogenesis:
shedding light on ‘dark matter.’ Andrology, 2014, 2, 360-369.
Navarro-Costa, P; Plancha, CE; Gonçalves, J. Genetic Dissection of the AZF Regions of the
Human Y Chromosome: Thriller or Filler for Male (In)Fertility? J Biomed Biotechnol,
2010, 2010, 1-18.
Nie, L; Wu, HJ; Hsu, JM; Chang, SS; LaBaff, AM; Li, CW; Wang, Y; Hsu, JL; Hung, MC.
Long non-coding RNAs: versatile master regulators of gene expression and crucial
players in cancer. Am J Transl Res, 2012, 4, 127-150.
Novikova, IV; Hennelly, SP; Sanbonmatsu, KY. Tackling Structures of Long Noncoding
RNAs. Int J Mol Sci, 2013, 14, 23672-23684.
O'Flynn O'Brien, KL; Varghese, AC; Agarwal, A. The genetic causes of male factor
infertility: A review. Fertil Steril, 2010, 93, 1-12.
Pang, KC; Frith, MC; Mattick, JS. Rapid evolution of noncoding RNAs: lack of conservation
does not mean lack of function. Trends Genet, 2006, 22, 1-5.
Papaioannou, MD; Nef, S. microRNAs in the Testis: Building Up Male Fertility. J Androl,
2010, 31, 26-33.
Rajender, S; Avery, K; Agarwal, A. Epigenetics, spermatogenesis and male infertility. Mutat
Res, 2011, 727, 62-71.
Ramos, AD; Andersen, RE; Liu, SJ; Nowakowski, TJ; Hong, SJ; Gertz, CC; Salinas, RD;
Zarabi, H; Kriegstein, AR; Lim, DA. The Long Noncoding RNA Pnky Regulates
Neuronal Differentiation of Embryonic and Postnatal Neural Stem cells. Cell Stem Cell,
2015, 16, 439-447.
Rinn, JL; Chang, H. Genome regulation by long noncoding RNAs. Annu Rev Biochem, 2012,
81, 145-166.
Siffroi, JP; Le Bourhis, C; Krausz, C; Barbaux, S; Quintana-Murci, L; Kanafani, S; Rouba, H;
Bujan, L; Bourrouillou, G; Seifer, I; Boucher, D; Fellous, M; McElreavey, K; Dadoune,
JP. Sex chromosome mosaicism in males carrying Y chromosome long arm deletions.
Hum Reprod, 2000, 15, 2559-2562.
Singh, NP; Madabhushi, SR; Srivastava, S; Senthilkumar, R; Neeraja, C; Khosla, S; Mishra,
RK. Epigenetic profile of the euchromatic region of human Y chromosome. Nucleic
Acids Res, 2011, 39, 3594-3606.
Skaletsky, H; Kuroda-Kawaguchi, T; Minx, PJ; Cordum, HS; Hillier, L; Brown, LG;
Repping, S; Pyntikova, T; Ali, J; Bieri, T; Chinwalla, A; Delehaunty, A; Delehaunty, K;
Du, H; Fewell, G; Fulton, L; Fulton, R; Graves, T; Hou, SF; Latrielle, P; Leonard, S;
Mardis, E; Maupin, R; McPherson, J; Miner, T; Nash, W; Nguyen, C; Ozersky, P; Pepin,
K; Rock, S; Rohlfing, T; Scott, K; Schultz, B; Strong, C; Tin-Wollam, A; Yang, SP;
Waterston, RH; Wilson, RK; Rozen, S; Page, DC. The male specific region of the human
Y chromosome is a mosaic of discrete sequence classes. Nature, 2003, 423, 825-837.
Spitale, RC; Tsai, MC; Chang, HY. RNA templating the epigenome: long noncoding RNAs
as molecular scaffolds. Epigenetics, 2011, 6, 539-543.
Stuppia, L; Gatta, V; Fogh, I; Gaspari, AR; Grande, R; Morizio, E; Fantasia, D; Pizzuti, A;
Calabrese, G; Palka, G. Characterization of novel genes in AZF regions. J Endocrinol
Invest, 2000, 23, 659-663.
Sun, J; Lin, Y; Wu, J. Long Non-Coding RNA Expression Profiling of Mouse Testis during
Postnatal Development. PLoS One, 2013, 8:e75750.
Thomson, T; Lin, H. The Biogenesis and Function of PIWI Proteins and piRNAs: Progress
and Prospect. Annu Rev Cell Dev Biol, 2009, 25, 355-376.
Tiepolo, L; Zuffardi, O. Localization of Factors Controlling Spermatogenesis in the
Nonfluorescent Portion of the Human Y Chromosome Long Arm. Hum Genet, 1976, 34,
119-124.
Vicdan, A; Vicdan, K; Günalp, S; Kence, A; Akarsu, C; Işik, AZ; Sözen, E. Genetic aspects
of human male infertility: the frequency of chromosomal abnormalities and Y
chromosome microdeletions in severe male factor infertility. Eur J Obstet Gynecol
Reprod Biol, 2004, 117, 49-54.
Vineeth, VS; Malini, SS. A Journey on Y Chromosomal Genes and Male Infertility. Int J
Hum Genet, 2011, 11, 203-215.
Vogt, PH; Falcao, CL; Hanstein, R; Zimmer, J. The AZF proteins. Int J Androl, 2008, 31,
383-394.
Yang, Q; Hua, J; Wang, L; Xu, B; Zhang, H; Ye, N; Zhang, Z; Yu, D; Cooke, HJ; Zhang, Y;
Shi, Q. MicroRNA and piRNA Profiles in Normal Human Testis Detected by Next
Generation Sequencing. PLoS One, 2013, 8:e66809.
Yuan, Z; Sun, X; Liu, H; Xie, J. MicroRNA Genes Derived from Repetitive Elements and
Expanded by Segmental Duplication Events in Mammalian Genomes. PLoS One, 2011,
6:e17666.
Yapijakis, C; Serefoglou, Z; Papadimitriou, K; Makrinou, E. High frequency of TTTY2-like
gene related deletions in patients with idiopathic oligozoospermia and azoospermia.
Andrologia, 2015, 47, 536-544.
Wang, KC; Chang, HY. Molecular mechanisms of long noncoding RNAs. Mol Cell, 2011,
43, 904-914.
Whorton, D; Krauss, RM; Marshall, S; Milby, TH. Infertility in male pesticide workers.
Lancet, 1977, 2, 1259-1261.

Wilusz, JE; Sunwoo, H; Spector, DL. Long noncoding RNAs: functional surprises from the
RNA world. Genes Dev, 2009, 23, 1494-1504.
Winter, J; Jung, S; Keller, S; Gregory, RI; Diederichs, S. Many roads to maturity: microRNA
biogenesis pathways and their regulation. Nat Cell Biol, 2009, 11, 228-234.
Zeng, Y; Yi, R; Cullen, BR. MicroRNAs and small interfering RNAs can inhibit mRNA
expression by similar mechanisms. Proc Natl Acad Sci U S A, 2003, 100, 9779-9784.
Zimmermann, C; Romero, Y; Warnefors, M; Bilican, A; Borel, C; Smith, LB; Kotaja, N;
Kaessmann, H; Nef, S. Germ Cell-Specific Targeting of DICER or DGCR8 Reveals a
Novel Role for Endo-siRNAs in the Progression of Mammalian Spermatogenesis and
Male Infertility. PLoS One, 2014, 9:e107023.

Chapter 3
GENETIC DIVERSITY ASSESSMENT
BY RANDOM AMPLIFIED POLYMORPHIC DNA
IN QUERCUS CALLIPRINOS,
Q. ITHABURENSIS AND Q. BOISSIERI,
GROWING IN ISRAEL
Gabriel Schiller*
Department of Agronomy and Natural Resources, Institute of Plant Sciences,
The Volcani Center, Bet Dagan, Israel
ABSTRACT
Holly oak (Q. coccifera L. / Q. calliprinos Webb.), Tabor oak (Quercus ithaburensis
Decne) and Aleppo oak (Quercus boissieri Reut.) are the main oak species in Israel, for
which we lack the genetic knowledge that is needed as a basis for forest management,
in situ and ex situ genetic conservation, and breeding for new plantings of forests, parks and
gardens. Therefore, random amplified polymorphic DNA (RAPD) analysis was used to
determine the genetic diversity within and among 24 spontaneous occurrences of
Quercus calliprinos, 16 of Q. ithaburensis and 14 of Q. boissieri. The genetic parameters
found are summarized as follows:
Parameter                                        Q. calliprinos   Q. ithaburensis   Q. boissieri
ne  - Effective number of alleles                1.624±0.339      1.754±0.246       1.700±0.242
Ht  - Total genetic diversity of the species     0.354±0.023      0.417±0.010       0.400±0.009
Hs  - Genetic diversity within populations       0.325±0.020      0.360±0.009       0.335±0.009
Gst - Proportion of total diversity due to
      differences between populations            0.080            0.135             0.162
Nm  - Gene flow                                  5.660            3.199             4.649
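For readers unfamiliar with these parameters, the following minimal sketch shows how they are conventionally computed for a dominant, two-allele marker such as a RAPD band. It rests on assumptions not stated in the abstract: Hardy-Weinberg proportions (so the band-absent fraction estimates q²), Nei's definitions of gene diversity and Gst, and the commonly used estimate Nm = 0.5(1 - Gst)/Gst. All function names and the example counts are hypothetical. Values recomputed from the rounded means in the table need not reproduce the reported Gst and Nm, which are typically averaged over loci.

```python
import math


def null_allele_frequency(n_band_absent: int, n_scored: int) -> float:
    """Estimate q, the frequency of the recessive (band-absent) allele.

    Assumes Hardy-Weinberg proportions, so the band-absent fraction equals q**2.
    """
    if n_scored <= 0 or not 0 <= n_band_absent <= n_scored:
        raise ValueError("need n_scored > 0 and 0 <= n_band_absent <= n_scored")
    return math.sqrt(n_band_absent / n_scored)


def gene_diversity(q: float) -> float:
    """Nei's gene diversity h = 1 - (p**2 + q**2) for a two-allele locus."""
    p = 1.0 - q
    return 1.0 - (p * p + q * q)


def effective_number_of_alleles(q: float) -> float:
    """Effective number of alleles ne = 1 / (p**2 + q**2) for a two-allele locus."""
    p = 1.0 - q
    return 1.0 / (p * p + q * q)


def gst(ht: float, hs: float) -> float:
    """Proportion of total diversity found among populations: (Ht - Hs) / Ht."""
    if ht <= 0:
        raise ValueError("Ht must be positive")
    return (ht - hs) / ht


def gene_flow_nm(gst_value: float) -> float:
    """Gene-flow estimate Nm = 0.5 * (1 - Gst) / Gst (assumed estimator)."""
    if not 0 < gst_value <= 1:
        raise ValueError("Gst must lie in (0, 1]")
    return 0.5 * (1.0 - gst_value) / gst_value


if __name__ == "__main__":
    # Hypothetical locus: band absent in 25 of 100 scored trees.
    q = null_allele_frequency(n_band_absent=25, n_scored=100)
    print(q, gene_diversity(q), effective_number_of_alleles(q))
    # Rounded table means for Q. ithaburensis, used only as example inputs.
    g = gst(ht=0.417, hs=0.360)
    print(round(g, 3), round(gene_flow_nm(g), 3))
```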
* vcgabi@volcani.agri.gov.il

Pairwise Fst values, used as pairwise genetic distances between the 24 Q. calliprinos
populations, revealed that on average each population differed significantly
from 13 ± 5 populations randomly distributed in the country. Three populations, namely Park
Goren, Tzafririm and Dir Razach, differed significantly from most or all of the other
populations analyzed, whereas the Park Ha'Sharon, Mt Amiad and Beit Ha'Emeq populations
were similar to most of the populations. The low but significant genetic differentiation
among populations, between and within regions, makes the impact of management decisions
critical. In spite of the low levels of population structure and apparently high levels of
historical gene flow among populations, the differentiation between populations suggests
that acorn collection for ex-situ conservation, restoration and reforestation should be done in
the nearest population within the same geographic region, and from a high number of stools
to capture most of the genetic variation.
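A summary such as "13 ± 5" can be obtained by counting, for every population, the number of other populations from which it differs significantly, and then averaging those counts. The sketch below assumes that the ± term is a sample standard deviation, which the abstract does not state; the population labels and the list of significant pairs are hypothetical, and each pair is assumed to be listed once.

```python
from statistics import mean, stdev


def significant_counts(populations, significant_pairs):
    """Count, per population, how many other populations it differs from.

    `significant_pairs` holds (a, b) tuples for pairs whose pairwise Fst was
    judged significant; each unordered pair is assumed to appear only once.
    """
    counts = {name: 0 for name in populations}
    for a, b in significant_pairs:
        counts[a] += 1
        counts[b] += 1
    return counts


def summarize(counts):
    """Return the mean and sample standard deviation of the per-population counts."""
    values = list(counts.values())
    return mean(values), stdev(values)


if __name__ == "__main__":
    pops = ["A", "B", "C", "D"]                              # hypothetical labels
    pairs = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")]  # hypothetical results
    counts = significant_counts(pops, pairs)
    print(counts, summarize(counts))
```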
UPGMA analysis based on genetic distances revealed that the 16 Q. ithaburensis
populations analysed aggregated into three clusters that are coherent with the geographic
regions of the country: the Golan Heights, the Lower Galilee and the Coastal Plain groups. A
few of the primers used revealed allele frequencies that were significantly correlated with
geo-climatic parameters. Three populations, those of the Kachal, Alona Forest and Hirbet-
Zerkess, were clustered with geographically distant groups, which may indicate
that these populations are the result of human activity. Hence, propagation
material for planting should be collected within the same geographic-climatic zone.
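UPGMA (unweighted pair group method with arithmetic mean) repeatedly joins the two closest clusters and recomputes distances as size-weighted averages. The following is a minimal pure-Python sketch of that procedure, not the software used in the study; the function names, the four population labels and their distances are all hypothetical.

```python
from itertools import combinations


def make_distances(triples):
    """Build a symmetric lookup {frozenset({a, b}): distance} from (a, b, d) triples."""
    return {frozenset((a, b)): d for a, b, d in triples}


def upgma(names, distances):
    """Cluster `names` by UPGMA (average linkage).

    Returns the merges in order as (cluster_a, cluster_b, height), where a
    cluster is an original name or a nested tuple of merged clusters, and
    height is half the distance at which the two clusters were joined.
    """
    d = dict(distances)                  # working copy, extended as clusters merge
    sizes = {name: 1 for name in names}  # active cluster -> number of populations
    merges = []
    while len(sizes) > 1:
        # Pick the closest pair of active clusters.
        a, b = min(combinations(sizes, 2), key=lambda pair: d[frozenset(pair)])
        joined_at = d[frozenset((a, b))]
        size_a, size_b = sizes.pop(a), sizes.pop(b)
        merged = (a, b)
        # Distance to the new cluster is the size-weighted mean of the old distances.
        for other in sizes:
            d[frozenset((merged, other))] = (
                size_a * d[frozenset((a, other))] + size_b * d[frozenset((b, other))]
            ) / (size_a + size_b)
        sizes[merged] = size_a + size_b
        merges.append((a, b, joined_at / 2))
    return merges


if __name__ == "__main__":
    # Hypothetical genetic distances between four populations.
    toy = make_distances([
        ("P1", "P2", 0.02), ("P3", "P4", 0.04),
        ("P1", "P3", 0.10), ("P1", "P4", 0.12),
        ("P2", "P3", 0.10), ("P2", "P4", 0.12),
    ])
    for step in upgma(["P1", "P2", "P3", "P4"], toy):
        print(step)
```

Reading the merge list from first to last gives the dendrogram from the tips inward; populations that join a geographically distant group early would stand out in the same way as the three populations mentioned above.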
The small genetic distances, or high identity, between populations of Q. boissieri on
the one hand, and the lack of aggregations or relatedness to geographic parameters on the
other hand, give rise to the hypothesis that all the populations analyzed are remnants of a
larger population that existed during the last glaciation. The period of warming since then,
the fragmentation of the populations into sites differing in their ecological set-up, and human
activity did not give rise to significant genetic differentiation among them. These
results suggest that plantations for acorn production, as propagation material for
reforestation, should include the best phenotypes, selected in all the relict populations
regardless of ecological-geographical boundaries.
1. OAK SPECIES IN ISRAEL

Five oak species grow in Israel: Quercus calliprinos Webb., an evergreen
species, and Q. ithaburensis Decne., Q. boissieri Reut., Q. cerris L. and Q. look Kotschy, which
are deciduous species. The last two species are represented by only a few trees, which
grow on the slopes of Mt. Hermon in the Golan Heights; in this habitat these species are at the
southernmost limit of their natural geographic distribution.
Probably because of their low economic value today, the first three listed species, which
also grow in other Mediterranean countries, receive very little attention in the field of
forestry and forest genetics. These three oak species constitute the backbone of the chaparral
in Israel and elsewhere. Palynological research in Israel has shown that Quercus species grew
in our area during the Würm Pluvial era of the late Pleistocene and the Holocene; they
dominated the forest and maquis landscape [21, 38, 40, 72].

Due to their long life cycle, forest trees are among the species that cannot migrate or adapt
fast enough to cope with the rapid changes in the environment imposed by human activity,
which might create ecological and forest management problems. Genetic diversity determines
the adaptive potential of a species and is an essential component of the stability of
ecosystems. Analysis of genetic diversity within and among populations is a fundamental step
in the development of strategies for conservation of genetic resources and, consequently, of
adaptability.
The fragmented forest populations of Quercus ithaburensis, Q. boissieri and Q.
calliprinos in Israel are peripheral populations to their area of distribution in the
Mediterranean basin [4]. According to Safriel et al. [62], peripheral populations, rather than
core ones, may be tolerant to environmental extremes and changes as a result of their
higher genetic variability, which arises from fluctuating selection. It is also likely that peripheral
populations evolve resistance to extreme conditions; thus, they should be treated as a
biogenetic resource for the rehabilitation and restoration of damaged ecosystems.
A.
Holly oak (Q. coccifera L. / Q. calliprinos Webb.) is the only evergreen oak with a
circum-Mediterranean distribution [77]; it has two predominant morphological forms, which
therefore were considered as two distinct subspecies [85, 88]. Today's Q. calliprinos is most
probably a descendant of a similar evergreen oak species from the Tertiary and Pleistocene
eras that did not evolve under contemporary Mediterranean climates [59]. This oak species
survived the hazardous combination of climatic fluctuations and prehistoric and historic
aggravating human interventions in the form of agro-pastoral activity in a very wide array of
ecological niches in Palestine, adjacent countries and overseas [24, 51, 59, 84, 86, 88]. Q.
calliprinos grows from sea level up to 1700 m a.s.l. on a variety of soil types that are the
weathering products of calcareous, dolomite, basalt and sandy bedrock formations, and on
alluvial plains [12, 25]. According to Zohary [86], the occurrence of Q. calliprinos at high
altitude, and its close approach to the eastern boundaries of the Mediterranean territory, is
testimony to its wide climatic tolerance. This species has a root system that can reach down
through the bedrock fissures for water to a depth of more than 8 m, which enables it to withstand the
summer drought and continue transpiring [68] (and citations therein).
This species is highly polymorphic or phenotypically plastic; it was, therefore, divided
into several varieties on the basis of cupule and acorn size, shape and form, some of which
may co-occur within the same site [4, 85, 87, 88]. Furthermore, two growth forms
(phenotypes) were distinguished: one that grows as a bush (no apical dominance)
and one that grows as a tree (apical dominance) [91].
B.
Tabor oak (Quercus ithaburensis Decne) is a winter-deciduous, thermophilous
species growing at altitudes between 50 and 500 m a.s.l., rarely up to 1000 m [41, 87]. This
species contains two subspecies: subsp. macrolepis, growing in Turkey, and
subsp. ithaburensis, growing in Israel and Jordan [14]. There is no doubt that since antiquity
and until recently, Tabor oak was also planted as a supplementary food source for the sake of
its very large and sweet acorns [16]. In spite of its importance as a dominant tree species in
Israel's landscape, Q. ithaburensis is among the many Mediterranean oak species for which
we lack the knowledge of genetic constitution that is needed as a basis for genetic
conservation [69] and for forest management. The morphology of Tabor oak is extremely
varied, possibly as a result of interaction between genotype and site conditions, or of
hybridization as suggested by Zohary and Feinbrun-Dotan [89], or because of genetic
variation [36]. There are wide variations in acorn and cupule size, in leaf shape and size, and
in leaf serration and spinescence [4, 85]. These variations are considered to be of ecological
importance [5, 34].
The fragmented Q. ithaburensis populations in Israel grow on different bedrock
formations that have a common feature of water seeping through the bedrock formation to the
soil-rock interface within the deeper soil pockets [35]. This ecological feature provides the
trees with enough moisture to support the high transpiration rates of this species, compared
with those of other Mediterranean sclerophyllous species [57, 58, 68]. Depending on site
conditions, Q. ithaburensis populations form park-like forests with or without accompanying
tree and shrub species and undergrowth [15, 87].
C.
Aleppo oak (Quercus boissieri Reut.) is a winter deciduous species with marginally
continental ecological requirements; it belongs to the eastern floral elements in Israel. Its
distribution ranges from the Armeno-Kurdistan Mountains to the Zagros Mountains in
western Iran, from the Amanus and the Aegean and Mediterranean Taurus mountain range in
Turkey via Syria and Lebanon to Israel; it also grows in Cyprus [4, 85, 88]. Q. boissieri
populations in Israel occur from Mt. Hermon in the north (lat. 33°17' N, long. 35°46' E) to
Mt. Sansan in the Judean Mountains in the south (lat. 31°42' N, long. 34°06' E); this
area represents the southernmost fringe of its global distribution in the Middle East. The
populations are small and very scattered.
Not much is known about this species in Israel because it is only an associated species
within the Quercus calliprinos-Pistacia palaestina association of the Quercion calliprini
alliance; in other words, the Mediterranean sclerophyllous broad-leaf forest. The species
occurs mainly in Upper Galilee where, according to Zohary [88], "It dominates the upper
mountains zone which is considerably colder and rainier than lower zones." At present the
species grows at altitudes of between 250 and 1500 m a.s.l., mainly on red-brown or terra
rossa soils, which are the weathering products of marble limestone, limestone, and some flinty
limestone bedrock formations of Cenomanian-Turonian age.
2. WHY INVEST IN GENETIC DIVERSITY ASSESSMENT

In nature, genetic diversity is the result of evolutionary processes, and it is apparent within
species at different levels, in both the enzymes and the DNA [51]. Various authors have
proposed adaptation to local environmental conditions as an explanation for differences in
allele frequencies in geographically widely distributed forest tree species. It has been
hypothesized that genetic differentiation within species is related to adaptation to
micro-geographic changes within the area of distribution [32, 44]. Furnier and Adams [22] found a
correlation between allele frequencies and adaptation to ultramafic soils, and Guries and
Ledig [30] found significant correlations between allele frequencies and climatic variables
such as winter temperature. In the Swiss sub-alpine stands of Picea abies and Fagus sylvatica
Mueller-Starck [51] found relatively large intra-population and average inter-population
genetic variation in comparison with reference populations in Europe. The diversity of the site
conditions in which the different oaks are growing (see Tables 1 and 2) might affect
the genetic composition, cause differentiation and create races [56, 75]. A growing
number of fine-scale investigations of plant micro-evolutionary processes show that they are
important additions to demographic data and information about reproductive biology for the
adequate management of endangered populations and species. During their evolutionary and
ecological histories, forest tree species have experienced numerous environmental changes.
Genetic exhaustion, characteristic of species with a history of fragmented populations and
small population sizes, can have dramatic consequences for the ability of a forest tree species to
survive environmental changes. In this case, neutral markers are useful to estimate the relative
evolutionary importance of genetic factors such as mutation rates, gene flow, and genetic
drift. Because of their narrow ranges, endemics are commonly characterized by low levels of
genetic diversity [18, 26, 33].
Usually, the differentiation among landscape populations of forest trees and the
within-population genetic diversity are affected by geo-climatic parameters. Outside Africa, Israel
(Palestine) is among the lands with the oldest human settlements known to have used fire
(about 780,000 years ago), with all the consequences for the indigenous flora and fauna [90].
Since antiquity, waves of land colonization and wars have contributed strongly to the very
extensive fragmentation and destruction of the Mediterranean lowland sclerophyllous
broad-leaf forests in the Middle East [3, 15, 42, 87]. This situation has created island-like
scattered populations that, because of geographic features, probably cannot exchange genetic
material, which raises the danger of genetic drift.
Therefore, the overall aim of the present program is to acquire knowledge about the
influence of the strongly fragmented landscape and the climate diversity of Israel on the
genetic differentiation between populations and the genetic diversity within populations in the
three main oak species growing today within the boundaries of the country (Table 1). Such
knowledge is needed to create a basis for nature preservation, genetic conservation, forest
management, and the selection of populations and single trees as sources of propagation
material (acorns) used in nurseries to grow better trees for gardens and planted forests. The
specific objectives of the
present study were to use random amplified polymorphic DNA (RAPD) markers to assess
the extent of genetic differentiation between populations and to estimate levels of genetic
diversity within populations of Q. calliprinos, Q. ithaburensis and Q. boissieri as influenced
by geo-climatic parameters in Israel.

3. GENETIC DIVERSITY
a. Materials and Methods
1. Collection of Genetic Material
Table 1. The species and the name of locations of genetic material collection
oak species Q. calliprinos abbr. Q. boissieri abbr. Q. ithaburensis abbr.

Geographic Region
The Golan heights Mt. Hermon He Mt. Hermon He Wadi Metzer WM
Massade M Massade North Mn Yehudiya Forest YF
Massade South Ms
Ein Zivan EZ Ein Zivan EZ
The upper Galilee Manara Ma Manara Ma Horshat tal HT
Mt. Meron Me Mt. Meron Me
Beit Ha'Emeq BH Mt. Adir Ad
Biranit Bi
Mt. Hilal Hi
The Lower Galilee Mt. Atzmon MA Yodfat Yo Kachal KH
Mt. Turan Mtu Beit Keshet BK
Mt. Tabor Mta Ha'Movil Junction HM
Mt. Amiad MA Waldheim WL
Mt. Chazon MC Ramat Johanan RJ
Western Galilee Park Goren PG
The Gilboa Range Malkishua ML
Mt. Carmel Range Recreation Road RR Ha' Muchraka Mu Ha' Muchraka MU
Ramat Ha' Nadiv RH Bat Shlomo BS
Coastal plains Park Ha'Sharon PH Hedera North HN
Ilanot-Kadima IL
Samaria Um Reichan Forest Um Um Reichan Forest Um Alona Forest AL
Um Zaffa Forest Uz Hirbet Zerkess HZ
Eiron Forest ER
Judean Mountain Range Ha' Massreq Hm
Mt. Sansan Sa Mt. Sansan Sa
El- Kern Forest EK
Judean foothills Dir Razach DR
Tzaffririm Tz
Kingdom of Jordan Ras Moneef RM
Dhana reserve DR
Table 1 presents the geographic regions and the forest locations where leaf material was
sampled. Leaves that were not visibly affected by insects or fungi were collected from 45 to 60
randomly selected trees per site, without discriminating between possible varieties, at 24 sites
in Israel and Jordan at which Q. calliprinos occurs naturally, 16 sites at which Q.
ithaburensis occurs naturally, and 14 sites at which Q. boissieri occurs naturally. This is the
minimum sample size needed for detecting all alleles present at frequencies of not less than 10%
[29].

Table 2 presents the geographic location, the average annual rainfall, and the soil and
geological properties at each of the sites. Each selected site is situated on a more or less level
area to avoid possible within-site differences resulting from selection due to the light
environment or to water availability [6, 23, 80]. Leaf material was sampled along several
paths created by browsing small ruminants; along these paths the distances between the
sampled trees were not less than 20 m, so that sampling at each population covered an
area of about 2 ha. A total of 1200 trees of Q. calliprinos, 789 trees of Q. ithaburensis and
700 trees of Q. boissieri were sampled. Leaf material from each tree was stored separately at
-20°C pending DNA extraction.
Table 2. Geographic parameters, rainfall, bedrock and soil characteristics at the sites of
the different oak species analyzed
Geographic region Population Latitude (N) Longitude (E) Altitude (m a.s.l.) Average annual rainfall (mm) Bedrock formation Soil type
Golan Heights Mt. Hermon 33.17' 35.45' 1650 >1000 Limestone and chalk Brown Rendzina
Massade 33.13' 35.45' 1100 >1000 Golan flows, Basalt Basaltic lithosol
Ein Zivan 33.06' 35.47' 950 988 Basalt Basaltic lithosol
Wadi Metzer 32.44' 35.41' 250 450 Chalk and marl Brown Rendzina
Yehudiya forest 32.55' 35.40' 250 584 Basalt Basaltic lithosol
Upper Galilee Horshat Tal 33.14' 35.37' 100 809 Travertine Brown Rendzina & Alluvium
Manara 33.12' 35.32' 750 758 Limestone and chalk Terra rossa
Mt. Meron 32.59' 35.25' 1000 900 Dolomite and chalk Brown grumusols
Mt. Adir 33.02' 35.22' 900 Dolomite and chalk Brown grumusols
Biranit 33.03' 35.20' 700 Dolomite and chalk Brown grumusols
Mt. Hilal 32.7' 35.20' 950 Dolomite and chalk Brown grumusols
Beit Ha' Emeq 32.57' 35.15' 430 685 Limestone/Dolomite Terra rossa
Lower Galilee Mt. Atzmon 32.49' 35.16' 425 555 Dolomite with marl Terra rossa/Rendzina
Mt. Chazon 32.54' 35.24' 584 555 Dolomite/Limestone Terra rossa/Rendzina
Mt. Turan 32.48' 35.22' 548 516 Dolomite/Limestone Terra rossa/Rendzina
Mt. Tabor 32.41' 35.28' 600 562 Dolomite/Limestone Terra rossa/Rendzina
Mt. Amiad 32.56' 35.37' 486 697 Limestone Terra rossa
Yodfat 32.50' 35.16' 400 450 Limestone Terra rossa
Kachal 32.54' 35.30' 200 492 Limestone and chalk Brown Rendzina
Beit Keshet 32.43' 35.23' 150 536 Marl covered by Nari Gray Rendzina
Ha' Movil Junc. 32.47' 35.17' 215 582 Marl covered by Nari Brown Rendzina
Waldheim 32.44' 35.11' 200 648 Chalk Brown Rendzina
Ramat Johanan 32.47' 35.08' 175 566 Marl covered by Nari Brown Rendzina
Western Galilee Park Goren 33.03' 35.18' 250 793 Limestone/Dolomite Terra rossa
Gilboa Range Malkishua 32.26' 35.25' 500 451 Limestone/chalk/chert
Mt. Carmel Range Ha'Muchraka 32.40' 35.05' 450 687 Limestone Terra rossa
Recreation Road 32.44' 35.08' 400 686 Dolomite Terra rossa
Bat Shlomo 32.36' 35.00' 150 661 Marl covered by Nari Light Rendzina
Ramat Ha' Nadiv 32.33' 34.56' 125 574 Volcanic Tuff, Dolomite Terra rossa & Rendzina
Coastal plains Park Ha'Sharon 32.25' 34.53' 25 509 Sand dunes/calcareous Red sandy loam
Hedera North 32.26' 34.54' 50 603 Calcareous sandstone Red sandy loam
Ilanot-Kadima 32.18' 34.54' 75 621 Calcareous sandstone Red sandy loam
Samaria Um Reichan For. 32.09' 35.08' 400 666 Limestone/chalk/chert Terra rossa
Um Zaffa For. 32.01' 35.08' 580 Limestone/chalk/chert Brown Rendzina
Alona For. 32.33' 34.59' 225 581 Chalk Brown Rendzina
Hirbet Zerkess 32.36' 34.58' 50 603 Calcareous sandstone Red sandy loam
Eiron For. 32.29' 35.04' 200 685 Limestone with Nari Brown Rendzina
Judean Mt. Range Ha' Massreq 31.48' 35.02' 575 625 Dolomite/Limestone Terra rossa/ Brown Rendzina
Mt. Sansan 31.43' 35.05' 700 573 Dolomite/Limestone Terra rossa/ Brown Rendzina
El-Kern For. 31.37' 35.07' 975 662 Dolomite/Limestone Terra rossa/ Brown Rendzina
Judean foothills Dir Rasach 31.28' 35.02' 800 <400 Dolomite/Limestone Brown Rendzina
Tzaffririm 31.39' 34.56' 350 378 Chalk Brown grumusol/ Rendzina
Kingdom of Jordan Ras Moneef 32.20' 35.45' 115 650
Dhana reserve 30.40' 35.36' 1170 250

2. DNA Extraction and RAPD Procedure

Leaves collected from each individual tree were manually ground in liquid nitrogen with
a mortar and pestle, and homogenized in an extraction buffer (100 mM Tris, 1.4 M NaCl, 20
mM EDTA, 2% CTAB, 2% PVP-40, 0.2% 2-mercaptoethanol, pH 8.0). Total genomic DNA
was extracted according to Doyle and Doyle [13], with minor modifications. The samples
were incubated for 30 min at 37ºC with RNase (10 mg/ml), and the DNA concentration of
each sample was measured by spectrophotometric assay at 260 nm.
Sixty Operon primers were screened for their ability to produce clear,
readable polymorphic loci (bands) in the gels. The best amplification conditions for all
primers were: a total volume of 15 µl containing 1.0 unit Taq polymerase, 50 mM Tris-HCl pH
9.1, 3.5 mM MgCl2, 200 µM dNTP (Sigma), 150 µg/ml BSA, 5 pmol primer (Operon kits A,
B, D) and 20 ng DNA.
Amplifications were performed with the following parameters: an initial denaturation step at
95ºC for 2 min, followed by 44 cycles of 94ºC for 1 min, 37ºC for 1 min and 72ºC for 2 min,
and a final step at 72ºC for 5 min. The amplification products were separated by
electrophoresis on 1.8% agarose gels, with TBE as the running buffer. The gels were stained
with 0.5 µg/l ethidium bromide for 30 min [63]. Molecular sizes of the RAPD products were
estimated by means of molecular weight markers (pGEM, Promega).
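As a rough planning aid, the cycling program described above can be written down as data and its programmed hold time added up. This is only a sketch: the step labels are illustrative names chosen here, and ramp times between temperatures, which depend on the thermocycler, are ignored.

```python
# RAPD cycling program as described in the text: (label, temperature in C, minutes).
# The labels are illustrative; ramp times between steps are not included.
INITIAL = [("initial denaturation", 95, 2.0)]
CYCLE = [("denaturation", 94, 1.0), ("annealing", 37, 1.0), ("extension", 72, 2.0)]
FINAL = [("final step", 72, 5.0)]
N_CYCLES = 44


def total_hold_minutes(initial, cycle, final, n_cycles):
    """Sum the programmed hold times of a three-part PCR program."""
    per_cycle = sum(minutes for _, _, minutes in cycle)
    fixed = sum(minutes for _, _, minutes in initial + final)
    return fixed + n_cycles * per_cycle


total = total_hold_minutes(INITIAL, CYCLE, FINAL, N_CYCLES)
print(f"Programmed hold time: {total:.0f} min")
```

With 44 cycles of 4 min each plus the 2 min and 5 min end steps, the holds alone add up to about three hours per run.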
Only 20 primers out of the 60 fulfilled all the requirements. In Q. ithaburensis, they
generated 72 polymorphic loci ranging in size from 300 to 2000 bp that were used for
analysis. Each random primer was represented by 2 to 7 scored RAPD fragments. These 20
primers produced 32 clearly identifiable polymorphic loci in Q. ithaburensis and Q.
boissieri. In Q. calliprinos only 10 of the 20 primers were used because the others did not
produce good amplification or clear bands. A total of 23 bands (loci) produced by the 10
RAPD primers appeared consistently in all trees assayed. Figure 1 presents the visibly
polymorphic bands (P) that were scored.
3. Data Analysis
The RAPD-PCR fragments were analyzed as genetic markers under the following
assumptions: (1) RAPD markers represent homologous loci and segregate in a Mendelian
fashion; (2) genotype frequencies at RAPD loci are in Hardy-Weinberg proportions; (3)
RAPD fragments behave as diploid, dominant markers with two alleles per locus: recessive
"band absent" alleles, which are identical in state (iis) among and within individuals, and
dominant "band present" alleles, which are likewise iis among and within individuals.
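Under assumptions (2) and (3), allele frequencies at a dominant locus can be estimated from the band phenotypes alone, because trees without the band are recessive homozygotes. The following minimal sketch illustrates that reasoning; the function name and the example scores are hypothetical and are not taken from the study, and the software used by the author may apply corrections not shown here.

```python
import math


def dominant_allele_freqs(scores):
    """Estimate allele frequencies at one RAPD locus from 0/1 band scores.

    Assumes Hardy-Weinberg proportions: band-absent trees are recessive
    homozygotes, so their share of the sample estimates q**2.
    """
    n = len(scores)
    if n == 0:
        raise ValueError("no trees scored")
    absent_share = scores.count(0) / n
    q = math.sqrt(absent_share)  # recessive "band absent" allele
    p = 1.0 - q                  # dominant "band present" allele
    return p, q


# Hypothetical scores for 50 trees: 32 with the band, 18 without.
scores = [1] * 32 + [0] * 18
p, q = dominant_allele_freqs(scores)
print(f"p = {p:.3f}, q = {q:.3f}")
```

Note that the band-present phenotype pools dominant homozygotes and heterozygotes, which is why the estimate has to go through the band-absent class.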
Photographs of ethidium bromide-stained agarose gels were used to score the results of
the electrophoresis of the RAPD-PCR fragments (Figure 1).
Amplified fragments were recorded as absent (0) or present (1) in all individuals. For
each fragment, these two possible states were considered as the molecular phenotypes
resulting from the expression of two alleles at a single locus, one dominant and one recessive,
the dominant being the one that determines the presence of the band. We named each RAPD
allele by its primer and a hyphenated numeral corresponding to its locus and allele. Molecular
diversity within each population was assessed by calculating the percentage of polymorphic
fragments (P%), the Shannon diversity index (I), gene diversity (h), the number of observed
alleles (na) and the effective number of alleles (ne). Calculations of intra- and
inter-population genetic diversity were done with the POPGENE software, version 1.32, of
Yeh et al. [83]. Gene diversity (h), equivalent to the expected heterozygosity, was estimated
according to Nei [52] as $h = 1 - \sum_i p_i^2$, where $p_i$ is the frequency of variant i.
We also used the Shannon index [73] to estimate gene diversity, $I = -\sum_i p_i \ln p_i$.
Genetic diversity within each population was calculated for each
locus and then averaged over all loci and populations. Genetic differentiation was calculated
according to Nei [52] as $G_{st} = d(h)/H_t$, where $H_t$ is the total diversity and
$d(h) = H_t - H_s$ is the difference between the total and the within-population diversity, so
that $G_{st} = (H_t - H_s)/H_t$ [54]. At each polymorphic locus, the total
diversity is represented by Ht, and the mean allelic diversity within populations by Hs.
To test the differentiation between diploid populations, pair-wise FST values were computed
as genetic distances between populations by means of the ARLEQUIN 2.0 software; the
phylogenetic inference package PHYLIP 3.5 [20] was used to examine phylogenetic relations
among the populations and to construct phylogenetic trees.
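The per-locus diversity measures named above follow directly from the allele frequencies. The sketch below shows the standard formulas for Nei's gene diversity, the Shannon index and the effective number of alleles; the function names and the example frequencies are illustrative assumptions, and this is not the POPGENE implementation.

```python
import math


def nei_gene_diversity(freqs):
    """Nei's gene diversity: h = 1 - sum(p_i^2)."""
    return 1.0 - sum(p * p for p in freqs)


def shannon_index(freqs):
    """Shannon index: I = -sum(p_i * ln p_i); zero-frequency alleles contribute 0."""
    return -sum(p * math.log(p) for p in freqs if p > 0)


def effective_alleles(freqs):
    """Effective number of alleles: ne = 1 / sum(p_i^2)."""
    return 1.0 / sum(p * p for p in freqs)


# Hypothetical two-allele locus (band present, band absent).
freqs = (0.4, 0.6)
h = nei_gene_diversity(freqs)
i_shannon = shannon_index(freqs)
ne = effective_alleles(freqs)
print(f"h = {h:.3f}, I = {i_shannon:.3f}, ne = {ne:.3f}")
```

For a two-allele locus all three measures peak at equal frequencies (h = 0.5, ne = 2), which is why the tabulated ne values never exceed 2.000.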
Figure 1. Amplification products from primer OPA19. Lanes 13 and 27: pGEM DNA markers
(Promega), with sizes in bp indicated on the right. Lanes 1-12 and 14-26: amplified bands of
individual trees of Q. ithaburensis. Molecular sizes in bp of the scored bands are shown on the left.
b. Results
1. Q. calliprinos
a. Intra-population Genetic Diversity

Within-population diversity (Hs) and total gene diversity (Ht) values for each locus are
summarized in Table 3. The results show that the within-population genetic diversity (Hs) of
0.325±0.020 constitutes 91.88% of the total genetic diversity (Ht) of 0.354±0.023. The mean
proportion of the total diversity attributable to differences between populations (Gst) was
0.0812. The low differentiation between populations is probably due to high gene flow (Nm),
i.e., the estimated number of migrants exchanged between local populations, which, averaged
over the loci analyzed, had a value of 5.7.

Table 3. Partitioning of the genetic diversity within and among 24 populations of Q.
calliprinos in Israel and Jordan for twenty-three RAPD loci
Primer and locus Sample size na ne Hs Ht Gst Nm
OPA01-1 1184 2 2.000 0.489 0.500 0.021 22.952
OPA01-2 1184 2 1.442 0.298 0.305 0.023 21.358
OPA12-1 1184 2 1.973 0.441 0.493 0.105 4.270
OPA12-2 1184 2 1.169 0.141 0.145 0.027 17.776
OPA17-1 1182 2 1.952 0.466 0.488 0.044 10.986
OPA17-2 1182 2 1.769 0.418 0.434 0.037 12.853
OPA17-3 1182 2 1.748 0.398 0.426 0.066 7.138
OPA17-4 1182 2 1.176 0.143 0.148 0.035 13.921
OPA19-1 1183 2 1.863 0.447 0.465 0.037 12.937
OPA19-2 1183 2 1.960 0.469 0.490 0.044 11.007
OPA19-3 1185 2 1.235 0.182 0.190 0.040 11.937
OPA20-1 1182 2 1.831 0.428 0.453 0.055 8.546
OPB01-1 1181 2 1.968 0.474 0.492 0.037 12.898
OPB04-1 1183 2 1.276 0.199 0.215 0.075 6.163
OPB04-2 1181 2 1.392 0.272 0.281 0.032 15.007
OPB04-3 1183 2 1.095 0.084 0.087 0.032 15.243
OPB05-1 1187 2 1.761 0.415 0.432 0.040 11.964
OPB05-2 1187 2 1.489 0.316 0.329 0.041 11.815
OPD03-1 1183 2 1.961 0.453 0.489 0.074 6.253
OPD03-2 1183 2 1.895 0.431 0.471 0.086 5.296
OPD03-3 1186 2 1.027 0.025 0.026 0.053 8.925
OPD06-1 1184 2 1.398 0.276 0.285 0.032 15.387
OPD06-2 1184 2 1.968 0.212 0.493 0.570 0.377
Mean 1183 2 1.624 0.325 0.354 0.081 5.661
S.D. 0 0.340 0.143 0.153 0.111 5.337
na = Observed number of alleles; ne = Effective number of alleles [52].
Hs = Mean genetic diversity within populations; Ht = Total genetic diversity.
Gst = Mean proportion of total diversity due to differences between populations [(Ht-Hs)/Ht].
Nm = Estimate of gene flow from Gst or Gcs, i.e., Nm = 0.5(1 - Gst)/Gst [see McDermott and
McDonald (1993), Ann. Rev. Phytopathol. 31:353-373].
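The two footnote formulas can be checked against the tabulated means with a few lines of code. This is a minimal sketch with function names chosen here for illustration; because it starts from the rounded Hs and Ht values, the results can differ slightly in the third decimal from figures computed on unrounded data.

```python
def gst(ht, hs):
    """Proportion of total diversity due to differences between populations."""
    return (ht - hs) / ht


def nm_from_gst(g):
    """Gene flow estimate used in the table footnote: Nm = 0.5 * (1 - Gst) / Gst."""
    return 0.5 * (1.0 - g) / g


# Mean values reported for Q. calliprinos (Hs = 0.325, Ht = 0.354).
g_mean = gst(0.354, 0.325)
nm_mean = nm_from_gst(g_mean)
print(f"Gst = {g_mean:.3f}, Nm = {nm_mean:.2f}")

# Starting instead from the Gst of 0.0812 quoted in the text:
nm_text = nm_from_gst(0.0812)
print(f"Nm from Gst = 0.0812: {nm_text:.2f}")
```

Because Nm is inversely related to Gst, small rounding differences in Gst are amplified in Nm, especially for loci with very low differentiation.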
b. Inter-population Genetic Differentiation

Genetic diversity estimates within the populations are summarized in Table 4. The mean
sample size, i.e., the number of trees sampled in a population, was 49.6 ± 3.1. No relationship
between sample size and diversity values was found: the smallest sample was
recorded in the Park Ha'Sharon population and the largest in the Dhana population, whereas
gene diversity (h) ranged from 0.289 (Um Reichan population) to 0.356 (Dir Razach
population).

Table 4. Genetic diversity estimates in 22 Israeli and 2 Jordanian Q. calliprinos populations
Population Sample size ne h Pl P%
Mt. Hermon 49 1.556 0.321 21 91.30
Odem forest 50 1.583 0.341 22 95.65
Manara 48 1.493 0.302 23 100.00
Mt. Meron 49 1.561 0.322 22 95.65
Beit Ha'Emeq 48 1.570 0.330 23 100.00
Park Goren 50 1.551 0.326 22 95.65
Mt. Chazon 49 1.553 0.312 21 91.30
Mt. Atzmon 50 1.554 0.320 22 95.65
Mt. Turan 50 1.554 0.329 22 95.65
Mt. Tabor 50 1.566 0.330 22 95.65
Mt. Amiad 48 1.567 0.322 22 95.65
Mt. Malkishua 49 1.571 0.334 23 100.00
Ras Moneef* 54 1.583 0.338 22 95.65
Mt. Carmel 50 1.596 0.337 22 95.65
Ramat Hanadiv 49 1.613 0.349 23 100.00
Park Ha'Sharon 42 1.540 0.308 21 91.30
Um Reichan 50 1.496 0.289 20 86.96
Um Zaffa 50 1.522 0.301 21 91.30
Ha'Masreq 49 1.517 0.309 23 100.00
Mt. Sansan 47 1.567 0.329 22 95.65
El Karen 50 1.559 0.324 21 91.30
Tzafririm 50 1.592 0.341 23 100.00
Dir Razach 50 1.622 0.356 22 95.65
Dhana* 55 1.579 0.331 21 91.30
Average 49.6 1.566 0.326 21.8 94.98
S.D. 3.1 0.038 0.020 1.0 4.29
Minimum 42 1.493 0.289 20 86.96
Maximum 55 1.622 0.356 23 100.00
ne = Effective number of alleles [52].
h = Nei's gene diversity within population [52].
Pl = Number of polymorphic loci.
P% = Percentage of polymorphic loci at 99% criterion.
* - Population sampled in Jordan.
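The P% column is simply the number of polymorphic loci expressed as a share of the 23 loci scored in Q. calliprinos. A small sketch of that arithmetic follows; the function name is chosen here for illustration.

```python
def percent_polymorphic(n_polymorphic, n_loci=23):
    """Percentage of polymorphic loci among the loci scored."""
    return 100.0 * n_polymorphic / n_loci


# The four Pl values that occur in the table.
for n_poly in (20, 21, 22, 23):
    print(n_poly, round(percent_polymorphic(n_poly), 2))
```

Only four distinct P% values are therefore possible for the Pl values observed, which makes isolated deviations in the column easy to spot.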
An attempt was made to identify differences among distinct regional groups of Q.
calliprinos populations in their genetic parameters by means of analysis of molecular
variance (AMOVA); the regions were: the Golan Heights, the Galilee, the Carmel Range,
Samaria and Judea. The AMOVA, performed over the 22 Israeli populations to partition the
RAPD variation among the five regional groups, among populations within these groups, and
among individuals within populations, revealed that most of the variation (99.79%) is found
within populations. The variation among the regional groups is 0.13% (P < 0.0001), and that
among populations within groups is 0.08% (P < 0.0001). These two components account for
comparatively small
amounts of the total variance, although both effects were significant; the overall
differentiation among Q. calliprinos populations (ΦST) was 0.002. The two populations from
Jordan were not included in this analysis. Comparison of the genetic differentiation of Q.
calliprinos in the five geographic zones within Israel is presented in Table 5.
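In a three-level AMOVA the overall ΦST is the share of the total variance that is not found within populations, so the reported value can be recovered from the within-population percentage alone. The sketch below illustrates this relation; the function and variable names are assumptions made here, and it is not the ARLEQUIN procedure itself.

```python
def phi_st(var_among_groups, var_among_pops, var_within):
    """Overall Phi_ST from the three AMOVA variance components."""
    total = var_among_groups + var_among_pops + var_within
    return (var_among_groups + var_among_pops) / total


# Reported within-population share of the variation, in percent.
within_pct = 99.79
remainder = 100.0 - within_pct

# How the remainder splits between the two upper levels does not change
# Phi_ST, so an even split is used here purely for illustration.
phi = phi_st(remainder / 2, remainder / 2, within_pct)
print(f"Phi_ST = {phi:.4f}")
```

Rounded to three decimals this agrees with the ΦST of 0.002 given in the text.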
Table 5. Genetic differentiation between and within geographical regions, and among
sites of Q. calliprinos in Israel
Groups Trees Ht Hs Gst Nm
GOLAN 99 0.340 0.331 0.027 17.776
GALILEE 441 0.341 0.322 0.056 8.362
CARMEL 99 0.353 0.343 0.031 15.911
SAMARIA 100 0.310 0.295 0.051 9.356
JUDEA 246 0.351 0.331 0.055 8.522
Ht = Total genetic diversity; Hs=Genetic diversity within populations.
Gst = Proportion of total diversity due to differences between populations.
Nm = Estimate of gene flow from Gst.
Table 5 shows that the estimated migration rates indicate differential gene flow
between populations in the different regional groups. Gene flow values were greater within
the Golan Heights and the Carmel Range groups than within the other groups, where the
number of migrants per generation was smaller.
No considerable differences among the genetic parameters of the geographic groups of Q.
calliprinos were found; furthermore, analysis by means of the Cavalli-Sforza and Edwards
[10] chord distance, or of Nei's [53] genetic distance, revealed no patterns related to
geography (results not presented). In addition, no relations could be established between the
genetic and the geographic distances among populations.
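Comparisons of genetic with geographic distance need a distance for every pair of sites. One common way to obtain it from site coordinates is the haversine great-circle formula, sketched below for two sites from Table 2. The text does not say how the author measured geographic distance, so this is an illustration only; it also assumes that an entry such as 33.17' in Table 2 means 33 degrees 17 minutes.

```python
import math


def deg_min(degrees, minutes):
    """Convert degrees and minutes to decimal degrees."""
    return degrees + minutes / 60.0


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


# Table 2 entries, read as degrees and minutes (an assumption about the notation).
hermon = (deg_min(33, 17), deg_min(35, 45))
sansan = (deg_min(31, 43), deg_min(35, 5))
d = haversine_km(*hermon, *sansan)
print(f"Mt. Hermon to Mt. Sansan: {d:.0f} km")
```

Applying the same function to every pair of sites would give the geographic distance matrix against which the pair-wise genetic distances can be compared.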
The FST values that statistically represent the population structure were calculated
between pairs of populations according to Slatkin [74]. The results (not presented) show that
on average, the genetic distance between populations differed significantly (P < 0.05) in 13
pairs, with no relationship to geographical distances. Only the Tzaffririm and Dir Razach
populations, which represent the most southerly and driest region of Q. calliprinos
distribution in Israel, growing on rendzina soils, had significant genetic distances from all
other populations; the Park Goren population also differs significantly from 20 of the 24
populations. On the other hand, the Fst values of the Beit Ha'Emeq, Park Ha'Sharon and Mt.
Amiad populations were similar to those of more than 15 other populations, each of them
growing under a very different set of ecological conditions (see Table 2).
2. Q. ithaburensis
a. Intra-population Genetic Diversity

The parameters of genetic variation within populations are presented in Table 6. Nei's average
within-population gene diversity [52] (Hs) was 0.362, which constitutes about 87.4% of the total
genetic diversity (Ht) of 0.414. The mean proportion of the total diversity attributable to
differences between populations (Gst) was 0.126. The relatively low differentiation between
populations is probably also due to high gene flow (Nm), i.e., the estimated number of
migrants exchanged between local populations, which, averaged over the loci analyzed, had a
value of 3.460. The lowest values of the various genetic parameters were obtained in the
Horshat Tal and the Alona Forest populations, and the highest values were found in the
Ha'Movil Junction population.
Table 6. Genetic diversity estimates within Q. ithaburensis populations
Pop. no. Population Abbr. Npol P%99 Na Ne Hs
1 Ein Zivan EZ 71 98.610 1.986 1.635 0.367
2 Kachal Ka 70 97.220 1.972 1.669 0.380
3 Bet Keshet BK 70 97.220 1.972 1.612 0.356
4 Alona forest AF 69 95.830 1.958 1.571 0.333
5 Hirbet Zerkess HZ 69 95.830 1.958 1.646 0.370
6 Horshat Tal HT 66 91.670 1.917 1.554 0.323
7 Wadi Metzer WM 70 97.220 1.972 1.597 0.344
8 Yehudya forest YF 70 97.220 1.972 1.662 0.376
9 Ha' Movil Junction HJ 72 100.000 2.000 1.704 0.396
10 Bat Shlomo BS 71 98.610 1.986 1.654 0.369
11 Waldheim Wa 70 97.220 1.972 1.628 0.362
12 Ilanot-Kadima IL-K 69 95.830 1.958 1.636 0.364
13 Ramat Johanan RJ 68 94.440 1.944 1.635 0.363
14 Ha' Muchraka Mu 70 97.220 1.972 1.638 0.360
15 Hedera north HN 68 94.440 1.944 1.598 0.346
16 Eiron Forest EF 70 97.220 1.972 1.690 0.387
Average 69.6 96.610 1.966 1.633 0.362
S. D. 1.41 1.96 0.02 0.04 0.02
Npol = number of polymorphic loci; P%99 = percentage of polymorphic loci.
Na = Average number of alleles; Ne = Effective number of alleles [52].
b. Inter-population Genetic Differentiation

A consensus tree (1000 bootstrap replicates), based on the chord distance values of
Cavalli-Sforza and Edwards [10], was obtained by the UPGMA method (Figure 2).
The 17 populations analyzed (those listed in Table 1 plus the population of the Botanical
Garden of the Hebrew University of Jerusalem) clearly aggregate into three main groups
surrounding the Botanical Garden population, which is meant to represent the diversity of this
species. The first group includes the Golan Heights and Upper Galilee populations and the
Kachal and Alona Forest populations (Populations 1 to 4, 5 and 12, Table 1). The second
group includes some populations from Lower Galilee and the Hirbet-Zerkess population
(Populations 6 to 8, and 13, Table 1). The third group includes the Coastal Plain, the
Ha'Muchraka and Samaria populations (Populations 9 to 11, and 14 to 16, Table 1).

Figure 2. Phylogenetic relations among Q. ithaburensis populations in Israel based on
Cavalli-Sforza and Edwards (1967) chord distance values, obtained by the UPGMA method
(abbreviations of population names according to Table 1).
The mean allele frequencies of three markers, out of the 72 used, in the three groups and
in the four single populations, namely the Botanical Garden in Jerusalem, and the Kachal,
Hirbet-Zerkess and Alona Forest populations, are shown. There are wide differences, among
the three groups of populations and among the four single populations, in the mean allele
frequencies of the three markers, and the Botanical Garden population does not typify the
allele frequencies in the three markers used.
Table 7 presents the genetic differentiation among the main groups that were defined.
The overall total genetic diversity (Ht) of Q. ithaburensis found in our present study was
0.414, the overall within-population diversity (Hs) was 0.362, and the overall proportion of
the total diversity attributed to differences among populations (Gst) was 0.126. There were
differences among the three geographic regions in the relations among Ht, Hs, and Gst. Also,
the estimated gene flow (Nm) within each of the three groups is high.
Table 7. Differentiation between groups in Quercus ithaburensis
The groups Abbr. Npol P%99 Ht Hs Gst Nm
Golan Heights and Upper Galilee Gr1 69 96.18 0.394 0.352 0.106 4.235
Lower Galilee Gr2 70 97.22 0.409 0.372 0.091 4.970
Mt. Carmel, Samaria and Coastal Plain Gr3 70 96.66 0.394 0.364 0.075 6.132
Across all populations 69.6 96.61 0.414 0.362 0.126 3.46
Npol = Number of polymorphic loci; P%99 = Percentage of polymorphic loci; Hs = Gene diversity
within populations; Ht = Total gene diversity; Gst = Component of diversity between populations,
(Ht-Hs)/Ht; Nm = Estimated gene flow from Gst, 0.5(1-Gst)/Gst.

An effort was made to relate the marker frequencies in each population to geographic and
climatic parameters; in only two cases, the OPA05-2 and OPA12-2 markers, did the
frequencies correlate significantly with the latitude of the population habitats. The linear
regression equations for the two markers are as follows:
a. Latitude (Israel's grid) = 2771.335 - (1127.615 x OPA05-2). r = 0.596, p = 0.015.
b. Latitude (Israel's grid) = 1377.952 + (1273.480 x OPA12-2). r = 0.529, p = 0.035.
The frequency of the OPA01-2 marker correlated almost significantly with the mean annual
rainfall in the various regions; the linear regression equation for this marker is as follows:
c. Rainfall = 1028.176 - (528.283 x OPA01-2). r = 0.453, p = 0.078.
Although the average geographic separation among all possible pairs of populations
analyzed (n = 208) is only 47.5 ± 30.0 km (ranging from 5 to 145 km), a relationship
was found between the geographic and genetic distances between populations [62]. The
general trend of this relation is described by the linear regression equation:
d. Genetic distance = 0.067343 x Geographic distance (km). r = 0.259 (n = 208 pairs), p = 0.005.
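The p-values quoted with regression equations a to c can be related to their correlation coefficients through the usual t-test for a Pearson correlation, t = r·sqrt((n - 2)/(1 - r²)). The sketch below assumes n = 16, the number of Q. ithaburensis populations in Table 6; that sample size is an inference made here and is not stated with the equations. The Student t tail area is integrated numerically so that only the standard library is needed.

```python
import math


def t_pdf(x, df):
    """Probability density of Student's t distribution."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def correlation_p_value(r, n, steps=2000):
    """Two-tailed p-value for a Pearson correlation r observed in n points."""
    if not -1 < r < 1:
        raise ValueError("r must lie strictly between -1 and 1")
    df = n - 2
    t = abs(r) * math.sqrt(df / (1 - r * r))
    # Simpson's rule for the area under the density between 0 and t
    # (steps must be even).
    width = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * width, df)
    area = total * width / 3
    return t, 1 - 2 * area


# r values of equations a, b and c, with the assumed n = 16 populations.
for r in (0.596, 0.529, 0.453):
    t, p = correlation_p_value(r, 16)
    print(f"r = {r:.3f}  t = {t:.2f}  p = {p:.3f}")
```

Under that assumption the computed p-values fall close to the values reported for the three equations, which supports reading them as ordinary tests with 14 degrees of freedom.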
3. Q. boissieri
a. Intra-population Genetic Diversity

Genetic diversity parameters for each locus are presented in Table 8. The average effective
number of alleles (ne) was 1.700 ± 0.242, with a range from 1.141 to 2.000. The average
within-population diversity per locus (Hs) was 0.335 ± 0.009, with a range from 0.101 to
0.478. Hs constitutes 84.17% of the total genetic diversity (h, i.e., Ht), which is
0.398 ± 0.010, with a range from 0.124 to 0.500. The proportion of genetic diversity residing
among populations at each locus (Gst) averaged 0.162, with a range from 0.026 to 0.553. The
relatively low differentiation between populations is probably also due to high gene flow
(Nm), i.e., the estimated number of migrants exchanged between local populations, which,
averaged over the loci analyzed, had a value of 4.649.
An attempt to relate the sites' geographic parameters to the genetic parameters presented in
Table 8 by means of linear regression analysis revealed that, of the 32 loci, only a few,
namely OPA12-1, OPA15-1, OPB01-2, OPB04-1 and OPB16-1, had a significant correlation
coefficient (r) with altitude. Three loci, namely OPA15-1, OPB01-2 and OPB16-2, had
significant (p < 0.05, i.e., r > 0.49) correlation coefficients between allele frequencies and
the altitude of the population sites. Five loci had a significant correlation coefficient (r)
between the effective number of alleles (ne) and altitude; three loci had significant
correlation coefficients (r) for Nei's gene diversity (h).

Table 8. Genetic diversity parameters of 32 loci in Q. boissieri

No.  Locus     N     na*    ne*    h*     Hs*    Gst*   Nm*
 1   OPA12-1   675   2.000  1.992  0.498  0.478  0.038  12.656
 2   OPA12-2   675   2.000  1.413  0.292  0.259  0.081   5.654
 3   OPA15-1   674   2.000  1.448  0.310  0.240  0.233   1.643
 4   OPA17-1   677   2.000  1.674  0.403  0.236  0.420   0.690
 5   OPA17-2   677   2.000  1.674  0.403  0.396  0.026  19.076
 6   OPA17-3   677   2.000  1.610  0.379  0.356  0.059   7.938
 7   OPA17-4   677   2.000  1.235  0.190  0.172  0.104   4.330
 8   OPA18-1   679   2.000  1.650  0.394  0.276  0.317   1.076
 9   OPA19-1   678   2.000  1.723  0.420  0.381  0.092   4.930
10   OPA19-2   678   2.000  1.662  0.398  0.343  0.159   2.646
11   OPA19-3   678   2.000  1.698  0.411  0.376  0.083   5.530
12   OPA20-1   677   2.000  1.953  0.488  0.413  0.157   2.689
13   OPA20-2   677   2.000  1.912  0.477  0.430  0.091   4.982
14   OPA20-4   677   2.000  2.000  0.500  0.363  0.275   1.321
15   OPB01-1   675   2.000  1.597  0.374  0.164  0.553   0.404
16   OPB01-2   675   2.000  1.574  0.365  0.342  0.091   4.985
17   OPB04-1   679   2.000  1.965  0.491  0.425  0.137   3.149
18   OPB04-2   679   2.000  1.227  0.185  0.157  0.239   1.596
19   OPB04-3   679   2.000  1.899  0.473  0.417  0.109   4.085
20   OPB04-4   679   2.000  1.988  0.497  0.363  0.264   1.395
21   OPB16-1   673   2.000  1.994  0.499  0.463  0.071   6.505
22   OPB16-2   673   2.000  1.809  0.447  0.326  0.281   1.278
23   OPB20-1   678   2.000  1.583  0.368  0.306  0.162   2.591
24   OPB20-2   677   2.000  1.995  0.499  0.416  0.165   2.540
25   OPD03-1   677   2.000  1.695  0.410  0.390  0.049   9.716
26   OPD06-1   664   2.000  1.141  0.124  0.101  0.185   2.196
27   OPD06-2   663   2.000  1.936  0.483  0.453  0.063   7.499
28   OPD06-3   663   2.000  1.543  0.352  0.331  0.061   7.668
29   OPD17-1   671   2.000  1.987  0.497  0.336  0.320   1.063
30   OPD17-2   671   2.000  1.489  0.329  0.315  0.051   9.278
31   OPD20-1   677   2.000  1.633  0.388  0.328  0.153   2.774
32   OPD20-2   677   2.000  1.692  0.409  0.375  0.093   4.876
MEAN           675   2.000  1.700  0.398  0.335  0.162   4.649
S.D.                 0.000  0.242  0.010  0.009  0.121   3.986
MAX                         2.000  0.500  0.478  0.553  19.076
MIN                         1.141  0.124  0.101  0.026   0.404

* na = number of alleles; ne = effective number of alleles; h = Nei's genetic diversity;
Hs = within-population diversity; Gst = proportion of genetic diversity residing among
populations; Nm = gene flow.
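The per-locus values in Table 8 can be cross-checked with a few lines of arithmetic. The
following Python sketch rests on two relations that this section does not state explicitly
and that are therefore assumptions: the effective number of alleles is taken as
ne = 1/(1 - h), and gene flow is taken as Nm = 0.5(1 - Gst)/Gst. Both were inferred here
from the tabulated numbers, which they reproduce to within rounding; the function names are
illustrative only.

```python
def effective_alleles(h):
    """Effective number of alleles from gene diversity h = 1 - sum(p_i**2)."""
    return 1.0 / (1.0 - h)


def gene_flow(gst):
    """Gene flow Nm from Gst, using the assumed estimator 0.5 * (1 - Gst) / Gst."""
    return 0.5 * (1.0 - gst) / gst


def within_population_share(hs, ht):
    """Fraction of the total gene diversity (Ht) that resides within populations (Hs)."""
    return hs / ht


# (locus, h, Gst) copied from three rows of Table 8
rows = [
    ("OPA12-1", 0.498, 0.038),
    ("OPA17-1", 0.403, 0.420),
    ("OPB01-1", 0.374, 0.553),
]

for locus, h, gst in rows:
    print(f"{locus}: ne = {effective_alleles(h):.3f}, Nm = {gene_flow(gst):.3f}")

# Mean Hs and Ht from the MEAN row of Table 8
print(f"Hs / Ht = {within_population_share(0.335, 0.398):.2%}")
```

Because the tabulated h and Gst values are rounded to three decimals, the recomputed ne and
Nm can differ from Table 8 in the last printed digit.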
b. Inter-populations Genetic Differentiation

Parameters of genetic differentiation among the 14 populations analyzed, and the
correlation coefficients (r) of the regressions between the genetic and the geographic
parameters are presented in Table 9.

Table 9. Genetic differentiation among populations of Q. boissieri and correlation
coefficients with geographic parameters

Population    Sample size  Npol  P%99     na     ne     h      Gst
Ein Zivan     48           29     90.62   1.906  1.532  0.317  0.208
Massade S     58           32    100.00   2.000  1.599  0.351  0.123
Mt. Adir      47           28     87.35   1.875  1.502  0.292  0.270
Mt. Sansan    50           30     93.75   1.938  1.578  0.337  0.157
Mt. Hermon    52           32    100.00   2.000  1.628  0.362  0.095
Mt. Hilal     46           30     93.75   1.938  1.528  0.312  0.219
Massade N     56           30     93.75   1.938  1.612  0.347  0.133
Manara        58           30     93.75   1.938  1.586  0.343  0.142
Mt. Meron     50           32    100.00   2.000  1.607  0.350  0.124
Biranit       50           32    100.00   2.000  1.566  0.328  0.180
Yodfat        36           28     87.50   1.875  1.523  0.303  0.243
Mt. Muchraka  44           32    100.00   2.000  1.639  0.369  0.077
Um Rechan     37           29     90.62   1.906  1.567  0.328  0.181
Park Goren    48           32    100.00   2.000  1.624  0.356  0.111
Average       49           30     95.078  1.951  1.578  0.335  0.162
S.D.           7            2      4.878  0.049  0.043  0.023  0.058
correlation coefficients (r) between the Geographic and the Genetic parameters
Altitude 0.121 0.119 0.121 0.045 0.083 0.270
Latitude 0.200 0.199 0.200 0.053 0.067 0.077
Longitude 0.074 0.074 0.074 0.036 0.075
Mean sample size was 49 ± 7 and mean number of polymorphic loci was 30 ± 2. Mean
genetic diversity within populations (Hs) was 0.335 ± 0.023, with a range from 0.292 to 0.369.
The proportion of genetic differentiation residing among all populations (Gst) was 0.162, with
a range from 0.077 to 0.270. Linear regression analysis of the relations between the
geographic parameters and the populations' genetic parameters revealed no significant
relations between these two sets of parameters. Two populations had extreme genetic
parameters: the Mt. Adir population had the smallest percentage of polymorphic loci
(87.35%), the fewest effective alleles (1.502), the least within-population diversity
(0.292) and the largest Gst (0.270), whereas the Mt. Muchraka population had the most
effective alleles (1.639), the greatest within-population diversity (0.369) and the
smallest Gst (0.077).
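The Npol, P%99 and na columns of Table 9 are tied together by simple arithmetic. The sketch
below is an illustration under stated assumptions, not the procedure used in the study: it
assumes that 32 loci were scored in every population (the number of loci in Table 8), that
a polymorphic locus contributes two alleles and a monomorphic locus one, and it uses
hypothetical function names.

```python
TOTAL_LOCI = 32  # assumed: number of loci scored, as listed in Table 8


def percent_polymorphic(n_polymorphic, total=TOTAL_LOCI):
    """Percentage of polymorphic loci in a population."""
    return 100.0 * n_polymorphic / total


def mean_observed_alleles(n_polymorphic, total=TOTAL_LOCI):
    """Mean observed alleles per locus: 2 for polymorphic loci, 1 for monomorphic ones."""
    n_monomorphic = total - n_polymorphic
    return (2 * n_polymorphic + n_monomorphic) / total


# (population, Npol) copied from three rows of Table 9
for name, npol in [("Ein Zivan", 29), ("Mt. Sansan", 30), ("Mt. Hermon", 32)]:
    print(f"{name}: P = {percent_polymorphic(npol):.3f}%, "
          f"na = {mean_observed_alleles(npol):.4f}")
```

Read this way, P and na carry the same information as Npol, which may explain why the three
columns show nearly identical correlation coefficients with the geographic parameters.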
Phylogenetic consensus relations between the populations that were analyzed based on
the chord distance values [10] are shown in Figure 3. No clear aggregation of the 14
populations into geographical-climatological groups could be distinguished, but several
populations showed relatedness that did not correspond with the geographic distances
between them. Such pairs of populations include: the Mt. Sansan population in Judaea with
the Massade North population in the Golan Heights; the Ein Zivan population of the Golan
Heights with the Mt. Adir population in Upper Galilee; and the Mt. Muchraka population of
the Carmel Range with the Manara population of Upper Galilee.

Figure 3. Phylogenetic relations among Q. boissieri populations in Israel, based on Cavalli-Sforza and
Edwards (1967) chord distance values, obtained by the UPGMA method. (Abbreviations of population
names according to Table 1.)
Average Nei's unbiased genetic distance and genetic identity were 0.106 ± 0.051 and
0.901 ± 0.028, respectively, which means that there is no real differentiation among the
populations. Linear regression analysis revealed no relations between the geographic
distances and the genetic distances or identities among the populations analyzed. The
Massade South population had the lowest average genetic distance from, and the highest
identity with, all the other populations, whereas the Yodfat population had the greatest
average genetic distance from, and the lowest identity with, them. The lowest genetic
identity (greatest genetic distance) was between the Park Goren and the Yodfat populations;
the highest genetic identity (lowest genetic distance) was between the Massade North and
the Mt. Sansan populations. In an attempt to understand this phenomenon better, the
occurrence and frequencies of haplotypes, defined according to the alleles, were analyzed
by using the approach of Raymond and Rousset and of Goudet et al. [32, 73]. Each of the 680
trees analyzed differed in its haplotype, but use of only the most common haplotypes
enabled several differentiations among populations to be distinguished. It was revealed
that the Um Rechan population differs significantly (significance level = 0.05) in
haplotype frequencies from seven other populations, the Biranit population from only four,
the Mt. Adir population from only two, and the Manara population only from the
geographically relatively close population of Mt. Hermon. The other populations do not
differ among themselves in haplotype frequencies.
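Nei's genetic distance D and genetic identity I are linked by D = -ln(I), so either figure
can be recovered from the other. The minimal Python sketch below (function names are
illustrative) applies this relation to the two averages quoted above. Because those
averages are means over population pairs, converting one mean is only expected to
approximate the other, not to reproduce it exactly.

```python
import math


def distance_from_identity(identity):
    """Nei's genetic distance D = -ln(I)."""
    return -math.log(identity)


def identity_from_distance(distance):
    """Nei's genetic identity I = exp(-D)."""
    return math.exp(-distance)


mean_distance = 0.106  # average genetic distance reported in the text
mean_identity = 0.901  # average genetic identity reported in the text

print(f"D implied by mean identity: {distance_from_identity(mean_identity):.3f}")
print(f"I implied by mean distance: {identity_from_distance(mean_distance):.3f}")
```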

4. DISCUSSION
Population genetics studies of species within the genus Quercus, using neutral markers of
nuclear origin, have shown that, as with other highly out-crossing, wind-pollinated, and
long-lived forest tree species, oaks generally have high levels of within-population genetic
variation and low differentiation among populations [27, 37, 47, 50, 55].
a. Q. calliprinos
A high within-population diversity (Hs) of 0.325 was revealed in Q. calliprinos; it
constitutes 91.88% of the total diversity (Ht) of 0.354, which is relatively high compared
with the total diversity in the other Mediterranean evergreen oaks. Only 8.12% of the
variation is due to differentiation between populations (Gst), which is lower than that
found in those oaks [17, 48, 49, 77]. This differentiation between populations might result
from ecological differences between the sites analyzed. On average, each of the 24
populations analyzed differed significantly (P < 0.05) in its Fst value from 13 ± 5
populations that were spread geographically at random, with no relation to the distance
between them, and did not differ from the other 10 ± 5 populations (results not presented).
Unfortunately, no data are available on the genetic diversity of Q. coccifera, by means of
isoenzyme or RAPD analysis, that would enable comparisons.
Recently, analysis by means of chloroplast DNA of the genetic relations among the three
sympatric Mediterranean evergreen oak species (Q. suber, Q. ilex and Q. coccifera) [39]
revealed high diversity in the chloroplast genome, which the authors explain as the result
of the ancient and continuous presence of the species. According to Barbero et al. [7],
two species (Q. ilex and Q. coccifera) are able to inhabit very different ecosystems because
of their great plasticity and, in addition, their ability to resprout after major
perturbation, which could favor the persistence of the species. Furthermore, the results
showed almost complete sharing of haplotypes between Q. ilex and Q. coccifera, with two
lineages, of which lineage II is restricted to the central Mediterranean basin.
Like Q. coccifera [79], Q. calliprinos grows under very different site conditions
(ecosystems) and has a strong resprouting ability [60], together with quantitatively
reduced natural regeneration [2, 31, 57, 71, 81, 82]. It is therefore very likely that the
areas in Cis- and Trans-Jordan where Q. calliprinos grows are living remnants (by
sprouting) of more coherent forests that dominated the landscape in the distant past [45].
Disturbances such as logging, fire and herbivory may kill woody plants unless they resprout
from vegetative tissue [8]. Resprouters can occupy the same space for hundreds to thousands
of years and have minimal changes in population size unless they are uprooted. According to
Bond and Midgley [9], "sprouters tend to have low seedling recruitment rates, and saplings
take longer to reach maturity." It follows that the genetic diversity in Q. calliprinos
revealed in this study is most probably diversity captured from times prior to the severe
destruction of the forests by human activity. In spite of the average gene flow of 5.7
migrants per generation, which should smooth the differentiation between populations, there
is a low but significant amount of differentiation between regions, and between populations
within regions, probably due to ecological characteristics of the sites. Similar results on
population structure and genetic diversity of an evergreen oak species in southern California
(Q. chrysolepis) that sprouts vigorously after severe disturbances were reported by Montalvo
et al. [50]. The study considered population structure and genetic diversity in relation to
sprouting. An average of 97% of the stools had distinct genotypes, the mean within-site
genetic diversity (Hs) was 0.443, and only a small proportion of the total genetic diversity
was explained by variation among sites (mean Gst of 0.018). The authors concluded that "the
low value of Gst indicate a nearly random distribution of genotypes which suggests that
historical levels of gene flow have been very high among subpopulations, creating a large
cohesive metapopulation. Consequently, the potential for local population adaptation appears
small despite the high levels of genetic diversity." Similarly, in Q. calliprinos only a
small proportion of the total diversity, 8.12%, is due to differentiation among sites,
probably as a result of extreme ecological conditions.
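The link between the gene-flow and differentiation figures quoted in this subsection can be
made explicit. The sketch below assumes the estimator Nm = 0.5(1 - Gst)/Gst and its inverse
Gst = 1/(2Nm + 1); the text does not state here which estimator produced the value of 5.7
migrants per generation, so this is a plausibility check under that assumption, with
hypothetical function names.

```python
def nm_from_gst(gst):
    """Migrants per generation implied by Gst (assumed estimator)."""
    return 0.5 * (1.0 - gst) / gst


def gst_from_nm(nm):
    """Differentiation implied by Nm, the inverse of nm_from_gst."""
    return 1.0 / (2.0 * nm + 1.0)


# Q. calliprinos: 8.12% of the diversity lies among populations
print(f"Nm implied by Gst = 0.0812: {nm_from_gst(0.0812):.2f}")
print(f"Gst implied by Nm = 5.7:    {gst_from_nm(5.7):.4f}")

# Q. chrysolepis (Montalvo et al.): mean Gst of 0.018
print(f"Nm implied by Gst = 0.018:  {nm_from_gst(0.018):.1f}")
```

Under this assumption the two Q. calliprinos figures agree closely, and the much smaller Gst
reported for Q. chrysolepis corresponds to a several-fold higher implied gene flow.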
Recently, Korol et al. and Schiller et al. [43, 67] showed a positive association between
fitness and heterozygosity in Pinus halepensis growing in arid ecosystems. Similarly, the
high within-population genetic diversity in Q. calliprinos could also be maintained by
natural selection against seedlings and stools that are less fit to withstand environmental
perturbations such as drought. A high mortality rate, probably related to severe drought,
was observed in Q. calliprinos stands in the Judean Mountains in 1962 and 1965.
Larger differentiation, probably the result of adaptation to harsher local ecological
conditions, evolved in a few of the populations analyzed. The Tzafririm and Dir Razach sites
have the highest genetic diversity, 0.341 and 0.355, respectively (Table 4), and their Fst
values differ significantly from the Fst values of all the other sites. These two sites can
be considered peripheral sensu Safriel et al. [62], growing at the edge of the southern
Judean Mountains in areas of rendzina soils with less than 400 mm annual rainfall. It could
therefore also be expected that the Dhana population in Trans-Jordan, which marks the
south-eastern edge of the distribution area of Q. calliprinos, would be genetically more
diverse than the species average and differ significantly from other populations; according
to Table 4, this turned out not to be the case. The Dhana forest, composed of a
Q. calliprinos and Juniperus phoenicea association, is a unique loose forest that grows
about 200 km to the south of the Rass moneef (Ajlun) area, under very harsh ecological
conditions, i.e., on calcareous bedrock and soil with only 250 mm average annual rainfall,
including winter snow [87]. The geographic and climatic features of the Negev desert and the
Rift Valley depression have not changed over several millennia, which means that the Dhana
population could have been established either by migration from north to south along the
Trans-Jordan mountain chain, or as the result of ancient planting, which is very doubtful.
Although it is possible that the Dhana population originated from more northern
Q. calliprinos populations, comparison between the allele frequencies of the Ajlun and Dhana
populations (results not presented) shows large differences in several loci, which could be
the result of site influences and/or dysgenic selection resulting from human activity. These
two populations differ significantly in their Fst values, both from one another and from the
other analyzed populations. In contrast, the Park Goren population, which can be considered
a core population sensu Safriel et al. [62], has a gene diversity value of 0.326, which is
also the average value of within-population gene diversity (Hs); nevertheless, its Fst value
differs significantly from those of 20 of the 23 other populations. Other populations, such
as Beit Ha'Emeq, Park Ha'Sharon and Mt. Ami'ad, which are geographically distant from one
another and grow under very different site conditions, do not differ significantly in their
Fst values from most of the other populations.

The low but significant genetic differentiation between and within regions among
populations of Q. calliprinos makes the impact of management decisions critical. In-situ
conservation is probably assured, as most of the areas where Q. calliprinos grows are nature
reserves. In spite of the low levels of population structure and apparently high levels of
historical gene flow among populations, the differentiation between populations still
suggests that acorn collection for ex-situ conservation, restoration and reforestation
should be done in the nearest population within the same geographic region, and from a
large number of stools, to capture most of the genetic variation.
b. Q. ithaburensis
The UPGMA analysis (Figure 2) has revealed aggregation of the 16 populations analyzed
into three main groups that basically correspond with the country’s three main geographic
regions in which the species grows; these results support our hypothesis on genetic
differentiation of populations according to the geo-climatic conditions of the site.
Nevertheless, three out of the 16 populations analyzed did not satisfy this hypothesis: the
Kachal and Alona Forest populations clustered within the Golan Heights group, and the
Hirbet-Zerkess population, which is located in the Coastal Plain, clustered within the Lower
Galilee group.
Analysis of site-specific RAPD markers (results not presented) indicated that the
presence and the mean frequencies of the three markers were correlated with the results of the
UPGMA cluster analysis. The frequencies of these RAPD markers provide an easy means to
define to which group a population is related. Nevertheless, the inclusion of the Kachal and
Alona Forest populations within the Golan Heights group, and the inclusion of the Hirbet-
Zerkess (an old cemetery) population within the Lower Galilee group cannot be easily
explained on the basis of these data. It seems likely that the inclusion of these populations in
distant geographical groups indicates that these populations were probably established with
propagation material selected by humans and brought from those distant regions. The
inclusion of the Horshat-Tal population within the Golan Heights group suggests that this
population is a planted one, originating from the Yehudiya forest.
Small populations are subject to a high risk of genetic drift and increased inbreeding, and
this can result in fixation and loss of rare alleles. Determination of the density of particular
allelic configurations within spatially small-scale populations of forest trees can be
considered as the first step in the identification of loci involved in the micro-evolutionary
processes [1, 32, 44]. Population size and level of gene flow are critical parameters in
in-situ genetic conservation; therefore, small isolated populations such as the one in
Ilanot-Kadima should be enlarged as much as possible by the planting of local material.
Furthermore, analysis of the variations among major environmental parameters and in the
frequencies of multi-locus genotypes could be a useful approach to the study of the selective
forces involved in micro-evolutionary processes.
Geo-climatic parameters, the founder effect, and gene flow are the factors that
determine the degree of differentiation between populations. The genetic structure of
populations could also be changed as the result of human activity, as is the case in the eastern
Mediterranean region and in other areas, which have supported human activity since
antiquity. In the case of oak trees, this has occurred because acorns have long been a food
source [60] and oak wood a desirable resource. Nevertheless, in spite of the relatively short
distances between populations in the present study (maximum 145 km, average 47.5 km),
three clearly distinct regional groups of populations could be identified. These findings have
implications for decisions on in-situ and ex-situ genetic conservation, and for forest
management planning and practices. On the one hand, the clear grouping according to
geographical zones and the significant relations between climatic (rainfall) and geographical
parameters (latitude), and on the other hand, the differences in allele frequencies, highlight
the most basic principle: that afforestation or reforestation should be undertaken only through
the use of locally (within group) selected trees for collection of propagation material.
c. Q. boissieri
Genetic analysis revealed a high average within-population diversity (HS) of 0.335, i.e.,
~84% of the species' total genetic diversity (HT) of 0.398, and hence only a relatively small
between-population diversity (GST) of 0.162. These results confirm earlier findings [4, 106]
that all populations examined (in Israel, Cyprus or Turkey) were very variable and that in
every stand a few forms could be distinguished. Dendro-climatological studies of Q. boissieri
[53] also revealed high within-population diversity in the response of the annual-ring growth
parameters to temperature, but not to rainfall variations.
The small genetic distances, or high identity, between populations on the one hand, and
the lack of aggregation or of relatedness to geographic parameters (Figure 3) on the other
hand, give rise to the hypothesis that all the populations analyzed are remnants of a larger
population that existed during the last glaciation. The period of warming since then, and
the fragmentation of the populations into sites differing in their ecological set-up, did
not give rise to significant genetic differentiation among them. Nevertheless, the use of
haplotype frequencies enabled some differentiation among populations to be detected: the Um
Rechan population differs significantly from many (seven) populations, whereas other
populations differ much less.
In contrast to the geo-climatic aggregation found in Q. ithaburensis [65], Figure 1, which
is based on the Cavalli-Sforza and Edwards chord distance [10] (a measure better suited for
determining the existence of geographic patterns [83]) and was constructed with the
neighbour-joining algorithm and one thousand bootstrap replicates, revealed no meaningful
aggregation of Q. boissieri populations according to geo-climatic parameters. In contrast to
Q. ithaburensis, a thermophilous species that has probably spread in the area since the end
of the last glaciation because of warming, the Q. boissieri populations in the Lower
Galilee, the Carmel Range, Samaria and the Judean Range might be relicts from eras that
favored the spread of species adapted to colder and wetter continental climates, such as
Q. boissieri. This pattern resembles the occurrence and distribution patterns of glacial
relict plants such as Pistacia atlantica and Pistacia khinjuk, at high elevations on Mt.
Hermon in the north of Israel and on Mt. St. Katrina, about 600 km to the south in the Sinai
Peninsula [72, 76]. However, the high relatedness between several populations separated by
large geographic distances gives rise to the assumption, which cannot be dismissed out of
hand, that Q. boissieri was planted in historical times to provide leaf galls for use in the
manufacture of leather or ink [19].

To conclude: genetic diversity determines the adaptive potential of a species and is an
essential component of the stability of ecosystems. Analysis of within- and among-population
genetic diversity is a fundamental step in the development of strategies for conservation of
genetic resources and, consequently, of their adaptability.
Israel, with its oak forests comprising Quercus ithaburensis, Q. boissieri and Q.
calliprinos, is in a geographically peripheral position relative to the main area of
distribution of these species in the Mediterranean basin [4]. According to Safriel et al.
[62], peripheral populations, rather than core ones, may be tolerant to environmental
extremes and changes because of their higher genetic variability, which has resulted from
fluctuating selection. It is also likely that peripheral populations evolve resistance to
extreme conditions; therefore, they should be treated as a biogenetic resource, to be used
for rehabilitation and restoration of damaged ecosystems. Owing to their long life cycle,
forest trees are among the species that cannot migrate or adapt quickly enough to cope with
the rapid changes imposed on the environment by human activity, and this could create
ecological and forest management problems.
Thus, attention should be given to in-situ and ex-situ conservation of:
1. The Q. calliprinos populations of Park Goren, Tzafririm and Dir Razach, which
differed significantly from most or all other 21 populations analyzed; and the Park
Ha'Sharon, Mt. Ami'ad and Beit Ha'Emeq populations, which are most similar to
most of the populations.
2. The varieties of Q. ithaburensis genetic material represented by the three main
assemblages of its distribution in this region.
3. Inference from the results gained for Q. boissieri suggests that plantations for acorn
production, as a propagation material for reforestation, should include the best
phenotypes selected from all the relict populations regardless of ecological-
geographical boundaries.
REFERENCES
[1] Allard, R. W., 1975. The mating system and microevolution. Genetics 79, 115-126.
[2] Alon, G. and Kadmon, R., 1996. Effect of successional stages on the establishment of
Quercus calliprinos in an East Mediterranean maquis. Isr. J. Plant Sci. 44, 335-345.
[3] Aloni, R., and Orshan, G. 1972. A vegetation map of the lower Galilee. Israel J. Bot.
21, 209-227.
[4] Avishai, M., 1967. A taxonomical revision of the oaks of the Middle East. M. Sc.
Thesis, Hebrew Univ. Jerusalem, Israel, 134 pp. (in Hebrew, English summary).
[5] Baker-Brosh, K.F., and Peet, R.K., 1997. The ecological significance of lobed and
toothed leaves in temperate forest trees. Ecology 78: 1250-1255.
[6] Balaguer, L., Martinez-Ferri, E., Valladares, F., Perez-Corona, M.E., Baquedano, F. J.,
Castillo, F.J. and Manrique, E., 2001. Population divergence in the plasticity of
response of Quercus coccifera to light environment. Functional Ecology 15, 124-135.
[7] Barbero, M., Loisel, R. and Quezel, P., 1992. Biogeography, ecology and history of
Quercus ilex ecosystems in Mediterranean region. Vegetatio 99-100, 14-19.
[8] Bellingham, J.P., 2000. Resprouting as a life history strategy in woody plant
communities. Oikos 89, 409-416.
[9] Bond, W.J. and Midgley, J.J., 2003. The evolutionary ecology of sprouting in woody
plants. Int. J. Plant. Sci. 164, 103-114.
[10] Cavalli-Sforza, L.L. and Edwards, A.W.F., 1967. Phylogenetic analysis: models and
estimation procedures. Evolution 21, 550-570.
[11] Crow, J. F. and Kimura, M., 1964. The number of alleles that can be maintained in a
finite population. Genetics 49, 725-738.
[12] Dan, Y., and Raz, Z., 1970. The soil association map of Israel (Scale 1:250,000).
Ministry of Agriculture, Department of Scientific Publications, Bet Dagan, Israel. 147
pp. and two maps (Hebrew with English summary).
[13] Doyle, J.J., Doyle, J.H. 1990. Isolation of plant DNA from fresh tissue. Focus 12, 145-
151.
[14] Dufour-Dror, J.M., Ertas, A., 2004. Bioclimate perspectives in the distribution of
Quercus ithaburensis Decne. subspecies in Turkey and in the Levant. J. of
Biogeography 31, 461-474.
[15] Eig, A. 1933. A historical-phytosociological essay on Palestinian forests of Quercus
aegilops L. ssp. ithaburensis (Decne.) in past and present. Beiheft Botanisches
Centralblat 40, 225-272.
[16] Eliav, U. 1985. Use of Tabor oak acorns as food. Rotem 14, 72-73 (Hebrew).
[17] Elena-Rossello, J.A. and Cabrera, E., 1996. Isoenzyme variation in natural
populations of Cork oak (Q. suber L.). Silvae Genetica 45, 229-235.
[18] Ellstrand, N.C. and Elam, D.R., 1993. Population genetic consequences of small
population size: implications for plant conservation. Annual Review of Ecology and
Systematics, 24, 217–242.
[19] Feliks, J., 1968. Plant World of the Bible, pp. 107-108, Massada, Israel.
[20] Felsenstein, J., 1993. PHYLIP (Phylogeny Inference Package) Ver. 3.5c. Department
of Genetics, University of Washington, Seattle, WA, Focus 12, 13-15.
[21] Frumkin, A., Carmi, I., Gopher, A., Ford, D.C., Schwarcz, H.P., Tsuk, T., 1999. A
Holocene millennial-scale climatic cycle from a speleothem in Nahal Qanah cave,
Israel. The Holocene 9, 677-682.
[22] Furnier, G.R. and Adams, W.T., 1986. Geographic patterns of allozyme variation in
Jeffrey pine. Amer. J. Bot. 73, 1009-1015.
[23] Garcia, M. and Retana, J., 2004. Effect of site quality and shading on sprouting patterns
of holm oak coppices. For. Ecol. and Manage. 188, 39-49.
[24] Gentile, S., Gastaldo, P., 1976. Quercus calliprinos Webb. and Quercus coccifera L.:
researches on the leaf anatomy and taxonomical and chronological considerations.
Giorn. Bot. Ital. 110, 89-115.
[25] Geological Survey of Israel, the Ministry of national Infrastructure, 1998. Geological
map of Israel 1: 200,000.
[26] Gitzendanner, M.A. and Soltis, P.M., 2000. Patterns of genetic variation in rare and
widespread plant congeners. American Journal of Botany 87, 783–792.
[27] Gömöry, D., Yakovlev, I., Zhelev, P., Jedináková, J., and Paule, L., 2001. Genetic
differentiation of oak populations within the Quercus robur / Quercus petraea complex
in Central and Eastern Europe. Heredity 86, 557–563.

[28] Goudet, J., Raymond, M., de Meeüs, T. and Rousset, F., 1996. Testing differentiation
in diploid populations. Genetics 144, 1933-1940.
[29] Gregorius, H-R., 1980. The probability of losing an allele when diploid genotypes are
sampled. Biometrics 36, 632-652.
[30] Guries, R.P. and Ledig, F.T., 1981. Genetic structure of populations and differentiation
in forest trees. pp. 42-47. In: Proceedings of the Symposium on Isozymes of North
American Forest Trees and Forest Insects. (ed. M.T. Conkle), USDA For. Serv. Gen.
Tech. Rep. PSW-48. pp. 42-48.
[31] Harif, I., 1974. First year development of leading species of plant communities in the
Judean Hills and its role in succession. Ph. D. Thesis, The Hebrew University of
Jerusalem, 88 p. (In Hebrew with English summary).
[32] Hamrick, J.L. and Godt, M.J.W., 1989. Allozyme diversity in plant species. In: Plant
Population Genetics, Breeding and Genetic Resources. (eds. A.H.D. Brown, M.T.
Clegg, A.L. Kahler, & B.S. Weir). pp. 43-63. Sinauer Associates, Inc., Sunderland,
Massachusetts.
[33] Hamrick, J.L., Godt, M.J.W., Murawski, D.A., Loveless, M.D., 1991. Correlations
between species traits and allozyme diversity: implications for conservation biology.
In: Genetics and Conservation of Rare Plants (eds. Falk DA, Holsinger KE), pp. 75–
86. Oxford University Press, New York.
[34] Harper, J.L., Lovell, P.H., and Moore, K.G., 1970. The shape and size of seeds. Ann.
Rev. Ecol. Syst. 1, 327-356.
[35] Herr, N., 1998. Rock and soil as an ecological factor of distribution and development
of Quercus ithaburensis forest in the Alonim-Shefar’am region. M.Sc. Thesis, Hebrew
University of Jerusalem, Faculty of Agriculture, Rehovot, 130 pp.
[36] Herzog, S., and Krabel, D., 1996. Genetic studies on leaf retention in Quercus robur.
Silvae Genet. 45, 272-276.
[37] Hokanson, S.C., Isebrands, J.G., Jensen, R.J., and Hancock, J.F., 1993. Isozyme
variation in oaks of the Apostle Islands in Wisconsin: genetic structure and levels of
inbreeding in Quercus rubra and Q. ellipsoidalis (Fagaceae). Am. J. Bot. 80, 1349–
1357.
[38] Horowitz, A., 1979. The Quaternary of Israel. Academic Press, New York, 394 pp.
[39] Jiménez, P., Lopez de Heredia, U., Collad, C., Lorenzo, Z. and Gil, L., 2004. High
variability of chloroplast DNA in three Mediterranean evergreen oaks indicates
complex evolutionary history. Heredity 93, 510-515.
[40] Kadosh, D., Sivan, D., Kutial, H., Weinstein-Evron, M., 2004. A late quaternary
paleoenvironmental sequence from Dor, Carmel, coastal plain, Israel. Palynology 28,
143-157.
[41] Kaplan, Y., 1984. The ecosystem of the Yehudiya Nature Reserve with emphasis on
dynamics of germination and development of Quercus ithaburensis (Desc.). Ph.D.
Thesis. University of Nijmegen, the Netherlands.
[42] Karschon, R. 1982. In defense of the Turks. A study of the destruction of Tabor oak
forests in the southern Plain of Sharon. La-Ya'aran 32, 54-59.
[43] Korol, L., Shklar, Galina., and Schiller, G., 2001. Site influences on the genetic
variation and structure of Pinus halepensis Mill. provenances. Forest Genetics 8, 295-
306.
[44] Linhart, Y.B., Mitton, J.B., Sturgeon, K.B. and Davis, M.L., 1981. Genetic variation in
space and time in a population of ponderosa pine. Heredity 46, 407- 426.
[45] Liphschitz, N., Biger, G., 1990. Ancient dominance of the Quercus calliprinos –
Pistacia palaestina association in Mediterranean Israel. J. Veg. Sci. 1, 67-70.
[46] Liphschitz, N. and Waisel, Y., 1967. Dendro-chronological studies in Israel: 1.
Quercus boissieri of Mt. Meron. La-Ya'aran 17, 111-115.
[47] Mayes, S.G., McGinley, M.A., and Werth, C.R., 1998. Clonal population structure and
genetic variation in sand-shinnery oak, Quercus havardii (Fagaceae). Am. J. Bot. 85,
1609–1617.
[48] Michaud, H., Lumaret, R. and Romane, F., 1992. Variation in the genetic structure and
reproduction biology of Holm oak populations. Vegetatio 99-100, 107-113.
[49] Michaud, H., Toumi, L., Lumaret, R., Li, T.X., Romane, F., and Di Giusto, F., 1995.
Effect of geographical discontinuity on genetic variation in Quercus ilex L. (holm oak);
Evidence from enzyme polymorphism. Heredity 74, 590–606.
[50] Montalvo, A.M., Conard, S.G., Conkle, M.T., and Hodgskiss P.D., 1997. Population
structure, genetic diversity, and clone formation in Quercus chrysolepis (Fagaceae).
Am. J. Bot. 84, 1553– 1564.
[51] Mueller-Starck, G. 1985. Genetic variation under extreme environmental conditions.
In: Population Genetics and Genetic Conservation of Forest Trees. (eds. Ph. Baradat,
W.T. Adams & G. Müller-Starck). pp. 201-210. SPB Acad. Pub. Amsterdam.
[52] Nei, M., 1973. Analysis of gene diversity in subdivided populations. Proc. Natl. Acad.
Sci. U.S.A. 70, 3321-3323.
[53] Nei, M., 1978. Molecular evolutionary genetics. Columbia University Press, New
York, pp 187-192.
[54] Nei, M., 1987. Estimation of average heterozygosity and genetic distance from a small
number of individuals. Genetics 89, 283-290.
[55] Nevo, A., 1998. Molecular evolution and ecological stress at global, regional and local
scales: the Israeli perspective. J. Exp. Zool. 282, 95-119.
[56] Nevo, E., Fragman, O., Dafni, A., Beiles, A., 1999. Biodiversity and interslope
divergence of vascular plants caused by microclimatic differences at “Evolution
Canyon,” Lower Nahal Oren, Mount Carmel, Israel. Isr. J. Plant Sci. 47, 49-59.
[57] Oppenheimer, H.R., 1940. Etudes sur le probleme de la reconstitution des chenaies en
Palestine. J. Bot. 3 ( Rehovot series), 105-143.
[58] Oppenheimer, H.R., 1949. The water turn-over of the Valonea oak. Palestine J. Bot., 7
(Rehovot series), 171-179.
[59] Paffetti, D., Vettori, C., Giannini, R., 2001. Relict populations of Quercus calliprinos
Webb. on Sardinia island identified by chloroplast DNA sequences. Forest Genetics 8,
1-11.
[60] Pervolotsky, A. and Haimov, Y., 1992. The effect of thinning and goat browsing on the
structure and development of Mediterranean woodland in Israel. For. Ecol. Manage.
49, 61-74.
[61] Raymond, M. and Rousset, F., 1995. An exact test for population differentiation.
Evolution 49, 1280-1283.
[62] Safriel, N.U., Volis, S., Kark, S., 1994. Core and peripheral populations and global
climate change. Isr. J. Plant Sci. 42, 331-345.

[63] Sambrook, J., Fritsch, E.F. and Maniatis, T., 1989. Molecular cloning. A laboratory
manual. 2nd ed. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY.
[64] Samuel, R., Pinsker, W., and Ehrendorfer, F., 1995. Electrophoretic analysis of genetic
variation within and between populations of Quercus cerris, Q. pubescens, Q. petraea
and Q. robur (Fagaceae) from eastern Austria. Botanica-Acta 108, 290-299.
[65] Schiller, G., Shklar, G., Korol, L., 2003. Genetic diversity assessment by random
amplified polymorphic DNA of oaks in Israel. 1. Tabor oak (Quercus aegilops L. ssp.
ithaburensis [Decne] Boiss.). Isr. J. Plant Sci. 51, 1-10.
[66] Schiller, G., Shklar, G., Korol, L., 2004. Genetic diversity assessment by random
amplified polymorphic DNA of oaks in Israel. 2. Quercus boissieri Reut. Isr. J. Plant
Sci. 52, 315-322.
[67] Schiller, G., Korol, L. Shklar, G., 2004. Habitat effects on adaptive genetic variation in
Pinus halepensis Mill. provenances. Forest Genetics 11, 325-335.
[68] Schiller, G., Ungar E. D., Cohen, S., Herr, N., 2010. Water use by Tabor and Kermes
oaks growing in their respective habitats in the Lower Galilee region of Israel. For.
Ecol. Manag. 259, 1018-1024.
[69] Schirone, B., and Spada, F., 2000. Some remarks on the conservation of genetic
resources of Mediterranean oaks. In: Borelli, S., and Varela, M.C., eds. EUFROGEN-
Mediterranean Oaks Network, IPGRI-Report of First Meeting. pp. 21-27.
[70] Schneider, S., Roessli, D., Excoffier, L., 2000. ARLEQUIN, ver. 2.000: A software for
population genetic data analysis. Genetics and Biometry Laboratory, the University of
Geneva, Switzerland.
[71] Shamir, T., 1985. Dynamics of oak maquis (Quercus calliprinos) in the Jerusalem
Mountains: vegetative growth, regeneration and acorn production. Rotem (Bulletin of
the Israel Plant Information Centre) 18, 93-112.
[72] Shmida, A., 1978. Remarks on the paleo-climates of Sinai based on the distribution
patterns of relict plants. In: Bar-Yosef, O. and Phillips, J.L. (eds), Prehistoric
investigations in Gabel Maghara, Northern Sinai. Qedem (Monographs of the Institute
of Archeology, The Hebrew Univ. of Jerusalem) 7, 36-41.
[73] Shannon, C.E., Weaver, W. 1949 The mathematical theory of communication. Univ. of
Illinois Press, Urbana.
[74] Slatkin, M.1995. A measure of population subdivision based on microsatellite allele
frequencies. Genetics 139, 457-462.
[75] Sternberg, M. and Shoshany, M., 2001. Aboveground biomass allocation and water
content relationships in Mediterranean trees and shrubs at two climatological regions in
Israel. Plant Ecology 157, 171–179.
[76] Tchernov, E., 1980. The biogeographical position of Mount Hermon in the Middle
East. In: Shmida, A and Liven, M. (eds). Mt. Hermon, Nature and Landscape. pp. 248-
255. ha' Kibutz ha'Meuchad Publishers, Tel Aviv.
[77] Toumi, L., and Lumaret, R., 2010. Genetic variation and evolutionary history of holly
oak: a circum-Mediterranean species complex [Quercus coccifera L./Q. calliprinos
(Webb) Holmboe, Fagaceae]. Plant Syst. Evol. 290, 159-171.
[78] Tsiouvaras, C., 1987. Ecology and management of Kermes oak (Quercus coccifera L.)
shrubland in Greece: A review. J. Range Manage. 40, 542-546.
[79] Valladares, F., Balaguer, L., Martinez-Ferri, E., Perez-Corona, E. and Manrique, E.,
2002. Plasticity, instability and canalization: is the phenotypic variation in seedlings of
sclerophyll oaks consistent with the environmental unpredictability of Mediterranean
ecosystems? New Phytologist 156, 457-467.
[80] Waisel, Y., Friedman, J., 1960. Germination and seedling survival of Quercus
calliprinos Webb. La-Yaaran 3-4, 10-13 (In Hebrew).
[81] Weinstein, A., 1984. Acorn production and seedling crop in Quercus calliprinos Webb.
La-Yaaran 34, 1- 3 (In Hebrew).
[82] Westfall, R.D. and Conkle, M.T., 1992. Allozyme markers in breeding zone
designation. New Forests 6: 279-309.
[83] Yeh, F.C., Yang, R-C., Boyle, T.B.J., Ye, Z-H., Mao, J.X. 1997. POPGEN Ver. 1.32.
The user-friendly software for population genetic analysis. Molecular Biology and Bio-
technology Center, University of Alberta, Canada.
[84] Zohary, M., 1960. The maquis of Quercus calliprinos in Israel and Jordan. Bull. Res.
Coun. Israel 9D, 51-72.
[85] Zohary, M., 1961. On the oak species of the Middle East. Bull. Res. Coun. Israel 9D,
161-186.
[86] Zohary, M., 1962. Plant life of Palestine, Israel and Jordan. The Ronald Press Co.
New York, 262 pp.
[87] Zohary, M., 1966. Flora Palestina. The Israel Academy of Sciences and Humanities,
Jerusalem.
[88] Zohary, M.,1973. Geobotanical foundation of the Middle East. Gustav Fischer Verlag,
Stuttgart, 738 pp.
[89] Zohary, M., and Feinbrun-Dothan, N., 1966. Flora Palestina, The Israel Academy of
Sciences and Humanities, Jerusalem.
[90] Pervelotzky A.,Rosen B., Rosenberg D., 2013. Man as an Ecological Super-engineer
in the Mediterranean maquis. In: Pervolotzky A.(Edt.), Management and preservation
of the Mediterranean ecosystem: Ramat Ha'Nadiv as an example, Ramat Ha' Nadiv
press, Zichron-Yaakov, pp 181-200.
[91] Teper A., 1997. Factors influencing the growth shape of Q. calliprinos. M.Sc. Thesis,
Department of Botany, The Hebrew University of Jerusalem.

Chapter 4
HOW HABITAT FRAGMENTATION AFFECTS GENETIC
DIVERSITY? THE CASE OF A SAND DUNE PLANT
(STACHYS MARITIMA) IN THE IBERIAN PENINSULA
S. Massó1,2,*, C. Blanché1, C. Barriocanal3,
M. C. Martinell1 and J. López-Pujol2
1 BioC-GReB, Institut de Recerca de la Biodiversitat, Laboratori de Botànica,
Facultat de Farmàcia, Universitat de Barcelona. Barcelona, Catalonia
2 BioC-GReB, Institut Botànic de Barcelona (IBB-CSIC-ICUB).
Barcelona, Catalonia
3 Grup de Recerca en Medi Ambient i Tecnologies de la Informació Geogràfica,
Institut de Medi Ambient, Universitat de Girona. Girona, Catalonia
ABSTRACT
Stachys maritima is a species typical of coastal dunes with a wide distribution
within the Mediterranean Basin. During the last century, this species was subjected to
severe habitat fragmentation, mainly as a consequence of tourism activities and urban
pressures, with its area reduced by up to 99% in the Iberian Peninsula and a remaining
total population size of ca. 420 individuals in less than 50 km². In spite of some annual
fluctuations, the species shows a clear regression. Allozyme electrophoresis was used to
evaluate levels and distribution of genetic diversity in Iberian populations of this
threatened coastal sand dune plant. Extremely low levels of genetic variation were
detected (P = 4.0%, A = 1.1 and He = 0.014). Of the 19 interpretable loci found, only 4
were polymorphic (Aco-1, Idh-2, Mdh-2, and 6Pgd-2). We also propose some
conservation actions focused on maintaining population size and gene flow and on
preserving the habitat of the species.
* Corresponding author: S. Massó (sergimasso@gmail.com).

INTRODUCTION
One of the major threats to biodiversity is habitat fragmentation derived from human
activities (Primack & Ros, 2002). Habitat fragmentation reduces population size and
increases both spatial isolation and edge-to-area ratio of habitat remnants (Young et al., 1996;
Neel & Ellstrand, 2001). In genetic terms, there are two possible outcomes of habitat
fragmentation. On the one hand, population genetics theory predicts that reductions in
population size and gene flow in fragmented landscapes will result in losses of genetic
diversity and increased genetic divergence between populations, through decreased gene
flow, increased random genetic drift, and higher rates of inbreeding (Ellstrand and Elam,
1993; Young et al., 1996; Honnay et al., 2005; Lowe et al., 2005; Honnay and Jacquemyn,
2007; Aguilar et al., 2008; Chung et al., 2014). On the other hand, some studies suggest that
habitat fragmentation can increase gene flow among populations instead of decreasing it
(reviewed in Kramer et al., 2008). Response to habitat fragmentation is probably also highly
dependent on life history characteristics of the species, including longevity, breeding system,
pollen/seed dispersal mechanism, and the existence of soil seed bank and dispersal vectors
(Young et al., 1996; Foré & Guttman, 1999; Buza et al., 2000). One of the major problems in
studying effects of fragmentation on plant species is a lack of knowledge of their genetic
structure before habitat disturbance.
One of the clearest examples of habitat fragmentation for a Mediterranean flora species is
Stachys maritima Gouan (Lamiaceae). It is a characteristic species of the Mediterranean sand
vegetation complex associations of the coastal sand dune systems in the Mediterranean basin
(Agropyretum mediterraneum + Crucianelletum maritimae + Glaucion maritimi), and it is
currently distributed in a nearly continuous area from coasts of northeastern Spain to Albania,
although it also occurs on the Black Sea coast (Romania, Bulgaria, and Turkey), Corsica and
northern Africa (Algeria and Tunisia; Greuter et al., 1986). However, its distribution in some
areas is doubtful or inconstant at best (e.g., Croatia or southern Italy). According to Stancic
et al. (2008), the Agropyretum mediterraneum association has been recorded in few localities
in Croatia, mostly in the older phytosociological literature (Horvatic, 1939; Pavletic,
1973; Trinajstic, 1973, 1989), suggesting that S. maritima could have disappeared there as
its habitat became progressively rarer. In 2002 the species was found for the first
time in the Italian region of Campania (Del Guacchio, 2002), but less than ten years later this
population had already disappeared (S. Massó & J. López-Pujol, pers. obs.; Figure 1). Finally, on
Perişor sand hill, in the Danube Delta, S. maritima was re-recorded in 2011 after more than
100 years (Ciocârlan, 2011). A recent assessment in Provence (France) assigns the species to the
IUCN CR (“critically endangered”) category (Noble et al., 2015).
The last census in the Iberian Peninsula, performed in 2010, recorded 418 individuals
distributed among 13 natural populations (in addition to three artificial ones) (Blanché et al.,
2010), but records from literature and herbarium specimens suggest that S. maritima was
common on the northeastern Iberian coast (including the beaches of the city of Barcelona;
Ibáñez, 2006) until the middle decades of the 20th century, when massive urbanization and
subsequent habitat fragmentation began. At present, most subpopulations are very small
(1–10 individuals), with censuses of emerged rosettes fluctuating around 0–3 per year, which
suggests that most nuclei are on the verge of extinction and persist only through
vegetative propagation (Blanché et al., 2010).

Figure 1. Burned beach in Salerno (Italy) in July 2011. Stachys maritima was first observed in this
area in 2002. Burning of plant materials (wood, stems, leaves, branches) is a relatively common
management action to reduce the amount of organic material accumulated on sand beaches after
seasonal episodes of sea storms.
The aims of the present study are (i) to evaluate the levels of intrapopulation genetic
diversity of the Iberian populations of Stachys maritima (including the artificial ones); (ii) to
estimate the genetic divergence among populations; and (iii) to suggest some conservation
guidelines on the basis of the genetic data.
Figure 2. Some images of Stachys maritima. (a) inflorescence; (b) plant growing in its habitat, and (c)
seedling.

MATERIAL AND METHODS

Plant Material
Stachys maritima is a diploid (2n = 34; Aydin, 1978; Koeva-Todorovska, 1988;
Baltisberger, 1991; Cerrillo, 2002) perennial, softly hairy herb, 10–30 cm tall, erect or
ascending, with persistent rosettes of leaves. Inflorescences are composed of verticillasters of
4–6 flowers, with a pale yellow corolla (Figure 2). Pollination of this species is
entomophilous, but limited clonal reproduction is also possible by rhizomatous spread, which
may be responsible for the colonizing ability observed for this species, a strategy that
could provide adaptability to the mobility of sand dune habitats. Stachys maritima is
protected in part of its current distribution area: e.g., Catalonia (DOGC, 2008), France (JORF,
1994; 1998), and Bulgaria (Bulgarian State Gazette, 2002).
Figure 3. Location of the studied Iberian populations of Stachys maritima. Red dots: wild populations;
black dots: artificial populations.
Table 1. Studied populations of Stachys maritima

Pop. code  Location                             Pop. size (a)  Sample size  Last census (b)
Wild populations
PROV       Rovina Beach                         92 (2002)      24           338
FLUV       Fluvià River mouth                   8 (2008)       8            9
SME1       Sant Martí d’Empúries North          1 (2003)       1            0
SME2       Sant Martí d’Empúries South          30 (2008)      14           4
GTER       Ter River mouth                      9 (2002)       7            0
MON1       Montgrí Massif                       3 (2008)       3            0
PPAL       Pals Beach                           123 (2002)     71           30
SPUN       Sa Punta Beach                       25 (2003)      21           0
PPRA       El Prat de Llobregat Beach           11 (2008)      11           1
Artificial populations (c)
REGE       Regencós                             100 (2008)     15           -
MON2       Montgrí Massif, man-made sandy area  611 (2008)     35           -
URBP       Palsmar residential area             15 (2008)      7            -

(a) The numbers in brackets correspond to the collection year.
(b) Only in wild populations, according to Blanché et al. (2010).
(c) Artificial populations are those originated by man-made activities such as soil removal or sand
transportation from other nearby sandy beaches.
Sampling Design
Leaf tissue from all known Iberian populations was collected between March 2002 and
June 2008, with sample sizes ranging from one [the only existing individual of the Sant Martí
d’Empúries (SME1) population] to 71 and a total sample size of 217 individuals (Figure 3;
Table 1). Except in small populations (where all individuals were collected), sampling was done
along a linear transect within populations; samples were collected about 50–100 cm apart to
avoid collecting ramets from the same genet. Samples consisted of just a few young leaves
from basal rosettes that were placed into paper envelopes, transported to the laboratory, and
stored at 4 °C until extraction one day later. Leaf samples were collected carefully in order to
minimize potential damage to populations, with permission of the biodiversity
conservation authority in Catalonia (Generalitat de Catalunya).
Electrophoresis
Genetic diversity was assessed using the standard methods described by Soltis and Soltis
(1989) for the starch gel electrophoresis of allozymes. Leaf fragments were homogenized on
refrigerated porcelain plates using a cold extraction buffer [0.05 M Tris-citric acid, 0.1%
(m/v) cysteine-HCl, 0.1% (m/v) ascorbic acid, 8% PVP-40 and 1 mM (v/v) 2-mercaptoethanol].
Extracts were absorbed onto 3 MM Whatman filter paper and analyzed
immediately or stored at –80 °C until analysis. Using horizontal 11% starch gels, 13
enzymes were resolved in four buffer systems, yielding 19 interpretable loci (Aat-1, Aco-1,
Aco-3, Acp-2, Adh-1, Dia-1, Idh-2, Mdh-1, Mdh-2, 6Pgd-1, 6Pgd-2, Pgi-1, Pgi-2, Pgm-1,
Pgm-2, Skd, Sod-1, Tpi-1, and Tpi-2). Aconitate hydratase (ACO, EC 4.2.1.3), acid
phosphatase (ACP, EC 3.1.3.2), malate dehydrogenase (MDH, EC 1.1.1.37),
phosphoglucoisomerase (PGI, EC 5.3.1.9), phosphoglucomutase (PGM, EC 5.4.2.2), and
superoxide dismutase (SOD, EC 1.15.1.1) were resolved on histidine-citrate buffer pH 5.7;
alcohol dehydrogenase (ADH, EC 1.1.1.1), phosphogluconate dehydrogenase (6PGD, EC
1.1.1.44), and shikimate dehydrogenase (SKD, EC 1.1.1.25) were resolved on
morpholine-citrate buffer pH 6.1; and aspartate aminotransferase (AAT, EC 2.6.1.1), diaphorase (DIA,
EC 1.6.99.-), and triosephosphate isomerase (TPI, EC 5.3.1.1) were resolved on
tris-citrate/lithium borate pH 8.2. Only isocitrate dehydrogenase (IDH, EC 1.1.1.42) was resolved
with histidine buffer pH 7.0. Staining procedures for all enzymes followed the method
described by Wendel and Weeden (1989) with minor modifications. Loci were numbered
consecutively, and alleles at each locus were labelled alphabetically, starting from the most anodal
form in both cases. Banding patterns were interpreted on the basis of the quaternary structure
of isozymes, subcellular localization and number of loci usually expressed in diploid plants
(Gottlieb 1982; Soltis & Soltis 1989). Allele frequencies at each locus were calculated, and
the following parameters were estimated: P, the percentage of polymorphic loci (a locus being
considered polymorphic when its most common allele had a frequency below 0.95); A, the mean number of alleles
per locus; He, the expected panmictic heterozygosity; and Wright’s (1965) FST. All
calculations were performed with BIOSYS-1 (Swofford & Selander, 1989).
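As a rough illustration of how the three diversity parameters defined above are obtained from allele frequencies, the following Python sketch computes P, A and He for a single population. It is not the BIOSYS-1 implementation: the function name and data layout are hypothetical, and the small-sample correction 2N/(2N - 1) applied to He is an assumption about the estimator used. The input reproduces the GTER column of Table 2 (7 sampled individuals).

```python
def diversity_stats(freqs, n_individuals, threshold=0.95):
    """Return (P, A, He) for one population.

    freqs: dict mapping locus name -> list of allele frequencies.
    P uses the 0.95 criterion; He applies the 2N/(2N - 1) small-sample
    correction (an assumption, not confirmed by the text).
    """
    n_loci = len(freqs)
    # P: percentage of loci whose most common allele is below the threshold
    n_poly = sum(1 for alleles in freqs.values() if max(alleles) < threshold)
    p = 100.0 * n_poly / n_loci
    # A: mean number of alleles actually observed per locus
    a = sum(sum(1 for f in alleles if f > 0) for alleles in freqs.values()) / n_loci
    # He: mean over loci of corrected (1 - sum of squared frequencies)
    correction = 2 * n_individuals / (2 * n_individuals - 1)
    he = sum(correction * (1.0 - sum(f * f for f in alleles))
             for alleles in freqs.values()) / n_loci
    return p, a, he


# GTER allele frequencies from Table 2: 17 monomorphic loci plus two polymorphic ones
monomorphic = ["Aat-1", "Aco-3", "Acp-2", "Adh-1", "Dia-1", "Idh-2", "Mdh-1",
               "Mdh-2", "6Pgd-1", "Pgi-1", "Pgi-2", "Pgm-1", "Pgm-2", "Skd",
               "Sod-1", "Tpi-1", "Tpi-2"]
gter = {locus: [1.0] for locus in monomorphic}
gter["Aco-1"] = [0.929, 0.071]
gter["6Pgd-2"] = [0.571, 0.429]

p, a, he = diversity_stats(gter, n_individuals=7)
print(f"P = {p:.1f}%, A = {a:.1f}, He = {he:.3f}")
```

Under these assumptions the sketch returns the values listed for GTER in Table 3 (P = 10.5%, A = 1.1, He = 0.035).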
Figure 4. Banding pattern obtained for the most cathodal region of activity of IDH, consisting of three
bands in almost all studied individuals. Please note: (a) not all the three-banded phenotypes showed an
identical migration; (b) an individual showed a five-banded pattern; and (c) the relative intensities of
bands within the 3-banded phenotypes varied (from 1:2:1 to 2:2:1) when the same individuals were run
in different starch gels.

Table 2. Allele frequencies for the 19 studied loci in the Iberian populations of Stachys maritima

                Wild populations                                               Artificial populations
Locus   Allele  PROV   FLUV   SME1   SME2   GTER   MON1   PPAL   SPUN   PPRA   REGE   MON2   URBP
Aat-1   a       1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000
Aco-1   a       1.000  1.000  1.000  1.000  0.929  1.000  0.824  1.000  1.000  1.000  1.000  1.000
        b       0.000  0.000  0.000  0.000  0.071  0.000  0.176  0.000  0.000  0.000  0.000  0.000
Aco-3   a       1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000
Acp-2   a       1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000
Adh-1   a       1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000
Dia-1   a       1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000
Idh-2   a       1.000  0.688  0.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000
        b       0.000  0.313  1.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000
Mdh-1   a       1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000
Mdh-2   a       0.750  1.000  1.000  1.000  1.000  1.000  0.986  1.000  1.000  1.000  1.000  1.000
        b       0.250  0.000  0.000  0.000  0.000  0.000  0.014  0.000  0.000  0.000  0.000  0.000
6Pgd-1  a       1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000
6Pgd-2  a       1.000  1.000  1.000  0.000  0.571  1.000  0.298  0.132  1.000  0.125  0.697  0.000
        b       0.000  0.000  0.000  1.000  0.429  0.000  0.702  0.868  0.000  0.875  0.303  1.000
Pgi-1   a       1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000
Pgi-2   a       1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000
Pgm-1   a       1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000
Pgm-2   a       1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000
Skd     a       1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000
Sod-1   a       1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000
Tpi-1   a       1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000
Tpi-2   a       1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000

Table 3. Summary of genetic variation for 19 loci in the Iberian populations of Stachys maritima

Population          P       A      Ho     He
Wild populations
PROV                5.3     1.1    0.009  0.02
FLUV                5.3     1.1    0.007  0.024
SME1                0       1      0      0
SME2                0       1      0      0
MON1                0       1      0      0
GTER                10.5    1.1    0.008  0.035
PPAL                10.5    1.2    0.016  0.039
SPUN                5.3     1.1    0.014  0.012
PPRA                0       1      0      0
Mean                4.100   1.067  0.006  0.014
Standard deviation  4.380   0.071  0.006  0.016
Artificial populations
MON2                5.3     1.1    0.013  0.023
URBP                0       1      0      0
REGE                5.3     1.1    0      0.012
Mean                3.533   1.067  0.004  0.012
Total mean          3.958   1.067  0.006  0.014

RESULTS AND DISCUSSION

The studied populations of Stachys maritima harbored a low number of alleles (19–22;
Table 2); only four out of 19 analysed loci were polymorphic (Aco-1, Idh-2, Mdh-2, and
6Pgd-2). Thus, only 14 multilocus genotypes (MLGs) were identified among the 217
studied individuals, with some populations being monomorphic for all studied loci. The
banding pattern obtained for the most cathodal region of activity of IDH, usually reported to
be a dimeric enzyme (Weeden & Wendel, 1989), consisted of three bands in almost all
studied individuals. Although such a pattern may be suggestive of duplications or even
allopolyploidy, we believe that it should simply be attributed to post-translational
modifications (Pedersen & Simonsen, 1987; Gastony, 1988; Pascual et al., 1993; Driscoll et
al., 2003) for several reasons: (i) not all the three-banded phenotypes showed an
identical migration; (ii) one individual showed a five-banded pattern; and (iii) the relative
intensities of bands within the three-banded phenotypes varied (from 1:2:1 to 2:2:1) when the
same individuals were run in different starch gels (Figure 4).
Levels of intrapopulation genetic diversity are very low for all the surveyed populations
(mean values: P = 4.0%, A = 1.1 and He = 0.014; Table 3). No significant differences in
genetic variation were found between natural and artificial populations (Table 3). The
comparison between observed (Ho) and expected heterozygosity (He) indicated a deficit of
heterozygotes, which was highly significant at some loci (Table 4). Genetic divergence among
populations was quantified by means of FST (Wright, 1965), a measure of differentiation
between local populations. The mean value of FST was very high (0.671; 0.669 excluding
artificial populations), indicating that a large portion (67.1%) of total genetic variability was
due to differences among populations and, thus, that gene flow is probably highly restricted.
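To make the interpretation of FST more concrete, the sketch below computes a single-locus FST as (HT - HS)/HT from the frequencies of allele a at 6Pgd-2 in Table 2. It is a simplified illustration with a hypothetical function name: every population receives equal weight and no sample-size correction is applied, so it is not expected to reproduce the multilocus mean of 0.671 obtained with BIOSYS-1.

```python
def fst_biallelic(p_list):
    """Unweighted FST = (HT - HS) / HT for one biallelic locus.

    p_list: frequency of one of the two alleles in each population.
    """
    n = len(p_list)
    # HS: mean expected heterozygosity within populations
    hs = sum(2 * p * (1 - p) for p in p_list) / n
    # HT: expected heterozygosity of the pooled (mean) allele frequency
    p_bar = sum(p_list) / n
    ht = 2 * p_bar * (1 - p_bar)
    return (ht - hs) / ht if ht > 0 else float("nan")


# Frequency of allele a at 6Pgd-2 in the 12 populations, in Table 2 column order
pgd2_a = [1.000, 1.000, 1.000, 0.000, 0.571, 1.000,
          0.298, 0.132, 1.000, 0.125, 0.697, 0.000]

fst = fst_biallelic(pgd2_a)
print(f"FST (6Pgd-2, unweighted) = {fst:.3f}")
```

For this locus the unweighted value is close to 0.70, that is, about 70% of the expected heterozygosity at 6Pgd-2 lies among rather than within populations, which is in line with the high mean FST reported above.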
Levels of intrapopulation genetic variability in Stachys maritima are low compared with
reference values for threatened plants in the Mediterranean Basin (P = 23.7%, A = 1.49 and
He = 0.094; López-Pujol et al., 2006). Primarily, the observed severe fluctuations in the size
of natural populations [usually a consequence of droughts or storms, in particular llevant
(eastern/sea winds) or tramuntana (northern dry strong winds)] may have contributed to a
general loss of genetic diversity in S. maritima, since these episodes may significantly
decrease effective population size (Ne) (Carroll & Fox, 2008). A long and repeated history of
local extinctions and founding events has been documented in the studied area for the
Holocene (the last 10,000 years), following dynamic coastal progradation, changes in sea
level of more than 60 m, and shoreline shifts associated with the mobility of the system of
dune belts and lagoons/marshes (Riera, 2005; Gesti, 2006; Quintana et al., 2008; Julià
& Riera, 2012). This “old”, “basal”, “natural” genetic footprint on coastal flora should have
been produced in a similar way in many other sand dune systems, and has been detected in
many other coastal plant species (e.g., Fant et al., 2014). On top of this general syndrome of
historical/natural genetic erosion, however, a new, rapid and recent loss of diversity would have been
added. Loss and destruction of the natural habitat of Stachys maritima (well described in
Barriocanal & Blanché, 2002) would have led to genetic erosion driven by recent and
repeated population fragmentation and bottlenecking (both causing allele loss and
inbreeding).
We failed to find a correlation between genetic diversity and size of the studied
populations, although this lack of signal may simply be due to the fact that most populations
are well below the minimum viable population size (MVP) necessary to avoid extinction. For
example, there are extremely small populations such as GTER (N = 9) that show levels
of genetic diversity (P = 10.5%, A = 1.1 and He = 0.035; Table 3) close to those of much larger ones,
which can be interpreted as remnants of a former diversity (fragmentation having been
too recent to erode much of it); on the other hand, some of the largest
populations, probably man-promoted, show very low levels of genetic diversity (e.g., REGE;
Table 3), an expected result given that these populations are the result of a single founder
event.
Genetic divergence among S. maritima populations should be regarded as very large
(FST = 0.671), especially when compared with reference values for threatened plants in the
Mediterranean Basin (FST = 0.248; López-Pujol et al., 2006). Population fragmentation, as
well as the local extinction of formerly known localities (cf. Barriocanal & Blanché, 2006),
would have contributed to the interruption of gene flow. Likewise, metapopulation dynamics
are probably circumscribed to very limited geographic areas (e.g., all the populations
within a given bay). Among Mediterranean beach plants, some studies have shown a clear
relationship between genetic clustering of populations and the direction of sea currents
(Cakile maritima; Gandour et al., 2008); this hypothesis should be tested in Stachys
maritima.

Table 4. Values of fixation index (F) for all polymorphic loci in the studied populations of Stachys maritima

Locus   PROV      FLUV      SME1      SME2      MON1      GTER      PPAL      SPUN      PPRA      MON2      URBP      REGE
Aco-1   -         -         -         -         -         -0.077ns  0.466***  -         -         -         -         -
Idh-2   -         0.709*    -         -         -         -         -         -         -         -         -         -
Mdh-2   0.556**   -         -         -         -         -         -0.014ns  -         -         -         -         -
6Pgd-2  -         -         -         -         -         1.000**   0.695***  -0.152ns  -         0.426*    -         1.000***

Conformance to Hardy-Weinberg equilibrium was tested using chi-square analysis: ns p≥0.05; * p<0.05; ** p<0.01; *** p<0.001
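For reference, the fixation index is conventionally defined as F = 1 - Ho/He, with positive values indicating a deficit of heterozygotes. The short sketch below (hypothetical function name) applies that definition to the population means of Table 3. In populations with a single polymorphic locus, such as PROV and FLUV, the monomorphic loci add nothing to either mean, so the result comes close to the per-locus values of Table 4; the match is only approximate because the published means are rounded.

```python
def fixation_index(ho, he):
    """Fixation index F = 1 - Ho/He; undefined (None) when He is zero."""
    if he == 0:
        return None
    return 1.0 - ho / he


# (Ho, He) population means taken from Table 3
table3 = {
    "PROV": (0.009, 0.020),
    "FLUV": (0.007, 0.024),
    "REGE": (0.000, 0.012),
}

for pop, (ho, he) in table3.items():
    print(pop, round(fixation_index(ho, he), 3))
```

The values obtained (about 0.55 for PROV, 0.71 for FLUV and 1.0 for REGE) can be compared with the entries 0.556, 0.709 and 1.000 in Table 4.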

Given the high values of FST detected among natural populations of S. maritima, at least
11 populations should be protected to ensure the preservation of 99% of the detected genetic
variation (GV) within the species according to the formula of Hamrick et al. (1991) [N = ln
(1 - GV)/ln FST]. Therefore, we suggest the preservation of all the known populations in the
Iberian Peninsula. In fact, the species was included in 2008 in the Catalogue of Threatened
Plants of Catalonia (Catàleg de Flora Amenaçada de Catalunya; DOGC, 2008), which
theoretically guarantees its full protection. However, some populations have become extinct
whereas others are shrinking rapidly. Although some partial monitoring surveys and occasional
ex situ cultivation or translocation actions have been carried out (Blanché et al., 2011),
no recovery plan has yet been drawn up for S. maritima, despite being mandatory according
to the Catàleg (DOGC, 2008). Such a plan, in addition to including the physical protection of
populations, should comprise: (i) periodic monitoring of all populations, as well as
prospecting for new populations (some have been discovered in
recent years; Nualart et al., 2012; J. Font, pers. comm.); (ii) population reinforcements (with
special care to avoid negative effects such as outbreeding depression) and reintroductions;
(iii) genetic monitoring (today a widely recognized tool in the management of threatened species;
Schwartz et al., 2006; Alonso et al., 2014); and (iv) seed storage in germplasm banks. In
addition, since the clearing and opening (i.e., additional perturbation) of previously stabilized and
densely vegetated locations of Stachys maritima has resulted in temporarily increased population
numbers (Pals and Prat de Llobregat populations), management trials applying experimental
perturbation regimes, coupled with the corresponding genetic monitoring, could be carried out
to promote both demographic and genetic recovery (from
underground seeds or stocks).
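The number of populations quoted earlier in this section follows from the formula N = ln(1 - GV)/ln FST. The sketch below simply evaluates that expression (the function name is hypothetical). With GV = 0.99 and FST = 0.671 it gives N of roughly 11.5, which corresponds to the 11 populations cited when rounded down and to 12 when rounded up to the next whole population.

```python
import math


def populations_needed(gv, fst):
    """Evaluate N = ln(1 - GV) / ln(FST).

    gv:  proportion of genetic variation to preserve (0 < gv < 1)
    fst: among-population differentiation (0 < fst < 1)
    """
    return math.log(1.0 - gv) / math.log(fst)


n = populations_needed(0.99, 0.671)
print(f"N = {n:.2f} (floor {math.floor(n)}, ceiling {math.ceil(n)})")
```

Either way, the result is close to the number of wild populations sampled, which supports the recommendation to preserve all known Iberian populations.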
REFERENCES
Aguilar, R., Quesada, M., Ashworth, L., Herrerias-Diego, Y. & Lobo, J. (2008). Genetic
consequences of habitat fragmentation in plant populations: susceptible signals in plant
traits and methodological approaches. Molecular Ecology, 17, 5177–5188.
Alonso, M. T., Guilló, A., Pérez-Botella, J., Crespo, M. B. & Juan, A. (2014). Genetic
assessment of population restorations of the critically endangered Silene hifacensis in the
Iberian Peninsula. Journal for Nature Conservation, 22, 532-538.
Aydin, A. (1978). Reports. In IOPB Chromosome Numbers Reports LXI (Löve, A. ed.).
Taxon, 27, 375-376.
Baltisberger, M. (1991). Chromosomenzahlen einiger Labiaten aus Albanien. Berichte des
Geobotanischen Institutes der Eidgenössischen Technischen Hochschule Stiftung Rübel,
57, 165-181.
Barriocanal, C. & Blanché, C. (2002). Estat de conservació i propostes de gestió per a Stachys
maritima Gouan (Labiatae) a la peninsula Ibèrica. Orsis, 17, 7-20.
Blanché, C., Barriocanal, C., Martinell, M. C., Massó, S. & López-Pujol, J. (2010). Deu anys
de seguiment demogràfic i genètic d’Stachys maritima a Catalunya (2001-2010).
Implicacions per a un pla de recuperació. Collectanea Botanica (Barcelona), 29, 59-78.
Blanché, C., Barriocanal, C., Martinell, M. C., Massó, S. & López-Pujol, J. (2011). Planta del
Mes núm. 7, Estaquis de mar. Plant Conservation Biology Web Portal, Laboratory of
Botany, Faculty of Pharmacy, University of Barcelona. URI: http://hdl.handle.net/2445/17444
[University of Barcelona Digital Repository].
Bulgarian State Gazette. (2002). Biodiversity Act. Bulgarian State Gazette, 77.
Carroll, S. P. & Fox, C. W. (Eds.). (2008). Conservation biology. Evolution in action. Oxford
University Press, New York.
Cerrillo, N. (2002). Estudis sobre germinació d’espècies vegetals endèmiques, rares o
amenaçades. Treball Pràctic de Botànica I. Facultat de Farmàcia, Universitat de
Barcelona, Barcelona.
Chung, M. Y., Nason, J. D., López-Pujol, J., Yamashiro, T., Yang, B.-Y., Luo, Y.-B. &
Chung, M. G. (2014). Genetic consequences of fragmentation on populations of the
terrestrial orchid Cymbidium goeringii. Biological Conservation, 170, 222-231.
Ciocârlan, V. (2011). Vascular flora of the Danube Delta. Analele Stiintifice Ale Univ. Al
Cuza Iasi, 57, 41-64.
Del Guacchio, E. (2002). Note floristiche per la Campania. Delpinoa, 44, 75-80.
DOGC. (2008). Decret 172/2008, de 26 d’agost, de creació del Catàleg de Flora Amenaçada
de Catalunya. Diari Oficial de la Generalitat de Catalunya 5204, 65881-65895.
Driscoll, H. E., Barrington, D. S. & Gilman, A. V. (2003). A reexamination of the apogamous
tetraploid Phegopteris (Thelypteridaceae) from northeastern North America. Rhodora,
105, 309-321.
Ellstrand, N. C. & Elam, D. R. (1993). Population genetic consequences of small population
size: implications for plant conservation. Annual Review of Ecology and Systematics, 24,
217-242.
Foré, S. A. & Guttman, S. I. (1999). Genetic structure of Helianthus occidentalis (Asteraceae)
in a preserve with fragmented habitat. American Journal of Botany, 86, 988-995.
Gandour, M., Hessini, K. & Abdelly, C. (2008). Understanding the population genetic
structure of coastal species (Cakile maritima): seed dispersal and the role of sea currents
in determining population structure. Genetic Research, 90, 167-178.
Gastony, G. J. (1988). The Pellaea glabella complex: Electrophoretic evidence for the
derivations of the agamosporous taxa and a revised taxonomy. American Fern Journal,
78, 44-67.
Gesti, J. (2006). El poblament vegetal dels Aiguamolls de l’Empordà. Arxius de les seccions
de ciències, 138. Secció de Ciències Biològiques, Institut d’Estudis Catalans, Barcelona.
Gottlieb, L. D. (1982). Conservation and duplication of isozymes in plants. Science, 216, 373-
380.
Greuter, W., Burdet, H. M. & Long, G. (1986). Med-Checklist, Vol. 3. Genève:
Conservatoire et Jardin Botaniques de la Ville de Genève, Genève.
Honnay, O. & Jacquemyn, H. (2007). Susceptibility of common and rare plant species to the
genetic consequences of habitat fragmentation. Conservation Biology, 21, 823–831.
Honnay, O., Jacquemyn, H., Bossuyt, B. & Hermy, M. (2005). Forest fragmentation effects
on patch occupancy and population viability of herbaceous plant species. New
Phytologist, 166, 723–736.
Horvatic S. (1939). Pregled vegetacije otoka Raba s gledista biljne sociologije. Prirodoslovna
istraživanja Jugoslavije Akademij, 22, 1-96.
JORF. (1994). Arrêté du 9 mai 1994 relatif à la liste des espèces végétales protégées en région
Provence-Alpes-Côte d'Azur. NOR: ENVN9430087A. Journal Officiel de la Republique

Française du 26 juillet (1994). Also available at: http://legifrance.gouv.fr/affichTexte.do?

cidTexte=JORFTEXT000000548796&dateTexte=&categorieLien=id.
JORF. (1998). Arrêté du 29 octobre 1997 relatif à la liste des espèces végétales protégées en
région Languedoc-Roussillon. NOR: ATEN9760338A. Journal Officiel de la Republique
Française du 16 janvier (1998). Also available at: http://legifrance.gouv.fr/
affichTexte.do?cidTexte= JORFTEXT000000203584&dateTexte=&categorieLien=id.
Julià, R. & Riera, S. (2012). Proposta d’evolució del front marítim de Barcelona durant
l’Holocè, a partir de la integració de dades geotècniques, intervencions arqueològiques i
cronologies absolutes. Quarhis (Època II), 8, 16-37.
Koeva-Todorovska, J. T. (1988). Kariologicno I polenomorfologicno izsledvane na vidovete
ot rod Stachys L. in Balgarija. Centenary of Academy N. J. Stojanov: 138-151.
Kramer, A. T., Ison, J. L., Ashley, M. V. & Howe, H. F. (2008). The paradox of forest
fragmentation genetics. Conservation Biology, 22, 878–885.
López-Pujol, J., Bosch, M., Simon, J. & Blanché, C. (2006). Diversitat isoenzimàtica de la
flora vascular Silvestre dels Països Catalans. Butlletí de la Institució Catalana d’Història
Natural, 74, 5-28.
Lowe, A. J., Boshier, D., Ward, M., Bacles, C. F. E. & Navarro, C. (2005). Genetic resource
impacts of habitat loss and degradation; reconciling empirical evidence and predicted
theory for neotropical trees. Heredity, 95, 255–273.
Neel, M. C. & Ellstrand, N. C. (2001). Patterns of allozyme diversity in the threatened plant
Erigeron parishii (Asteraceae). American Journal of Botany, 88, 810-818.
Noble, V., Vanes, J., Michaud, H. & Garraud, L. (coord.). (2015). Liste Rouge de la flore
vasculaire de Provence-Alpes.Côte d’Azur – Version mise en ligne. Direction Régionale
de l’Environnement, de l’Aménagement et dur Logement & Région Provence-Alpes-Côte
d’Azur, 14 pp.
Nualart, N., Montes-Moreno, N., Gavioli, L. & Ibáñez, N. (2012). L’herbari de l’Institut
Botànic de Barcelona com una eina per la conservació dels tàxons endèmics i amenaçats
de Catalunya. Collectanea Botanica (Barcelona), 31, 81-101.
Pascual, L., García, F. J., & Perfectti, F. (1993). Inheritance of isozyme variations in seed
tissues of Abies pinsapo Boiss. Silvae Genetica, 42, 335-340.
Pavletic Zi. (1973). Flora i vegetacija otoka Biseva s posebnim obzirom na biljnogeografski
polozaj otoka. PhD thesis. Faculty of Science, University of Zagreb, Zagreb, 131 pp.
Pedersen, S. & Simonsen, V. (1987). Tissue specific and developmental expression of
isozymes in barley (Hordeum vulgare L.). Hereditas, 106, 59-66.
Primack, R. B. & Ros, J. (2002). Introducción a la Biología de la Conservación. Ariel
Ciencia, Barcelona.
Quintana, X., Feo, C., López, R. & Gesti, J. (2008). Pla de gestió dels espais naturals del
Baix Ter (Torroella de Montgrí i Pals, Baix Empordà). Document tècnic inèdit en el
marc del Projecte LIFE-Natura. Institut d'Ecologia Aquàtica, Universitat de Girona,
Girona. 164 pp.
Riera Mora, S. (2005). Canvis ambientals i modelació antròpica del territori entre l’època
ibèrica i l’altmedieval a Catalunya: aportacions de la palinologia. Cota Zero (Vic), 20, 99-
107.
Schwartz, M. K., Luikart, G. & Waples, R. S. (2006). Genetic monitoring as a promising tool
for conservation and management. Trends in Ecology & Evolution, 22, 25-33.
Soltis, D. E., & Soltis, P. S. (1989). Isozymes in Plant Biology. Dioscorides Press, Portland.

Stancic, Z., Brigic, A., Liber, Z., Rusak, G., Franjic, J. & Skvorc, Z. (2008). Adriatic coastal
plant taxa and communities of Croatia and their threat status. Acta Botanica Gallica, 155,
179-199.
Swofford, D. L. & Selander, R. B. (1989). BIOSYS-1, A computer program for the analysis of
allelic variation in genetics. User’s manual. Department of Genetics and Development,
University of Illinois, Urbana-Champaign.
Trinajstic I. (1973). As. Agropyretum mediterraneum (Kühn) Br.-Bl. 1933, u vegetaciji
juznodalmatinskog otoka Korcule. Glas. republ. zavoda zast. prirode, Prirodnjackog
muzeja Titograd, 6, 71-76.
Trinajstic I. (1989). Vegetation of the class Ammophiletea Br.-Bl. & R. Tx. 1943 in the
eastern Adriatic littoral of Yugoslavia. Colloques Phytosociologiques, 19, 387-394.
Young, A. G., Boyle, T. & Brown, T. (1996). The population genetic consequences of habitat
fragmentation for plants. Trends in Ecology & Evolution, 11, 413-418.
Weeden, N. F. & Wendel, F. (1989). Genetics of plant isozymes. In: Soltis, D. E., & Soltis, P.
S. (Eds.), Isozymes in Plant Biology. Dioscorides Press, Portland: 46-72.
Wendel, F. & Weeden, N. F. (1989). Visualization and interpretation of plant isozimes. In:
Soltis, D. E., & Soltis, P. S. (Eds.), Isozymes in Plant Biology. Dioscorides Press,
Portland: 5-45.
Wright, S. (1951). The genetic structure of populations. Annals of Eugenetics, 15, 323-354.

Chapter 5
THE USE OF HELA CELLS AS A MODEL FOR STUDYING DNA DAMAGE AND REPAIR
Fabio Luis Forti∗

Department of Biochemistry, Institute of Chemistry,
University of Sao Paulo, Sao Paulo, Brazil
ABSTRACT
Since 1951, HeLa cancer cells, the first human cell line, isolated from an aggressive
cervical adenocarcinoma of a thirty-year-old woman, have been serving scientists around
the globe. Since then, approximately eighty-five thousand scientific articles indexed in
the US National Library of Medicine, National Institutes of Health (PubMed), and one
hundred and thirteen thousand articles indexed in the Web of Science (Thomson Reuters)
have used this immortalized cell line in the most diverse fields of the biomedical
sciences.
Even with the advent of other immortalized human cancer cell lines in the following
decades, the HeLa line is still used as a good cell model in hundreds of annually
published papers.
In the same decade, the structure of deoxyribonucleic acid (DNA) was determined by
Watson and Crick and, coincidentally or not, the HeLa cell line has since been used as a
cellular model for studies of DNA damage by many different agents and of DNA repair
through different biochemical pathways. The wide diversity of physical, chemical and
biological DNA stressors produces a limited number of types of DNA lesions, which are
removed or repaired by a small number of mechanisms: i) homologous recombination (HR),
ii) non-homologous end-joining (NHEJ), iii) nucleotide excision repair (NER), iv) base
excision repair (BER), v) mismatch repair (MMR), and vi) other, less common mechanisms
(such as interstrand cross-link, or ICL, repair).
In this communication, we describe some current scientometric analyses of the use of
the HeLa cell line in the DNA damage and repair field, as well as relevant reports
highlighting the importance of HeLa as a cellular model for each DNA repair mechanism
mentioned. Finally, we discuss the use of HeLa cells in our laboratory in a tentative
identification and characterization of atypical functions of other enzymes in the
maintenance of genomic stability.

∗ e-mail: flforti@iq.usp.br.
BRIEF INTRODUCTION
The first paper using the HeLa cell line archived in the PubMed library and Web of
Science dates from May 1953 and was published by Scherer WF et al. in the Journal of
Experimental Medicine [1]. This report described the infection of a human malignant
epithelial cell line, the strain “HeLa of Gey,” by different types of poliomyelitis virus,
and the propagation and production of these viruses in it. The strain designation credits
the prominent scientist George Gey, MD, director of the Tissue Culture Laboratory in the
Department of Surgery at The Johns Hopkins Hospital. He and his wife and collaborator,
Margaret Gey, continued this laboratory’s tradition of working on tissue culture to drive a
variety of investigations related to endocrinology, cancer, and virology using HeLa cells [2].
Scherer WF, who published another paper in 1954 on the cryopreservation and storage of
cell lines [3], was among the select group of pioneer scientists who advanced the in vitro
culture of cell lines. Another was Eagle H, whose work with HeLa [4] and many other
mammalian cell lines [5] culminated in the development and patenting of many cell culture
media and in the publication of highly cited papers in Science and JBC [6, 7].
Between the fifties and sixties, the majority of investigations carried out in the HeLa
cell line concerned viral infections and the determination of the best cell culture
conditions and media, but the first reports using X-irradiated HeLa cells also appeared [8].
The authors of that work, Puck TT & Marcus PI, published a study showing the action of
X-rays on HeLa cells, paving the way for the 1963 Science paper by Terasima T & Tolmach LJ
on the effects of X-rays on DNA synthesis in HeLa cells [9]. Since then, about four thousand
papers can be found using the terms “HeLa cells” and “DNA damage” in the PubMed library
(and Web of Science) in a large diversity of chemical, biological and biomedical science
journals. HeLa has been used not only as an in vitro model system, but also as a model for
cancer cells in general and, more specifically, for cervical carcinomas.
Finally, another aspect of HeLa cells that justifies their use in DNA damage and repair
investigations, and that deserves a brief mention here, emerged at the very end of the
seventies and in the eighties: the finding that the p53 phosphoprotein is absent from HeLa
cells, in contrast to other tumor cells [10]. This protein, whose cDNA was originally cloned
from the mouse in 1983, is one of the most important tumor suppressors in virus-transformed
cells [11] and is considered the principal “genome guardian” under conditions of genotoxic
stress because, acting as a transcription factor, it regulates many different genes
involved in apoptosis, growth arrest and DNA repair [12].
HELA CELLS AND DNA REPAIR

The DNA repair ability of an individual cell is vital to the integrity of its genome and
thus to the normal functioning of the whole organism. DNA repair is a collection of
biochemical mechanisms by which a cell identifies and corrects damage to the DNA molecules
that comprise its genome. In most living organisms, endogenous metabolic activities and
exogenous factors can promote different types of DNA damage, as shown by the millions of
lesions detected every day in cells. Some of these lesions are controlled or eliminated by
repair processes and therefore do not accumulate in the genome of the next generation of
cells. When repair fails, however, the cells can undergo growth arrest or one of many
different forms of cell death.
Figure 1. Published and archived papers in two public libraries using HeLa cells in a wide variety of
fields related to DNA damage and repair (September 2015). HR, homologous recombination; NHEJ,
non-homologous end-joining; NER, nucleotide excision repair; BER, base excision repair; MMR,
mismatch repair; ICL, interstrand cross-link; p53, tumor suppressor protein.
Another possible outcome of unrepaired lesions is mutation of the cell's genome, which
can lead to the formation of tumors and cancers. These general concepts were therefore used
together to screen the two public libraries (PubMed and Web of Science) with the first
seven terms shown in Figure 1 (Genomic stability, Genomic Instability, Genome integrity,
DNA damage, DNA repair, DNA damage response, DNA damage response & Repair), each crossed
one by one with the term “HeLa cells.” A second screening followed the same crossings with
the term “HeLa cells,” this time using the eight terms also shown in Figure 1 (HR, NHEJ,
Alternative NHEJ, BER, NER, MMR, ICL, p53), which correspond to the most essential DNA
repair mechanisms (most of them mediated or influenced by the tumor suppressor protein p53,
also included as a search term).
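The crossing of search terms described above can be sketched in a few lines of Python. This is only an illustration: the term lists are copied from the text, but the Boolean `AND` form, the quoting, and the helper name `cross_with_hela` are assumptions, since the chapter does not state the exact query syntax submitted to PubMed or Web of Science.

```python
# Hypothetical sketch: build one query per search term, each crossed with "HeLa cells".
# The AND/quoting syntax is an assumption; the chapter does not give the exact queries.

GENERAL_TERMS = [
    "Genomic stability",
    "Genomic Instability",
    "Genome integrity",
    "DNA damage",
    "DNA repair",
    "DNA damage response",
    "DNA damage response & Repair",
]

PATHWAY_TERMS = ["HR", "NHEJ", "Alternative NHEJ", "BER", "NER", "MMR", "ICL", "p53"]


def cross_with_hela(terms, anchor="HeLa cells"):
    """Return one query string per term, pairing it with the anchor term."""
    return [f'"{anchor}" AND "{term}"' for term in terms]


queries = cross_with_hela(GENERAL_TERMS) + cross_with_hela(PATHWAY_TERMS)

for query in queries:
    print(query)
```

Each query would be submitted separately to each library, which is why a paper matching several terms is counted more than once when the hits are summed.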
Figure 2. Citation reports from the Web of Science database using the terms “HeLa cells” vs “DNA
damage.” A) Published items in each year. B) Citations in each year.
The survey shown in Figure 1 gives, for each queried topic, the number of studies using
HeLa cells that had been published up to the search date, with overlaps included and not
distinguished in either library. The results, isolated or summed (13,688 in PubMed versus
15,432 in Web of Science), are numerically very similar in both databases, even though the
total numbers of papers citing HeLa cells differ considerably (84,377 in PubMed versus
113,319 in the Web of Science). Even so, an average of about fourteen thousand papers is
high for a single research field (DNA damage and repair) and a single cell line. More
importantly, both the number of published papers and the number of citations each year,
which were low from the fifties up to the nineties, have been increasing steeply since
then, as can be seen in Figure 2 from the Web of Science database (only for the terms
“HeLa cells” vs “DNA damage”).
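The arithmetic behind the "about fourteen thousand" figure can be checked with a short sketch. It assumes that the average is the simple mean of the two summed hit counts, which the text does not state explicitly. The computed shares are rough indications only, because the summed hits count a paper once for every search term it matches.

```python
# Figures quoted in the text (September 2015 survey).
SUMMED_HITS = {"PubMed": 13_688, "Web of Science": 15_432}
TOTAL_HELA_PAPERS = {"PubMed": 84_377, "Web of Science": 113_319}

# Assumption: "average" means the simple mean across the two libraries.
mean_hits = sum(SUMMED_HITS.values()) / len(SUMMED_HITS)

# Summed hits relative to all HeLa papers in each library. Overlaps between
# search terms are not removed, so these ratios overstate the true share.
shares = {db: SUMMED_HITS[db] / TOTAL_HELA_PAPERS[db] for db in SUMMED_HITS}

print(f"Mean of summed hits: {mean_hits:.0f}")
for db, share in shares.items():
    print(f"{db}: summed hits are {share:.1%} of all HeLa papers")
```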
The results shown in Figure 2 show that, despite the growing number of investigations in
the DNA damage and repair field, HeLa cells have been continuously used as a cellular model
for these studies up to now. Moreover, the three most cited papers covered by the results
from Figure 2 are: i) a work describing, for the first time, histone H2AX phosphorylation
on serine 139 after treatment of cells with ionizing radiation as a method for sensing DNA
double-strand breaks (2,445 citations, average of 136 citations per year) [13]; ii) a work
describing the first cloning of the RNA component of human telomerase (hTR), a critical
enzyme for the long-term proliferation of immortal tumor cells such as HeLa (1,887
citations, average of 90 citations per year) [14]; iii) a study showing benzo[a]pyrene
adduct formation along key codons in the p53 gene that correlate with mutational hotspots
of human lung cancers, linking cigarette smoking, chemical carcinogens and human lung
cancers (1,187 citations, average of 60 citations per year) [15].
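As a rough consistency check on the citation figures quoted above, the total can be divided by the reported yearly average to recover the citation window each pair of numbers implies. The sketch assumes that the database computes the average as total citations divided by the number of years counted. That definition is an assumption here, and the labels are only shorthand for the three papers.

```python
# (total citations, reported average citations per year), as quoted in the text.
TOP_PAPERS = {
    "H2AX phosphorylation [13]": (2445, 136),
    "hTR cloning [14]": (1887, 90),
    "benzo[a]pyrene adducts in p53 [15]": (1187, 60),
}


def implied_years(total_citations, citations_per_year):
    """Citation window implied if average = total / years (assumed definition)."""
    return total_citations / citations_per_year


spans = {label: implied_years(total, avg) for label, (total, avg) in TOP_PAPERS.items()}

for label, years in spans.items():
    print(f"{label}: about {years:.1f} years of citations")
```

Windows of roughly two decades are what one would expect for papers published in the nineties and surveyed in 2015.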
Other studies performed entirely or partially in HeLa cells, also with hundreds of
citations, include relevant work in biology such as the cloning of human DNA polymerase
eta, p53-dependent triggering of apoptosis upon telomere dysfunction, p53 localization at
mitochondria, the p53-dependent DNA damage response, mitotic catastrophe, and histone
deacetylase-dependent DNA repair.
HELA CELLS AND HOMOLOGOUS RECOMBINATION (HR) REPAIR

Homologous recombination (HR) is the exchange of DNA strands of similar or identical
nucleotide sequence between sister chromatids or homologous chromosomes. It can be used by
cells to direct error-free repair of double-strand DNA breaks, and it generates new sequence
combinations during meiosis, the process by which eukaryotes make gamete cells. These new
combinations of DNA represent genetic variation in offspring, which enables populations to
adapt over the course of evolution. Although HR can vary among different organisms and cell
types, most forms involve the same basic molecular mechanisms, and since HR is conserved
across all three domains of life, as well as in viruses, it is considered a nearly universal
biological process [16].
The approximately five hundred papers citing HeLa cells and HR (Figure 1) reached a
maximum of 50 publications per year in 2013 and a maximum of 2,100 citations per year in
2014 (Figure 3), with 14,834 total citations and an average of 550 citations per year (Web
of Science source). The three most cited papers are: i) a review of the regulation of DSB
repair by NHEJ and HR, in yeast and higher eukaryotes, through regulatory factors including
expression and phosphorylation of repair proteins, chromatin modulation of repair factor
accessibility, and the availability of homologous repair templates, which also discusses why
a number of DSB repair proteins (MRE11/RAD50/NBS1 complexes, BRCA1, H2AX, PARP1, RAD18,
DNA-PK, ATM, etc.) function in both pathways, while others exclusively influence NHEJ or HR
(476 citations, average of 60 citations per year) [17]; ii) a work showing that
apoptosis-inducing factor (AIF) deficiency in mice or human cells (obtained by HR or siRNA)
leads to high lactate production and reduced oxidative phosphorylation (OXPHOS) caused by a
reduction of respiratory chain complex I activity, pointing to a dual role of AIF in the
cellular control of life and death (347 citations, average of 29 citations per year) [18];
iii) a study demonstrating that the Fanconi anemia (FA) complementation group D2 protein
(FANCD2) undergoes mono-ubiquitination during the S-phase of the cell cycle, co-localizing
with BRCA1 and RAD51 in S-phase-specific nuclear foci to form complexes that participate in
S-phase-specific processes such as DNA repair by homologous recombination (287 citations,
average of 26 citations per year) [19].
Figure 3. Citation reports from the Web of Science database using the terms “HeLa cells” vs
“Homologous recombination.” A) Published items in each year. B) Citations in each year.
HELA CELLS AND NON-HOMOLOGOUS END JOINING (NHEJ) REPAIR

Non-homologous end joining (NHEJ) is the repair of double-strand DNA breaks by direct
ligation of the broken ends. Unlike homology-directed repair, which requires a homologous
sequence to guide the repair, NHEJ needs no homologous template to promote the end-joining
reaction, but it is more error prone than recombination-based repair. NHEJ uses short
homologous DNA sequences called microhomologies, present in single-stranded overhangs on
the ends of double-strand breaks, to guide the break repair, usually in an accurate manner.
When NHEJ is inefficient, however, it can lead to translocations and telomere fusion,
hallmarks of cancer. NHEJ is evolutionarily conserved throughout all kingdoms and is the
predominant double-strand break repair pathway in mammalian cells, while in budding yeast
homologous recombination dominates, especially when the organism is grown under laboratory
conditions [20].
The approximately two hundred papers citing HeLa cells and NHEJ (Figure 1) reached a
maximum of 17 publications per year in 2011 and a maximum of 500 citations per year in 2014
(Figure 4), with 3,975 total citations and an average of 153 citations per year (Web of
Science source).
Figure 4. Citation reports from the Web of Science database using the terms “HeLa cells” vs “Non-
homologous end joining.” A) Published items in each year. B) Citations in each year.
The three most cited papers from the present survey are: i) the identification and
characterization of the LIG4 gene, which encodes a protein with high homology to mammalian
DNA ligase IV that is not essential for DNA replication or for the repair of DNA damage
induced by UV light, but is a crucial component of the NHEJ pathway, providing insights
into the mechanisms of DNA repair and suggesting that NHEJ is highly conserved throughout
the eukaryotic kingdom (202 citations, average of 11 citations per year) [21]; ii) a review
presenting and discussing all the known enzymes playing key roles in the NHEJ process,
which capture both ends of the broken DNA molecule, bring them together in a synaptic
DNA-protein complex, and finally repair the DNA break (175 citations, average of 22
citations per year) [22]; iii) a study showing that the majority of DNA end-joining activity
in extracts of HeLa cells derives from DNA ligase III, whose knockdown by RNA interference
reduces by up to 80% the DNA end-joining activity of mouse embryo fibroblasts deficient in
DNA ligase IV, pointing to DNA ligase III as a strong candidate component of the B-NHEJ
pathway (174 citations, average of 16 citations per year) [23].
HELA CELLS AND NUCLEOTIDE EXCISION REPAIR (NER)

Nucleotide excision repair (NER) is a process that repairs damage to one strand of the
DNA, particularly damage from UV irradiation, which distorts the DNA helix. The DNA flanking
the damage site is cleaved to generate a single-stranded gap, which is repaired by copying
the undamaged strand to restore an intact helix. In human cells, NER is the main pathway for
the removal of damage caused by UV light, but it also acts on a wide variety of other bulky
helix-distorting lesions caused by chemical mutagens. Several components of NER have roles
in ubiquitination, telomere maintenance, DNA replication, and gene transcription. Mutations
in NER genes are associated with several human diseases, including xeroderma pigmentosum
(XP), Cockayne syndrome (CS) and trichothiodystrophy (TTD), which exhibit increased cancer
incidence, developmental delay, and neurodegeneration. NER is found throughout nature, in
eubacteria, eukaryotes and archaea [24].
The approximately three hundred and fifty papers citing HeLa cells and NER (Figure 1)
reached a maximum of 25 publications per year in both 2003 and 2011 and a maximum of 1,250
citations per year in 2012 (Figure 5), with 16,850 total citations and an average of 558
citations per year (Web of Science source).
Figure 5. Citation reports from the Web of Science database using the terms “HeLa cells” vs
“Nucleotide excision repair.” A) Published items in each year. B) Citations in each year.
The three most cited papers from the present survey are: i) a work mapping, in two
consanguineous Pakistani families, a locus on chromosome 3q22.1-q24 associated with Seckel
syndrome, an autosomal recessive disorder characterized by intrauterine growth retardation,
dwarfism, microcephaly and mental retardation that shares common features with disorders
showing impaired DNA-damage responses, such as Nijmegen breakage syndrome and LIG4 syndrome
(419 citations, average of 33 citations per year) [25]; ii) a study establishing a sensitive
assay system, based on an SV40 origin-containing plasmid, to detect XP-V complementation
activity in extracts of XP-V cells, which are normal in NER but defective in
post-replication repair, and identifying a protein from HeLa cells that complements the
defect and displays a novel DNA polymerase activity able to replicate cyclobutane pyrimidine
dimer-containing DNA templates (320 citations, average of 19 citations per year) [26];
iii) a report describing two microRNAs, miR-210 and miR-373, that are up-regulated in a
hypoxia-inducible factor-dependent manner in hypoxic cells and are suggested to regulate
factors implicated in DNA repair, namely the RAD52 and RAD23B proteins involved in the HDR
and NER pathways, respectively, providing new mechanistic insight into the effect of hypoxia
on DNA repair and genetic instability in cancer (217 citations, average of 31 citations per
year) [27].
HELA CELLS AND BASE EXCISION REPAIR (BER)

Base excision repair (BER) is a mechanism that repairs DNA damage throughout the cell
cycle by removing small, non-helix-distorting base lesions, which could otherwise cause
mutations by mispairing or lead to breaks in DNA during replication. BER is initiated by
DNA glycosylases that recognize and remove damaged or inappropriate bases, forming AP sites
that are then cleaved by an apurinic/apyrimidinic endonuclease (APE). The single-strand
break formed by the endonuclease is then processed by either short-patch repair (a single
nucleotide is replaced) or long-patch repair (2-10 new nucleotides are synthesized). Defects
in a variety of DNA repair pathways lead to cancer predisposition, and BER appears to follow
this pattern. Deletion mutations in BER genes have been shown to result in a higher mutation
rate in a variety of organisms, implying that loss of BER could contribute to the
development of cancer and aging phenotypes [28].
The approximately four hundred papers citing HeLa cells and BER (Figure 1) reached a
maximum of 26 publications per year and a maximum of 1,350 citations per year, both in 2012
(Figure 6), with 15,650 total citations and an average of 522 citations per year (Web of
Science source). The three most cited papers from the present survey are: i) a work that
used a two-hybrid system in Saccharomyces cerevisiae to find genes encoding proteins that
interact with PARP and identified a physical association with the BER protein XRCC1 (X-ray
repair cross-complementing 1), also confirmed to exist in mammalian cells, describing PARP
as a member of the BER multiprotein complex involved in the detection of DNA interruptions
and possibly in the recruitment of XRCC1 and its partners for efficient processing of the
DNA breaks (612 citations, average of 34 citations per year) [29]; ii) a study showing that
increased expression of APE mRNA and protein in the HeLa S3 tumor line and in WI 38 primary
fibroblasts was accompanied by translocation of the protein to the nucleus, and that
ROS-treated cells showed significant resistance to the cytotoxicity of H2O2 and bleomycin,
but not to that of UV light, in a kind of APE-dependent adaptive response (324 citations,
average of 18 citations per year) [30]; iii) a report describing that uracil-DNA glycosylase
(UNG2) increases in the S phase and co-localizes with incorporated BrdU in replication foci,
together with proliferating cell nuclear antigen (PCNA) and replication protein A (RPA),
demonstrating a rapid post-replicative removal of incorporated uracil by UNG2 and the
formation of a BER complex close to the replication fork (237 citations, average of 14
citations per year) [31].
Figure 6. Citation reports from the Web of Science database using the terms “HeLa cells” vs “Base
excision repair.” A) Published items in each year. B) Citations in each year.
HELA CELLS AND MISMATCH REPAIR (MMR)

Living organisms are capable of repairing mismatched base pairs in their DNA. These
mismatches can arise by many different processes (exogenous chemicals, physical agents, and
endogenous reactive metabolites), and one of the most important sources is normal DNA
metabolism or aberrant DNA processing reactions, including DNA replication, recombination
and repair. During DNA replication, nucleotide mis-incorporation generates DNA base-base
mismatches at variable rates, depending on many factors, including the specific DNA
polymerases involved. The correct base of the mismatched base pair is located in the
parental strand of the newly replicated DNA, and proper correction of the mismatch
contributes to maintaining the fidelity of the genetic information. To that end, cells rely
on the critical pathway known as DNA mismatch repair (MMR), an evolutionarily conserved
process that corrects mismatches that are generated during DNA replication and escape
proofreading. MMR proteins also participate in many other DNA transactions, and
inactivation of MMR in human cells is associated with hereditary and sporadic human
cancers [32, 33].
The approximately two hundred and fifty papers citing HeLa cells and MMR (Figure 1)
reached a maximum of 23 publications per year in 2003 and a maximum of approximately 800
citations per year in 2013 (Figure 7), with 11,525 total citations and an average of 461
citations per year (Web of Science source). The three most cited papers from the present
survey are: i) a study by Drummond et al. in Science, 1995, showing that mutations in
components of hMutS alpha, the heterodimer containing hMSH2, lead to a serious defect in the
repair of base-base and single-nucleotide insertion-deletion mismatches, causing
hypermutability and cancer predisposition in different tumor cells (496 citations, average
of 24 citations per year) [34]; ii) a work describing that HeLa extracts restore the
mismatch repair ability of the MT1 B-cell line, which has a mutator phenotype and is even
sensitive to mismatch pairing errors promoted by the cytotoxic drug N-methyl-N'-nitro-N-
nitrosoguanidine (MNNG), errors usually repaired by the MMR pathway (392 citations, average
of 17 citations per year) [35]; iii) the identification and characterization of a general
repair process for DNA heteroduplexes in HeLa cell extracts using M13mp2 DNA substrates
containing single-base mismatches and extra nucleotides; the repair depends on Mg2+ and ATP,
but not on dNTPs, and the use of inhibitors suggests that DNA polymerase-alpha may function
in this mismatch repair (277 citations, average of 15 citations per year) [36].
HELA CELLS AND INTERSTRAND CROSSLINK (ICL) REPAIR

In the fields of DNA damage, mutagenesis and genetics, crosslinking of DNA occurs when
exogenous and/or endogenous agents react almost simultaneously with two different positions
(reactive functional groups) in the DNA molecule. This can occur either within the same
strand (intrastrand crosslink) or between the opposite strands of the DNA (interstrand
crosslink).
Interstrand crosslinks (ICLs), the most common, are highly toxic DNA lesions that
prevent transcription and replication by inhibiting DNA strand separation. Chemical agents
that induce ICLs were the earliest chemotherapeutic drugs for treating tumors or preventing
their dissemination and are still widely used, even though high doses of ICL-inducing drugs
may also lead to the development of tumor resistance. The understanding of how cells repair
these lesions started recently, from insights and studies of individuals with Fanconi anemia
(FA), a rare genetic disorder that leads to ICL sensitivity. Uncovering how the FA pathway
links nucleases, helicases and other DNA-processing enzymes should lead to more targeted
uses of ICL-inducing agents in cancer treatment and could provide novel insights into drug
resistance. Further investigation into FA and other model diseases will provide key insights
regarding the use of DNA crosslinking agents in chemotherapy [37].

Figure 7. Citation reports from the Web of Science database using the terms “HeLa cells” vs “Mismatch
repair.” A) Published items in each year. B) Citations in each year.
The approximately fifty papers citing HeLa cells and ICLs (Figure 1) reached 7
publications in 2009/2012 and a maximum of approximately 180 citations in 2014 (Figure 8),
from 1,520 total citations and an average of 35 citations per year (Web of Science source).
The three additional cited papers from the present survey are: i) the identification of two new
nucleases, FAN1 and EXDL2, that interact at the sites of damage with the key complex of
the ICL repair pathway, FANCI-FANCD2 (ID), in such a way that FAN1 exerts the exo- and
endonuclease activity needed for crosslink repair (151 citations, average of 25 citations per
year) [38]; ii) a work describing new proteins interacting with FANCD2 such as BRCA2,
independent of its wild-type, suggesting that FANCD2 may play a role in the cellular
response to stalled replication forks or in the repair of replication-associated double-strand
breaks (133 citations, average of 11 citations per year) [39]; iii) a study showing that FANCJ

92 Fabio Luis Forti
interacts with the MutLα complex of the MMR pathway through its helicase domain, and that this
interaction is essential to correct the phenotype of FANCJ-null cells, which usually exhibit
tetraploidy induced by ICL agents; this link between FA and MMR
extends the roles of FANCJ in ICL repair signaling independently of BRCA1 (98 citations,
average of 11 citations per year) [40].
HELA CELLS, ATYPICAL ENZYMES, AND THE GENOMIC STABILITY

Aging, cancer, and several human malignancies can be characterized by distinct
dysfunctional DNA repair pathways culminating in genomic instability. Despite the massive
amount of information about DNA damage-inducing agents and the classical mechanisms of
DNA repair, usually comprising the six pathways briefly described earlier in this
communication, the number of studies investigating non-classical or non-canonical
mechanisms has grown rapidly, aiming to broaden the understanding of
new proteins and their new biological functions affecting genomic stability.
For example, the triad of typical Rho GTPases, consisting of RhoA, Rac1, and Cdc42, was
identified and characterized around the 1990s, and since then these enzymes have been
associated with the regulation of diverse steps of cytoskeletal remodeling that are essential for
adhesion, migration, division, and other important biological processes [41-44]. Hence, they
are implicated in tumorigenesis and the invasion of metastatic tumors [45, 46]. More recently
(within the last five years), these enzymes have been identified in the nucleus, where they control
DNA-related processes by unknown mechanisms [47-49].
For instance, several lines of evidence suggest that CDC42 may play a role in the DNA damage
response by interacting with or regulating possible repair pathway components, such as
proteins involved in senescence or apoptosis [50]. This hypothesis was recently
confirmed by Ascer et al. (2015), who showed increased sensitivity to UV radiation in HeLa
cells overexpressing the constitutively active CDC42-V12 mutant, which displayed defects in
DNA repair and DDR pathways [51].
Rac1 was shown to be activated in response to gamma radiation in order
to mediate G2/M cell cycle arrest in MCF-7 breast cancer cells and to promote survival [52].
Similar results were observed in the DNA damage response and repair of cells damaged by
the topoisomerase II inhibitors doxorubicin and etoposide [53, 54]. Espinha et al. (2015)
observed an increase in DNA damage upon disturbed Rac1 GTPase activity under different
radiation treatments applied to HeLa cells and showed that the ATR/CHK1/H2AX axis of the
DDR pathway is affected by Rac1 inhibition [55]. Novel RhoA functions regulated by reactive
oxygen species are relevant to some pathological conditions such as genotoxic stress-induced
DNA damage [56].
It was recently shown that RhoA activation is mediated by its physical interaction with the
OGG1 protein, a key enzyme in the DNA repair of 8-oxoG modifications [57]. The crosstalk
of RhoA signaling with DNA repair pathways was confirmed by Osaki et al. (2015)
using HeLa cells as a model, where the inhibition of RhoA led to attenuation of DDR
pathways and reduction of HR and NHEJ mechanisms, with consequent accumulation of DNA
strand breaks induced by ionizing radiation [58].

Figure 8. Citation reports from the Web of Science database using the terms “HeLa cells” vs
“Interstrand crosslink Repair.” A) Published items in each year. B) Citations in each year.
The survey shown in Figure 9 interrogated the terms “HeLa cells and Rho GTPases” in the
Web of Science and resulted in approximately three hundred papers published up to this date,
which reached 28 publications in 2011 (9A) and a maximum of approximately 1,150 citations
in 2014 (9B), from 11,470 total citations and an average of 383 citations per year. When the
same survey included the term “DNA damage,” the result was eight papers totaling 135
citations and an average of 5 citations per year (Web of Science source). The most cited (37
total citations) is a work describing the involvement of RhoA in the regulation of NF-κB under
genotoxic stress or TNFα in HeLa cells [59], and the three most recent come from the
laboratory of Dr. Forti, the author of this chapter, all of them using HeLa cells as a cellular
model [51, 55, 58].

Figure 9. Citation reports from the Web of Science database using the terms “HeLa cells” vs “Rho
GTPases.” A) Published items in each year. B) Citations in each year.
Another example of atypical enzymes acting directly or indirectly in genomic stability is
the superfamily of protein tyrosine phosphatases (PTPs). Within the sub-class of
dual-specificity phosphatases (DSPs or DUSPs), some members are localized in the nucleus
and exert key functions related to DNA through mechanisms that are not completely known, such as
Cdc25 [60] and Cdc14 [61-62]. More recently, DUSP3 or VHR (vaccinia virus
VH1-related phosphatase), an atypical dual-specificity phosphatase widely expressed in many
tissues and found preferentially in the nucleus of cells [63-64], has been shown to contribute to
genomic stability. Unpublished results from Dr. Forti's group show that silencing (by RNAi)
or inhibiting the activity (with specific pharmacological inhibitors or using an inactive
Cys124 catalytic mutant) of DUSP3 interferes with DNA repair pathways (HR, NHEJ, NER)
of HeLa cells subjected to genotoxic stress.

Figure 10. Citation reports from the Web of Science database using the terms “HeLa cells” vs “Dual-
Specificity Phosphatases.” A) Published items in each year. B) Citations in each year.
Other recent works from the same group found that DUSP3 can interact, under
genotoxic stress conditions, with potential new nuclear targets such as NBS1, NUCL, NPM
and hnRNP C1/C2 proteins, which are directly related to p53 and other DNA repair proteins
(Mre11, Rad51, ATM/ATR, CHK1/2, pH2AX and others) through poorly understood
mechanisms [65-66]. Thus, DUSP3 seems to play an intimate role in the interplay or crosstalk
of different DNA repair mechanisms and cellular processes such as senescence, since
Ku70 and TERT proteins directly interact with these new DUSP3 targets; this interaction is critical
for the dynamic intracellular localization of the telomerase complex, which was shown to be
visited by DUSP3 [67].
When a survey interrogating the terms “HeLa cells and protein tyrosine phosphatases”
was performed in the Web of Science (not shown), it resulted in approximately four hundred
papers published up to this date, which reached 18,721 total citations and an average of 624
citations per year.
When the term “PTP” was replaced by “DUSP” (dual-specificity phosphatases) in the same
survey, the result was 84 papers that reached 12 publications in 2013 (10A) and a
maximum of approximately 300 citations in 2014 (10B), from 4,272 total citations and an
average of 186 citations per year (Web of Science source).
The most cited work (1,265 total citations and 55 citations per year) is a Cell paper
from 1993 describing for the first time the CDI1 phosphatase, which forms complexes with
CDK2 and dephosphorylates it in the regulation of the cell cycle [68].
When the same survey included the term “DNA damage” (HeLa cells + DUSP
+ DNA damage), the result was eleven papers totaling 585 citations and an average of 25
citations per year (Web of Science source). The most cited (125 total citations) is a
work proposing a dual mechanism for the fine-tuned regulation of the Cdc25 dual-specificity
phosphatase to control progression through the eukaryotic cell division cycle in response
to changes in the cell environment [69]. Moreover, amongst the more recent publications are three
coming from the laboratory of Dr. Forti, the author of this chapter, all of them using HeLa
cells (and others) as the main cellular model [65, 66, 70].
CONCLUSION
This communication presented numerous scientometric analyses of the use of the HeLa
cell line as a cellular model for investigating canonical DNA repair mechanisms, as well as
novel proteins that might have some impact on these, aiming to emphasize the importance of
HeLa cells for the genomic stability field and for many other scientific interests.
However, in closing, perhaps the greatest scientific contribution was not due to the HeLa cell
line, but due to Henrietta Lacks.
REFERENCES
[1] Scherer, W.F. et al., Studies on the propagation in vitro of poliomyelitis viruses. IV.
Viral multiplication in a stable strain of human malignant epithelial cells (strain HeLa)
derived from an epidermoid carcinoma of the cervix. J Exp Med. 1953. 97(5):695-710.
[2] Lucey, B.P., Nelson-Rees, W.A., Hutchins, G.M., Henrietta Lacks, HeLa Cells, and Cell
Culture Contamination. Arch Pathol Lab Med., 2009. 133(9).
[3] Scherer, W.F. et al., Preservation at subzero temperatures of mouse fibroblasts (strain
L) and human epithelial cells (strain HeLa). Proc Soc Exp Biol Med. 1954. 87(2):480-
487.
[4] Eagle, H., The specific amino acid requirements of a human carcinoma cell (Strain
HeLa) in tissue culture. J Exp Med. 1955. 102(1):37-48.
[5] Eagle, H., Nutrition Needs of Mammalian Cells in Culture. Science, 1955. 122, 501.
[6] Eagle, H., et al., myo-Inositol as an Essential Growth Factor for Normal and Malignant
Human Cells in Tissue Culture. J. Biol. Chem., 1956. 214, 845-847.

[7] Eagle, H., Amino Acid Metabolism in Mammalian Cell Cultures. Science, 1959. 130,
432-437.
[8] Puck, T.T., et al., A rapid method for viable cell titration and clone production with
HeLa cells in tissue culture: the use of X-irradiated cells to supply conditioning factors.
Proc Natl Acad Sci U S A., 1955. 41(7):432-437.
[9] Terasima, T., X-ray sensitivity and DNA synthesis in synchronous populations of HeLa
cells. Science, 1963. 140(3566):490-492.
[10] Crawford, L.V., et al., Detection of a common feature in several human tumor cell
lines--a 53,000-dalton protein. Proc Natl Acad Sci U S A., 1981. 78(1):41-45.
[11] Oren, M., et al., Molecular cloning of a cDNA specific for the murine p53 cellular
tumor antigen. Proc Natl Acad Sci U S A., 1983. 80(1):56-59.
[12] Brown, C.J., Awakening guardian angels: drugging the p53 pathway. Nature Reviews
Cancer, 2009. 9:862-873.
[13] Rogakou, E.P., et al., DNA double-stranded breaks induce histone H2AX
phosphorylation on serine 139. Journal of Biological Chemistry, 1998. 273(10):5858-
5868.
[14] Feng, J.L., et al., The RNA component of human telomerase. Science, 1995.
269(5228):1236-1241.
[15] Denissenko, M.F., et al., Preferential formation of benzo[a]pyrene adducts at lung
cancer mutational hotspots in P53. Science, 1996. 274(5286):430-432.
[16] Moynahan, M.E., et al., Mitotic homologous recombination maintains genomic stability
and suppresses tumorigenesis. Nat Rev Mol Cell Biol., 2010. 11(3):196-207.
[17] Shrivastav, M., et al., Regulation of DNA double-strand break repair pathway choice.
Cell Research, 2008. 18(1):134-147.
[18] Vahsen, N., et al., AIF deficiency compromises oxidative phosphorylation. Embo
Journal, 2004. 23(23): 4679-4689.
[19] Taniguchi, T., et al., S-phase-specific interaction of the Fanconi anemia protein,
FANCD2, with BRCA1 and RAD51. Blood, 2002. 100(7): 2414-2420.
[20] Lieber, M.R., et al., Mechanism and regulation of human non-homologous DNA end-
joining. Nat Rev Mol Cell Biol., 2003. 4(9):712-20.
[21] Teo, S.H., et al., Identification of Saccharomyces cerevisiae DNA ligase IV:
Involvement in DNA double-strand break repair. Embo Journal, 1997. 16(15):4788-
4795.
[22] Weterings, E., et al., The endless tale of non-homologous end-joining. Cell
Research, 2008. 18(1):114-124.
[23] Wang, H.C., et al., DNA ligase III as a candidate component of backup pathways of
nonhomologous end joining. Cancer Research, 2005. 65(10):4020-4030.
[24] Batty, D.P., et al., Damage recognition in nucleotide excision repair of DNA. Gene,
2000. 241(2):193-204.
[25] O'Driscoll, M., et al., A splicing mutation affecting expression of ataxia-telangiectasia
and Rad3-related protein (ATR) results in Seckel syndrome. Nature Genetics, 2003.
33(4):497-501.
[26] Masutani, C., et al., Xeroderma pigmentosum variant (XP-V) correcting protein from
HeLa cells has a thymine dimer bypass DNA polymerase activity. Embo Journal, 1999.
18(12):3491-3501.

[27] Crosby, M.E., et al., MicroRNA Regulation of DNA repair Gene Expression in
Hypoxic Stress. Cancer Research, 2009. 69(3):1221-1229.
[28] Leandro, G.S., et al., The impact of base excision DNA repair in age-related
neurodegenerative diseases. Mutat Res., 2015. 776:31-39.
[29] Masson, M., et al., XRCC1 is specifically associated with poly(ADP-ribose)
polymerase and negatively regulates its activity following DNA damage. Molecular
And Cellular Biology, 1998. 18(6):3563-3571.
[30] Ramana, C.V., et al., Activation of apurinic/apyrimidinic endonuclease in human cells
by reactive oxygen species and its correlation with their adaptive response to
genotoxicity of free radicals. Proceedings of the National Academy of Sciences of
the United States of America, 1998. 95(9):5061-5066.
[31] Otterlei, M., et al., Post-replicative base excision repair in replication foci. Embo
Journal, 1999. 18(13):3834-3844.
[32] Jiricny J., The multifaceted mismatch-repair system. Nat Rev Mol Cell Biol., 2006.
7(5):335-46.
[33] Lynch, H.T., et al., Milestones of Lynch syndrome: 1895-2015. Nat Rev Cancer, 2015.
15(3):181-94.
[34] Drummond, J.T., et al., Isolation of an HMSH2-P160 heterodimer that restores DNA
mismatch repair to tumor-cells. Science, 1995. 268(5219):1909-1912.
[35] Kat, A., et al., An alkylation-tolerant, mutator human cell-line is deficient in strand-
specific mismatch repair. Proceedings of the National Academy of Sciences of the
United States of America, 1993. 90(14):6424-6428.
[36] Thomas, D.C., et al., Heteroduplex repair in extracts of human HeLa-cells. Journal of
Biological Chemistry, 1991. 266(6):3744-3751.
[37] Deans, A.J., et al., DNA interstrand crosslink repair and cancer. Nat Rev Cancer, 2011.
11(7):467-80.
[38] Smogorzewska, A., et al., A Genetic Screen Identifies FAN1, a Fanconi Anemia-
Associated Nuclease Necessary for DNA Interstrand crosslink Repair. Molecular Cell,
2010. 39(1):36-47.
[39] Hussain, S., et al., Direct interaction of FANCD2 with BRCA2 in DNA damage
response pathways. Human Molecular Genetics, 2004. 13(12):1241-1248.
[40] Peng, M., et al., The FANCJ/MutL alpha interaction is required for correction of the
cross-link response in FA-J cells. Embo Journal, 2007. 26(13):3238-3249.
[41] Aktories, K., et al., The rho gene product expressed in E. coli is a substrate of
botulinum ADP-ribosyltransferase C3. Biochemical and Biophysical Research
Communications, 1989. 158(1):209-213.
[42] Polakis, P.G., et al., Identification of the ral and rac1 gene products, low molecular
mass GTP-binding proteins from human platelets. The Journal of Biological Chemistry,
1989. 264(28):16383-16389.
[43] Bender, A., et al., Multicopy suppression of the cdc24 budding defect in yeast by
CDC42 and three newly identified genes including the ras-related gene RSR1.
Proceedings of the National Academy of Sciences of the United States of America,
1989. 86(24):9976-80.
[44] Ridley, A.J., Rho GTPases and actin dynamics in membrane protrusions and vesicle
trafficking. Trends in Cell Biology, 2006. 16(10):522-529.
[45] Fritz, G., et al., Rho GTPases are over-expressed in human tumors. International Journal
of Cancer, 1999. 81(5):682-687.
[46] Wilson, K.F., et al., Rho GTPases and their roles in cancer metabolism. Trends in
Molecular Medicine, 2013. 19(2):74-82.
[47] Heynen, S.R., et al., Retinal degeneration modulates intracellular localization of
CDC42 in photoreceptors. Molecular Vision, 2011. 17:2934-46.
[48] Navarro-Lérida, I., et al., Rac1 nucleocytoplasmic shuttling drives nuclear shape
changes and tumor invasion. Dev Cell, 2015. 32(3):318-334.
[49] Dubash, A.D., et al., The small GTPase RhoA localizes to the nucleus and is activated
by Net1 and DNA damage signals. PLoS One, 2011. 6(2):e17380.
[50] Wang, L., et al., Cdc42 GTPase activating protein deficiency promotes genomic
instability and premature aging-like phenotypes. Proc Natl Acad Sci USA, 2007.
104:1248–1253.
[51] Ascer, L.G., et al., CDC42 GTPase Activation Affects HeLa Cell DNA Repair and
Proliferation Following UV Radiation-Induced Genotoxic Stress. Journal of Cellular
Biochemistry, 2015. 116(9):2086-2097.
[52] Yan, Y., et al., RAC1 GTPase plays an important role in gamma-irradiation induced
G2/M checkpoint activation. Breast Cancer Research, 2012. 14(2):R60.
[53] Huelsenbeck, S.C., et al., Rac1 protein signaling is required for DNA damage response
stimulated by topoisomerase II poisons. The Journal of Biological Chemistry, 2012.
287(46):38590-38599.
[54] Wartlick, F., et al., DNA damage response (DDR) induced by topoisomerase II poisons
requires nuclear function of the small GTPase Rac. Biochimica et Biophysica Acta,
2013. 1833(12):3093-3103.
[55] Espinha, G., et al., Rac1 GTPase-deficient HeLa cells present reduced DNA repair,
proliferation, and survival under UV or gamma irradiation. Molecular and Cellular
Biochemistry, 2015. 404(1-2):281-297.
[56] Aghajanian, A., et al., Direct activation of RhoA by reactive oxygen species requires a
redox-sensitive motif. PLoS One, 2009. 4(11):e8045.
[57] Luo, J., et al., 8-Oxoguanine DNA glycosylase-1-mediated DNA repair is associated
with Rho GTPase activation and alpha-smooth muscle actin polymerization. Free
Radical Biology & Medicine, 2014. 73:430-438.
[58] Osaki, J.H., et al., Modulation of RhoA GTPase Activity Sensitizes Human Cervix
Carcinoma Cells to γ-Radiation by Attenuating DNA Repair Pathways. Oxidative
Medicine and Cellular Longevity, 2015. In Press.
[59] Gnad, R., et al., Rho GTPases are involved in the regulation of NF-kappa B by genotoxic
stress. Experimental Cell Research, 2001. 264(2):244-249.
[60] Wurzenberger, C., et al., Phosphatases: providing safe passage through mitotic exit,
Nat. Rev. Mol. Cell Biol., 2011, 12:469–482.
[61] Mocciaro, A., et al., Cdc14: a highly conserved family of phosphatases with non-
conserved functions, J. Cell Sci., 2010, 123:2867–2876.
[62] Wei, Z., et al., Early-onset aging and defective DNA damage response in Cdc14b-
deficient mice, Mol. Cell. Biol., 2011, 31:1470–1477.
[63] Ishibashi, T., et al., Expression cloning of a human dual-specificity phosphatase. Proc
Natl Acad Sci USA., 1992. 89(24):12170-12174.

[64] Henkens, R., et al., Cervix carcinoma is associated with an up-regulation and nuclear
localization of the dual-specificity protein phosphatase VHR. BMC Cancer, 2008.
8:147.
[65] Panico, K., et al., Proteomic, cellular, and network analyses reveal new DUSP3
interactions with nucleolar proteins in HeLa cells. Journal of Proteome Research, 2013.
12(12):5851-5866.
[66] Forti, F.L., Combined experimental and bioinformatics analysis for the prediction and
identification of VHR/DUSP3 nuclear targets related to DNA damage and repair.
Integrative Biology, 2015. 7(1):73-89.
[67] Bartocci, C., et al., Isolation of chromatin from dysfunctional telomeres reveals an
important role for Ring1b in NHEJ-mediated chromosome fusions. Cell Reports, 2014.
7(4):1320-1332.
[68] Gyuris, J., et al., CDI1, a human G1-phase and S-phase protein phosphatase that
associates with CDK2. Cell, 1993. 75(4):791-803.
[69] Donzelli, M., et al., Dual mode of degradation of Cdc25 A phosphatase. Embo
Journal, 2002. 21(18):4875-4884.
[70] Forti, F.L., et al., Investigating roles of dual tyrosine phosphatases in DNA damage
responses. International Journal of Low Radiation, 2010. 7(4):259-267.

Chapter 6
THE MOLECULAR GENETICS OF POLYCYTHEMIA VERA
Linda M. Scott∗
The University of Queensland Diamantina Institute,
University of Queensland, Translational Research Institute,
Brisbane, Australia
ABSTRACT
The past decade has seen unprecedented improvements in the diagnosis and
management of polycythemia vera (PV), a blood cancer that is characterized by the
predominant expansion of morphologically normal red blood cells. A significantly
increased understanding of the molecular and cellular biology underlying this disorder
has in large part spearheaded these improvements. Numerous genetic studies have
revealed that the vast majority of PV patients have an acquired mutation within Janus
kinase 2 (JAK2), a cytoplasmic tyrosine kinase that is constitutively associated with
cytokine receptors that lack intrinsic kinase activity. Within the erythroid lineage, JAK2
is the predominant mediator of intracellular signaling following activation of the
erythropoietin receptor. Approximately 95% of patients with PV have a “JAK2V617F”
mutation, which is the result of a single nucleotide change in exon 14 of the JAK2 gene;
the remainder are instead positive for one of a series of mutations collectively known as
the JAK2 exon 12 mutations. All of these variants cause constitutive activation of JAK2-
mediated intracellular signaling in vitro and in vivo, and are associated with the
erythropoietin-hypersensitive expansion of erythroid progenitors that is a hallmark
feature of PV. In various transgenic mouse models, expression of mutant JAK2 generates
an erythrocytosis phenotype that is remarkably similar to that of PV in humans,
suggesting that the acquisition of a JAK2 mutation alone is sufficient to initiate the
development of this myeloproliferative neoplasm (MPN). Accordingly, several inhibitors
of JAK2 activity have been evaluated in patients with PV, with one recently being
approved by the U.S. Food and Drug Administration.
∗ Corresponding author: Linda M. Scott, Ph.D. The University of Queensland Diamantina Institute, Translational
Research Institute, Kent Street, Brisbane, Queensland 4012, Australia, email: l.scott3@uq.edu.au, phone: +61
7 3443 7093, FAX: +61 7 3443 6966.

102 Linda M. Scott
I. THE PATHOGENESIS OF POLYCYTHEMIA VERA

Discovery of the JAK2V617F Mutation
Since its initial description in the medical literature in the late 1800s, there have been
numerous findings that have improved our understanding of polycythemia vera (PV), our
ability to accurately diagnose it, and, most recently, to manage its treatment. Arguably the
most important findings have been the discoveries of the acquired genetic mutations that
account for all, or almost all, cases of PV: specifically, those mutations that inappropriately
activate the intracellular signaling molecule, Janus kinase 2 (JAK2) – the JAK2V617F and
JAK2 exon 12 mutations.
Two studies originally pointed investigators towards JAK2 as being
important in the pathogenesis of the MPNs. In 2002, in a genome-wide screen for regions of
loss-of-heterozygosity (LOH) in patients with PV, Joseph Prchal and colleagues identified a
significant proportion of individuals (~33%) with LOH involving chromosome 9p, a region of
the genome that includes the JAK2 locus [1]. At that time, this gene was viewed as an
attractive target because it encodes JAK2, a cytoplasmic tyrosine kinase that interacts with
several receptors for cytokines that are important in regulating hematopoietic differentiation
and/or function – including those for erythropoietin (EPO) and thrombopoietin (TPO).
Unfortunately, no mutations were detected in the protein-encoding regions of the JAK2 locus
or in another eighteen neighboring loci when assessed by DNA sequencing, suggesting that
the LOH observed might not be related to the pathogenesis of the MPNs.
However, the importance of JAK2 was further underscored by the finding that EPO-
independent differentiation of cells cultured from the peripheral blood of PV patients could be
prevented in vitro by exposure to inhibitors of JAK2, as well as PI3K and SRC [2]. This led
investigators to once again assess the JAK2 coding sequence in patients with PV, with the
identification of an acquired single G-to-T substitution within exon 14 [3]. At the same time,
Radek Skoda and colleagues confirmed the chromosome 9p LOH detected three years earlier
[4], but also identified the sequence change detected by William Vainchenker and colleagues.
Two other groups used emerging high-throughput DNA sequencing approaches to interrogate
the entire 518-gene kinome [5] or the 85-gene tyrosine kinome [6] in patients with an MPN,
after recognizing that the targets for mutation in two related blood disorders (KIT in systemic
mastocytosis [7] and PDGFRA in hypereosinophilic syndrome [8]) encoded tyrosine kinases.
Using this approach, both teams identified the recurrent G-to-T substitution in JAK2 exon 14.
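As a minimal illustration of how a single G-to-T change can alter the encoded protein, the sketch below applies the substitution to one codon. The codon GTC (valine) and its conversion to TTC (phenylalanine) at residue 617 are assumptions made for illustration, since the text gives only the base change and not the surrounding sequence; the two-entry codon table and the function name are likewise hypothetical.

```python
# Two-entry codon table: only the codons needed for this illustration.
CODON_TO_AMINO_ACID = {"GTC": "V", "TTC": "F"}


def substitute_base(codon: str, position: int, new_base: str) -> str:
    """Return the codon with the base at `position` (0-based) replaced."""
    return codon[:position] + new_base + codon[position + 1:]


# Assumed wild-type codon for residue 617 (not stated in the chapter).
wild_type_codon = "GTC"

# G-to-T substitution at the first base of the codon.
mutant_codon = substitute_base(wild_type_codon, 0, "T")

# Build the conventional label: wild-type residue, position, mutant residue.
mutation_label = (
    f"{CODON_TO_AMINO_ACID[wild_type_codon]}617{CODON_TO_AMINO_ACID[mutant_codon]}"
)
print(mutant_codon, mutation_label)
```

Because the change falls at the first position of the codon, it is non-synonymous in this sketch: the encoded residue changes rather than being preserved.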
The mutation reported in all four of these studies results in the alteration of an amino acid
at position 617 of the JAK2 protein – a valine (V) residue is replaced by a phenylalanine (F);
accordingly, it is denoted as the JAK2V617F mutation. This mutation was detected not only
in patients with PV but also in those diagnosed with ET or MF, although the frequencies in
each of the MPN subtypes varied between reports. This was due primarily to the method used
to detect the base substitution: most groups screened patient granulocyte DNA samples by
standard dideoxy sequencing, whereas one study additionally used an approach based upon
allele-specific PCR [5, 9]. This more sensitive technique revealed that approximately 95% of
PV patients had the JAK2V617F mutation, compared to about 55% of cases of ET and 50% of
cases of MF. The improved sensitivity of the PCR-based assay was most relevant when
screening patients with ET – when DNA sequencing was used, 45 of 51 cases appeared

Mutations Underlying PV 103
JAK2V617F-negative, yet 23 of these cases subsequently tested positive when the mutation-
specific PCR assay was employed [5]. With multiple follow-up studies having been reported,
these disease-specific frequencies have not altered appreciably, and do not appear to be
influenced by the patient’s ethnicity.
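The effect of assay sensitivity on the ET figures quoted above can be checked with simple arithmetic. The sketch below is illustrative only; it assumes that the six ET cases that were positive by sequencing were also positive by allele-specific PCR, which the text implies but does not state, and the variable names are hypothetical.

```python
def detection_rate(positive: int, total: int) -> float:
    """Fraction of cases in which the mutation was detected."""
    return positive / total


total_et_cases = 51          # ET cases screened (from the text)
negative_by_sequencing = 45  # appeared JAK2V617F-negative by sequencing
rescued_by_pcr = 23          # of those, positive by allele-specific PCR

# Cases detected by sequencing alone.
positive_by_sequencing = total_et_cases - negative_by_sequencing

# Assumption: sequencing-positive cases are also PCR-positive.
positive_by_pcr = positive_by_sequencing + rescued_by_pcr

sequencing_rate = detection_rate(positive_by_sequencing, total_et_cases)
pcr_rate = detection_rate(positive_by_pcr, total_et_cases)

print(f"Sequencing: {positive_by_sequencing}/{total_et_cases} = {sequencing_rate:.1%}")
print(f"Allele-specific PCR: {positive_by_pcr}/{total_et_cases} = {pcr_rate:.1%}")
```

Under that assumption, the PCR-based rate (29 of 51 cases) is in line with the frequency of about 55% quoted for ET, whereas sequencing alone would have detected the mutation in only 6 of 51 cases.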
The JAK2V617F mutation occurs rarely in solid tumors [10], although it has been
detected in very occasional biopsies of non-small cell lung cancer [11]. It appears relatively
more frequently in myeloid neoplasms other than the MPNs [10, 12-15]; in particular, two
myelodysplasia/myeloproliferative neoplasm (MDS/MPN) overlap syndromes: refractory
anemia with ringed sideroblasts and thrombocytosis (RARS-T) and chronic myelomonocytic
leukemia (CMML). It can also occur in cases of Philadelphia chromosome-positive chronic
myeloid leukemia (CML), although, in the cases characterized to date, evidence suggested the
presence of a co-existent MPN [10, 16, 17]. As expected, the JAK2V617F mutation is
detected frequently in patients with acute myeloid leukemia (AML) that is secondary to an
MPN, but is infrequent in cases of de novo AML. The JAK2V617F mutation is
not associated with chronic lymphocytic leukemia (CLL) or acute lymphoblastic leukemia
(ALL).
JAK2V617F-Negative Polycythemia Vera
Even after the development of methods that could detect the JAK2V617F mutant allele at
low levels (< 2% of alleles within genomic DNA from purified granulocytes), a small
percentage of PV patients remained JAK2V617F-negative. Several explanations for the
inability to detect this mutation were proposed: that these cases were inaccurately or
incorrectly diagnosed, or that the mutant allele burden remained below the detection threshold
as a consequence of treatment with interferon [18]. As neither explanation accounted for the
two JAK2V617F-negative patients from their cohort of 73 PV cases [5], Scott and colleagues
instead reasoned that these individuals might have acquired mutations elsewhere in JAK2, or
alternatively in other members of the JAK family. A mutation screen of the twenty-five exons
of JAK2 in these two patients was first undertaken; in each case, an acquired genetic mutation
was observed within exon 12, although they differed from one another – one was a 6-base
pair deletion that resulted in the loss of two amino acids and substitution of a third, while the
other was a three-base change that resulted in the substitution of adjacent amino acids [19].
Nonetheless, both mutations affected the same region of JAK2, from phenylalanine-537 to
lysine-539. An additional eight JAK2V617F-negative patients diagnosed with PV were then
found to also have an acquired mutation in JAK2 exon 12; six produced a lysine-to-leucine
substitution at position 539, whereas residues 542 and 543 were deleted in another two cases
[19]. A review of the literature published four years later highlighted the complexity of
mutations present in the distal half of JAK2 exon 12, with 37 different variants having been
reported in a total of 172 patients [20]. These variants included eleven non-synonymous base
substitutions, twenty deletions, and six duplications that affected residues 536-547, a region
that spans the start of the JH2 domain and has limited homology to the analogous regions of
JAK1, JAK3 or TYK2. The most frequent mutations identified were the N542-E543del
variant (23% of total), the E543-D544del variant (11%), the F537-K539delinsL and K539L
variants (10% each), and the R541-E543delinsK variant (8%).

Interestingly, patients with a JAK2 exon 12 mutation present with a phenotype that is
different from classic (JAK2V617F-positive) PV. Rather than having tri-lineal involvement,
these cases are associated with an isolated erythrocytosis, with white cell and platelet counts
in the majority of individuals being within the normal range [19-21]. In contrast, hematocrit
and hemoglobin levels at diagnosis are significantly higher in JAK2 exon 12-positive cases
than in JAK2V617F-positive cases (61 ± 8% compared to 55 ± 7%, and 190 ± 28 g/L compared
to 180 ± 23 g/L, respectively) [20]. This strong erythroid drive may in part explain the earlier
age at diagnosis (52 years for exon 12-mutated cases, 59 years for JAK2V617F-positive
cases) [20]. These genotype-associated phenotypic differences are apparent in assessments of
bone marrow morphology, with the caveat that this has only been rigorously examined in a
small number of cases [22, 23]. Moderately hypercellular bone marrow was noted in all JAK2
exon 12-mutated patients, with the pan-myelosis that is observed in most instances of PV
absent in the JAK2 exon 12-mutated cases. Instead, predominant erythroid hyperplasia was
apparent, with myeloid-to-erythroid (ME) ratios ranging from 1:1 to 1:6. The prominent
clusters of megakaryocytes that characterize JAK2V617F-positive PV were absent, although
some cases had elevated megakaryocyte numbers. These formed subtle clusters on occasion;
others displayed mildly atypical chromatin distribution patterns and nuclear structure.
Despite these differences in phenotype, Passamonti and colleagues have shown that PV
patients with a JAK2 exon 12 mutation or a JAK2V617F mutation have similar incidence rates
for thrombosis, hemorrhage, transformation to myelofibrosis or leukemia, and death [21].
Accordingly, the current risk stratification for PV patients (based upon data from those with
JAK2V617F-positive PV) can also be applied to those with a JAK2 exon 12 mutation.
JAK2 Exon 12 Mutations and the Diagnostic Criteria for PV
The phenotype associated with a JAK2 exon 12 mutation raised the possibility that a
subset of individuals with this genetic lesion might have been diagnosed with idiopathic
erythrocytosis (IE), a disease classification that is used for an increased red cell mass of
unknown etiology. A single study has evaluated the frequency of JAK2 exon 12 mutations in
patients diagnosed with IE [22]. Fifty-eight patients that lacked mutations in the EPOR,
PHD2 or VHL genes, and presented with normal or low serum EPO levels were screened for
a JAK2 exon 12 mutation by DNA sequencing and allele-specific PCR. A quarter of the cases
in the latter grouping (8 patients) were mutation-positive, making the JAK2 exon 12 variants
the most common molecular defect identified within this patient subgroup. Furthermore, a
comparison of the hematologic parameters of mutation-positive and mutation-negative low
serum EPO cases showed that hemoglobin concentration and white cell counts did not differ,
although the mutation-positive individuals presented with higher average platelet numbers
(309 ± 74 × 10⁹/L versus 227 ± 70 × 10⁹/L). Each mutation-positive patient also tested positive
for the presence of EPO-independent erythroid colonies (EECs), which were absent from
each of the seven mutation-negative, low EPO patients assessed. This observation suggested
that a patient presenting with erythrocytosis that was also EEC-positive should be tested for
the presence of a JAK2 exon 12 mutation.
Patients with a JAK2 exon 12 mutation could have been diagnosed with either PV or IE
as a consequence of the diagnostic guidelines that were issued in the early 2000s by the
World Health Organization (WHO) or the Polycythemia Vera Study Group (PVSG) [24, 25].
A proportion of affected patients failed to fulfil these criteria, resulting in the exclusionary
diagnosis of IE. Of 145 JAK2 exon 12 mutation-positive patients whose diagnosis has been
recorded in the literature, 119 (82%) did not meet these criteria [20]. However, revisions to
the WHO criteria in 2008 that included the presence of a JAK2V617F “or other functionally
similar mutation, such as a JAK2 exon 12 mutation” as one of two required major criteria
ensured that these individuals would now be correctly diagnosed as having PV.
Do Cases of PV without a JAK2 Mutation Exist?
The incidence of JAK2 exon 12 mutations in three independent cohorts of PV patients
from Asia, Australia or the UK ranged between 2.5 and 3.0% [26-28], although the true
frequency of these variants is likely to be slightly higher given that the cases in these cohorts
had been diagnosed using 2001 diagnostic criteria, and therefore excluded those diagnosed as
having IE. The high incidence of JAK2 exon 12 mutations detected within Taiwanese patients
(25%) in another study raised the possibility that some ethnicities are more likely to acquire a
JAK2 exon 12 mutation [29], but might simply reflect the small sample size of this particular
cohort (n = 25). In each of these four studies, however, all patients had either a JAK2V617F
or a JAK2 exon 12 mutation, although only a majority of JAK2V617F-negative PV cases were
JAK2 exon 12 mutation-positive in other studies. For example, in two large cohorts [30, 31],
1 of 220 patients and 4 of 338 patients lacked any JAK2 mutation. The existence of patients
with an apparent wildtype JAK2 genotype could be the consequence of several factors:
insufficient sensitivity of the genotyping assay used, treatment that reduced the mutant allele
burden at sampling to below the limit of detection, or an inaccurate diagnosis. Alternatively, these
cases might have arisen following the acquisition of genetic mutations that remain to be
identified. Frame-shift mutations that affect calreticulin are not, however, responsible; CALR
mutations occur frequently in patients with ET or MF [32, 33], but have not been reported in
instances of PV.
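The caveat above about the small Taiwanese cohort (n = 25) can be made concrete with a confidence interval. The following is a minimal sketch only: it assumes a Wilson score interval is an acceptable way to express the uncertainty, it takes the reported 25% and n = 25 at face value, and the function name `wilson_interval` is hypothetical rather than anything used in the cited studies.

```python
import math

def wilson_interval(p_hat, n, z=1.96):
    """Approximate 95% Wilson score interval for an observed proportion.

    p_hat: observed proportion (here, the reported 0.25)
    n:     cohort size (here, the reported 25)
    z:     normal quantile; 1.96 is assumed for a 95% interval
    """
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width

low, high = wilson_interval(0.25, 25)
print(f"95% interval: {low:.2f} to {high:.2f}")
```

With these inputs the interval spans roughly 0.12 to 0.44. Its width illustrates how imprecise an estimate from 25 patients is; it does not by itself settle whether the difference from the other cohorts is real.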
Functional Consequences of the PV-Associated JAK2 Mutations
The JAK family of cytoplasmic tyrosine kinases in vertebrates comprises four
closely related members: JAK1, JAK2, JAK3 and tyrosine kinase 2 (TYK2). Each of these
family members is found constitutively associated with a cytokine receptor that itself lacks
intrinsic tyrosine kinase activity. In hematopoietic cells, these receptors can include those for
EPO, TPO, and granulocyte colony-stimulating factor (G-CSF). There are four distinct
domains within each JAK protein. The FERM domain mediates its interactions with cytokine
receptor subunits, whereas the SH2 domain mediates interactions with positive or negative
regulators of JAK kinase activity. The JAK proteins are distinctive in that they also contain
two kinase-like domains, referred to as JAK-homology-1 (JH1) and JAK-homology-2 (JH2).
The latter domain lacks several features that are thought to be required for a functioning
kinase, but it nevertheless has an essential role in suppressing basal kinase activity [34, 35]. In
a cytokine-low environment, the JAK2 JH2 domain is constitutively phosphorylated on two
residues, serine-523 and tyrosine-570, which strengthens its inhibitory interactions with the
JH1 domain, thereby suppressing the kinase activity of JAK2 [36, 37]. Engagement of a
receptor with its ligand induces structural changes [38], resulting in JAK2 activation via
auto-phosphorylation on tyrosines 1007 and 1008, with a reduction in phosphorylated serine-523
and tyrosine-570 levels (Figure 1).
Figure 1. Erythropoietin (EPO)-responsive canonical and non-canonical JAK2 signaling in erythroid
precursor cells. Exposure of erythroid cells to EPO causes the activation of both canonical and
non-canonical JAK2 signaling. Engagement of the erythropoietin receptor (EPOR) with its ligand, EPO,
induces structural changes within the receptor and JAK2, resulting in JAK2 activation via auto-
phosphorylation. In canonical signaling, activated JAK2 then phosphorylates tyrosine residues in the
EPOR cytoplasmic domain to which it is tethered, providing docking sites for signaling proteins such as
members of the signal-transducer-and-activator-of-transcription (STAT) family. Recruited STAT
monomers are activated by JAK2 phosphorylation, with phosphorylated STAT5 translocating into the
nucleus, where it binds to STAT consensus binding sites in the regulatory regions of target genes and
enhances transcription. Amongst these targets is the one encoding SOCS1, a member of the SOCS
family of adaptor proteins that target phosphorylated JAK2 for proteasomal degradation. PIM1, whose
expression is also regulated by STAT5, enhances SOCS activity by phosphorylating and stabilizing
these proteins. Production of SOCS1 and PIM1 therefore forms a negative feedback loop to inhibit
continued stimulation of JAK/STAT signaling. Non-canonical JAK-mediated signaling is also at play
in erythroid precursors. Although the mechanism is unclear, activated JAK2 also undergoes nuclear
translocation, where it phosphorylates tyrosine-41 of histone H3 and excludes binding of
heterochromatin protein-1α (HP-1α) to this residue. This results in the relaxation of heterochromatic
regions of the genome and the transcription of functionally important genes.
Activated JAK2 phosphorylates tyrosine residues present in the cytoplasmic domain of
the receptor to which it is tethered, providing docking sites for cytoplasmic signaling proteins,
such as the members of the signal-transducer-and-activator-of-transcription (STAT) family.
The recruited STAT monomers are activated by phosphorylation by JAK2, then dimerize and
translocate into the nucleus, where they enhance transcription from specific loci. JAK
activation can also activate the RAS-MAPK and PI3K-AKT signaling pathways.
The JH2 domain of JAK2 includes valine-617, the site of mutation in the majority of
MPN patients. The function of this domain is not fully apparent, although a well-regarded
mathematical model of its structure, proposed several years before identification of the
JAK2V617F mutation, has provided useful insights. This model predicted that valine-617
forms an unfavourable interaction with residues within the JH1 domain following JAK2
activation [39]; an amino acid substitution at this site may therefore induce a conformational
shift that mimics the repulsive interactions between the JH1 and JH2 domains that normally
follow JAK2 activation. Although this mathematical model did not provide insights into the
function of residues 536-547 (which are affected by the JAK2 exon 12 mutations), mapping
of these residues onto the hypothetical JAK2 structure revealed that they are located in a loop
adjacent to one that contains valine-617, and that both are in close proximity to the JH1
domain. It is therefore possible that alterations in the structure or orientation of the loop
affected by a JAK2 exon 12 mutation might alleviate JAK2 auto-inhibition. In support of the
scenario proposed above, the EECs that can be cultured from peripheral blood samples from
JAK2 mutation-positive MPN patients are never JAK2-wildtype [5, 19, 22, 40].
To test this hypothesis further, investigators expressed wildtype or mutant JAK2 in Ba/F3
cells, which are derived from murine pro-B cells and require exogenous interleukin-3 (IL3)
for viability and proliferation in vitro. These cells co-expressed the receptors for TPO or EPO
to act as a scaffold to which JAK2 could bind. Cytokine withdrawal assays, which many
scientists consider to be the “gold standard” in vitro assay for the analysis of mutations
present in protein kinases, were performed to determine whether disease-associated mutations
were sufficient to enable cytokine-independent proliferation [3, 4, 6, 19]. Whereas wildtype
JAK2 resulted in a lack of cell proliferation and a reduction in numbers of viable Ba/F3 cells
within the assay system, cytokine-independent growth was observed following expression of
JAK2V617F or any one of the four JAK2 exon 12 variants tested. This was accompanied by
phosphorylation of JAK2 on tyrosines 1007/1008 and of STAT5 on tyrosine-694, indicating
that canonical JAK/STAT signaling had been activated, and that the JAK2 mutations present
in patients with an MPN were gain-of-function mutations. Mutant JAK2 expression also
resulted in the phosphorylation of AKT and ERK1/2. Interestingly, the levels of
phosphorylated JAK2 and ERK1/2 differed between the variants tested, with three of the four
JAK2 exon 12 mutants resulting in higher levels of phospho-JAK2 than that associated with
JAK2V617F expression; phosphorylated ERK1/2 levels were uniformly higher. Although it is
not clear what effect, if any, these signaling differences may have on the resulting phenotype,
it remains possible that they explain, at least in part, the hematologic differences observed
between JAK2V617F-positive and JAK2 exon 12-positive cases of PV.
In addition to the canonical JAK/STAT pathway, the presence of a non-canonical JAK
signaling pathway was revealed a decade ago by studies in Drosophila [41]. This pathway
involves the translocation of activated JAK protein into the nucleus, by a mechanism that has
not been identified, where it disrupts gene silencing by displacing heterochromatin protein-1
(Hp1), resulting in the expression of normally silent genes. In human cells, nuclear JAK2
phosphorylates histone H3 on tyrosine-41 (H3Y41); this prevents the heterochromatin
protein, HP1α, from binding to this site and leads to alterations in the chromatin structure
surrounding transcriptionally inactive genes [42]. In hematopoietic cells from MPN patients,
the presence of a JAK2 mutation will ensure a fraction of total JAK2 protein is located within
the nucleus, but it is unclear what effects, if any, this has on disease phenotype.
Importantly, it was these in vitro studies that provided the justification for the testing of
drugs designed to inhibit JAK2 activity, first in a pre-clinical setting and subsequently in
Phase I/II and III clinical trials involving patients with MF.
II. CLONAL EVOLUTION AND PV

A decade of study into the genetics of PV, ET and MF has revealed that the mutant clone
in these disorders acquires additional genetic alterations over time (Figure 2). This
accumulation is probably a consequence of both the increased mutation rates associated
with expression of mutant JAK2 [43] and the inhibition by mutant JAK2 of apoptotic pathways
that are activated as the result of DNA damage [44, 45]. Some of the consequent secondary
alterations significantly alter disease phenotype, whilst others facilitate clonal dominance and
may contribute to the eventual progression to MF or to development of an acute leukemia.
They might all prove to be suitable targets for pharmacologic intervention.
Figure 2. A model for mutation accumulation and phenotypic modulation in the MPNs. A simple model
for disease initiation and evolution in the MPNs can be proposed (modified from [45]). Patients that are
diagnosed with ET will have an MPL mutation (MPLS505N, MPLW515L or MPLW515K), a CALR
mutation (type I or type II variants), one or more mutations affecting unknown targets (collectively
referred to here as an X), or a single JAK2V617F (VF) mutation. Duplication of the JAK2V617F allele
or acquisition of extra mutations in the affected JAK2 allele (VF*) instead produces a polycythemic
phenotype, as does the acquisition of a mono-allelic JAK2 exon 12 (ex12) mutation. Acquisition over
time of “secondary” driver mutations (Y and/or Z), which likely include those affecting TET2, IDH1/2,
DNMT3A, ASXL1 and EZH2, would contribute to an eventual transformation to myelofibrosis.

JAK2V617F-Positive PV Arises out of JAK2V617F-Positive ET
A debate regarding the importance of the JAK2V617F mutation to MPN pathogenesis
was ignited by the identification of this mutation not only in patients with PV, but also in
those with ET or MF. How could one mutation give rise to three distinct, albeit related,
disease entities? Three different explanations were entertained [46]. Did this depend upon the
lineage commitment of the cell acquiring the mutation – not a true hematopoietic stem cell,
but rather a precursor committed to megakaryopoiesis in those patients presenting as having
ET, or to erythropoiesis in cases of PV? Or did genetic factors within the background of a
particular individual influence the final phenotypic manifestation? And did the accumulation
of other unknown acquired mutations significantly influence the final disease phenotype?
The first insights into the relationship of PV to ET were provided later in 2005, when
investigators performed a genotype-phenotype analysis using peripheral blood samples and
clinical data from patients who were enrolled in the Medical Research Council’s Primary
Thrombocythemia (MRC-PT1) trial [47]. This trial was performed to compare the impact of
drug therapy (low-dose aspirin, and hydroxyurea or anagrelide) on more than 800 ET patients who
were at risk for thrombosis [48]. The JAK2V617F mutation was detected in 414 of the 776
trial participants (53%). There were striking differences in the blood counts of JAK2V617F-positive
and JAK2V617F-negative patients, with the former grouping having phenotypic similarities to
those individuals diagnosed with PV. Mutation-positive individuals presented with higher
hemoglobin levels and neutrophil and white cell counts, but with significantly lower platelet
counts. At trial entry, this subgroup also had significantly lower serum EPO levels than the
mutation-negative subgroup; the reduced EPO levels were independent of hemoglobin level,
suggesting that feedback suppression may compensate for an increased erythroid drive in
patients with a JAK2 mutation. In addition, the six patients that had experienced polycythemic
transformation during the trial’s duration were all positive for the JAK2V617F mutation.
Taken together, these findings suggested the possibility of a disease continuum between
JAK2V617F-positive ET and PV, with several factors (such as gender, EPO homeostasis and
iron levels) contributing to whether the disease manifests in a given individual as ET or as
PV. The authors of this study further suggested that individual genetic variation and JAK2
mutation homozygosity might contribute to the final disease phenotype. Involvement of the
latter was suggested by observations that up to 30% of patients with PV were apparently
homozygous for the JAK2V617F mutation [3-6]: analysis of a multitude of single nucleotide
polymorphisms (SNPs) spanning chromosome 9p revealed the presence of regions of LOH, a
phenomenon that until then was associated with loss-of-function mutations in tumor
suppressor genes. LOH was the result of mitotic recombination, in which the mutated JAK2
allele is duplicated following an illegitimate crossover event during mitosis (Figure 3).
However, the true extent of this phenomenon was only recognized when JAK2 genotyping
was applied at the single cell level (as opposed to a sample of the entire granulocyte
population). Circulating hematopoietic progenitor cells from MPN patients were propagated
in vitro using a semi-solid support (methylcellulose), with the resulting colonies each arising
from a single seeded progenitor. All erythroid colonies cultured from 17 JAK2V617F-positive
ET patients (n = 684) were heterozygous for this mutation [40]. In contrast, in a similarly
sized cohort of PV patients in whom JAK2V617F homozygosity was not evident from
granulocyte DNA sequencing, each person had a subset of erythroid progenitors that were
mutation-homozygous. These were detected in all five individuals analyzed within three
months of diagnosis, suggesting that the loss of the wildtype JAK2 allele occurs at an early
disease stage. Mutation-homozygous colonies were also detected in the two cases diagnosed
with ET that had transformed to PV; unfortunately, mononuclear cells that were sampled
before polycythemic transformation were not available for testing. However, taken together,
the data suggest that the acquisition of an additional mutant JAK2 allele by mitotic
recombination is a significant determinant in the establishment of a polycythemic phenotype.
Figure 3. JAK2V617F-homozygosity arises from illegitimate recombination between chromosome pairs
during mitosis. The analysis of single nucleotide polymorphisms (SNPs) in patients with PV has
revealed the presence of regions of loss-of-heterozygosity (LOH) spanning chromosome 9p. This was
the result of a recombination event, in which the mutated JAK2 allele (indicated by a *) is duplicated
following an illegitimate crossover event between duplicated chromosomes 9. One of the resulting
daughter cells would appear JAK2-wildtype, having no mutated JAK2 allele but carrying a region of
LOH on chromosome 9p. The other daughter cell would instead be JAK2V617F-homozygous. The
predominance over time of mutation-homozygous cells in patients with PV suggests that two copies of
the mutant JAK2 allele might provide an in vivo competitive advantage to affected cells over those with
only one copy.
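The recombination event described in Figure 3 can be sketched as a toy enumeration. This is a deliberately simplified illustration, not a model taken from the cited work: it assumes a single crossover between non-sister chromatids, located between the centromere and the JAK2 locus, and it tracks only the JAK2 allele carried by each chromatid. The function name `segregation_outcomes` and the allele labels are hypothetical.

```python
def segregation_outcomes(mutant="V617F", wildtype="WT"):
    """Return the daughter-cell genotype pairs possible after one crossover."""
    # After DNA replication, each homolog consists of two sister chromatids.
    # Only the JAK2 allele distal to the crossover point is tracked here.
    a1, a2 = mutant, mutant        # sister chromatids of the mutant homolog
    b1, b2 = wildtype, wildtype    # sister chromatids of the wildtype homolog

    # A single crossover between non-sister chromatids a2 and b1, placed
    # between the centromere and JAK2, exchanges their distal 9p segments.
    a2, b1 = b1, a2

    def genotype(x, y):
        return tuple(sorted((x, y)))

    # Each daughter receives one chromatid from each centromere (A and B);
    # both possible segregation patterns are enumerated.
    return [
        (genotype(a1, b1), genotype(a2, b2)),
        (genotype(a1, b2), genotype(a2, b1)),
    ]

for daughters in segregation_outcomes():
    print(daughters)
```

Under these assumptions, one segregation pattern yields a mutation-homozygous daughter alongside an apparently wildtype daughter, as the figure describes, whereas the alternative pattern leaves both daughters heterozygous.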
PV Patients May Acquire Additional Mutations that Enhance JAK/STAT Activation
Genotyping of granulocyte DNA for the JAK2V617F mutation and a SNP present in
JAK2 intron 14 revealed that this mutation had been acquired on more than one occasion in
three of the 109 informative MPN cases studied, a frequency significantly higher than
expected given the incidence of JAK2V617F acquisition within the general population [49].
This observation suggested that JAK2 point mutations might be acquired on multiple
occasions in a subset of patients with PV. Indeed, we identified a patient who acquired a
second JAK2 mutation five years after presenting with JAK2V617F-positive PV (unpublished
data); this resulted in a compound JAK2V615LV617F mutant allele reminiscent of several
other reported compound JAK2 mutations of unknown functional consequence [50]. Cells
carrying the JAK2V615LV617F mutant allele outcompeted both JAK2-wildtype and
JAK2V617F-positive cells in the patient and in competitive reconstitution assays in
transplanted mice. Colony analysis revealed a pattern of mutation accumulation in the patient
that was consistent with at least three separate genetic events: acquisition of the JAK2V617F
mutation in a normal hematopoietic stem cell, the addition of a JAK2V615L mutation to the
JAK2V617F allele, and the generation of JAK2V615LV617F-homozygous cells through
mitotic recombination.
A second patient with PV was instead found to have a compound JAK2V617FC618R
mutation; hematopoietic colony genotyping showed that a few colonies contained only the
JAK2V617F mutation, consistent with secondary acquisition of a JAK2C618R substitution within the
mutant allele. Intriguingly, all other mutation-positive colonies were JAK2V617FC618R-
heterozygous. As microsatellite and SNP analysis revealed no evidence of chromosome 9p
LOH, it appears that addition of this second substitution, without duplication of the mutated
JAK2 allele, was sufficient to result in a polycythemic phenotype.
On occasion, inactivating SH2B3 mutations have been detected in patients with ET or MF
that lacked a JAK2V617F mutation [51] and in two patients with “erythrocytosis” [52],
although no association between SH2B3 mutations and PV has been reported. This gene
encodes SH2B3 (also referred to as LNK), which negatively regulates JAK2 activation. These
mutations map to the pleckstrin-homology (PH) and SH2 domains of SH2B3, cause amino
acid substitutions or protein truncations, and result in a mild loss of wildtype LNK function. It
is unclear whether they are disease-initiating mutations, or if they are similar to the
JAK2V615L and JAK2C618R mutations and further enhance the signaling driven by a
disease-initiating JAK2, MPL or CALR mutation. The second scenario is certainly a factor in
the pathogenesis of some lymphoma subtypes, where an individual patient might have, for
example, amplification of a non-mutated JAK2 locus, in addition to an inactivating SOCS1
mutation and an activating STAT mutation [53]. Indeed, SH2B3 has been shown to attenuate
signaling induced by the expression of MPLW515L or JAK2V617F in vitro [54, 55], and its loss
accelerates the development of an MPN phenotype in mice expressing mutant JAK2 [56].
Secondary Mutations that Target Epigenomic Regulators
In the last six years, inactivating mutations in genes that encode proteins with chromatin-
modifying activity have been identified in subsets of patients with an MPN. Several of these
proteins are involved in the regulation of gene expression via DNA hydroxymethylation or
methylation (such as TET2, IDH1, IDH2 and DNMT3A), whereas others are involved in
modifying (predominantly methylating) chromatin-associated proteins (ASXL1 and EZH2). In
cases of MF, ASXL1 and EZH2 mutations are associated with a poor prognosis [57, 58]. In
contrast to the disease-initiating mutations affecting JAK2, MPL or CALR, these mutations
can be detected in patients with other myeloid malignancies, including myelodysplasia
(MDS), MDS/MPN overlap syndromes and de novo AML. These mutations are also found in
a minority of patients with PV, with frequencies ranging from 2% to 20% of cases (Table 1).
In general, the mutation frequency in the MPNs increases with increasing disease severity;
that is, they are the least frequent in ET and most frequent in MF. The most frequent
epigenetic regulator mutation occurring in patients with PV targets TET2, one of a three-member
family of proteins that belong to the 2-oxoglutarate- and Fe(II)-dependent dioxygenase
superfamily.
Table 1. Incidence of key “secondary” mutations in the MPNs

Gene      Protein function         MPN   Mutation frequency (%)   References
TET2      DNA hydroxymethylation   ET    5                        [77]
                                   PV    16
                                   MF    16
IDH1/2    DNA hydroxymethylation   ET    ~1                       [78]
                                   PV    ~2
                                   MF    ~4
DNMT3A    DNA methylation          ET    0                        [79, 80]
                                   PV    5-7
                                   MF    10-15
EZH2      chromatin methylation    ET    0                        [81]
                                   PV    3
                                   MF    13
ASXL1     chromatin methylation    ET    4                        [82]
                                   PV    7
                                   MF    20
SRSF2     transcript splicing      ET    ND                       [64]
                                   PV    ND
                                   MF    17
U2AF1     transcript splicing      ET    ND                       [83]
                                   PV    ND
                                   MF    16
SF3B1     transcript splicing      ET    3                        [84]
                                   PV    0
                                   MF    4

ND, not determined.
The functional consequences of these secondary mutations are not entirely clear, although
recent studies have shed considerable light on the role that TET2 loss might play in MPN
biology. When TET2 mutations were first identified in 2009, the function of the TET family
members was unknown. Shortly thereafter, however, iterative sequence profile searches
demonstrated that these proteins contain highly conserved domains with homology to the
trypanosome JBP1 and JBP2 proteins [59]. These enzymes modify thymine to form “base J,”
found in genes silenced following viral infection, suggesting that the TETs might alter gene
expression by epigenetically modifying DNA. Consistent with this scenario, the in vitro
expression of TET1 halved levels of 5-methylcytosine (5mC), which is essential in higher
eukaryotes for genomic imprinting, X-inactivation, and gene regulation. Furthermore, the
presence of 5-hydroxymethylcytosine (5hmC), a component of normal mammalian DNA,
was detected only in cells transfected with full-length TET1. These findings together
suggested that TET proteins contribute to transcriptional activation by oxidizing 5mC,
thereby preventing methyl-CpG binding proteins and DNA methyltransferases from binding to
CpG islands.

Important clues to the role that TET2 haploinsufficiency may be playing in the MPNs and
other myeloid malignancies were provided by gene knockout studies in mice [60]. Loss of
Tet2 expression in hematopoietic cells did not result in any major phenotypic perturbation
when analyzed 4-6 weeks after gene deletion, although mice had splenomegaly arising from
extramedullary hematopoiesis. Subsequent analysis, however, revealed a progressive myeloid
expansion in these animals, resulting in a phenotype that was reminiscent of CMML in
humans. Tet2-deficient mice also had increased numbers of hematopoietic stem cells and
committed myeloid progenitors. In competitive reconstitution assays, the Tet2-null stem cells
out-competed their wildtype counterparts, suggesting that loss of Tet2 activity might confer
an in vivo proliferative advantage to affected cells. This hypothesis was supported by
investigations into the causes of clonal hematopoiesis seen in healthy elderly women [61], in
which 6% of those with age-associated skewing had an acquired TET2 mutation. Strikingly,
there were no significant differences in the hematologic parameters of those patients with a
TET2 mutation and of those patients without.
Recent studies in genetically engineered mice have shown that loss of Tet2 expression in
hematopoietic cells expressing Jak2V617F augments the MPN phenotype present in these
animals [62]. Specifically, there was a marked increase in spleen size in Jak2V617F-positive
Tet2-null mice (in comparison to Jak2V617F-positive mice), and in the white cell counts in
the peripheral blood. Although hematocrit and platelet numbers were not impacted, signs of
dysplasia (including abnormal shape and pseudopod formation) were noted in circulating
platelets. Furthermore, megakaryopoiesis within the bone marrow was affected: there were
increased numbers of megakaryocytes, and these displayed greater heterogeneity in size and
atypical morphological features. Finally, assessments of hematopoietic stem cell function
revealed that the combination of a Jak2V617F and a Tet2-inactivating mutation increased the
in vitro serial replating capacity of cells compared to that of Tet2-null cells, and enhanced the
competitive repopulating activity in vivo. Since myelofibrotic and leukemic transformation
was not observed in these animals, it appears that loss of Tet2 activity impacts predominantly
on clonal dominance, and has little or no effect on overt disease progression.
Secondary Mutations Affecting the Spliceosome Occur Infrequently in PV
A proportion of MPN patients may carry mutations that target constituents of the
spliceosome, a complex that consists of small nuclear ribonucleoproteins (snRNPs) and 100-300
associated proteins and that is involved in the splicing of primary transcripts [63]. The most
commonly mutated proteins within this group include the splicing factors, U2AF1, SF3B1
and SRSF2. As with the mutations that target chromatin-modifiers, the spliceosome mutations
are not specific to the MPNs; in fact, they occur most frequently in patients with MDS or
CMML. In contrast, they are only rarely detected in ET and PV, and are present at low-to-
moderate frequencies in MF (Table 1).
The mutually exclusive occurrence of the spliceosome mutations suggested that they may
similarly affect the splicing of transcripts that encode key regulators of hematopoiesis.
However, it is also clear that different mutations can have markedly different consequences –
for example, mutations in SRSF2, but not in SF3B1 or U2AF1, are associated with reduced
survival in patients with MF [58, 64]. Currently, the specific effects of these mutations are not
well understood, although recent studies have shown that the P95H substitution in SRSF2
impairs hematopoietic differentiation by altering its sequence-specific binding properties [65-
67]. This alters the patterns of exon inclusion or exclusion in a subset of transcripts, including
those encoding EZH2, a chromatin-modifying enzyme that is also mutated in some patients
with an MPN (Table 1).
Drivers of Leukemic Transformation in Patients with an MPN
Patients with an MPN are at significant risk of experiencing disease transformation to an
AML that is associated with an adverse clinical outcome and a poor response to standard
therapies. Leukemic
transformation occurs in approximately 1% of ET patients over a 10-year period, and in 4%
and 20% of PV and MF patients, respectively [68]. Surprisingly, only about 50% of patients
with a JAK2- or MPL-mutated MPN progress to a JAK2- or MPL-mutated AML; leukemic
blasts in the remaining cases are negative for either mutation [69, 70]. Different pathogenetic
mechanisms appear to underlie transformation in these two patient subgroups [71]. Patients
with a JAK2-mutated AML invariably had a myelofibrotic disease phase, whereas this was
uncommon in patients with a JAK2-wildtype AML. In the latter subgroup, molecular studies have
shown that loss of the mutant JAK2 allele did not result from mitotic recombination or gene
conversion or deletion, raising the intriguing possibility that transformation in some instances
may be non-cell-autonomous.
Mutations that are commonly observed in de novo AML, such as those in NPM1, FLT3,
RAS, CEBPA, and RUNX1, are mostly absent from AMLs that arise secondary to an MPN
(sAML). In direct contrast, SRSF2 mutations are present in almost 20% of sAML cases, but
only in 6% of de novo cases, are associated with a significantly worse overall survival [72],
and most commonly occur in cases of JAK2-wildtype sAML [73]. Mutations in some
epigenetic regulators (including ASXL1, IDH1 and TET2) occur even more frequently in
sAML than they do in ET, PV or MF. An analysis of paired DNA samples taken from MPN
patients pre- and post-transformation has revealed that TET2 mutations may be acquired at
the time of leukemic transformation [74]. Skoda and colleagues have shown that the presence
of two or more somatic mutations in an MPN patient significantly increased the individual’s
risk of leukemic transformation [75]; this occurred in 25% of patients with ET, and in 36% of
patients with PV or MF.
In particular, presence of a TP53 mutation was strongly associated with transformation to
AML. TP53 mutations occur in a quarter of all patients with AML secondary to an MPN,
whereas they are detectable in only 3% of patients with an MPN [76], and frequently co-exist
with a JAK2V617F mutation, but not a CALR mutation [73]. Hematopoietic colony analysis
using serial blood samples from four patients with sAML and a TP53 mutation revealed that
these mutations were typically present in a heterozygous state for an extended period during
the chronic (MPN) disease phase [75]. However, loss of the wildtype TP53 allele via intra-
chromosomal deletion or mitotic recombination led to the rapid expansion of a homozygous-
or hemizygous-mutant sub-clone, eventuating in leukemic transformation. Subsequent studies
in genetically engineered mice have shown that the absence of p53 expression in
hematopoietic cells expressing Jak2V617F causes an overt leukemia to develop [73]. Mice
transplanted with Jak2V617F-positive p53-null cells were characterized by an increased
hematocrit and increased white cell and platelet counts compared to controls, with blasts
detected within the peripheral blood and marrow. These animals also had marked
splenomegaly and hepatomegaly as a result of infiltration by leukemic blasts. All recipients of
Jak2V617F-positive p53-null donor cells died within 100 days of transplantation. As a
consequence of these studies, as well as those in humans, the presence of a TP53 mutation in
MPN patients should be considered a risk factor for subsequent leukemic transformation.
CONCLUSION
Our understanding of the molecular pathogenesis of PV has increased dramatically in the
last decade, with the disease-initiating mutations in greater than 99% of cases now having
been identified. Studies into the mutation accumulation patterns and clonal evolution of the
MPNs have revealed the direct relationship between JAK2V617F-positive ET and PV, and
have led to the discovery of additional mutations that affect epigenetic regulators and
constituents of the splicing machinery. These secondary mutations appear to provide cells
with a selective advantage in vivo, and current research efforts are focused on more precisely
determining their functional consequences.
REFERENCES
[1] Kralovics R, Guan Y, Prchal JT. Acquired uniparental disomy of chromosome 9p is a
frequent stem cell defect in polycythemia vera. Experimental Hematology. 2002;30(3):
229-36.
[2] Ugo V, Marzac C, Teyssandier I, Larbret F, Lecluse Y, Debili N, et al. Multiple
signaling pathways are involved in erythropoietin-independent differentiation of
erythroid progenitors in polycythemia vera. Experimental Hematology. 2004;32(2):179-
87.
[3] James C, Ugo V, Le Couedic JP, Staerk J, Delhommeau F, Lacout C, et al. A unique
clonal JAK2 mutation leading to constitutive signalling causes polycythaemia vera.
Nature. 2005;434(7037):1144-8.
[4] Kralovics R, Passamonti F, Buser AS, Teo SS, Tiedt R, Passweg JR, et al. A gain-of-
function mutation of JAK2 in myeloproliferative disorders. The New England Journal
of Medicine. 2005;352(17):1779-90.
[5] Baxter EJ, Scott LM, Campbell PJ, East C, Fourouclas N, Swanton S, et al. Acquired
mutation of the tyrosine kinase JAK2 in human myeloproliferative disorders. Lancet.
2005;365(9464):1054-61.
[6] Levine RL, Wadleigh M, Cools J, Ebert BL, Wernig G, Huntly BJ, et al. Activating
mutation in the tyrosine kinase JAK2 in polycythemia vera, essential thrombocythemia,
and myeloid metaplasia with myelofibrosis. Cancer Cell. 2005;7(4):387-97.
[7] Longley BJ, Tyrrell L, Lu SZ, Ma YS, Langley K, Ding TG, et al. Somatic c-KIT
activating mutation in urticaria pigmentosa and aggressive mastocytosis: establishment
of clonality in a human mast cell neoplasm. Nature Genetics. 1996;12(3):312-4.

116 Linda M. Scott
[8] Gotlib J, Cools J, Malone JM, 3rd, Schrier SL, Gilliland DG, Coutre SE. The FIP1L1-
PDGFRalpha fusion tyrosine kinase in hypereosinophilic syndrome and chronic
eosinophilic leukemia: implications for diagnosis, classification, and management.
Blood. 2004;103(8):2879-91.
[9] Campbell PJ, Scott LM, Baxter EJ, Bench AJ, Green AR, Erber WN. Methods for the
detection of the JAK2 V617F mutation in human myeloproliferative disorders. Methods
in Molecular Medicine. 2006;125:253-64.
[10] Scott LM, Campbell PJ, Baxter EJ, Todd T, Stephens P, Edkins S, et al. The V617F
JAK2 mutation is uncommon in cancers and in myeloid malignancies other than the
classic myeloproliferative disorders. Blood. 2005;106(8):2920-1.
[11] Lipson D, Capelletti M, Yelensky R, Otto G, Parker A, Jarosz M, et al. Identification of
new ALK and RET gene fusions from colorectal and lung cancer biopsies. Nature
Medicine. 2012;18(3):382-4.
[12] Steensma DP, Dewald GW, Lasho TL, Powell HL, McClure RF, Levine RL, et al. The
JAK2 V617F activating tyrosine kinase mutation is an infrequent event in both
"atypical" myeloproliferative disorders and myelodysplastic syndromes. Blood. 2005;
106(4):1207-9.
[13] Levine RL, Loriaux M, Huntly BJ, Loh ML, Beran M, Stoffregen E, et al. The
JAK2V617F activating mutation occurs in chronic myelomonocytic leukemia and acute
myeloid leukemia, but not in acute lymphoblastic leukemia or chronic lymphocytic
leukemia. Blood. 2005;106(10):3377-9.
[14] Jelinek J, Oki Y, Gharibyan V, Bueso-Ramos C, Prchal JT, Verstovsek S, et al. JAK2
mutation 1849G>T is rare in acute leukemias but can be found in CMML, Philadelphia
chromosome-negative CML, and megakaryocytic leukemia. Blood. 2005;106(10):3370-
3.
[15] Frohling S, Lipka DB, Kayser S, Scholl C, Schlenk RF, Dohner H, et al. Rare
occurrence of the JAK2 V617F mutation in AML subtypes M5, M6, and M7. Blood.
2006;107(3):1242-3.
[16] Busche G, Hussein K, Bock O, Kreipe H. Insights into JAK2-V617F mutation in CML.
The Lancet Oncology. 2007;8(10):863-4.
[17] Bocchia M, Vannucchi AM, Gozzetti A, Guglielmelli P, Poli G, Crupi R, et al. Insights
into JAK2-V617F mutation in CML. The Lancet Oncology. 2007;8(10):864-6.
[18] Verstovsek S, Silver RT, Cross NC, Tefferi A. JAK2V617F mutational frequency in
polycythemia vera: 100%, >90%, less? Leukemia. 2006;20(11):2067.
[19] Scott LM, Tong W, Levine RL, Scott MA, Beer PA, Stratton MR, et al. JAK2 exon 12
mutations in polycythemia vera and idiopathic erythrocytosis. The New England
Journal of Medicine. 2007;356(5):459-68.
[20] Scott LM. The JAK2 exon 12 mutations: a comprehensive review. American Journal of
Hematology. 2011;86(8):668-76.
[21] Passamonti F, Elena C, Schnittger S, Skoda RC, Green AR, Girodon F, et al. Molecular
and clinical features of the myeloproliferative neoplasm associated with JAK2 exon 12
mutations. Blood. 2011;117(10):2813-6.
[22] Percy MJ, Scott LM, Erber WN, Harrison CN, Reilly JT, Jones FG, et al. The
frequency of JAK2 exon 12 mutations in idiopathic erythrocytosis patients with low
serum erythropoietin levels. Haematologica. 2007;92(12):1607-14.

[23] Lakey MA, Pardanani A, Hoyer JD, Nguyen PL, Lasho TL, Tefferi A, et al. Bone
marrow morphologic features in polycythemia vera with JAK2 exon 12 mutations.
American Journal of Clinical Pathology. 2010;133(6):942-8.
[24] Vardiman JW, Harris NL, Brunning RD. The World Health Organization (WHO)
classification of the myeloid neoplasms. Blood. 2002;100(7):2292-302.
[25] Pearson TC. Evaluation of diagnostic criteria in polycythemia vera. Seminars in
Hematology. 2001;38(1 Suppl 2):21-4.
[26] Scott LM, Beer PA, Bench AJ, Erber WN, Green AR. Prevalance of JAK2 V617F and
exon 12 mutations in polycythaemia vera. British Journal of Haematology.
2007;139(3):511-2.
[27] Butcher CM, Hahn U, To LB, Gecz J, Wilkins EJ, Scott HS, et al. Two novel JAK2
exon 12 mutations in JAK2V617F-negative polycythaemia vera patients. Leukemia.
2008;22(4):870-3.
[28] Wang YL, Vandris K, Jones A, Cross NC, Christos P, Adriano F, et al. JAK2 Mutations
are present in all cases of polycythemia vera. Leukemia. 2008;22(6):1289.
[29] Yeh YM, Chen YL, Cheng HY, Su WC, Chow NH, Chen TY, et al. High percentage of
JAK2 exon 12 mutation in Asian patients with polycythemia vera. American Journal of
Clinical Pathology. 2010;134(2):266-70.
[30] Pardanani A, Lasho TL, Finke C, Hanson CA, Tefferi A. Prevalence and
clinicopathologic correlates of JAK2 exon 12 mutations in JAK2V617F-negative
polycythemia vera. Leukemia. 2007;21(9):1960-3.
[31] Passamonti F, Rumi E, Pietra D, Elena C, Boveri E, Arcaini L, et al. A prospective
study of 338 patients with polycythemia vera: the impact of JAK2 (V617F) allele
burden and leukocytosis on fibrotic or leukemic disease transformation and vascular
complications. Leukemia. 2010;24(9):1574-9.
[32] Klampfl T, Gisslinger H, Harutyunyan AS, Nivarthi H, Rumi E, Milosevic JD, et al.
Somatic mutations of calreticulin in myeloproliferative neoplasms. The New England
Journal of Medicine. 2013;369(25):2379-90.
[33] Nangalia J, Massie CE, Baxter EJ, Nice FL, Gundem G, Wedge DC, et al. Somatic
CALR mutations in myeloproliferative neoplasms with nonmutated JAK2. The New
England Journal of Medicine. 2013;369(25):2391-405.
[34] Saharinen P, Takaluoma K, Silvennoinen O. Regulation of the Jak2 tyrosine kinase by
its pseudokinase domain. Molecular and Cellular Biology. 2000;20(10):3387-95.
[35] Saharinen P, Silvennoinen O. The pseudokinase domain is required for suppression of
basal activity of Jak2 and Jak3 tyrosine kinases and for cytokine-inducible activation of
signal transduction. The Journal of Biological Chemistry. 2002;277(49):47954-63.
[36] Ungureanu D, Wu J, Pekkala T, Niranjan Y, Young C, Jensen ON, et al. The pseudo-
kinase domain of JAK2 is a dual-specificity protein kinase that negatively regulates
cytokine signaling. Nature Structural and Molecular Biology. 2011;18(9):971-6.
[37] Bandaranayake RM, Ungureanu D, Shan Y, Shaw DE, Silvennoinen O, Hubbard SR.
Crystal structures of the JAK2 pseudokinase domain and the pathogenic mutant V617F.
Nature Structural and Molecular Biology. 2012;19(8):754-9.
[38] Brooks AJ, Dai W, O'Mara ML, Abankwa D, Chhabra Y, Pelekanos RA, et al.
Mechanism of activation of protein kinase JAK2 by the growth hormone receptor.
Science. 2014;344(6185):1249783.

[39] Lindauer K, Loerting T, Liedl KR, Kroemer RT. Prediction of the structure of human
Janus kinase 2 (JAK2) comprising the two carboxy-terminal domains reveals a
mechanism for autoregulation. Protein Engineering. 2001;14(1):27-37.
[40] Scott LM, Scott MA, Campbell PJ, Green AR. Progenitors homozygous for the V617F
mutation occur in most patients with polycythemia vera, but not essential
thrombocythemia. Blood. 2006;108(7):2435-7.
[41] Shi S, Calhoun HC, Xia F, Li J, Le L, Li WX. JAK signaling globally counteracts
heterochromatic gene silencing. Nature Genetics. 2006;38(9):1071-6.
[42] Dawson MA, Bannister AJ, Gottgens B, Foster SD, Bartke T, Green AR, et al. JAK2
phosphorylates histone H3Y41 and excludes HP1alpha from chromatin. Nature. 2009;
461(7265):819-22.
[43] Plo I, Nakadake M, Wiesmuller L, Giraudier S, Villeval J, Vainchenker W. JAK2
activation stimulates homologous recombination and genomic instability. Blood. 2007;
112:1402-12.
[44] Zhao R, Follows GA, Beer PA, Scott LM, Huntly BJ, Green AR, et al. Inhibition of the
Bcl-xL deamidation pathway in myeloproliferative disorders. The New England
Journal of Medicine. 2008;359(26):2778-89.
[45] Scott LM, Rebel VI. JAK2 and genomic instability in the myeloproliferative
neoplasms: a case of the chicken or the egg? American Journal of Hematology.
2012;87(11):1028-36.
[46] Vainchenker W, Constantinescu SN. A unique activating mutation in JAK2 (V617F) is
at the origin of polycythemia vera and allows a new classification of myeloproliferative
diseases. Hematology / the Education Program of the American Society of Hematology
2005:195-200.
[47] Campbell PJ, Scott LM, Buck G, Wheatley K, East CL, Marsden JT, et al. Definition of
subtypes of essential thrombocythaemia and relation to polycythaemia vera based on
JAK2 V617F mutation status: a prospective study. Lancet. 2005;366(9501):1945-53.
[48] Harrison CN, Campbell PJ, Buck G, Wheatley K, East CL, Bareford D, et al.
Hydroxyurea compared with anagrelide in high-risk essential thrombocythemia. The
New England Journal of Medicine. 2005;353(1):33-45.
[49] Olcaydu D, Harutyunyan A, Jager R, Berg T, Gisslinger B, Pabinger I, et al. A common
JAK2 haplotype confers susceptibility to myeloproliferative neoplasms. Nature
Genetics. 2009;41(4):450-4.
[50] Cleyrat C, Jelinek J, Girodon F, Boissinot M, Ponge T, Harousseau JL, et al. JAK2
mutation and disease phenotype: a double L611V/V617F in cis mutation of JAK2 is
associated with isolated erythrocytosis and increased activation of AKT and ERK1/2
rather than STAT5. Leukemia. 2010;24(5):1069-73.
[51] Oh ST, Simonds EF, Jones C, Hale MB, Goltsev Y, Gibbs KD, Jr., et al. Novel
mutations in the inhibitory adaptor protein LNK drive JAK-STAT signaling in patients
with myeloproliferative neoplasms. Blood. 2010;116(6):988-92.
[52] Lasho TL, Pardanani A, Tefferi A. LNK mutations in JAK2 mutation-negative
erythrocytosis. The New England Journal of Medicine. 2010;363(12):1189-90.
[53] Scott LM, Gandhi MK. Deregulated JAK/STAT signalling in lymphomagenesis, and its
implications for the development of new targeted therapies. Blood Reviews. 2015,
doi:10.1016/j.blre.2015.06.002.

[54] Koren-Michowitz M, Gery S, Tabayashi T, Lin D, Alvarez R, Nagler A, et al. SH2B3
(LNK) mutations from myeloproliferative neoplasms patients have mild loss of
function against wild type JAK2 and JAK2 V617F. British Journal of Haematology.
2013;161(6):811-20.
[55] Gery S, Gueller S, Chumakova K, Kawamata N, Liu L, Koeffler HP. Adaptor protein
Lnk negatively regulates the mutant MPL, MPLW515L associated with myeloproliferative
disorders. Blood. 2007;110(9):3360-4.
[56] Bersenev A, Wu C, Balcerek J, Jing J, Kundu M, Blobel GA, et al. Lnk constrains
myeloproliferative diseases in mice. The Journal of Clinical Investigation. 2010;120(6):
2058-69.
[57] Guglielmelli P, Biamonte F, Score J, Hidalgo-Curtis C, Cervantes F, Maffioli M, et al.
EZH2 mutational status predicts poor survival in myelofibrosis. Blood. 2011;
118(19):5227-34.
[58] Vannucchi AM, Lasho TL, Guglielmelli P, Biamonte F, Pardanani A, Pereira A, et al.
Mutations and prognosis in primary myelofibrosis. Leukemia. 2013;27(9):1861-9.
[59] Tahiliani M, Koh KP, Shen Y, Pastor WA, Bandukwala H, Brudno Y, et al. Conversion
of 5-methylcytosine to 5-hydroxymethylcytosine in mammalian DNA by MLL partner
TET1. Science. 2009;324(5929):930-5.
[60] Moran-Crusio K, Reavie L, Shih A, Abdel-Wahab O, Ndiaye-Lobry D, Lobry C, et al.
Tet2 loss leads to increased hematopoietic stem cell self-renewal and myeloid
transformation. Cancer Cell. 2011;20(1):11-24.
[61] Busque L, Patel JP, Figueroa ME, Vasanthakumar A, Provost S, Hamilou Z, et al.
Recurrent somatic TET2 mutations in normal elderly individuals with clonal
hematopoiesis. Nature Genetics. 2012;44(11):1179-81.
[62] Chen E, Schneider RK, Breyfogle LJ, Rosen EA, Poveromo L, Elf S, et al. Distinct
effects of concomitant Jak2V617F expression and Tet2 loss in mice promote disease
progression in myeloproliferative neoplasms. Blood. 2015;125(2):327-35.
[63] Scott LM, Rebel VI. Acquired mutations that affect pre-mRNA splicing in hematologic
malignancies and solid tumors. Journal of the National Cancer Institute. 2013;
105(20):1540-9.
[64] Lasho TL, Finke CM, Hanson CA, Jimma T, Knudson RA, Ketterling RP, et al. SF3B1
mutations in primary myelofibrosis: clinical, histopathology and genetic correlates
among 155 patients. Leukemia. 2012;26(5):1135-7.
[65] Kim E, Ilagan JO, Liang Y, Daubner GM, Lee SC, Ramakrishnan A, et al. SRSF2
mutations contribute to myelodysplasia by mutant-specific effects on exon recognition.
Cancer Cell. 2015;27(5):617-30.
[66] Komeno Y, Huang YJ, Qiu J, Lin L, Xu Y, Zhou Y, et al. SRSF2 is essential for
hematopoiesis, and its myelodysplastic syndrome-related mutations dysregulate
alternative pre-mRNA splicing. Molecular and Cellular Biology. 2015;35(17):3071-82.
[67] Zhang J, Lieu YK, Ali AM, Penson A, Reggio KS, Rabadan R, et al. Disease-associated
mutation in SRSF2 misregulates splicing by altering RNA-binding affinities.
Proceedings of the National Academy of Sciences of the United States of America.
2015;112(34):E4726-34.
[68] Cervantes F, Tassies D, Salgado C, Rovira M, Pereira A, Rozman C. Acute
transformation in nonleukemic chronic myeloproliferative disorders: actuarial
probability and main characteristics in a series of 218 patients. Acta Haematologica.
1991;85(3):124-7.
[69] Campbell PJ, Baxter EJ, Beer PA, Scott LM, Bench AJ, Huntly BJ, et al. Mutation of
JAK2 in the myeloproliferative disorders: timing, clonality studies, cytogenetic
associations, and role in leukemic transformation. Blood. 2006;108(10):3548-55.
[70] Theocharides A, Boissinot M, Girodon F, Garand R, Teo SS, Lippert E, et al. Leukemic
blasts in transformed JAK2-V617F-positive myeloproliferative disorders are frequently
negative for the JAK2-V617F mutation. Blood. 2007;110(1):375-9.
[71] Beer PA, Delhommeau F, LeCouedic JP, Dawson MA, Chen E, Bareford D, et al. Two
routes to leukemic transformation after a JAK2 mutation-positive myeloproliferative
neoplasm. Blood. 2010;115(14):2891-900.
[72] Zhang SJ, Rampal R, Manshouri T, Patel J, Mensah N, Kayserian A, et al. Genetic
analysis of patients with leukemic transformation of myeloproliferative neoplasms
shows recurrent SRSF2 mutations that are associated with adverse outcome. Blood.
2012;119(19):4480-5.
[73] Rampal R, Ahn J, Abdel-Wahab O, Nahas M, Wang K, Lipson D, et al. Genomic and
functional analysis of leukemic transformation of myeloproliferative neoplasms.
Proceedings of the National Academy of Sciences of the United States of America.
2014;111(50):E5401-10.
[74] Abdel-Wahab O, Manshouri T, Patel J, Harris K, Yao J, Hedvat C, et al. Genetic
analysis of transforming events that convert chronic myeloproliferative neoplasms to
leukemias. Cancer Research. 2010;70(2):447-52.
[75] Lundberg P, Karow A, Nienhold R, Looser R, Hao-Shen H, Nissen I, et al. Clonal
evolution and clinical correlates of somatic mutations in myeloproliferative neoplasms.
Blood. 2014;123(14):2220-8.
[76] Harutyunyan A, Klampfl T, Cazzola M, Kralovics R. p53 lesions in leukemic
transformation. The New England Journal of Medicine. 2011;364(5):488-90.
[77] Tefferi A, Pardanani A, Lim KH, Abdel-Wahab O, Lasho TL, Patel J, et al. TET2
mutations and their clinical correlates in polycythemia vera, essential thrombocythemia
and myelofibrosis. Leukemia. 2009;23(5):905-11.
[78] Tefferi A, Lasho TL, Abdel-Wahab O, Guglielmelli P, Patel J, Caramazza D, et al.
IDH1 and IDH2 mutation studies in 1473 patients with chronic-, fibrotic- or blast-phase
essential thrombocythemia, polycythemia vera or myelofibrosis. Leukemia. 2010;24(7):
1302-9.
[79] Abdel-Wahab O, Pardanani A, Rampal R, Lasho TL, Levine RL, Tefferi A. DNMT3A
mutational analysis in primary myelofibrosis, chronic myelomonocytic leukemia and
advanced phases of myeloproliferative neoplasms. Leukemia. 2011;25(7):1219-20.
[80] Stegelmann F, Bullinger L, Schlenk RF, Paschka P, Griesshammer M, Blersch C, et al.
DNMT3A mutations in myeloproliferative neoplasms. Leukemia. 2011;25(7):1217-9.
[81] Ernst T, Chase AJ, Score J, Hidalgo-Curtis CE, Bryant C, Jones AV, et al. Inactivating
mutations of the histone methyltransferase gene EZH2 in myeloid disorders. Nature
Genetics. 2010;42(8):722-6.
[82] Brecqueville M, Rey J, Bertucci F, Coppin E, Finetti P, Carbuccia N, et al. Mutation
analysis of ASXL1, CBL, DNMT3A, IDH1, IDH2, JAK2, MPL, NF1, SF3B1, SUZ12,
and TET2 in myeloproliferative neoplasms. Genes, Chromosomes & Cancer. 2012;
51(8):743-55.

[83] Tefferi A, Finke CM, Lasho TL, Wassie EA, Knudson R, Ketterling RP, et al. U2AF1
mutations in primary myelofibrosis are strongly associated with anemia and
thrombocytopenia despite clustering with JAK2V617F and normal karyotype.
Leukemia. 2014;28(2):431-3.
[84] Papaemmanuil E, Cazzola M, Boultwood J, Malcovati L, Vyas P, Bowen D, et al.
Somatic SF3B1 mutation in myelodysplasia with ring sideroblasts. The New England
Journal of Medicine. 2011;365(15):1384-95.

Chapter 7
MICRORNAS IN DISEASE: RECENT ADVANCES AND MOLECULAR BACKGROUND
Braoudaki Maria1,2, Tounta Georgia2, Lykoudi Alexandra2, Kitsiou Tzeli Sofia2,
Kanavakis Emmanuel1, Papantoniou Nikolas3 and Kolialexi Aggeliki2,3,*
1 University Research Institute for the Study and Treatment of Childhood
Genetic and Malignant Diseases, National and Kapodistrian University of
Athens, “Aghia Sophia” Children’s Hospital, Athens, Greece
2 Department of Medical Genetics and
3 Department of Obstetrics & Gynecology, Athens University, Greece
ABSTRACT
MicroRNAs (miRNAs) are a class of highly conserved, short non-coding RNAs that
play key roles in the post-transcriptional repression of their mRNA targets. The
expression of these post-transcriptional regulators is critical for basic cellular mechanisms
including cell development, differentiation, proliferation, migration and apoptosis.
Extracellular circulating miRNAs can be packaged in exosomes, microvesicles, or
lipoprotein complexes involved in distant post-transcriptional regulation. The detection of
these molecules can be facilitated by the use of high throughput technologies such as
NGS and microarrays. Recent studies in the field provide clear evidence that changes in
the expression profile of specific miRNAs might be critical for the pathogenesis of many
disorders. MiRNAs have been linked with human diseases such as cancer, cardiac
disorders and pregnancy related complications. Profiling patterns of miRNA expression
could facilitate the characterization of a range of diseases as early as the presymptomatic
stage. Thus, these molecules can be introduced as novel tools in stage-specific genetic
diagnosis of pathological conditions.
* Correspondence: Aggeliki Kolialexi MD, PhD, Department of Medical Genetics, Athens University. Tel: 030
2107467462, email: akolial@med.uoa.gr.

PRESENTING MIRNAS
MicroRNAs are endogenous short non-coding RNAs of 19-24 nucleotides in length that
regulate gene expression post-transcriptionally. Single-stranded miRNAs are predicted to
regulate 30% of all genes. They recognize and bind specific target mRNAs through complete
or partial sequence complementarity to the 3′ untranslated region (3′ UTR) of the target
genes (Sampson et al., 2015). As a result, they impact vital cellular and physiological
processes including differentiation, proliferation, growth, stress response, apoptosis and
survival (Cowland et al., 2007; Singh et al., 2008; Mirnezami et al., 2009). It has become
increasingly evident that deregulation of miRNAs is implicated in a wide range of serious
human diseases such as cardiovascular disease, neurological disorders, immune-mediated
disorders, viral infections, diabetes and several types of cancer (Singh et al., 2008; Dorn,
2011; Duroux-Richard et al., 2011; Melo and Esteller, 2011; Natarajan et al., 2012). In
addition, during pregnancy, miRNAs are implicated in a wide range of processes, such as
preparation of endometrium for implantation, regulation of genes associated with immune
response, placental development and angiogenesis (Mouillet et al., 2015). Examples
are provided below.
MICRORNAS IN BREAST CANCER

MicroRNAs have been implicated in regulating hallmarks of breast cancer including cell
proliferation, cell death, apoptosis, immune response, cell cycle energetics, metabolism,
replicative immortality, senescence, invasion, metastasis and angiogenesis (Muluhngwi and
Klinge, 2015). Several studies in breast cancer have identified alterations in miRNA
expression. Iorio et al. (2005) suggested that miR-10b, miR-125b, miR-145, miR-21 and
miR-155 are the most consistently deregulated miRNAs in breast cancer. Among them, miR-10b,
miR-125b and miR-145 were down-regulated, whilst miR-21 and miR-155 were up-regulated,
suggesting that they may act as tumor suppressor genes or oncogenes,
respectively. Likewise, Volinia et al. (2006) identified a number of miRNAs that were
overexpressed in breast tumors such as miR-21, miR-17-5p, miR-29b-2, miR-146, miR-181b
and miR-155. In addition, Lehmann et al. (2008) reported deregulation of miR-9-1, miR-
124a3, miR-148, miR-152 and miR-663 in 34–86% of cases in a series of 71 primary human
breast cancer specimens, which correlated strongly with methylation of known tumour
suppressor genes. There is little agreement among the individual miRNAs identified in the
aforementioned studies; however, this might reflect pathologically different
subsets of patients. Interestingly, miR-21 and miR-155 were significantly
overexpressed in all three studies considered herein, suggesting the existence of a
potential breast cancer–specific miRNA signature, probably associated with particular
clinicopathologic factors or tumour properties. More recently, Tang et al. (2015) also
identified miR-21 and suggested that it was associated with unfavorable survival rates in
breast cancer patients. Regarding miR-155, Bertoli et al. (2015) proposed that because it is
circulating in the body fluids, it could potentially afford a novel, easily accessible, non-
invasive tool for the personalized management of breast cancer patients.

Chemotherapy is a critical therapeutic strategy for breast cancer, but chemoresistance
remains a major obstacle to treatment success. The use of miRNAs as therapeutic targets to
overcome chemoresistance is currently under investigation (Wang et al., 2015). A previous
study of circulating miRNAs and their association with breast cancer
and clinical outcome concluded that miR-125b expression was associated with
chemotherapeutic resistance in human breast cancer (Cortez et al., 2012). In the same context,
Bao et al. (2012), studying the chemoresistance mechanisms regulated by
miRNAs, demonstrated that miR-298 was associated with the chemoresistance mechanisms
underlying metastatic breast cancer. More recent investigations have documented that
selected miRNAs, such as miR-200c and miR-34a, may influence response to chemotherapy
in several tumor types, including breast cancer (Wang et al., 2015). Consequently, miRNAs
might contribute to the development of novel targeted therapeutic strategies for overcoming
chemotherapeutic resistance.
MICRORNAS IN COLORECTAL CANCER

Colorectal cancer (CRC) is the third most common cancer and the second leading cause
of cancer death in the Western world. It is associated with a poor prognosis and an increased
likelihood of tumor invasion and migration (Boni et al., 2010; Zhang et al., 2012). Several
groups have identified a number of miRNAs of major importance to CRC. According to
Bandrés et al. (2006) the most significantly overexpressed miRNAs in CRC included miR-31,
miR-96, miR-135b and miR-183, whereas miR-133b and miR-145 were significantly
downregulated. Of note, based on their findings, miR-31 expression levels were significantly
increased in higher tumor pathological stage. In a more recent study by Motoyama et al.
(2009), miRNA expression profiling demonstrated that miR-31, miR-183, miR-17-5p, miR-
92, miR-20a and miR-18a were significantly upregulated in CRC. Overexpression of miR-
18a was particularly associated with inferior prognosis. In the same study, miR-145 and miR-
143 were significantly suppressed in cancer tissues when compared to normal references. It
seems likely that miR-143 is frequently downregulated in CRC and is involved in the
regulation of the metastasis-associated in colon cancer-1 (MACC1) oncogene, suggesting a
potential functional role in CRC (Zheng et al., 2012). Additional studies have also identified
underexpression of miR-143 in CRC, suggesting that it may be useful for CRC clinical diagnostics and
therapeutics (Borralho et al., 2011; Li et al., 2012).
MICRORNAS IN OTHER TYPES OF CANCER

Cancer is a leading cause of death worldwide. A better understanding of the disease is
needed to improve clinical management. Growing evidence suggests that miRNAs
might play either oncogenic or tumor suppressor roles in the pathogenesis of diverse cancer
types. Apart from their emerging role in breast and colorectal cancers, as already mentioned, they
have also been implicated in several other cancer types including osteosarcoma, prostate,
gastric and lung cancers, among others.

Several studies have demonstrated the involvement of miRNAs in the pathogenesis of
osteosarcoma (OS), with potential applications in disease diagnostics and
therapeutics. Osteosarcoma is an aggressive bone cancer that affects children and adolescents
(Sampson et al., 2015). Patients with rare inherited syndromes including Li-Fraumeni
syndrome, hereditary retinoblastoma, Rothmund-Thomson syndrome, Bloom syndrome and
Werner syndrome have a higher incidence of OS (Ottaviani et al., 2009). Several miRNA
profiling studies have shown differential expression of various distinct miRNAs in OS tumors
including miR-135b, miR-150, miR-542-5p and miR-652 (Lulla et al., 2011).
Altered miRNA regulation is involved in prostate cancer pathogenesis via the modulation
of oncogenes and tumor suppressors that subsequently affect the downstream signaling
pathway, which is regulated through a wide range of related processes including deletion,
suppression, repression and epigenetic mechanisms (Khanmi et al., 2015). More specifically,
miRNAs were found to target major oncogenes and tumor suppressors such as Bcl-2 and PTEN,
while they could also cause G0/G1 cell cycle arrest in prostate cancer (Majid et al., 2012;
Verdoodt et al., 2013; Wang et al., 2013). Previous studies proposed that miR-221 and
miR-222 might act as oncogenes contributing to tumor growth and disease progression through
downregulation of the p27 tumor suppressor protein (Galardi et al., 2007; Mercatelli et al.,
2008).
MicroRNAs may also serve as effective candidates for molecular diagnostics in gastric
cancer (GC), in addition to altered expression of oncogenes and tumor suppressor genes
(Janjigian et al., 2014). According to Li et al. (2013), plasma miR-199a-3p was shown to
serve as a putative biomarker for early detection and progression of GC. In addition, miR-630
has been repeatedly found elevated in lung, head and pancreatic cancers and reports revealed
that it could modulate chemosensitivity (Farhana et al., 2013; Kanda and Kodera, 2015).
Regarding lung cancer, oncogenic activity has been attributed to miR-708, which directly
downregulates TMEM88, a negative regulator of the Wnt signaling pathway
(Jang et al., 2012). Moreover, underexpression of the tumor suppressive let-7 and
overexpression of the oncogenic miR-17-92 have been suggested to play roles in cancer
development (Osada and Takahashi, 2011). Administration of let-7 miRNA prevented tumor
formation in a mouse model of non-small cell lung cancer (Kumar et al., 2008).
MICRORNAS IN CARDIAC DISEASES

miRNA expression is tightly controlled in a tissue-specific and developmental stage-specific
manner, and some miRNAs are highly and specifically expressed in cardiovascular
tissues. Cardiac miRNAs might play significant roles in heart development and function. In
vascular cells, miRNAs have been linked to
vasculoproliferative processes such as angiogenesis. Endothelial cell-specific miR-126 plays
a key role in the developmental formation of the vascular system. It is downregulated in
myocardial infarcts but upregulated in border areas. To date, a number of miRNAs are known
to affect angiogenesis, including miR-126, miR-132, miR-296, and miR-92a, owing to their
effects on tumor vascularity; however, their suitability for therapeutic intervention
remains to be elucidated (Anand and Cheresh, 2011; Donnem et al., 2012). Of the
aforementioned miRNAs, miR-132 has been shown to downregulate the expression of
p120RasGAP, which acts as an important negative regulator of vascular development and
remodeling, leading to Ras activation (Anand et al., 2010; Anand and Cheresh, 2011). A wide
range of miRNAs are also regulated in cardiac hypertrophy, ischemia and/or heart failure. For
instance, miR-210 is upregulated in ischemic myocardium and has been associated with
decreased apoptosis of ischemic cells (Kim et al., 2009; Hu et al., 2010). In addition, in vitro
overexpression of miR-133 or miR-1 inhibited cardiac hypertrophy (Carè et al., 2007). Other
studies suggested that overexpression of miR-1 is involved in arrhythmias and that miR-1
might serve as a potential antiarrhythmic target (Yang et al., 2007).
MICRORNAS IN RHEUMATOID ARTHRITIS

Rheumatoid arthritis (RA) is a chronic inflammatory disease characterized by
inflammation of the synovial membrane, leading to destruction of cartilage and bone (Di
Sabatino et al., 2011). The interaction between genetic and environmental factors can
contribute to RA occurrence. Recently, deregulation of miRNAs has been implicated in the
pathogenesis of RA. More specifically, the first data linking miRNAs
to the pathogenesis of RA were collected in 2007, when autoantibodies against GW/P
bodies, which are associated with miRNA complexes, were identified in the serum of patients
with RA (Churov et al., 2015). Subsequent studies analyzed aberrant miRNA
expression in patients diagnosed with RA (Chatzikyriakidou et al., 2012; Salehi et
al., 2015). The most significantly deregulated miRNAs identified include miR-16, miR-132,
miR-146a, miR-155 and miR-223. All have been found overexpressed in the circulation of RA
patients, suggesting their potential diagnostic value (Duroux-Richard et al., 2011; Filková et al.,
2012). In addition, according to Churov et al. (2015), circulating peripheral blood miRNAs,
especially miR-16, miR-21, miR-26a, miR-125a-5p, miR-125b, miR-126-3p, miR-223 and
miR-451, which are elevated in plasma and serum, are considered the most
promising non-invasive biomarkers for the detection of RA.
MICRORNAS IN PREGNANCY RELATED COMPLICATIONS

The human placenta is a rapidly evolving organ that harbors a rich and diverse
transcriptome. It has been previously suggested that it expresses a wide range of miRNA
genes, with a fraction of these being specific to trophoblasts (Luo et al., 2009). Several
miRNA gene clusters have been found to encode placental miRNAs that work
synergistically (Lycoudi et al., 2015). Among them are one of the most intriguing families of
trophoblast-specific miRNAs, the C19MC cluster, which spans approximately 100 kb and
contains 59 mature miRNA species, as well as the large miR-379/miR-410 cluster (also called
the C14MC cluster in humans), which spans 40 kb and comprises 52 miRNA genes (Seitz et
al., 2004; Zhang et al., 2008; Bortolin-Cavaille et al., 2009; Mouillet et al., 2015).
Individual miRNAs, such as miR-141 and miR-182, also regulate trophoblast cell
proliferation and placental survival by promoting proliferation or inhibiting
apoptosis (Morales-Prieto et al., 2011; Lycoudi et al., 2015). Other placental miRNAs
influence trophoblast cell invasion through signaling molecules of the Nodal/TGF-β pathway
(Nadeem et al., 2013; Xu et al., 2014). For instance, miR-378a-5p directly represses Nodal
and promotes trophoblast migration (Luo et al., 2012). In addition, several miRNAs are
involved in angiogenesis, vessel remodeling and tube formation, including
miR-16, miR-20b and miR-29b, which negatively regulate the VEGF pathway (Wang et al.,
2012; Lycoudi et al., 2015).
Aberrant expression of numerous miRNAs in pre-eclampsia (PE) placentas has also been
identified. Zhu et al. (2009) reported 11 upregulated and 23 downregulated miRNAs in severe
PE placentas compared to uncomplicated pregnancies. Among these, miR-210
had previously been reported by Pineles et al. (2007) as highly expressed in PE placentas
and was further confirmed by Enquobahrie et al. (2011), whereas miR-195 and miR-181a
were validated by Hu et al. (2009).
Additionally, differential expression of specific miRNAs has been associated with
preterm labor and delivery (Montenegro et al., 2009; Mayor-Lynn et al., 2011).
Placenta-specific miRNAs, including those in the C19MC cluster, were found to be suppressed
in cases of intrauterine growth retardation (IUGR) compared with normal fetuses
(Lycoudi et al., 2015). It is noteworthy that C19MC cluster miRNAs and other
individual placenta-specific miRNAs were overexpressed in plasma obtained from pregnant
women when compared to non-pregnant women (Kotlabova et al., 2011).
Several studies have compared the maternal plasma or serum miRNome
of women who developed severe or mild PE with that of healthy controls, to identify
circulating miRNAs differentially expressed between complicated and normal pregnancies
(Lycoudi et al., 2015). The most prominent miRNA identified in PE plasma samples was
miR-210, which has been extensively analyzed in placenta samples from PE cases. According
to Anton et al. (2013), miR-210 is a putative diagnostic marker for early prediction of PE, as
it has been repeatedly found upregulated in women who eventually develop the disorder
(Zhang et al., 2012; Xu et al., 2014).
Among other pregnancy-related complications, differential miRNA expression has
also been associated with gestational diabetes mellitus (GDM). Several studies, including
those of Zhao et al. (2011) and Collares et al. (2013), have verified the aberrant expression
of certain miRNAs in GDM cases alone, or in pregnancies that eventually developed GDM,
respectively.
Despite intense miRNA-based research efforts in the field of placental biology and
perinatal medicine, miRNA diagnostic and therapeutic biomarkers during pregnancy remain
to be established, as several technical and conceptual challenges must first be addressed.
However, it is evident that these molecules hold great potential for improving the diagnosis
and management of placental conditions in the future.
CONCLUSION
The discovery of miRNAs and their widespread influence on gene expression has
advanced our understanding of gene regulatory networks and has shed some light on the
molecular mechanisms underlying the pathogenesis of a plethora of diseases. However,
further research efforts are necessary to determine the functional activity of already identified
miRNAs with unknown roles, as well as to develop personalized therapeutic interventions.

REFERENCES
Anand S, Majeti BK, Acevedo LM, Murphy EA, Mukthavaram R, Scheppke L, Huang M,
Shields DJ, Lindquist JN, Lapinski PE, King PD, Weis SM, Cheresh DA. MicroRNA-
132-mediated loss of p120RasGAP activates the endothelium to facilitate pathological
angiogenesis. Nat Med. 2010;16:909-914.
Anand S, Cheresh DA. MicroRNA-mediated regulation of the angiogenic switch. Curr Opin
Hematol. 2011;18:171-176.
Anton L, Olarerin-George AO, Schwartz N, Srinivas S, Bastek J, Hogenesch JB, Elovitz MA.
miR-210 inhibits trophoblast invasion and is a serum biomarker for preeclampsia. Am J
Pathol. 2013;183:1437-45.
Bandrés E, Cubedo E, Agirre X, Malumbres R, Zárate R, Ramirez N, Abajo A, Navarro A,
Moreno I, Monzó M, García-Foncillas J. Identification by Real-time PCR of 13 mature
microRNAs differentially expressed in colorectal cancer and non-tumoral tissues. Mol
Cancer. 2006;5:29.
Bao L, Hazari S, Mehra S, Kaushal D, Moroz K, Dash S. Increased Expression of P-
Glycoprotein and Doxorubicin Chemoresistance of Metastatic Breast Cancer Is
Regulated by miR-298. Am J Pathol. 2012. [Epub ahead of print].
Bertoli G, Cava C, Castiglioni I. MicroRNAs: New Biomarkers for Diagnosis, Prognosis,
Therapy Prediction and Therapeutic Tools for Breast Cancer. Theranostics. 2015;5:1122-
43.
Boni V, Zarate R, Villa JC, Bandrés E, Gomez MA, Maiello E, Garcia-Foncillas J, Aranda E.
Role of primary miRNA polymorphic variants in metastatic colon cancer patients treated
with 5-fluorouracil and irinotecan. Pharmacogenomics J. 2011;11:429–436.
Borralho PM, Simões AE, Gomes SE, Lima RT, Carvalho T, Ferreira DM, Vasconcelos MH,
Castro RE, Rodrigues CM. miR-143 overexpression impairs growth of human colon
carcinoma xenografts in mice with induction of apoptosis and inhibition of proliferation.
PLoS One. 2011;6:e23787.
Bortolin-Cavaille ML, Dance M, Weber M, Cavaillé J. C19MC microRNAs are processed
from introns of large Pol-II, non-protein-coding transcripts. Nucleic Acids Res.
2009;37:3464-73.
Carè A, Catalucci D, Felicetti F, Bonci D, Addario A, Gallo P, Bang ML, Segnalini P, Gu Y,
Dalton ND, Elia L, Latronico MV, Høydal M, Autore C, Russo MA, Dorn GW 2nd,
Ellingsen O, Ruiz-Lozano P, Peterson KL, Croce CM, Peschle C, Condorelli G.
MicroRNA-133 controls cardiac hypertrophy. Nat Med. 2007;13:613-618.
Chatzikyriakidou A, Voulgari PV, Georgiou I, Drosos AA. miRNAs and related
polymorphisms in rheumatoid arthritis susceptibility. Autoimmun Rev. 2012;11:636-41.
Churov AV, Oleinik EK, Knip M. MicroRNAs in rheumatoid arthritis: Altered expression
and diagnostic potential. Autoimmun Rev. 2015;14:1029-37.
Collares CV, Evangelista AF, Xavier DJ, Rassi DM, Arns T, Foss-Freitas MC, Foss MC,
Puthier D, Sakamoto-Hojo ET, Passos GA, Donadi EA. Identifying common and specific
microRNAs expressed in peripheral blood mononuclear cell of type 1, type 2, and
gestational diabetes mellitus patients. BMC Res Notes. 2013;6:491.
Cortez MA, Welsh JW, Calin GA. Circulating MicroRNAs as Noninvasive Biomarkers in
Breast Cancer. Recent Results Cancer Res. 2012;195:151-61.

Cowland JB, Hother C, Grønbaek K. MicroRNAs and cancer. APMIS. 2007;115:1090-1106.

Di Sabatino A, Calarota SA, Vidali F, MacDonald TT, Corazza GR. Role of IL-15 in
immune-mediated and infectious diseases. Cytokine Growth Factor Rev. 2011;22:19–33.
Donnem T, Fenton CG, Lonvik K, Berg T, Eklo K, Andersen S, Stenvold H, Al-Shibli K, Al-
Saad S, Bremnes RM, Busund LT. MicroRNA signatures in tumor tissue related to
angiogenesis in non-small cell lung cancer. PLoS One. 2012;7:e29671.
Dorn GW 2nd. MicroRNAs in cardiac disease. Transl Res. 2011;157:226-35.
Duroux-Richard I, Presumey J, Courties G, Gay S, Gordeladze J, Jorgensen C, Kyburz D,
Apparailly F. MicroRNAs as new player in rheumatoid arthritis. Joint Bone Spine.
2011;78:17-22.
Enquobahrie DA, Abetew DF, Sorensen TK, et al. Placental microRNA expression in
pregnancies complicated by preeclampsia. Am J Obstet Gynecol. 2011;204:178 e12-21.
Farhana L, Dawson MI, Murshed F, Das JK, Rishi AK, Fontana JA. Upregulation of miR-
150* and miR-630 induces apoptosis in pancreatic cancer cells by targeting IGF-1R.
PLoS One. 2013;8:e61015.
Filková M, Jüngel A, Gay RE, Gay S. MicroRNAs in Rheumatoid Arthritis: Potential Role in
Diagnosis and Therapy. BioDrugs. 2012;26:131-141.
Galardi S, Mercatelli N, Giorda E, Massalini S, Frajese GV, Ciafrè SA, Farace MG. miR-221
and miR-222 expression affects the proliferation potential of human prostate carcinoma
cell lines by targeting p27Kip1. J Biol Chem. 2007;282:23716-23724.
Heneghan HM, Miller N, Lowery AJ, Sweeney KJ, Kerin MJ. MicroRNAs as Novel
Biomarkers for Breast Cancer. J Oncol. 2010, Article ID 950201.
Hu Y, Li P, Hao S, Liu L, Zhao J, Hou Y. Differential expression of microRNAs in the
placentae of Chinese patients with severe pre-eclampsia. Clin Chem Lab Med.
2009;47:923-9.
Hu S, Huang M, Li Z, Jia F, Ghosh Z, Lijkwan MA, Fasanaro P, Sun N, Wang X, Martelli F,
Robbins RC, Wu JC. MicroRNA-210 as a novel therapy for treatment of ischemic heart
disease. Circulation. 2010;122(11 Suppl):S124-S131.
Iorio MV, Ferracin M, Liu CG, Veronese A, Spizzo R, Sabbioni S, Magri E, Pedriali M,
Fabbri M, Campiglio M, Ménard S, Palazzo JP, Rosenberg A, Musiani P, Volinia S,
Nenci I, Calin GA, Querzoli P, Negrini M, Croce CM. MicroRNA gene expression
deregulation in human breast cancer. Cancer Res. 2005;65:7065-7070.
Jang J, Jeon HS, Sun Z, Aubry MC, Tang H, Park CH, Rakhshan F, Schultz DA, Kolbert CP,
Lupu R, Park JY, Harris CC, Yang P, Jin J. Increased miR-708 Expression in NSCLC
and Its Association with Poor Survival in Lung Adenocarcinoma from Never Smokers.
Clin Cancer Res. 2012 [Epub ahead of print].
Janjigian YY, Kelsen DP. Genomic dysregulation in gastric tumors. J Surg Oncol.
2013;107:237-242.
Kanda M, Kodera Y. Recent advances in the molecular diagnostics of gastric cancer. World J
Gastroenterol. 2015;21:9838-9852.
Khanmi K, Ignacimuthu S, Paulraj MG. MicroRNA in prostate cancer. Clin Chim Acta. 2015
Sep 27.
Kim HW, Haider HK, Jiang S, Ashraf M. Ischemic preconditioning augments survival of
stem cells via miR-210 expression by targeting caspase-8-associated protein 2. J Biol
Chem. 2009;284:33161-33168.

Kotlabova K, Doucha J, Hromadnikova I. Placental-specific microRNA in maternal
circulation--identification of appropriate pregnancy-associated microRNAs with
diagnostic potential. J Reprod Immunol. 2011;89:185-91.
Kumar MS, Erkeland SJ, Pester RE, Chen CY, Ebert MS, Sharp PA, Jacks T. Suppression of
non-small cell lung tumor development by the let-7 microRNA family. Proc. Natl. Acad.
Sci. U.S.A. 2008;105:3903–3908.
Lehmann U, Hasemeier B, Christgen M, Müller M, Römermann D, Länger F, Kreipe H.
Epigenetic inactivation of microRNA gene hsa-mir-9-1 in human breast cancer. J Pathol.
2008;214:17-24.
Li JM, Zhao RH, Li ST, Xie CX, Jiang HH, Ding WJ, Du P, Chen W, Yang M, Cui L. Down-
regulation of fecal miR-143 and miR-145 as potential markers for colorectal cancer.
Saudi Med J. 2012;33:24-29.
Luo L, Ye G, Nadeem L, et al. MicroRNA-378a-5p promotes trophoblast cell survival,
migration and invasion by targeting Nodal. J Cell Sci. 2012;125:3124-32.
Lulla RR, Costa FF, Bischof JM, Chou PM, de F Bonaldo M, Vanin EF, Soares MB.
Identification of differentially expressed microRNAs in osteosarcoma. Sarcoma.
2011:732690.
Lycoudi A, Mavreli D, Mavrou A, Papantoniou N, Kolialexi A. miRNAs in pregnancy-
related complications. Expert Rev Mol Diagn. 2015;15:999-1010.
Majid S, Dar AA, Saini S, Arora S, Shahryari V, Zaman MS, Chang I, Yamamura S, Tanaka
Y, Deng G, Dahiya R. MicroRNA-23b represses proto-oncogene Src kinase and
functions as methylation-silenced tumor suppressor with diagnostic and prognostic
significance in prostate cancer. Cancer Res. 2012; 72: 6435–6446.
Melo SA, Esteller M. Dysregulation of microRNAs in cancer: playing with fire. FEBS Lett.
2011;585:2087-2099.
Mercatelli N, Coppola V, Bonci D, Miele F, Costantini A, Guadagnoli M, Bonanno E, Muto
G, Frajese GV, De Maria R, Spagnoli LG, Farace MG, Ciafrè SA. The inhibition of the
highly expressed miR-221 and miR-222 impairs the growth of prostate carcinoma
xenografts in mice. PLoS One. 2008;3:e4029.
Mirnezami AH, Pickard K, Zhang L, Primrose JN, Packham G. MicroRNAs: key players in
carcinogenesis and novel therapeutic targets. Eur J Surg Oncol. 2009;35:339-347.
Mishra PJ, Humeniuk R, Mishra PJ, Longo-Sorbello GS, Banerjee D, Bertino JR. A miR-24
microRNA binding-site polymorphism in dihydrofolate reductase gene leads to
methotrexate resistance. Proc Natl Acad Sci U S A. 2007;104:13513-13518.
Morales-Prieto DM, Schleussner E, Markert UR. Reduction in miR-141 is induced by
leukemia inhibitory factor and inhibits proliferation in choriocarcinoma cell line JEG-3.
Am J Reprod Immunol. 2011;66 Suppl 1:57-62.
Motoyama K, Inoue H, Takatsuno Y, Tanaka F, Mimori K, Uetake H, Sugihara K, Mori M.
Over- and under-expressed microRNAs in human colorectal cancer. Int J Oncol.
2009;34:1069-1075.
Mouillet JF, Ouyang Y, Coyne CB, Sadovsky Y. MicroRNAs in placental health and disease.
Am J Obstet Gynecol. 2015; 213(4 Suppl):S163-72.
Muluhngwi P, Klinge CM. Roles for miRNAs in endocrine resistance in breast cancer.
Endocr Relat Cancer. 2015;22:R279-300.

Nadeem L, Brkic J, Chen YF, et al. Cytoplasmic mislocalization of p27 and CDK2 mediates
the anti-migratory and anti-proliferative effects of Nodal in human trophoblast cells. J
Cell Sci. 2013;126:445-53.
Natarajan R, Putta S, Kato M. MicroRNAs and Diabetic Complications. J Cardiovasc Transl
Res. 2012. [Epub ahead of print].
Osada H, Takahashi T. let-7 and miR-17-92: small-sized major players in lung cancer
development. Cancer Sci. 2011;102:9-17.
Ottaviani G, Jaffe N. The etiology of osteosarcoma. Cancer Treat Res. 2009; 152:15–32.
Pineles BL, Romero R, Montenegro D, et al. Distinct subsets of microRNAs are expressed
differentially in the human placentas of patients with preeclampsia. Am J Obstet Gynecol.
2007;196:261 e1-6.
Salehi E, Eftekhari R, Oraei M, Gharib A, Bidad K. MicroRNAs in rheumatoid arthritis. Clin
Rheumatol. 2015;34:615-628.
Sampson VB, Yoo S, Kumar A, Vetter NS, Kolb EA. MicroRNAs and Potential Targets in
Osteosarcoma: Review. Front Pediatr. 2015;3:69.
Seitz H, Royo H, Bortolin ML, Lin SP, Ferguson-Smith AC, Cavaillé J. A large imprinted
microRNA gene cluster at the mouse Dlk1-Gtl2 domain. Genome Res. 2004;14:1741-8.
Singh SK, Pal Bhadra M, Girschick HJ, Bhadra U. MicroRNAs--micro in size but macro in
function. FEBS J. 2008;275:4929-4944.
Tang Y, Zhou X, Ji J, Chen L, Cao J, Luo J, Zhang S. High expression levels of miR-21 and
miR-210 predict unfavorable survival in breast cancer: a systemic review and meta-
analysis. Int J Biol Markers. 2015. [Epub ahead of print].
Thum T, Gross C, Fiedler J, Fischer T, Kissler S, Bussen M, Galuppo P, Just S, Rottbauer W,
Frantz S, Castoldi M, Soutschek J, Koteliansky V, Rosenwald A, Basson MA, Licht JD,
Pena JT, Rouhanifard SH, Muckenthaler MU, Tuschl T, Martin GR, Bauersachs J,
Engelhardt S. MicroRNA-21 contributes to myocardial disease by stimulating MAP
kinase signalling in fibroblasts. Nature. 2008;456:980-984.
Verdoodt B, Neid M, Vogt M, Kuhn V, Liffers ST, Palisaar RJ, Noldus J, Tannapfel A,
Mirmohammadsadegh A. MicroRNA-205, a novel regulator of the anti-apoptotic protein
Bcl2, is downregulated in prostate cancer. Int J Oncol. 2013;43:307-14.
Volinia S, Calin GA, Liu CG, Ambs S, Cimmino A, Petrocca F, Visone R, Iorio M, Roldo C,
Ferracin M, Prueitt RL, Yanaihara N, Lanza G, Scarpa A, Vecchione A, Negrini M,
Harris CC, Croce CM. A microRNA expression signature of human solid tumors defines
cancer gene targets. Proc Natl Acad Sci U S A. 2006;103:2257-2261.
Wang N, Zhou Z, Liao X, Zhang T. Role of microRNAs in cardiac hypertrophy and heart
failure. IUBMB Life. 2009;61:566-571.
Wang H, Tan G, Dong L, Cheng L, Li K, Wang Z, Luo H. Circulating MiR-125b as a Marker
Predicting Chemoresistance in Breast Cancer. PLoS One. 2012;7:e34210.
Wang Y, Fan H, Zhao G, et al. miR-16 inhibits the proliferation and angiogenesis-regulating
potential of mesenchymal stem cells in severe pre-eclampsia. FEBS J 2012;279:4510-24.
Wang L, Li B, Lei L, Wang T. MicroRNA-497 Suppresses Proliferation and Induces
Apoptosis in Prostate Cancer Cells. Asian Pac. J. Cancer Prev. 2013; 14:3499–3502.
Wang J, Yang M, Li Y, Han B. The Role of MicroRNAs in the Chemoresistance of Breast
Cancer. Drug Dev Res. 2015 [Epub ahead of print].

Xu P, Zhao Y, Liu M, Wang Y, Wang H, Li YX, Zhu X, Yao Y, Wang H, Qiao J, Ji L,
Wang YL. Variations of microRNAs in human placentas and plasma from preeclamptic
pregnancy. Hypertension 2014;63:1276-84.
Yang B, Lin H, Xiao J, Lu Y, Luo X, Li B, Zhang Y, Xu C, Bai Y, Wang H, Chen G, Wang
Z. The muscle-specific microRNA miR-1 regulates cardiac arrhythmogenic potential by
targeting GJA1 and KCNJ2. Nat Med. 2007;13:486-491.
Zhang R, Wang YQ, Su B. Molecular evolution of a primate-specific microRNA family.
Mol Biol Evol. 2008;25:1493-502.
Zhang Y, Wang Z, Chen M, Peng L, Wang X, Ma Q, Ma F, Jiang B. MicroRNA-143 Targets
MACC1 to Inhibit Cell Invasion and Migration in Colorectal Cancer. Mol Cancer.
2012;11:23.
Zhang Y, Fei M, Xue G, Zhou Q, Jia Y, Li L, Xin H, Sun S. Elevated levels of hypoxia-
inducible microRNA-210 in pre-eclampsia: new insights into molecular mechanisms for
the disease. J Cell Mol Med. 2012;16:249-59.
Zhao C, Dong J, Jiang T, et al. Early second-trimester serum miRNA profiling predicts
gestational diabetes mellitus. PLoS One. 2011;6:e23925.
Zhu XM, Han T, Sargent IL, Yin GW, Yao YQ. Differential expression profile of
microRNAs in human placentas from preeclamptic pregnancies vs normal pregnancies.
Am J Obstet Gynecol. 2009;200:661 e1-7.

Chapter 8
PAPAYA VIRAL DISEASES:
RECENT ADVANCES AND PERSPECTIVES

Marcos Fernando Basso1,∗, José Albersio de Araújo Lima2,
Michihito Deguchi3, Diogo Manzano Galdeano4,
Lívio da Silva Amaral5 and Patricia Machado Bueno Fernandes6

1 Genetics and Biotechnology Laboratory,
Embrapa Agroenergy (CNPAE), Brasília, DF, Brazil
2 Plant Virology Laboratory, Universidade Federal do Ceará,
Fortaleza, CE, Brazil
3 Plant Molecular Biology Laboratory, National Institute of Science
and Technology in Plant-Pest Interactions (INCT-IPP),
Universidade Federal de Viçosa, Viçosa, MG, Brazil
4 Biotechnology Laboratory,
Centro de Citricultura Sylvio Moreira, IAC, Cordeiropolis, SP, Brazil
5 Phytobacteriology Laboratory, Departamento de Fitopatologia,
Universidade Federal de Viçosa, Viçosa, MG, Brazil
6 Biotechnology for Agribusiness Laboratory,
Universidade Federal do Espírito Santo, Vitória, ES, Brazil

ABSTRACT
Papaya (Carica papaya) is a perennial herbaceous plant native to tropical America,
belonging to the family Caricaceae. It is one of the most important fruit crops and is
widely distributed in tropical and subtropical countries. Viral diseases are of strong
economic importance worldwide because they reduce papaya production, fruit quality
and plant longevity. Papaya ringspot virus biotype Papaya (PRSV-P), genus Potyvirus;
Papaya lethal yellowing virus (PLYV), genus Sobemovirus; Papaya leaf distortion
mosaic virus (PLDMV), genus Potyvirus; Papaya apical necrosis virus (PANV), genus
Rhabdovirus; Papaya mosaic virus (PMV), genus Potexvirus; Tomato spotted wilt virus
(TSWV), genus Tospovirus; Papaya mild yellow leaf virus (PMYLV), genus Tenuivirus;
and Papaya meleira virus (PMeV), not yet classified by the ICTV, are the main virus
species reported to cause diseases of worldwide economic importance in papaya.
They are responsible for losses of up to 100% in productivity and strongly decrease
the commercial quality of the fruits; in advanced stages, some of the diseases can lead
to plant death. Some of these viruses are highly stable and/or easily disseminated by insect
vectors or mechanically. Infected plants cannot be cured, and roguing is the main
recommendation for eliminating sources of inoculum. Propagation of healthy
material, use of tolerant or resistant plants, weekly inspections and roguing, and
management of insect vectors are considered important measures for the success of this
crop; however, field conditions hamper their adoption. Here, we summarize the main
characteristics of the papaya viral diseases, with emphasis on PRSV-P, PLYV, PMeV and
PLDMV, which are considered to have the highest incidence worldwide and to be of
economic importance. In addition, recent advances in papaya-virus molecular interactions
are reviewed, aiming to stimulate critical thinking and the search for novel frontiers.

∗ email: marcosbiotec@gmail.com.
1. INTRODUCTION
Papaya (Carica papaya L.) is a dicotyledonous, polygamous, diploid, perennial,
herbaceous plant native to tropical America, belonging to the family Caricaceae [1-3]. It is
a fruit crop of high socioeconomic importance, widely distributed in tropical and
subtropical countries [4]. Papaya orchards are affected by several pathogens [5]; among
these, viruses are considered of strong economic importance worldwide because they
reduce orchard production, fruit quality and plant longevity [6-8]. Several viral species
have been reported infecting papaya [9]. The most economically important are Papaya
ringspot virus biotype Papaya (PRSV-P), genus Potyvirus, the causal agent of ring spot
disease; Papaya lethal yellowing virus (PLYV), genus Sobemovirus, the causal agent of
lethal yellowing disease; Papaya meleira virus (PMeV), not yet classified by the ICTV, the
causal agent of meleira or sticky disease; and Papaya leaf distortion mosaic virus (PLDMV),
genus Potyvirus, the causal agent of leaf distortion mosaic disease [10].
Papaya ring spot, caused by PRSV-P, is the major papaya disease worldwide and is
considered the most economically important, causing serious damage in orchards, mainly in
the Americas and Japan [11]. PLYV is restricted to Northeastern Brazil, and its incidence
and economic importance have increased significantly in recent years [12-15]. PMeV has
been reported in Brazil and Mexico and is considered of increasing economic importance
[16, 17]. PLDMV has lower economic importance than PRSV, PLYV and
PMeV [18, 19]. Papaya apical necrosis virus (PANV), genus Rhabdovirus [20]; Papaya
mosaic virus (PMV), genus Potexvirus [21]; Tomato spotted wilt virus (TSWV), genus
Tospovirus [22]; Papaya mild yellow leaf virus (PMYLV), genus Tenuivirus [23]; and
Papaya leaf curl virus (PLCV or PaLCuV) [24] and Papaya leaf crumple virus (PaLCrV)
[25], genus Begomovirus, are considered of isolated occurrence and relatively low economic
importance [9].

Papaya Ring Spot Disease (PRSD)
PRSD was first reported in Hawaii in the 1940s [26] and then in India in 1958 [27], and
is now the most widespread viral disease of papaya. The primary systemic hosts of PRSV are
restricted to the families Caricaceae and Cucurbitaceae; the virus also causes local
infections in Chenopodium amaranticolor and C. quinoa (family Chenopodiaceae), which are
used as indicator plants because they develop typical local lesions [7]. Symptoms on papaya
are usually severe and include mosaic and chlorosis on the leaves and water-soaked oily
streaks on the leaf petiole [28]. The most severe symptoms include distortion of young
leaves; young papaya plants can remain stunted and fail to produce fruit. Fruits from
infected papaya always show ring spots and serious
deformations. A severe PRSV isolate from Taiwan is also known to induce systemic necrosis
and wilting along with mosaic and chlorosis.
PRSV isolates are divided into two major strains or biotypes based on their host range.
PRSV-W (watermelon) infects cucurbit plants but does not infect papaya, while PRSV-P
(papaya) infects papaya in addition to some cucurbit species. PRSV-P and PRSV-W are
found wherever their host crops are grown in tropical and subtropical regions. Where both
biotypes and their hosts are present, PRSV-P effectively completes
its life cycle in papaya while PRSV-W completes its life cycle in cucurbits. Cucurbits
therefore generally do not serve as alternative hosts for PRSV-P. Sequence analyses suggest
that PRSV-P arose through mutations in the established PRSV-W that conferred the ability
to infect papaya, that is, that PRSV-P originated from PRSV-W.
In cucurbits, PRSV-W-infected leaves show intense mosaic with narrowing of the
leaf blade. Severe cases can result in a shoestring effect similar to that observed in papaya.
As in papaya, cucurbits infected when young do not develop well or yield good production, and
older infected plants produce fruit of lower commercial quality in terms of color, size, shape
and nutritional quality.
Recently, a new putative virus from the genus Umbravirus was identified and partially
sequenced in Ecuador, associated with PRSV-W-infected plants (31%) and symptomless
PRSV-P-free papayas (2%) [29]. Similarly, mixed infection of PRSV-P and phytoplasma has
also been reported in Indian papaya orchards [30].
PRSV belongs to the genus Potyvirus (family Potyviridae). Virions are non-enveloped,
filamentous, flexuous particles measuring 760-800 × 12 nm and contain a monopartite,
linear, single-stranded, positive-sense RNA genome of 10,326 nt [31]. The genome encodes
a polyprotein of 3,344 amino acids, which is
subsequently cleaved by three viral proteases (P1, HC-Pro and NIa) into smaller
functional proteins, as is typical of species in the genus
Potyvirus [31]. PRSV is transmitted in a non-persistent manner by several aphid species and is
easily transmitted by mechanical inoculation, but it is not transmitted by seeds.
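The genome and polyprotein sizes quoted above can be cross-checked with simple codon arithmetic. The sketch below is only an illustration: it reads the quoted figures as a 10,326-nucleotide genome and a 3,344-residue polyprotein, and it assumes a single open reading frame terminated by one stop codon, with everything outside that frame treated as non-coding sequence. The function and variable names are hypothetical and do not come from the cited work.

```python
# Figures quoted in the text for PRSV, taken at face value.
GENOME_NT = 10_326       # genome length in nucleotides
POLYPROTEIN_AA = 3_344   # polyprotein length in amino acids


def coding_length_nt(n_amino_acids: int, include_stop: bool = True) -> int:
    """Nucleotides needed to encode a protein: 3 per codon, plus an optional stop codon."""
    codons = n_amino_acids + (1 if include_stop else 0)
    return 3 * codons


# Assumption: one ORF with one stop codon; the remainder is non-coding.
orf_nt = coding_length_nt(POLYPROTEIN_AA)
noncoding_nt = GENOME_NT - orf_nt
coding_fraction = orf_nt / GENOME_NT

print(f"ORF length: {orf_nt} nt")
print(f"Non-coding remainder: {noncoding_nt} nt")
print(f"Coding fraction: {coding_fraction:.1%}")
```

Under these assumptions the two figures are mutually consistent: the open reading frame fits within the genome and leaves a few hundred nucleotides for the untranslated regions at the two ends.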
Papaya Lethal Yellowing Disease (PLYD)
PLYD was first reported in the state of Pernambuco (Brazil), and it is currently restricted
to orchards in Northeastern Brazil [14, 32]. The viral infection is initially manifested as
progressive yellowing, curling and wilting of the upper leaves. The fruits of infected plants
develop greenish circular spots that turn yellowish as the fruits ripen, followed by death of
the entire plant [15]. Young PLYV-inoculated papaya plants present mosaic, leaf yellowing
and distortion symptoms. PLYV infection may result in a significant reduction in chlorophyll
content and photosynthetic potential [33]. Mechanical inoculation trials indicated that PLYV
systemically infects only species of the family Caricaceae, including C. papaya, Jacaratia
heterophylla, J. spinosa, Vasconcellea cauliflora, V. monoica and V. quercifolia. In contrast,
PLYV did not infect any of the other 82 mechanically inoculated plant species from 16
different families, including Chenopodium amaranticolor, C. murale, C. quinoa and Nicotiana
benthamiana [14, 34].
PLYV has been tentatively assigned to the genus Sobemovirus. It has isometric
particles of about 25-30 nm in diameter, containing a single positive-sense, single-stranded
RNA genome of 4,145 nt and a coat protein of 34.7 kDa [35]. Its genome is organized in four
open reading frames (ORFs) encoding: (i) a putative movement protein and RNA silencing
suppressor, (ii) a serine protease and VPg, (iii) an RNA-dependent RNA polymerase and
(iv) the coat protein [12, 36].
PLYV is easily transmitted mechanically to plants of the family Caricaceae through
contaminated hands or cultural practices, which reflects its high stability [34, 37]. The virus
is also found in contaminated soil, irrigation water, dried leaves, leaf and root debris, and on
the surface of seeds from infected fruits, but it is not transmitted through the seed embryo
[14, 37, 38]. To date, no biological vector has been identified: Myzus persicae Sulz, Diabrotica
bivitulla Kirke, D. speciosa Kirke and the two-spotted mite Tetranychus urticae (Arachnida:
Tetranychidae) were not capable of transmitting PLYV [6, 12, 39].
Papaya Meleira or Sticky Disease (PMeD)
PMeD was first reported in Brazil in the 1980s [6, 40, 41] and in Mexico in 2008 [42]. It is
characterized by excessive, spontaneous exudation of fluid, aqueous latex from papaya
fruits, leaves and stems. The exuded latex oxidizes, causing necrotic spots on
young leaves and a sticky appearance on fruits [16, 43]. In advanced stages of the disease,
irregular light-green areas have been observed on the surface of infected fruits in Brazil. In
addition, infected plants in Mexico show small internal blotches in the fruit pulp, necrotic
spots on the petiole and, in severe cases, latex inside the fruit cavity covering
the seeds [16, 42]. Several mechanical inoculation methods were tested, but only injection of
PMeV-infected latex into the stem apex resulted in viral infection [44, 45]. In
addition, among agricultural practices, fruit thinning was observed to contribute to
viral transmission and dispersal [16]. Abreu et al. [44] showed that PMeV can be transmitted
via contaminated seeds. Tapia-Tussel et al. [46] confirmed that PMeV can be transmitted by
seeds and showed that transmission occurs via the embryo-endosperm (50%) and the seed coat
(83%). So far, no insect vector of PMeV has been identified.
PMeV has icosahedral particles of 40 nm, which are restricted, at high
concentration, to lactiferous vesicles and contain a double-stranded RNA genome of 8.7
kb [17, 40, 41]. Its genome is organized in two ORFs, encoding a putative coat protein and
an RNA-dependent RNA polymerase [17]. Genetic diversity studies show that PMeV presents
significant variability between isolates from the same sampled region [47]. Phylogenetic
studies using the RdRP protein sequence of a Brazilian isolate indicate that
PMeV is similar to mycoviruses belonging to the family Totiviridae [17, 47, 48].

Papaya Leaf Distortion Mosaic Disease (PLDMD)
PRSV was reported in 1975 in the southern part of Taiwan, where it destroyed most of the
papaya orchards within a few years, causing unprecedented yield loss [49]. It was then
successfully controlled by commercialized transgenic PRSV-resistant plants produced by
transferring a T-DNA fragment containing the PRSV CP gene [11, 50]. However, the presence of
another virus, PLDMV, was confirmed in PRSV-resistant transgenic papaya [51]. The susceptibility of all
PRSV-resistant transgenic papaya lines to this virus indicates that it is an emerging threat to
the use of PRSV-resistant transgenic papaya in regions such as China, Japan,
Taiwan and Hawaii. PLDMV was identified in 1954 in the northern area of Okinawa, Japan,
spread throughout the island during the 1960s [52], and has been a major constraint to papaya
cultivation in this area [19, 53]. Based on several characteristics, such as
symptoms on infected papaya plants, host range, non-persistent transmission by aphids, and
viral physical properties, the papaya disease in Okinawa was initially thought to be caused by
PRSV or Papaya mosaic virus (PapMV). Mixed infection of PRSV and PLDMV has also
been reported [54].
PLDMD shows symptoms similar to those of PRSD and PapMD, for example yellow-green leaf
discoloration and distortion, water-soaked streaks on petioles and stems, and ringspots
on fruits, making the diseases difficult to distinguish without further testing [55-57].
PapMV was reported in Florida (USA) in 1962 [58] and has been considered to be of low
field incidence and relatively minor economic importance [18].
PLDMV can be classified into biotype C (Cucurbit) or P (Papaya). The first biotype was
collected on Ishigaki Island, Japan, from an infected cucurbit and was therefore designated
the C type [19]. In contrast, a virus identified as a new pathotype of PLDMV (PLDMV
P-TW-WF) was isolated from diseased papaya in an isolated test field in central Taiwan, where
transgenic papaya lines resistant to PRSV were being evaluated [59]. This virus, which did not react
in enzyme-linked immunosorbent assay (ELISA) with antiserum to PRSV CP, infected only
papaya and none of the other 18 plant species tested. Virions examined under the electron microscope
had the morphology and dimensions typical of particles of species in the genus Potyvirus.
Furthermore, phylogenetic analysis of PLDMV isolates from Taiwan and Japan indicated that
the Taiwan isolates belong to a separate genetic cluster. PLDMV-C and PRSV-W isolates
cannot infect papaya and thus differ from their respective P-type counterparts, which
infect papaya in addition to cucurbits. To study the genomic
characteristics of the virus, the complete nucleotide sequence of the genome of PLDMV-P from
Taiwan was determined [19]. Next, the CP sequence of PLDMV-C from Ishigaki Island,
Okinawa, was compared with those of three strains of the P biotype (PLDMV-P) and,
additionally, with those of PapMV and Papaya yellow mosaic virus, which are biologically different from
each other [51]. The CP sequences of the three PLDMV-P strains share high identities of 95 to
97%, while they share lower identities of 88 to 89% with that of PLDMV-C. Significant
changes in hydrophobicity and a deletion of two amino acids in the N-terminal region of the CP of
PLDMV-C were observed. The finding of two biotypes of PLDMV implies that the papaya-infecting
biotype evolved from Cucurbitaceae-infecting viruses of the genus Potyvirus, as
has been previously suggested for PRSV. In addition, similar evolutionary events leading to
infectivity to papaya may arise frequently in viruses that infect the family Cucurbitaceae.
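
The identity figures quoted above (95 to 97% among PLDMV-P strains, 88 to 89% against PLDMV-C) come from pairwise comparison of aligned CP sequences. The following is a minimal sketch of how such a percentage can be computed, assuming the sequences have already been aligned to equal length with "-" as the gap symbol. The function names and the toy sequences are hypothetical illustrations, not real PLDMV data, and published values may have been obtained with a different gap-handling convention.

```python
from itertools import combinations


def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity between two pre-aligned sequences.

    Columns where both sequences carry a gap are skipped; a gap opposite
    a residue is counted as a mismatch (one of several conventions).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" and b == "-":
            continue  # column is uninformative for this pair
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


def pairwise_identities(alignment: dict) -> dict:
    """Percent identity for every pair of named sequences in an alignment."""
    return {
        (name_a, name_b): percent_identity(alignment[name_a], alignment[name_b])
        for name_a, name_b in combinations(alignment, 2)
    }


if __name__ == "__main__":
    # Hypothetical toy alignment; NOT real coat protein sequences.
    toy_alignment = {
        "isolate_P1": "MSKDEAVDAGK",
        "isolate_P2": "MSKDEAVDAGR",
        "isolate_C": "M--DETVDAGK",
    }
    for pair, identity in pairwise_identities(toy_alignment).items():
        print(pair, round(identity, 1))
```

In this toy alignment the two-residue gap in isolate_C mimics the kind of N-terminal deletion described for PLDMV-C, and it lowers the identity of that isolate to the other two.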

Taken together, these findings support the hypothesis that PLDMV is primarily a pathogen of the
family Cucurbitaceae and that PLDMV-P arose from PLDMV-C through a mutation that allowed it
to infect papaya [19].
PLDMV has a positive-sense single-stranded RNA genome of about 10,153 nt, excluding the
poly(A) tail, which is translated into a 373.68 kDa polyprotein that is subsequently processed by
viral proteases into the coat protein (CP) and several non-structural proteins (P1, HC-Pro, P3,
6K1, CI, 6K2, NIa-VPg, NIa-Pro, PIPO and NIb) [19, 51]. The polyprotein contains nine
putative proteolytic cleavage sites and some motifs conserved in other potyviral polyproteins,
with which it shares 44 to 50% identity, indicating that PLDMV is a distinct species in the genus Potyvirus.
It is transmitted in a non-persistent manner by aphids (mainly Aphis gossypii and M. persicae) and its
host range is similar to that of PRSV.
2. APPROACHES FOR VIRAL DETECTION IN PAPAYA

Given the economic importance of the papaya viruses, efficient and consistent detection
and identification methods will help to reduce disease incidence and economic losses
and assist in disease management [12]. Particular attention should be paid to early detection
of the pathogens in propagative material, such as seeds, mother plants and other plant propagation
material, to avoid introducing and spreading these pathogens in growing areas where
they are not yet present and to prevent the appearance of complex diseases. The
availability of fast, sensitive and accurate molecular and serological tools for detection and
identification of papaya viruses is therefore necessary to assist their control. Conventional reverse
transcription-polymerase chain reaction (RT-PCR), quantitative RT-PCR (qRT-PCR),
enzyme-linked immunosorbent assay (ELISA), dot-ELISA (DIBA) and rolling circle
amplification (RCA) based on phi29 DNA polymerase (for detection and identification of
viruses from the genus Begomovirus) are widely and successfully used for detection of papaya
viruses [44, 60-63]. ELISA is less sensitive than RT-PCR and requires
high-quality antibody to achieve specificity and sensitivity, but it is more suitable for routine
testing of large numbers of samples. Antibody quality is directly related to the purity and
structural integrity of the antigen. Recombinant proteins expressed in prokaryotic systems (generally
Escherichia coli) are frequently used in research because they are stable, abundant and easily
purified [12]. Because the CP gene shows relatively low genetic variability compared to other genes
[13] and the CP is an abundant structural protein among papaya viral isolates, a polyclonal
antiserum against the CP of a specific virus isolate may detect other isolates. However,
for serological detection of the viral isolates present in a given country, it is recommended to use
polyclonal antiserum produced against local isolates [12].
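
The comparison between ELISA and RT-PCR above rests on two standard measures, diagnostic sensitivity and specificity. As a minimal sketch, they can be computed from a table of test results against a reference status as shown below. The class name and all counts are hypothetical and serve only to illustrate the arithmetic; they are not data from the studies cited.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AssayCounts:
    """Results of one assay scored against a reference infection status."""

    true_pos: int   # infected plants that tested positive
    false_neg: int  # infected plants that tested negative
    true_neg: int   # healthy plants that tested negative
    false_pos: int  # healthy plants that tested positive

    @property
    def sensitivity(self) -> float:
        """Fraction of infected plants the assay detects."""
        return self.true_pos / (self.true_pos + self.false_neg)

    @property
    def specificity(self) -> float:
        """Fraction of healthy plants the assay correctly reports as negative."""
        return self.true_neg / (self.true_neg + self.false_pos)


if __name__ == "__main__":
    # Hypothetical counts for 50 infected and 50 healthy plants per assay.
    assays = {
        "ELISA": AssayCounts(true_pos=40, false_neg=10, true_neg=48, false_pos=2),
        "RT-PCR": AssayCounts(true_pos=49, false_neg=1, true_neg=49, false_pos=1),
    }
    for name, counts in assays.items():
        print(f"{name}: sensitivity={counts.sensitivity:.2f}, "
              f"specificity={counts.specificity:.2f}")
```

A lower sensitivity means more infected plants escape detection, which matters most when screening propagative material where a single missed infection can introduce the virus into a new area.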
Molecular tools have made it possible to identify and quantify a long list of viruses in several hosts,
supporting genetic variability studies and the description of new viral species. Their main
advantages are speed, specificity, sensitivity and accuracy, and they can distinguish closely
related organisms at different taxonomic levels. The choice of detection method,
the efficiency and specificity of primers and other reagents, the selection of the target DNA to
amplify, the nucleic acid extraction protocol, the type of host plant sample and the sampling
time are all of great importance for consistent detection and identification of these pathogens.
Conventional PCR and its variations, such as real-time PCR, allow fast, accurate detection
and quantification of viruses. Multiplex RT-PCR has been used successfully for
simultaneous detection of papaya viruses [56, 64]. Immunocapture RT-PCR was successfully
used for PRSV detection, allowing the virus to be detected at low titer in papaya [65]. A newly
developed immune virus precipitation RT-PCR (IP-RT-PCR) was shown to be a practical and
sensitive diagnostic technique for detection of plant viruses, including PLYV in papaya [15].
IP-RT-PCR provides virus particle-specific immunoprecipitation, which may overcome
the problems of IgG adherence to some types of microtubes encountered with IC-PCR. Additionally, the
IP-RT-PCR technique reduces the risk of cross-contamination with plant RNAs and does not
require expensive equipment and reagents [15]. Reverse transcription loop-mediated
isothermal amplification (RT-LAMP) has also been used for detection of papaya viruses [66].
3. NEXT-GENERATION SEQUENCING (NGS)

NGS technologies allow millions to billions of DNA or cDNA
sequences, with lengths between 10 and 400 bp, to be determined in a single run within a few days. NGS has had a
great impact on molecular biology in several research fields, such as metagenomics,
comparative genomics, small RNA profiling, high-throughput polymorphism detection,
detection and identification of plant pathogens, mutation screening, transcriptome profiling,
methylation profiling and chromatin remodeling, among other applications [67]. Compared to
Sanger sequencing, NGS does not require in vivo cloning, relying instead on clonal amplification, and does
not need target-specific primers. The major advantages are deep sequencing coverage that can
identify genes expressed at low levels, simplified library construction and sequencing of entire genomes at
relatively low cost compared to other protocols, all of which make this technology attractive
for a wide range of applications in host-pathogen interaction studies. NGS technologies
include two major approaches, sequencing by synthesis (SBS) and sequencing by
oligo ligation and detection (SOLiD), both with accuracy close to 99.99% [68]. In
parallel, advances in bioinformatics tools have made possible the relatively rapid analysis of
these massive amounts of data. Several software packages and tools have recently been implemented
and, run from the Linux command line, they allow more robust analyses with less
time and labor. Examples of the use of NGS include sequencing of the PLYV whole genome [36],
the PMeV whole genome [17] and the Hainan isolate of PRSV [69], cDNA library sequencing [70]
and profiling of miRNAs responsive to PMeV infection [71].
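
The "deep sequencing coverage" mentioned above can be made concrete with the usual mean-depth estimate: number of reads multiplied by read length, divided by genome length. The sketch below applies it to a genome the size of PLDMV (about 10,153 nt, as given earlier in this chapter). The 100-nt read length and the target depth of 1,000 are assumptions chosen for illustration, and the function names are hypothetical. In a real sample only a fraction of the reads derives from the virus, so the total number of reads needed would be correspondingly larger.

```python
import math


def mean_coverage(n_reads: int, read_len: int, genome_len: int) -> float:
    """Expected mean sequencing depth: total sequenced bases / genome length."""
    if min(n_reads, read_len, genome_len) <= 0:
        raise ValueError("all arguments must be positive")
    return n_reads * read_len / genome_len


def reads_for_depth(target_depth: float, genome_len: int, read_len: int) -> int:
    """Smallest number of reads whose total bases reach the target mean depth."""
    if target_depth <= 0 or genome_len <= 0 or read_len <= 0:
        raise ValueError("all arguments must be positive")
    return math.ceil(target_depth * genome_len / read_len)


if __name__ == "__main__":
    GENOME_LEN = 10_153  # approximate PLDMV genome length in nt (from the text)
    READ_LEN = 100       # assumed read length, within the 10-400 bp range
    TARGET_DEPTH = 1000  # assumed target mean depth

    needed = reads_for_depth(TARGET_DEPTH, GENOME_LEN, READ_LEN)
    print("viral reads needed:", needed)
    print("mean depth achieved:", mean_coverage(needed, READ_LEN, GENOME_LEN))
```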
4. PAPAYA VIRAL DISEASES MANAGEMENT

Several strategies have been recommended for papaya virus control. These mainly include
weekly inspections and eradication of symptomatic or virus-infected plants; the use of
virus-free certified plantlets and of seeds from healthy papaya; disinfection of agricultural
tools to help prevent mechanical transmission; elimination of volunteer plants and
abandoned papaya crops; other cultural practices that reduce human-assisted virus
transmission; planting in virus-free areas; cross protection; virus-tolerant or resistant cultivars;
transgenic plants with engineered genetic resistance; and avoiding intercropping with
cucurbits or weeds that may act as alternative hosts [18, 72-75].

Transgenic papaya overexpressing sense or antisense virus-derived genomic fragments
has efficiently yielded virus-resistant plants through activation of the RNA interference
system against viral infection, in which dsRNA triggers post-transcriptional gene
silencing [74, 76-79]. Pathogen-derived resistance in transgenic papaya is currently the most
effective means of controlling papaya viral diseases [80, 81]. The adoption of genetically modified
papaya mainly reduces the sources of initial inoculum and, in association with other
measures, may contribute to more durable resistance [82]. However, genetic variability,
recombination events and selection pressure exerted on the viral population can lead to
resistance breakdown in the long run and to the emergence of more virulent virus strains [83, 84]. In
addition, a non-transgenic approach to PRSV control has been tested using intron-containing
hairpin RNA (ihpRNA) derived from the PRSV CP gene, produced in and purified from the
RNase III-deficient E. coli strain M-JM109lacY [85]. These findings indicate that directly
spraying papaya with ihpRNA or crude bacterial extract preparations can interfere with virus
infection.
Sources of resistance to papaya viruses may assist in disease management, but there is
still little information about virus-resistant genotypes [12, 86]. Conventional genetic
breeding in papaya is also limited by intrinsic characteristics of this plant, such as sexual
incompatibility and genetic vulnerability [87]. Promising results were obtained by Sudha et
al. [88], Siar et al. [89] and Dinesh et al. [90] using intergeneric crosses between the
PRSV-resistant wild species Vasconcellea cauliflora and commercial varieties of C. papaya.
To date, only a few mild strains of the papaya viruses are known and, therefore,
cross-protection is not a strategy that can usually be considered for controlling these viruses.
Although there are a few successful examples for some papaya viral species [91, 92], the approach
has seen little acceptance and adoption by papaya growers. Systemic resistance to PRSV has also
been induced in papaya plants using antiviral agents such as Boerhaavia diffusa root extracts and
Clerodendron aculeatum leaf extracts, which were significantly effective [93, 94].
Managing disease through control of insect vectors in papaya orchards is not economically
viable, because most insect vectors visit but do not colonize papaya and acquire and
transmit the virus within a few seconds, so chemical control is not effective
[15]. Biological control of aphids in papaya has great potential, but it is
not yet economically viable at the field level [95].
5. ADVANCES IN PAPAYA-VIRUSES INTERACTION KNOWLEDGE AND PERSPECTIVES

Recent advances have been made in the serological and molecular diagnosis of the
papaya viruses. Moreover, new knowledge has emerged about some mechanisms of transmission,
spread and management, and about the molecular mechanisms of the papaya-virus
interaction.
Gao et al. [96] showed that the PRSV NIa-Pro protein interacts with papaya methionine
sulfoxide reductase B1 (PaMsrB1), suggesting that this interaction could disturb the import of PaMsrB1 into
the chloroplasts, where it scavenges ROS generated by PRSV infection, and proposed a novel mechanism
by which PRSV counters the host defense. Similarly, Gao et al. [47] validated in vivo the interaction of papaya
eukaryotic translation initiation factor 3 subunit G (CpeIF3G) with PRSV NIa-Pro.
CpeIF3G acts in several stress responses by enhancing the translation of defense-related
proteins, and the interaction with NIa-Pro may impair assembly of the translation pre-initiation
complex for host defense proteins.
WRKY and C2H2 transcription factors (TFs) play important roles in responses to several abiotic and
biotic stresses. Pan and Jiang [97], and Jiang and Pan [98] evaluated the expression
profile of several WRKY and C2H2 TFs in C. papaya, and four and two of these TFs,
respectively, were up-regulated by PRSV infection. Similarly, Haireen and Drew [99]
identified a new PRSV-resistance gene in V. pubescens, named VP_STK2, which has
an additional peroxisomal targeting signal. These results suggest that some of these genes can
be used for the development of PRSV-tolerant papaya. Proteomic approaches applied to
virus-infected papaya have also revealed new regulatory mechanisms of the virus-papaya
interaction that might help in the development of virus-tolerant papaya [100-102].
Micro RNAs (miRNAs) are implicated in the defense response to abiotic and biotic
stresses and modulate several biological processes [103]. During infection, the virus and
papaya engage in a molecular battle. Abreu et al. [71] showed that several miRNAs are differentially
expressed (up- or down-regulated) during viral infection of papaya and that these miRNAs target
resistance genes, transcription factors and genes involved in common biological processes
(some of which may correlate with the induction of symptoms in these plants). It was also found
that this regulation is dynamic and is regulated by miRNAs from many viruses so as to render
the plant susceptible. However, many other miRNAs are regulated by the host as a means of
activating defense mechanisms. Further, the overexpression or knockout of specific
MIR genes may confer greater tolerance to papaya viruses. In addition to these factors,
numerous others, some already well known and others still emerging, illustrate
the extreme complexity of the virus-papaya interaction [104].
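
The notion of miRNAs being "up- or down-regulated" during infection is commonly expressed as a log2 fold change between infected and control samples. The following is a minimal sketch of that classification, assuming normalized read counts are already available. The miRNA names, the counts, the pseudocount and the two-fold threshold are all hypothetical choices for illustration and do not reproduce the analysis of Abreu et al. [71]. A real study would also use biological replicates and statistical testing.

```python
import math


def classify_mirna(control: float, infected: float,
                   threshold: float = 1.0, pseudocount: float = 1.0):
    """Label a miRNA as 'up', 'down' or 'unchanged' from normalized counts.

    The pseudocount avoids division by zero for undetected miRNAs; a
    threshold of 1.0 on the log2 scale corresponds to a two-fold change.
    """
    if control < 0 or infected < 0:
        raise ValueError("counts must be non-negative")
    log2_fc = math.log2((infected + pseudocount) / (control + pseudocount))
    if log2_fc >= threshold:
        return "up", log2_fc
    if log2_fc <= -threshold:
        return "down", log2_fc
    return "unchanged", log2_fc


if __name__ == "__main__":
    # Hypothetical normalized counts: (control, infected).
    counts = {
        "miR-A": (99.0, 399.0),
        "miR-B": (399.0, 99.0),
        "miR-C": (100.0, 110.0),
    }
    for name, (control, infected) in counts.items():
        label, log2_fc = classify_mirna(control, infected)
        print(f"{name}: {label} (log2 fold change = {log2_fc:.2f})")
```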
CONFLICTS OF INTEREST
The authors declare no conflict of interest.
REFERENCES
[1] G. Fuentes, J.M. Santamaría, Papaya (Carica papaya L.): origin, domestication, and
production, in: R. Ming, P.H. Moore (Eds.) Genetics and genomics of papaya, Springer
New York Heidelberg Dordrecht London, 2014, pp. 3-16.
[2] R. Ming, S. Hou, Y. Feng, Q. Yu, A. Dionne-Laporte, J.H. Saw, P. Senin, W. Wang,
B.V. Ly, K.L.T. Lewis, S.L. Salzberg, L. Feng, M.R. Jones, R.L. Skelton, J.E. Murray,
C. Chen, W. Qian, J. Shen, P. Du, M. Eustice, E. Tong, H. Tang, E. Lyons, R.E. Paull,
T.P. Michael, K. Wall, D.W. Rice, H. Albert, M.-L. Wang, Y.J. Zhu, M. Schatz, N.
Nagarajan, R.A. Acob, P. Guan, A. Blas, C.M. Wai, C.M. Ackerman, Y. Ren, C. Liu, J.
Wang, J. Wang, J.-K. Na, E.V. Shakirov, B. Haas, J. Thimmapuram, D. Nelson, X.
Wang, J.E. Bowers, A.R. Gschwend, A.L. Delcher, R. Singh, J.Y. Suzuki, S. Tripathi,
K. Neupane, H. Wei, B. Irikura, M. Paidi, N. Jiang, W. Zhang, G. Presting, A. Windsor,
R. Navajas-Perez, M.J. Torres, F.A. Feltus, B. Porter, Y. Li, A.M. Burroughs, M.-C.
Luo, L. Liu, D.A. Christopher, S.M. Mount, P.H. Moore, T. Sugimura, J. Jiang, M.A.
Schuler, V. Friedman, T. Mitchell-Olds, D.E. Shippen, C.W. dePamphilis, J.D. Palmer,
M. Freeling, A.H. Paterson, D. Gonsalves, L. Wang, M. Alam, The draft genome of the
transgenic tropical fruit tree papaya (Carica papaya Linnaeus), Nature, 452 (2008) 991-
996.
[3] V.M. Jiménez, E. Mora-Newcomer, M.V. Gutiérrez-Soto, Biology of the papaya plant,
in: R. Ming, P.H. Moore (Eds.) Genetics and genomics of papaya, Springer New York
Heidelberg Dordrecht London, 2014, pp. 17-34.
[4] J.A.T. da Silva, Z. Rashid, D.T. Nhut, D. Sivakumar, A. Gera, M.T. Souza Jr., P.F. Tennant,
Papaya (Carica papaya L.) biology and biotechnology, Tree and Forestry Science and
Biotechnology, 1 (2007) 47-73.
[5] B.W. Porter, D.A. Christopher, Y.J. Zhu, Genomics of papaya disease resistance, in: R.
Ming, P.H. Moore (Eds.) Genetics and genomics of papaya, Springer New York
[6] R.C.A. Lima, J.A.A. Lima, M.T. Souza Junior, G. Pio-Ribeiro, G.P. Andrade, Etiologia
e estratégias de controle de viroses do mamoeiro no Brasil, Fitopatologia Brasileira, 26
(2001) 689-702.
[7] S. Tripathi, J.Y. Suzuki, S.A. Ferreira, D. Gonsalves, Papaya ringspot virus-P:
characteristics, pathogenicity, sequence variability and control, Molecular Plant
Pathology, 9 (2008) 269-280.
[8] J.A. Ventura, H. Costa, J.d.S. Tatagiba, Papaya diseases and integrated control, in:
S.A.M.H. Naqvi (Ed.) Diseases of Fruits and Vegetables, Kluwer Academic Publishers,
2004, pp. 201-268.
[9] K.J. Chandra, D.K. Samuel, Viral and phytoplasmal diseases of papaya in India, in:
L.R.V.a.R.C. Sharma (Ed.) Diseases of Horticultural crops-fruits, Indus. Publishing
Co., New Delhi, 1999, pp. 493-515.
[10] P.F. Tennant, G.A. Fermin, R.E. Roye, Viruses infecting papaya (Carica papaya L.):
etiology, pathogenesis, and molecular biology, Plant Viruses, 1 (2007) 178-188.
[11] D. Gonsalves, Control of Papaya ringspot virus in papaya: A case study, Annual
Review Phytopathology, 36 (1998) 415-437.
[12] M.F. Basso, A.J. Pereira, H.M.B. Pereira, H.J.O. Ramos, J.L.L. Dantas, E.P.B. Fontes,
E.C. Andrade, F.M. Zerbini, Screening of papaya accessions resistant to Papaya lethal
yellowing virus and capacity of Tetranychus urticae to transmit the virus, Pesquisa
Agropecuária Brasileira, 50 (2015) 97-105.
[13] C.B. Daltro, A.J. Pereira, R.S. Cascardo, P. Alfenas-Zerbini, J.E.A. Beserra Jr, J.A.A.
Lima, F.M. Zerbini, E.C. Andrade, Genetic variability of Papaya lethal yellowing virus
isolates from Ceará and Rio Grande do Norte states, Brazil, Tropical Plant Pathology,
37 (2012) 37-43.
[14] A.K.Q. Nascimento, J.A.A. Lima, A.L.L. Nascimento, E.A.J. Beserra, D. Purcifull,
Biological, physical, and molecular properties of a Papaya lethal yellowing virus
isolate, Plant Disease, 94 (2010) 1206-1212.
[15] J.A.A. Lima, A.K.Q. Nascimento, R.C.A. Lima, V.C. Oliveira, G.C. Anselmo, Methods
of diagnosis, stability, transmission, and host interaction of Papaya lethal yellowing
virus in papaya, in: R.K. Gaur, T. Hohn, P. Sharma (Eds.) Plant Virus-Host Interaction,
Elsevier Inc., Academic Press Elsevier, 2015, pp. 207-228.

[16] P.M.V. Abreu, T.F.S. Antunes, A. Magaña-Álvarez, D. Pérez-Brito, R. Tapia-Tussell,
J.A. Ventura, A.A.R. Fernandes, P.M.B. Fernandes, A current overview of the Papaya
meleira virus, an unusual plant virus, Viruses, 7 (2015) 1853-1870.
[17] E.F.M. Abreu, C.B. Daltro, E.O.P.L. Nogueira, E.C. Andrade, F.J.L. Aragão, Sequence
and genome organization of Papaya meleira virus infecting papaya in Brazil, Archives
of Virology, (2015).
[18] G. Fermin, D. Gonsalves, Engineering resistance against Papaya ringspot virus by
native, chimeric and synthetic transgenes, in: G.L.a.G. Thottappilly (Ed.) Virus and
virus-like diseases of major crops in developing Countries, Kluwer Academic
Publishers, 2003, pp. 497-528.
[19] T. Maoka, T. Hataya, The complete nucleotide sequence and biotype variability of
Papaya leaf distortion mosaic virus, Phytopathology, 95 (2005) 128-135.
[20] R. Hernandez, M. Suazo, P. Toledo, The papaya apical necrosis virus, a new viral
disease in Villa Clara, Cuba, Ciencia y Tecnica en la Agricultura, Protection de
Plantas, 13 (1990) 29-36.
[21] Y. Wang, W. Shen, S. Wang, D. Tuo, P. Yan, X. Li, P. Zhou, Complete genomic
sequence of Papaya mosaic virus isolate from Hainan Island, China, Chinese Journal of
Tropical Crops 34 (2013) 297-300.
[22] D. Gonsalves, E.E. Trujillo, Tomato spotted wilt virus in papaya and detection of the
virus by ELISA, Plant Disease, 70 (1986) 501-506.
[23] E. Marys, O. Carballo, M.L. Izaguirre-Mayoral, Properties of a previously undescribed
supercoiled filamentous virus infecting papaya in Venezuela, Archives of Virology,
140 (1995) 891-898.
[24] L.S. Chang, Y.S. Lee, H.J. Su, T.H. Hung, First report of Papaya leaf curl virus
infecting papaya plants in Taiwan, Plant Disease, 87 (2003) 204.
[25] P. Singh-Pant, P. Pant, S.K. Mukherjee, S. Mazumdar-Leighton, Spatial and temporal
diversity of begomoviral complexes in papayas with leaf curl disease, Archives of
Virology, 157 (2012) 1217-1232.
[26] R.C. Lindner, D.D. Jensen, W. Ikeda, Ringspot: new papaya plunderer, Hawaii Farm
and Home, 8 (1945) 10-14.
[27] S.P. Capoor, P.M. Varma, A mosaic disease of papaya in Bombay, Indian Journal
Agriculture Science, 29 (1958) 225-233.
[28] S.K. Sharma, S. Tripathi, Papaya ringspot virus-P: overcoming limitations of resistance
breeding in Carica papaya L., in: R.K. Gaur, T. Hohn, P. Sharma (Eds.) Plant virus-
host interaction, Academic Press, 2014, pp. 177-194.
[29] D.F. Quito-Avila, R.A. Alvarez, M.A. Ibarra, R.R. Martin, Detection and partial
genome sequence of a new umbra-like virus of papaya discovered in Ecuador,
European Journal of Plant Pathology, 143 (2015) 199-204.
[30] R. Verma, P. Gaikwad, D. Mungekar, S. Tripathi, V. Datar, J. Singh, First report of
mixed infection of Papaya ringspot virus and phytoplasma in papaya in India, Journal
of Plant Pathology, 96 (2014) 438.
[31] S.D. Yeh, D. Gonsalves, Translation of papaya ringspot virus RNA in vitro: detection
of a possible polyprotein that is processed for capsid protein, cylindrical-inclusion
protein, and amorphous-inclusion protein, Virology, 143 (1985) 260-271.
[32] T.J.G. Loreto, A.F. Vital, J.A.M. Rezende, Ocorrência de um amarelo letal do
mamoeiro solo no estado de Pernambuco, O Biológico, 49 (1983) 275-279.

[33] M.F. Basso, A.J. Pereira, A.N. Souza, E.C.d. Andrade, F.M. Zerbini, Genetic resistance
in papaya germplasm against Papaya lethal yellowing virus (PLYV) and physiology of
healthy and PLYV-infected papaya leaves, in: S.B.d. Genética (Ed.) IV Simpósio
Brasileiro de Genética Molecular de Plantas, online, Bento Gonçalves, RS, Brazil,
2013, pp. 47.
[34] P.P. Amaral, R.O. Resende, M.T. Sousa Junior, Papaya lethal yellowing virus (PLYV)
infects Vasconcellea cauliflora, Fitopatologia Brasileira, 31 (2006) 517.
[35] E. Truve, D. Fargette, Genus Sobemovirus, in: A.M.Q. King, M.J. Adams, E.B.
Carstens, E.J. Lefkowitz (Eds.) Virus taxonomy: ninth report of the International
Committee on Taxonomy of Viruses, Elsevier, London, 2012, pp. 1185‑1189.
[36] A.J. Pereira, P. Alfenas-Zerbini, R.S. Cascardo, E.C. Andrade, F.M. Zerbini, Analysis
of the full-length genome sequence of Papaya lethal yellowing virus (PLYV),
determined by deep sequencing, confirms its classification in the genus Sobemovirus,
Archives of Virology, 157 (2012) 2009-2011.
[37] A.C.M. Saraiva, W.O. Paiva, F.A.C. Rabelo Filho, J.A.A. Lima, Transmissão por mãos
contaminada e ausência de transmissão embrionária do vírus do amarelo letal do
mamoeiro, Fitopatologia Brasileira, 31 (2006) 79-83.
[38] R.F.E.A. Camarço, J.A.A. Lima, G. Pio-Ribeiro, Transmissão e presença em solo do
Papaya lethal yellowing virus, Fitopatologia Brasileira, 23 (1998) 453-458.
[39] E.W. Kitajima, F.C. Oliveira, C.R.S. Pinheiro, L.M. Soares, K. Pinheiro, M.C. Madeira,
M. Chagas, Amarelo letal do mamoeiro solo no estado do Rio Grande do Norte,
Fitopatologia Brasileira, 17 (1992) 282-285.
[40] E.W. Kitajima, C.H. Rodrigues, J.S. Silveira, F.J.L. Alves, J.A. Ventura, F.J.L. Aragão,
C.R.B. Oliveira, Association of isometric virus-like particles, restricted to laticifers,
with meleira (sticky disease) of papaya (Carica papaya), Fitopatologia Brasileira, 18
(1993) 118-122.
[41] E. Marciel-Zambolim, S. Kunieda-Alonso, K. Matsuoka, M.G. Carvalho, F.M. Zerbini,
Purification and some properties of Papaya meleira virus, a novel virus infecting
papayas in Brazil, Plant Pathology, 52 (2003) 389-394.
[42] D. Perez-Brito, R. Tapia-Tussell, A. Cortes-Velazquez, A. Quijano-Ramayo, First
report of Papaya meleira virus (PMeV) in Mexico, African Journal of Biotechnology,
11 (2012) 13564-13570.
[43] S. Rodrigues, M. Da Cunha, J. Ventura, P. Fernandes, Effects of the Papaya meleira
virus on papaya latex structure and composition, Plant Cell Reports, 28 (2009) 861-
871.
[44] P.M.V. Abreu, J.G. Piccin, S.P. Rodrigues, D.S. Buss, J.A. Ventura, P.M.B. Fernandes,
Molecular diagnosis of Papaya meleira virus (PMeV) from leaf samples of Carica
papaya L. using conventional and real-time RT-PCR, Journal of Virological Methods,
180 (2012) 11-17.
[45] S.P. Rodrigues, J.S. Andrade, J.A. Ventura, G.G. Lindsey, P.M.B. Fernandes, Papaya
meleira virus is neither transmitted by infection at wound sites nor by the whitefly
Trialeurodes variabilis, Journal of Plant Pathology, 91 (2009) 87-91.
[46] R. Tapia-Tussell, A. Magaña-Alvarez, A. Cortes-Velazquez, G. Itza-Kuk, A.
Nexticapan-Garcez, A. Quijano-Ramayo, R. Martin-Mex, D. Perez-Brito, Seed
transmission of Papaya meleira virus in papaya (Carica papaya) cv. Maradol, Plant
Pathology, 64 (2015) 272-275.
[47] C.B. Daltro, E.M. Abreu, F.L. Aragão, E.C. Andrade, Genetic diversity studies of
Papaya meleira virus, Tropical Plant Pathology, 39 (2014) 104-108.
[48] M.M.M.d. Araújo, É.T. Tavares, F.R.d. Silva, V.L.d.A. Marinho, M.T.S. Júnior,
Molecular detection of Papaya meleira virus in the latex of Carica papaya by RT-PCR,
Journal of Virological Methods, 146 (2007) 305-310.
[49] H.L. Wang, C.C. Wang, R.J. Chiu, M.H. Sun, Preliminary study on Papaya ringspot
virus in Taiwan, Plant Protection Bulletin, 20 (1978) 133-140.
[50] M.M. Fitch, R.M. Manshardt, D. Gonsalves, J.L. Slightom, J.C. Sanford, Stable
transformation of papaya via microprojectile bombardment, Plant Cell Reports, 9
(1990) 189-194.
[51] D. Tuo, W. Shen, P. Yan, C. Li, L. Gao, X. Li, H. Li, P. Zhou, Complete genome
sequence of an isolate of Papaya leaf distortion mosaic virus from commercialized
PRSV-resistant transgenic papaya in China, Acta Virologica, 57 (2013) 452-455.
[52] S. Kawano, T. Yonaha, The occurrence of Papaya leaf distortion mosaic virus in
Okinawa, Technical Bulletin of Food and Fertilizer Technology Center, 132 (1992) 13-
23.
[53] T. Maoka, S. Kawano, T. Usugi, Occurrence of the P strain of Papaya ringspot virus in
Japan, Annals of the Phytopathological Society of Japan, 61 (1995) 91-94.
[54] W. Shen, D. Tuo, Y. Yang, P. Yan, X. Li, P. Zhou, First report of mixed infection of
Papaya ringspot virus and Papaya leaf distortion mosaic virus on Carica papaya L.,
Journal of Plant Pathology, 96 (2015) S4.121.
[55] H.J. Bau, Y.J. Kung, J.A.J. Raja, S.J. Chan, K.C. Chen, Y.K. Chen, H.W. Wu, S.D.
Yeh, Potential threat of a new pathotype of Papaya leaf distortion mosaic virus
infecting transgenic papaya resistant to Papaya ringspot virus, Phytopathology, 98
(2008) 848-856.
[56] D. Tuo, W. Shen, Y. Yang, P. Yan, X. Li, P. Zhou, Development and validation of a
multiplex reverse transcription PCR assay for simultaneous detection of three papaya
viruses, Viruses, 6 (2014) 3893-3906.
[57] A.A. Cook, F.W. Zettler, Susceptibility of papaya cultivars to papaya ringspot and
papaya mosaic virus, Plant Disease Report, 54 (1970) 893-895.
[58] R.A. Conover, Virus diseases of the papaya in Florida, Phytopathology, 52 (1962) 6.
[59] Y.J. Kung, H.J. Bau, Y.L. Wu, C.H. Huang, T.M. Chen, S.D. Yeh, Generation of
transgenic papaya with double resistance to Papaya ringspot virus and Papaya leaf-
distortion mosaic virus, Phytopathology, 99 (2009) 1312-1320.
[60] Y. Maheshwari, H.N. Verma, R.K. Jain, K. Mandal, Engineered antibody fragments for
immunodiagnosis of Papaya ringspot virus, Molecular Biotechnology, 57 (2015) 644-
652.
[61] A.M.R. Almeida, J.A.A. Lima, Princípios e técnicas aplicados em fitovirologia,
Londrina: Embrapa Soja, Edições Sociedade Brasileira Fitopatologia, 2001.
[62] H. Zhang, X.-y. Ma, Y.-j. Qian, X.-p. Zhou, Molecular characterization and infectivity
of Papaya leaf curl China virus infecting tomato in China, Journal of Zhejiang
University SCIENCE B, 11 (2010) 109-114.

[63] D.N. Muske, A. Peter, Swetha, S. Phadnis, P. Jingade, K.K. Satish, Molecular and
serological detection of Papaya ringspot virus infecting papaya (Carica papaya),
Journal of Plant Disease Sciences, 9 (2014) 8-15.
[64] T.R. Usharani, V. Laxmi, S. Jalali, M. Krishnareddy, Duplex PCR to detect both
Papaya ring spot virus and Papaya leaf curl virus simultaneously from naturally
infected papaya (Carica papaya L.), Indian Journal of Biotechnology, 12 (2013) 269-
272.
[65] M. Sreenivasulu, D.V.R. SaiGopal, Development of recombinant coat protein antibody
based IC-RT-PCR and comparison of its sensitivity with other immunoassays for the
detection of Papaya ringspot virus isolates from India, Plant Pathology Journal of
Botany, 26 (2010) 25-31.
[66] W. Shen, D. Tuo, P. Yan, X. Li, P. Zhou, Detection of Papaya leaf distortion mosaic
virus by reverse-transcription loop-mediated isothermal amplification, Journal of
Virological Methods, 195 (2014) 174-179.
[67] A.D. Radford, D. Chapman, L. Dixon, J. Chantrey, A.C. Darby, N. Hall, Application of
next-generation sequencing technologies in virology, Journal of General Virology, 93
(2012) 1853-1868.
[68] C. Knief, Analysis of plant microbe interactions in the era of next generation
sequencing technologies, Frontiers in Plant Science, 5 (2014).
[69] Y. Zhang, N. Yu, Q. Huang, G. Yin, A. Guo, X. Wang, Z. Xiong, Z. Liu, Complete
genome of Hainan Papaya ringspot virus using small RNA deep sequencing, Virus
Genes, 48 (2014) 502-508.
[70] E. Zamudio-Moreno, J.H. Ramirez-Prado, O.A. Moreno-Valenzuela, L.A. Lopez-
Ochoa, Early diagnosis of a Mexican variant of Papaya meleira virus (PMeV-Mx) by
RT-PCR, Genetics and Molecular Research, 14 (2015) 1145-1154.
[71] P.M.V. Abreu, C.G. Gaspar, D.S. Buss, J.A. Ventura, P.C.G. Ferreira, P.M.B.
Fernandes, Carica papaya microRNAs are responsive to Papaya meleira virus
infection, PloS One, 9 (2014) e103401.
[72] G.A. Fermin, L.T. Castro, P.F. Tennant, CP-transgenic and non-transgenic approaches
for the control of papaya ringspot: current situation and challenges, Transgenic Plant
Journal, 4 (2010) 1-15.
[73] M. Fuchs, D. Gonsalves, Safety of virus-resistant transgenic plants two decades after
their introduction: lessons from realistic field risk assessment studies, Annual Review of
Phytopathology, 45 (2008) 173-202.
[74] M.M.M. Fitch, R.M. Manshardt, D. Gonsalves, J.L. Slightom, J.C. Sanford, Virus
resistent papaya plants derived from tissues bombarded with the coat protein gene of
Papaya ring spot virus, Nature Biotechnology, 10 (1992) 1-7.
[75] P.J. Mansilla, A.G. Moreira, A.P.O.A. Mello, J.A.M. Rezende, J.A. Ventura, V.A.
Yuki, F.J. Levatti, Importance of cucurbits in the epidemiology of Papaya ringspot
virus type P, Plant Pathology, 62 (2013) 571-577.
[76] Y.-H. Cheng, J.-S. Yang, S.-D. Yen, Efficient transformation of papaya by coat protein
gene of Papaya ringspot virus mediated by Agrobacterium following liquid-phase
wounding of embryogenic tissues with caborundum, Plant Cell Reports, 16 (1996) 127-
132.

[77] R.E. Lines, D. Persley, J.L. Dale, R. Drew, M.F. Bateson, Genetically engineered
immunity to Papaya ringspot virus in Australian papaya cultivars, Molecular Breeding,
10 (2002) 119-129.
[78] P. Tennant, M.H. Ahmad, D. Gonsalves, Transformation of Carica papaya L. with
virus coat protein genes for studies on resistance to Papaya ringspot virus from
Jamaica, Tropical Agriculture, 79 (2002) 105-113.
[79] P. Tennant, M.T.S. Jr., D. Gonsalves, M.M. Fitch, R.M. Manshardt, J.L. Slightom,
Transgenic 63-1: A new virus-resistant transgenic papaya, HortScience, 40 (2005)
1196-1199.
[80] E.M. Tecson Mendoza, A. C. Laurena, J.R. Botella, Recent advances in the
development of transgenic papaya technology, in: M.R. El-Gewely (Ed.),
Biotechnology Annual Review, Elsevier, 2008, pp. 423-462.
[81] D. Gonsalves, Hawaii's transgenic papaya story 1978-2012: A personal account, in: R.
Ming, P.H. Moore (Eds.) Genetics and genomics of papaya, Springer New York
[82] M.A.K. Azad, L. Amin, N.M. Sidik, Gene technology for Papaya ringspot virus disease
management, The Scientific World Journal, 2014 (2014) 1-12.
[83] Y.-J. Kung, B.-J. You, J.A.J. Raja, K.-C. Chen, C.-H. Huang, H.-J. Bau, C.-F. Yang,
C.-H. Huang, C.-P. Chang, S.-D. Yeh, Nucleotide sequence-homology-independent
breakdown of transgenic resistance by more virulent virus strains and a potential
solution, Scientific Reports, 5 (2015) 1-10.
[84] G. Zhao, P. Yan, W. Shen, D. Tuo, X. Li, P. Zhou, Complete genome sequence of
Papaya ringspot virus isolated from genetically modified papaya in Hainan Island,
China, Genome Announcements, 3 (2015).
[85] W. Shen, G. Yang, Y. Chen, P. Yan, D. Tuo, X. Li, P. Zhou, Resistance of non-
transgenic papaya plants to Papaya ringspot virus (PRSV) mediated by intron-
containing hairpin dsRNAs expressed in bacteria Acta Virologica, 58 (2014) 261-266.
[86] R. Jayavalli, T.N. Balamohan, N. Manivannan, R. Rabindran, P. Paramaguru, R. Robin,
Transmission of resistance to Papaya ringspot virus (PRSV) inintergeneric populations
of Carica papaya and Vasconcellea cauliflora, Scientia Horticulturae, 187 (2015) 10-
14.
[87] S. Horovitz, H. Jimenez, Cruzamientos interspecificos y intergenericos in Caricaceas y
sus implicaciones fitotecnicas, Agronomia Tropical (Maracay), 17 (1967) 323-343.
[88] R. Sudha, T.N. Balamohan, K. Soorianathasundaram, N. Manivannan, R. Rabindran,
Evaluation of F2 intergeneric population of papaya (Carica papaya L.) for resistance to
Papaya ringspot virus (PRSV), Scientia Horticulturae, 158 (2013) 68-74.
[89] S.V. Siar, R.A. Drew, R.M. Razali, V.N. Villegas, Gene for PRSV-P resistance in
Vasconcellea species and development of PRSV-P resistant papaya via intergeneric
hybridisation, in, International Society for Horticultural Science (ISHS), Leuven,
Belgium, 2012, pp. 335-341.
[90] M.R. Dinesh, G.L. Veena, C. Vasugi, M. Krishna Reddy, K.V. Ravishankar,
Intergeneric hybridization in papaya for ‘PRSV’ tolerance, Scientia Horticulturae, 161
(2013) 357-360.
[91] B.J. You, C.H. Chiang, L.F. Chen, W.C. Su, S.D. Yeh, Engineered mild strains of
Papaya ringspot virus for broader cross protection in cucurbits, Phytopathology, 95
(2005) 533-540.

[92] P.F. Tennant, C. Gonsalves, K.S. Ling, M.M. Fitch, R. Manshardt, J.L. Slightom, D.
Gonsalves, Differential protection against Papaya ringspot virus isolates in coat protein
gene transgenic papaya and classically cross-protected papaya, Phytopathology, 84
(1994) 1359-1366.
[93] S. Singh, L.P. Awasthi, R.K. Singh, Induction of systemic resistance through antiviral
agents of plant origin against papaya ring spot disease (Carica papaya L.), Archives of
Phytopathology and Plant Protection, 44 (2011) 1676-1682.
[94] A. Srivastava, S. Trivedi, S. Krishna, H.N. Verma, V. Prasad, Suppression of Papaya
ringspot virus infection in Carica papaya with CAP-34, a systemic antiviral resistance
inducing protein from Clerodendrum aculeatum, European Journal of Plant Pathology,
123 (2009) 241-246.
[95] S.K. Ghosh, N. Chakraborty, P.P. Biswas, In vitro biological control of aphid of papaya
by Beauveria bassiana, in, International Society for Horticultural Science (ISHS),
Leuven, Belgium, 2014, pp. 113-117.
[96] L. Gao, W. Shen, P. Yan, D. Tuo, X. Li, P. Zhou, NIa-pro of Papaya ringspot virus
interacts with papaya methionine sulfoxide reductase B1, Virology, 434 (2012) 78-87.
[97] L.J. Pan, L. Jiang, Identification and expression of the WRKY transcription factors of
Carica papaya in response to abiotic and biotic stresses, Molecular Biology Reports, 41
(2014) 1215-1225.
[98] L. Jiang, L.-j. Pan, Identification and expression of C2H2 transcription factor genes in
Carica papaya under abiotic and biotic stresses, Molecular Biology Reports, 39 (2012)
7105-7115.
[99] M.R. Razean Haireen, R.A. Drew, Isolation and characterisation of PRSV-P resistance
genes in Carica and Vasconcellea, International Journal of Genomics, 2014 (2014) 8.
[100] S.P. Rodrigues, J.A. Ventura, C. Aguilar, E.S. Nakayasu, I.C. Almeida, P.M.B.
Fernandes, R.B. Zingali, Proteomic analysis of papaya (Carica papaya L.) displaying
typical sticky disease symptoms, Proteomics, 11 (2011) 2592-2602.
[101] W. Siriwan, S. Roytrakul, M. Shimizu, N. Takaya, S. Chowpongpang, Proteomics of
Papaya ringspot virus-infected papaya leaves, Kasetsart Journal: Natural Science, 47
(2013) 589-602.
[102] S.P. Rodrigues, J.A. Ventura, C. Aguilar, E.S. Nakayasu, H. Choi, T.J.P. Sobreira, L.L.
Nohara, L.S. Wermelinger, I.C. Almeida, R.B. Zingali, P.M.B. Fernandes, Label-free
quantitative proteomics reveals differentially regulated proteins in the latex of sticky
diseased Carica papaya L. plants, Journal of Proteomics, 75 (2012) 3191-3198.
[103] M.W. Jones-Rhoades, D.P. Bartel, B. Bartel, MicroRNAS and their regulatory roles in
plants, Annual Review of Plant Biology, 57 (2006) 19-53.
[104] N. Sahana, H. Kaur, Basavaraj, F. Tena, R.K. Jain, P. Palukaitis, T. Canto, S. Praveen,
Inhibition of the host proteasome facilitates Papaya ringspot virus accumulation and
proteosomal catalytic activity is modulated by viral factor HcPro, PloS One, 7 (2012)
e52546.

Chapter 9
PHYLOGENETICS AND PHYLOGEOGRAPHY OF TWO LARGE NEOTROPICAL RODENTS
(CAPYBARA, HYDROCHOERUS HYDROCHAERIS, HYDROCHAERIDAE AND PACA, CUNICULUS PACA,
AGOUTIDAE; RODENTIA) BY MEANS OF MITOCHONDRIAL GENES: OPPOSITE PATTERNS
Manuel Ruiz-García1,∗, Kelly Luengas-Villamil1, Leslie Leal2,
Luz Mery Bernal-Parra2 and Joseph Mark Shostell3
1 Laboratorio de Genética de Poblaciones-Biología Evolutiva,
Unidad de Genética, Departamento de Biología, Facultad de Ciencias,
Pontificia Universidad Javeriana, Bogotá DC., Colombia
2 Escuela de Ciencias Agrícolas Pecuarias y del Medio Ambiente,
Universidad Nacional Abierta y a Distancia, Bogotá DC., Colombia
3 Math, Science and Technology Department,
University of Minnesota Crookston, Crookston, MN, US
∗ Correspondence: mruizgar@yahoo.es, mruiz@javeriana.edu.co.
ABSTRACT
We analyzed mitochondrial genes (D-loop and Cyt-b) to compare the genetic
structure and phylogeography of the capybara (Hydrochoerus hydrochaeris, n =
78) and the paca (Cuniculus paca, n = 120). Both species presented very high levels
of gene diversity for both mitochondrial markers, but the paca yielded higher levels than
the capybara. The capybara showed a noteworthy and significant amount of genetic
heterogeneity among different populations, although the mt D-loop gene was more useful
in differentiating the populations than was mt Cyt-b. In contrast, the paca yielded low
levels of genetic heterogeneity among different populations, and the two mitochondrial
genes showed similarly low heterogeneity. Estimations of Bayesian
female effective numbers indicated that the paca had higher values than the capybara.
For both species, mt Cyt-b yielded higher effective sizes than did mt D-loop. Similarly,
the Bayesian gene flow estimates were considerably greater among paca populations than
among capybara populations. Different analyses revealed population expansions in both
species. Only the capybara population of Northern Colombia showed some evidence of a
population bottleneck. An isolation-by-distance analysis showed a strongly positive and
significant relationship between genetic and geographic distances for the capybara,
whereas among paca populations there was no significant relationship. Our phylogenetic
analyses suggest that the capybara is affected by geographical barriers. This agrees
well with the fact that the dispersal of the capybara is restricted by the existence of
rivers. Our results did not support similar findings for the paca, and therefore could not
confirm any putative evolutionarily significant units (ESUs) or subspecies for the paca.
Furthermore, our results suggest that the mitochondrial haplotype splits of both species
occurred during the Miocene, but were older in the capybara than in the paca. In the case
of the capybara, the original focus of dispersal seems to be the Western Amazon, whereas
for the paca this origin is not clear. Although many authors consider the trans-Andean
capybara population a different species (H. isthmus), our molecular results suggest that
this population is a geographical subspecies.
Keywords: capybara, paca, Hydrochoerus hydrochaeris, Cuniculus paca, mt D-loop, mt
Cyt-b, gene diversity, gene flow, population changes, spatial structure, phylogenetic
analyses
INTRODUCTION
Molecular genetics studies continue to provide useful phylogeographic and population
genetic information on many animal species. However, not all species have been equally
represented in the literature by these types of studies, and the capybara (Hydrochoerus
hydrochaeris) and the paca (Cuniculus paca) are among the underrepresented ones. The capybara
[also named poncho (Panama), chigüiro (Colombia), chigüire (Venezuela), ronsoco (Peru),
carpincho (Argentina, Uruguay, Paraguay)] and the paca [also named tepezcuintle (Mexico),
gibnot (Belize), conejo pintado (Panama), lapa (Venezuela), guagua or borugo (Colombia),
majaz or picuro (Peru) and jochi pintado (Bolivia)] are the two largest Neotropical
rodents.
The geographical distribution of the capybara is divided into two populations. A trans-Andean
population extends from middle Panama (from the Panama Canal to the Darien forests) to
Northern and Pacific Colombia (Atlantic area, lower valleys of the Sinu, Atrato and Cauca
rivers, lower and middle valleys of the Magdalena and Cesar rivers and some populations on
the Pacific coast, Choco and Valle Departments; Mendoza, 1991) and Northwestern
Venezuela (around Lake Maracaibo). The cis-Andean population is distributed from
Eastern Colombia (Llanos Orientales) through the rest of Venezuela, Guyana, Surinam,
French Guiana, the Amazonian areas of Ecuador, Peru and Bolivia, and the major part of
Brazil, to Paraguay, Uruguay and Northern Argentina (up to the Quequén River in the province
of Buenos Aires). Therefore, capybaras inhabit the basins of the main South American rivers:
the Orinoco, Amazon, Sao Francisco and Parana-La Plata. Initially, the first population was
considered a differentiated subspecies (H. h. isthmus), whilst the second population was
considered another subspecies, H. h. hydrochaeris. However, Goldman (1912) claimed that
the Panama population could be a different species (H. isthmus). The individuals of this
population have a lower weight (26-28 kg) (Harrison-Matthews, 1977) and a shorter gestation
period (104-111 days compared to 115-125 days for the other capybara population; Trapido,
1949). Wilson and Reeder (2005) recognized both species on the basis that H. hydrochaeris
was larger in all external and cranial characters. In contrast, H. isthmus had a wider frontal
bone in proportion to total skull length, a proportionally longer lower diastema, and
shorter and thicker pterygoid bones than the other capybara population.
However, within H. hydrochaeris, no subspecies has been defined (Mones and Ojasti, 1986).
The geographical distribution of the paca ranges from Southern Mexico to Northern
Argentina (Misiones), including Central America, Cuba, the Lesser Antilles, Colombia,
Venezuela, Guyana, Surinam, French Guiana, Ecuador, Peru, Bolivia, Paraguay and a large
fraction of Brazil (except Northeast Brazil). Cabrera (1961) and Hall (1981) considered the
existence of five subspecies within this taxon: 1- C. p. nelsoni (Goldman in 1913; type
locality in Veracruz, Mexico); 2- C. p. virgatus (Bangs in 1902; type locality in Chiriqui,
Panama); 3- C. p. guanta (Lönnberg in 1921; type locality in Pichincha, Ecuador); 4- C. p.
mexianae (Hagmann in 1908; type locality in Para, Brazil) and 5- C. p. paca (Linnaeus in
1758; type locality in Pernambuco, Brazil). Another species, C. taczanowskii, has been
traditionally defined within this genus. It inhabits mountainous areas of the Andes cordillera
in Venezuela, Colombia, Ecuador, Peru and Bolivia.
Few genetic studies have been published on these two rodent species. For the capybara,
there are the studies by Campos-Krauer and Wisely (2011) and Maldonado et al. (2011).
The first analyzed 386 base pairs (bp) of the mitochondrial control region in 110
individuals from 13 populations in the Chaco (Paraguay). This study
identified two different haplogroups, which experienced rapid population
expansions in recent times with secondary contact due to anthropogenic land transformation.
The authors claimed that this capybara population expansion, with the two haplogroups
admixed throughout the High Chaco, could exacerbate the degradation of the forest. They
also considered the population to be a possible reservoir host of several zoonotic diseases.
The second work studied five DNA microsatellite loci in a capybara troop sampled in the
Casanare Department of Colombia (Eastern Llanos). This study revealed a medium level of
gene diversity, deviation from Hardy-Weinberg equilibrium at two markers, similar gene
frequencies among the males and females within the troop, and a very recent
bottleneck probably due to illegal hunting.
Two studies have considered pacas. Van Vuuren et al. (2004) analyzed the mitochondrial
control region and cytochrome-b (Cyt-b) of four rodent species in French Guiana, including
the paca. For this species, they analyzed 23 individuals from 10 French Guiana localities
and one specimen from Venezuela. They detected a population expansion for this species in
that area, a lack of genetic structure, and a temporal split among four different lineages
around 46,000 to 66,000 years ago (YA). Vieira-Antunes et al. (2010) analyzed 10 RAPD markers
in three commercial paca flocks (81 individuals) located in Brazil. An analysis of molecular
variance showed that 12.6% of the variance was among the flocks and 87.4% among the
individuals within the flocks. Although informative, each of these earlier studies focused
only on a relatively small geographical area. No phylogeographic or population genetics
analyses have addressed these two species at a larger (continental) geographical scale.
Thus, this is the first molecular study to provide a continental perspective.

We relied on two mitochondrial markers classically used in phylogenetic and population
genetics studies: the control region (D-loop) and Cyt-b. Both markers have been used, for
instance, in a large variety of studies with Neotropical mammals to resolve evolutionarily
significant units and taxonomic conflicts, and to determine phylogeographic patterns and
biogeographic history (Da Silva and Patton, 1998; van Vuuren et al., 2004; Lavergne et al.,
2010; Ruiz-García, 2010; Ruiz-García et al., 2013a, b).
We tested the following hypothesis: the capybara should show a noticeably greater
geographical genetic structure than the paca, which may show no genetic structure at all.
The rationale is that capybaras are more tightly linked to water bodies, which suggests
that the capybara is more of a specialist rodent than the paca. The capybara is restricted to
forested riverbanks, former riverbeds, brackish wetlands and mangrove swamps (Ojasti,
1973). Capybaras require water for drinking, wallowing and protecting themselves against
predators. They graze on savannas up to 0.5 km from the water (Macdonald, 1981). At
the peak of the dry season, when water availability is very limited, capybaras concentrate
around major rivers and other remaining aquatic areas, including small lakes. When grasses
dry out, capybaras lose weight and are subject to starvation and to greater incidences of
predation and disease (Ojasti, 1973; Schaller and Crawshaw, 1981). The situation for the paca
is quite different. This species is an opportunistic feeder and is mainly frugivorous. However,
it easily changes its diet across its geographical range and shows strong seasonal
variation according to the availability of fruits and other foods (Collet, 1981; Gallina,
1981). It also feeds on leaves, buds and flowers. In captivity, this species has been observed
to consume raw meat, lizards and insects (Lander, 1974; Pérez, 1983). Thus, unlike the capybara,
the paca is not dependent on expansive grassy areas near bodies of water. Capybaras can only
migrate along water courses. Thus, when these water courses diminish or disappear, the
affected areas become true geographical barriers, which could create genetically
differentiated capybara populations. In contrast, the distribution of water bodies does not
act as a geographic barrier that genetically isolates paca populations.
Taking this hypothesis into account, this work has six main aims regarding the molecular
evolutionary parameters of the two largest Neotropical rodents: 1- to determine the levels of
gene diversity at these two mitochondrial markers; 2- to determine the levels of genetic
heterogeneity among populations within these two species and the relationship of this genetic
differentiation with geographical distance; 3- to estimate female population sizes and levels
of gene flow; 4- to determine possible historical demographic changes for both rodent
species; 5- to calculate temporal haplotype splits within each species and their correlations
with geological and climate changes; and 6- to determine whether a correlation exists between
morphological systematics and the molecular results obtained for both species.
MATERIAL AND METHODS

Samples
A total of 78 capybara samples and 120 paca samples were analyzed for
both mitochondrial genes. The samples were obtained from animals hunted for food by
indigenous communities (the animals were hunted near the respective communities; tissues
were muscle or pieces of skin) or from live animals kept in these communities as pets. For
more geographical details, see Table 1 and Figure 1.
Table 1. Geographic origin, countries, number of samples and haplotypes found for the
samples of capybaras and pacas sequenced at the mitochondrial D-loop region and Cyt-b
gene in the current study
Mt D-loop region in capybara

Country Geographic origins Number of Haplotypes found in
animals each locality
Colombia Cordoba Department (Northern Colombia) 22 H9, H13, H21, H22,
in different localities: Valencia, Betancia, H23, H29, H33, H34,
Tierra Alta, El Tigre, Palomas 1 H35, H36, H37, H39,
Valle Department: Cartago H43
Meta Department 7 H41
Casanare Department: Hato Corazal

19 H2, H8, H25, H27,
Guainia Department: Inirida River H28, H30, H32
4 H1, H2, H3, H4, H5,
H6, H15
H19, H31, H42
Ecuador Napo River: Misahuallí, Nueva
Rocafuerte, Pompeya 3 H16, H38
Brazil Negro River: Novo Airao

1
1 H14
Peru Napo River: Bolivar
1
Amazon River: Puerto Alegria H7
7
Mamon and Nanay rivers H40
1 H10, H11, H20, H26

Tapiche River H10
1
Contamana, Ucayali River H18
4
Yarinacocha, Ucayali River H10, H12, H17, H24,
H26

Mt Cyt-b gene in capybara

Colombia Cordoba Department (Northern
Colombia) in different localities: 10 H6, H7, H8, H9, H10,
Valencia, Betancia, Tierra Alta, El Tigre, H11, H12, H19
Palomas
Meta Department 2 H5, H6
H6, H14, H15
Guainia Department: Inirida River 3
Ecuador Napo River: Misahuallí 1 H20
Brazil Javari River
H1
Peru Amazon River: Puerto Alegria
1 H13
Mamon and Nanay rivers
1
3 H16, H17, H18
Contamana, Ucayali River
1 H13
San Martin Department: Tarapoto,
3
H2, H3, H4
Yurimaguas
Mt D-loop region in paca

Colombia Cordoba Department: Tierra Alta 1 H13
Guainia Department: Inirida and
Atabapo rivers 2 H21, H53
Amazonas Department: kilometers in
Leticia and Puerto Nariño 3 H9, H37, H59


Ecuador Cuyabeno 1 H12
Tena 4 H11, H16, H17, H19
Sucua H19, H20, H21, H54
Macas
4 H18, H55
Javari River
2 H2, H3, H4, H5, H8,
Brazil H10, H13, H14, H15,
16
Negro River: Novo Airao H34, H45, H46, H57,
H58, H60
H35
Madeira River: Sao Francisco, Novo
Aripuana, mouth of the Manicore River 1
H39, H40, H41, H42
Amazon River: Islandia island H1, H6, H7
Peru Loreto Department: Mamon, Nanay, 4
Itaya rivers 3 H23, H24, H25, H26,
H27, H30, H31, H32,
H33, H43, H44, H47,
San Martin Department: Tarapoto,
H48, H49, H50
Yurimaguas 15
H22, H28, H51, H52
Ucayali Department: Yarinacocha and 4
H36, H38, H56
Tournavista (Pachitea River)
H29
Bolivia Ribera Alta (Beni River) 3
1
Colombia Cordoba Department: Tierra Alta 6 H37, H78, H92, H93,
Meta Department H95, H96
Guainia Department: Inirida and 2 H1, H94
Atabapo rivers 2 H4, H43
Amazonas Department: kilometers in 3 H30, H55, H56
Leticia and Puerto Nariño

Mt Cyt-b gene in paca

Country Geographic origins Number of Haplotypes found in each
animals locality
Ecuador Cuyabeno 1 H25
Tena 3 H25, H31, H38
Sucua 2 H39, H42
Macas
1 H40
Javari River
19 H2, H3, H9, H17, H18,
Brazil H19, H21, H22, H23, H29,
1
Negro River: Novo Airao H32, H33, H34, H41, H46,
4 H63, H72, H80
Madeira River: Sao Francisco,
H53
Novo Aripuana, mouth of the
Manicore River H24, H58, H59, H90
Amazon River: Islandia island 3 H25, H26, H28

Loreto Department: Mamon, 27 H7, H8, H10, H12, H13,
Peru Nanay, Itaya rivers H14, H15, H16, H20, H27,
H49, H65, H66, H67, H68,
San Martin Department: Tarapoto, 9
H69, H70, H71, H73, H74,
Yurimaguas H75, H76, H91
Ucayali Department: Contamana, 20 H45, H47, H48,
Yarinacocha and Tournavista
(Pachitea River)
Ribera Alta (Beni River) 2 H60, H61, H62, H64, H77,
H78
Bolivia Mamore River: Exaltacion H4, H5, H6, H11, H35,

1
H36, H52, H54, H57, H79,
H81, H82, H83, H84, H85,
H86, H87, H88, H89, H97
H50, H51
H44
Molecular Markers Employed
DNA from muscle and skin samples was extracted using the phenol-chloroform
procedure (Sambrook et al., 1989), while DNA from hair samples was extracted
with 10% Chelex resin (Walsh et al., 1991). For mt Cyt-b gene amplification via polymerase
chain reaction (PCR), we used the forward primer H15579 (5’- CCT AGT TTA TTT GGA
ATG GAT CGT AG -3’) and the reverse primer L15049 (5’- GCC TGT ACA TCC ACA
TCG GAC GAC G -3’) (Sprandling et al., 2001) under the following touchdown PCR profile:
95°C for 5 min, followed by three cycles of 95°C for 30 s, 50-57°C for 30 s and 72°C for 75 s,
then 27 cycles of 95°C for 30 s, 38-45°C for 30 s and 72°C for 75 s, and a final extension
at 72°C for 7 min.
For the amplification of the mt D-loop, the forward and reverse primers were,
respectively, CAPIF3 (5’- CAG GAA ACA GCT ATG ACC CAA TTA TTC TAY YTG
ACA TAA GAC -3’) and CAPIR3 (5’- TGT AAA ACG ACG GCC AGT GAG CGG GTA
TAA YRT TAT GG -3’), with a PCR profile of 95°C for 3 min, followed by 30 cycles of
94°C for 30 s, 52°C for 30 s and 72°C for 1 min, and a final extension at 72°C for 5 min
(Steiner et al., 2000). For both genes, the PCRs were performed in a 25 μl volume, with reaction
mixtures including 2.5 μl of 10x buffer, 2 μl of 2 mM MgCl2, 0.625 μl of 1 mM dNTPs, 1.5
μl of 0.6 mM of each primer, one unit of Taq DNA polymerase, 13.8 μl of H2O and 2 μl (20–
80 ng/μl) of DNA. PCR reactions were carried out in a BioRad thermocycler. All
amplifications, including positive and negative controls, were checked in 2% agarose gels,
using φX174 DNA digested with Hind III and Hinf I as the molecular weight marker.
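As a quick arithmetic check of the reaction mixture listed above, the following minimal sketch sums the itemized volumes and compares them with the stated 25 μl final volume. It assumes that "1.5 μl of each primer" means two primers at 1.5 μl each; the component names are illustrative labels, and the text does not itemize the volume of the Taq polymerase.

```python
from decimal import Decimal

# Volumes in microlitres, as listed in the text.
# Assumption: two primers (forward and reverse) at 1.5 ul each.
components = {
    "10x buffer": Decimal("2.5"),
    "MgCl2": Decimal("2"),
    "dNTPs": Decimal("0.625"),
    "forward primer": Decimal("1.5"),
    "reverse primer": Decimal("1.5"),
    "H2O": Decimal("13.8"),
    "DNA template": Decimal("2"),
}
FINAL_VOLUME = Decimal("25")

# Sum of the itemized volumes and the part of the final volume not itemized.
listed_total = sum(components.values(), Decimal("0"))
remainder = FINAL_VOLUME - listed_total

print(f"Itemized volume: {listed_total} ul")
print(f"Not itemized (includes the Taq polymerase): {remainder} ul")
```

Decimal arithmetic is used here only to avoid floating-point rounding in the printed totals.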
The amplified samples were purified using membrane-binding spin columns (Qiagen).
The double-stranded DNA was directly sequenced on a 377A (ABI) automated DNA
sequencer using the BigDye™ kit, and the sequencing of all samples was repeated to ensure
sequence accuracy.
Figure 1. Maps of the geographical distribution of the capybara (A) and the paca (B), showing the
places and numbers of individuals sequenced in this study.
Population Genetics Analyses
The sequences were edited and aligned with BioEdit Sequence Alignment Editor (Hall,
2004) and DNA Alignment (Fluxus Technology Ltd). Modeltest (Posada and Crandall,
1998) and Mega 5.1 (Tamura et al., 2011) were used to select, among different
nucleotide substitution models, the best-fitting evolutionary model for the analyzed
sequences of both rodent species. Model selection was based on the Akaike Information
Criterion (AIC; Akaike, 1974) and the log maximum likelihood (Ln) values.
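The AIC comparison described above can be sketched as follows. This is only an illustration of the criterion (AIC = 2k - 2 ln L, lower is better), not the procedure implemented in Modeltest or Mega; the log-likelihoods and parameter counts below are invented for the example and are not results from this study.

```python
def aic(log_likelihood, n_params):
    """Akaike Information Criterion: AIC = 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def best_model(candidates):
    """Return the model name with the lowest AIC.

    candidates maps a model name to a (log-likelihood, parameter count) pair.
    """
    return min(candidates, key=lambda name: aic(*candidates[name]))


# Hypothetical values for illustration only (not taken from the study).
example = {
    "HKY+G+I": (-2310.4, 6),
    "GTR+G": (-2305.1, 9),
    "TN93+G+I": (-2309.0, 7),
}

for name, (lnl, k) in example.items():
    print(f"{name}: AIC = {aic(lnl, k):.1f}")
print("Selected model:", best_model(example))
```

The penalty term 2k is what prevents the most parameter-rich model from being selected automatically on likelihood alone.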
To determine the genetic diversity of the capybara and the paca, we used the number of
polymorphic sites (S), the number of haplotypes (H), the haplotype diversity (Hd), the
nucleotide diversity (π), the average number of nucleotide differences (k) and the θ statistic
per sequence. These statistics were estimated for the total sample analyzed for both species
as well as for the different geographical populations within the capybara and the paca. These
gene diversity statistics were computed with the Programs DNAsp 5.1 (Librado and Rozas, 2009)
and Arlequin 3.5.1.2 (Excoffier and Lischer, 2010).
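The diversity statistics listed above can be illustrated with a minimal sketch using their standard textbook definitions. It assumes aligned sequences of equal length with no gaps or ambiguity codes, and it takes the θ statistic per sequence to be Watterson's estimator (S divided by the harmonic number a1), which is an assumption about the statistic intended. The toy sequences are invented, and the function is not the DNAsp or Arlequin implementation.

```python
from collections import Counter
from itertools import combinations


def diversity_stats(seqs):
    """Basic diversity statistics for aligned, gap-free sequences."""
    n = len(seqs)
    length = len(seqs[0])
    # S: number of alignment columns with more than one nucleotide state.
    s = sum(1 for i in range(length) if len({seq[i] for seq in seqs}) > 1)
    # H and Hd: haplotype count and Nei's unbiased haplotype diversity.
    counts = Counter(seqs)
    h = len(counts)
    hd = n / (n - 1) * (1 - sum((c / n) ** 2 for c in counts.values()))
    # k: average number of pairwise differences; pi: k per site.
    pairs = list(combinations(seqs, 2))
    k = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)
    pi = k / length
    # Watterson's theta per sequence (assumed meaning of the theta statistic).
    a1 = sum(1 / i for i in range(1, n))
    theta_w = s / a1
    return {"S": s, "H": h, "Hd": hd, "k": k, "pi": pi, "theta_W": theta_w}


# Toy alignment, invented for illustration.
toy = ["ACGTACGT", "ACGTACGA", "ACGTTCGA", "ACGTACGT"]
stats = diversity_stats(toy)
print(stats)
```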

We used the HST, KST, KST*, Z and Z* tests (Hudson et al., 1992a, b), the Snn test (Hudson,
2000) and the chi-square test on the haplotype frequencies, with permutation tests using
10,000 replicates, to measure genetic heterogeneity and to estimate gene flow indirectly among
the different capybara and paca groups. An infinite island model was assumed. We also
estimated the GST statistic from the haplotype frequencies and the γST, NST and FST statistics
(Hudson et al., 1992a) from the nucleotide sequences.
We estimated the possible historical female effective population numbers, as well as the
magnitude and asymmetry of gene flow among the capybara and the paca populations studied,
using the Migrate 3.6 Program (Beerli, 2009). The program calculated
θ (= Neμ) for each population considered, where Ne is the female effective size and μ is the
mutation rate per generation, and M (= m/μ) among the population pairs analyzed, where m is
the migration rate per generation. The average μ values were 25.75 x 10^-8 for the mt D-loop
and 6.18 x 10^-8 for the mt Cyt-b. The first average value was obtained from the studies of
Forster et al. (1996), Geraldes et al. (2008), Hardouin et al. (2010), Heyer et al. (2001),
Horai et al. (1995), Rajabi-Maham et al. (2008), Savolainen et al. (2002), Tamura and Nei
(1993) and Ward et al. (1991), with different mammalian species. The second average value
was estimated from the studies of Benton and Donoghue (2007), Michaux et al. (2002),
Nabholz et al. (2008, 2009) (1,938 mammalian species) and Wayne et al. (1997), also with
mammalian species. Two strategies were used to calculate female effective population size
and gene flow estimates. For the first strategy, we used a maximum likelihood procedure
(Beerli, 1998; Beerli and Felsenstein, 1999, 2001) with 10 short Markov chains, with 500
recorded steps, 100 increments and a total of 50,000 sampled genealogies with a burn-in
(number of discarded trees per chain) of 10,000, and one long Markov chain, with 5,000
recorded steps, 100 increments and a total of 500,000 sampled genealogies with a burn-in of
10,000. The second strategy was a Bayesian procedure (Beerli, 2006; Beerli and Felsenstein,
2001) with one long Markov chain, with 5,000 recorded steps, 100 increments, one
concurrent chain and a total of 500,000 sampled genealogies with a burn-in of 10,000.
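Given the definitions above (θ = Neμ and M = m/μ), the conversion from estimated parameters to a female effective size and to a number of female migrants per generation can be sketched as follows. The mutation rates are the averages stated in the text; the θ and M inputs are hypothetical values chosen only for illustration, and the function names are not part of Migrate.

```python
# Average mutation rates per generation, as stated in the text.
MU_DLOOP = 25.75e-8
MU_CYTB = 6.18e-8


def female_effective_size(theta, mu):
    """Ne from theta = Ne * mu (definition used in the text)."""
    return theta / mu


def migrants_per_generation(theta, m_scaled):
    """Ne * m, from theta = Ne * mu and M = m / mu, so Ne * m = theta * M."""
    return theta * m_scaled


# Hypothetical estimates, for illustration only.
theta_example = 0.0103
m_example = 100.0

ne = female_effective_size(theta_example, MU_DLOOP)
nm = migrants_per_generation(theta_example, m_example)
print(f"Ne (D-loop rate): {ne:.0f} females")
print(f"Ne*m: {nm:.2f} female migrants per generation")
```

Because μ cancels in the product θM, the migrant estimate does not depend on the assumed mutation rate, whereas Ne scales inversely with it.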
We used several methods to determine possible demographic changes across the natural
history of these two rodent species: 1- A mismatch distribution (pairwise sequence
differences) was obtained following the method of Rogers and Harpending (1992) and Rogers
et al. (1996). The raggedness rg statistic (Harpending et al., 1993; Harpending, 1994) was
used to determine the similarity between the observed and the theoretical curves. This
procedure let us estimate the time of the beginning of a demographic change and the initial
and final population sizes (Rogers and Harpending, 1992). 2- The Fu and Li D and F tests
(Fu and Li, 1993), the Fu FS statistic (Fu, 1997), the Tajima D test (Tajima, 1989) and the R2
statistic (Ramos-Onsins and Rozas, 2002), originally created to detect natural selection
affecting DNA sequences, were also used to determine possible changes in population size
(Simonsen et al., 1995; Ramos-Onsins and Rozas, 2002). All of these statistics and tests were
obtained by means of the DNAsp 5.10 and Arlequin 3.0 Programs. 3- A Bayesian skyline plot
(BSP) was obtained by means of the BEAST v1.6.2 and Tracer v1.5 Software. The
Coalescent-Bayesian skyline option in the tree priors was selected with five steps and a
piecewise-constant skyline model, with 40,000,000 generations (the first 4,000,000 discarded
as burn-in). In Tracer v1.5, the marginal densities of temporal splits were analyzed and the
Bayesian Skyline reconstruction option was selected for the trees log file. A stepwise
(constant) Bayesian skyline variant was selected with the maximum time as the upper 95%
HPD and the trace of the root height as the treeModel.rootHeight.
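The first of these methods can be illustrated with a minimal sketch that builds an observed mismatch distribution and computes Harpending's raggedness index, r = Σ (x_i - x_(i-1))^2 summed over i = 1 to d + 1, where x_i is the relative frequency of sequence pairs differing at i sites and d is the largest observed number of differences. It assumes aligned, gap-free sequences; the toy alignment is invented, and the fit to the theoretical expansion curve performed by DNAsp and Arlequin is not reproduced here.

```python
from collections import Counter
from itertools import combinations


def mismatch_distribution(seqs):
    """Relative frequency of pairs differing at 0, 1, ..., d sites."""
    diffs = [sum(a != b for a, b in zip(x, y)) for x, y in combinations(seqs, 2)]
    counts = Counter(diffs)
    return [counts.get(i, 0) / len(diffs) for i in range(max(diffs) + 1)]


def raggedness(freqs):
    """Harpending's raggedness index of a mismatch distribution."""
    # Append x_(d+1) = 0 so the last term of the sum is included.
    padded = list(freqs) + [0.0]
    return sum((padded[i] - padded[i - 1]) ** 2 for i in range(1, len(padded)))


# Toy alignment, invented for illustration.
toy = ["ACGTACGT", "ACGTACGA", "ACGTTCGA", "ACGTACGT"]
observed = mismatch_distribution(toy)
rg = raggedness(observed)
print("Mismatch distribution:", observed)
print("Raggedness:", rg)
```

Smooth, unimodal distributions give low raggedness values, which is the pattern usually associated with population expansion.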
We used IBD version 1.2 Software (Bohonak, 2002) to determine possible geographical
structure, by comparing the genetic distances (Kimura 2P distance; Kimura, 1980) obtained
from the haplotypes within the different population pairs defined within the capybara and
the paca with the geographical distances among these populations. A Mantel test (Mantel,
1967) was used to determine the significance of the relationship between the genetic distance
and geographic distance matrices. The intercept and the slope of this relationship were
calculated using Reduced Major Axis (RMA) regression (Sokal and Rohlf, 1995; Hellberg, 1994).
Ten thousand randomizations (jackknife over population pairs and bootstrapping over
independent population pairs) were executed to determine 95 and 99% confidence intervals.
The calculations were completed with non-transformed data and with log-transformed data
(genetic distance and geographical distance), jointly and separately.
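The isolation-by-distance analysis described above combines a Mantel permutation test with an RMA regression. The following is a minimal sketch of both ideas under simple assumptions (symmetric distance matrices, one-sided test for a positive correlation); it is not the IBD program's implementation, it omits the jackknife and bootstrap confidence intervals, and the distances in the example are invented.

```python
import random
from statistics import mean, stdev


def upper_triangle(matrix):
    """Flatten the above-diagonal entries of a square matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den


def mantel_test(genetic, geographic, permutations=10000, seed=1):
    """One-sided Mantel test: permute population labels of one matrix."""
    rng = random.Random(seed)
    n = len(genetic)
    geo = upper_triangle(geographic)
    r_obs = pearson(upper_triangle(genetic), geo)
    hits = 0
    order = list(range(n))
    for _ in range(permutations):
        rng.shuffle(order)
        permuted = [[genetic[order[i]][order[j]] for j in range(n)] for i in range(n)]
        if pearson(upper_triangle(permuted), geo) >= r_obs:
            hits += 1
    return r_obs, (hits + 1) / (permutations + 1)


def rma_regression(x, y):
    """Reduced major axis: slope = sign(r) * sd(y) / sd(x)."""
    sign = 1 if pearson(x, y) >= 0 else -1
    slope = sign * stdev(y) / stdev(x)
    return slope, mean(y) - slope * mean(x)


# Invented example: four populations along a line (distances in km),
# with genetic distance increasing linearly with geographic distance.
positions = [0, 100, 300, 700]
geo_matrix = [[abs(a - b) for b in positions] for a in positions]
gen_matrix = [[0.0 if i == j else 0.01 + 0.0002 * geo_matrix[i][j]
               for j in range(4)] for i in range(4)]

r, p = mantel_test(gen_matrix, geo_matrix, permutations=999)
slope, intercept = rma_regression(upper_triangle(geo_matrix), upper_triangle(gen_matrix))
print(f"Mantel r = {r:.3f}, p = {p:.3f}")
print(f"RMA slope = {slope:.6f}, intercept = {intercept:.4f}")
```

Whole rows and columns are permuted together because the pairwise distances are not independent observations, which is why an ordinary correlation test would not be appropriate here.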
Phylogenetic Analyses
Two phylogenetic trees were obtained for each marker (because the number of samples
analyzed for each marker was different) for each species. The first one was by means of the
maximum likelihood procedure (ML; Felsenstein, 1981) with a General Time Reversible
model (GTR + G) at the mt Cyt-b for the paca, with a HKY + G + I at the mt D-loop for the
paca and with a TN93 + G + I (Tamura and Nei, 1993) at both genes for capybara. This tree
was obtained with the PAUP*4.0b8 Program (Swofford, 2002). The second tree was a
Bayesian one (BI; Mau, 1996; Mau et al., 1999; Rannala and Yang, 1996) with the same
models of nucleotide substitution with the gamma distributed rate varying among sites, and
variable rate categories. This Bayesian analysis was completed with the BEAST v1.6.2
Program (Drummond and Rambaut, 2007). Two separate sets of analyses were run, assuming
a Yule speciation model and a relaxed molecular clock with an uncorrelated log-normal rate
distribution (Drummond et al., 2006). Results from the two independent runs (60,000,000
generations with the first 6,000,000 discarded as burn-in and parameter values sampled every
1,000 generations) were combined with LogCombiner v1.6.2 Software (Drummond and
Rambaut, 2007). The effective sample size (ESS) for the parameter estimates and
convergence were checked using Tracer version 1.5. The lower and upper 95% highest
posterior density (HPD) of these parameters as well as the means, geometric means, medians,
marginal densities and traces were also estimated with the Tracer v1.5 Program. To
assess the reliability of the values of these parameters, the autocorrelation time (ACT) and
ESS for parameter estimates were obtained. Posterior probability values provide an
assessment of the degree of support of each node on the tree. The final tree was estimated
using TreeAnnotator v1.6.2 Software (Drummond and Rambaut, 2007) and visualized in the
FigTree v1.3.1 Program (Drummond and Rambaut, 2007). Additionally, this program was
run to estimate the time to most recent common ancestor (TMRCA) for the different
haplotype lineages found. For the BI, with capybaras and mt D-loop, no outgroup was
employed in order to accommodate recently debated issues about molecular dating of recent
phylogenetic splits (Ho et al., 2008). However, for the other BI, we used a normally
distributed prior of 12.28 ± 2.3 MYA (mean ± standard deviation) for the
treeModel.rootHeight in the Bayesian analysis for the capybaras at the mt Cyt-b. This value was based
on an estimated split between the ancestor of the capybaras and Kerodon to have occurred 16
MYA (Opazo, 2005; Upham and Patterson, 2012; Rowe et al., 2010). Other priors included
the split of the ancestors of paca and Dasyprocta, 25.95 ± 2.4 MYA (following Opazo, 2005,
27.9 MYA, and Upham and Patterson, 2012, 24 MYA) and the ancestors of Cuniculus paca
and C. taczanowskii (3.5 ± 1.2 MYA, following Opazo, 2005). Both were in regard to the mt
D-loop and the split between the ancestors of the paca and the capybara (25.25 ± 2.4 MYA,
following Opazo, 2005, 26.5 MYA, and Upham and Patterson, 2012, 24 MYA). The same
value was applied to the ancestral split of Cuniculus paca and C. taczanowskii when analyzing
mt Cyt-b.
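As a reading aid for the calibration priors, the short sketch below checks two things about the figures quoted above. First, the prior means of 25.95 and 25.25 MYA coincide with the midpoints of the two cited estimates; this is an observation about the numbers, not a statement of how the authors derived them. Second, it computes the central 95% interval that a normal prior with the stated mean and standard deviation implies. The helper name is illustrative.

```python
from statistics import NormalDist, mean

# Split estimates (MYA) quoted in the text for two calibrations.
paca_dasyprocta_mean = mean([27.9, 24.0])  # compare with the 25.95 MYA prior
paca_capybara_mean = mean([26.5, 24.0])    # compare with the 25.25 MYA prior


def central_interval(mu, sd, mass=0.95):
    """Central interval of a normal prior with mean mu and standard deviation sd."""
    dist = NormalDist(mu, sd)
    tail = (1 - mass) / 2
    return dist.inv_cdf(tail), dist.inv_cdf(1 - tail)


low, high = central_interval(25.95, 2.4)
print(paca_dasyprocta_mean, paca_capybara_mean, round(low, 2), round(high, 2))
```

Under these assumptions, a 25.95 ± 2.4 MYA normal prior places 95% of its mass between roughly 21.2 and 30.7 MYA, which spans both cited estimates.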
RESULTS
Gene Diversity for the Capybara and the Paca at the mt D-loop and Cyt-b Genes
For the capybara, the sequence length was 524 bp at the mt D-loop. Under the AIC criterion,
the best evolutionary substitution model was TN93 + G + I (2757.035), although it was not
significantly different from GTR + G + I (2759.326). Under the Ln L criterion, GTR + G + I
(-1226.045) was the best evolutionary model. We found 43 haplotypes (Hd = 0.971 ± 0.009 and
π = 0.018 ± 0.009) (for additional diversity statistics, see Table 2). At the mt Cyt-b, the
sequence length was also 524 bp. Under the AIC criterion, the best evolutionary substitution
model was TN93 + G + I (2705.944). Under the Ln L criterion, the best evolutionary model was
GTR + G + I (-1296.276). We located 20 haplotypes (Hd = 0.977 ± 0.019 and π = 0.026 ±
0.009, Table 2). Thus, for both mitochondrial genes studied, capybara is characterized by
elevated values of gene diversity. When three different geographical areas were considered
(Western Amazon, Colombian Eastern Llanos-Guainia and Northern Colombia) for capybara
(Table 2), the Western Amazon population showed the highest levels of gene diversity
(Hd = 0.942 ± 0.034 and π = 0.0152 ± 0.0023). In comparison, the other two populations
showed only about half of this nucleotide diversity (π = 0.0079 and 0.0078).
This suggests that the Western Amazon capybara population could be
the original one, with the other two derived from it. This result agrees quite well with the
phylogenetic analysis shown below.
In the case of the paca, at the mt D-loop, we analyzed 228 bp. Under the AIC criterion, the
best evolutionary substitution model was HKY + G + I (3486.930). Under the Ln L criterion, two
models were the best, TN93 + G + I (-1611.184) and HKY + G + I (-1611.236). We found
60 haplotypes (Hd = 0.998 ± 0.003 and π = 0.077 ± 0.005, Table 2). These gene diversity
statistics are even higher than those found in the capybara. For the paca, at the mt Cyt-b, the
sequence length was 524 bp. Under the AIC criterion, the best evolutionary substitution model
was GTR + G (7797.033). Under the Ln L criterion, the best evolutionary model was GTR + G +
I (-3678.740). A total of 97 haplotypes were found (Hd = 0.998 ± 0.002, π = 0.044 ± 0.008)
(Table 2). As with the mt D-loop, gene diversity statistics were extremely high. Therefore,
both rodent species have very high levels of gene diversity, but the paca has the highest.
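For readers unfamiliar with the statistics reported here, the following sketch computes haplotype diversity (Hd), the average number of pairwise differences (K) and nucleotide diversity (π) using the standard estimators attributed to Nei. It is an illustration on four hypothetical aligned sequences, not a reproduction of the DnaSP calculations, and it ignores gaps and missing data.

```python
from collections import Counter
from itertools import combinations


def haplotype_diversity(seqs):
    """Hd = n / (n - 1) * (1 - sum of squared haplotype frequencies)."""
    n = len(seqs)
    freqs = [count / n for count in Counter(seqs).values()]
    return n / (n - 1) * (1 - sum(p * p for p in freqs))


def mean_pairwise_differences(seqs):
    """K: average number of differing sites over all pairs of sequences."""
    pairs = list(combinations(seqs, 2))
    total = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    return total / len(pairs)


def nucleotide_diversity(seqs):
    """pi: K divided by the alignment length."""
    return mean_pairwise_differences(seqs) / len(seqs[0])


# Hypothetical toy alignment (four sequences, four sites).
toy = ["AAAA", "AAAT", "AATT", "AAAA"]
hd = haplotype_diversity(toy)
k = mean_pairwise_differences(toy)
pi = nucleotide_diversity(toy)
print(round(hd, 4), round(k, 4), round(pi, 4))
```

The relation π = K / L can also be checked against the capybara values in Table 2: 9.452 / 524 is about 0.0180 at the mt D-loop and 13.580 / 524 is about 0.0259 at mt Cyt-b.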

Table 2. Gene diversity statistics for the overall capybara and paca samples, at the mt D-loop region and at the mt Cyt-b gene (A) and for
the Northern Colombian, Eastern Colombian-Guainia and Western Amazon capybara populations at the mt D-loop region (B).
The statistics estimated were the number of haplotypes (NH), the haplotype diversity (Hd), the nucleotide diversity (π), the average
number of nucleotide differences (K) and the θ statistic per sequence (θ = 2Neμ; Ne = effective female population size; μ = mutation rate
per generation)

(A)
Sample                                                                          NH   Hd              π                 K                  θ per sequence
Total sample studied of capybara at the mt control region (n = 73)              43   0.971 ± 0.009   0.0180 ± 0.0009    9.452 ± 4.388      9.463 ± 2.757
Total sample studied of capybara at the mt Cyt-b gene (n = 25)                  20   0.977 ± 0.019   0.0259 ± 0.0085   13.580 ± 6.314     22.246 ± 7.439
Total sample studied of paca at the mt control region (n = 64)                  60   0.998 ± 0.003   0.0773 ± 0.0054   17.176 ± 7.737     16.285 ± 4.607
Total sample studied of paca at the mt Cyt-b gene (n = 106)                     97   0.998 ± 0.002   0.0444 ± 0.0082   23.007 ± 10.204    48.511 ± 11.883

(B)
Sample                                                                          NH   Hd              π                 K                  θ per sequence
Western Amazon sample studied of capybara at the mt control region (n = 20)     13   0.942 ± 0.034   0.0152 ± 0.0023    7.984 ± 3.873      7.892 ± 2.995
Eastern Llanos-Guainia sample studied of capybara at the mt control region
(n = 30)                                                                        16   0.897 ± 0.040   0.0079 ± 0.0009    4.166 ± 2.130      3.534 ± 1.404
North Colombian sample studied of capybara at the mt control region (n = 23)    14   0.925 ± 0.036   0.0078 ± 0.0008    4.103 ± 2.121      4.606 ± 1.829

Genetic Heterogeneity among Capybara and Paca Populations throughout South America at the mt D-loop and Cyt-b Genes
The genetic heterogeneity among diverse populations of capybara can be seen in Table 3.
At the mt D-loop, we carried out the analysis with three and seven populations,
respectively. In the first case (Western Amazon, Colombian Eastern Llanos-Guainia and
Northern Colombia), all the statistics showed a significant genetic heterogeneity among these
three geographical regions, some of them with considerable heterogeneity (NST and FST
around 0.5-0.6). The gene flow estimates from these statistics were very small
(Nm around 0.4-0.6).
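The gene flow values reported alongside the fixation indices in Table 3 are numerically consistent with the island-model relation for a haploid, maternally inherited marker, Nm = (1/FST - 1)/2. The chapter section does not state the formula, so the sketch below treats it as an assumption and simply checks it against the three-population values (γST = 0.4669, NST = 0.5498, FST = 0.5473).

```python
def nm_haploid(fst):
    """Island-model gene flow for a haploid marker: Nm = (1/FST - 1) / 2."""
    if not 0 < fst < 1:
        raise ValueError("fst must lie strictly between 0 and 1")
    return (1 / fst - 1) / 2


# (fixation index, Nm as reported in Table 3A)
table_3a = {
    "gammaST": (0.4669, 0.57),
    "NST": (0.5498, 0.41),
    "FST": (0.5473, 0.41),
}

for name, (index, reported) in table_3a.items():
    print(name, round(nm_haploid(index), 2), reported)
```

Under this relation, Nm falls below 1 as soon as the fixation index exceeds 1/3, which is why the capybara D-loop values of 0.5-0.6 translate into very low gene flow estimates.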
When seven geographical areas were considered (Casanare in Colombian Eastern Llanos;
Meta in Colombian Eastern Llanos; Guainia in Colombia; Cordoba in Northern Colombia;
Loreto in Northern Peruvian Amazon; Ucayali in Southern Peruvian Amazon; Ecuadorian
Amazon), there was also a very significant genetic heterogeneity for all of the statistics used.
Most of the NST and FST values were around 0.44-0.88, with gene flow estimates even lower
than for the previous analysis (Nm around 0.2-0.3).
The only exceptions to this were the population pairs Casanare-Meta, with FST = 0.18,
and the Northern and Southern Peruvian Amazon, with practically no genetic heterogeneity
between them (FST = 0.005). All the other population pairs showed very high levels of genetic
heterogeneity.
At the mt Cyt-b, four geographical regions were considered for the capybara (frontier
between Colombia and Brazil in the Western Amazon; Peruvian Amazon; Meta and Guainia
in the Colombian Eastern Llanos; Northern Colombia). Out of seven genetic heterogeneity
tests, four were significant but the level of genetic heterogeneity was considerably lower than
for the mt D-loop (NST and FST around 0 to 0.20) and the gene flow estimates considerably
higher (Nm around 2-11). Admittedly, the number of samples at the mt Cyt-b was lower than at
the mt D-loop, but the D-loop showed considerably higher genetic heterogeneity and
structure than did mt Cyt-b. However, both genes showed considerable amounts of genetic
heterogeneity among capybara populations from diverse geographical areas throughout
Colombia, Brazil, Peru and Ecuador.
The genetic heterogeneity among diverse populations of paca can be observed in Table 4.
At the mt D-loop, we carried out the analysis with six different geographical areas (frontier
between Colombia and Brazil in the Western Amazon; Guainia in Colombia; Ecuadorian
Amazon; Loreto in the Northern Peruvian Amazon; San Martin in the central Peruvian
Amazon; Madeira River in the central Brazilian Amazon). Out of seven genetic heterogeneity
tests employed, only three showed marginal significance. The values of NST and FST
ranged from 0 to 0.27 and the gene flow estimates from these statistics ranged from 3.4 to
4.3. Many population pairs showed practically no genetic heterogeneity. Thus, unlike the
capybara at this same gene, the paca showed very limited genetic heterogeneity
among diverse geographical areas throughout Colombia, Brazil, Peru and Ecuador.
At mt Cyt-b, nine geographical regions were considered for the paca (frontier between
Colombia and Brazil in the Western Amazon; Guainia in Colombia; Cordoba in Northern
Colombia; Loreto in the Northern Peruvian Amazon; San Martin in the middle Peruvian
Amazon; Ucayali in the Southern Peruvian Amazon; Ecuadorian Amazon; Madeira River in
central Brazilian Amazon; Bolivian Amazon).

Out of seven genetic heterogeneity tests, five were significant, although the NST and the
FST statistics ranged from 0 to 0.18 (low or moderate genetic heterogeneity) and the gene flow
estimates were high (Nm ranged from 4 to 8).
Table 3. Genetic heterogeneity statistics among the capybara populations:
Three populations of capybara (Western Amazon; Eastern Llanos-Guainia; North
Colombia) at the mt control region (A); Seven populations of capybara [northern
Peruvian Amazon; southern Peruvian Amazon; Ecuadorian Amazon; Guainia
Department (Colombia); Casanare Department (Colombia); Meta Department
(Colombia); Cordoba Department (northern Colombia)] at the mt control region (B);
Four populations of capybara [Colombian-Western Brazilian Amazon; Peruvian
Amazon; Meta-Guainia Departments (Colombia); Cordoba Department (northern
Colombia)] at the mt Cyt-b gene (C)

(A)
Genetic differentiation estimated    P          Gene flow
χ2 = 146.000, df = 84                0.0000*
HST = 0.0554                         0.0000*    γST = 0.4669    Nm = 0.57
KST = 0.4507                         0.0000*    NST = 0.5498    Nm = 0.41
KST* = 0.2727                        0.0000*    FST = 0.5473    Nm = 0.41
ZS = 647.918                         0.0000*
ZS* = 6.0180                         0.0000*
Snn = 0.7217                         0.0005*

(B)
Genetic differentiation estimated    P          Gene flow
χ2 = 397.534, df = 246               0.0000*
HST = 0.1036                         0.0000*    γST = 0.6403    Nm = 0.28
KST = 0.6095                         0.0000*    NST = 0.6698    Nm = 0.25
KST* = 0.3919                        0.0000*    FST = 0.6675    Nm = 0.25
ZS = 433.409                         0.0000*
ZS* = 5.6250                         0.0000*
Snn = 0.8028                         0.0000*

(C)
Genetic differentiation estimated    P          Gene flow
χ2 = 59.486, df = 54                 0.2828
HST = 0.0274                         0.0680     γST = 0.1824    Nm = 2.24
KST = 0.0624                         0.0610     NST = 0.0413    Nm = 11.61
KST* = 0.1055                        0.027**    FST = 0.0415    Nm = 11.54
ZS = 119.0186                        0.028**
ZS* = 4.2444                         0.0000*
Snn = 0.6118                         0.0000*

Table 4. Genetic heterogeneity statistics among the paca populations: Six populations of
paca [Colombian-Western Brazilian Amazon; northern Peruvian Amazon (Loreto
Department); Middle Peruvian Amazon (San Martin Department); Ecuadorian
Amazon; Guainia Department (Colombia); Madeira River-Central Brazilian Amazon]
at the mt D-loop (A); Nine populations of paca [Colombian-Western Brazilian Amazon;
northern Peruvian Amazon (Loreto Department); Middle Peruvian Amazon (San
Martin Department); southern Peruvian Amazon (Ucayali Department); Ecuadorian
Amazon; Madeira River-Central Brazilian Amazon; Bolivian Amazon;
Guainia Department (Colombia); Cordoba Department (northern Colombia)]
at the mt Cyt-b gene (B)

(A)
Genetic differentiation estimated    P          Gene flow
χ2 = 277.568, df = 280               0.5298
HST = 0.0007                         0.4320     γST = 0.1289    Nm = 3.38
KST = 0.0602                         0.024**    NST = 0.1094    Nm = 4.07
KST* = 0.0207                        0.0610     FST = 0.1031    Nm = 4.35
ZS = 817.6857                        0.0570
ZS* = 6.3409                         0.018**
Snn = 0.3551                         0.023**

(B)
Genetic differentiation estimated    P          Gene flow
χ2 = 789.099, df = 760               0.2254
HST = 0.0058                         0.003**    γST = 0.1184    Nm = 3.72
KST = 0.0550                         0.017**    NST = 0.0588    Nm = 8.00
KST* = 0.0339                        0.002**    FST = 0.0604    Nm = 7.77
ZS = 2,555.453                       0.0010*
ZS* = 7.4249                         0.0000*
Snn = 0.3693                         0.0000*

A large fraction of the population pair genetic heterogeneity tests showed very small or
null genetic heterogeneity (FST lower than 0.10, 27/36 = 75%). The three population pairs
which showed the highest degree of differentiation were the Ecuadorian Amazon and Madeira
River in the Central Brazilian Amazon (FST = 0.188), Ecuadorian Amazon and Bolivian
Amazon (FST = 0.168) and Ecuadorian Amazon and Loreto Department in the Northern
Peruvian Amazon (FST = 0.161). In all these cases, the paca sample of the Ecuadorian
Amazon was the most differentiated population. However, whereas the degree of genetic
heterogeneity was very high in the capybaras for both genes, the genetic heterogeneity found for the
paca was considerably lower and of a similar degree for both mitochondrial genes used.
Bayesian Estimates of Female Effective Numbers and Gene Flow among Different Populations of Capybara and Paca
We estimated historical female effective numbers, as well as asymmetrical gene flow,
using the Migrate software. Both procedures (maximum likelihood and Bayesian)
offered the same results. For brevity, only the Bayesian results are discussed here. In
the case of the capybara at the mt D-loop (Table 5), we considered four populations.
Table 5. Bayesian estimations of the female effective numbers by means of the Migrate
Software for different capybara populations: at the mt D-loop region (A); at the mt Cyt-b
gene (B). θ = Neμ, Ne = Female effective number and μ = Mutation rate. CI = 97.5%
confidence interval

(A)
Population                          θ value   CI                Ne value   CI
Western Amazon                      0.0183    0.0042; 0.0359    71,029     16,311-139,301
Colombian Eastern Llanos-Guainia    0.0159    0.0062; 0.0271    61,786     24,078-105,126
Northern Colombia                   0.0180    0.0064; 0.0317    70,019     24,854-122,990
Napo River                          0.0099    0; 0.0381         38,369     0-118,078

(B)
Population                          θ value   CI                Ne value   CI
Western Amazon                      0.0527    0.0145; 0.0983    852,104    235,113-1,590,129
Colombian Eastern Llanos-Guainia    0.0059    0; 0.0176         95,307     0-284,790
Northern Colombia                   0.0495    0.0081; 0.0930    800,971    130,583-1,504,854

The Western Amazon (excluding Napo River) showed the highest effective numbers
(Ne = 71,029; 97.5% confidence interval: 16,311-139,301), whereas the Napo River
population from Peru and Ecuador showed the lowest value (Ne = 38,369; 97.5% confidence
interval: 0-118,078). The mean value for the four populations was 60,301 ± 15,196. At mt
Cyt-b, three populations were considered, with the Western Amazon population again
showing the highest value (Ne = 852,104; 97.5% confidence interval: 235,113-1,590,129).
The mean value for these three populations was 582,794 ± 422,950. Thus, for the capybara,
mt Cyt-b yielded a historical female effective number 9.66 times greater than that of mt
D-loop.
In the case of the paca (Table 6), at the mt D-loop, seven populations were considered,
with the Madeira River population showing the highest effective numbers (Ne = 338,757;
97.5% confidence interval: 254,757-388,350). The San Martin population yielded the lowest
values (Ne = 213,825; 97.5% confidence interval: 60,311-387,301). The mean value for the
seven populations we considered was 286,164 ± 50,661. At mt Cyt-b, with nine populations
considered, the population of the Madeira River also showed the highest value
(Ne = 1,304,531; 97.5% confidence interval: 877,023-1,618,123), whereas the population of
the San Martin Department again yielded the lowest amount (Ne = 477,508; 97.5%
confidence interval: 188,835-972,977). The mean value for the nine considered populations
was 993,456 ± 257,168. In this species, too, the historical female effective size was higher at
mt Cyt-b than at the mt D-loop (3.47 times), as in the capybara, although the difference
between the two genes was larger in the capybara than in the paca. Additionally, for
both genes, the paca showed higher historical female effective numbers than the capybara
(4.75 times higher at the mt D-loop gene and 1.7 times higher at the mt Cyt-b gene).
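The summary figures in the two paragraphs above can be recomputed from the per-population Ne values listed in Tables 5 and 6. The sketch below assumes the reported spread is the sample standard deviation (n - 1 denominator), which is an inference from the fact that it reproduces the printed values, not something stated in the text.

```python
from statistics import mean, stdev

# Female effective numbers (Ne) as listed in Tables 5 and 6.
capybara_dloop = [71029, 61786, 70019, 38369]
capybara_cytb = [852104, 95307, 800971]
paca_dloop = [228427, 336583, 270097, 294757, 213825, 320699, 338757]
paca_cytb = [946602, 1024437, 1127508, 889320, 785275,
             477508, 1267799, 1304531, 1118123]

for label, values in [
    ("capybara D-loop", capybara_dloop),
    ("capybara Cyt-b", capybara_cytb),
    ("paca D-loop", paca_dloop),
    ("paca Cyt-b", paca_cytb),
]:
    # Mean and sample standard deviation across populations.
    print(label, round(mean(values)), round(stdev(values)))

# Ratios quoted in the text.
capybara_gene_ratio = mean(capybara_cytb) / mean(capybara_dloop)
paca_gene_ratio = mean(paca_cytb) / mean(paca_dloop)
species_ratio_dloop = mean(paca_dloop) / mean(capybara_dloop)
species_ratio_cytb = mean(paca_cytb) / mean(capybara_cytb)
print(capybara_gene_ratio, paca_gene_ratio, species_ratio_dloop, species_ratio_cytb)
```

Note that these are unweighted means over populations with very wide individual intervals, so the ratios between genes and species are best read as rough orders of magnitude.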

Table 6. Bayesian estimations of the female effective numbers by means of the Migrate
Software for different paca populations: at the mt D-loop region (A); at the mt Cyt-b
gene (B). θ = Neμ, Ne = Female effective number and μ = Mutation rate. CI = 97.5%
confidence interval

(A)
Population                             θ value   CI                Ne value    CI
Guainia                                0.0588    0.0188; 0.0993    228,427     73,010-385,515
Colombian-Western Brazilian Amazon     0.0867    0.0693; 0.1034    336,583     269,010-388,350
Ecuadorian Amazon                      0.0696    0.0358; 0.1123    270,097     139,029-388,350
Northern Peruvian Amazon               0.0759    0.0481; 0.1976    294,757     186,680-388,350
Middle Peruvian Amazon                 0.0551    0.0155; 0.0997    213,825     60,311-387,301
Southern Peruvian Amazon               0.0826    0.0587; 0.1234    320,699     227,845-396,876
Madeira River                          0.0872    0.0656; 0.1126    338,757     254,757-392,765

(B)
Population                             θ value   CI                Ne value    CI
Northern Colombia                      0.0585    0.0284; 0.1000    946,602     459,547-1,618,123
Guainia                                0.0633    0.0303; 0.1125    1,024,437   489,806-1,618,123
Colombian-Western Brazilian Amazon     0.0697    0.0393; 0.0991    1,127,508   635,437-1,604,045
Ecuadorian Amazon                      0.0549    0.0168; 0.0788    889,320     271,847-1,275,081
Northern Peruvian Amazon               0.0485    0.0177; 0.0729    785,275     286,893-1,179,126
Middle Peruvian Amazon                 0.0295    0.0117; 0.0601    477,508     188,835-972,977
Southern Peruvian Amazon               0.0784    0.0502; 0.1183    1,267,799   812,298-1,618,123
Madeira River                          0.0806    0.0542; 0.1064    1,304,531   877,023-1,618,123
Bolivian Amazon                        0.0691    0.0358; 0.0994    1,118,123   579,288-1,608,414

The Bayesian gene flow estimates for the capybara at both genes are shown in Table 7.
At the mt D-loop, we obtained some population pair estimations below 1, which is a value
related to the absence of gene flow in an infinite island model. These were the cases from the
Colombian Eastern Llanos-Guainía to the Western Amazon, from the Western Amazon to
Northern Colombia, from the Napo River to the Colombian Eastern Llanos-Guainia, from the
Western Amazon to the Colombian Eastern Llanos-Guainia and from Northern Colombia to
the Colombian Eastern Llanos-Guainia. In several cases, the gene flow between
the same population pair was asymmetrical. For instance, the gene flow from the Western
Amazon to the Napo River was high (Nm = 4.95), but from the Napo River to the Western
Amazon it was considerably lower (Nm = 1.16). Also, gene flow from Northern Colombia to
the Napo River was high (Nm = 5.66), but in the opposite direction it was very limited (Nm =
1.06). No gene flow estimation pairs were lower than 1 at mt Cyt-b. However, two cases were
lower than 4, which is considered the limit of gene flow for a stepping-stone model. These
were the cases from the Colombian Eastern Llanos-Guainia to Northern Colombia and from
the Colombian Eastern Llanos-Guainia to Western Amazon.

Table 7. Bayesian gene flow estimates (Nm) among capybara population pairs at the mt
D-loop region (A) and at the mt Cyt-b gene (B). Columns are numbered in the same order as the rows

(A)
                                                      1        2        3        4
1. Eastern Colombian Llanos-Guainia                   -        0.63     0.54     0.67
2. Napo River                                         4.57     -        4.95     5.66
3. Western Amazon basin, excluding the Napo River     0.86     1.16     -        1.08
4. Northern Colombia                                  1.27     1.06     0.69     -

(B)
                                                      1        2        3
1. Western Amazon basin                               -        1.34     14.77
2. Eastern Colombian Llanos-Guainia                   15.68    -        16.58
3. Northern Colombia                                  18.24    2.99     -

The Bayesian gene flow estimations for the paca at both genes can be observed in Table
8. In general, the gene flow estimation pairs for paca were substantially higher than for
capybara. In the case of the mt D-loop, only four cases showed Nm values lower than 1. They
were the cases from the populations of San Martin, Loreto, Madeira River and Guainia
towards the frontier between Colombia and Brazil in the Western Amazon. In contrast, the
gene flow estimates from this Colombian-Brazilian Amazon region to these other regions
were elevated. There were three other cases with Nm estimates lower than 4. Thus, 83.33% of
the gene flow estimation pairs were higher than 4. In the case of the mt Cyt-b, only one case
presented an Nm value lower than 1. This was the case from the Ecuadorian Amazon to the
relatively close San Martin in Peru. There were 9 cases with Nm estimations lower than 4.
Nevertheless, as in the previous gene, most of the gene flow estimation pairs were
higher than 4 (86.11%).
Therefore, more limited and structured gene flow was observed in the capybara, whereas
greater and less structured gene flow was estimated for the paca.
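The percentages quoted above follow directly from counting the directional Nm estimates in Table 8. The sketch below transcribes both matrices (diagonal cells set to None) and tallies how many estimates fall below 1 and how many exceed 4; the helper name is illustrative.

```python
# Directional Nm estimates for the paca, transcribed from Table 8 (None = diagonal).
dloop = [
    [None, 0.34, 1.89, 0.58, 9.56, 0.51, 0.56],
    [36.86, None, 46.70, 47.18, 27.27, 17.03, 5.84],
    [8.92, 7.59, None, 18.94, 7.42, 33.17, 15.26],
    [28.45, 11.76, 8.13, None, 10.11, 55.80, 8.41],
    [34.09, 5.86, 67.96, 11.46, None, 17.31, 18.17],
    [12.17, 3.50, 10.35, 9.56, 5.73, None, 9.54],
    [14.43, 24.44, 49.24, 2.04, 20.52, 13.31, None],
]
cytb = [
    [None, 19.43, 42.32, 33.55, 23.57, 45.14, 23.59, 44.20, 52.54],
    [51.55, None, 47.20, 31.22, 12.92, 43.62, 38.95, 28.80, 54.41],
    [20.47, 61.14, None, 7.82, 12.34, 35.48, 24.53, 5.81, 26.49],
    [39.42, 34.78, 21.11, None, 9.00, 29.94, 15.32, 1.02, 1.91],
    [10.31, 1.71, 26.23, 3.58, None, 20.70, 0.66, 35.90, 2.65],
    [13.88, 1.57, 2.95, 6.03, 3.30, None, 19.71, 10.20, 7.87],
    [15.07, 6.29, 2.62, 11.62, 10.64, 8.93, None, 17.06, 35.23],
    [34.24, 56.98, 64.98, 33.94, 25.64, 60.05, 41.71, None, 49.45],
    [9.09, 20.54, 7.22, 6.60, 11.73, 10.61, 26.96, 10.23, None],
]


def summarize(matrix):
    """Return (pairs below 1, pairs below 4, percentage of pairs above 4)."""
    values = [v for row in matrix for v in row if v is not None]
    below_1 = sum(v < 1 for v in values)
    below_4 = sum(v < 4 for v in values)
    pct_above_4 = 100 * sum(v > 4 for v in values) / len(values)
    return below_1, below_4, round(pct_above_4, 2)


dloop_summary = summarize(dloop)
cytb_summary = summarize(cytb)
print(dloop_summary, cytb_summary)
```

The counts of pairs below 4 include those below 1, so the D-loop total of seven corresponds to the four cases below 1 plus the three additional cases mentioned in the text.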
Historical Demographic Changes in Capybara and Paca of South America
For the capybara, the mismatch distribution showed a significant population expansion at
the mt D-loop for the overall sample (Figure 2) as well as for the samples of the Western
Amazon, Colombian Eastern Llanos-Guainia and Northern Colombia (Table 9 and Figure 2).
In this analysis, the values of τ and θ0 were 5.802 and 3.651, respectively.

Table 8. Bayesian gene flow estimates (Nm) among paca population pairs at the mt D-loop region (A) and at the mt Cyt-b gene (B).
Columns are numbered in the same order as the rows

(A)
                                              1        2        3        4        5        6        7
1. Colombian and Western Brazilian Amazon     -        0.34     1.89     0.58     9.56     0.51     0.56
2. Middle Peruvian Amazon                     36.86    -        46.70    47.18    27.27    17.03    5.84
3. Southern Peruvian Amazon                   8.92     7.59     -        18.94    7.42     33.17    15.26
4. Northern Peruvian Amazon                   28.45    11.76    8.13     -        10.11    55.80    8.41
5. Ecuadorian Amazon                          34.09    5.86     67.96    11.46    -        17.31    18.17
6. Madeira River                              12.17    3.50     10.35    9.56     5.73     -        9.54
7. Guainia                                    14.43    24.44    49.24    2.04     20.52    13.31    -

(B)
                                              1        2        3        4        5        6        7        8        9
1. Guainia                                    -        19.43    42.32    33.55    23.57    45.14    23.59    44.20    52.54
2. Colombian and Western Brazilian Amazon     51.55    -        47.20    31.22    12.92    43.62    38.95    28.80    54.41
3. Southern Peruvian Amazon                   20.47    61.14    -        7.82     12.34    35.48    24.53    5.81     26.49
4. Northern Peruvian Amazon                   39.42    34.78    21.11    -        9.00     29.94    15.32    1.02     1.91
5. Middle Peruvian Amazon                     10.31    1.71     26.23    3.58     -        20.70    0.66     35.90    2.65
6. Madeira River                              13.88    1.57     2.95     6.03     3.30     -        19.71    10.20    7.87
7. Ecuadorian Amazon                          15.07    6.29     2.62     11.62    10.64    8.93     -        17.06    35.23
8. Northern Colombia                          34.24    56.98    64.98    33.94    25.64    60.05    41.71    -        49.45
9. Bolivia Amazon                             9.09     20.54    7.22     6.60     11.73    10.61    26.96    10.23    -

Table 9. Demographic statistics applied to the overall, Northern Colombian, Eastern Colombian Llanos-Guainia, and Western Amazon
basin capybara samples studied at the mt D-loop region and at the mt Cyt-b gene, and to the overall paca sample at the mt D-loop region
and at the mt Cyt-b gene. + P < 0.05; * P < 0.01, significant population expansions. For each sample, the statistics are listed in the
order Tajima D, Fu & Li D*, Fu & Li F*, Fu’s Fs, raggedness rg, R2

Overall capybara sample at the mt D-loop region
  P[D < -0.267] = 0.4640; P[D* < -0.715] = 0.2200; P[F* < -0.648] = 0.2690; P[Fs < -19.49] = 0.0000*; P[rg < 0.0029] = 0.0000*; P[R2 < 0.0994] = 0.5622
Northern Colombian capybara sample at the mt D-loop region
  P[D < -0.396] = 0.3661; P[D* < -0.707] = 0.2142; P[F* < -0.716] = 0.2147; P[Fs < -5.111] = 0.0081*; P[rg < 0.0281] = 0.0412+; P[R2 < 0.109] = 0.3173
Eastern Colombian Llanos-Guainia capybara sample at the mt D-loop region
  P[D < -0.100] = 0.5143; P[D* < -0.199] = 0.3965; P[F* < -0.198] = 0.3902; P[Fs < -5.581] = 0.0140+; P[rg < 0.0175] = 0.0161+; P[R2 < 0.142] = 0.7422
Western Amazon capybara sample at the mt D-loop region
  P[D < -0.091] = 0.5156; P[D* < -0.459] = 0.3022; P[F* < -0.408] = 0.3401; P[Fs < -1.736] = 0.2067; P[rg < 0.0177] = 0.0451+; P[R2 < 0.133] = 0.5710
Overall capybara sample at the mt Cyt-b gene
  P[D < -1.715] = 0.0271+; P[D* < -1.605] = 0.0611; P[F* < -1.929] = 0.0471+; P[Fs < -4.333] = 0.0521; P[rg < 0.0131] = 0.1071; P[R2 < 0.0908] = 0.1691
Overall paca sample at the mt D-loop region
  P[D < -1.305] = 0.0750; P[D* < -2.601] = 0.0181+; P[F* < -2.505] = 0.0200+; P[Fs < -45.54] = 0.0000*; P[rg < 0.0031] = 0.0030*; P[R2 < 0.1041] = 0.6040
Overall paca sample at the mt Cyt-b gene
  P[D < -2.268] = 0.0000*; P[D* < -3.422] = 0.0071*; P[F* < -3.405] = 0.0070*; P[Fs < -33.74] = 0.0000*; P[rg < 0.0021] = 0.0000*; P[R2 < 0.0433] = 0.0050*

Assuming that a generation in capybara could be 1 or 2 years, this could mean that this
population expansion began for this sample around 43,111 to 86,222 YA with an initial
female population size of around 27,000 individuals and with a final female population of
around 7.4 million individuals. However, mt Cyt-b did not show any trend of a significant
population expansion for the global capybara sample. For the paca, both mitochondrial genes
showed population expansions for the overall sample (Figure 3), although more clearly at the
mt D-loop. In this case, the values of τ and θ0 were 7.521 and 9.655, respectively. These
values corresponded to a population expansion which began around 121,732 to 243,464 YA,
with an initial female population size of around 156,200 individuals and with a final female
population of around 16.2 million individuals.
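The link between the mismatch parameters and the dates and sizes quoted above can be sketched with the usual sudden-expansion relations, τ = 2ut and θ0 = 2N0u, where u is the mutation rate per sequence per generation. The rate used by the authors is not given in this section, so the sketch backs out the rate implied by the reported expansion times and checks that it reproduces the reported initial female population sizes. The function names are illustrative.

```python
def expansion_time(tau, u):
    """Generations since the expansion, from tau = 2 * u * t."""
    return tau / (2 * u)


def female_size(theta, u):
    """Female effective size, from theta = 2 * N * u."""
    return theta / (2 * u)


def implied_rate(tau, generations):
    """Per-sequence rate per generation implied by a reported expansion time."""
    return tau / (2 * generations)


# Capybara, mt D-loop: tau = 5.802, theta0 = 3.651, reported 43,111 generations.
u_capybara = implied_rate(5.802, 43111)
n0_capybara = female_size(3.651, u_capybara)

# Paca, mt D-loop: tau = 7.521, theta0 = 9.655, reported 121,732 generations.
u_paca = implied_rate(7.521, 121732)
n0_paca = female_size(9.655, u_paca)

for generation_years in (1, 2):
    # Expansion onset in years for the two generation times considered.
    print(generation_years * expansion_time(5.802, u_capybara),
          generation_years * expansion_time(7.521, u_paca))
print(round(n0_capybara), round(n0_paca))
```

Under these assumptions the implied initial sizes are close to the roughly 27,000 and 156,200 females reported, and doubling the generation time simply doubles the expansion date in years.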
At mt Cyt-b, the values of τ and θ0 were 7.824 and 32.615, respectively, which
corresponded to a population expansion with timing similar to the previous case, with an
initial female population size of around 1 million individuals and with a final female
population of around 30.9 million individuals. Although these female effective population
sizes were not identical to those determined with the Bayesian procedures (see above), in all
cases mt Cyt-b showed higher effective population numbers than the mt D-loop, and these
effective numbers were always higher for the paca than for the capybara. The statistics used
to determine population changes are in Table 9. For the capybara, at the mt D-loop at the
global level, in the Colombian Eastern Llanos-Guainia and in Northern Colombia, only
one out of five statistics showed evidence of population expansion (Fu's Fs).
For the Western Amazon sample, no statistics showed evidence of population changes. In
contrast, pacas showed more evidence of population expansions for these statistics. At the mt
D-loop, three out of five statistics showed evidence of population expansions, whilst at the
mt Cyt-b, five out of five statistics indicated population expansion.
Figure 2. Historical demographic analyses by means of the mismatch distribution procedure (pairwise
sequence differences) for the two mitochondrial DNA genes studied in the capybara. These analyses
were applied to: Overall capybara sample at the mt D-loop region (A); Overall capybara sample at the
mt Cyt-b gene (B); Northern Colombian capybara sample at the mt D-loop region (C); Eastern
Colombian Llanos-Guainia capybara sample at the mt D-loop region (D); Western Amazon capybara
sample at the mt D-loop region (E). Panel annotations: (A) significant population expansion; (B) no
population expansion; (C), (D) and (E) significant population expansion.
Figure 3. Historical demographic analyses by means of the mismatch distribution procedure (pairwise
sequence differences) for the two mitochondrial DNA genes studied in the paca. These analyses were
applied to: Overall paca sample at the mt D-loop region (A); Overall paca sample at the mt Cyt-b gene
(B). Panel annotations: (A) and (B) significant population expansion.
Figure 4. Bayesian skyline plot (BSP) analyses applied to the capybara and paca populations studied at
two mitochondrial DNA genes. Overall capybara sample at the mt D-loop region (A); Overall capybara
sample at the mt Cyt-b gene (B); Northern Colombian capybara sample at the mt D-loop region (C);
Eastern Colombian Llanos-Guainia capybara sample at the mt D-loop region (D); Western Amazon
capybara sample at the mt D-loop region (E); Overall paca sample at the mt D-loop region (F); Overall
paca sample at the mt Cyt-b gene (G). Time in millions of years.
We conducted five BSP analyses for the capybara (Figure 4). Based upon the mt D-loop
(for the overall sample), the female population underwent growth in the last 1 MY, although
this population expansion hardly increased in the last 0.5 MY. Mt Cyt-b also indicated
continuous growth of the female population during the last 3 MY. In the case of mt D-loop,
we analyzed possible demographic changes in three different geographical areas. In the
Western Amazon, the capybara population also showed a continuous and gradual growth in
the last 3 MY. In the Colombian Eastern Llanos-Guainia, a population expansion was
detected in the last 0.5 MY, with strong growth in the last 0.25 MY. We also detected a
population expansion in Northern Colombia starting 0.5 MYA, but this population has declined
in the last 10,000 years. This declining trend is unique among the capybara samples analyzed.

Two BSP analyses were carried out for the paca. At the mt D-loop, we observed a
population expansion beginning 1.5 MYA, with a strong increase in the last 1 MY. A similar
result was observed for Cyt-b, beginning 3 MYA but particularly strong in the last 1 MY.
Thus, both rodent species were characterized by population expansions, especially in the
last 0.5-1.0 MY, based upon BSP analyses. The estimated onset times of these
population expansions were older under this Bayesian procedure than under the mismatch
distribution analysis.
Genetic and Geographical Relationships among Populations of Capybara and Paca
The capybara populations showed significant relationships between the estimated genetic
distances and the geographic distances at both mitochondrial genes. At the mt D-loop, the
four regression models we used showed significant results, with the geographic distance
explaining from 54.1 to 70.6% of the genetic distances found. The best regression equation
was: (genetic distance) = 1.04e-5 (+ 2.82e-6) (geographic distance) + 9.09e-3 (+ 3.67e-3) with
r = 0.84 and p = 0.024. At mt Cyt-b, the four regression models were also significant, with the
geographic distance explaining from 78.1 to 88.3% of the genetic distances found. The best
regression equation was: log (genetic distance) = 2.07 (+ 0.71) log (geographic distance) –
7.94 (+ 2.23) with r = 0.94 and p = 0.03. In contrast, the paca populations did not show any
significant relationship between genetic distances and geographic distances. At the mt D-loop,
the four regression models showed that the geographic distance only explained from 0 to
1.2% of the genetic distances found. Similarly, at the mt Cyt-b, the geographic distance only
explained from 1.9 to 7% of the genetic distances found. Neither relationship was
significant.
Thus, in contrast to the paca, the capybara’s population genetic structure was related to
geographic distances.
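The isolation-by-distance regressions above can be illustrated with a minimal sketch. It assumes ordinary least squares on pairwise distances and base-10 logarithms for the log-log model; the chapter states neither the logarithm base nor the distance units. The values in `geo_km` and `gen` are invented for illustration and are not data from the study. The only numbers taken from the text are the reported r coefficients, whose squares reproduce, to within rounding, the upper bounds quoted for the fraction of genetic distance explained.

```python
import math

def least_squares(x, y):
    """Return (slope, intercept, r) for a simple linear regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r

# Hypothetical pairwise distances (NOT data from the study).
geo_km = [150.0, 400.0, 650.0, 900.0, 1300.0, 1800.0]
gen = [0.010, 0.014, 0.015, 0.020, 0.022, 0.028]

# Model 1: raw genetic distance against raw geographic distance.
slope, intercept, r = least_squares(geo_km, gen)

# Model 2: log-log form, assuming base-10 logarithms.
log_slope, log_intercept, log_r = least_squares(
    [math.log10(d) for d in geo_km], [math.log10(g) for g in gen]
)

# The share of variance explained is r squared. For the coefficients
# reported in the text: 0.84**2 (mt D-loop) and 0.94**2 (mt Cyt-b).
explained_dloop = 0.84 ** 2
explained_cytb = 0.94 ** 2
```

Squaring the reported coefficients gives roughly 0.706 and 0.884, in line with the 70.6% and 88.3% quoted for the best capybara models.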
Phylogenetic Relationships in Capybara and Paca and Temporal Haplotype Splits within these Species
ML and BI trees for capybara at the mt D-loop are shown in Figure 5. In the ML tree, the
first clade to diverge (bootstrap 100%) consisted of specimens sampled in different
Peruvian rivers within the Western Amazon, together with one sample from the Negro River
in the middle of the Brazilian Amazon. This last specimen showed the most divergent sequence in
this clade and was also the most geographically remote individual within the group. The
second clade to diverge contained four individuals from the Inirida River in the Guainia
Department (Colombia). The third clade comprised animals from the Colombian
Eastern Llanos (Meta and Casanare Departments) (100%). The last clade was composed of
two sub-groups (80%). One contained three animals from the Ecuadorian Amazon (Napo River) and
one from the Peruvian Amazon (also Napo River).


Figure 5. Maximum Likelihood (ML) (A) and Bayesian (BI) (B) trees for the capybara studied at the mt
D-loop region.
The other one contained all the trans-Andean animals sampled at Cordoba in Northern
Colombia and one individual from Valle del Cauca on the Pacific coast of Colombia. The BI
tree showed very similar results. Western Amazon was the first clade to diverge (p = 1). The
other large cluster was comprised of the Colombian Eastern Llanos group (in this case,
including the specimens of the Guainia Department) and the Northern Colombian group plus
the Amazonian animals from the Napo River in Ecuador and Peru. The temporal splits of
these clades were estimated in the BI tree. The split between the main Western Amazon clade
and the others occurred around 6.6 MYA (95% HPD: 6.2-10.18 MYA). The temporal
separation of the central Brazilian Amazon haplotype from the other western Amazon
haplotypes of this clade was around 4.6 MYA. The temporal split process within the Peruvian
individuals (except those from the Napo River) began around 1.75 MYA (95% HPD:
0.25-3.67 MYA). The split between the Colombian Eastern Llanos population and the trans-
Andean North Colombian population was around 2.8 MYA (95% HPD: 1.05-6.61 MYA),
whereas the haplotype diversification within the former group began around 2.6 MYA
(95% HPD: 0.42-4.44 MYA). The temporal split between the trans-Andean and Napo River
capybara populations occurred approximately 1.9 MYA. The haplotype diversification
process within the trans-Andean population began around 1 MYA.
Figure 6 shows the BI tree for the mt Cyt-b gene. The information provided by this
molecular marker was of lower quality than that obtained from the mt D-loop, mainly
because fewer individuals were analyzed. However, some results, such as the separation
between capybara and the sister group (Kerodon; p = 1), are interesting and agree quite well
with that observed at the mt D-loop. As in the previous case, the western Amazon clade and
the Colombian Eastern Llanos-Northern Colombian clade were clearly separated (p = 1). In
this case, different individuals from the Meta and Guainia Departments were intermixed with
the trans-Andean individuals sampled in Northern Colombia. The temporal split between the
capybara and Kerodon occurred about 11.88 MYA (95% HPD: 9.99-13.88 MYA). The
separation of the capybara ancestors from Western Amazon and the Colombian Eastern
Llanos-Northern Colombia clade happened about 9.09 MYA (95% HPD: 5.68-11.8 MYA).
Also, the diversification within the Western Amazonian clade began around 7.75 MYA (95%
HPD: 3.28-10.38 MYA). Within the Colombian Eastern Llanos-Northern Colombia clade it
began around 7.41 MYA (95% HPD: 5.01-8.63 MYA). These temporal estimates are older
than those obtained for the mt D-loop, although the 95% HPD intervals of the two
mitochondrial genes overlap.
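The overlap between credibility intervals can be checked directly. The sketch below takes the 95% HPD bounds quoted in the text for the split between the Western Amazon clade and the remaining capybara clades at each marker. Pairing these two nodes across trees is an assumption of the sketch, and `hpd_overlap` is a hypothetical helper, not a function from BEAST or any other package.

```python
def hpd_overlap(a, b):
    """Return the range shared by two (low, high) intervals, or None if disjoint."""
    low = max(a[0], b[0])
    high = min(a[1], b[1])
    return (low, high) if low <= high else None

# 95% HPD intervals in MYA, as quoted in the text.
dloop_hpd = (6.2, 10.18)  # mt D-loop, point estimate 6.6 MYA
cytb_hpd = (5.68, 11.8)   # mt Cyt-b, point estimate 9.09 MYA

shared = hpd_overlap(dloop_hpd, cytb_hpd)
```

Here the D-loop interval lies entirely inside the Cyt-b interval, so the shared range is the D-loop interval itself.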
Figure 6. Bayesian (BI) tree for the capybara studied at the mt Cyt-b gene.


Figure 7. Maximum Likelihood (ML) (A) and Bayesian (BI) (B) trees for the paca studied at the mt D-
loop region.
Figure 7 shows the ML and the BI trees at the mt D-loop for the paca. No geographical
clusters were observed in the ML tree; all the individuals were intermixed regardless of
their geographical origins. For instance, individuals from the Guainia Department, from trans-
Andean localities and from the Madeira River were intermixed with individuals from
the Western Amazon. The bootstrap percentages were low or very low in the ML tree. The BI
tree showed that the outgroups, Dasyprocta fuliginosa and Cunniculus taczanowski, diverged
from the paca with p = 1. Within the pacas, there were only a few clades with high p, but
even these clades were composed of intermixed animals from different geographical origins.
The temporal split between the ancestor of Dasyprocta fuliginosa and the remaining taxa was
around 21.18 MYA (95% HPD: 16.9-26.76 MYA). The temporal separation between both
species of Cunniculus was around 7.45 MYA (95% HPD: 4.66-12.51 MYA). The beginning
of the D-loop diversification process within the paca was around 4.22 MYA (95% HPD:
4.53-7.97 MYA), when the ancestor of a haplotype found in the Aripuana River (a tributary
of the Madeira River) diverged. However, the main clades began their diversification
processes 3.95 (95% HPD: 3.97-7.29 MYA), 3.71 and 3.31 MYA (95% HPD: 2.25-5.63
MYA). Many other haplotype fragmentation processes occurred around 2.8-1.6 MYA. The ML
and BI tree results for paca at the mt Cyt-b agree quite well with those for mt D-loop (Figure
8). Very few clades showed high bootstraps or posterior probabilities and the animals and
haplotypes were intermixed regardless of their geographic origins. In this case, the
outgroups (capybara and Cunniculus taczanowski) in the BI tree showed p = 1.


Figure 8. Maximum Likelihood (ML) (A) and Bayesian (BI) (B) trees for the paca studied at the mt
Cyt-b gene.
The temporal split of the capybara with reference to the two Cunniculus species occurred
around 21.45 MYA (95% HPD: 19.5-25.61 MYA). Also, the divergence between the
Cunniculus species was around 7.34 MYA (95% HPD: 4.66-12.27 MYA). The first mt Cyt-b
haplotype divergence events within the paca were estimated to begin around 6.46 and 6.3
MYA (95% HPD: 4.56-7.93 MYA) respectively. Many other haplotype fragmentation events
occurred around 4.73-5.92 MYA.
Therefore, the ML and BI trees for both mitochondrial genes showed opposite trends in the
paca and the capybara. Capybara showed a very pronounced geographical
structure, whereas paca showed none. Moreover, mt Cyt-b yielded temporal split events slightly
older than those yielded by the mt D-loop for both rodent species. Additionally, the haplotype
fragmentation events for both molecular markers seemed to be slightly older in the capybara
than in the paca.
DISCUSSION
Gene Diversity and Genetic Heterogeneity: Similarities and Opposite Patterns in Capybara and Paca
Both species of rodents presented high levels of gene diversity at both mitochondrial
markers. This agrees quite well with the few studies carried out on this topic. Campos-Krauer
and Wisely (2011) determined a global nucleotide diversity of 0.016 for the capybaras studied
in three regions of the Paraguayan Chaco. This value is practically identical to the global
value we obtained for the mitochondrial control region, although our average value for the
two mitochondrial genes studied was slightly higher (π = 0.022). Van Vuuren et al. (2004)
reported a value of π = 0.015 for pacas from French Guiana at the mt Cyt-b. Our average
value for the two mitochondrial genes was higher (π = 0.060), which is to be expected because
our geographical area of study was considerably larger than French Guiana.
The fact that both rodent species have high gene diversity levels supports the need
for their genetic conservation. It is interesting to note, however, that gene diversity is higher
in paca than in capybara.
These two cited works also agree quite well with the levels of genetic heterogeneity we
found for both rodent species. Campos-Krauer and Wisely (2011) determined two distinct
phylogroups of capybara in the Gran Chaco region of Paraguay, which recently admixed.
They also showed that major river drainages from east to west of the Chaco explained up to
27% of the genetic variation in the capybara population of that region. They reported a global
genetic heterogeneity (FST) of 0.44. Similarly, we determined the existence of different and
significant haplogroups within capybara. Our average genetic heterogeneity (FST) for both
markers was 0.41, very similar to the previous value. In contrast, Van Vuuren et al. (2004)
determined that only a minimal fraction of the genetic heterogeneity for the paca in French
Guiana was among populations (FST = 0.17). They only detected sympatric clades with little or
no geographical separation between haplotypes. We found the same in the current
study. Our average genetic heterogeneity value (FST = 0.08) for paca was also small and we
did not detect significant haplogroups within the large geographical area studied.
Our geographical spatial analyses revealed that a very large fraction of the genetic
distances among capybara populations was explained by geographical distances (in the best
models for the two genes, geographical distances explained 70.6-88.3%). In the case of the
pacas, by contrast, geographical distances explained very little of the genetic distances (in the
best cases, around 1.2-7%). Van Vuuren et al. (2004) likewise did not find a significant correlation
between geographic distance and genetic differentiation for French Guiana populations of paca.
Therefore, capybara showed a very pronounced geographical genetic structure whereas
paca did not. We offer one explanation for this considerable difference. This
explanation is more related to ecological and natural history constraints than to the current
social and reproductive systems of both species. We hypothesized in the introduction that the
fact that capybara is a specialist linked to river systems could have created a greater
genetic structure in this species than in the paca. Indeed, this hypothesis was confirmed.
Ojasti (1973) showed that 40% of the capybaras recaptured were found less than 100 m from
where they were originally captured and 80% less than 1 km. Azcárate (1980) determined that
a capybara herd walked, on average, around 1,120 m per day in the dry seasons and around 790
m per day in the wet seasons. When some animals were moved to other areas during drought periods
to avoid death, many of them returned to their original home ranges during the night.
Additionally, the highest population densities for capybaras range from 2 to 3.5
individuals/ha (Cordero and Ojasti, 1981; Macdonald, 1981). Therefore, geological or
climatological changes affecting Neotropical rivers could have enormous consequences for
the genetic structure of this species. In contrast, no differentiated mitochondrial lineages were
observed in the paca. The estimated population densities for the paca are clearly higher than
for the capybara. In Masaguaral, Venezuelan Llanos, Eisenberg et al. (1979) estimated 25
individuals/km2. Smythe et al. (1983) and Glanz (1983) estimated values of 70 and 40
individuals/km2 at Barro Colorado Island in Panama. Collet (1981) determined densities
around 40-70 individuals/km2 in different places of the Eastern Colombian Llanos, although
these places were intensively hunted areas. Similarly, Van Vuuren et al. (2004) reported 27
individuals/km2 in French Guiana. Although not necessarily linearly, census population
sizes (Ruiz-García, 2000) are related to the effective population sizes (Ne) and these, in turn, are
related to the degree of gene flow (Wright, 1938, 1951). For this reason, it is not surprising that
our Bayesian population genetics estimates of gene flow among populations were
significantly higher for the paca than for the capybara.
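The link between genetic heterogeneity and gene flow can be made concrete with a rough sketch. It uses Wright's island-model approximation for a haploid, maternally inherited marker, FST ≈ 1/(1 + 2·Nf·m), which is not the Bayesian procedure used in this study and rests on strong equilibrium assumptions. The function name is hypothetical; the two FST inputs are the average values quoted above.

```python
def female_migrants_per_generation(fst):
    """Island-model estimate Nf*m = (1/FST - 1) / 2 for a haploid, maternally inherited marker."""
    if not 0 < fst < 1:
        raise ValueError("FST must lie strictly between 0 and 1")
    return (1.0 / fst - 1.0) / 2.0

# Average FST values quoted in the text for the two mitochondrial markers.
capybara_nm = female_migrants_per_generation(0.41)  # about 0.72
paca_nm = female_migrants_per_generation(0.08)      # 5.75
```

Under these assumptions the paca value is several times the capybara value, the same direction as the Bayesian gene flow estimates described above.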
Demographic Evolution
Practically all the historical demographic analyses revealed population expansions for
both rodent species as well as for the different geographical regions where these analyses
were performed. This also agrees quite well with the population expansions detected by
Campos-Krauer and Wisely (2011). They determined a mismatch distribution with a bell-
shaped curve indicative of a population expansion or colonization event for the capybara
population from the Paraguayan Gran Chaco. Similarly, Van Vuuren et al. (2004)
determined a population expansion in the paca population of French Guiana. Our mismatch
distribution at the mt D-loop for the capybara (mt Cyt-b did not reveal demographic changes for
this species) showed a population increase from 27,000 to 7.4 million females in the last
43,100-86,200 years. This period coincides with the middle Pleniglacial (60,000-26,000 YA),
which was characterized by moderate temperatures and, in general,
was not especially dry (Van der Hammen, 1992). This was a temperate and
relatively wet epoch between two very cold periods. The first began about 90,000-70,000
YA, with the cold extreme during the early Pleniglacial, which was the first extreme cold
period in the fourth glaciation. Absy et al. (1991), Van der Hammen (1992) and Van der
Hammen and Absy (1994) analyzed the vegetation in Carajas (Eastern Amazon) covering a
time period starting approximately 65,000-51,000 YA and determined that this area, now
covered by Amazon forest, was then a savannah, probably as a result of an extension of the
early Pleniglacial period. These wet savannahs could have been optimal habitat for the capybara. Later, in the
middle Pleniglacial, another dry period began 26,000 YA in the Bogota savannah, causing the
disappearance of the Bogota lagoon and the appearance of dry vegetation species (Van der
Hammen, 1980). This period was followed by the most intense cold and dry period of the
fourth glaciation, which ranged from 26,000 to 14,000 YA. In the case of the paca, both
markers for the mismatch distribution showed population expansions (from 156,000 to 16.2
million females and from 1 to 30.9 million females, respectively; this population
expansion was higher than for the capybara) in the last 121,700-243,500 years. This coincides
with the period immediately prior to the beginning of the fourth (and last) glacial period
during the Pleistocene. Van der Hammen (1992) determined the beginning of this period to be
around 130,000 YA by the pollen deposits at the Fuquene and Bogotá lagoons in the
Cundiboyacense highlands in Colombia and it coincided with the beginning of the Lujanense
period in Argentina.
The BSP analyses detected more, and older, population expansions in both
species than the mismatch distribution. For the capybara, in the Western Amazon, the
population expansion began around 3 MYA. This falls within the
Pliocene epoch (5.3-1.8 MYA), during which, for instance, some mammalian groups experienced
an explosion of species (see Koepfli et al., 2007, 2008). The cold and dry climate during the
Pliocene coincided with the onset of high latitude glacial cycles, causing an explosive
expansion of low-biomass vegetation, including grasslands and steppe at mid-latitudes and
development of taiga at high latitudes in Eurasia and North America. These changes were
correlated with an extreme diversification of groups such as muroid rodents and passerine
birds that exploited these new habitats. The same could have happened in the Western Amazon,
helping to drive an expansion of the initial capybara population. This period also coincides
with the formation of the Panamanian land bridge. For instance, during this period (3–2.5
MYA), the Bogotá savannah reached its current altitude of 2,500 masl. The capybara
population from the Colombian Eastern Llanos (and Guainia) experienced a population expansion
around 0.5-0.25 MYA. This agrees quite well with the coldest epoch of the Mindel-Kansas
glacial period (Bonaerense period for Argentina; 0.3–0.5 MYA). Many Neotropical forests in
that area of South America transformed into savannahs and this could have facilitated the
expansion of the capybara population. The same was noted for the capybara population of
Northern Colombia. However, this population underwent a decline in the last 10,000
years. This could be explained by human activity and/or the beginning of Dryas II (14,500 to
12,000 YA), when the cold was extreme. At that time, the glacial area reached its
maximum in the Cordillera Blanca and in the Vilcanota Cordillera in Peru as well as in the
Nevado of ChoqueYapu in Bolivia and in the Chimborazo in Ecuador (Rodbell and Seltzer,
2000). Later (10,000-9,000 YA) a brief but very cold period occurred, named “El Abra”,
corresponding to the Younger Dryas in Europe or Dryas III (Van der Hammen, 1992). This
coincides with the last glacial advance in the Andes (Clapperton, 1993). Thompson (1993)
and Thompson et al. (1995) determined that the temperature in the Andes was around 8-12
°C lower than today. Cardich (1964) determined an extreme glacial advance (the Antarragá
advance) that occurred 9,000 YA in the area of Lauricocha (Huánuco Department, Peru) as
well as on the Peruvian coast. These climatic events could have had a negative impact on the
North Colombian capybara population.
The paca population showed an expansion for both mitochondrial markers during the last
1.5 MY, but especially in the last million years. This coincides with the beginning of
the Pleistocene (2.4-1.6 MYA; Forasiepi et al., 2007) and with the last phase
of formation of the Central Andes. The entire Andean chain between Cajamarca and
Huancavelica (Peru) appeared by volcanism during this period as well as the marine
introgression during the Interensenadian. Thus, both rodent species are characterized by
population expansions during the Pliocene-Pleistocene.
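The matching of expansion dates to climatic periods in this section amounts to an interval lookup, sketched below. The period boundaries are those quoted in the text, converted to MYA; the dictionary and function names are hypothetical, and the lookup is only a bookkeeping aid, not a statistical test of association.

```python
# Intervals in millions of years ago (MYA), as quoted in the text.
PERIODS = {
    "Pliocene": (1.8, 5.3),
    "Mindel-Kansas glacial": (0.3, 0.5),
    "middle Pleniglacial": (0.026, 0.060),
    "Dryas II": (0.012, 0.0145),
}

def periods_containing(age_mya):
    """Return the names of all listed periods whose interval includes age_mya."""
    return [
        name
        for name, (young, old) in PERIODS.items()
        if young <= age_mya <= old
    ]

# Onsets of the BSP expansions reported for two capybara regions.
western_amazon = periods_containing(3.0)  # Western Amazon
eastern_llanos = periods_containing(0.5)  # Colombian Eastern Llanos-Guainia
```

With these inputs the Western Amazon onset falls in the Pliocene and the Eastern Llanos onset at the older bound of the Mindel-Kansas interval.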
Today, in some areas of South America, both species could experience population
expansions, similar to that of the capybara in Paraguayan Gran Chaco (Campos-Krauer and
Wisely, 2010). These authors and others (Paschoaletto et al., 2003 in Brazil) reported that
capybaras are considered a local pest because of the damage they cause to crops and the
intense pressure they place on already degraded natural habitat when they compete with cattle for
forage. However, the real situation in most South American areas is the opposite. For
example, using microsatellite markers, Maldonado et al. (2011) determined that in the Casanare
Department in Colombia there was a very recent bottleneck, consistent with events
of the last decades. Hernández-Camacho et al. (1983) detected a drastic
decline of the capybara populations in the Arauca and Casanare Departments. The main
cause of this decline is the trade in dried meat to Venezuela (Aldana-
Dominguez et al., 2002). Around 120,000 to 150,000 individuals are illegally hunted in
Colombia and trafficked to Venezuela every year, because the Venezuelan populations of this
species have been severely decimated. It does not take much data to comprehend the
magnitude of this phenomenon. Higuera (2001) recorded that, in March 2003, 6,800
capybaras were killed at four sites in the Casanare Department. In another area of
Casanare, Paz de Ariporo, 25,000 capybaras were hunted and killed during a 10-year period
(Aldana-Dominguez et al., 2007). In other countries, such as Peru and Argentina, the export
of capybara leather is important. In the 1970s, one capybara skin cost 2 US$, and
by the 1980s the price had increased to 11 US$. Currently, a skin costs around 20 US$/m2 (Lemke, 1981;
Ojeda and Mares, 1982; Ojasti, 1996). Tens of thousands of skins are exported annually from
various South American countries for gloves, belts, shoes and handbags (Ojeda and Mares,
1982).
In many regions of the Amazon, the paca has been classified as vulnerable at the local
level because of intense hunting pressure, especially near urban areas. In French Guiana, 12%
of the mammalian biomass consumed by Amerindians is from pacas (Ouhoud-Renoux, 1988).
Furthermore, the paca makes up an estimated 8% of all meat consumed in the Amazon
(Ojasti, 1983).
Given the serious threats to these rodent species, their conservation is clearly warranted.
Furthermore, molecular population genetics could directly benefit any conservation efforts for
these species.
Origins, Systematics and Main Temporal Haplotype Splits in Capybara and Paca
Family Hydrochoeridae appeared in the fossil record of South America about 10 MYA.
Mones and Ojasti (1986) claimed that the capybara probably evolved from an unknown
species of Cardiatherium about 2 MYA in Southern South America. It is closely related to
the genera Neochoerus and Hydrochoeropsis (Upper Pliocene to Recent; Pascual et al., 1967).
All the known fossil records of capybara are from the Pleistocene in Curacao, Uruguay,
Argentina, Colombia and Bolivia. Also, the few fossil records of the paca are from
Pleistocene deposits in Minas Gerais, Brazil (Winge, 1888). Nevertheless, our molecular
results suggest that both species could be older than indicated by the fossil record.

In the case of the capybara, the phylogenetic results with mt D-loop showed that the
Western Amazon population (excluding animals from the Napo River) was the oldest and
original one and the ancestors of this population split around 6.6 MYA. This coincides with
the Middle to Late Miocene. After the Mid-Miocene Climatic Optimum (17-15 MYA), there
was a noteworthy cooling of the global climate near the end of the Middle Miocene. This
period of cooling coincides with formation of a permanent Antarctic ice sheet in the Middle
and in the Late Miocene and an Arctic ice sheet in the Pliocene. In fact, many mammals were
affected by these climatic changes. Johnson and O'Brien (1997) and Johnson et al. (2006)
showed that seven of the eight primary lineages of felids radiated in the early part of the Late
Miocene (10.8-6.2 MYA). Koepfli et al. (2007, 2008) and Ruiz-García et al. (2013b) also
showed that many Mustelidae and Procyonidae genera appeared in this period. The temporal
haplotype diversification within this clade (excluding a Negro River specimen) began around
1.75 MYA, which coincided with the beginning of the Pleistocene, as noted above.
The next ancestors to appear were those that gave rise to the
populations of the Colombian Eastern Llanos and Guainia on one hand and the Northern
Colombian (and the Napo River) populations on the other hand, 2.8 MYA. The first
population rapidly began its diversification, 2.6 MYA, which agrees quite well with the last
phase of the Andes formation (Dollfus, 1974; Clapperton, 1993). This also agrees with the
intense volcanic activity in the Andes, the transition from rain forests to steppe and grassland
environments and the formation of the Panamanian land bridge. However, this split between
the ancestors of the Amazon and the Orinoco capybara populations occurred long after the
isolation of the Orinoco and the Amazon basins. Following Hoorn (1993, 1994) and
Hoorn et al. (1995, 2010), both basins were isolated around 8-10 MYA (Miocene) after the
uplift of the Vaupes Arch in the foreland of the Andes. The last important temporal split
separated the Northern Colombian (trans-Andean) and the Amazonian Napo River
populations, 1.7 MYA, and also coincided with the arrival of the Pleistocene. This means that
in that period, the Magdalena River basin and some Amazonian river basins (at least, the
Napo River basin) were probably connected and that northern Colombian animals migrated
towards the Northern Amazon, although the orogeny of the Eastern Cordillera of Colombia
and the isolation of the Magdalena drainage basin was around 12–11 MYA (Hoorn et al.,
1995; Guerrero, 1997). The haplotype temporal splits within the northern Colombia clade
began 1 MYA, which agrees quite well with the Pre-Pastonian glacial period (1.30-0.8
MYA), which was very cold and dry. These last results have extremely important
consequences for the systematics of the capybara. The Northern capybara population could
not be considered a different species because part of its descendants are now in the Amazon
and they likely intermix with the other original Western Amazon haplogroup. This means that
there is probably reproductive cohesiveness between these different haplogroups and that they
follow the Biological Species definition of Mayr (1942, 1963). Peceño (1983) showed that the
capybara from the Venezuelan Llanos has 66 chromosomes (FN = 102), whilst one specimen
from the Lake Maracaibo basin (Northern population) had 64 chromosomes (FN = 104). This
karyotype could be derived from the first by one pericentric inversion and one Robertsonian
change. However, polymorphic and variable chromosome numbers can be found within the same
species. Additionally, Peceño (1983) analyzed 44 enzymatic blood loci and estimated very
small genetic distances (D = 0.0056) between two samples representing the two “a priori”
capybara species (37 individuals from the Apure state and 16 from the Maracaibo Lake
basin). Thus, these enzymatic loci, as well as our mitochondrial sequence results, do not support
two well differentiated capybara species. The Northern Colombian population (and
probably the Panama one) should therefore be considered a subspecies (H. h. isthmius). It may also
be preferable for these different capybara populations to be considered ESUs (Evolutionarily
Significant Units; Moritz, 1994). If we follow this line of reasoning, we should consider, at
least, four ESUs: (1) Western Amazon, (2) Colombian Eastern Llanos and Guainia, (3)
Northern and Pacific Colombia and (4) the Napo River. However, the level of hybridization
must be investigated as well as the geographical frontiers of the Napo River population. Many
additional ESUs could potentially exist in other geographical areas of South America that we
have yet to investigate.
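The karyotype argument in this paragraph can be followed as simple bookkeeping on the diploid number (2n) and the fundamental number (FN). The sketch below assumes both rearrangements are fixed in the homozygous state, that the Robertsonian change is a fusion of two acrocentric pairs, and that the pericentric inversion converts an acrocentric pair into a biarmed pair. These are assumptions of the sketch rather than details given by Peceño (1983), and FN counting conventions differ between authors.

```python
def robertsonian_fusion(two_n, fn):
    """Fuse two acrocentric pairs into one biarmed pair: 2n drops by 2, FN is unchanged."""
    return two_n - 2, fn

def pericentric_inversion(two_n, fn):
    """Convert one acrocentric pair into a biarmed pair: 2n is unchanged, FN rises by 2."""
    return two_n, fn + 2

# Karyotype of the Venezuelan Llanos capybara as quoted in the text: 2n = 66, FN = 102.
llanos = (66, 102)

# Apply one Robertsonian fusion followed by one pericentric inversion.
derived = pericentric_inversion(*robertsonian_fusion(*llanos))
```

Under these assumptions the derived karyotype is 2n = 64 with FN = 104, matching the values quoted for the Lake Maracaibo specimen.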
The mt Cyt-b phylogenetic tree was less informative than the previous one because the
number of samples was considerably lower. This analysis showed the temporal split between
Hydrochoerus and Kerodon to have occurred around 11.88 MYA, which agrees with a
Miocene origin of these rodent taxa.
In the case of the paca, both markers showed a temporal split around 21-22 MYA between
the ancestors of the paca and the ancestors of capybara and Dasyprocta. Also, both markers
showed a similar temporal split between Cunniculus paca and Cunniculus taczanowski,
around 7-7.5 MYA, during the Upper Miocene. Many mammalian genera appeared at this
time due to strong climatic changes. The mt D-loop showed the initial haplotype
diversification for paca around 4.2 MYA (Pliocene), whilst mt Cyt-b showed this
diversification to occur 6.3 MYA (final phase of the Miocene). Other periods of intense
haplotype diversification in paca were 3.5, 2.8 and 1.6 MYA, which agree with the
explanations previously offered: the last rising of the Andes, the formation of the
Panamanian land bridge and the beginning of the Pleistocene. In the case of the paca, the
identity of the ancestral population remains unclear. The mt D-loop showed one specimen
from the Aripuana River (a tributary of the Madeira River in the central Brazilian Amazon) as
the most divergent, but mt Cyt-b showed some specimens from the Northern Peruvian
Amazon as being the most divergent. Additionally, our samples of paca could contain three
supposed morphological subspecies (C. p. guanta, C. p. mexianae and C. p. paca). However,
our molecular results did not detect a relationship between clades and geographical areas.
Thus, all the samples we analyzed could be interpreted as belonging to a single ESU.
Additional samples from more diverse geographical areas must be studied in order to
more accurately determine the phylogenetic relationships among different taxa within
capybara and paca. Future studies may find it beneficial to analyze other kinds of molecular
markers such as MHC as well as autosomal and sex chromosome intron sequences to
confirm or refute the results presented here.
ACKNOWLEDGMENTS
Thanks to Dr. Diana Alvarez, Pablo Escobar-Armel, Luisa Fernanda Castellanos-Mora
and Nicolás Lichilín for their respective help in obtaining capybara and paca samples during
the last 18 years. Many thanks go to the Peruvian Ministry of Environment, to the PRODUCE
(Dirección Nacional de Extracción y Procesamiento Pesquero from Peru), Consejo Nacional
del Ambiente and the Instituto Nacional de Recursos Naturales (INRENA), to the Colección
Boliviana de Fauna and CITES Bolivia (Dr. Julieta Vargas) and to the Environment Ministry
at Coca (Ecuador) for their role in facilitating the collection permits in
Peru, Bolivia and Ecuador. The first author also acknowledges and thanks the Ticuna,
Yucuna, Yaguas, Witoto and Cocama Indian communities in the Colombian Amazon, the Bora,
Ocaina, Shipibo-Comibo, Capanahua, Angoteros, Orejón, Yaguas, Cocama, Kishuarana and
Alama in the Peruvian Amazon, the Sirionó, Canichana, Cayubaba and Chacobo in the
Bolivian Amazon and the Marubos, Matis, Mayoruna, Kanaimari, Kulina, Maku and Waimiri-
Atroari communities in the Brazilian Amazon for helping to obtain capybara and paca
samples.
REFERENCES
Akaike H. (1974). A new look at the statistical model identification. IEEE Transactions on
Automatic Control AC 19: 716–723.
Aldana-Domínguez J, Forero J, Betancur J, Cavelier J. (2002). Dinámica y estructura de la
población de chigüiros (Hydrochaeris hydrochaeris: Rodentia, Hydrochaeridae) de Caño
Limón, Arauca, Colombia. Caldasia 24: 445-458.
Aldana-Domínguez J, Vieira-Muñoz MI, Ángel-Escobar DC. (2007). Estudios sobre la
ecología del chigüiro (Hydrochoerus hydrochaeris), enfocados a su manejo y uso
sostenible en Colombia. Instituto Alexander von Humboldt, Bogotá D.C., Colombia.
Azcarate T. (1980). Sociología y manejo del capibara (Hydrochoerus hydrochaeris). Doñana
Acta Vertebrata 7-8:1-228.
Beerli P. (1998). Estimation of migration rates and population sizes in geographically
structured populations. In: Advances in molecular ecology (Ed. Carvalho G). NATO-ASI
workshop series. IOS Press, Amsterdam. pp. 39-53.
Beerli P. (2006). Comparison of Bayesian and maximum likelihood inference of population
genetic parameters. Bioinformatics 22:341-345.
Beerli P. (2009). How to use migrate or why are markov chain Monte Carlo programs difficult
to use? In G. Bertorelle, M. W. Bruford, H. C. Hauffe, A. Rizzoli, and C. Vernesi, editors,
Population Genetics for Animal Conservation, volume 17 of Conservation Biology,
pages 42-79. Cambridge University Press, Cambridge UK.
Beerli P, Felsenstein J. (1999). Maximum-likelihood estimation of migration rates and
effective population numbers in two populations using a coalescent approach. Genetics
152: 763-773.
Beerli P, Felsenstein J. (2001). Maximum likelihood estimation of a migration matrix and
effective population sizes in n subpopulations by using a coalescent approach. Proc Natl
Acad Sci USA 98:4563-4568.
Benton MJ, Donoghue PC. (2007). Paleontological evidence to date the tree of life.
Molecular Biology and Evolution 24: 26-53.
Bohonak AJ. (2002). IBD (Isolation by Distance): A program for Analyses of isolation by
distance. Journal of Heredity 93: 153-154.
Cabrera A. (1961). Catálogo de los mamíferos de América del Sur. Revista del Museo
Argentino de Ciencias Naturales. Ciencias Zoológicas 4: 309 – 732.
Campos-Krauer JM, Wisely SM. (2010). Deforestation and cattle ranching drive rapid range
expansion of capybara in the Gran Chaco ecosystem. Glob. Change Biol. 10: 1-13.

Cardich A. (1964). Lauricocha: Fundamentos para una prehistoria de los Andes centrales.
Studia Praehistorica I. Centro Argentino de Estudios Prehistóricos, Buenos Aires,
Argentina.
Clapperton C. (1993). Quaternary geology and geomorphology of South America. Elsevier,
Amsterdam, The Nederlands. pp. 1-489.
Collet S. (1981). Population characteristics of Agouti paca (Rodentia) in Colombia.
Publications of the Museum, Michigan State University, Biological Series 5: 485-602.
Cordero GA, Ojasti J. (1981). Comparison of capybara populations of open and forested
habitats. Journal of Wildlife Management 45: 267-271.
Da Silva MNF, Patton JL. (1998). Molecular phylogeography and the evolution and
conversvation of Amazonian mammals. Molecular Ecology 7:475-486.
Dollfus O. (1974). La cordillera de los Andes: Presentación de los problemas
geomorfológicos. Bulletin de l’Institut Francois d’Etudes Andines 3: 1-36.
Drummond AJ, Rambaut A. (2007). BEAST: Bayesian evolutionary analysis by sampling
trees. BMC Evolutionary Biology 7: 214.
Drummond AJ, Ho SYW, Phillips MJ, Rambaut A. (2006). Relaxed phylogenetics and dating
with confidence. PLOS Biology, 4: e88.
Eisenberg JF, O’Connell, MA, August, PV. (1979). Density, productivity and distribution of
mammals in two Venezuelan habitats. In Eisenberg, JF (Ed.). Vertebrate ecology in the
northern Neotropics. Smithsonian Institution Press, Washington, DC. Pp. 187-207.
Excoffier L, Lischer HEL. (2010). Arlequin suite ver 3.5: a new series of programs to
perform population genetics analyses under Linux and Windows. Molecular Ecology
Resources 10: 564–567.
Felsenstein J. (1981). Evolutionary trees from DNA sequences: A maximum likelihood
approach. Journal of Molecular Evolution 17: 368-376.
Forasiepi A, Martinelli A, Blanco J. (2007). Bestiario Fósil. Mamíferos del Pleistoceno de la
Argentina. Editorial Albatros SACI, Buenos Aires, Argentina. Pp. 1-190.
Forster P, Harding R, Torroni A, Bandelt H-J. (1996). Origin and evolution of Native
American mtDNA variation: a reappraisal. American Journal of Human Genetics
59:935–945.
Fu Y-X. (1997). Statistical tests of neutrality against population growth, hitchhiking and
background selection. Genetics 147: 915-925.
Fu Y, Li W. (1993). Statistical Tests of Neutrality of Mutations. Genetics 133: 693-709.
Gallina S. (1981). Contribución al conocimiento de los hábitos alimenticios del tepezcuintle
(Agouti paca Lin.) en Lancajá-Chansayab, Chiapas. Publicación del Instituto de Ecología
de México 6: 55-67.
Geraldes A, Basset P, Gibson B, Smith KL, Harr B, Yu HT, Bulatova N, Ziv Y, Nachman
MW. (2008). Inferring the history of speciation in house mice from autosomal, X-linked,
Y-linked and mitochondrial genes. Molecular Ecology 17:5349–5363.
Guerrero J. (1997). Stratigraphy, sedimentary environments and the Miocene uplift of the
Colombian Andes. In: Kay, R.F., Madden, R.H., Cifelli, R.L., Flynn, J.J. (eds) Vertebrate
Paleontology in the Neotropics: The Miocene Fauna of La Venta, Colombia. Washington
DC: Smithsonian Institution Press, pp. 15–43.
Glanz W. (1983). The terrestrial mammal fauna of Barro Colorado Island: censuses and long-
term changes. In Leigh E, Rand A, Windsor D (Eds.). The tropical ecology of a tropical

forest: seasonal rhythms and long term changes. Smithsonian Institution Press,
Washington, DC. pp. 1-468.
Goldman, E.A. (1912). New mammals from eastern Panama. Smithson. Misc. Collect. 60: 1-
18.
Hall, E.R. (1981). The mammals of North America. Second Ed. New York, New York: John
Wiley and Sons.
Hardouin EA, Chapuis J-L, Stevens MI, van Vuuren BJ, Quillfeldt P, Scavetta RJ, Teschke
M, Tautz D. (2010). House mouse colonization patterns on the sub-Antarctic Kerguelen
Archipelago suggest singular primary invasions and resilience against re-invasion. BMC
Evolutionary Biology 10: 325–339.
Harpending HC. (1994). Signature and ancient population growth in a low-resolution
mitochondrial DNA mismatch distribution. Human Biology 66: 591-600.
Harpending HC, Sherry ST, Rogers AR, Stoneking M. (1993). Genetic structure of ancient
human populations. Current Anthropology 34: 483-496.
Harrison-Matthews L. (1977). La vida de los Mamíferos II. Destino, Barcelona, España.
Hellberg ME. (1994). Relationships between inferred levels of gene flow and geographic
distance in a philopatric coral, Balanophyllia elegans. Evolution 48: 1829-1854.
Hernández-Camacho J, Pachón JE, Rodríguez JV. (1983). Evaluación de las poblaciones de
chigüiro (Hydrochaeris hydrochaeris) en los hatos Brasilia, Guamito, La Aurora, La
Borra, El Danubio, La Veremos y Mapurisa, municipio de Hato Corozal, Casanare.
Informe presentado a Instituto Nacional de los Recursos Naturales Renovables y del
Ambiente- INDERENA. Bogotá, Colombia.
Heyer E, Zietkiewicz E, Rochowski A, Yotova V, Puymirat J, Labuda D. (2001).
Phylogenetic and familial estimates of mitochondrial substitution rates: study of control
region mutations in deep-rooting pedigrees. American Journal of Human Genetics 69:
1113–1126.
Higuera C. (2001). Condena por la masacre de 10 000 chigüiros. El Espectador, 12 de julio,
Bogotá.
Ho SYW, Saarma U, Barnett R, Haile J, Shapiro B. (2008). The effect of inappropriate
calibration: three case studies in molecular ecology. PLoS ONE 32: e1615.
Hoorn C. (1993). Marine incursions and the influence of Andean tectonics on the Miocene
depositional history of northwestern Amazonia: Results of a palynostratigraphic study.
Palaeogeogr. Palaeocl. 105: 267–309.
Hoorn C. (1994). An environmental reconstruction of the palaeo-Amazon River system
(Middle to Late Miocene, NW Amazonia). Palaeogeogr. Palaeocl. 112: 187–238.
Hoorn C, Guerrero J, Sarmiento GA, Lorente MA. (1995). Andean tectonics as a cause for
changing drainage patterns in Miocene northern South America. Geology 23: 234–240.
Hoorn C, Wesselingh, FP. (2010). Amazonia: landscape and species evolution. A look into
the past. Wiley-Blackwell Publishing Ltd. Pp. 1-446.
Horai S, Hayasaka K, Kondo R, Tsugane K, and Takahata N. (1995). Recent African origin
of modern humans revealed by complete sequences of hominoid mitochondrial DNAs.
Proc Natl Acad Sci USA 92:532–536.
Hudson RR. (2000). A new statistic for detecting genetic differentiation. Genetics 155: 2011-
2014.
Hudson RR, Boss DD, Kaplan NL. (1992a). A statistical test for detecting population
subdivision. Molecular Biology and Evolution 9: 138-151.

Hudson RR, Slatkin M, Maddison WP. (1992b). Estimations of levels of gene flow from
DNA sequence data. Genetics 132: 583-589.
Johnson WE, O’Brien SJ. (1997). Phylogenetic reconstruction of the Felidae using 16S rRNA
and NADH-5 mitochondrial genes. Journal of Molecular Evolution 44 (Suppl 1): S98-
S116.
Johnson WE, Eizirik E, Pecon-Slattery J, Murphy WJ, Antunes A, Teeling E, O’Brien SJ.
(2006). The Late Miocene radiation of the modern Felidae: a genetic assessment. Science
311: 73–77.
Kimura M. (1980). A simple method for estimating evolutionary rates of base substitutions
through comparative studies of nucleotide sequences. Journal of Molecular Evolution 16:
111-120.
Koepfli K-P, Gompper ME, Eizirik E, Ho CC, Linden L, Maldonado JE, Wayne RK. (2007).
Phylogeny of the Procyonidae (Mammalia: Carnivora): molecules, morphology and the
Great American Interchange. Molecular Phylogenetics and Evolution 43: 1076-1095.
Koepfli KP, Deere KA, Slater GJ, Begg C, Begg K, Grassman L, Lucherini M, Veron G,
Wayne RK. (2008). Multigene phylogeny of the Mustelidae: Resolving relationships,
tempo and biogeographic history of a mammalian adaptive radiation. BMC Evolutionary
Biology 6: 10.
Lavergne A, Ruiz-García M, Catzeflis F, Lacote S, Contamin H, Mercereau-Puijalon O,
Lacaste A, Thoisy B. (2010). Taxonomy and phylogeny of squirrel monkey (genus
Saimiri) using cytochrome b genetic analysis. American Journal of Primatology 72: 242-
253.
Lemke TO. (1981). Wildlife management in Colombia: The first ten years. Wildl. Soc. Bull.
9:28-36.
Librado P, Rozas J. (2009). DnaSP v5: A software for comprehensive analysis of DNA
polymorphism data. Bioinformatics 25: 1451-1452 | doi: 10.1093/bioinformatics/btp187.
Macdonald DW. (1981). Dwindling resource and the social behaviour of capybaras
(Hydrochoeris hydrochaeris) (Mammalia). Journal of Zoology 194:371-391.
Maldonado-Chaparro A, Bernal-Parra LM, Forero-Acosta G, Ruiz-García M. (2011).
Estructura genética de un grupo de capibaras (Hydrochoerus hydrochoeris,
Hydrocheridae, Rodentia) mediante marcadores microsatélites en los Llanos Orientales
colombianos. Revista de Biología Tropical (International Journal of Tropical Biology)
59: 1777-1793.
Mantel NA. (1967). The detection of disease clustering and a generalized regression
approach. Cancer Research 27: 209-220.
Mayr, E. (1942). Systematics and the Origin of Species. Columbia University Press. New
York, USA. pp. 1-334.
Mayr, E. (1963). Animal Species and Evolution. Harvard University Press: Cambridge,
Massachusetts.
Mau B. (1996). Bayesian phylogenetic inference via markov chain Monte Carlo methods.
University of Wisconsin, Madison.
Mau B, Newton M, Larget B. (1999). Bayesian phylogenetic inference via markov chain
montecarlo methods. Biometrics 55: 1-12.
Mendoza A. (1991). El Chigüiro: Una especie antigua en el Nuevo Mundo. Revista Humbolt
91:80–87.

Michaux JR, Chevret P, Filippucci MG, Macholan M. (2002). Phylogeny of the genus
Apodemus with a special emphasis on the subgenus Sylvaemus using thenuclear IRBP
gene and two mitochondrial markers: cytochrome b and 12S rRNA. Molecular
Phylogenetics and Evolution 23: 123–136.
Mones A, Ojasti AM. (1986). Hydrochaeris hydrochaeris. Mammalian Species 264: 1–7.
Moritz C. (1994). Defining evolutionary significant units for conservation. Trends in Ecology
and Evolution 9: 373-375.
Nabholz B, Glemin S, Galtier N. (2008). Strong variations of mitochondrial mutation rate
across mammals - the longevity hypothesis. Molecular Biology and Evolution 25: 120-
130.
Nabholz B, Glémin S, Galtier N. (2009). (The erratic mitochondrial clock: variations of
mutations rate, not population size, affect mtDNA diversity across birds and mammals.
BMC Evolutionary Biology 9: 54 doi:10.1186/1471-2148-9-54.
Ojasti J. (1973). Estudio biológico del chigüire o capibara. Fondo Nacional de
Investigaciones Agropecuarias, Caracas, Venezuela.
Ojasti J. (1983). Consumo de fauna por una comunidad indígena en el Estado Bolivar,
Venezuela. Symposio Conservación y Manejo de Fauna Silvstre Neotropical 9: 45-50.
Ojasti, J. (1996). Wildlife Utilization in Latin America: Current Situation and Prospects for
Sustainable Management. (FAO Conservation Guide - 25). Food and Agriculture
Organization of the United Nations – FAO, Rome. pp. 224.
Ojeda RA, Mares MA. (1982). Conservation of South American mammals: Argentina as a
paradigm. En Mares, M.A. y Genoways, H.H. (Eds.). Mammalian Biology in South
America. Univ. Pittsburgh, pp. 505-521.
Opazo, J.C., (2005). A molecular timescale for caviomorph rodents (Mammalia,
Hystricognathi). Molecular Phylogenetics and Evolution 37: 932–937.
Ouhoud-Renoux F. (1998). Se nourrir à Trois Sauts: Analyse diachronique de la prédation
chez les Wayapi du Haut-Oyapock (Guyane française). Journal of d’Agriculture
Traditionnelle Traditionnelle et de Botanique Appliquée, Revue d’Ethnobiologie 40: 181–
206.
Paschoaletto K, Ferraz M, Lechevalier M, Zarate di Couto H, Martins L. (2003). Damage
caused by capybaras in a corn field. Scientia Agricolas 60: 191–194.
Pascual R, Ortega-Hinojosa EJ, Gondar D, Tonni E. (1967). Las edades del cenozoico
mamalífero de la Provincia de Buenos Aires. In Pascual R (Ed.). Paleontografía
bonaerense. La Plata. Pp. 1-202.
Peceño MC. (1983). Estudio citogenético y genético evolutivo del chigüire género
Hydrochaeris. Trabajo Especial de Grado, Universidad Simón Bolívar, Caracas.
Posada D, Crandall KA. (1998). MODELTEST: testing the model of DNA substitution.
Bioinformatics 14: 817-818.
Rajabi-Maham H, Orth A, Bonhomme F. (2008). Phylogeography and postglacial expansion
of Mus musculus domesticus inferred from mitochondrial DNA coalescent, from Iran to
Europe. Molecular Ecology 17:627–641.
Ramos-Onsins SE, Rozas J. (2002). Statistical properties of new neutrality tests against
population growth. Molecular Biology and Evolution 19: 2092–2100.
Rannala B, Yang Z. (1996). Probability distribution of molecular evolutionary trees: A new
method of phylogenetic inference. Journal of Molecular Evolution 43: 304-311.

Rodbell DT, Seltzer GO. (2000). Rapid ice margin fluctuations during the Younger Dryas in
the tropical Andes. Quaternary Research 54: 328-338.
Rogers AR, Harpending HC. (1992). Population growth makes waves in the distribution of
pairwise genetic differences. Molecular Biology and Evolution 9: 552-569.
Rogers AR, Fraley AE, Bamshad MJ, Watkins WS, Jorde LB. (1996). Mitochondrial
mismatch analysis is insensitive to the mutational process. Molecular Biology and
Evolution 13: 895-902.
Rowe DL, Dunn KA, Adkins RM, Honeycutt RL. (2010). Molecular clocks keep dispersal
hypotheses afloat: evidence for trans-Atlantic rafting by rodents. Journal of
Biogeography 37: 305–324.
Ruiz-García M. (2010). Changes in the demographic trends of pink river dolphins (Inia) at the
micro-geographical level in Peruvian and Bolivian rivers and within the Upper Amazon:
Microsatellites and mtDNA analyses and insights into Inia´s origin. In Ruiz-García M.,
Shostell, J. (Eds.). Biology, Evolution, and Conservation of River Dolphins Within South
America and Asia. Nova Science Publishers., Inc. New York, USA. Pp. 161-192.
Ruiz-García M, Mejia D, Escobar-Armel P, Tejada-Martinez D, Shostell JM. (2013a).
Molecular identification and historical demography of the marine tucuxi (Sotalia
guianensis) at the Amazon River’s mouth by means of the mitochondrial control region
gene sequences and implications for conservation. Diversity 5: 703-723.
Ruiz-García M, Lichilín N, Jaramillo MF. (2013b). Molecular phylogenetics of two
Neotropical Carnivores, Potos flavus (Procyonidae) and Eira Barbara (Mustelidae): No
clear existence of putative morphological subspecies. In Ruiz-García M, Shostell JM
(Eds). Molecular Population Genetics, Evolutionary Biology and Biological
Conservation of Neotropical Carnivores. Nova Science Publishers., Inc. New York
(USA). Pp. 37-84.
Sambrock J, Fritsch E, Maniatis T. (1989). Molecular Cloning: A Laboratory manual. 2nd
edition. V1. Cold Spring Harbor Laboratory Press. New York.
Savolainen P, Zhang YP, Luo J, Lundeberg J, Leitner T. (2002). Genetic evidence for an East
Asian origin of domestic dogs. Science 298: 1610-1613.
Simonsen K, Churchill G, Aquadro C. (1995). Properties of Statistical Tests of Neutrality for
DNA Polymorphism Data. Genetics 141: 413-429.
Smythe N, Glanz W, Leigh Jr E. (1983). Ecology and population regulation in pacas and
agoutis. In Leigh Jr E (Ed.) Seasonal rhythms and long term changes in tropical forest.
Smithsonian Institution Press, Washington DC. pp. 227-238.
Sokal RR, Rohlf FJ. (1995). Biometry. 3rd edition. W.H. Freeman and Co., New York.
Spradling T, Hafner M, Demastes J. (2001). Differences in rate of cytochrome-b evolution
among species of rodents. Journal of Mammalogy 82:65-80.
Steiner C, Catzeflis F. (2000). Molecular characterization and mitochondrial sequence
variation in two sympatric species of Proechimys (Rodentia: Echimyidae) in French
Guiana. Biochemical Systematics and Ecology 28:963–973.
Swofford DL. (2002). PAUP*. Phylogenetic analysis using parsimony and other methods.
http://paup.csit.fsu.edu. pp. 1-142.
Tajima F. (1989). Statistical method for testing the neutral mutation hypothesis by DNA
polymorphism. Genetics 123: 585-595.

Tamura K, Nei M. (1993). Estimation of the number of nucleotide substitutions in the control
region of mitochondrial DNA in humans and chimpanzees. Molecular Biology and
Evolution 10: 512-526.
Tamura K, Peterson D, Peterson N, Stecher G, Nei M, Kumar, S. (2011). MEGA5: Molecular
evolutionary genetics analysis using maximum likelihood, evolutionary distance, and
maximum parsimony methods. Molecular Biology and Evolution 28: 2731-2739.
Thompson LG. (1993). Reconstructing the Paleo Enso records from tropical and sub-tropical
ice cores. Bulletin de l’ Institut Francais d’Etudes Andines 22: 65-83.
Thompson LG, Mosley E, Davies ME, Lin PN, Henderson KA, Coledal J, Bolzan JF, Liu
KB. (1995). Huascarán, Perú. Science 269: 46-50.
Trapido H. (1949). Gestation period, young, and maximum weight of the isthmian capibara,
Hydrocherus isthmius Goldman. Journal of Mammalogy 30: 433-437.
Upham NS, Patterson BD. (2012). Diversification and biogeography of the Neotropical
caviomorph lineage Octodontoidea (Rodentia: Hystricognathi). Molecular Phylogenetics
and Evolution 35: 417-429.
Van der Hammen T. (1980). Glaciales y glaciaciones en el Cuaternario de Colombia:
palaeoecología y estratigrafía. Primer Seminario sobre El Cuaternario en Colombia (pp.
44-45). Bogota, Colombia.
Van der Hammen T. (1992). Historia, ecología y vegetación. Editorial Corporación
Colombiana para la Amazonía, Araracuara, Bogotá DC., Colombia. pp. 1-411.
Van der Hammen T, Absy ML. (1994). Amazonia during the last Glacial. Palaeogegraphy,
Palaeoclimatology and Palaeoecology 109: 247-261.
Van Vuuren BJ, Kinet S, Chopelet J, Catzeflis F. (2004). Geographic patterns of genetic
variation in four Neotropical rodents: conservation implications for small game mammals
in French Guyana. Biological Journal of the Linnean Society 81: 203-218.
Viera Antunes K, Medeiros Machado TM, Lopes Serao NV, Facione Guimaraes SE, Rezende
Paiva S. (2010). Genetic diversity of captive spotted paca (Agouti paca) from south east
Brazil assessed by the RAPD-PCR technique. Revista Brasileira de Zootecnia 39: 268-
272.
Walsh PS, Metzger DA, Higuchi R. (1991). Chelex 100 as a medium for simple extraction of
DNA for PCR-based typing from forensic material. BioTechniques 10: 506-513.
Ward RH, Frazier BL, Dew-Jager K, Paabo S. (1991). Extensive mitochondrial diversity
within a single Amerindian tribe. Proc Natl Acad Sci USA 88:8720–8724.
Wayne RK, Geffen E, Girman D, Koepfli K, Lau L, Marshal C. (1997). Molecular
Systematics of the Canidae. Systematic Biology 46: 622-653.
Wilson DE, Reeder DM. (2005). Mammal species of the World. A taxonomic and geographic
reference. Johns Hopkins University, Baltimore, EEUU.
Winge H. (1888). Jordfundne og nuvelende gnavere (Rodentia) fra Lagoa Santa, Minas
Geraes, Brasilien. E Museo Lundii, Copenhague. pp 1-200.
Wright S. (1938). Size of population and breeding structure in relation to evolution. Science
87: 430–431.
Wright S. (1951). The genetical structure of populations. Annals of Eugenics 15: 323–354.

Chapter 10
OMICS TECHNOLOGIES APPLIED TO PROKARYOTES
Marcus de Barros Braga*, Adonney Allan de Oliveira Veras,
Pablo Henrique Caracciolo Gomes de Sá, Diego Assis das Graças,
Rafael Azevedo Baraúna, Jorianne Thyeska Castro Alves,
Kenny da Costa Pinheiro, Vasco Ariston de Carvalho Azevedo,
Maria Paula Cruz Schneider, Rommel Thiago Jucá Ramos
and Artur Luiz da Costa da Silva
Federal University of Pará, Institute of Biological Sciences,
Genomics and Systems Biology Center – Belém, Pará, Brazil
ABSTRACT
Next-generation platforms provide high-throughput sequencing, but the resulting data must
be evaluated for quality to increase the accuracy of analyses. This considerable amount of
data has benefited omics studies in prokaryotes, such as genomics, metagenomics and
transcriptomics. In genomics, it became possible to assemble whole genomes, which
allowed researchers to understand biological processes and to study the evolution of
sets of genes using bioinformatics software and pipelines. Another field affected by
next-generation sequencing (NGS) data is microbial ecology, which can be studied through
the Whole Metagenome Sequencing (WMS) approach or through the sequencing of molecular
markers, such as the 16S rRNA gene, to determine microbial diversity. Advances have also
been obtained in transcriptomics, making it possible to characterize functional elements
such as mRNAs, non-coding RNAs and small RNAs. For transcriptome analysis, two methods
are commonly used: the reference-based approach, which determines the expression level
of each gene, and the de novo approach, which enables the identification of unknown
elements in the genome. Throughout this chapter, we present the fundamental concepts
behind these omics analyses based on NGS data, emphasizing their main applications and
the most widely used computational tools.
* Email: marcusbraga@ufpa.br.

INTRODUCTION
Next-generation sequencing (NGS) technologies can generate a considerable amount of
data [1], and certain research fields, such as genomics, metagenomics and transcriptomics,
have directly benefited. The quality of these data is essential in certain methodological steps,
such as genome assembly and annotation, and in analyses to identify single nucleotide
polymorphisms (SNPs) used in studies of gene expression and diversity. During sequencing,
errors such as insertions, deletions and substitutions may be incorporated into the data, which
requires the use of pre-processing algorithms to treat and ensure data accuracy [1].
Genomics encompasses whole genome mapping, sequencing and functional analysis.
Moreover, genomics enables the study of structure, function and genome comparison [2].
Metagenomics involves the analysis of microbial communities, the molecular markers
used to determine microbial diversity and the concept of operational taxonomic units
(OTUs) and their use in microbial ecology. The analysis of microbial diversity and
whole metagenomic sequencing (WMS) are critical points.
NGS platforms also enabled advances in the field of transcriptomics, which studies the
set of transcripts in a cell under specific physiological conditions or at a specific
developmental stage. The studies in this field identify mRNAs, non-coding RNAs and small
RNAs. With these functional data, it is possible to perform differential expression analysis,
gene annotation correction and identification of new transcripts.
NGS DATA
The term next-generation sequencing is widely used to designate platforms that use
post-Sanger sequencing methods [3]. With the rapid evolution of this equipment, significant
advances have occurred in the field of genomic sequencing, contributing to a better
understanding of biological systems [4].
With the commercialization of these platforms in 2005, genome sequencing and its
associated projects accelerated compared with previous years, when Sanger technology was
predominant [5]. The 454 platform, designed by Roche, was the first NGS technology
marketed. Subsequently, other platforms were developed, such as the Illumina GA by Solexa
and SOLiD by Life Technologies, and this new generation of sequencers became widely known
as the second generation [6].
After the second-generation platforms entered the market, new DNA decoding methods
were launched. Approaches such as Single Molecule Real Time (SMRT) sequencing, which
does not require amplification, enable high data accuracy and potentially longer reads than
the previous generation. The PacBio RS System utilizes SMRT and is classified as third
generation. The Ion Torrent PGM platform, also considered third generation [7], introduced a
new approach to sequencing by detecting hydrogen ions, which are released when a
nucleotide is incorporated into the elongating strand [1].
Sequencing using these platforms results in reads that represent the decoded bases of
each DNA fragment, and any of the reads may include adenine, thymine, guanine and
cytosine. The length of these reads may vary according to the sequencing platform used [8].

The NGS platforms initially produced large amounts of short reads compared to the
previous generation, which used the Sanger methodology, and these reads created challenges
in the genome assembly process [9]. Therefore, it became necessary to acquire more robust
computational structures and develop efficient algorithms for data processing [10].
The main features that differentiate next-generation sequencers from equipment using the
Sanger methodology include their low cost, reduced sequencing time and high throughput of
the generated data [3]. Table 1 compares the length and throughput of reads generated by
the main NGS platforms.
Table 1. Most popular NGS platforms

NGS Platform/Company | Read Length | Throughput
3730xl DNA Analyzer/Life Technologies (http://www.lifetechnologies.com/) | Up to 900 bp | 690 – 2100 Kbases
454 GS FLX System/Roche (http://www.454.com/) | Up to 1000 bp | 700 Mb
Illumina HiSeq 2500/Illumina (http://www.illumina.com/) | 2 x 150 bp | 150 – 180 Gb (Dual Flow Cell)
ABI SOLiD 5500xl/Life Technologies | 75 bp (Fragments); 75x35 bp (Paired-end); 60x60 bp (Mate-paired) | Up to 250 Gb
Ion Torrent Personal Genome Machine/Life Technologies | Up to 400 bp | Up to 100 Mb (314 chip v2); Up to 1 Gb (316 chip v2); Up to 2 Gb (318 chip v2)
PacBio RS System/Pacific Biosciences (http://www.pacificbiosciences.com/) | Over 40,000 bp | Up to 1 Gb

The reads produced by these platforms show low quality at the 3' end, which directly
affects the assembly process [9, 11]. Thus, a subsequent data processing step is necessary
to assess quality and to handle low-quality bases. This processing step, called trimming,
increases data accuracy [12].
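The idea of 3' trimming can be illustrated with a minimal Python sketch. It assumes that the per-base quality values are already available as integers and that a fixed threshold of 20 is acceptable; the function name `trim_3prime` and the threshold are illustrative choices, not the behavior of any specific trimming tool, which typically uses more elaborate criteria such as sliding windows.

```python
def trim_3prime(seq, quals, min_q=20):
    """Drop bases from the 3' end until a base with quality >= min_q is found.

    seq   : string of decoded bases
    quals : list of integer quality values, one per base
    min_q : hypothetical quality threshold (assumption for this sketch)
    """
    end = len(seq)
    # Walk backwards from the 3' end while the quality is below the threshold.
    while end > 0 and quals[end - 1] < min_q:
        end -= 1
    return seq[:end], quals[:end]


read = "ACGTACGTAC"
quals = [38, 37, 36, 35, 30, 28, 25, 12, 9, 5]
trimmed_seq, trimmed_quals = trim_3prime(read, quals)
print(trimmed_seq, trimmed_quals)
```

In this example the last three bases fall below the threshold and are removed, while the high-quality 5' portion of the read is kept unchanged.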
The Phred score is a quality metric used widely in evaluations of NGS data. The value
attributed to each nucleotide detected is calculated by the Phred metric, which measures the
likelihood of error, as shown in Formula 1 [13].
$Q = -10 \log_{10} P$, where P is the error probability estimated for each base.
Formula 1. Phred metric for data assessment.
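As a worked illustration, the short Python sketch below converts between error probabilities and quality scores, assuming the standard Phred definition Q = -10 log10(P). The function names are hypothetical and chosen only for this example.

```python
import math


def phred_from_error(p):
    """Quality score Q for an error probability p (0 < p <= 1)."""
    return -10 * math.log10(p)


def error_from_phred(q):
    """Error probability P for a quality score q (inverse of the relation above)."""
    return 10 ** (-q / 10)


# A base with a 1-in-1000 chance of being wrong has a quality of about 30;
# a quality of 20 corresponds to a 1-in-100 chance of error.
print(round(phred_from_error(0.001), 2))
print(error_from_phred(20))
```

Because the scale is logarithmic, each increase of 10 in the score corresponds to a tenfold decrease in the estimated error probability.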
Each read file may have a distinct extension according to the technology that generated
the data. Each extension has its own format for storing sequencing information. The
simplest and most universal format is FASTA, which was originally created by Bill Pearson
as an input format for his FASTA program [14]. Figure 1 shows an example of a FASTA file
obtained from the NCBI sequence read archive (SRA, http://www.ncbi.nlm.nih.gov/sra/).
Figure 1. Typical FASTA file. 1) Header starting with '>' followed by an identifier text. 2) Decoded
DNA bases.
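The two-part layout described in the caption (a '>' header followed by one or more lines of bases) can be read with a few lines of Python. The sketch below is a minimal illustration, not a replacement for an established parser; the function name `parse_fasta` and the example records are hypothetical.

```python
def parse_fasta(text):
    """Return a dict mapping each header (without '>') to its full sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            # Sequence lines may be wrapped; collect them and join later.
            records[header].append(line)
    return {name: "".join(parts) for name, parts in records.items()}


example = """>read_1 hypothetical example
ACGTACGT
TTGA
>read_2
GGCC
"""
print(parse_fasta(example))
```

Wrapped sequence lines are concatenated, so the first record above is returned as a single 12-base string.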
The FASTQ format [8] was created at the Wellcome Trust Sanger Institute by Jim
Mullikin. Figure 2 shows an example of a FASTQ file obtained from the NCBI sequence
read archive (SRA: http://www.ncbi.nlm.nih.gov/sra/).
Figure 2. Typical FASTQ file. 1) Header starting with '@' followed by an identifier text. 2)
Decoded nucleotide sequence. 3) '+' symbol indicating the end of the sequence. 4) Phred quality
values (in ASCII format) for each base.
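The four-line record layout described in Figure 2 (commonly called FASTQ) can be parsed as sketched below. The sketch assumes that each record occupies exactly four lines and that qualities are encoded with an ASCII offset of 33; other offsets have been used by some platforms, so the `offset` parameter is an assumption to be checked against the data at hand. The function name `parse_fastq` is hypothetical.

```python
def parse_fastq(text, offset=33):
    """Return a list of (identifier, sequence, quality_scores) tuples.

    Assumes four lines per record and an ASCII quality offset of 33.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if len(lines) % 4 != 0:
        raise ValueError("expected four lines per record")
    records = []
    for i in range(0, len(lines), 4):
        header, seq, plus, qual = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record starting at line {i + 1}")
        if len(seq) != len(qual):
            raise ValueError("sequence and quality lengths differ")
        # Each quality character encodes one Phred value as ASCII code minus offset.
        scores = [ord(ch) - offset for ch in qual]
        records.append((header[1:], seq, scores))
    return records


example = "@read_1\nACGT\n+\nII5!\n"
print(parse_fastq(example))
```

With the assumed offset, the character 'I' decodes to a quality of 40, '5' to 20 and '!' to 0.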
Another format, introduced by ABI SOLiD, is Color Space FASTA (CSFASTA), which uses a
coding system in which each of the 16 possible combinations of two adjacent bases (A, C, G
or T) is represented by one of 4 distinct colors (numbered 0, 1, 2 or 3). Figure 3 shows the
color and number matrix used to decode CSFASTA files [15].
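To make the color-space idea concrete, the sketch below decodes a color-space read in Python. It assumes the commonly described two-base encoding, in which identical neighbors (AA, CC, GG, TT) give color 0 and the remaining pairs follow an XOR rule on the base order A, C, G, T; the matrix in Figure 3 remains the authoritative reference. It also assumes that the read starts with a known leading base followed only by digits 0 to 3, so missing-color symbols are not handled. The function names are hypothetical.

```python
BASES = "ACGT"  # positions 0..3 are used as 2-bit codes


def encode_color(first, second):
    """Color (0-3) for a pair of adjacent bases under the assumed XOR rule."""
    return BASES.index(first) ^ BASES.index(second)


def decode_csfasta(read):
    """Decode a color-space read such as 'T0123' into bases.

    The leading letter is the known starting base; it is used only to
    anchor the decoding and is not included in the returned sequence.
    """
    base = read[0]
    decoded = []
    for color in read[1:]:
        # The next base is determined by the previous base and the color.
        base = BASES[BASES.index(base) ^ int(color)]
        decoded.append(base)
    return "".join(decoded)


print(decode_csfasta("T0123"))
```

Because every base depends on the previous one, a single miscalled color changes all downstream bases when decoding naively, which is one reason color-space data are usually analyzed with dedicated tools.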
In all of these techniques, small DNA fragments are sequenced and then aligned to
reconstruct the original genome [16]. Because of this fragmentation, it is necessary to
generate a genomic DNA library to make genome sequencing possible. The sizes of the DNA
fragments in the library vary according to the platform used. Each sequencer has its own
library generation protocol. The fragment libraries (or single-end) allow the sequencer to
generate a partial read of the sequenced template DNA fragment at one end only. By contrast,
the paired libraries (mate-pair and paired-end) enable sequencing at both ends of a fragment.
Figure 3. CSFASTA typical file.
The main difference between paired-end and mate-pair libraries is the distance between
the two ends of the fragment being sequenced. A paired-end protocol generates a library
of DNA fragments between 200 and 800 base pairs (bp). By contrast, a mate-pair
protocol generates libraries with DNA fragments between 2 and 5 Kb, in which the ends are
repaired with labeled dNTPs and the fragments are later circularized and fragmented again.
Only fragments with labeled dNTPs (corresponding to the ends of the original DNA fragment)
are retained and purified for sequencing. These techniques facilitate the detection of
genomic rearrangements and repetitive elements in the sequence and the identification of
new transcripts [17].
Figure 4. Libraries used in NGS platforms. Single-end sequences only one end of the DNA fragment.
Paired-end enables sequencing at both ends of a DNA fragment between 200 and 800 bp. Mate-pair is
used for DNA fragments between 2 and 5 Kb. The fragment is circularized and the ends are ligated with
labeled dNTPs. Subsequently, fragmentation occurs, and only the fragments containing labeled dNTPs
(400 – 600 bp) are sequenced. The arrows indicate the sequencing direction.
NGS technologies cannot sequence with 100% accuracy. Therefore, sequencing errors
such as insertions, deletions and substitutions can occur. Such errors are evident on the 454
and Ion Torrent PGM platforms, which generate base insertion and deletion (INDEL) errors
when sequencing homopolymeric regions [18]. These sequencing errors hinder both
reference-based and de novo analyses because they reduce the accuracy of read alignments
[19].
Unrepresented regions, known as gaps, are also observed. These gaps may be related to
computational limitations but may also be associated with low-coverage sequencing regions
because of certain features such as the G+C content in each organism [20]. Low coverage
because of G+C content is a type of coverage bias known as GC bias, which can affect NGS
data assessment [21]. The average G+C content of an organism may be determined by
calculating the total sum of guanine and cytosine bases and dividing this value by the total
number of bases in the genome. Next, the resultant value is multiplied by 100 to convert the
result to a percentage. GC bias has been observed in regions with both high and low G+C
content [20].
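The G+C calculation described above can be written as a short Python function. This is a minimal sketch; the decision to count only unambiguous bases (A, C, G, T) in the denominator, ignoring symbols such as N, is an assumption made for this example, and the function name `gc_content` is hypothetical.

```python
def gc_content(seq):
    """Return the G+C content of a sequence as a percentage.

    Only A, C, G and T are counted (assumption: ambiguous symbols are ignored).
    """
    seq = seq.upper()
    total = sum(seq.count(base) for base in "ACGT")
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / total


print(gc_content("ATGC"))    # two of four bases are G or C
print(gc_content("GGCCAT"))  # four of six bases are G or C
```

Applying the same function to consecutive windows along a genome, rather than to the whole sequence, is one simple way to look for regions whose G+C content departs from the average.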

GENOMICS
The complete set of genes that constitutes the genetic information contained in the
chromosomes (monoploid or haploid) of an organism is called a genome, which is a term
coined in 1920 by Hans Winkler, professor of botany at the University of Hamburg in
Germany. The study of the genome, designated genomics, is a field within genetics
addressing genome sequencing, mapping and analysis. Therefore, genomics aims to
understand the organization and function of genes and genetic information [2, 22]. Genomics
is divided into structural, functional and comparative genomics, all considered to be part of
the "post-genomics era" [22, 23].
Structural genomics is defined as the study of genomic organization and structure, which
comprises coding sequences (CDS), open reading frames (ORFs), ribosomal RNA (rRNA)
and transfer RNA (tRNA). The identification of the location of a gene and other molecular
markers assists in the functional understanding of the genomic structural elements [23, 24, 25,
26, 27]. This identification is represented by a genetic map defined by molecular markers.
These maps are generated according to gene recombination frequencies (chromosomal
genetic map), the physical molecular distance (physical map) or the location of
cytological features of the chromosomes (cytogenetic map). These maps can be correlated
with one another and should show the same genomic structure, preserving marker order [28].
Knowing the DNA sequence alone does not indicate how the genome works. For this
purpose, functional genomics enables the analysis of gene function, including
changes in genome functioning at different stages of development or under different
environmental conditions [2, 29]. This is a broad term, which includes transcriptomics and
proteomics. The transcriptome comprises a set of transcripts, i.e., mRNA, and is thus related
to gene expression in the organism [30]. The proteome represents the set of proteins
expressed by a genome or tissue and is modified according to extracellular influences [31, 32].
Another subdivision of genomics that has been increasingly used is comparative
genomics. This field began in 1995 with the publication of two bacterial genome sequences
[33, 34]. This branch of genomics studies distinct genomes, for example, by comparing
nucleotide/amino acid sequences. Research in this field may assist in the discovery of
therapeutic and biological targets [35]. Other fields originated from these divisions:
metabolomics, which investigates biochemical pathways [36]; nutrigenomics, which studies
the effect of nutrients on gene expression [37]; physiomics, which investigates the interaction
of the physiological processes of an organism [38]; pharmacogenomics, which studies the
correlation between the genetic composition of an organism and its response to drugs [39];
metagenomics, which analyzes the genomes of a bacterial community independently of culture [40, 41];
and interactomics, which studies the set of interactions among all macromolecules in a cell [22].
Advances in these fields have allowed investigators to identify new molecular targets for
disease prevention, diagnosis and treatment [42, 43].
Genome Assembly
The assembly of genomic sequences consists of pairing the reads and considering their
identity to one another to recreate the target genome [44].

The assembly process can follow two approaches: reference-based assembly, which consists of
using a reference genome that is phylogenetically close to the target genome studied;
and the de novo approach, which performs the assembly solely on the basis of overlapping
reads, without using a reference [45].
Depending on the sequencing errors, it is necessary to perform data pre-processing before
starting the assembly. Among the relevant procedures for this step, we emphasize read quality
evaluation, which can be performed with the graphical tool FastQC
(http://www.bioinformatics.babraham.ac.uk/), enabling read quality assessment through histograms and
identifying low quality regions. Read quality evaluation supports decision-making in regard
to the removal of low quality bases (trimming) [46, 47]. The fastx-toolkit software
(http://hannonlab.cshl.edu/fastx_toolkit/) may be used in the trimming step to remove any
bases showing a quality lower than desired.
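The trimming decision described above can be illustrated with a minimal sketch. It assumes Phred+33 (Sanger) encoded quality strings and an illustrative threshold of Q20; the function names and the simple 3'-end strategy are assumptions made for this example and do not reproduce the exact behavior of the fastx-toolkit.

```python
def phred_scores(quality, offset=33):
    """Decode an ASCII quality string into Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality]


def trim_3prime(sequence, quality, min_q=20, offset=33):
    """Remove bases from the 3' end until a base with score >= min_q is found."""
    scores = phred_scores(quality, offset)
    end = len(scores)
    # Walk backwards from the 3' end while the base quality is too low.
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    return sequence[:end], quality[:end]


# Hypothetical read: six high-quality bases followed by two low-quality ones.
trimmed_seq, trimmed_qual = trim_3prime("ACGTACGT", "IIIIII#!")
print(trimmed_seq, trimmed_qual)
```

Trimming only from the 3' end is a design choice for brevity; sliding-window or both-end strategies are equally plausible.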
After data pre-processing, assembly can proceed using different approaches such as
greedy algorithms, overlap-layout-consensus (OLC) and de Bruijn graphs [48].
The greedy algorithm initiates the assembly process by comparing each read with all
others in search for the best overlaps. This process is repeated until no read is left. In the next
step, the overlapping values are used to group reads to assemble contiguous sequences
(contigs) [48]. TIGR Assembler [49], SHARCGS [50] and VCAKE [51] are examples of
algorithms that use this paradigm.
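The greedy paradigm can be sketched in a few lines. The example below is a toy illustration under strong assumptions (error-free reads, a single strand, exact suffix-prefix overlaps, a hypothetical minimum overlap of 3 bases); it is not the implementation used by TIGR Assembler, SHARCGS or VCAKE.

```python
def overlap_length(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b (0 if < min_len)."""
    for size in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:size]):
            return size
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of sequences with the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best = (0, None, None)
        # Compare every sequence with all others to find the best overlap.
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    size = overlap_length(a, b, min_len)
                    if size > best[0]:
                        best = (size, i, j)
        size, i, j = best
        if size == 0:
            return contigs  # no overlaps left: remaining sequences are contigs
        merged = contigs[i] + contigs[j][size:]
        contigs = [c for k, c in enumerate(contigs) if k not in (i, j)]
        contigs.append(merged)


print(greedy_assemble(["ATGGCGT", "GCGTACC", "TACCGGA"]))
```

The all-against-all comparison in each round is what makes this strategy computationally expensive for large read sets.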
The OLC searches for the best overlap values similarly to the greedy algorithms;
however, this information is organized into a graph in which each vertex represents a read and
the connecting edges are their overlaps [47]. The Celera Assembler [52], Mira [53], Newbler
[54] and Edena [55] are assemblers that use OLC.
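A minimal sketch of the three OLC stages follows. It assumes error-free reads from a single strand and a linear (non-circular, repeat-free) layout; the function names are hypothetical, and the consensus step is reduced to simple concatenation, whereas real OLC assemblers build the consensus from a multiple alignment of the reads.

```python
from itertools import permutations


def suffix_prefix_overlap(a, b, min_len=3):
    """Length of the longest suffix of a matching a prefix of b (0 if < min_len)."""
    for size in range(min(len(a), len(b)), min_len - 1, -1):
        if a[-size:] == b[:size]:
            return size
    return 0


def overlap_graph(reads, min_len=3):
    """Overlap stage: vertices are reads, directed edges carry overlap lengths."""
    graph = {read: {} for read in reads}
    for a, b in permutations(reads, 2):
        size = suffix_prefix_overlap(a, b, min_len)
        if size:
            graph[a][b] = size
    return graph


def layout_and_consensus(graph):
    """Layout: follow the best edges from a read with no incoming edge.
    Consensus (simplified): append the non-overlapping part of each read."""
    targets = {b for edges in graph.values() for b in edges}
    starts = [read for read in graph if read not in targets]
    current = starts[0] if starts else next(iter(graph))
    contig, visited = current, {current}
    while True:
        candidates = {b: s for b, s in graph[current].items() if b not in visited}
        if not candidates:
            return contig
        nxt = max(candidates, key=candidates.get)
        contig += nxt[candidates[nxt]:]
        visited.add(nxt)
        current = nxt


graph = overlap_graph(["ATGGCGT", "GCGTACC", "TACCGGA"])
print(graph)
print(layout_and_consensus(graph))
```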
In the de Bruijn graph, the reads are divided into sub-reads of fixed length called k-mers.
These k-mers are used to generate a graph in which each vertex represents a k-mer and the
edges are overlaps with k-1 bases (for example, when comparing two k-mers of 5, AGCAG
and GCAGT, they have exactly 4 shared bases, GCAG). However, the search step for the best
overlap values is not performed, resulting directly in the reduction of the computational effort
necessary to perform the task [48]. The ALLPATHS [56], Velvet [57], ABySS [58],
SOAPdenovo2 [59] and SPAdes software [60] are de Bruijn implementations.
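The graph construction described above can be sketched as follows, using the definition given in the text (vertices are k-mers, edges join k-mers that overlap by k-1 bases). The reads and the function names are illustrative assumptions; production assemblers add error correction, reverse complements and graph simplification, none of which is shown here.

```python
def kmers(read, k):
    """Split a read into its consecutive sub-reads of length k."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def de_bruijn_graph(reads, k):
    """Vertices are k-mers; an edge links consecutive k-mers sharing k-1 bases."""
    graph = {}
    for read in reads:
        parts = kmers(read, k)
        for left, right in zip(parts, parts[1:]):
            # Consecutive k-mers overlap by construction: left[1:] == right[:-1].
            graph.setdefault(left, set()).add(right)
            graph.setdefault(right, set())
    return graph


# Two hypothetical reads that contain the k-mers AGCAG and GCAGT from the text.
graph = de_bruijn_graph(["AGCAGT", "GCAGTC"], k=5)
for vertex, successors in graph.items():
    print(vertex, "->", sorted(successors))
```

Because edges come directly from adjacent k-mers within each read, no all-against-all overlap search is needed, which is the source of the reduced computational effort mentioned above.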
The main assembly software for NGS data and some of its features are shown in
Table 2 below.
The final assembly step is scaffold generation. This process uses paired reads to orient
and order the contigs produced in the assembly, thus generating the scaffolds. The estimated
distance between the paired reads makes it possible to determine the distance
between contigs and to resolve or close the gaps present in the scaffold [61].
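The arithmetic behind this distance estimate can be sketched as follows. The example assumes a paired-end library with a known mean insert size, a forward read mapped on the left contig and its mate mapped on the right contig; the coordinates, the insert size and the use of the median are illustrative assumptions, not the procedure of any particular scaffolder.

```python
from statistics import median


def estimate_gap(pairs, contig_a_length, insert_size):
    """Median gap implied by read pairs linking contig A (left) to contig B (right).

    Each pair is (start_on_a, end_on_b): the 0-based start of the forward read
    on contig A and the exclusive end of the reverse read on contig B.
    """
    gaps = [
        # insert size minus the fragment portions lying on contig A and on contig B
        insert_size - (contig_a_length - start_a) - end_b
        for start_a, end_b in pairs
    ]
    return median(gaps)


# Three hypothetical pairs spanning the junction between two contigs.
print(estimate_gap([(800, 150), (850, 190), (700, 60)],
                   contig_a_length=1000, insert_size=500))
```

A negative estimate would suggest that the two contigs overlap rather than being separated by a gap.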
Some assemblers feature modules for automatic scaffold generation during the assembly
process. Therefore, at the end of the process, two output files are generally created: one
containing the contigs and the other containing the scaffolds. Among these assemblers, we
cite ABySS [58], SGA [62], SPAdes [60] and SOAPdenovo2 [59]. Nonetheless, there are also
specific programs used only in the contig scaffolding process, among them Bambus2 [63],
GRASS [64], MIP [65], Opera [66], SCARPA [67], SOPRA [68] and SSPACE [69].
It is possible to have gaps after generating the scaffold, and these gaps may be resolved
or decreased by programs such as GapFiller [70] and IMAGE [71], which generate a more
accurate scaffold for subsequent analyses [61].

Table 2. Main genome assemblers for NGS data (adapted from [47])

Assembler      Approach     Output
ABySS          de Bruijn    Contigs and Scaffolds
ALLPATHS-LG    de Bruijn    Contigs and Scaffolds
Celera         OLC          Contigs and Scaffolds
Edena          OLC          Contigs
Euler          de Bruijn    Contigs and Scaffolds
Fermi          OLC          Contigs
Forge          OLC          Contigs and Scaffolds
Newbler        OLC          Contigs
PASHA          de Bruijn    Contigs
SGA            OLC          Contigs and Scaffolds
SHARCGS        Greedy       Contigs
Shorty         OLC          Contigs and Scaffolds
SOAPdenovo2    de Bruijn    Contigs and Scaffolds
SSAKE          Greedy       Contigs
VCAKE          Greedy       Contigs
Velvet         de Bruijn    Contigs and Scaffolds
Genome Annotation
Genome annotation is the computational process of attaching biologically relevant
information to sequenced genomic data [46]. Similar to the sequencing step, the bacterial
genome annotation process has recently become highly automated. The interpretation of these
DNA sequencing data involves gene, protein, regulatory and/or metabolic pathway
identification and annotation. This process is typically conducted using sequence annotation
pipelines, a variety of software and, in some cases, human expertise to handle the
automatically generated annotations [72].
The annotation process may be conceptually divided into two phases. The computational
or automatic phase uses several lines of evidence from other genomes or transcriptome data
from specific species in parallel to generate the initial gene and transcript prediction. In the
next phase, manual curation, all annotated information in the automatic process is reviewed
and then summarized in a final annotation [46]. The majority of annotation mechanisms use
homology-based methods to transfer information from a close reference genome to the new
sequence [73].
With increasing genome sequencing, less time remains to manually annotate these
genomes, which also increases dependency on automatic annotation pipelines. However, the
use of fully automated pipelines may lead to error introduction and propagation, such as
spelling errors, the use of identical gene names with different product names and errors in
distinguishing between orthologs and paralogs, among others, which may create
inconsistent and incorrect annotations. Therefore, the importance of manual curation is
obvious; it may detect and remove such errors [74]. One software tool generally used for
curation is Artemis [75].
An obvious consequence of the increasing scale of sequencing was growth in the
number of annotated genomes submitted to public databases. In turn, these sequence
databases introduced more stringent requirements to submit annotated genomes (The
Bacterial Genome Submission Guide - http://www.ncbi.nlm.nih.gov/genbank/genomesubmit/;
Genome Project Submission Account guidelines – http://www.ebi.ac.uk/ena/submit).
Prokaryotic Genome Annotation
The general bacterial genome annotation process consists of several steps, as shown in
Figure 5. The nucleotide sequence is initially subjected to a gene prediction mechanism in
order to identify potential coding regions. The result is then compared to a reference genome
deposited in a public database. If there is homology, the reference annotation is used;
otherwise, the gene product is characterized as a hypothetical protein because its function
is unknown. The next step is to search for specific protein functions to predict protein
domains and motifs, and then, an annotation is attributed. The final step is to predict other
features such as tRNA and rRNA [73].
Figure 5. Generic genomic annotation process.
In many cases, there is a lineage/biovar of a close species that has been previously
sequenced and annotated. Most annotation pipelines employ some gene prediction software.
One example is Glimmer [76], which uses a set of sequences as a reference to train a model
and then uses this model to predict coding regions in the genome of interest [77]. Among the
software that uses ab initio algorithms for gene prediction, GeneMarkS [78], Prodigal [79]
and Glimmer are the most popular [80]. An alternative approach for gene discovery is
through extrinsic methods, by which ORFs are directly identified through comparisons with
protein databases [81, 82]. Once the coding regions are identified, they are aligned and
compared with a reference genome annotation. Another option is to submit the sequence to
UniProt [83], which is a free web service with high accuracy, used as a source of protein
sequences and functional information and that provides fast sequence alignment tools such as
FASTA [84] and BLAST [85]. With UniProt, the higher similarity cases are accepted as
homologous, and annotation is attributed to genes receiving a high similarity score.
Additional information, including tRNA and rRNA, may be added using prediction software
such as tRNAscan-SE [86] and RNAmmer [87]. Other public databases include Pfam [88],
which offers a broad collection of protein families, and InterProScan 5 [89], which combines
different protein recognition methods in a single environment.
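The idea of scanning a nucleotide sequence for potential coding regions can be illustrated with a deliberately naive sketch. It is not the model-based approach used by Glimmer, GeneMarkS or Prodigal: it only reports forward-strand stretches that begin with ATG and end at the first in-frame stop codon, and the minimum length of three codons is an arbitrary assumption chosen for the example.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(sequence, min_codons=3):
    """Return (start, end, frame) for forward-strand ORFs; end is exclusive."""
    sequence = sequence.upper()
    orfs = []
    for frame in range(3):
        start = None
        for pos in range(frame, len(sequence) - 2, 3):
            codon = sequence[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos  # first start codon opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (pos + 3 - start) // 3 >= min_codons:
                    orfs.append((start, pos + 3, frame))
                start = None  # look for the next ORF in this frame
    return orfs


# Hypothetical sequence: two flanking bases, ATG AAA TTT TAG, two flanking bases.
example = "CC" + "ATG" + "AAA" + "TTT" + "TAG" + "GG"
print(find_orfs(example))
```

A realistic scan would also examine the reverse complement and alternative start codons, which are omitted here for brevity.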
A series of automatic prokaryotic annotation pipelines has been published. Some
pipelines are web-based, such as RAST [90], BASys [91], WeGAS [92] and
MaGe/Microscope [93], and others are systems for local use (desktop) such as AGeS [94],
DIYA [95] and PIPA [96]. There is also MICheck [97], which checks for syntax errors in
annotated sequences. All these systems perform the steps mentioned in Figure 5, adding
particular routines for error verification or for acquiring additional information.
When genomes are submitted to a public genome database, such as GenBank or EMBL,
some mandatory information must be provided, such as CDS and RNA structural features.
However, other features, such as the start and stop codons, ribosomal-binding sites (RBSs),
conserved motifs/domains, horizontal gene transfer regions, repeats, among others, must be
added. There is a variety of specific software for this type of identification [98, 99].
Occasionally, the gene prediction software may attribute an inexact start and stop
location for a gene, and this must be checked and corrected. Glimmer, during prediction,
locates the start region based on the first start codon for each gene. The RBSFinder [100]
software, in turn, locates the exact start of the gene by searching for an RBS, which is
generally characterized by the sequence known as Shine–Dalgarno [101]. This technique is
performed after using the prediction software. The TransTerm application [102] searches for
rho-independent transcription terminators to attribute the correct stop site. In addition to
correcting the start and stop sites, these features must be added to the annotation using the
labels “RBS” and “terminator,” respectively.
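The search for a Shine–Dalgarno-like site upstream of a predicted start codon can be sketched as below. The consensus motif AGGAGG, the 20-base upstream window and the minimum of five matching positions are assumptions chosen for illustration; this is not the algorithm implemented in RBSFinder.

```python
def find_rbs(sequence, start, motif="AGGAGG", window=20, min_match=5):
    """Best motif-like site in the window upstream of a start codon, or None.

    Returns (position, matching_bases, site) for the best-scoring site.
    """
    upstream_begin = max(0, start - window)
    best = None
    for pos in range(upstream_begin, start - len(motif) + 1):
        site = sequence[pos:pos + len(motif)]
        matches = sum(1 for a, b in zip(site, motif) if a == b)
        if matches >= min_match and (best is None or matches > best[1]):
            best = (pos, matches, site)
    return best


# Hypothetical gene start: the ATG begins at index 16, with a perfect motif upstream.
sequence = "TTTT" + "AGGAGG" + "TTTTTT" + "ATG" + "AAA"
print(find_rbs(sequence, start=16))
```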
Conserved regions within proteins, such as motifs and domains, must also be added to the
annotation after the gene discovery step. Several databases, such as PROSITE [103], PRINTS
[98], Pfam [88], and InterProScan [104] store protein families for this purpose.
Horizontal gene transfer (HGT) areas, such as pathogenicity islands and prophages, may
be predicted by searching for asymmetry in codon composition and G+C content. Typically,
there is a difference between HGT regions and the remainder of the genome. These
differences are often associated with the presence of integrases, transposases and insertion
sequence elements (ISs) [105]. There is software to conduct this type of prediction, such as
IslandPath [106] and SIGI-HMM [107], as reviewed in [108].
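The G+C asymmetry signal can be illustrated with a sliding-window sketch that flags windows whose G+C content deviates from the genome-wide mean. The window size, step and deviation threshold are arbitrary assumptions, and the approach is far simpler than the methods implemented in IslandPath or SIGI-HMM, which also consider codon usage and mobility genes.

```python
def gc_content(sequence):
    """Fraction of G and C bases in a sequence (0.0 for an empty sequence)."""
    sequence = sequence.upper()
    if not sequence:
        return 0.0
    return (sequence.count("G") + sequence.count("C")) / len(sequence)


def atypical_windows(genome, window=1000, step=500, max_diff=0.10):
    """Windows whose G+C content differs from the genome mean by more than max_diff."""
    mean_gc = gc_content(genome)
    flagged = []
    for start in range(0, len(genome) - window + 1, step):
        gc = gc_content(genome[start:start + window])
        if abs(gc - mean_gc) > max_diff:
            flagged.append((start, start + window, round(gc, 3)))
    return flagged


# Toy genome: an AT-rich background with a short GC-rich insert in the middle.
toy_genome = "AT" * 50 + "GC" * 10 + "AT" * 50
print(atypical_windows(toy_genome, window=20, step=20, max_diff=0.5))
```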
There are also specific standards for phage annotation. These should be labeled “source”
with the bacteriophage name in the “organism” qualifier and the sequence type in “mol_type”
(typically genomic DNA). There is no specific annotation marker for genomic islands (GIs).
Hence, these islands must be annotated as “miscellaneous features.” Mobile genetic elements
are annotated with the “mobile_element” label.
Sequence repeats, such as clustered regularly interspaced short palindromic repeats
(CRISPR), and other tandem repeats are of biological interest and may be used to understand
the bacterial defense mechanism [109] and to distinguish closely related strains [110].
Software such as CRT (CRISPR Recognition Tool) [111], CRISPRFinder [112] and
CRISPRcompar [113] is available for this purpose, as are databases such as MICdb [114],
which stores microsatellites and provides sequence prediction tools for user-submitted sequences.
Identifying the location of a protein in a cell may indicate its function and may be used to
identify targets in drug development. There are many prediction methods, including
homology and keywords [115], amino acid composition [116, 117] and methods combining
these two [118]. The software available for this purpose is reviewed in [119].
Other techniques such as RNA-seq provide a better indication of the role of proteins and
determine whether they are functional, thus helping with annotation validation because they
are based on real experimental data and not only on homology.
RefSeq (NCBI Reference Sequence Database), a large multi-species database of curated
sequences representing separate but explicitly connected registries of genomes to transcripts
and translation products, is an attempt to standardize and improve genome annotation quality.
Although more stringent regulations for genome submission have been adopted, a high
degree of variation in the quality of genomes deposited remains. Methods to measure the
quality of the annotation provided are necessary. Evidence qualifiers, such as how the feature
was predicted and which entries the sequence can be aligned with in a given database, and
database versions and dates provide an obvious path to those assessing annotation quality.
Information about the adopted annotation model, indicating whether it was generated
automatically or has been through subsequent manual curation, must also be reported so
that users can assess and qualify the annotation. Because the current
pipelines do not consider many of the limitations described above, much manual effort
becomes necessary to correct errors and inconsistencies. One of the great current challenges is
to improve these automated procedures to significantly reduce the time for subsequent manual
checking and correcting [73].
In the literature, there are several studies that review and re-annotate genomes, such as
strains of E. coli [120], Mycobacterium tuberculosis [121], and others. Among the current
studies, these genomes tend to be closer to “the gold standard of genomic annotation.”
Some proteins are multi-functional and play various roles and, therefore, the attribution
of a single name is inadequate, depending on the context in which they are expressed.
Attributing a protein name based on its first associated function may be inaccurate. Gene
Ontology (GO) [122] offers a more flexible method to describe a series of functions in an
explicit and concise manner, and the GO annotations include native evidence qualifiers.
However, GO terms are not frequently included in the initial bacterial genome annotation.
The European Bioinformatics Institute - EBI (http://www.ebi.ac.uk/) offers the UniProtKB-
GOA [123] proteome sets, which are GO annotations for all fully sequenced genomes
available on public databases, but they are not included or clearly associated with the
originally submitted genome. GO annotation development and use is encouraged and should
be included in the genome annotation efforts.

Additional data from post-genomic experiments, such as microarray and RNA-Seq data,
may help improve genome annotation [124]. However, it is necessary to define what data
must be included in the annotation and what must be maintained in separate databases.
A consistent bacterial annotation must address both stages, automated and manual, to
offer users a quality measure for the entire genome and individual genes, thus enabling
users to make informed choices about reference genomes and annotation transfer
between genomes. The use of GO terms substantially improves protein description and
decreases the number of syntax errors.
METAGENOMICS
Microorganisms are the most abundant and diverse organisms on Earth. They can inhabit
nearly all environments on the planet, even those considered to be most extreme.
Understanding how microbial populations group into communities and how they interact with
one another and the environment is the objective of microbial ecology. The two main
components of the study of microbial ecology are biodiversity and microorganism activity.
To study environmental microbial communities, classic culture-based microbiology
techniques are insufficient to characterize microorganisms with unknown physiology.
Moreover, morphological analyses and biochemical tests are often redundant and do not
allow for the identification of the ecological role of these organisms in the environment.
Therefore, culture-independent molecular techniques are used to better understand the
microbial world. Hence, metagenomics was developed.
Metagenomics is the analysis of the genetic information of microorganisms present in an
environmental sample without culturing, i.e., the DNA of the microbial community is directly
extracted from the environmental sample. Nucleic acid extraction is the initial step in the
culture-independent analysis of bacterial communities. When the DNA has been
extracted directly from the environmental sample [125], the nucleic acid molecules from the
microorganisms become available, thus enabling analysis through sequencing. Following
DNA extraction, several methods can be utilized, ranging from PCR-based applications to
environmental DNA shotgun sequencing. In general, to analyze microbial diversity, the
environmental DNA acts as a template to amplify the 16S rRNA gene. These amplicons are
then sequenced on the various commercially available NGS platforms. Sequence comparison
allows for biodiversity assessment.
Box 1. Concepts
Ecosystem: Complex and dynamic microbial, animal and plant communities, together
with their non-living environments, which interact as a functional unit.
Habitat: Part of an ecosystem that is best suited to one or a few organism populations.
Population: A group of microorganisms of one species that inhabit the same location
at the same time. They descend from a single cell.
Amplicon: DNA segment obtained from PCR amplification using either DNA or RNA as
template.
Community: A group of populations that interact with one another.
Species richness: Total number of species present.
Species abundance: Proportion of each species in the community.

Another type of approach frequently utilized is whole metagenomic sequencing (WMS),
which consists of the direct sequencing of environmental DNA without running a PCR. In
this type of approach, genes, and often genomes, are accessed, which enables the analysis of
microbial community function. The following sections discuss the diversity assessment
approach based on the 16S rRNA gene and the WMS approach. Microbial ecology terms will be
used, which are defined in Box 1.
Metagenomics for Microbial Diversity
The majority of studies aiming to determine microbial diversity involve the 16S rRNA
gene, but other markers may also be used. The 16S rRNA gene includes variable regions that
provide a taxonomical signature of the microorganism and make it possible to evaluate a
sample’s microbial diversity. This method uses extracted environmental DNA that is then
amplified by PCR using universal oligonucleotides flanking the gene. The 16S rRNA
amplicons theoretically contain copies of the genes of all microbial species present in the
sample. Each 16S rRNA sequence present in the sample represents an operational taxonomic
unit (OTU), defined according to its similarity with other sequences [126]. The number of
OTUs present in the library reflects the sample’s microbial diversity. Diversity indices such
as ACE, Chao and Shannon are derived from OTU counts and make it possible to estimate the
actual environmental diversity [126]. These analyses are called alpha-diversity and form the
basis of microbial ecology.
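Two of these indices can be computed directly from a vector of OTU counts, as sketched below. The counts are hypothetical, the Chao estimator is shown in its bias-corrected Chao1 form, and ACE is omitted for brevity; dedicated packages should be preferred for real analyses.

```python
import math
from collections import Counter


def shannon_index(otu_counts):
    """Shannon index H' = -sum(p_i * ln p_i) over OTUs with non-zero counts."""
    total = sum(otu_counts)
    return -sum((n / total) * math.log(n / total) for n in otu_counts if n > 0)


def chao1(otu_counts):
    """Bias-corrected Chao1 richness: S_obs + F1 * (F1 - 1) / (2 * (F2 + 1)).

    F1 and F2 are the numbers of OTUs observed exactly once and exactly twice.
    """
    observed = sum(1 for n in otu_counts if n > 0)
    freq = Counter(otu_counts)
    singletons, doubletons = freq[1], freq[2]
    return observed + singletons * (singletons - 1) / (2 * (doubletons + 1))


# Hypothetical sample: five OTUs with 10, 5, 1, 1 and 2 sequences each.
counts = [10, 5, 1, 1, 2]
print(round(shannon_index(counts), 3), chao1(counts))
```

Chao1 exceeds the observed richness when many OTUs are rare, reflecting species that were probably present but not sampled.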
Total Metagenomic Sequencing
To evaluate the endogenous microbial communities in a given environment, two
approaches may be employed. In the first approach, single genes are sequenced to describe
the species that are part of the microbial community, for example, 16S rRNA gene
sequencing. The other approach is complete sequencing of the environmental genome: whole
metagenomic sequencing (WMS), without restricting the analyses to a single gene [127].
In the latter methodology, environmental DNA is extracted and then sequenced with
NGS sequencers. The reads generated will correspond to the full genetic material of the
microbial community and not only a single gene. According to prior studies [128], the
majority of the reads derive from the genomes of bacteria, archaea and fungi. Because
environmental DNA is heterogeneous, mate-pair or paired-end libraries are more commonly
used for sequencing because, when read assembly is performed, these types of libraries yield
higher metagenomic coverage and higher N50 values for the assembly-derived contigs [129,
130].
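The N50 statistic mentioned above can be computed as in the following sketch: contigs are sorted from longest to shortest, and N50 is the length of the contig at which the running total first reaches half of the total assembly length. The contig lengths are hypothetical.

```python
def n50(contig_lengths):
    """Length of the contig at which the cumulative length reaches half the total."""
    lengths = sorted(contig_lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length
    return 0  # empty assembly


# Hypothetical assembly of eight contigs totalling 32 bases.
print(n50([2, 2, 2, 3, 3, 4, 8, 8]))
```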
An amplification step using the phi29-DNA polymerase enzyme can also be used before
sequencing [131]. In the end, the reads generated are analyzed with different tools and
databases that provide important information about the metabolic pathways and role of
microbial communities from the GO signatures. The data generated also provide a sufficient
number of 16S rRNA gene sequences to perform α and β-diversity analyses.
The European Bioinformatics Institute (EMBL-EBI) provides a pipeline with a series of
scripts that enable functional and diversity analyses from environmental DNA samples. This
pipeline includes scripts to detect and select reads containing partial 16S rRNA genes
(rRNASelector v.1.0.1), α- and β-diversity analysis programs such as QIIME v.1.9.0 [132],
scripts for coding sequence prediction in reads (FragGeneScan v.1.15), and software for
predicting protein signatures (InterProScan).
Depending on sequencing quality, full or partial genomes may be obtained for non-
culturable bacterial strains that are extremely important in environmental biogeochemical
cycles. Therefore, from the WMS data, it is possible to conduct taxonomic analyses,
functional analyses and comparative analyses, answering the three main questions of
metagenomics: “Who is there?”, “What are they doing?” and “What are their differences?”
TRANSCRIPTOMICS / RNA-SEQ
Transcriptomics is another field of study that greatly advanced as a result of the use of
NGS technologies. We can define a transcriptome as the complete set of transcripts in a cell
under a given physiological condition or developmental stage. Studies in this field provide an
understanding of functional genomic elements and catalog several types of transcripts such as
mRNAs, non-coding RNAs and small RNAs [133].
Several analyses can be performed using transcriptome data, for example, transcript
quantification, differential expression analysis, gene annotation based on a reference and de
novo transcript assembly [134, 135].
Full transcriptome sequencing with RNA-Seq consists of the conversion of the entire or a
portion of an RNA population, such as poly(A), into a cDNA fragment library with adaptors
linked to one or both sequence ends. This library is then sequenced, generating short
reads that may range from 30 to 400 bp according to the sequencing platform selected for the
experiment [136]. Following sequencing, the generated reads are then mapped against a
reference genome to determine the level of expression of each gene as shown in Figure 6.
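Once reads have been assigned to genes, expression levels are usually normalized for gene length and sequencing depth. The sketch below uses RPKM (reads per kilobase per million mapped reads), a widely used normalization that is an assumption of this example and is not prescribed by the text; the gene names, counts and lengths are hypothetical.

```python
def rpkm(read_counts, gene_lengths):
    """RPKM per gene: reads * 1e9 / (gene length in bp * total mapped reads)."""
    total = sum(read_counts.values())
    if total == 0:
        return {gene: 0.0 for gene in read_counts}
    return {
        gene: count * 1e9 / (gene_lengths[gene] * total)
        for gene, count in read_counts.items()
    }


# Hypothetical counts: geneB receives three times more reads but is three times longer.
counts = {"geneA": 500, "geneB": 1500}
lengths = {"geneA": 1000, "geneB": 3000}
print(rpkm(counts, lengths))
```

In this toy case both genes obtain the same normalized value, which shows why raw read counts alone are not comparable between genes of different lengths.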
Another approach is to use these reads to perform de novo transcript assembly, thus
forming a consensus sequence and quantifying the expression profile through read mapping
against the consensus sequence generated. RNA-Seq has several advantages compared to
prior techniques, but the most important advantage is the ability to detect new transcripts that
do not correspond to an existing genomic sequence, thus making it an important technique to analyze
non-model organisms (that do not have a defined genomic sequence). Additionally, RNA-Seq
can detect sequence variations such as single nucleotide polymorphisms (SNPs) [133].
One of the challenges in transcript quantification is associated with the accuracy of the
software used in the mapping process, largely because reads may contain
mismatches, insertions and deletions caused by genomic variations and sequencing errors. For
eukaryotic organisms, there is also difficulty in mapping reads spanning two exons. Splicing
event detection can be performed using computational tools such as MapSplice [133] and
TopHat [137], which operate in two stages. The first stage performs read alignments and
searches for splicing regions. In the second stage, the reads that were not initially aligned
are realigned using these splice sites, which guide the final alignment [135].
Given the importance of transcriptomic studies, several software tools have been developed to
perform alignments for experimentally derived data, namely GSNAP [138], OSA [139], and
STAR [140].

Figure 6. Two main approaches to perform transcriptome analysis. Adapted from (Martin and Wang
2011).
The de novo approach is an option for the quantification process when there is no
reference for mapping. Computational tools such as Trans-ABySS [141] and Velvet/OASES [57,
142] are thus necessary to generate consensus from the sequencing reads. A commonly used
strategy is the breakdown of reads into fixed-size sub-reads, called k-mers. Overlaps of these
k-mers, with k-1 size, are used to generate the graph that represents all possible sequences
that can be generated as shown in Figure 6. The main challenge of this approach is to define
criteria that allow the distinction between sequencing errors and variations such as SNPs
[143, 144].
CONCLUSION
Advances in NGS technology have made it possible to sequence prokaryotic genomes at a
higher rate and yield considerable data, which has led to several achievements and
breakthroughs and the emergence of new biological applications, including omics
technologies such as genomics, metagenomics and transcriptomics.
Some of the data generated may be of low quality, which can negatively affect
important steps such as genome assembly; hence, data quality must be assessed. The Phred score is
one of the measures used to evaluate NGS data quality. Data with quality well
below the desired limit are subjected to trimming, which increases
output accuracy.
Genomics utilizes data from NGS platforms to analyze in a systematic manner the genes,
their organization, function and significance. The identification and localization of CDS,
ORFs, rRNAs and tRNAs, in addition to other molecular markers, assists in the functional
understanding of genomic structural elements. New genomic studies have also made it
possible to analyze gene function, i.e., to understand which changes may occur in genome
functioning depending on the developmental stage or environmental conditions. Proteomics
and transcriptomics are directly associated with these types of studies. It also became possible
to compare nucleotide and/or amino acid sequences from different genomes, which showed
great usefulness in the discovery of therapeutic and biological targets.
The way in which the libraries in NGS sequencing are prepared creates the need for
overlapping, ordering and orienting the data generated so that the genome can be
reconstructed. This process is known as genome assembly. This assembly can be performed
with a reference genome or by de novo assembly, without a reference. Three
approaches are followed for the assembly: greedy algorithms, overlap-layout-consensus
(OLC) and de Bruijn graphs. A scaffold is generated at the end of the assembly process,
in which paired reads orient and order the contigs generated.
The process of genomic annotation, which is responsible for adding relevant biological
information to the sequenced genomic data, including the interpretation, identification and
annotation of genes, proteins and regulatory and/or metabolic pathways, began requiring
automatic pipelines because of the large amount of sequenced data. However, this automatic
annotation is not always consistent and adequate for publication. Therefore, manual curation
is necessary to review, correct, and confirm the knowledge acquired.
To better understand the microbial world, metagenomics introduced molecular techniques
independent of culture because classic microbiology techniques were insufficient to
characterize microorganisms with unknown physiology. Similarly, morphological analyses
and biochemical tests did not allow the identification of the biological role of these organisms
in the environment. The process consists initially of extracting DNA from a microbial
community directly from the environmental sample. Therefore, the molecules become
accessible and can be analyzed by sequencing. Several approaches can be then followed,
including the use of PCR or environmental DNA shotgun sequencing. Environmental DNA
serves as a template to amplify the 16S rRNA gene. Sequence comparison enables
biodiversity assessment. Another approach is whole metagenomic sequencing, with direct
sequencing of environmental DNA without using PCR. In this case, the genes, or genomes,
are accessed and the role of the microbial community is analyzed. Each 16S rRNA sequence
in the sample represents an operational taxonomic unit, which is defined based on its
similarity with other sequences. The number of OTUs present in the library indicates the
microbial diversity of the sample.
Another field directly influenced by NGS technologies is transcriptomics, which studies
the set of transcripts in the cell under a given physiological condition or developmental stage.
These studies have contributed to a better understanding of the role of the genome and the
characterization of transcripts such as mRNAs, non-coding RNAs and small RNAs. The
important analyses performed using transcriptome data include transcript quantification,
differential expression, and annotation correction and validation by reference and de novo
transcript assembly.

REFERENCES
[1] Zhang J, Chiodini R, Badr A, Zhang G. The impact of next-generation sequencing on
genomics. J. Genet. Genomics. 2011;38(3):95–109.
[2] Snustad P. SM. Principles of Genetics. 6th ed. Witt K, editor. John Wiley & Sons, Inc.;
2012.
[3] Schlebusch S, Illing N. Next generation shotgun sequencing and the challenges of de
novo genome assembly. S. Afr. J. Sci. 2012;108 (11-12):1–8.
[4] Liu L, Li Y, Li S, Hu N, He Y, Pong R, et al. Comparison of next-generation
sequencing systems. J. Biomed. Biotechnol. 2012;
[5] Shendure J, Ji H. Next-generation DNA sequencing. Nat Biotechnol.
2008;26(10):1135–45.
[6] Henson J, Tischler G, Ning Z. Next-generation sequencing and large genome
assemblies. Pharmacogenomics. 2012;13(8):901–15.
[7] Kaur R, Malik C. Next Generation Sequencing: a Revolution in Gene Sequencing.
CIBTech J. Biotechnol. [Internet]. 2013;2(4):1–20. Available from:
http://www.cibtech.org/J Biotechnology/ PUBLICATIONS/2013/ Vol-2-No-4/CJB-01-
01- MALIK- NEXT- SEQUENCING.pdf.
[8] Cock PJ a, Fields CJ, Goto N, Heuer ML, Rice PM. The Sanger FASTQ file format for
sequences with quality scores, and the Solexa/Illumina FASTQ variants. Nucleic Acids
Res. 2009;38(6):1767–71.
[9] Metzker ML. Sequencing technologies - the next generation. Nat Rev Genet [Internet].
Nature Publishing Group; 2010;11(1):31–46. Available from: http://dx.doi.org/
10.1038/nrg2626.
[10] Schuster SC. Next-generation sequencing transforms today’s biology. Nat. Methods.
2008;5(1):16–8.
[11] Miller JR, Koren S, Sutton G. Assembly algorithm for Next-Ganeration Sequencing
data. Genomics. 2010;95(6):315–27.
[12] Suzuki S, Ono N, Furusawa C, Ying BW, Yomo T. Comparison of sequence reads
obtained from three next-generation sequencing platforms. PLoS One. 2011;6(5):1–6.
[13] Ewing B, Ewing B, Hillier L, Hillier L, Wendl MC, Wendl MC, et al. Base-Calling of
Automated Sequencer Traces Using. Genome Res. 2005;(206):175–85.
[14] Pearson WR, Lipman DJ. Improved tools for biological sequence comparison. Proc.
Natl. Acad. Sci. USA. 1988;85(8):2444–8.
[15] Applied-Biosystems. Principles of Di-Base Sequencing and the Advantages of Color
Space Analysis in the SOLiD System. Changes. 2011;2–5.
[16] Sanger F, Nicklen S, Coulson a R. DNA sequencing with chain-terminating inhibitors.
Proc. Natl. Acad. Sci. USA. 1977;74(12):5463–7.
[17] Berglund EC, Kiialainen A, Syvänen A-C. Next-generation sequencing technologies
and applications for human genetic history and forensics. Investig Genet [Internet].
BioMed Central Ltd; 2011;2(1):23. Available from: http://www.investigativegenetics.
com/content/2/1/23.
[18] Zeng F, Jiang R, Chen T. PyroHMMsnp: An SNP caller for Ion Torrent and 454
sequencing data. Nucleic Acids Res. 2013;41(13):1–13.

[19] Wirawan A, Harris RS, Liu Y, Schmidt B, Schröder J. HECTOR: a parallel multistage
homopolymer spectrum based error corrector for 454 sequencing data. BMC
Bioinformatics [Internet]. 2014;15(1):131. Available from: http://www.biomedcentral.
com/1471-2105/15/131
[20] Chen YC, Liu T, Yu CH, Chiang TY, Hwang CC. Effects of GC Bias in Next-
Generation-Sequencing Data on De Novo Genome Assembly. PLoS One. 2013;8(4).
[21] Ross MG, Russ C, Costello M, Hollinger A, Lennon NJ, Hegarty R, et al.
Characterizing and measuring bias in sequence data. Genome Biol [Internet]. BioMed
Central Ltd; 2013;14(5):R51. Available from: http://www.pubmedcentral.nih.gov/
articlerender.fcgi?artid=4053816&tool=pmcentrez&rendertype=abstract
[22] MIR. Genômica. São Paulo: Ed. Atheneu; 2004.
[23] Montelione GT, Anderson S. Structural genomics: keystone for a Human Proteome
Project. Nat. Struct. Mol. Biol. [Internet]. 1999 Jan;6(1):11–2. Available from:
http://dx.doi.org/10.1038/4878.
[24] Burley SK, Almo SC, Bonanno JB, Capel M, Chance MR, Gaasterland T, et al.
Structural genomics: beyond the human genome project. Nat. Genet. 1999;23(2):151–7.
[25] Gaasterland T. Structural genomics: Bioinformatics in the driver’s seat. Nat Biotechnol
[Internet]. 1998;16(10):291–4. Available from: http://llama.mshri.on.ca/publications/
Roth_NatBiotech_1998.pdf.
[26] Vitkup D, Melamud E, Moult J, Sander C. Completeness in structural genomics. Nat.
Struct. Biol. 2001;8(6):559–66.
[27] Terwilliger TC. Class[ndash]directed structure determination: Foundation for a Protein
Structure Initiative. Prot. Sci. 1998. p. 1851–6.
[28] Griffiths A.J.F., Carroll, S B., Lewontin, RC. and Wessler SR. Introdução à Genética.
Rio de Janeiro: Guanabara, Koogan; 2008.
[29] Mahdi LK, Deihimi T, Zamansani F, Fruzangohar M, Adelson DL, Paton JC, et al. A
functional genomics catalogue of activated transcription factors during pathogenesis of
pneumococcal disease. BMC Genomics [Internet]. 2014;15(1):769. Available from:
http://www.biomedcentral.com/1471-2164/15/769.
[30] Mutz K-O, Heilkenbrinker A, Lönne M, Walter J-G, Stahl F. Transcriptome analysis
using next-generation sequencing. Curr. Opin. Biotechnol. 2012;22–30.
[31] De Keersmaecker SCJ, Thijs IM V, Vanderleyden J, Marchal K. Integration of omics
data: how well does it work for bacteria? Mol Microbiol [Internet]. Blackwell
Publishing Ltd; 2006;62(5):1239–50. Available from: http://dx.doi.org/10.1111/j.1365-
2958.2006.05453.x.
[32] James P. Protein identification in the post-genome era: the rapid rise of proteomics. Q
Rev Biophys. 1997;30(4):279–331.
[33] Fleischmann RD, Adams MD, White O, Clayton R a, Kirkness EF, Kerlavage a R, et al.
Whole genome random sequencing and assembly of Haemophilus influenzae rd.
Science (80-) [Internet]. 1995;269 (5223):496–512. Available from: <Go to
ISI>://A1995RL49500017.
[34] Fraser C.M., Gocayne J.D., White O., Adams M.D., Clayton R.A., Fleischmann R.D.,
Bult C.J., Kerlavage A.R., Sutton G.G., Kelley J.M., Fritchman J.L., Weidman J.F.,
Small K.V., Sandusky M., Fuhrmann J.L., Nguyen D.T., Utterback T.R., Saudek D.M.,
Phillips VJC. The minimal gene complement of Mycoplasma genitalium. Science
(80-). 1995; 270:397–403.

[35] Land M, Hauser L, Jun S-R, Nookaew I, Leuze MR, Ahn T-H, et al. Insights from
20 years of bacterial genome sequencing. Funct Integr. Genomics [Internet]. 2015;141–
61. Available from: http://link.springer.com/10.1007/s10142-015-0433-4.
[36] Weckwerth W, Loureiro ME, Wenzel K, Fiehn O. Differential metabolic networks
unravel the effects of silent plant phenotypes. Proc. Natl. Acad. Sci. USA.
2004;101(20):7809–14.
[37] Peregrin T. The New Frontier of Nutrition Science: Nutrigenomics. J. Am. Diet Assoc.
2001;Volume 101(Issue 11):Pages 1306.
[38] Sanford Karl, Phillipe Soucaille GW and GC. Genomics to fluxomics and physiomics-
pathway engineering. 2002;318–22.
[39] Marshall A. Genset-Abbott deal heralds pharmacogenomics era. Nat. Biotechnol.
1997;15(9):829–30.
[40] Sleator RD, Shortall C, Hill C. Metagenomics. Lett Appl. Microbiol. 2008;47(5):361–6.
[41] Culligan EP, Sleator RD, Marchesi JR, Hill C. Metagenomics and novel gene
discovery: promise and potential for novel therapeutics. Virulence [Internet].
2014;5(3):399–412. Available from: http://www.scopus.com/ inward/record.url?eid=2-
s2.0-84892380811&partnerID=tZOtx3y1.
[42] Shajahan-Haq A, Cheema M, Clarke R. Application of Metabolomics in Drug Resistant
Breast Cancer Research. Metabolites [Internet]. 2015;5(1):100–18. Available from:
http://www.mdpi.com/2218-1989/5/1/100/.
[43] McBride CM, Bowen D, Brody LC, Condit CM, Croyle RT, Gwinn M, et al. Future
Health Applications of Genomics: Priorities for Communication, Behavioral, and Social
Sciences Research. Am. J. Prev. Med. 2010;38(5):556–65.
[44] Berlin K, Koren S, Chin C-S, Drake JP, Landolin JM, Phillippy AM. Assembling large
genomes with single-molecule sequencing and locality-sensitive hashing. Nat Biotech
[Internet]. Nature Publishing Group, a division of Macmillan Publishers Limited. All
Rights Reserved.; 2015 Jun; 33(6):623–30. Available from: http://dx.doi.org/
10.1038/nbt.3238.
[45] El-Metwally S, Hamza T, Zakaria M, Helmy M. Next-Generation Sequence Assembly:
Four Stages of Data Processing and Computational Challenges. PLoS Comput. Biol.
2013;9(12).
[46] Ekblom R, Wolf JBW. A field guide to whole-genome sequencing, assembly and
annotation. Evol. Appl. [Internet]. 2014;n/a – n/a. Available from: http://doi.wiley.com/
10.1111/eva.12178.
[47] Wojcieszek M, Pawełkowicz M, Nowak R, Przybecki Z. Genomes correction and
assembling: present methods and tools [Internet]. Proc. SPIE. 2014. p. 92901X –
92901X – 8. Available from: http://dx.doi.org/10.1117/12.2075624.
[48] Nagarajan N, Pop M. Sequence assembly demystified. Nat Rev Genet [Internet]. Nature
Publishing Group; 2013;14(3):157–67. Available from: http://www.ncbi.nlm.nih.gov/
pubmed/23358380.
[49] Sutton GG, White O, Adams MD, Kerlavage AR. TIGR Assembler: A New Tool for
Assembling Large Shotgun Sequencing Projects. Genome Sci. Technol. 1995;1(1):9–
19.
[50] Dohm JC, Lottaz C, Borodina T, Himmelbauer H. SHARCGS, a fast and highly
accurate short-read assembly algorithm for de novo genomic sequencing. Genome Res.
2007;17(11):1697–706.

[51] Jeck WR, Reinhardt J a., Baltrus D a., Hickenbotham MT, Magrini V, Mardis ER, et al.
Extending assembly of short DNA sequences to handle error. Bioinformatics.
2007;23(21):2942–4.
[52] Myers EW, Sutton GG, Delcher AL, Dew IM, Fasulo DP, Flanigan MJ, et al. T HE D
ROSOPHILA G ENOME A Whole-Genome Assembly of Drosophila.
2000;287(March):2196–204.
[53] Chevreux B, Pfisterer T, Drescher B, Driesel AJ, Müller WEG, Wetter T, et al. Using
the miraEST assembler for reliable and automated mRNA transcript assembly and SNP
detection in sequenced ESTs. Genome Res. 2004;14(6):1147–59.
[54] Margulies M, Egholm M, Altman WE, Attiya S, Bader JS, Bemben LA, et al. Genome
Sequencing in Open Microfabricated High Density Picoliter Reactors. Nature.
2005;437(7057):376–80.
[55] Hernandez D, François P, Farinelli L, Østerås M, Schrenzel J. De novo bacterial
genome sequencing: Millions of very short reads assembled on a desktop computer.
Genome Res. 2008;18(5):802–9.
[56] Butler J, MacCallum I, Kleber M, Shlyakhter I a., Belmonte MK, Lander ES, et al.
ALLPATHS: De novo assembly of whole-genome shotgun microreads. Genome Res.
2008;18(5):810–20.
[57] Zerbino DR, Birney E. Velvet: Algorithms for de novo short read assembly using de
Bruijn graphs. Genome Res. 2008 May;18(5):821–9.
[58] Simpson JT, Wong K, Jackman SD, Schein JE, Jones SJM. ABySS : A parallel
assembler for short read sequence data ABySS : A parallel assembler for short read
sequence data. 2009;1117–23.
[59] Luo R, Liu B, Xie Y, Li Z, Huang W, Yuan J, et al. SOAPdenovo2: an empirically
improved memory-efficient short-read de novo assembler. Gigascience [Internet].
2012;1(1):18. Available from: http://www.gigasciencejournal.com/content/1/1/18.
[60] Bankevich A, Nurk S, Antipov D, Gurevich A a., Dvorkin M, Kulikov AS, et al.
SPAdes: A New Genome Assembly Algorithm and Its Applications to Single-Cell
Sequencing. J. Comput. Biol. 2012;19(5):455–77.
[61] Hunt M, Newbold C, Berriman M, Otto TD. A comprehensive evaluation of assembly
scaffolding tools. Genome Biol [Internet]. 2014;15(3):R42. Available from:
http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=4053845&tool= pmcentrez
&rendertype=abstract
[62] Simpson JT, Durbin R, Zerbino DR, Birney E, Wong K, Jackman SD. Efficient de novo
assembly of large genomes using compressed data structures sequence data. 2012;549–
56.
[63] Koren S, Treangen TJ, Pop M. Bambus 2: Scaffolding metagenomes. Bioinformatics.
2011;27(21):2964–71.
[64] Gritsenko A a., Nijkamp JF, Reinders MJT, de Ridder D. GRASS: A generic algorithm
for scaffolding next-generation sequencing assemblies. Bioinformatics. 2012;28
(11):1429–37.
[65] Salmela L, Mäkinen V, Välimäki N, Ylinen J, Ukkonen E. Fast scaffolding with small
independent mixed integer programs. Bioinformatics. 2011;27(23):3259–65.
[66] Gao S, Sung W-K, Nagarajan N. Opera: Reconstructing Optimal Genomic Scaffolds
with High-Throughput Paired-End Sequences. J Comput Biol [Internet].

2011;18(11):1681–91. Available from: http://www.liebertonline.com/doi/abs/10.1089/

cmb.2011.0170.
[67] Donmez N, Brudno M. SCARPA: Scaffolding reads with practical algorithms.
Bioinformatics. 2013;29(4):428–34.
[68] Dayarian A, Michael TP, Sengupta AM. SOPRA: Scaffolding algorithm for paired
reads via statistical optimization. BMC Bioinformatics. 2010;11:345.
[69] Boetzer M, Henkel C V., Jansen HJ, Butler D, Pirovano W. Scaffolding pre-assembled
contigs using SSPACE. Bioinformatics. 2011;27(4):578–9.
[70] Boetzer M, Pirovano W. Toward almost closed genomes with GapFiller. Genome Biol
[Internet]. BioMed Central Ltd; 2012;13(6):R56. Available from: http://genomebiology.
com/2012/13/6/R56.
[71] Tsai IJ, Otto TD, Berriman M. Improving draft assemblies by iterative mapping and
assembly of short reads to eliminate gaps. Genome Biol. 2010;11(4):R41.
[72] Médigue C, Moszer I. Annotation, comparison and databases for hundreds of bacterial
genomes. Res. Microbiol. 2007;158(10):724–36.
[73] Richardson EJ, Watson M. The automatic annotation of bacterial genomes. Brief
Bioinform. 2013;14(1):1–12.
[74] Stothard P, Wishart DS. Automated bacterial genome analysis and annotation. Curr.
Opin. Microbiol. 2006;9(5):505–10.
[75] Carver T, Harris SR, Berriman M, Parkhill J, McQuillan J a. Artemis: An integrated
platform for visualization and analysis of high-throughput sequence-based experimental
data. Bioinformatics. 2012;28(4):464–9.
[76] Delcher AL, Bratke K a., Powers EC, Salzberg SL. Identifying bacterial genes and
endosymbiont DNA with Glimmer. Bioinformatics. 2007;23(6):673–9.
[77] Do JH, Choi D-K. Computational approaches to gene prediction. J. Microbiol.
2006;44(2):137–44.
[78] Borodovsky M, Lomsadze A. Gene identification in prokaryotic genomes, phages,
metagenomes, and EST sequences with GeneMarkS suite. Curr. Protoc. Microbiol.
2014;(SUPPL.32):1–17.
[79] Hyatt D, Chen G-L, Locascio PF, Land ML, Larimer FW, Hauser LJ. Prodigal:
prokaryotic gene recognition and translation initiation site identification. BMC
Bioinformatics. 2010;11:119.
[80] Tripp HJ, Sutton G, White O, Wortman J, Pati A, Mikhailova N, et al. Toward a
standard in structural genome annotation for prokaryotes. Stand Genomic Sci [Internet].
Standards in Genomic Sciences; 2015;10(1):45. Available from: http://www.standards
ingenomics.com/ content/10/1/45
[81] Frishman D, Mironov A, Mewes HW, Gelfand M. Combining diverse evidence for
gene recognition in completely sequenced bacterial genomes. Nucleic Acids Res.
1998;26(12):2941–7.
[82] Badger JH, Olsen GJ. CRITICA: coding region identification tool invoking
comparative analysis. Mol. Biol. Evol. 1999;16(4):512–24.
[83] Apweiler R, Martin MJ, O’Donovan C, Magrane M, Alam-Faruque Y, Antunes R, et al.
Ongoing and future developments at the Universal Protein Resource. Nucleic Acids Res.
2011;39(SUPPL. 1):214–9.
[84] Pearson WR. Rapid and sensitive sequence comparison with FASTP and FASTA.
Methods Enzymol. 1990;183(1988):63–98.

[85] Altschup SF, Science C, Pennsylvania T, University S, Park U. Basic Local Alignment
Search Tool. J. Mol. Biol. 1990;215:403–10.
[86] Lowe TM, Eddy SR. tRNAscan-SE: A program for improved detection of transfer
RNA genes in genomic sequence. Nucleic Acids Res. 1997;25(5):955–64.
[87] Lagesen K, Hallin P, Rødland EA, Stærfeldt HH, Rognes T, Ussery DW. RNAmmer:
Consistent and rapid annotation of ribosomal RNA genes. Nucleic Acids Res.
2007;35(9):3100–8.
[88] Finn RD, Bateman A, Clements J, Coggill P, Eberhardt RY, Eddy SR, et al. Pfam: The
protein families database. Nucleic Acids Res. 2014;42(D1):1–9.
[89] Jones P, Binns D, Chang HY, Fraser M, Li W, McAnulla C, et al. InterProScan 5:
Genome-scale protein function classification. Bioinformatics. 2014;30(9):1236–40.
[90] Aziz RK, Bartels D, Best A a, DeJongh M, Disz T, Edwards R a, et al. The RAST
Server: rapid annotations using subsystems technology. BMC Genomics. 2008;9:75.
[91] Van Domselaar GH, Stothard P, Shrivastava S, Cruz J a., Guo AC, Dong X, et al.
BASys: A web server for automated bacterial genome annotation. Nucleic Acids Res.
2005;33(SUPPL. 2):455–9.
[92] Lee D, Seo H, Park C, Park K. WeGAS: a web-based microbial genome annotation
system. Biosci. Biotechnol. Biochem. 2009;73(1):213–6.
[93] Vallenet D, Labarre L, Rouy Z, Barbe V, Bocs S, Cruveiller S, et al. MaGe: A
microbial genome annotation system supported by synteny results. Nucleic Acids Res.
2006;34(1):53–65.
[94] Kumar K, Desai V, Cheng L, Khitrov M, Grover D, Satya RV, et al. AGeS: A software
system for microbial genome sequence annotation. PLoS One. 2011;6(3).
[95] Stewart AC, Osborne B, Read TD. DIYA: A bacterial annotation pipeline for any
genomics lab. Bioinformatics. 2009;25(7):962–3.
[96] Yu C, Zavaljevski N, Desai V, Johnson S, Stevens FJ, Reifman J. The development of
PIPA: an integrated and automated pipeline for genome-wide protein function
annotation. BMC Bioinformatics. 2008;9:52.
[97] Cruveiller S, Le Saux J, Vallenet D, Lajus A, Bocs S, Médigue C. MICheck: A web
tool for fast checking of syntactic annotations of bacterial genomes. Nucleic Acids Res.
2005;33(SUPPL. 2):471–9.
[98] Attwood TK, Bradley P, Flower DR, Gaulton a., Maudling N, Mitchell a. L, et al.
PRINTS and its automatic supplement, prePRINTS. Nucleic Acids Res.
2003;31(1):400–2.
[99] Bateman A, Coggill P, Finn RD. DUFs: Families in search of function. Acta Crystallogr
Sect F Struct Biol Cryst Commun. International Union of Crystallography;
2010;66(10):1148–52.
[100] Suzek BE, Ermolaeva MD, Schreiber M, Salzberg SL. A probabilistic method for
identifying start codons in bacterial genomes. Bioinformatics. 2001;17(12):1123–30.
[101] Shine J, Dalgarno L. The 3’-terminal sequence of Escherichia coli 16S ribosomal RNA:
complementarity to nonsense triplets and ribosome binding sites. Proc. Natl. Acad. Sci.
USA. 1974;71(4):1342–6.
[102] Ermolaeva MD, Khalak HG, White O, Smith HO, Salzberg SL. Prediction of
transcription terminators in bacterial genomes. J. Mol. Biol. 2000;301(1):27–33.

[103] Sigrist CJ a, Cerutti L, De Castro E, Langendijk-Genevaux PS, Bulliard V, Bairoch A,

et al. PROSITE, a protein domain database for functional characterization and
annotation. Nucleic Acids Res. 2009;38(SUPPL.1):161–6.
[104] Mulder N, Apweiler R. InterPro and InterProScan. Comparative genomics. Springer;
2007. p. 59–70.
[105] Hacker J, Blum-Oehler G, Mühldorfer I, Tschäpe H. Pathogenicity islands of virulent
bacteria: Structure, function and impact on microbial evolution. Mol. Microbiol.
1997;23(6):1089–97.
[106] Hsiao W, Wan I, Jones SJ, Brinkman FSL. IslandPath: Aiding detection of genomic
islands in prokaryotes. Bioinformatics. 2003;19(3):418–20.
[107] Waack S, Keller O, Asper R, Brodag T, Damm C, Fricke WF, et al. Score-based
prediction of genomic islands in prokaryotic genomes using hidden Markov models.
BMC Bioinformatics. 2006;7:142.
[108] Langille MGI, Hsiao WWL, Brinkman FSL. Evaluation of genomic island predictors
using a comparative genomics approach. BMC Bioinformatics. 2008;9:329.
[109] Barrangou R, Fremaux C, Deveau H, Richards M, Boyaval P, Moineau S, et al.
CRISPR provides acquired resistance against viruses in prokaryotes. Science.
2007;315(5819):1709–12.
[110] Kassai-Jáger E, Ortutay C, Tóth G, Vellai T, Gáspári Z. Distribution and evolution of
short tandem repeats in closely related bacterial genomes. Gene. 2008;410(1):18–25.
[111] Bland C, Ramsey TL, Sabree F, Lowe M, Brown K, Kyrpides NC, et al. CRISPR
recognition tool (CRT): a tool for automatic detection of clustered regularly interspaced
palindromic repeats. BMC Bioinformatics. 2007;8:209.
[112] Grissa I, Vergnaud G, Pourcel C. CRISPRFinder: a web tool to identify clustered
regularly interspaced short palindromic repeats. Nucleic Acids Res. 2007;Vol. 35(Web
Server issue).
[113] Grissa I, Vergnaud G, Pourcel C. CRISPRcompar: a website to compare clustered
regularly interspaced short palindromic repeats. Nucleic Acids Res. 2008;36(Web
Server issue):52–7.
[114] Sreenu VP, Alevoor V, Nagaraju J, Nagarajaram H a. MICdb: Database of prokaryotic
microstatellites. Nucleic Acids Res. 2003;31(1):106–8.
[115] Lu Z, Szafron D, Greiner R, Lu P, Wishart DS, Poulin B, et al. Predicting subcellular
localization of proteins using machine-learned classifiers. Bioinformatics.
2004;20(4):547–56.
[116] Hua S, Sun Z. Support vector machine approach for protein subcellular localization
prediction. Bioinformatics. 2001;17(8):721–8.
[117] Wang J, Sung W-K, Krishnan A, Li K-B. Protein subcellular localization prediction for
Gram-negative bacteria using amino acid subalphabets and a combination of multiple
support vector machines. BMC Bioinformatics. 2005;6:174.
[118] Gardy JL, Laird MR, Chen F, Rey S, Walsh CJ, Ester M, et al. PSORTb v.2.0:
Expanded prediction of bacterial protein subcellular localization and insights gained
from comparative proteome analysis. Bioinformatics. 2005;21(5):617–23.
[119] Gardy JL, Brinkman FSL. Methods for predicting bacterial protein subcellular
localization. Nat Rev Microbiol. Nature Publishing Group; 2006;4(1):741–51.
[120] Luo C, Hu G-Q, Zhu H. Genome reannotation of Escherichia coli CFT073 with new
insights into virulence. BMC Genomics. 2009;10:552.

[121] Camus J-C, Pryor MJ, Médigue C, Cole ST. Re-annotation of the genome sequence of
Mycobacterium tuberculosis H37Rv. Microbiology. 2002;148(Pt 10):2967–73.
[122] Ashburner M, Ball C a, Blake J a, Botstein D, Butler H, Cherry JM, et al. Gene
ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat.
Genet. 2000;25(1):25–9.
[123] Barrell D, Dimmer E, Huntley RP, Binns D, O’Donovan C, Apweiler R. The GOA
database in 2009 - An integrated Gene Ontology Annotation resource. Nucleic Acids
Res. 2009;37(SUPPL. 1):396–403.
[124] Watson M. ProGenExpress: visualization of quantitative data on prokaryotic genomes.
BMC Bioinformatics. 2005;6:98.
[125] Handelsman J, Handelsman J. Metagenomics : Application of Genomics to Uncultured
Microorganisms Metagenomics : Application of Genomics to Uncultured
Microorganisms. 2004;68(4):669–85.
[126] Schloss PD, Handelsman J. Introducing SONS, a tool for operational taxonomic unit-
based comparisons of microbial community memberships and structures. Appl.
Environ. Microbiol. 2006;72(10):6773–9.
[127] Scholz MB, Lo CC, Chain PSG. Next generation sequencing and bioinformatic
bottlenecks: The current state of metagenomic data analysis. Curr. Opin. Biotechnol.
2012;23(1):9–15.
[128] Fierer N, Lauber CL, Ramirez KS, Zaneveld J, Bradford M a, Knight R. Comparative
metagenomic, phylogenetic and physiological analyses of soil microbial communities
across nitrogen gradients. ISME J. [Internet]. Nature Publishing Group;
2012;6(5):1007–17. Available from: http://dx.doi.org/10.1038/ismej.2011.159.
[129] Albertsen M, Hugenholtz P, Skarshewski A, Nielsen KL, Tyson GW, Nielsen PH.
Genome sequences of rare, uncultured bacteria obtained by differential coverage
binning of multiple metagenomes. Nat Biotechnol [Internet]. 2013; 31(6):533–8.
Available from: http://www.ncbi.nlm.nih.gov/pubmed/23707974.
[130] Chen K, Pachter L. Bioinformatics for whole-genome shotgun sequencing of microbial
communities. PLoS Comput. Biol. 2005;1(2):0106–12.
[131] Khodakova AS, Smith RJ, Burgoyne L, Abarno D, Linacre A. Random whole
metagenomic sequencing for forensic discrimination of soils. PLoS One. 2014;9(8).
[132] Caporaso JG, Kuczynski J, Stombaugh J, Bittinger K, Bushman FD, Costello EK, et al.
QIIME allows analyses of high-throuput community sequencing data. Nat. Methods.
2010;7(5):335–6.
[133] Wang Z, Gerstein M, Snyder M. RNA-Seq: a revolutionary tool for transcriptomics.
Nat. Rev. Genet. 2009;10(1):57–63.
[134] Tjaden B. De novo assembly of bacterial transcriptomes from RNA-seq data. Genome
Biol. 2015;16(1):1–10.
[135] Zhao S, Fung-Leung WP, Bittner A, Ngo K, Liu X. Comparison of RNA-Seq and
microarray in transcriptome profiling of activated T cells. PLoS One. 2014;9(1).
[136] Wolf JBW. Principles of transcriptome analysis and gene expression quantification: An
RNA-seq tutorial. Mol. Ecol. Resour. 2013;13(4):559–72.
[137] Kim D, Pertea G, Trapnell C, Pimentel H, Kelley R, Salzberg SL. TopHat2: accurate
alignment of transcriptomes in the presence of insertions, deletions and gene fusions.
Genome Biol. BioMed. Central Ltd; 2013 Apr 25;14(4):R36.

[138] Wu TD, Nacu S. Fast and SNP-tolerant detection of complex variants and splicing in
short reads. Bioinformatics. 2010 Apr 1;26(7):873–81.
[139] Hu J, Ge H, Newman M, Liu K. OSA: a fast and accurate alignment tool for RNA-Seq.
Bioinformatics. 2012 Jul 15;28(14):1933–4.
[140] Dobin A, Davis C a, Schlesinger F, Drenkow J, Zaleski C, Jha S, et al. STAR: ultrafast
universal RNA-seq aligner. Bioinformatics. 2013 Jan 1;29(1):15–21.
[141] Robertson G, Schein J, Chiu R, Corbett R, Field M, Jackman SD, et al. De novo
assembly and analysis of RNA-seq data. Nat. Methods. 2010;7(11):909–12.
[142] Schulz MH, Zerbino DR, Vingron M, Birney E. Oases: Robust de novo RNA-seq
assembly across the dynamic range of expression levels. Bioinformatics.
2012;28(8):1086–92.
[143] Garber M, Grabherr MG, Guttman M, Trapnell C. Computational methods for
transcriptome annotation and quantification using RNA-seq. Nat Methods. Nature
Publishing Group; 2011;8(6):469–77.
[144] Zerbino DR. Using the Velvet de novo assembler for short-read sequencing
technologies. Curr. Protoc. Bioinformatics. 2010 Sep; Chapter 11:Unit 11.5.

DNA damage response, 81, 83, 92, 98, 99, 100
endosperm, 138
DNA lesions, ix, 79, 90
endothelium, 129
DNA ligase, 86, 97
engineering, 219
DNA polymerase, 83, 88, 90, 97, 140, 159, 213
environment(s), 39, 43, 59, 64, 96, 105, 191, 194,
DNA repair, ix, 79, 80, 81, 83, 85, 86, 88, 92, 94, 95,
210, 212, 213, 216
96, 98, 99
environmental change, 41
DNA sequencing, 15, 102, 104, 109, 208, 217
environmental conditions, 40, 62, 206, 216
DNA strand breaks, 92
environmental factors, 127
dogs, 198
enzyme(s), ix, 27, 28, 40, 62, 69, 72, 80, 83, 86, 90,
domestication, 143
92, 94, 112, 114, 139, 140, 213

enzyme-linked immunosorbent assay, 140 force, 11

epidemiology, 148 Ford, 60
epithelial cells, 96 Forensic Genetics, vii, 1, 12
equilibrium, 74, 153 forest management, 39, 40, 41, 58, 59
equipment, 141, 203 formation, 24, 40, 62, 81, 83, 89, 97, 113, 126, 128,
erosion, 73 189, 191, 192
erythrocytosis, ix, 101, 104, 111, 116, 118 formula, 75
erythroid cells, 106 founder effect, 57, 73
erythropoietin, ix, 101, 102, 106, 115 fragments, 44, 69, 147, 204, 205
EST, 221 France, 66, 68
ethnic groups, 3 free radicals, 98
ethnicity, 103 fruits, x, 136, 137, 138, 139, 144, 154
etiology, 104, 132, 144 functional analysis, 120, 202
eukaryotic, 7, 86, 96, 142, 214 fungi, 42, 213
eukaryotic cell, 96 fusion, 85, 116
Eurasia, 189
Europe, 7, 10, 41, 189, 197
evidence, ix, x, 3, 12, 14, 16, 22, 26, 27, 76, 77, 103, G
111, 123, 125, 152, 173, 193, 198, 208, 211, 221
gamete, 83
evolution, xi, 10, 13, 14, 16, 17, 18, 20, 26, 34, 62,
gamma radiation, 92
83, 108, 115, 120, 133, 194, 195, 198, 199, 201,
gel, 44, 69
202, 223
gender differences, vii, 1
excision, ix, 79, 81, 86, 87, 88, 89, 97, 98
gene amplification, 158
exclusion, 114
gene expression, 7, 17, 19, 26, 27, 28, 29, 31, 34,
exons, 26, 27, 103, 214
111, 112, 124, 128, 130, 202, 206, 224
expertise, 208
gene flow, viii, x, 38, 41, 45, 46, 48, 49, 50, 51, 52,
exposure, 22, 102
55, 57, 65, 66, 73, 152, 154, 161, 165, 166, 167,
extinction, 66, 73
169, 170, 171, 188, 195, 196
extraction, 43, 44, 69, 140, 199, 212
gene pool, 17
extracts, 86, 88, 90, 98, 142
gene promoter, 31
extreme cold, 188
gene regulation, 112
EZH2, 108, 111, 112, 114, 119, 120
gene silencing, 25, 27, 107, 118, 142
gene transfer, 210
F genes, vii, viii, x, xi, 1, 2, 3, 4, 7, 8, 9, 16, 17, 18, 21,
22, 23, 24, 25, 26, 27, 29, 30, 31, 32, 35, 80, 86,
families, 24, 25, 87, 127, 137, 138, 210, 222 88, 98, 104, 106, 107, 109, 111, 112, 124, 126,
family deletions, vii 127, 140, 141, 143, 149, 150, 151, 154, 159, 162,
family members, 25, 105, 112 163, 165, 167, 168, 169, 170, 173, 175, 176,
fauna, 194, 197 178, 179, 182, 186, 187, 194, 196, 201, 206,
fertility, 13, 16, 20 210, 212, 213, 214, 216, 221, 222
fertilization, 3, 29 genetic alteration, 108
fetus, 2 genetic background, 33
fibroblasts, 86, 88, 96, 132 genetic diversity, vii, viii, 18, 37, 39, 40, 41, 44, 45,
fidelity, 90 46, 48, 50, 51, 52, 53, 55, 56, 58, 59, 62, 65, 66,
fitness, 56 67, 72, 73, 160
fixation, 57, 74 genetic drift, 12, 41, 57, 66
flora, 41, 66, 73, 76, 77 genetic factors, 33, 41, 109
flora and fauna, 41 genetic information, 13, 90, 152, 206, 212
flow value, 48 genetic marker, vii, 1, 12, 44
flowers, 68, 154 genetic mutations, 102, 105
fluctuations, viii, 39, 65, 73, 198 genetics, vii, 1, 10, 14, 15, 18, 19, 38, 55, 62, 66, 77,
fluid, 138 78, 108, 152, 153, 154, 188, 190, 194, 199, 206
food, 15, 40, 58, 60, 154

genome, vii, xi, 1, 10, 12, 13, 17, 27, 29, 30, 55, 80, Hawaii, 137, 139, 145, 149
81, 102, 106, 137, 138, 139, 140, 141, 144, 145, health, vii, 1, 3, 13, 131
146, 147, 148, 149, 201, 202, 203, 204, 205, 206, heart disease, 130
207, 208, 209, 210, 211, 212, 213, 214, 215, 216, heart failure, vii, 1, 4, 5, 6, 127, 132
217, 218, 219, 220, 221, 222, 224 height, 162
genome integrity, 81 HeLa cells, vii, ix, 79, 80, 81, 82, 83, 84, 85, 86, 87,
genomic DNA library, 204 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 99, 100
genomic instability, 92, 99, 118 hematocrit, 104, 113, 115
genomic regions, 28, 30, 31 hematopoietic stem cells, 113
genomic stability, ix, 80, 81, 92, 94, 96, 97 hemoglobin, 104, 109
genomics, xi, 141, 143, 144, 149, 201, 202, 206, 215, hemorrhage, 104
217, 218, 222, 223 hepatomegaly, 115
genotype, 40, 44, 104, 105, 109 heterochromatin, 3, 106, 107
genotyping, 12, 17, 109, 111 heterogeneity, x, 113, 151, 154, 161, 165, 166, 167,
genus, x, 55, 135, 136, 137, 138, 139, 140, 146, 153, 187
196, 197 highlands, 189
geographical origin, 184 histidine, 69
geography, 10, 18, 48 histocompatibility antigens, 8, 17
geology, 194 histogram, 207
germ cells, 19, 31 histone, 25, 26, 32, 83, 97, 106, 107, 118, 120
Germany, 206 histone deacetylase, 83
germination, 61 history, vii, 1, 10, 12, 18, 41, 59, 60, 61, 63, 66, 73,
gestation, 153 154, 161, 187, 194, 195, 196, 217
gestational diabetes, 128, 129, 133 HLA, 8, 19
global climate change, 62 Holocene, 38, 60, 73
graft-versus-host disease, vii, 1, 4, 5, 6, 17 homeostasis, 22, 109
graph, 207, 215, 216 homologous recombination, viii, ix, 21, 26, 79, 81,
grasses, 154 83, 84, 85, 97, 118
grasslands, 189 hormone, 22
Greece, 21, 63, 123 host, vii, 1, 4, 5, 6, 8, 9, 17, 19, 137, 139, 140, 141,
grouping, 58, 104, 109 142, 143, 144, 150, 153
growth, 28, 39, 58, 63, 64, 80, 81, 107, 117, 124, hotspots, 83, 97
129, 131, 178, 198 House, 195
growth arrest, 80, 81 HSCT, 8
growth hormone, 117 human, vii, viii, ix, 1, 3, 9, 10, 12, 13, 14, 16, 17, 18,
guanine, 202, 205 19, 20, 21, 24, 26, 29, 32, 33, 35, 38, 39, 41, 56,
guardian, 80, 97 57, 59, 66, 79, 80, 83, 84, 86, 90, 92, 96, 97, 98,
guidelines, 67, 104, 209 99, 100, 107, 115, 116, 118, 123, 124, 125, 127,
Guyana, 152, 153, 199 129, 130, 131, 132, 133, 189, 195, 208, 217, 218
GVHD, vii, 1, 8, 9, 19 human activity, 38, 56, 57, 59, 189
human brain, 9
Human Evolution, vii, 1, 9
H human genome, 29, 218
hunting, 20, 153, 190
habitat, vii, viii, 38, 65, 66, 67, 73, 75, 76, 77, 78,
H-Y antigen, 8, 19
188, 190
hybrid, 31, 88
habitat fragmentation, vii, viii, 65, 66, 75, 76, 78
hybridization, 40, 149, 192
habitats, 51, 68, 189, 194
hydrogen, 202
hair, 158
hydrophobicity, 139
handbags, 190
hyperplasia, 104
haploid, 9, 206
hypertrophy, 127, 129, 132
haplotypes, 7, 9, 13, 17, 18, 20, 54, 55, 155, 160,
hypothesis, 7, 10, 38, 57, 58, 73, 92, 107, 113, 140,
162, 163, 164, 181, 184, 187
154, 188, 197, 198
harbors, 127

hypoxia, 88, 133 intron(s), 26, 27, 30, 110, 129, 142, 149, 192
hypoxia-inducible factor, 88 invasions, 195
hypoxic cells, 88 ionizing radiation, 83, 92
ions, 202
Iran, 40, 197
I iron, 109
irradiation, 99
IBD, 162, 193 irrigation, 138
Iberian Peninsula, viii, 65, 66, 75 ischemia, 127
identification, ix, xi, 13, 14, 57, 80, 86, 90, 91, 100, islands, 112, 210, 223
102, 107, 109, 131, 140, 141, 193, 198, 201, 202, isocitrate dehydrogenase, 70
205, 206, 208, 210, 212, 216, 218, 221 isolation, x, 10, 66, 152, 154, 191, 193
identity, 8, 25, 27, 38, 54, 58, 192, 206 isozyme(s), 70, 76, 77, 78
idiopathic, 3, 19, 26, 35, 104, 116 Israel, viii, 37, 38, 39, 40, 41, 42, 46, 48, 50, 51, 54,
images, 67 58, 59, 60, 61, 62, 63, 64
immortality, 124 issues, 162
immune response, 124 Italy, 66, 67
immunity, 149
immunoprecipitation, 141
imprinting, 29, 112 J
improvements, ix, 101
in vitro, ix, 32, 80, 96, 101, 102, 107, 108, 109, 111, Jamaica, 149
112, 113, 127, 145 Japan, 12, 136, 139, 147
in vivo, ix, 101, 110, 113, 115, 141, 142 Jordan, 39, 42, 46, 47, 48, 55, 56, 64
inbreeding, 57, 61, 66, 73 justification, 108
incidence, x, 9, 87, 104, 105, 110, 126, 136, 139, 140
incompatibility, 142
India, 137, 145, 148 K
Indians, 18
karyotype, 9, 121, 191
individuals, vii, viii, 1, 7, 8, 44, 47, 62, 65, 66, 69,
kidney, 27
70, 72, 90, 102, 103, 104, 105, 109, 119, 153,
kill, 55
160, 173, 179, 181, 182, 184, 188, 190, 191
kinase activity, ix, 101, 105
induction, 129, 142, 143
infection, 80, 137, 138, 139, 141, 142, 143, 145, 146,
147 L
infertility, vii, 3, 4, 5, 6, 7, 15, 16, 17, 18, 19, 22, 23,
24, 26, 28, 32, 33, 34, 35 lakes, 154
inflammation, 127 landscape(s), 38, 40, 41, 55, 66, 195
inflammatory disease, 127 language development, 9
inhibition, 32, 92, 107, 108, 129, 131 Latin America, 197
initiation, 4, 7, 108, 142, 221 lead, x, 3, 8, 30, 41, 73, 81, 84, 85, 88, 90, 136, 208
inoculation, 137, 138 Lebanon, 40
inoculum, x, 136, 142 lesions, 81, 86, 88, 90, 120
insects, 42, 154 leucine, 22, 103
insertion, 90, 205, 210 leukemia, 103, 104, 114, 116, 120, 131
inspections, x, 136, 141 leukocytes, 19
integrases, 210 leukocytosis, 117
integrity, 80, 81, 140 LIFE, 77
interface, 40 life cycle, 39, 59, 137
interference, 31, 86, 142 ligand, 106
interferon, 103 light, 34, 43, 59, 128, 138
interstrand crosslink, 90, 93, 98 limestone, 40
intervention, 108 lithium, 70
intrauterine growth retardation, 87, 128 liver, 5

liver cancer, 5 mental retardation, 87

living environment, 212 mesenchymal stem cells, 132
localization, 70, 83, 95, 99, 100, 216, 223 meta-analysis, 132
loci, viii, 9, 13, 14, 18, 19, 24, 29, 30, 44, 45, 46, 47, metabolic pathways, 213, 216
49, 50, 51, 52, 53, 56, 57, 65, 69, 71, 72, 73, 74, metabolism, 3, 89, 99, 124
102, 107, 153, 191 metabolites, 89
locus, 44, 51, 57, 70, 87, 102, 111 metastasis, 124, 125
logging, 55 methodology, 203, 213
longevity, x, 66, 135, 136, 197 methylation, 25, 26, 31, 32, 111, 112, 124, 131, 141
lung cancer, 83, 97, 103, 116, 125, 126, 130, 132 methylcellulose, 109
Luo, 76, 99, 127, 128, 131, 132, 133, 144, 198, 220, Mexico, 10, 136, 138, 146, 152, 153
223 Mg2+, 90
lymphoma, 111 MHC, 192
lysine, 31, 103 mice, 84, 99, 111, 113, 114, 119, 129, 131, 194
microbial community(s), 202, 212, 213, 216, 224
microcephaly, 87
M microorganism(s), 212, 213, 216
microRNA(s), vii, 27, 33, 34, 36, 88, 129, 130, 131,
machinery, 29, 115
132, 133, 148
macromolecules, 206
microsatellites, 14, 211
magnitude, 190
microscope, 139
majority, ix, 25, 29, 33, 80, 86, 101, 104, 105, 107,
Middle East, 40, 41, 59, 63, 64
208, 213
migrants, 45, 48, 49, 51, 55
malate dehydrogenase, 69
migration, ix, 10, 18, 48, 56, 70, 72, 92, 123, 125,
male infertility, vii, 3, 5, 7, 15, 16, 18, 19, 22, 23, 24,
128, 131, 161, 193
28, 33, 34, 35
Miocene, x, 152, 191, 192, 194, 195, 196
male lineages, vii, 1, 10, 12, 13
MIP, 207
male-specific region, vii, 1, 2, 16, 19
miscarriage, 4, 5, 6
mammal(s), 27, 31, 154, 191, 194, 195, 197, 199
mismatch repair, ix, 79, 81, 90, 91, 98
mammalian cells, 80, 85, 88
mitochondria, 83
management, ix, x, 38, 41, 57, 63, 67, 75, 77, 101,
mitochondrial DNA, 15, 175, 176, 178, 195, 197,
116, 124, 125, 128, 136, 140, 142, 149, 196
199
manufacturing, 22
mitochondrial genes, vii, x, 151, 154, 163, 167, 173,
mapping, 3, 87, 107, 202, 206, 214, 215, 221
179, 182, 186, 187, 194, 196
Markov chain, 161
mitosis, 109, 110
marrow, 9, 15, 104, 115, 117
model system, 80
masking, 30
models, ix, 60, 101, 160, 162, 163, 179, 223
mass, 104
modifications, 25, 28, 44, 70, 72, 92
materials, 41, 67
modules, 207
matrix, 193, 204
moisture, 40
matter, 9, 13
molecular biology, 141, 144
measurements, 142
molecular dating, 162
meat, 154, 190
molecular mass, 98
medical, 3, 14, 15, 19, 102, 123
molecular weight, 44, 159
medicine, 14, 15, 16, 17, 19, 128
molecules, ix, 8, 27, 33, 80, 90, 123, 127, 128, 196,
Mediterranean, viii, 38, 39, 40, 41, 55, 57, 59, 61,
212, 216
62, 63, 64, 65, 66, 73
monomers, 106
Mediterranean Basin, viii, 65, 73
Monte Carlo method, 196
Mediterranean climate, 39
Montenegro, 128, 132
Mediterranean countries, 38
morphology, 22, 40, 104, 139, 196
megakaryocyte, 104
mortality, 7, 8, 56
meiosis, 2, 83
mortality rate, 56
mellitus, 128, 129, 133
memory, 220

mosaic, x, 2, 15, 19, 35, 135, 136, 137, 138, 139, normal distribution, 162
145, 147, 148 North America, 10, 12, 61, 76, 189, 195
motif, 5, 7, 19, 99, 190 nuclei, 66
mRNA(s), ix, 17, 27, 28, 29, 30, 31, 32, 36, 88, 119, nucleic acid, ix, 79, 140, 212
123, 201, 202, 206, 214, 216, 220 nucleosome, 3
MSY, vii, 1, 2, 3, 4, 6, 9, 12, 23, 24, 25, 26 nucleotide excision repair, ix, 79, 81, 86, 87, 97
mtDNA, 194, 197, 198 nucleotide sequence, 2, 25, 83, 139, 145, 161, 196,
multiple myeloma, 15 209
multiplication, 96 nucleotides, 27, 28, 88, 90, 124
mumps, 22 nucleus, 28, 29, 88, 92, 94, 99, 106, 107
mutagenesis, 90 null, 3, 92, 113, 114, 165, 167
mutant, ix, 92, 94, 101, 103, 105, 107, 108, 110, 111, nutrients, 206
114, 117, 119
mutation(s), ix, 2, 9, 11, 13, 19, 22, 23, 41, 81, 88,
90, 97, 101, 102, 103, 104, 105, 107, 108, 109, O
110, 111, 112, 113, 114, 115, 116, 117, 118, 119,
obesity, 22, 33
120, 121, 137, 140, 141, 161, 164, 195, 197, 198
oligospermia, 23
mutation rate, 13, 19, 41, 88, 108, 161, 164, 197
oligozoospermia, viii, 3, 15, 21, 22, 23, 24, 26, 33,
mutational analysis, 120
35
myelodysplasia, 103, 111, 119, 121
oncogenes, 124, 126
myelodysplastic syndromes, 116
oncogenesis, 8
myelofibrosis, 104, 108, 115, 119, 120, 121
optimization, 221
myeloid metaplasia, 115
orchid, 76
myeloproliferative disorders, 115, 116, 118, 119, 120
organ(s), 7, 127
myocardium, 127
organism, 80, 85, 205, 206, 210, 212, 216
orthography, 208
N overlap, 103, 111, 207, 216
oxidation, 138
NaCl, 44 oxidative stress, 22
NADH, 196
National Academy of Sciences, 98, 119, 120
National Institutes of Health, viii, 79 P
Native Americans, 10, 12, 14
p53, 80, 81, 82, 83, 95, 97, 114, 120
native population, 13, 15
P53, 97
NATO, 193
pachytene, 31
natural selection, 56, 161
Pacific, 10, 152, 181, 192, 203
necrosis, x, 135, 136, 137, 145
pairing, 27, 90, 206
negative effects, 75
Panama, 152, 153, 188, 192, 195
neoplasm, ix, 101, 103, 115, 116, 120
pancreatic cancer, 126, 130
Netherlands, 61
papaya viral diseases, vii, x, 136, 142
neural system, 28
Paraguay, 152, 153, 187
neurodegenerative diseases, 98
parallel, 10, 141, 208, 218, 220
neutral, 41, 55, 198
parameter estimates, 162
New England, 16, 115, 116, 117, 118, 120, 121
participants, 9, 109
next generation, 2, 12, 13, 81, 148, 203, 217
paternal lineages, vii, 1, 9, 10, 14
nitrogen, 44, 224
pathogenesis, ix, 7, 19, 102, 109, 111, 115, 123, 125,
nodes, 12
126, 127, 128, 144, 218
non-classical, 92
pathogens, 136, 140, 141
Non-homologous end joining, 85, 86
pathology, 18, 23
nonsmokers, 8
pathophysiology, vii, 1
non-structural protein, 140
pathway(s), 2, ix, 36, 79, 84, 85, 86, 88, 90, 91, 92,
normal aging, 8
94, 97, 98, 107, 118, 126, 127, 206, 208, 219
normal development, 27

PCR, 17, 44, 102, 104, 129, 140, 146, 147, 148, 158, polymerase chain reaction, 140, 158
159, 199, 212, 213, 216 polymerization, 99
peptides, 8 polymorphism(s), 9, 16, 62, 109, 110, 129, 131, 141,
pericentric inversion, 191 196, 198, 202, 214
perinatal, 128 population, viii, x, 7, 10, 11, 12, 13, 14, 15, 19, 38,
peripheral blood, 8, 15, 102, 107, 109, 113, 115, 127, 41, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54,
129 55, 56, 57, 58, 59, 60, 62, 63, 64, 65, 66, 69, 73,
peripheral blood mononuclear cell, 129 75, 76, 78, 109, 110, 142, 149, 152, 153, 154,
permission, 69 161, 162, 163, 164, 165, 167, 168, 169, 170, 171,
Peru, 10, 152, 153, 155, 156, 157, 158, 165, 168, 172, 173, 174, 175, 176, 178, 179, 182, 187,
170, 181, 189, 190, 192 188, 189, 190, 191, 192, 193, 194, 195, 197, 198,
pesticide, 35 199, 214
pH, 44, 69 population group, 15
phage, 210 population growth, 194, 195, 197
phenol, 158 population size, viii, 41, 55, 60, 65, 66, 73, 76, 154,
phenotype(s), ix, 3, 23, 24, 38, 39, 44, 59, 70, 72, 88, 161, 164, 173, 188, 193, 197
90, 92, 99, 101, 104, 107, 108, 109, 111, 113, population structure, 38, 48, 56, 57, 62, 76
118, 219 positive feedback, 16
phenylalanine, 102, 103 precipitation, 141
Philadelphia, 103, 116 precursor cells, 106
phosphorylation, 83, 84, 97, 106, 107 predation, 154
phylogenetic tree, 11, 45, 162, 192 predators, 154
phylogenetics, vii, 194, 198 preeclampsia, 129, 130, 132
phylogeography, vii, x, 151, 194 pregnancy, ix, 123, 124, 128, 131, 133
physical interaction, 92 preparation, 124
physical properties, 139 preservation, 41, 64, 75
physiology, 146, 212, 216 prevention, 16, 206
PI3K, 102, 107 primate, 133
pilot study, 32 probability, 12, 61, 120, 162, 203
Pinus halepensis, 56, 61, 63 progenitor cell(s), 109
pipeline, 213, 222 prognosis, 9, 111, 119, 125
placenta, 127, 128 project, 3, 218
Plant Pathology, 144, 145, 146, 147, 148, 150 prokaryotes, vii, xi, 201, 221, 223
plants, x, 28, 55, 58, 60, 62, 63, 70, 73, 76, 78, 136, proliferation, ix, 3, 83, 99, 107, 123, 124, 127, 129,
137, 138, 139, 140, 141, 142, 143, 145, 148, 149, 130, 131, 132
150 propagation, 38, 41, 57, 58, 59, 66, 80, 96, 140, 208
plasmid, 88 prostate cancer, vii, 1, 4, 5, 6, 7, 8, 16, 17, 18, 126,
plasticity, 39, 55, 59 130, 131, 132
platelet count, 104, 109, 115 prostate carcinoma, 130, 131
platelets, 98, 113 prostate gland, 7
platform, 202, 204, 214, 221 proteasome, 150
playing, ix, 86, 113, 123, 131 protection, 75, 141, 142, 149, 150
Pliocene, 189, 190, 191, 192 protein kinases, 107
PM, 19, 129, 131, 217 protein sequence, 138, 210
point mutation, 110 protein signatures, 214
pollen, 66, 189 proteins, 3, 25, 27, 28, 29, 30, 31, 32, 35, 83, 84, 88,
pollution, 22 90, 91, 92, 95, 96, 98, 100, 105, 106, 111, 112,
polycythemia, vii, ix, 101, 102, 115, 116, 117, 118, 113, 137, 140, 143, 150, 206, 210, 211, 216, 223
120 proteome, 13, 16, 206, 211, 223
polycythemia vera, vii, ix, 101, 102, 115, 116, 117, proteomics, 150, 206, 218
118, 120 proto-oncogene, 131
polygenic disorder, 7 pseudogene, 2
polymerase, 28, 29, 44, 98, 138, 140, 158 psychiatry, 17

psychosocial functioning, 15 reproduction, 9, 16, 17, 19, 62, 68

PTEN, 126 requirements, 44, 96, 209
publishing, 80 researchers, 3, 9, 26
pulp, 138 reserves, 57
purity, 140 residue(s), 102, 103, 105, 106, 107
PVP, 44, 69 resilience, 195
pyrimidine, 88 resistance, 39, 59, 88, 125, 131, 141, 142, 143, 144,
145, 146, 147, 149, 150, 223
resolution, 11, 13, 14, 16, 20, 195
Q resources, 39, 59, 63
response, 9, 31, 58, 59, 81, 83, 88, 91, 92, 96, 98, 99,
quantification, 141, 214, 215, 216, 224, 225
125, 143, 150, 206
Queensland, 101
restoration, 38, 39, 57, 59
retinoblastoma, 126
R rheumatoid arthritis, 129, 130, 132
Rho GTPase, 92, 93, 94, 98, 99
Rab, 119 RhoA, 92, 93, 99
RAC1, 92, 98, 99 ribose, 98
radiation, 22, 32, 92, 99, 100, 196 ribosomal RNA, 33, 206, 222
radiation treatment, 92 ribosome, 222
rain forest, 191 risk, 7, 8, 9, 15, 22, 57, 104, 109, 114, 115, 118, 141,
rainfall, 43, 51, 56, 58 148
RAS, 107, 114 risk assessment, 148
reactions, 89, 159 risk factors, 22
reactive oxygen, 92, 98, 99 river basins, 191
reading, 19, 25, 26, 138, 205, 206 river systems, 187
reagents, 140 RNA(s), viii, ix, xi, 3, 4, 5, 7, 21, 26, 27, 28, 29, 30,
reality, x, 136, 162 31, 32, 33, 34, 35, 36, 83, 86, 97, 119, 123, 124,
reasoning, 192 137, 138, 140, 141, 142, 143, 145, 148, 201, 202,
receptor(s), ix, 22, 101, 102, 105, 106, 107, 117 206, 210, 211, 212, 214, 216, 224, 225
recognition, 97, 119, 210, 221, 223 RNAi, 94
recombination, vii, viii, ix, 1, 2, 9, 21, 26, 79, 81, 83, Robertsonian translocation, 22
84, 85, 89, 97, 109, 110, 111, 114, 118, 142, 206 rodents, vii, 152, 154, 187, 189, 197, 198, 199
reconstruction, 161, 195, 196 Romania, 66
recovery, 75 root, 39, 138, 142, 162
recovery plan, 75 root system, 39
red blood cells, ix, 101 routes, 120
regeneration, 55, 63 routines, 210
registries, 211 rowing, 56
regression, viii, 51, 53, 54, 65, 162, 179, 196
regression analysis, 51, 53, 54
S
regression equation, 51, 179
regression model, 179
samplings, 55
regulations, 211
savannah, 188, 189
rehabilitation, 39, 59
science, 14, 15, 17, 18, 19, 80
rejection, 9, 19
scripts, 213
relatives, vii, 1, 12, 13, 14
sea level, 39, 73
relaxation, 106
second generation, 202
remission, 17
seed, 66, 75, 76, 77, 138
repair, vii, ix, 28, 79, 80, 81, 82, 83, 84, 85, 86, 87,
seedlings, 56, 63
88, 89, 90, 91, 92, 95, 97, 98, 100
semen, 32
replication, 86, 88, 89, 90, 91, 98
senescence, 92, 95, 124
repression, ix, 28, 123, 126
sensing, 83

sensitivity, 90, 92, 97, 102, 105, 140, 148 spermatogenesis, viii, 3, 7, 15, 17, 18, 20, 21, 22, 23,
sequencing, xi, 12, 13, 17, 33, 141, 146, 148, 201, 24, 25, 26, 27, 28, 29, 31, 32, 33, 34
202, 203, 204, 205, 206, 207, 208, 209, 212, 213, spin, 159
214, 215, 216, 217, 218, 219, 220, 224, 225 spinal cord, 9
serine, 5, 83, 97, 105, 138 spleen, 113
serum, 104, 109, 116, 127, 128, 129, 133 splenomegaly, 113, 115
serum EPO, 104, 109 Spring, 63, 198
serum erythropoietin, 116 sprouting, 55, 60
settlements, 41 stability, 27, 39, 59, 81, 92, 94, 138, 144
sex, 2, 3, 5, 8, 9, 13, 16, 19, 22, 23 Stachys maritima, viii, 65, 66, 67, 68, 71, 72, 73, 74,
sex chromosome, 8, 9, 16 75
sex differences, 8, 16, 19 standard deviation, 162
sex reversal, 5 starch, 69, 70, 72
sexual dimorphism, 9 starvation, 154
shape, 39, 40, 61, 99, 113 state(s), 31, 44, 66, 114, 137, 144, 191, 224
shock, 4, 7, 15, 18 statistics, 18, 160, 161, 163, 164, 165, 166, 167, 172,
shoreline, 73 173
short tandem repeat, vii, 1, 12, 223 stem cells, 8, 29, 113, 130
showing, 73, 80, 83, 84, 86, 88, 90, 91, 168, 207 stimulation, 106
shrubs, 40, 63 storage, 75, 80
signal transduction, 117 storms, 67, 73
signaling pathway, 107, 115, 126 STR(s), vii, 1, 12, 13, 14, 16, 18, 19
signalling, 115, 118, 132 stratification, 104
signals, 75, 99 stress, 22, 62, 80, 92, 93, 94, 95, 99, 124, 143
significance level, 54 stress response, 124, 143
signs, 113 stressors, ix, 79
Sinai, 58, 63 structural changes, 106
siRNA, 84 structural protein, 140
SKD, 69 structure, ix, x, 7, 18, 27, 33, 56, 57, 61, 62, 66, 70,
skin, 155, 158, 190 76, 78, 79, 104, 107, 118, 146, 151, 152, 153,
smoking, 8, 16, 83 154, 162, 165, 179, 186, 187, 195, 199, 202, 206,
smooth muscle, 99 218
SNP, 12, 15, 17, 18, 110, 111, 217, 220, 225 subgroups, 26, 114
social behaviour, 196 substitution(s), 102, 103, 107, 111, 114, 160, 162,
software, xi, 44, 63, 64, 141, 160, 196, 201, 207, 163, 195, 196, 197, 199, 202, 205
208, 209, 210, 211, 214, 222 substrate(s), 90, 98
soil seed bank, 66 succession, 61
soil type, 39 Sun, 19, 29, 31, 35, 130, 133, 147, 223
solid tumors, 103, 119, 132 suppression, 25, 26, 31, 98, 109, 117, 126
solution, 149 survival, 8, 15, 31, 64, 92, 99, 113, 114, 119, 124,
somatic mutations, 114, 120 127, 130, 131, 132
South America, vii, 1, 9, 10, 12, 13, 14, 15, 17, 18, survival rate, 124
152, 165, 170, 189, 190, 192, 194, 195, 197, 198 susceptibility, 7, 14, 118, 129
Spain, 1, 66 Switzerland, 63
speciation, 162, 194 symptoms, 137, 138, 139, 143, 150
species, viii, x, 37, 38, 39, 40, 41, 42, 43, 45, 49, 55, syndrome, 22, 23, 73, 86, 87, 97, 98, 102, 116, 119,
56, 57, 58, 59, 61, 63, 64, 65, 66, 68, 73, 75, 76, 126
92, 98, 99, 127, 136, 137, 138, 139, 140, 142, synovial membrane, 127
149, 151, 152, 153, 154, 160, 161, 162, 163, 168, synthesis, 27, 80, 90, 97, 141
179, 184, 186, 187, 188, 189, 190, 191, 195, 198, Syria, 40
199, 208, 209, 211, 212, 213 systemic mastocytosis, 102
speech, vii, 1
sperm, viii, 3, 12, 15, 21, 22

transfer RNA, 206, 222

T transformation, 104, 108, 109, 110, 113, 114, 117,
119, 120, 147, 148, 153
T cell(s), 8, 224
translation, 3, 4, 7, 28, 29, 31, 32, 142, 211, 221
T lymphocytes, 8
translocation, 26, 75, 88, 106, 107
Taiwan, 137, 139, 145, 147
transmission, 138, 139, 141, 142, 144, 147
tandem repeats, 3, 12, 211, 223
transpiration, 40
target, 28, 29, 30, 31, 102, 106, 113, 124, 126, 127,
transplant, 8
140, 141, 206, 207
transplantation, 8, 15, 19, 115
taxa, 27, 28, 29, 76, 78, 184, 192
transportation, 69
taxonomy, 76, 146
transposases, 210
TCC, 158
treatment, 22, 83, 90, 102, 103, 105, 125, 130, 206
teams, 102
trial, 109
techniques, 204, 205, 211, 212, 214, 216
triggers, 2
technology(s), vii, ix, 12, 13, 64, 123, 141, 148, 149,
TTTY gene, vii, viii, 21, 24, 25
202, 203, 205, 214, 215, 216, 217, 222, 225
TTTY1, viii, 21, 25, 32
telangiectasia, 97
TTTY18, viii, 21, 24, 25
telomere, 83, 85, 86
TTTY19, viii, 21, 25
temperature, 22, 41, 58, 189
TTTY2, viii, 21, 24, 25, 26, 27, 31, 32, 35
tempo, 196
territory, 39
testicular cancer, 14
TTTY2L12A, viii, 21, 24, 25, 26, 31
testing, 16, 108, 110, 139, 197, 198
TTTY2L2A, viii, 21, 24, 25, 26, 31
testis, viii, 3, 4, 6, 7, 8, 17, 21, 25, 26, 27, 31
TTTY3, viii, 21, 24, 25
TGF, 127
TTTY4, viii, 21, 24, 25
therapeutic interventions, 126, 128
TTTY6, viii, 21, 25
therapeutic targets, 125, 131
TTTY7, viii, 21, 24, 25
therapeutics, 125, 126, 219
TTTY8, viii, 21, 25
therapy, 130
tuberculosis, 211, 224
thinning, 62, 138
tumor(s), 6, 7, 8, 80, 81, 82, 83, 88, 90, 92, 97, 98,
threats, 66, 190
99, 109, 124, 125, 126, 130, 131
threonine, 5
tumor cells, 8, 80, 83, 90
thrombocytopenia, 121
tumor development, 131
thrombocytosis, 103
tumor growth, 126
thrombopoietin, 102
tumor invasion, 99, 125
thrombosis, 104, 109
tumor resistance, 90
thymine, 97, 112, 202
tumorigenesis, 92, 97
tissue, 3, 28, 29, 55, 60, 69, 80, 96, 97, 126, 130, 206
tumour suppressor genes, 124
tobacco, 16
Turkey, 39, 40, 58, 60, 66
tourism, viii, 65
Turks, 61
TP53, 114
turnover, 33
TPI, 70
tyrosine, ix, 94, 95, 100, 101, 102, 105, 106, 107,
trafficking, 29, 98
115, 116, 117
traits, 3, 61, 75
tyrosine phosphatase, 94, 95, 100
transactions, 90
transcription, 3, 4, 7, 18, 24, 25, 26, 28, 29, 31, 80,
86, 90, 106, 140, 141, 143, 147, 148, 150, 210, U
218, 222
transcription factors, 143, 150, 218 ubiquitin, 6
transcriptomics, xi, 201, 202, 206, 215, 216, 224 unification, 224
transcripts, viii, 21, 25, 26, 27, 28, 30, 31, 32, 34, uniform, 33
113, 129, 202, 205, 206, 211, 214, 216 unique features, vii, 1
transducer, 106 United Nations, 197
transfection, 27

United States (USA), 34, 97, 98, 99, 119, 120, 139,
193, 195, 196, 198, 199, 217, 219, 222
W
urban, viii, 65, 190
Washington, 20, 60, 194, 195, 198
urban areas, 190
water, 39, 40, 43, 62, 63, 137, 138, 139, 154
urbanization, 66
web, 210, 222, 223
urticaria, 115
web service, 210
Uruguay, 152
wetlands, 154
USA, 34, 97, 99, 139, 193, 195, 196, 198, 199, 217,
wild type, 119
219, 222
Wisconsin, 61, 196
USDA, 61
withdrawal, 107
UV irradiation, 86
Wnt signaling, 126
UV light, 86, 88
wood, 19, 58, 67
UV radiation, 92
woodland, 62
workers, 35
V World Health Organization, 104, 117
World Health Organization (WHO), 104, 105, 117
Valencia, 155, 156 worldwide, viii, x, 12, 13, 21, 125, 135, 136
validation, 19, 147, 211, 216
valine, 102, 107
variables, 41
X
variations, 7, 40, 57, 58, 77, 83, 140, 197, 214, 215
X chromosome, 2, 9, 23, 31
varieties, 39, 42, 59
xenografts, 129, 131
vascular system, 126
X-inactivation, 112
vector, 138, 142, 223
XYY syndrome, 15
vegetation, 59, 66, 188, 189
VEGF, 128
Venezuela, 145, 152, 153, 190, 197 Y
vertebrates, 105
vesicle, 98 Y chromosome, vii, viii, 1, 2, 3, 5, 7, 8, 9, 10, 11, 12,
viral diseases, vii, x, 136, 142 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25,
viral infection, 27, 80, 112, 124, 137, 138, 142, 143 26, 27, 33, 34, 35
virology, 80, 148 yeast, 83, 98
virus infection, 142, 148, 150 yield, 139, 213, 215
viruses, x, 28, 33, 80, 83, 96, 136, 139, 140, 142, Y-linked gene family, viii, 21
143, 147, 223 Yugoslavia, 78
virus-host, 145
visualization, 221, 224
vulnerability, 142
