Abstract
Gene duplication is widely regarded as a major mechanism modeling genome evolution and function. However, the
mechanisms that drive the evolution of the two, initially redundant, gene copies are still ill defined. Many gene duplicates
experience evolutionary rate acceleration, but the relative contribution of positive selection and random drift to the
retention and subsequent evolution of gene duplicates, and for how long the molecular clock may be distorted by these
gene functions (Ohno 1970; Zhang et al. 2003). For instance, in the human genome, between 15% and 38% of genes may have arisen through gene duplication (Lynch and Conery 2000; Zhang et al. 2003). Unequal crossing-over during meiosis is the main contributor for tandem or closely located gene duplicates, giving rise to a situation where the structure of the progenitor (parent) and novel duplicated (daughter) gene is expected to be initially equal, unless the gene was disrupted during the process of duplication itself. On the contrary, genes formed by retrotransposition (retrogenes) will have a different structure from their parent copy, because retrotransposition involves the random insertion of a retrotranscribed cDNA. The structure of the daughter retrogene is characterized by a single exon, the presence of a polyA tail, and a lack of regulatory regions (Hahn 2009). Segmental duplications and whole-genome duplications (WGDs) are other sources of gene duplicates, in which multiple genes and their flanking regulatory regions are duplicated simultaneously (Dehal and Boore 2005; Sharp et al. 2005).
tage), the likelihood of the duplicate becoming fixed is low and can be estimated as 1/2N in diploid organisms, where N is the effective population size. Two genes with identical functions are unlikely to be maintained in the genome unless the generation of extra gene product is beneficial (Zhang et al. 2003). Thus, most duplicates are expected to be lost to pseudogenization rather than becoming fixed (Lynch and Conery 2000). Retrogenes are strong candidates to be pseudogenized, because they are expected to be distantly located from the promoter region of the parent copy. However, it has been observed that duplicates formed through this mechanism can be expressed (Marques et al. 2005; Zemojtel et al. 2010).
Several mechanisms have been proposed to explain the maintenance of gene duplicates in the genome (reviewed in Hahn [2009] and Innan and Kondrashov [2010]), which can be broadly summarized into three main models: neofunctionalization, subfunctionalization, and increased gene-dosage advantage. In the neofunctionalization model, one copy is maintained by purifying selection and retains the original
© The Author 2013. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com
Mol. Biol. Evol. 30(8):1830–1842 doi:10.1093/molbev/mst083 Advance Access publication April 26, 2013
Neofunctionalization of Gene Duplicates . doi:10.1093/molbev/mst083 MBE
function, whereas the redundant copy is free to evolve and potentially acquire new gene functions (Ohno 1970; Walsh 1995; Force et al. 1999). Following gene duplication, the evolutionary rate is expected to be accelerated in the copy freed from purifying selection, resulting in asymmetry between the two copies. If mutations lead to adaptive functional changes then the new copy is expected to evolve by positive selection. In the subfunctionalization model (complementation degeneration model), both duplicates accumulate mutations through genetic drift, and consequently, their evolutionary rates increase symmetrically relative to the original copy, and the ancestral function(s) may be split between duplicates (Force et al. 1999; Lynch and Force 2000). Subsequently, purifying selection is expected to maintain the two functionally distinct copies. A related model is that of escape from adaptive conflict, in which a protein with multiple functions can optimize each function independently after gene dupli-
technology appears to be more appropriate for studying expression in duplicates because it negates the cross-hybridization problem. However, there is currently a paucity of data for most species, and where data are available, it is only for a limited number of tissues and time points, which may lead to underdetection of differences in gene expression between duplicates.
The aim of this study is to investigate the evolutionary pace of recently duplicated genes, tracking their nonsynonymous to synonymous substitution rate ratio (dN/dS) during different time intervals, and to evaluate functional consequences through the analysis of patterns of positive selection and gene expression. Estimation of evolutionary rate in young duplicates can be a difficult task, due to the short period of time available for changes to accumulate. In addition, when few changes have accumulated, it may not be possible to detect significant differences in the speed of evolution be-
Pegueroles et al. . doi:10.1093/molbev/mst083 MBE
sequences. Post-SD families had a single duplication event (fig. 1B and C) except for a few cases in which the gene had independently duplicated in mouse and rat (fig. 1D).
We next investigated whether the set of duplicated genes showed any functional biases with respect to the rest of the genes in the genome. Gene Ontology (GO) term enrichment analysis of the 152 families from the Post-SD and Pre-SD sets, using the human ortholog to identify function, revealed an over-representation of olfactory receptor and mitochondrial genes. For the molecular function ontology, we found an enrichment in olfactory receptor activity (GO:0004984) in 10.53% of the gene list (P < 0.05 using a two-tailed Fisher's exact test and adjusting for multiple comparisons). Fifty percent of these genes were also defined as melanocortin receptor activity (GO:0004977). For the biological process ontology, we found an enrichment in several terms related to sensory perception of smell (GO:0007600, GO:0007606, and GO:0007608), in 11.18% of the gene list (P < 0.05 using a two-tailed Fisher's exact test and adjusting for multiple comparisons). The cellular component ontology was enriched in mitochondrial genes (GO:0044429), involving 8.55% of the gene list (P < 0.05 using a two-tailed Fisher's exact test and adjusting for multiple comparisons), 38.5% of which were specifically involved in the respiratory chain (GO:0070469).
Comparison of the distribution of nonsynonymous to synonymous substitution rates (dN/dS) in the branches before and after the duplication event (PreDupl vs. PostDupl, fig. 1) showed a significant tendency toward an increase in evolutionary rate following duplication (P < 0.001 and P < 10^-8, for Pre-SD and Post-SD, respectively, Wilcoxon's test; fig. 2A and B, respectively). This acceleration diminished over time, as we detected a significant decrease in dN/dS in the Pre-SD set by the time the mouse and rat lineages have segregated (P < 0.01, PostDupl vs. PostSp, Wilcoxon's test, fig. 2A).
Asymmetric Evolution of Gene Duplicates
To investigate whether acceleration in evolutionary rate affects each of the duplicates differently, we estimated the percentage of asymmetry in evolutionary rate between the two copies. As our aim was to compare different branches from the same family, we only used families in which all branches had passed the quality filters for dN and dS estimation (see Materials and Methods). We found that 65.38% and 20.41% of the pairs were evolving at a significantly different
rate after the duplication (i.e., asymmetrically, P < 0.05 using a Fisher's exact test and adjusting for multiple comparisons) in the Pre-SD and Post-SD sets, respectively. It should be noted that the low number of absolute changes in individual genes could lead to an underdetection of significant differences within families, especially in the Post-SD set, which is composed of younger duplicates. In fact, when classifying the two rodent duplicates from the Post-SD set into slower and faster evolving, we observed that even in those cases classified as symmetrical, the dN/dS of the faster evolving copy was on average 3-fold higher than that of the slower evolving copy, and the difference in the distribution of dN/dS values between the faster and slower copies pooled together was significant (P < 10^-6, Wilcoxon's test). A similar trend was observed in the Pre-SD set where, even when the pair had been classified as symmetrical, the faster copy was evolving on average 2.5 times more rapidly, though in this case, differences were not significant because the sample size was reduced to just six families.
The aforementioned observations prompted us to consider the faster and slower evolving branches independently. This revealed that dN/dS for the slower evolving branch (PostDuplS) was not significantly different from the dN/dS of the preduplication branch (PreDupl), in both the Pre-SD and Post-SD sets (fig. 3A and B, respectively). In contrast, the faster evolving branch (PostDuplF) was on average 4.7 and 2.5 times accelerated with respect to the preduplication branch, for the Post-SD and Pre-SD sets, respectively (PostDuplF against PostDuplS in figure 3A and B; P < 10^-12, Wilcoxon's test). In the Pre-SD set, the speed of evolution of the faster evolving copy decelerated after speciation, as shown by a significant decrease in dN/dS values in this more recent period when compared with the period immediately after the duplication (fig. 3A, PostDuplF against PostSpF; P < 0.002, Wilcoxon's test).
We investigated whether the duplicate gene acceleration observed in our sets was due primarily to the effect of retrogenes. Analysis of the genomic location of duplicated genes showed that retrogenes tended to be located on a different chromosome from the parent gene, contrary to what is observed for nonretrogene copies. Considering the Pre-SD and Post-SD sets together, 69% of duplicates located on different chromosomes were retrogenes, compared with just 11% of duplicates located on the same chromosome. Most of the duplicates located on the same chromosome were found within 300 kb of each other (83%), in agreement with the predominance of DNA-based tandem duplication. Analysis of dN/dS values for the two types of duplicated gene indicated that acceleration is not exclusive to retrogenes. In the Post-SD set, asymmetry between fast and slow evolving copies remained significant when excluding retrogenes from the comparison (P < 10^-6, Wilcoxon's test) and when considering only duplicates located on the same chromosome (P < 10^-5, Wilcoxon's test). Furthermore, fast evolving copies of the retrogene containing families were not significantly accelerated when compared with the nonretrogene families (P = 0.308, Wilcoxon's test) nor was there a difference between fast copies for families located on the same chromosome versus those found on different chromosomes (P = 0.770, Wilcoxon's test).
Deceleration of Evolutionary Rates after the Initial Increase
In an attempt to better define the observed pattern of evolutionary rate deceleration, duplicates of the Pre-SD and Post-SD sets were classified into younger and older by subtracting the value of dS for the preduplication branch from the dS of the postduplication branch (or the mean of both postduplication branches when available). For this analysis,
we considered all families having dS > 0.01 in both the preduplication branch and at least one of the two postduplication branches, comprising 33 families for the Pre-SD set and 74 families for the Post-SD set. Positive values, indicating that the postduplication branch is longer than the preduplication branch (dS PostDupl > dS PreDupl, see branches in fig. 1), were classified as older duplicates and negative values as younger duplicates (table 1). Assuming that the primate and rodent lineages split approximately 70 Ma and the mouse and rat lineages approximately 16 Ma (Douzery et al. 2003), the midpoint of the rodent ancestral branch is 43 Ma. From this, we can infer that the mean age of the older Pre-SD duplicates is approximately 56.5 Ma and of the younger Pre-SD duplicates approximately 29.5 Ma. Doing similar calculations for the Post-SD duplicated gene set, the mean age of the older Post-SD duplicates is approximately 12 Ma and of the younger Post-SD duplicates approximately 4 Ma (table 1).
When considering the dN/dS values (table 1), we observe that the postduplication branches are evolving significantly faster than the preduplication branches in both older and younger Post-SD duplicates (P = 0.001 and P < 10^-6, respectively, Wilcoxon's test). For the younger Pre-SD duplicates, the rate initially increases by about 2- to 3-fold (P < 10^-3, PreDupl vs. PostDupl) and subsequently experiences a similar decrease (P = 0.006, PostDupl vs. PostSp). In the older Pre-SD duplicates, there are no significant differences between the PreDupl, PostDupl, and PostSp branches. The same result is obtained when we consider only the faster evolving copy of each pair (supplementary data, Supplementary Material online). Thus, older duplicates in the Pre-SD set (~56.5 My mean age) appear to have already returned to their initial preduplication dN/dS values before speciation (~16 Ma), which means that in rodents 40.5 My may be sufficient for purifying selection to erase signs of duplication-driven acceleration. Using the same rationale, because we detect significant differences between the pre- and postduplication branches of the Post-SD set, we may conclude that 12 My of rodent evolution is not sufficient for dN/dS to return to values similar to those before the duplication event.
The increase in purifying selection after the initial acceleration period was confirmed by comparisons at the intrafamily level. We calculated normalized dS and dN/dS for older and younger Pre-SD and Post-SD duplicates as the average postduplication value divided by the sum of the average postduplication and preduplication values. Using this formula, values have a range from 0 to 1; dN/dS values close to 0.5 indicate that post- and preduplication branches have
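The duplicate ages quoted in this section appear to follow from repeated midpoint arithmetic on the two split times. The sketch below reproduces the figures under that reading; the function and variable names are illustrative and do not come from the paper.

```python
# Split times in Ma, as quoted in the text from Douzery et al. (2003).
PRIMATE_RODENT_SPLIT_MA = 70.0
MOUSE_RAT_SPLIT_MA = 16.0


def midpoint(older_ma, younger_ma):
    """Midpoint of a branch bounded by two dates (both in Ma)."""
    return (older_ma + younger_ma) / 2.0


# Pre-SD duplicates arose on the rodent ancestral branch (70 to 16 Ma).
rodent_branch_mid = midpoint(PRIMATE_RODENT_SPLIT_MA, MOUSE_RAT_SPLIT_MA)
pre_sd_older = midpoint(PRIMATE_RODENT_SPLIT_MA, rodent_branch_mid)
pre_sd_younger = midpoint(rodent_branch_mid, MOUSE_RAT_SPLIT_MA)

# Post-SD duplicates arose after the mouse-rat split (16 Ma to present).
# Assumption: the "similar calculations" in the text use the same midpoint rule.
post_sd_mid = midpoint(MOUSE_RAT_SPLIT_MA, 0.0)
post_sd_older = midpoint(MOUSE_RAT_SPLIT_MA, post_sd_mid)
post_sd_younger = midpoint(post_sd_mid, 0.0)

# Time between the mean age of older Pre-SD duplicates and the speciation.
recovery_time_my = pre_sd_older - MOUSE_RAT_SPLIT_MA

print(rodent_branch_mid, pre_sd_older, pre_sd_younger)  # 43.0 56.5 29.5
print(post_sd_older, post_sd_younger)                   # 12.0 4.0
print(recovery_time_my)                                 # 40.5
```

These are point estimates that place each class at the centre of its interval; they carry no information about the spread of ages within a class.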
FIG. 4. Normalized dS and dN/dS values for older and younger Pre-SD and Post-SD duplicates. Normalized values were calculated as
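The normalization described in the text (average postduplication value divided by the sum of the average postduplication and preduplication values) can be sketched as follows. The function name and the example numbers are hypothetical and are not taken from the paper's data.

```python
def normalized_value(post_dupl_mean, pre_dupl_mean):
    """Return post / (post + pre), which lies between 0 and 1.

    A result of 0.5 means the pre- and postduplication branches have the
    same value; results above 0.5 mean the postduplication value is larger.
    """
    if post_dupl_mean < 0 or pre_dupl_mean < 0:
        raise ValueError("dS and dN/dS values cannot be negative")
    total = post_dupl_mean + pre_dupl_mean
    if total == 0:
        raise ValueError("both values are zero; the ratio is undefined")
    return post_dupl_mean / total


# Hypothetical dN/dS averages, for illustration only.
print(normalized_value(0.30, 0.10))  # 0.75: postduplication rate is higher
print(normalized_value(0.20, 0.20))  # 0.5: no change after duplication
```

Bounding the statistic between 0 and 1 keeps families with very small preduplication values from dominating, which a plain post/pre ratio would allow.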
Table 3. Selected Gene Duplicates Undergoing Positive Selection after Duplication Event.
Set       Gene ID              Description                                  P*
Pre-SD    ENSMUSP00000023655   Myxovirus (influenza virus) resistance       2.26 × 10^-9
Pre-SD    ENSMUSP00000009058   ATP-binding cassette, subfamily B            3.64 × 10^-6
Pre-SD    ENSMUSP00000028986   BPI fold containing family A                 6.88 × 10^-4
Pre-SD    ENSMUSP00000130225   Pyroglutamylated RFamide peptide receptor    2.06 × 10^-4
Pre-SD    ENSMUSP00000045229   Fucosyltransferase 2                         9.11 × 10^-3
Pre-SD    ENSMUSP00000031693   Sperm adhesion molecule 1                    4.12 × 10^-3
Pre-SD    ENSMUSP00000082947   Zinc finger protein                          1.86 × 10^-3
Pre-SD    ENSMUSP00000050330   GTPase, IMAP family                          1.67 × 10^-3
Post-SD   ENSMUSP00000038931   Glutathione S-transferase, pi 2              8.20 × 10^-8
Post-SD   ENSMUSP00000077984   Kinesin family member C5B                    4.86 × 10^-7
Post-SD   ENSMUSP00000040485   Kininogen 1                                  2.20 × 10^-7
Post-SD   ENSRNOP00000062817   Type I keratin KA11                          3.28 × 10^-6
Post-SD   ENSRNOP00000020952   Olfactory receptor 68                        1.46 × 10^-5
Post-SD   ENSMUSP00000058779   Olfactory receptor 1033                      1.00 × 10^-5
Post-SD   ENSMUSP00000075814   Solute carrier family 22, member 21          1.22 × 10^-4
Post-SD   ENSMUSP00000036682   WD repeat domain 20b                         5.26 × 10^-3
NOTE. Selection in the Pre-SD occurred in the ancestral rodent gene of the given mouse ID, whereas selection in the Post-SD set occurred in
was the daughter. For the Pre-SD set, nine families (35%) could be similarly classified, and again, in all but one case, the postduplication branch corresponding to the daughter was significantly faster evolving. Of these, three of the families had a postduplication branch that tested significant for positive selection, which was the daughter branch in two cases. The presence of positively selected sites in a protein suggests that it has acquired a novel or modified function. Thus, the data support an important role for neofunctionalization in the evolution of recent rodent gene duplicates.
Table 4. Tissue-Expression Divergence of Duplicated Genes.
                     dT
                     Pre-SD                 Post-SD
                     Hs-F   Hs-S   F-S      Hs-F   Hs-S   F-S
Mean                 0.79   0.36   0.85     0.60   0.24   0.61
Median               0.82   0.23   0.92     0.80   0.00   0.83
Standard deviation   0.20   0.31   0.23     0.39   0.32   0.41
NOTE. dT, tissue-expression divergence; Hs, Homo sapiens; F, fast mouse duplicate; S, slow mouse duplicate.
Tissue-Expression Divergence Patterns
We investigated tissue-expression divergence (dT, see Materials and Methods) of fast and slow evolving mouse duplicates in comparison with the nonduplicated orthologous human gene. We employed 23 families from the Post-SD set and 12 families from the Pre-SD set for which RNAseq-based expression data were available from a recently published mouse and human tissue panel (Brawand et al. 2011). We found that tissue-expression divergence (dT, see Materials and Methods) between the faster evolving copy and the human ortholog was significantly higher than tissue-expression divergence between the slower evolving copy and the human ortholog (P = 0.002 and P = 0.003 for the Pre-SD and Post-SD sets, respectively, Wilcoxon's test, table 4).
Tissue divergence remained significant when considering just the 14 families from the Post-SD set for which we had both expression data and classification into parent and daughter, with dT between Homo sapiens (Hs) and the parent copy equal to 0.138, and dT between Hs and the daughter copy equal to 0.610 (P = 0.003, Wilcoxon's test). In general, daughter copies were observed to be expressed in fewer tissues (2.2 on average) than parent copies (4 on average). Tissue-divergence values between human and mouse were not significantly different when comparing retrogene containing families to nonretrogene containing families nor for families located on the same or different chromosomes (P = 0.349 and 0.658, respectively, Wilcoxon's test).
In the five families for which we had expression data and parent–daughter categorization in the Pre-SD set, we observed that the daughter paralog always showed a restricted expression profile, that was a subset of that of the parent paralog, and typically was only expressed in one of the six tissues examined. Five of 23 genes in the Post-SD set had a clear signal of neofunctionalization in the expression data, that is, they were expressed in at least one tissue in which neither their duplicate copy nor the human ortholog was expressed.
Indel Analysis
We investigated the distribution of coding sequence indels in postduplication branches using a previously described protocol for identifying indels from alignments (Laurie et al. 2012). We observed a total of 73 bona fide indels affecting 42 (27%) postduplication branches in the Pre-SD set and 33 indels affecting 22 (22%) postduplication branches in the Post-SD set. Overall, we observed an excess of deletions relative to insertions at a ratio of approximately 1.5:1, and although approximately half the occurrences of each were of one amino acid in length, there was a slight tendency toward insertions being longer than deletions (see supplementary
data, Supplementary Material online). We found a strong tendency in both data sets for indels to be found on the faster evolving branch following duplication; 35 of 47 cases in the Pre-SD set (P < 0.01, binomial test), and 30 of 33 cases in the Post-SD set (P < 10^-5).
We examined whether there was any association between signs of positive selection and the presence of indels following duplication. Of the 20 branches in the Pre-SD set that tested positive in the branch-site test, 15 had indels, whereas of the remaining 136 branches, only 27 had indels, indicating a significant relationship between positive selection and indels in this data set (P < 10^-5, Fisher's exact test). A similar result was found in the Post-SD set where 10 of the 15 branches that had evidence of selection contained an indel event, compared with 8 of the remaining 83 branches (P < 10^-5, Fisher's exact test).
Discussion
been underestimated in this study because they discarded those triplets in which the outgroup was closer to a certain paralog than the other paralog.
In this study, we have observed that acceleration of evolutionary rate is only apparent in one of the two duplicates. Here, we have estimated the evolutionary rate of the preduplication branch by using two outgroup species instead of one, in contrast to most previous studies. Analysis of the genomic location of duplicated copies shows that the parent copy evolves in a highly constrained manner, most likely preserving the ancestral function, whereas the daughter copy diverges away. This implies that the two copies are not equal at birth, not only for copies relocated to a different chromosome or retrogenes (Cusack and Wolfe 2007) but also for copies located proximally to each other. These observations strongly support the neofunctionalization model in this data set. To our knowledge, acceleration being limited to
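The exact tests applied to the indel counts in the Indel Analysis results can be re-derived from the reported numbers alone. The following is a minimal standard-library sketch, assuming two-sided tests and, for the binomial test, a null probability of 0.5 that an indel falls on the faster branch; neither assumption is stated in the text, and the function names are illustrative.

```python
from math import comb


def binomial_two_sided(successes, n):
    """Two-sided exact binomial test against p = 0.5.

    The null distribution is symmetric, so the upper tail is doubled.
    """
    k = max(successes, n - successes)
    upper_tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * upper_tail)


def fisher_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    total = sum(prob(x) for x in range(lo, hi + 1)
                if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)


# Indels on the faster evolving branch (counts from the text).
p_binom_pre = binomial_two_sided(35, 47)
p_binom_post = binomial_two_sided(30, 33)

# Rows: branches with / without evidence of positive selection.
# Columns: branches with / without indels.
p_fisher_pre = fisher_two_sided(15, 5, 27, 136 - 27)
p_fisher_post = fisher_two_sided(10, 5, 8, 83 - 8)

for label, p in [("binomial Pre-SD", p_binom_pre),
                 ("binomial Post-SD", p_binom_post),
                 ("Fisher Pre-SD", p_fisher_pre),
                 ("Fisher Post-SD", p_fisher_post)]:
    print(f"{label}: P = {p:.2e}")
```

Under these assumptions all four P values fall below the thresholds reported in the text.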
Previous studies focusing on C. elegans and yeast (Davis and Petrov 2004; Scannell and Wolfe 2008) found that evolutionarily constrained genes tend to duplicate more frequently than less-constrained genes. However, our set of 152 recently duplicated gene families was composed of significantly accelerated genes when comparing dN/dS of the nonduplicated rodent to an exhaustive set of 1:1 ortholog genes from Toll-Riera et al. (2011) (P = 0.004, Wilcoxon's test after removing families with dS > 2 and < 0.01). Subsequent examination of gene function showed that the increased rates were mainly due to 25 families defined as receptors, as there were no significant differences when we excluded them from the comparison (P = 0.071, Wilcoxon's test).
In this data set, we found that a substantial fraction of duplicates exhibited signals of positive selection, suggesting that some of the changes contributing to asymmetry are
which states that function is more conserved between orthologs than paralogs but that it can change rapidly after gene duplication, as recently validated using experimental data (Altenhoff et al. 2012). Tissue-divergence patterns seem to be related to the genomic location of duplicates, because patterns when comparing fast- and slow-evolving copies were distinct for duplicates located on different chromosomes but not when considering only retrogenes. It is worth noting that significant differences in tissue-divergence patterns were detected even though it is likely that the true values are being underestimated, firstly due to the limited number of tissues for which data was available, and secondly because we only tested tissue-related differences.
Previous studies have failed to find a correlation between tissue-divergence expression and the evolutionary rate of coding sequence (Wagner 2000; Conant and Wagner 2003) or found only a weak correlation, for example, only significant
duplication event following divergence from the primate lineage, together with their corresponding human ortholog, using Biomart at Ensembl (r65) (Flicek et al. 2012). To ensure a high-quality data set, we applied the following initial filtering criteria to all pairs of duplicates: We discarded pairs where the percentage of identity between copies was lower than 50%, pairs with overlapping location (which are likely to represent distinct splice forms rather than duplicated genes), and genes lacking experimental evidence of expression. We also removed one family containing selenocysteines as these are interpreted as stop codons by CodeML. Genomic location and the percentage of identity between copies were obtained from Biomart, and experimental evidence of expression from the University of California, Santa Cruz (UCSC) Table Browser (considering both mRNA and EST tracks), using the mm9 and rn4 databases (Karolchik et al. 2004).
Estimation of Evolutionary Rates and Positive Selection Tests
The number of nonsynonymous substitutions per nonsynonymous site (dN), synonymous substitutions per synonymous site (dS), and the corresponding dN/dS ratio were estimated using the ML method implemented in the CodeML program of PAML version 4.4 (Yang 2007). We used the free-ratio branch model, which allows the calculation of a distinct ratio for each branch, and equilibrium codon frequencies as free parameters (CodonFreq = 3). To determine the degree of acceleration and the percentage of asymmetry between duplicates, we compared the evolutionary rates between the preduplication and postduplication branches, and between the two paralogs using Fisher's exact test.
We tested for positive selection with the branch-site test (Test 2 implemented in CodeML) (Zhang et al. 2005). This
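As an illustration of the free-ratio setup described under Estimation of Evolutionary Rates, the sketch below writes a minimal CodeML control file. Only `model = 1` (free-ratio branch model) and `CodonFreq = 3` are implied by the text; the remaining options and all file names are assumptions chosen for the example, not the authors' actual configuration.

```python
def build_codeml_ctl(seqfile, treefile, outfile):
    """Return the text of a minimal CodeML control file for a free-ratio run."""
    options = [
        ("seqfile", seqfile),    # codon alignment (hypothetical file name)
        ("treefile", treefile),  # gene family tree (hypothetical file name)
        ("outfile", outfile),    # main results file (hypothetical file name)
        ("runmode", 0),          # assumption: evaluate the user-supplied tree
        ("seqtype", 1),          # assumption: input is codon sequences
        ("CodonFreq", 3),        # from the text: free codon frequencies
        ("model", 1),            # from the text: one dN/dS ratio per branch
        ("NSsites", 0),          # assumption: no variation in dN/dS among sites
    ]
    return "".join(f"{name} = {value}\n" for name, value in options)


ctl_text = build_codeml_ctl("family.phy", "family.nwk", "family_free_ratio.out")
print(ctl_text)

# To run, save the text as codeml.ctl and invoke the codeml executable
# in the same directory.
```

The free-ratio model estimates a separate dN/dS for every branch, so it is parameter rich and gives noisy estimates on short branches; this is consistent with the quality filters on dN and dS that the text mentions.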
Duplicate Classification
The genomic location of duplicated genes and number of exons in each were obtained from Biomart (Haider et al. 2009). We classified as possible retrogenes those genes having a single exon when the corresponding paralog and ortholog copies are composed of multiple exons (Farré and Albà 2010). Where possible, genes were classified into parent and daughter copies by looking for conserved synteny using the Galaxy server to access the UCSC Genome Browser (Goecks et al. 2010) under the assumption that the parent gene will have a longer extent of conserved synteny than the daughter copy. The Extract MAF blocks tool was used to obtain multiple alignments for a given set of genomic intervals, and the resulting predicted parent–daughter relationships were manually compared with Ensembl r65 to check for retention of gene order. Duplicates closely located on the same chromosome
one paralog is expressed in a specific tissue, in which neither the second paralog nor the single-copy ortholog are expressed.
Indel Analysis
The position and size of all indel events within the coding sequence of the proteins analyzed, as determined from the multiple alignments, were identified using an in-house Perl script as described elsewhere (Laurie et al. 2012). Indels were discarded if they were found to be proximal to an Ensembl-defined exon boundary, as such cases are more likely to represent errors in exon definition within the gene-building algorithm rather than true indel events.
GO Analysis
A GO (Ashburner et al. 2000) enrichment analysis was performed for all 152 gene families using the FatiGO program of
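The retrogene criterion given under Duplicate Classification (a single-exon gene whose paralog and ortholog both have multiple exons) reduces to a simple rule on exon counts. The sketch below is one possible encoding; the function name and the example counts are hypothetical.

```python
def is_possible_retrogene(gene_exons, paralog_exons, ortholog_exons):
    """Apply the exon-count rule: single-exon gene, multi-exon paralog and ortholog."""
    for count in (gene_exons, paralog_exons, ortholog_exons):
        if count < 1:
            raise ValueError("exon counts must be at least 1")
    return gene_exons == 1 and paralog_exons > 1 and ortholog_exons > 1


# Hypothetical exon counts, for illustration only.
print(is_possible_retrogene(1, 7, 7))  # True: intronless copy of a multi-exon gene
print(is_possible_retrogene(1, 1, 1))  # False: the whole family is single-exon
print(is_possible_retrogene(6, 7, 7))  # False: the copy retains introns
```

Requiring multiple exons in both the paralog and the ortholog avoids flagging families that are intronless throughout, which this rule alone cannot distinguish from retrotransposed copies.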
Benjamini Y, Hochberg Y. 1995. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc Ser B. 57:289–300.
Bergthorsson U, Andersson DI, Roth JR. 2007. Ohno's dilemma: evolution of new genes under continuous selection. Proc Natl Acad Sci U S A. 104:17004–17009.
Blanc G, Wolfe KH. 2004. Functional divergence of duplicated genes formed by polyploidy during Arabidopsis evolution. Plant Cell 16:1679–1691.
Brawand D, Soumillon M, Necsulea A, et al. (18 co-authors). 2011. The evolution of gene expression levels in mammalian organs. Nature 478:343–348.
Casola C, Conant GC, Hahn MW. 2012. Very low rate of gene conversion in the yeast genome. Mol Biol Evol. 29:3817–3826.
Codoñer FM, Alfonso-Loeches S, Fares MA. 2010. Mutational dynamics of murine angiogenin duplicates. BMC Evol Biol. 10:310.
Conant GC, Wagner A. 2003. Asymmetric sequence divergence of duplicate genes. Genome Res. 13:2052–2058.
Cusack BP, Wolfe KH. 2007. Not born equal: increased rate asymmetry in relocated and retrotransposed rodent gene duplicates. Mol Biol
Innan H, Kondrashov F. 2010. The evolution of gene duplications: classifying and distinguishing between models. Nat Rev Genet. 11:97–108.
Jordan IK, Wolf YI, Koonin EV. 2004. Duplicated genes evolve slower than singletons despite the initial rate increase. BMC Evol Biol. 4:22.
Jun J, Ryvkin P, Hemphill E, Nelson C. 2009. Duplication mechanism and disruptions in flanking regions determine the fate of Mammalian gene duplicates. J Comput Biol. 16:1253–1266.
Karolchik D, Hinrichs AS, Furey TS, Roskin KM, Sugnet CW, Haussler D, Kent WJ. 2004. The UCSC Table Browser data retrieval tool. Nucleic Acids Res. 32:D493–D496.
Kasprzyk A, Keefe D, Smedley D, London D, Spooner W, Melsopp C, Hammond M, Rocca-Serra P, Cox T, Birney E. 2004. EnsMart: a generic system for fast and flexible access to biological data. Genome Res. 14:160–169.
Kellis M, Birren BW, Lander ES. 2004. Proof and evolutionary analysis of ancient genome duplication in the yeast Saccharomyces cerevisiae. Nature 428:617–624.
Kimura M. 1983. The neutral theory of molecular evolution. Cambridge (United Kingdom): Cambridge University Press.
Shiu S-H, Byrnes JK, Pan R, Zhang P, Li W-H. 2006. Role of positive selection in the retention of duplicate genes in mammalian genomes. Proc Natl Acad Sci U S A. 103:2232–2236.
Suyama M, Torrents D, Bork P. 2006. PAL2NAL: robust conversion of protein sequence alignments into the corresponding codon alignments. Nucleic Acids Res. 34:W609–W612.
Teshima KM, Innan H. 2004. The effect of gene conversion on the divergence between duplicated genes. Genetics 166:1553–1560.
Tian D, Wang Q, Zhang P, Araki H, Yang S, Kreitman M, Nagylaki T, Hudson R, Bergelson J, Chen JQ. 2008. Single-nucleotide mutation rate increases close to insertions/deletions in eukaryotes. Nature 455:105–108.
Toll-Riera M, Laurie S, Albà MM. 2011. Lineage-specific variation in intensity of natural selection in mammals. Mol Biol Evol. 28:383–398.
Van de Peer Y, Taylor J, Braasch I, Meyer A. 2001. The ghost of selection past: rates of evolution and functional divergence of anciently duplicated genes. J Mol Evol. 53:436–446.
Vilella AJ, Severin J, Ureta-Vidal A, Heng L, Durbin R, Birney E. 2009. EnsemblCompara GeneTrees: complete, duplication-aware phylogenetic trees in vertebrates. Genome Res. 19:327–335.
neutralist-selectionist debate. Proc Natl Acad Sci U S A. 97:6579–6584.
Walsh JB. 1995. How often do duplicated genes evolve new functions? Genetics 139:421–428.
Waterston RH, Lindblad-Toh K, Birney E, et al. (223 co-authors). 2002. Initial sequencing and comparative analysis of the mouse genome. Nature 420:520–562.
Wickham H. 2009. ggplot2: elegant graphics for data analysis. New York: Springer.
Yang Z. 2007. PAML 4: phylogenetic analysis by maximum likelihood. Mol Biol Evol. 24:1586–1591.
Yang Z, dos Reis M. 2011. Statistical properties of the branch-site test of positive selection. Mol Biol Evol. 28:1217–1228.
Zemojtel T, Duchniewicz M, Zhang Z, Paluch T, Luz H, Penzkofer T, Scheele JS, Zwartkruis FJT. 2010. Retrotransposition and mutation events yield Rap1 GTPases with differential signalling capacity. BMC Evol Biol. 10:55.
Zhang J, Nielsen R, Yang Z. 2005. Evaluation of an improved branch-site likelihood method for detecting positive selection at the molecular level. Mol Biol Evol. 22:2472–2479.
Zhang P, Gu Z, Li W-H. 2003. Different evolutionary patterns