Languages
School of Biological Sciences, University of Reading, Reading, Berkshire RG6 6AH, UK; and Santa Fe Institute, 1399 Hyde Park Road, Santa Fe, New Mexico, USA.
e-mail: m.pagel@reading.ac.uk
doi:10.1038/nrg2560
Published online 7 May 2009

Linguists identify two languages as distinct when, according to various criteria, they become mutually unintelligible.

Here is a remarkable fact about humans: we speak approximately 7,000 mutually unintelligible languages around the world1. This means that a person plucked from one corner of the Earth is not able to communicate with another human in a different corner of the Earth, or often from next door. Apart from a few songbird species, and possibly some whales that learn their songs locally and show dialectal differences, this is unique among animals. For example, a chimpanzee or an elephant removed from its range and placed among any other chimpanzees or elephants will know what to do and how to communicate.

Large as the number of extant human languages is, it has probably reduced from a maximum of perhaps 12,000 to 20,000 different languages before the spread of agriculture2, and it pales in comparison with the possibly hundreds of thousands of different languages humans have ever spoken2. Elsewhere I have pondered the question of why humans would evolve a system of communication that prevents them from communicating with other members of their species, and have suggested that human societies come to behave in ways that are not so different from those of biological species3. Whether or not that explanation is correct, the human tendency to separate into distinct societies has given human language a geographical mosaic on which to play out its evolution. My interest here is to use the phenomenon of language diversity to understand the evolution of what turns out to be a remarkable culturally transmitted replicator, one with many of the properties we have come to expect of genes, but also with many of its own.

In this Review I shall first describe how a new and expanding field of phylogenetic and comparative studies of language evolution has made use of concepts, data and statistical modelling approaches that draw inspiration from genetics to exploit the genetic-like properties of language. I shall then move on to describe recent work in four areas of language evolution in which statistical modelling approaches have begun to return results. These include the reconstruction of language phylogenies and their relationship to genetic trees; investigations of the rate, tempo and time-depth of language evolution; social influences on language; and studies of the structure of language.

My coverage of these topics will be selective, but is designed to give a flavour of what language evolution is like and of what is possible. I will not discuss the tricky and very large literatures on language origins or how we acquire it, whether our language skills are innate, or possible genetic influences on language abilities. Instead, I will treat language as evolving against what I will regard for the sake of discussion as a more or less homogeneous genetic background in its human hosts.

Descent with modification
One of the best-known theories for the diversity of human languages is a creation myth. According to the Bible story of the Tower of Babel, humans developed the conceit that they could construct a tower that would take them all the way to heaven. Angered at the attempt to usurp his control, God destroyed the tower. To ensure
that it could not be rebuilt, God confused the workers by giving them different languages, leading to the irony that language exists to stop us from communicating.

Delightful as the Babel story is, ideas taken from the theory of evolution give us the conceit that we can improve on it. Darwin4 asserted that languages, like biological species, evolve by a process of descent with modification. If correct, we can expect human languages to form into family trees, known as phylogenies, which chart the history of their evolution in a manner analogous to that for biological species. It also means that the diversity of extant languages reflects the actions of various shared historical evolutionary processes, including features of the rate and tempo of linguistic evolution, timings and correlations, as well as the starting points or ancestral languages. This raises the possibility that, far from settling for each language being a distinct object of creation, we can use the combination of phylogenetic trees of language along with statistical models of how languages evolve to detect and characterize the signature of these historical processes. In effect, we wish to discover what the past must have been like and how it evolved given what we now see.

TABLE 1 records analogies between the ways that genes and languages evolve, giving hope that the use of phylogenetic methods will succeed. Key among these analogous features is that both systems of replicators are digital, comprising discrete heritable units: the four nucleotides in the case of genes, and words in the case of language. Without this property neither system would retain fidelity through repeated bouts of transmission from parents to offspring (genes) or from teachers to learners (language), and historical signals would quickly be lost. Other features of language evolution that might be thought to vitiate its historical signature have analogies in genetic systems. For example, languages can acquire new unrelated words by borrowing, and genes can arrive from bouts of lateral transfer. These influences often occur at lower rates in genetic systems, but do not represent a qualitative difference between genes and language.

Phylogeny
A branching diagram describing the set of ancestral–descendant relationships among a group of species or languages.

Borrowing
The acquisition of a new non-cognate word from another language.

Phoneme
Characteristically thought of as the smallest units of speech-sounds that are distinguished by the speakers of a particular language. Phonemes are not universal, but act as the fundamental building blocks to produce all of the words of a given language.

Data and statistical modelling
The starting point for most comparative statistical investigations of language evolution is a set of discrete characters that can be scored in each of the languages. These might include features of the syntax or structure of a language, other aspects of grammar, phonemes and, most obviously, lexical items or words. I shall confine my remarks here to the lexicon, although most of what I have to say applies to these other classes of discrete traits. Owing to pioneering work by Morris Swadesh5 in the 1950s, a common list of 200 words known as the fundamental vocabulary is available for a large number of the world’s languages (see the further information box for a link to an example of a Swadesh list). This type of list contains words for things that are expected to be found in all languages, such as names for body parts, pronouns, common verbs and numerals, but excludes technological words and words related to specific ecologies or habitats. It can be thought of as like a list of universal genes. Other lists are possible, but Swadesh’s has simply proven to be well chosen and widely available. His words tend to evolve slowly and are largely resistant to outside influences and borrowing6.

Given a list of m meanings (such as hand, tree, I, walk, run) in n languages, a data set (M) can be written as a matrix, analogous to an alignment of gene sequences (BOX 1), but recording sets of cognate words. Linguists, using careful rules of sound correspondences within language families, can assign words from different languages into classes denoting words that derive from a common ancestral word, analogous to identifying homologous genes. The word ‘two’ in English and dos in Spanish are cognate. The French fleur and Dutch bloem are not. Cognacy data, by recording evolved similarities and differences among languages, can be used to infer linguistic phylogenetic trees or to study features of lexical evolution itself7. Phylogenies are of interest in their own right as descriptions of historical relationships, and they form the backbone of comparative studies that seek to understand the evolution of linguistic traits, and how other cultural traits have evolved and co-evolved7–9. For these wider cross-cultural studies, M can be broadened to include the cultural data.

Box 1 | A linguistic alignment and a statistical model of evolution

Matrix of cognates
Whereas a gene sequence alignment identifies homologous sites in genes, a lexical ‘alignment’ identifies sets of cognate words, or words that descend from a common ancestral word. Let a matrix of these lexical alignments be denoted M (see also REFs 7,9,19) to signify that it is a matrix of meanings (for example, hand, who and ear), and write M as:

[Matrix M: the rows are language 1, language 2, …, language n and the columns are meanings 1 to m; each entry is a cognate-class number.]

The columns of numbers designate cognate classes, or words for a given meaning that have been identified as deriving from a common ancestral word (see text). The first column of M denotes a meaning for which four distinct cognate classes of words exist (0, 1, 2 and 3), the second column shows a meaning represented by two cognate classes, the third has k + 1 cognate classes, and the last column shows a meaning with a single cognate class (that is, all of the words for that particular meaning among the n languages derive from a common ancestral word). This matrix is the analogue to an aligned set of gene sequences, although all gene sequences have the same four states (twenty states if considering amino acids).

A statistical model
The data in M can be used to infer phylogenetic trees of languages, or perhaps to investigate some feature of lexical replacement or the change from one cognate class to another. A statistical model that is widely used in phylogenetic inference from gene sequences is written as:

$$
Q = \begin{pmatrix}
- & q_{01} & \cdots & q_{0k} \\
q_{10} & - & \cdots & q_{1k} \\
\vdots & \vdots & \ddots & \vdots \\
q_{k0} & q_{k1} & \cdots & -
\end{pmatrix}
$$

The matrix Q is the central element of the finite-state continuous-time Markov transition model (see text). Each qij term in this matrix describes the instantaneous rate of change from state i to state j over the short interval dt. In gene sequence data the states are the bases A, C, G or T, and Q is always a 4 × 4 matrix. If protein sequences are used, the states are amino acids and Q becomes a 20 × 20 matrix. Using lexical data, the states are the cognate classes, and a different Q needs to be estimated for meanings with different numbers of cognate classes. For phylogenetic inference with lexical data it is convenient to rewrite M such that all of the columns have the same number of cognate classes and thus a common Q can be estimated for the entire matrix, as is common for gene sequences. This is achieved by converting M to a binary form such that the k + 1 cognate classes for each meaning are written as k + 1 binary vectors, each one of which identifies a different cognate class as ‘1’ with the remainder designated ‘0’. Then, Q becomes a 2 × 2 matrix estimating a common rate at which new cognate classes occur.

The elements of the Q matrix are presumed to apply equally well to each site in a gene or protein sequence or, in the case of lexical data, to different words. To accommodate variation in the rates of evolution among sites or among words, the well-known gamma rate variability correction can be applied40. This correction amounts to multiplying each of the qij in Q by carefully chosen constants that are either less than or greater than one to achieve an overall slower or faster rate of evolution.

Outside the linguistic context M could contain any set of cross-cultural or other comparative data, and Q could then be used to estimate their evolutionary transitions (for example, REFs 8,9).

Statistical approaches apply models of evolution to characterize the probability of observing the data in M under various evolutionary scenarios10,11. A model that will be familiar to geneticists is the finite-state continuous-time Markov transition model (BOX 1). Often designated Q, it was introduced into studies of phylogenetic inference from gene sequences by Felsenstein12 along with the sum over histories logic. Applied to lexical data, this model estimates the instantaneous rates at which the words of one cognate class evolve into another unrelated set of words7. Other statistical approaches to analysing M include a ‘stochastic Dollo’ model13 that allows each new cognate class to arise only once on a tree of languages. The name is a conscious nod to the Belgian palaeontologist Louis Dollo (1857–1931), who suggested that identical complex forms do not arise more than once in nature. A different stochastic treatment of M (described in REF. 14) allows for borrowing while estimating the underlying tree.

Parsimony or distance-based methods can be used instead of statistical approaches. However, statistical approaches, unlike the other methods, allow one to estimate directly parameters of the models of evolution (such as the qij transition rates in Q, see BOX 1) and to test among different models for the same data15. Whether inferring trees or studying the evolution of traits on trees, the common currency for testing models is a quantity known as the likelihood or L, defined as an amount proportional to the probability of the data given the model16. It is conventionally written as L ∝ P(M | Q, T), where M and Q are as defined here, and T refers to the phylogenetic tree on which the data in M are presumed to have evolved. Likelihood methods regard the observed data as a fixed observation. This makes them particularly suited to historical inference problems such as those in linguistics, in which the observed data arise only once. Thus, the likelihood does not describe the probability that the events under study happened (they did) or that the model is true; it merely describes the ‘fit’ between the observed data and an inferred tree, or model (such as Q) of how a trait evolved on a tree.
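
The binary recoding of M described in BOX 1 can be illustrated with a short sketch. This is only an illustration under stated assumptions: the function names and the toy matrix are hypothetical and do not come from the studies cited, and the sketch shows only the recoding step, not the estimation of Q.

```python
from typing import List


def binarize_meaning(column: List[int]) -> List[List[int]]:
    """Recode one column of M (the cognate-class labels for one meaning)
    as one 0/1 vector per cognate class observed in that column."""
    classes = sorted(set(column))
    return [[1 if label == c else 0 for label in column] for c in classes]


def binarize_matrix(M: List[List[int]]) -> List[List[int]]:
    """M has one row per language and one column per meaning.
    Returns one row per language holding the concatenated binary characters."""
    n_languages = len(M)
    n_meanings = len(M[0])
    rows: List[List[int]] = [[] for _ in range(n_languages)]
    for j in range(n_meanings):
        column = [M[i][j] for i in range(n_languages)]
        for vector in binarize_meaning(column):
            for i in range(n_languages):
                rows[i].append(vector[i])
    return rows


# Hypothetical toy data: 4 languages, 3 meanings.
# Meaning 1 has four cognate classes, meaning 2 has two, meaning 3 has one.
toy_M = [
    [0, 0, 0],
    [1, 0, 0],
    [2, 1, 0],
    [3, 1, 0],
]
binary_M = binarize_matrix(toy_M)
for row in binary_M:
    print(row)
```

Each meaning with k + 1 cognate classes contributes k + 1 binary columns, so every column of the recoded matrix has the same two states and a single 2 × 2 Q can be applied throughout.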
The likelihood can be found by maximum likelihood methods: in this case many different trees or models are tried and the one that gives the largest value of L is preferred. Alternatively, Markov chain Monte Carlo (MCMC) methods17 are increasingly being employed to infer models and trees9,18. Rather than seeking a single ‘best’ solution, MCMC methods attempt to derive a distribution of outcomes consistent with the data, called the posterior distribution. Posterior distributions can be formed for trees, for their likelihoods and for the parameters of the model of evolution. Their attraction is in providing a measure of the uncertainty in the estimates of these various components. The posterior distribution of trees also provides the logical background against which to estimate models of evolution for other traits, this being a tidy way to account for the effects of uncertainty about the past.

Cognate
Two words are deemed cognate if they derive by a process of descent with modification from a common ancestral word.

Sum over histories
A mathematical technique that accounts for all possible ancestral states (that is, all possible histories) when finding the likelihood of observing the gene sequence or other data among extant species.

Parsimony
When applied to phylogenetic inference in a linguistic context, parsimony is a method that seeks the phylogenetic tree that implies the fewest changes among cognate classes.

Distance
As applied to phylogenetic inference in a linguistic context, distance is a set of methods that infer an underlying phylogenetic tree from a matrix of the pair-wise differences among all languages.

Likelihood
A statistical quantity defined as an amount that is proportional to the probability of observing some set of data given a particular model of how those data arose. In linguistic phylogenetic applications one finds the likelihood of the lexical data on the proposed tree given some model of how words evolve.

Maximum likelihood method
A statistical technique for finding the parameters of a model that make the observed data most likely or probable under that model.

Markov chain Monte Carlo
(MCMC). A statistical method for searching a complex high-dimensional space. As applied to phylogenetic inference in a linguistic context, MCMC methods return a sample of trees that are statistically representative of the trees that might arise from a given model of how words evolve.

Language trees and gene trees
Features of language trees. An early attempt to apply a likelihood sum over histories approach to languages made use of 7 Indo-European languages, 18 meanings and the finite-state Markov transition model Q described above19. The analysis yielded a phylogeny with the expected monophyletic groupings of Romance languages (Spanish, French and Romanian) and Germanic languages (German, Dutch and English), and Welsh as an out-group. In the same year a tree of 77 Austronesian languages appeared, which was derived from parsimony methods20. Holden21, also using parsimony and a 100-word Swadesh list, inferred a tree of 93 Bantu languages.

Later analyses of the Bantu data with likelihood models returned more or less the same tree22. Gray and Atkinson23 applied the Markov transition model in an MCMC context to analyse 87 Indo-European languages using the entire Swadesh 200-word list, estimating an ancestral age for Indo-European languages of between 7,800 and 9,800 years. Trees of Papuan languages have been inferred from both typological and lexical features of language24. Recently, Gray and colleagues have expanded their Austronesian sample to include over 400 languages, inferring the tree using MCMC approaches and the Markov transition model25. Their tree supports a scenario for the origin of this group in Formosa, beginning approximately 6,000 years ago.

My interest here is less in the trees per se than in their characteristics. FIGURE 1 shows a consensus tree derived from the Bayesian posterior distribution of trees for 87 Indo-European languages23,26. The tree recovers the expected clades of Romance, Germanic, Slavic, Indo-Iranian and Celtic languages, and suggests their deeper relationships. But what is remarkable about this tree is how tree-like it is, given all of the ways that a linguistic signal can be corrupted, most obviously by borrowing. The numbers near to the nodes of this tree record the posterior support for that node, defined as the proportion of trees in the posterior distribution in which that node was found. These posterior support values rival those found for many gene trees of a similar size27. Comparable degrees of posterior support are reported for the Bantu and Austronesian trees20,22, if not for the Papuan languages24. Techniques designed to reveal conflicting phylogenetic signals (for example, SplitsTree and neighbour-net analyses)28, such as would arise from borrowing, typically reveal a healthy pattern of tree-like data29.

The Indo-European and Bantu trees reflect population expansions or radiations into new areas, riding on the back of agriculture21,23,30, whereas the Austronesian tree records an expansion that may have been propelled in fits and starts linked to developments in sea-going boat technologies20,25. These population processes might contribute to the elegance of the three phylogenies by reducing the opportunities for borrowing of lexical items among the speakers of differing languages. Trees must always be carefully checked for borrowing, but unless it regularly occurs among distantly related languages, the broad structure of language phylogenies should be relatively unaffected31. Owing to a battle lost at Hastings, England, in 1066, English was bombarded with words of Romance origin and now approximately 50% of its vocabulary derives from such stock. Still, despite its history, English correctly appears among the Germanic languages in the Indo-European tree (FIG. 1), although linguists often place it closer to Frisian than the basal position it occupies in its portion of the Germanic clade.

Comparison of gene trees and language trees. If languages are not the ‘closed shop’ to outside influences that we have come to expect of eukaryotic organisms with sequestered germ lines, the strength of descent with modification in language trees shows that the cultural processes of language teaching and learning that transmit language from one generation to the next can have a surprisingly high fidelity and can show resistance to outside effects. Although genes may only be replicated once or a few times between generations, vocabulary items are replicated by producing a sound that is copied by a listener and then produced anew in a cyclical process that may occur many tens of thousands of times (or more) per word per speaker. The opportunities for mutation and corruption of this signal, not to mention for innovation and borrowing, are great and yet these simple lists of words can reconstruct the cultural history of groups of speakers spanning thousands of years.

Trees derived from language bear a range of relationships to gene trees for the same population, and this is as we should expect. Cavalli-Sforza32 demonstrated in the late 1980s that the major genetic groupings of people around the world conform, with few exceptions, to their language groupings. This reveals, unsurprisingly, that people divided by large geographical distances drift apart genetically and linguistically. More fine-grained analyses reveal a different picture. Sometimes language groups conform closely to genetic groups even on a small geographical scale33 and other times they do not34. This does not invalidate one kind of tree or elevate another. It tells us that some trees are good for tracking the movements of genes, and others for tracking the movement of cultures.
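
Posterior support for a node, defined in this section as the proportion of trees in the posterior distribution in which that node is found, is simple to compute once each sampled tree is reduced to its set of clades. The sketch below is a minimal illustration only: the representation of a tree as a set of frozensets, the function name and the four-tree sample are hypothetical, and the language names are borrowed from the seven-language example purely for readability.

```python
from typing import FrozenSet, List, Set

Clade = FrozenSet[str]


def posterior_support(trees: List[Set[Clade]], clade: Clade) -> float:
    """Proportion of sampled trees that contain the given clade."""
    if not trees:
        raise ValueError("the posterior sample is empty")
    return sum(1 for tree in trees if clade in tree) / len(trees)


# Hypothetical clades, each written as the set of languages it contains.
germanic = frozenset({"English", "German", "Dutch"})
romance = frozenset({"Spanish", "French", "Romanian"})
german_dutch = frozenset({"German", "Dutch"})

# Hypothetical posterior sample of four trees, each reduced to its clades.
sample = [
    {germanic, romance, german_dutch},
    {germanic, romance},
    {germanic, romance, german_dutch},
    {germanic, german_dutch, frozenset({"Spanish", "French"})},
]

print(posterior_support(sample, germanic))      # found in all four trees
print(posterior_support(sample, romance))       # found in three of four
print(posterior_support(sample, german_dutch))  # found in three of four
```

A real analysis would draw many thousands of trees from an MCMC run; the calculation per node is the same.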
[Figure 1: circular consensus tree of 87 Indo-European languages, with posterior support values at selected nodes and a scale bar of 0.01.]

Figure 1 | Tree of Indo-European languages. Consensus tree of 87 Indo-European languages derived from the Swadesh list of 200 words5,23,26. Inner coloured ring identifies major clades as shown in the legend. The tree is rooted using ancient Hittite and Tocharian languages23,26. Branch lengths measure the expected number of lexical replacements (word changes) between two points on the tree. Numbers along branches are the Bayesian posterior probabilities of selected deep nodes of the tree, showing that words can resolve old relationships. Many of those nodes not labelled have high posterior support, although some are low and suffer from conflicting signals26. The outer colours identify a language’s sentence word order in terms of subject (S), verb (V) and object (O) (red bars), and whether it employs pre- or postpositional modification of sentence objects (blue bars) (see text, data from REF. 69). Red, SVO or VSO; light red, SOV; blue, prepositional; light blue, postpositional. Celtic languages are VSO; Greek, German, Dutch, Byelorussian and others are sometimes classified as no dominant word order (NDO) (here coded red). Blue–red pairs and light blue–light red pairs conform to Greenberg’s58 prediction (see text and FIG. 5).
To understand why this is true, consider a thought experiment in which human genes flow among populations or even around the world largely invisible to the human phenotypes they inhabit (although there are some hints of genes and languages co-evolving35). Culture can, in principle, rest easily above this flow, as migrants adopt the local traditions, such that cultural variants and changes are independent of genetic changes. A situation similar to this has recently been reported for some Melanesian islanders34. Accordingly, phylogenetic trees derived from language may be preferable to gene trees in cross-cultural studies whenever the variables of interest are culturally transmitted8. These studies must separate the influence of common ancestry on a trait’s representation among cultures from independent instances of the acquisition or evolution of that trait8. For cultural data, such as bride wealth and dowry, matriliny, patriliny, modes of subsistence and even sex ratio36–38, the cultural phylogenetic tree provides the description of common ancestry that makes this separation possible. Linguists and anthropologists need not suffer from gene envy when it comes to building and using phylogenies.

In the next three sections I move away from inferring and interpreting language trees to discuss examples of how they have been used as the backbones of investigations into how features of language evolve, including rates of word evolution and the structure of languages, and to investigate social influences on the rates of lexical evolution.

Indo-European languages
A family of related languages that derive from a common ancestral language that probably arose in Anatolia around 8,000 years ago and then spread throughout Europe, India, and what is now Afghanistan, Pakistan and Iran.

Monophyletic
In a phylogenetic context, a group of species (or languages) is monophyletic if they derive from a common ancestor not shared with any other species (or languages). The Germanic languages are monophyletic and are distinct from the monophyletic group of Romance languages. Monophyly implies that the group has just one origin.

Bantu languages
A group of approximately 500 languages that is part of the larger Niger-Congo language family. Bantu languages probably arose 3,000 years ago in West Africa, possibly close to present-day Cameroon, and then spread east and then south, eventually reaching present-day South Africa.

Clade
In the context of languages, a clade is a group of related languages.

Lexical replacement
The rate of lexical replacement is the rate at which a word is replaced by a new non-cognate word.

Rates of evolution and time depth
Differing rates of word evolution. What English speakers call a bird, the Italians call uccello, the French oiseau, the Spanish pájaro, the Germans Vogel, the Greeks pouli, and Caesar would have said avis. There are approximately 15 different cognate classes for ‘bird’ among the 90 or so Indo-European languages. By comparison, all Indo-European language speakers use a related form of the word ‘two’ (dos, deux, due, zwei; the Latin is duo) to describe two objects. Just as some sites in a gene sequence alignment evolve slowly and others rapidly, words in the Indo-European languages show ~100-fold variation in their rates of lexical replacement or in the acquisition of a new non-cognate form7,26 (FIG. 2a). These rates were found from estimating the qij in Q separately for each word in the Swadesh list, integrating over a Bayesian posterior sample of Indo-European trees. Slowly evolving words include ‘two’, ‘three’, ‘I’, ‘five’ and ‘who’, each of which has just a single cognate class among the Indo-European languages. By comparison, words such as ‘bird’, ‘tail’, ‘sand’, and ‘belly’ evolve more rapidly, with the word ‘dirty’ having, at 46, the largest number of cognate classes.

[Figure 2: two histograms. Panel a: count (0 to 20) against mean rate of lexical replacement (0 to 1.0). Panel b: count (0 to 140) against word half-life estimate (0 to 80,000).]

Figure 2 | Rates of lexical replacement. a | Histogram of mean rates of lexical replacement in Indo-European languages for the 200 words in the Swadesh list, measured in units of numbers of new cognates per 1,000 years of evolution. Fastest to slowest rate represents over a 100-fold difference. Values were found as the mean of the posterior distribution of the elements of Q matrices (see text) integrated over a Bayesian posterior distribution of trees26. Mean = 0.3 ± 0.18 new cognates per 1,000 years, median = 0.27, range = 0.009 to 0.93. b | Histogram of the word half-life estimates as derived from the rates of lexical replacement, measuring the expected amount of time before a word has a 50% chance of being replaced by a new non-cognate word. A half-life of >70,000 years is indicative of a transition rate that is compatible with observing a single cognate class (that is, no changes) over the entire ~130,000 ‘language years’ of the Indo-European tree26 (calculated as the total obtained by adding the number of years of evolution represented by the sum of the branches of the tree in FIG. 1). Existence of at least five such classes in the Indo-European lexicon lends support to this estimate. Mean = 5,300 years, median = 2,500, range = 750 to 76,000.

The rates can be expressed as word half-lives7,19,26 (FIG. 2b) corresponding to the expected amount of time before a word has a 50% chance of being replaced by a new non-cognate word. The median half-life is 2,000–2,500 years. This may seem fast, especially when compared to genes, but it is slower than the average rate at which new languages appear (which is approximately every 500–1,000 years for Indo-European languages).
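
The link between a replacement rate and a word half-life can be sketched numerically. The sketch assumes, as an interpretation rather than a formula stated in the text, that a word replaced at a constant rate r survives unreplaced for time t with probability exp(−rt), so that the half-life is ln 2 / r. The function names are hypothetical. Under this assumption the extreme rates in the Figure 2 caption (0.93 and 0.009 new cognates per 1,000 years) give half-lives of about 745 and 77,000 years, close to the reported range of 750 to 76,000.

```python
import math


def half_life_years(rate_per_1000_years: float) -> float:
    """Time in years until a word has a 50% chance of having been replaced,
    assuming a constant replacement rate (assumed exponential waiting time)."""
    if rate_per_1000_years <= 0:
        raise ValueError("rate must be positive")
    return 1000.0 * math.log(2) / rate_per_1000_years


def prob_replaced(rate_per_1000_years: float, years: float) -> float:
    """Probability that at least one replacement has occurred after `years`."""
    return 1.0 - math.exp(-rate_per_1000_years * years / 1000.0)


# Rates taken from the Figure 2 caption: fastest, median and slowest words.
for rate in (0.93, 0.27, 0.009):
    t_half = half_life_years(rate)
    print(f"rate {rate:>5} per 1,000 years -> half-life of about {t_half:,.0f} years")
```

The median rate of 0.27 gives a half-life of roughly 2,570 years, in line with the reported median of about 2,500.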
Social effects and bursts of linguistic change
Are there external forces that affect linguistic change independently of the ways we use language in everyday speech? Here I briefly discuss one way in which languages, by acting as markers of social identity, may influence the rate of linguistic evolution.

Languages are not evenly distributed geographically. Cultural groups are more densely packed in coastal than inland regions49,50. Similarly, the density or number of different indigenous languages spoken in a given area of North America before European contact sharply increases in the more southerly regions of that continent, and is startling in its similarity to a plot of the density of different biological species from the same area3,51 (FIG. 4). Human cultural–linguistic groups seem to partition the landscape in a manner similar to species, and perhaps for similar reasons: where in the southerly regions the landscape is richer and more ecologically diverse, a greater variety of species seems able to coexist. The puzzle is that humans are all the same species, and so their higher densities in tropical regions may suggest a tendency for cultural groups to fission whenever the environment will support it3,51.

Gene flow is often reduced across linguistic boundaries52, and anthropologists speculate that language may be used to advertise affinity to particular social groups53,54. The eighteenth-century American educator Noah Webster put this view trenchantly at the time of American independence from Britain, saying that “as an independent nation, our honor [sic] requires us to have a system of our own, in language as well as government”55,56. Phylogenetic trees of languages for Austronesian, Bantu and Indo-European languages all suggest that Webster was stating a general phenomenon56. Extant languages with a rich history of language splitting events, such as that between the speakers of American and British English, have diverged more from their ancestral languages than extant languages with fewer splitting events in their pasts. A similar pattern is observed for genetic evolution among biological species57. Humans seem to adjust languages at crucial times of cultural evolution, such as during the emergence of new and rival groups. Maybe there is some truth to the Babel myth after all.

Linguistic universals
A set of features of language and relationships among those features that the great comparative linguist Joseph Greenberg proposed would be found in all or nearly all languages, or which would at least show statistical evidence for being linked.

Word order
The typical order of subjects, verbs and objects in a sentence.

[Figure 4: two panels plotted against latitude (°N, 0 to 90). Panel a: number of languages (0 to 35) and number of species (0 to 200).]

or nearly all known languages58,59. Languages can be classified according to structural properties of syntax, grammar and other features. Associations among these features reveal the internal structure of language and what combinations are possible. I give only the briefest treatment of this very large area, which is ripe for quantitative approaches60,61, confining my remarks to showing how language phylogenies are fundamental to understanding how language structures evolve.

One structural feature of language is the word order
Figure 4 | Relationships between language
Nature Reviewsand
| Genetics
species distribution. North American human of its sentences. Of the six possible orderings of sub-
language–cultural groups (before European contact) jects (s), verbs (v) and objects (O) in a sentence, two
and mammal species are distributed similarly across — svO and sOv — dominate the world’s languages,
degrees of latitude. a | Numbers of languages and two others — vsO and vOs — account for ~10% of
numbers of mammal species at each degree of north languages, and the remaining two — Osv and Ovs
latitude in North America. The trends reflect the shape — are rare58,62. From analyses of the relative frequen-
of the continent, being narrow in the south regions cies of these differing orders among the world’s major
and growing wider at higher latitudes. Both trends language families, it has been suggested that the ances-
peak at approximately 40°N, where North America is tral human language was sOv62 (M. Gell-Mann and
~3,000 miles wide. b | Densities of languages and
M. Ruhlen, personal communication). One of
mammal species, calculated as the number of each
found at the specific latitude divided by the area of the Greenberg’s best-known universals was his pro-
continent for a 1° latitudinal slice at each latitude. posal that vsO and svO languages use prepositional
Figure and data is reproduced, with permission, from phrases to modify sentence objects, whereas sOv lan-
Nature REF. 3 (2004) Macmillan Publishers Ltd. All guages tend to use postpositional phrasing. Counts of
rights reserved. languages support Greenberg’s proposal63.
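A count-based check of Greenberg's word-order proposal can be sketched as a 2×2 cross-tabulation of languages by word order (VSO or SVO versus SOV) and by adposition type (prepositions versus postpositions). The Python below is only an illustration under stated assumptions: the counts are invented and are not taken from ref. 63, the function names are hypothetical, and the test shown (a one-sided Fisher exact test) is a generic choice rather than the method used in the cited work. It also treats languages as independent observations, which related languages are not; this is one reason the text stresses language phylogenies.

```python
from math import comb


def odds_ratio(a, b, c, d):
    """Odds ratio for the 2x2 table [[a, b], [c, d]]."""
    if b * c == 0:
        return float("inf")
    return (a * d) / (b * c)


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Returns the probability, with row and column totals held fixed,
    of a top-left cell at least as large as the observed value a.
    """
    row1 = a + b          # total for the first word-order class
    col1 = a + c          # total for the first adposition class
    n = a + b + c + d     # total number of languages
    total = comb(n, col1)
    tail = 0
    for k in range(a, min(row1, col1) + 1):
        tail += comb(row1, k) * comb(n - row1, col1 - k)
    return tail / total


# HYPOTHETICAL counts, invented for illustration only.
# Rows: VSO/SVO languages, SOV languages.
# Columns: prepositional, postpositional.
vo_prep, vo_post = 40, 5
sov_prep, sov_post = 4, 45

print("odds ratio:", odds_ratio(vo_prep, vo_post, sov_prep, sov_post))
print("one-sided p:", fisher_exact_greater(vo_prep, vo_post, sov_prep, sov_post))
```

An odds ratio well above 1 with a small p-value would be consistent with the proposed association. A phylogenetically informed analysis would instead count independent evolutionary changes along a language tree, because languages that inherited both features from a common ancestor do not provide separate evidence.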
manner analogous to automated gene sequence alignment. Greenberg’s58 proposals for linguistic universals describe dozens of associations among pairs of structural features, and these are suitable candidates for phylogenetic testing. At the level of lexicon, rather than structure, little is known about whether some words tend to change together, and whether these potential co-evolutionary linkages affect how these words evolve.
Swadesh’s original vocabulary list comprises words that are used at a higher than average frequency, and so could be expanded to include less frequently used and consequently more rapidly evolving words. In a similar vein, although historically much emphasis has been placed on how words come to be replaced by new non-cognate words, there is room to study how words come to acquire new meanings, or how words gradually change their sounds while retaining their meanings and while remaining cognate. An important aspect of this process, in turn, relates the ways that languages are learned and transmitted within communities to the rates at which existing words (or other features of language) change or new words emerge and replace old ones69. This is analogous to attempts in evolutionary biology to describe how within-population processes give rise to differences between populations or species70.
Language trees provide the logical backbone on which to test these and many other anthropological questions. There is no doing comparative linguistics or comparative anthropology without them, and new linguistic or anthropological research programmes should routinely make their construction a priority. Already projects such as the World Atlas of Linguistic Structures71 or the Austronesian Basic Vocabulary Database72 document hundreds of thousands of observations on language and these databases need to be developed in a similar way to GenBank and other genetic databases.
For geneticists, or for anyone interested in molecular evolution, the parallels between linguistic and genetic evolution should be striking, and all the more so because language is a cultural rather than a physical replicator, without built-in error correction mechanisms and potentially subject to far greater effects of borrowing and other influences that could corrupt its signal. Like genomes, the languages we observe today are the survivors of a long process of being tried out and tested by their speakers. Like genomes, we can speculate that we have retained those languages that adapted best to our minds73, and this may be the most obvious reason why we find them easy to learn and use.
For a language system to survive it must adapt as a coherent whole, and this governs the likely combinations of language elements, be they words, grammar, syntax or morphology. These functional restraints on languages coupled with the observation of high fidelity in the transmission of linguistic elements mean that there are far fewer languages and less linguistic diversity than might otherwise be possible. Are some of these languages somehow better than others or somehow better suited to their own speakers, or do existing languages represent alternative and equally functional outcomes of the linguistic evolutionary process? It is the many differences between what we see and what is possible that reveal the ways that languages adapt. How they do it and why is an area that holds great promise for furthering our understanding of this uniquely human trait as the complex and adaptively evolving system that it is74.
1. Gordon, R. G. Ethnologue: Languages of the World 15th edn (SIL International, Dallas, 2005).
2. Pagel, M. in The Evolutionary Emergence of Language (eds Knight, C., Studdert-Kennedy, M. & Hurford, J.) 391–416 (Cambridge Univ. Press, Cambridge, 2000).
An overview of linguistic diversity and how it can be studied phylogenetically and statistically.
3. Pagel, M. & Mace, R. The cultural wealth of nations. Nature 428, 275–278 (2004).
4. Darwin, C. The Descent of Man (Murray, London, 1871).
5. Swadesh, M. Lexico-statistic dating of prehistoric ethnic contacts. Proc. Am. Phil. Soc. 96, 453–463 (1952).
6. Embleton, S. M. Statistics in Historical Linguistics. Quantitative Linguistics Vol. 30 (Brockmeyer, Bochum, 1986).
7. Pagel, M. & Meade, A. in Phylogenetic Methods and the Prehistory of Languages (eds Forster, P. & Renfrew, C.) 173–182 (McDonald Institute for Archaeological Research, Cambridge, 2006).
8. Mace, R. & Pagel, M. The comparative method in anthropology. Curr. Anthropol. 35, 549–564 (1994).
This paper formally introduced use of phylogenetic trees into comparative anthropology.
9. Pagel, M. & Meade, A. in The Evolution of Cultural Diversity: a Phylogenetic Approach (eds Mace, R., Holden, C. J. & Shennan, S.) 235–256 (UCL Press, London, 2005).
10. Kruskal, J., Dyen, I. & Black, P. in Mathematics in the Archeological and Historical Sciences (eds Hodson, F. R., Kendall, D. G. & Tautu, P.) 361–380 (Edinburgh Univ. Press, Edinburgh, 1971).
11. Sankoff, D. in Current Trends in Linguistics 11: Diachronic, Areal and Typological Linguistics (ed. Sebeok, T. A.) 93–112 (Mouton, The Hague, 1973).
12. Felsenstein, J. Evolutionary trees from DNA sequences: a maximum likelihood approach. J. Mol. Evol. 17, 368–376 (1981).
13. Nicholls, G. K. & Gray, R. D. in Phylogenetic Methods and the Prehistory of Languages (eds Forster, P. & Renfrew, C.) 161–171 (McDonald Institute for Archaeological Research, Cambridge, 2006).
14. Warnow, T., Evans, S. N., Ringe, D. & Nakhleh, L. in Phylogenetic Methods and the Prehistory of Languages (eds Forster, P. & Renfrew, C.) 75–87 (McDonald Institute for Archaeological Research, Cambridge, 2006).
15. Pagel, M. Inferring the historical patterns of biological evolution. Nature 401, 877–884 (1999).
16. Edwards, A. W. E. Likelihood (Cambridge Univ. Press, Cambridge, 1972).
17. Gilks, W. R., Richardson, S. & Spiegelhalter, D. J. in Markov Chain Monte Carlo in Practice (eds Gilks, W. R., Richardson, S. & Spiegelhalter, D. J.) 1–19 (Chapman and Hall, 1996).
18. Huelsenbeck, J. P., Ronquist, F., Nielsen, R. & Bollback, J. P. Bayesian inference of phylogeny and its impact on evolutionary biology. Science 294, 2310–2314 (2001).
19. Pagel, M. in Time-Depth in Historical Linguistics (eds Renfrew, C., MacMahon, A. & Trask, L.) 189–207 (The McDonald Institute of Archaeology, Cambridge, 2000).
20. Gray, R. & Jordan, F. Language trees support the express-train sequence of Austronesian expansion. Nature 405, 1052–1055 (2000).
21. Holden, C. J. Bantu language trees reflect the spread of farming across Sub-Saharan Africa: a maximum-parsimony analysis. Proc. R. Soc. Lond., B 269, 793–799 (2002).
This paper describes an early application of phylogenetic methods in linguistics.
22. Holden, C. J., Meade, A. & Pagel, M. in The Evolution of Cultural Diversity: a Phylogenetic Approach (eds Mace, R., Holden, C. J. & Shennan, S.) 53–65 (UCL Press, London, 2005).
23. Gray, R. D. & Atkinson, Q. D. Language-tree divergence times support the Anatolian theory of Indo-European origin. Nature 426, 435–439 (2003).
This study used language phylogeny to test a historical hypothesis for the timing of the origin of Indo-European languages.
24. Dunn, M., Terrill, A., Reesink, G., Foley, R. A. & Levinson, S. C. Structural phylogenetics and the reconstruction of ancient language history. Science 309, 2072–2075 (2005).
25. Gray, R. D., Drummond, A. J. & Greenhill, S. J. Language phylogenies reveal expansion pulses and pauses in Pacific settlement. Science 323, 479–483 (2009).
This paper describes the use of a language phylogeny to test a historical hypothesis for the timing of the origin of Austronesian languages.
26. Pagel, M., Atkinson, Q. D. & Meade, A. Frequency of word use predicts rates of lexical evolution throughout Indo-European history. Nature 449, 717–719 (2007).
A statistical phylogenetic study that proposed a general explanation for variation in rates of lexical replacement.
27. Sanderson, M. J. & Donoghue, M. J. Patterns of variation in levels of homoplasy. Evolution 43, 1781–1795 (1989).
28. Huson, D. H. SplitsTree: analyzing and visualizing evolutionary data. Bioinformatics 14, 68–73 (1998).
29. Bryant, D., Filimon, F. & Gray, R. D. in The Evolution of Cultural Diversity: Phylogenetic Approaches (eds Mace, R., Holden, C. & Shennan, S.) 69–85 (UCL Press, London, 2005).
30. Renfrew, C. Archaeology and Language: the Puzzle of Indo-European Origins (Cape, London, 1987).
Classic text on the origin of the Indo-European language family.
31. Greenhill, S., Currie, T. & Gray, R. Does horizontal transmission invalidate cultural phylogenies? Proc. R. Soc. Lond., B 18 Mar 2009 (doi:rspb.2008.1944).
32. Cavalli-Sforza, L. L., Piazza, A., Menozzi, P. & Mountain, J. Reconstruction of human evolution: bringing together genetic, archaeological, and linguistic data. Proc. Natl Acad. Sci. USA 85, 6002–6006 (1988).
This paper is a widely cited early attempt to link genetic and linguistic diversity.
33. Lansing, J. S. et al. Coevolution of languages and genes on the island of Sumba, eastern Indonesia. Proc. Natl Acad. Sci. USA 104, 16022–16026 (2007).
34. Hunley, K. et al. Genetic and linguistic coevolution in Northern Island Melanesia. PLoS Genet. 4, 1–14 (2008).
35. Dediu, D. & Ladd, D. R. Linguistic tone is related to the population frequency of the adaptive haplogroups of two brain size genes, ASPM and microcephalin. Proc. Natl Acad. Sci. USA 104, 10944–10949 (2007).
36. Holden, C. J. & Mace, R. Spread of cattle led to the loss of matriliny in Africa: a co-evolutionary analysis. Proc. R. Soc. Lond., B 270, 2425–2433 (2003).
A good example of the use of language trees to study cultural evolution.
37. Fortunato, L., Holden, C. J. & Mace, R. From bridewealth to dowry? A Bayesian estimation of ancestral states of marriage transfers in Indo-European groups. Human Nature 17, 355–376 (2006).
38. Mace, R. & Jordan, F. in The Evolution of Cultural Diversity: a Phylogenetic Approach (eds Mace, R., Holden, C. & Shennan, S.) 207–216 (UCL Press, London, 2005).
39. Burger, J., Kirchner, M., Bramanti, B., Haak, W. & Thomas, M. G. Absence of the lactase-persistence-associated allele in early Neolithic Europeans. Proc. Natl Acad. Sci. USA 104, 3736–3741 (2007).
40. Yang, Z. Maximum likelihood phylogenetic estimation from DNA sequences with variable rates over sites: approximate methods. J. Mol. Evol. 39, 306–314 (1994).
41. Zipf, G. K. Human Behaviour and the Principle of Least Effort (Addison-Wesley, Reading, Massachusetts, 1949).
42. Leech, G., Rayson, P. & Wilson, A. Word Frequencies in Written and Spoken English: Based on the British National Corpus (Longman, London, 2001).
43. Zipf, G. K. Prehistoric ‘cultural strata’ in the evolution of Germanic: the case of Gothic. Mod. Lang. Notes 62, 522–530 (1947).
44. Francis, W. N., Kuçera, H. & Mackie, A. W. Frequency Analysis of English Usage: Lexicon and Grammar (Houghton Mifflin, Boston, 1982).
45. Lieberman, E., Michel, J.-B., Jackson, J., Tang, T. & Nowak, M. A. Quantifying the evolutionary dynamics of language. Nature 449, 713–716 (2007).
46. Starostin, S. A. in Works on Linguistics (ed. Starostin, S. A.) 827–839 (Languages of the Slavic Culture, Moscow, 2007).
47. Ellis, N. C. Frequency effects in language processing: a review with implications for theories of implicit and explicit language acquisition. Stud. Second Lang. Acquisit. 24, 143–188 (2002).
48. Huettig, F., Quinlan, P. T., McDonald, S. A. & Altmann, G. T. M. Models of high-dimensional semantic space predict language-mediated eye movements in the visual world. Acta Psychol. 121, 65–80 (2006).
49. Birdsell, J. B. Some environmental and cultural factors influencing the structuring of Australian Aboriginal populations. Am. Nat. 87, 171–207 (1953).
50. Nichols, J. Linguistic Diversity in Space and Time (Univ. of Chicago Press, Chicago, 1992).
51. Mace, R. & Pagel, M. A latitudinal gradient in the density of human languages in North America. Proc. R. Soc. Lond., B 261, 117–121 (1995).
52. Barbujani, G. & Sokal, R. R. Zones of sharp genetic change in Europe are also linguistic boundaries. Proc. Natl Acad. Sci. USA 87, 1816–1819 (1990).
53. Labov, W. Principles of Linguistic Change: Social Factors (Blackwell, Oxford, 2001).
54. Milroy, J. & Milroy, L. Linguistic change, social network and speaker innovation. J. Linguist. 21, 229–284 (1985).
55. Webster, N. Dissertations on the English Language (Isaiah Thomas, Boston, 1789).
56. Atkinson, Q., Meade, A., Venditti, C., Greenhill, S. & Pagel, M. Languages evolve in punctuational bursts. Science 319, 588 (2008).
57. Pagel, M., Venditti, C. & Meade, A. Large punctuational contribution of speciation to evolutionary divergence at the molecular level. Science 314, 119–121 (2006).
58. Greenberg, J. H. (ed.) Universals of Languages (MIT Press, Cambridge, Massachusetts, 1963).
59. Kirby, S. Function, Selection, and Innateness: the Emergence of Language Universals (Oxford Univ. Press, Oxford, 1999).
60. Cysouw, M. in Quantitative Linguistics: An International Handbook (eds Altmann, G., Köhler, R. & Piotrowski, R.) 554–578 (Mouton de Gruyter, Berlin, 2005).
61. Croft, W. Explaining Language Change: an Evolutionary Approach (Longman, Harlow, 2000).
This text provides a good overview of evolutionary thinking about language.
62. Newmeyer, F. J. in The Evolutionary Emergence of Language (eds Knight, C., Studdert-Kennedy, M. & Hurford, J.) 372–388 (Cambridge Univ. Press, Cambridge, 2000).
63. Haspelmath, M. & Siegmund, S. Simulating the replication of some of Greenberg’s word order predictions. Linguistic Typology 10, 74–82 (2006).
64. Mace, R. in The Evolution of Cultural Diversity: a Phylogenetic Approach (eds Mace, R., Holden, C. J. & Shennan, S.) 1–10 (UCL Press, London, 2005).
65. Harvey, P. H. & Pagel, M. D. The Comparative Method in Evolutionary Biology (Oxford Univ. Press, Oxford, 1991).
66. Green, P. J. Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika 82, 711–732 (1995).
67. Pagel, M. & Meade, A. Bayesian analysis of correlated evolution of discrete characters by reversible-jump Markov chain Monte Carlo. Am. Nat. 167, 808–825 (2006).
68. Pagel, M., Meade, A. & Scott, D. Assembly rules for protein interaction networks. BMC Evol. Biol. 7 (Suppl. 1), S16 (2007).
69. Niyogi, P. The Computational Nature of Language Learning and Evolution (MIT Press, Cambridge, Massachusetts, 2006).
70. Mangel, M. & Clark, C. W. Dynamic Modeling in Behavioral Ecology (Princeton Univ. Press, Princeton, New Jersey, 1988).
71. Haspelmath, M., Dryer, M. S., Gil, D. & Comrie, B. (eds) The World Atlas of Linguistic Structures Max Planck Digital Library [online], www.wals.info (2008).
72. Greenhill, S. J., Blust, R. & Gray, R. D. The Austronesian Basic Vocabulary Database: from bioinformatics to lexomics. Evol. Bioinform. Online 4, 271–283 (2008).
73. Christiansen, M. H. & Chater, N. Language as shaped by the brain. Behav. Brain Sci. 31, 489–558 (2008).
74. Gell-Mann, M. The Quark and the Jaguar: Adventures in the Simple and Complex (W. H. Freeman, New York, 1994).

Acknowledgements
I thank C. Venditti, A. Calude, I. Peiros, A. Meade, Q. Atkinson, M. Ruhlen, M. Cysouw and M. Haspelmath for help, comments and suggestions. Grants to M.P. from the Leverhulme Trust and the Natural Environment Research Council supported this work.

FURTHER INFORMATION
Mark Pagel’s homepage: www.evolution.reading.ac.uk
Austronesian Basic Vocabulary Database: http://language.psy.auckland.ac.nz/austronesian
SplitsTree: http://www.splitstree.org
Swadesh list: http://en.wiktionary.org/wiki/Appendix:Swadesh_list
World Atlas of Linguistic Structures: http://www.wals.info