Nature 405, 1052–1055 (2000). Macmillan Publishers Ltd.
Language trees support the express-train sequence of
Austronesian expansion
Languages, like molecules, document evolutionary history. Darwin¹ observed that evolutionary change in languages greatly resembled the processes of biological evolution: inheritance from a common ancestor and convergent evolution operate in both. Despite many suggestions²⁻⁴, few attempts have been made to apply the phylogenetic methods used in biology to linguistic data. Here we report a parsimony analysis of a large language data set. We use this analysis to test competing hypotheses, the "express train"⁵ and the "entangled bank"⁶,⁷ models, for the colonization of the Pacific by Austronesian-speaking peoples. The parsimony analysis of a matrix of 77 Austronesian languages with 5,185 lexical items produced a single most parsimonious tree. The express-train model was converted into an ordered geographical character and mapped onto the language tree. We found that the topology of the language tree was highly compatible with the express-train model.
There are many parallels between the processes of biological and linguistic evolution and the methods used to analyse them⁴. Despite these parallels, however, historical linguists have not used the quantitative phylogenetic methods that have revolutionized evolutionary biology in the past 20 years⁸. So, although linguists routinely use the "comparative method"⁹ to construct language family trees from discrete lexical, morphological and phonological data, they do not use an explicit optimality criterion to select the best tree, nor do they typically use an efficient computer algorithm to search for the best tree from the discrete data. This is surprising given that the task of finding the best tree is inherently a combinatorial optimization problem of considerable computational difficulty¹⁰. One potential problem with a quantitative phylogenetic approach to linguistic evolution arises from the more reticulate nature of cultural evolution. Some authors¹¹,¹² have claimed that reticulate processes in linguistic evolution overshadow those of descent, leading them to reject the appropriateness of the family-tree model. We believe that this is an empirical claim, which can be evaluated using phylogenetic methods. If the data fit well on the tree and there is little systematic conflicting signal, then the family-tree model is supported. If the data fit poorly, then alternative phylogenetic methods that do not assume a tree model, such as spectral analysis or split decomposition, should be investigated. A critical part of phylogenetic inference involves testing for congruence between independent lines of evidence. Here we test a model of the colonization of the Pacific that is derived from predominantly archaeological data by quantitatively examining its fit with a parsimony tree of Austronesian languages.
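The "considerable computational difficulty" of tree search can be made concrete with a standard counting result that is not part of the paper's own analysis: for n labelled taxa there are (2n − 5)!! distinct unrooted, fully bifurcating trees. The minimal Python sketch below (the helper name is hypothetical) computes that count; exhaustive search is hopeless for 77 languages, which is why heuristic searches are used.

```python
def num_unrooted_binary_trees(n: int) -> int:
    """Number of distinct unrooted, fully bifurcating trees on n labelled taxa: (2n - 5)!!."""
    if n < 3:
        raise ValueError("need at least 3 taxa")
    count = 1
    # product of the odd numbers 3, 5, ..., 2n - 5 (empty product for n = 3)
    for k in range(3, 2 * n - 4, 2):
        count *= k
    return count


# How many decimal digits does the tree count for 77 languages have?
print(len(str(num_unrooted_binary_trees(77))))
```

For 77 taxa the count exceeds 10^130, so the search space can only be explored heuristically.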
Prehistoric human colonization in the Pacific happened in two phases. Initially, Pleistocene hunter-gatherer expansions from Island Southeast Asia through New Guinea reached the Bismarck archipelago by 33,000 BP, and the Papuan-speaking descendants of these people are dispersed throughout New Guinea and parts of Island Melanesia¹³. The second colonization wave, of Austronesian-language speakers, involved a diaspora of Neolithic farming peoples out of south China and Taiwan around 6,000 BP¹³⁻¹⁵. According to the 'express train to Polynesia' model, the Austronesian expansion from Taiwan was extremely rapid, taking roughly 2,100 years to reach the edges of western Polynesia, a distance of 10,000 kilometres. Converging evidence from archaeology and molecular anthropology supports a rapid and relatively encapsulated dispersal of the Austronesian speakers throughout the Pacific¹³,¹⁶⁻¹⁸ (Fig. 1); however, there is some dispute about the exact degree of interaction with earlier Melanesian settlers, the rate at which the migration occurred and the extent and location of any colonization pauses¹⁹. In broad terms, most Pacific scholars seem to favour the express-train model, but others have argued that the ancestral Polynesians derive from an older Melanesian "matrix"⁷,²⁰. The latter authors stress that a phylogenetic, colonization-focused perspective obscures the high degree of prehistoric contact and interrelationships amongst Pacific people; we use Terrell's phrase⁶, the entangled-bank model, to represent this. These two models are not
mutually exclusive, but are best characterized as two ends of a
continuum of modes of human prehistory, with a pure tree at one end
and a maximally connected network at the other. The issues surrounding
the settlement of the Pacific are thus a microcosm of the general debate
about whether human cultural evolution can be appropriately
represented as a tree.
We tested one aspect of the express-train model, the colonization sequence, in the way that biologists test hypotheses about the sequence of events in biological evolution. We constructed a tree and then mapped the trait onto the tree to see whether the inferred sequence of changes fits a particular scheme²¹. Figure 2 shows how a simple colonization scheme can be tested by mapping geography onto an independent tree. We grouped languages according to Diamond's archaeological/geographical stations⁵,²². Using the character-state functions in the program MacClade²³, we assigned each station a character state from 0 to 9. The states were ordered in a character-state tree to fit the sequence proposed by the express-train model. For example, in Fig. 1 the Taiwanese languages were grouped as state 1 and the Remote Oceanic languages as state 8; a change from state 1 to state 8 would therefore require five steps (according to the model presented in Fig. 1), because steps are counted along the character-state tree rather than as the difference between state numbers. By mapping these character states onto the most parsimonious language tree (Fig. 3), we were able to evaluate the express-train model quantitatively. If the language tree fits the express-train model well, then the character-state tree should fit well onto our obtained tree. The shortest possible tree length required to optimize the character-state tree onto the language tree was nine (that is, the number of character states minus one). When the character-state tree was mapped onto the optimal tree, we obtained a tree length of 13. To assess the statistical significance of the fit, we randomly shuffled the character states among the 77 languages 200 times²³. This gave us a null distribution of tree lengths with a mean tree length of 48.9 steps (s.d. 1.98, range 43–53). This indicates that the
express-train character-state tree fits the language tree with significantly
fewer steps than would occur by chance. In fact, the obtained fit was
very close to the shortest possible length (nine), indicating that the
express-train model fits the language tree exceptionally well.
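The mapping-and-shuffling test described above can be sketched in Python. This is an illustrative reconstruction, not MacClade's implementation: it assumes Sankoff parsimony with the cost of a change equal to the number of edges between two states in the character-state tree, and the state tree, language tree and all function names are hypothetical toy inputs rather than the paper's data. The p-value is an extra summary that the paper does not report.

```python
import random
from collections import deque

INF = float("inf")


def state_distances(state_edges, states):
    """Step counts between every pair of states in an undirected character-state tree (BFS)."""
    adj = {s: [] for s in states}
    for a, b in state_edges:
        adj[a].append(b)
        adj[b].append(a)
    dist = {}
    for src in states:
        seen = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in seen:
                    seen[v] = seen[u] + 1
                    queue.append(v)
        dist[src] = seen
    return dist


def tree_length(children, root, leaf_state, states, dist):
    """Sankoff parsimony: fewest state-tree steps needed to explain the leaf states on a rooted tree."""
    def costs(node):
        if node not in children:  # leaf: only its observed state costs nothing
            return {s: (0 if s == leaf_state[node] else INF) for s in states}
        total = {s: 0 for s in states}
        for child in children[node]:
            child_costs = costs(child)
            for s in states:
                # cheapest way to move from parent state s to any child state t
                total[s] += min(dist[s][t] + child_costs[t] for t in states)
        return total

    return min(costs(root).values())


def shuffle_test(children, root, leaf_state, states, dist, reps=200, seed=0):
    """Compare the observed length with lengths after randomly reassigning states among leaves."""
    rng = random.Random(seed)
    leaves = list(leaf_state)
    observed = tree_length(children, root, leaf_state, states, dist)
    null = []
    for _ in range(reps):
        shuffled = [leaf_state[leaf] for leaf in leaves]
        rng.shuffle(shuffled)
        null.append(tree_length(children, root, dict(zip(leaves, shuffled)), states, dist))
    # one-sided: how often does a random assignment fit at least as well?
    p_value = sum(1 for length in null if length <= observed) / reps
    return observed, null, p_value


# Hypothetical toy data, not the paper's: state tree 0-1-2-3 with a side branch 2-4,
# and a chain-like language tree (A,(B,(C,(D,E)))) rooted at "r".
STATES = [0, 1, 2, 3, 4]
DIST = state_distances([(0, 1), (1, 2), (2, 3), (2, 4)], STATES)
CHILDREN = {"r": ["A", "n1"], "n1": ["B", "n2"], "n2": ["C", "n3"], "n3": ["D", "E"]}
LEAF_STATE = {"A": 0, "B": 1, "C": 2, "D": 3, "E": 4}

observed, null, p_value = shuffle_test(CHILDREN, "r", LEAF_STATE, STATES, DIST)
print(observed, min(null), sum(null) / len(null), p_value)
```

On this toy input the chain-like tree reaches the lower bound of four steps (five states minus one), and a change from state 0 to state 4 costs three steps because the side branch shortens the path, much as the paper's change from state 1 to state 8 costs five.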
By definition, an entangled-bank model cannot be represented by a character-state tree; however, we can assess whether the language data support the entangled-bank model by examining the topology of the tree. While advocates of this model make no predictions about the likely shape of a language tree under an entangled-bank conception, they argue that large-scale migration patterns in languages are obscured by culture contact⁷. Consequently, they might predict a layered, 'candelabra-like' tree that emphasizes regional contact. In contrast, an (archaeologically) quick colonizing wave from Island Southeast Asia through the Pacific to Polynesia should produce a tree topology that is 'chain-like' (see Fig. 3). Proponents of the entangled-bank model argue that culture, language and biology 'combine and recombine' in such complex interactions that patterns of language relationships may tell us very little about the history of language speakers⁷. In this case, the tree should merely reflect geographical proximity. Our tree, however, shows several cases where the relationships fit the historical sequences implied by the express-train model but conflict with geographical proximity (see Fig. 3).
Although we reject the specific features of the entangled-bank model, we do not claim that Austronesian cultural history is totally tree-like. The consistency index (a measure of the fit of the lexical data on the tree) is only 0.25. This value is not substantially lower than would be expected for equivalently sized morphological and molecular data sets²⁴ in which hybridization is uncommon. Although it is probable that much of the poor fit in the lexical data is due to the loss of cultural or linguistic features¹⁵,²⁵, archaeological²⁶ and genetic²⁷ evidence does indicate that population interaction and 'borrowing' are likely to have occurred even between far-flung archipelagoes. One way of approaching the issue of borrowing is to examine languages whose placement conflicts with the colonization scheme. For example, Buli and Numfor are grouped inside the Oceanic language group on our tree, whereas the express-train model places these south Halmahera/west New Guinea languages outside the Oceanic group. Similarly, Chamorro and Palau, languages whose closest relationships are most likely with the Philippines²⁸, are grouped with the Oceanic languages. In both these cases, borrowing is a likely cause of the incongruence between the express-train model and our tree. More detailed evidence for specific patterns of reticulation is evaluated elsewhere (F.M.J. and R.D.G., manuscript in preparation).
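The consistency index is conventionally defined as the ensemble ratio CI = Σmᵢ / Σsᵢ, where mᵢ is the minimum conceivable number of changes for character i (number of observed states minus one) and sᵢ is the number of changes that character requires on the tree. The Python sketch below assumes equal-weight Fitch counting on a strictly bifurcating rooted tree; that is an assumption (the paper's search used easy-loss weighting, and PAUP* computed its reported value of 0.25), and the tree, matrix and function names are hypothetical.

```python
def fitch_steps(children, root, leaf_state):
    """Fitch parsimony (unordered, equal costs) on a rooted, strictly bifurcating tree."""
    steps = 0

    def state_set(node):
        nonlocal steps
        if node not in children:  # leaf
            return {leaf_state[node]}
        left, right = children[node]
        a, b = state_set(left), state_set(right)
        if a & b:
            return a & b
        steps += 1  # disjoint child sets: one change is needed below this node
        return a | b

    state_set(root)
    return steps


def ensemble_ci(children, root, matrix):
    """Ensemble consistency index sum(m_i) / sum(s_i) over all characters.

    matrix maps each language to a tuple of 0/1 codes, one per character."""
    n_chars = len(next(iter(matrix.values())))
    m_total = s_total = 0
    for i in range(n_chars):
        column = {lang: codes[i] for lang, codes in matrix.items()}
        m_total += len(set(column.values())) - 1  # minimum conceivable changes
        s_total += fitch_steps(children, root, column)
    if s_total == 0:
        raise ValueError("all characters are constant; CI is undefined")
    return m_total / s_total


# Hypothetical toy example: tree ((A,B),(C,D)) and two binary characters.
CHILDREN = {"r": ["x", "y"], "x": ["A", "B"], "y": ["C", "D"]}
MATRIX = {"A": (1, 1), "B": (1, 0), "C": (0, 1), "D": (0, 0)}
print(ensemble_ci(CHILDREN, "r", MATRIX))
```

Here the first character fits the tree perfectly (one change) while the second needs two changes, so the ensemble CI is 2/3; homoplasy, whether from borrowing or from parallel loss, drives the index below 1.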
The patterns apparent in linguistic relationships are integrally tied to the movements, contacts and activities of language speakers. Our preliminary investigations have shown that a phylogenetic approach to languages offers the ability to test hypotheses about human prehistory. In biology, phylogenetic methods have become invaluable tools for investigating patterns and processes in evolution. In the future, phylogenetic methods may provide a common methodology and analytic framework to integrate data from ethnography, archaeology, linguistics and genetics. This is an important step towards a unified approach to biological and cultural evolution.
Methods
Data were taken from Blust's Austronesian Comparative Dictionary (R. Blust, personal communication). This is a continuing project to compile comparative lexical data from the largest language family in the world. Currently, the dictionary is 25% complete and comprises 5,185 lexical items across 191 languages. Each lexical item has a set of cognate terms listed with the languages in which they appear. To ensure that there was sufficient information in the data set for phylogenetic analysis, we cut the number of languages from 191 to 68 by retaining only languages that appeared in 150 or more cognate sets. An additional nine languages were then added to provide a balanced representation of the principal Austronesian language subgroups, giving us 77 languages in total. The presence of a language in a cognate set was coded as '1' in a matrix of 77 languages × 5,185 lexical items. If a language was not present in a particular cognate set, that language was coded as '0' for that item in the matrix. Linguistic¹⁵,²⁸, archaeological¹³ and genetic¹⁶,¹⁸ evidence agrees that Taiwan is the most likely Austronesian homeland, and so the two Taiwanese languages (Amis and Paiwan) were used to root the tree. We used PAUP* 4.0d65 (ref. 29) to find the set of most parsimonious trees. To maximize the chance of finding optimal trees, 1,200 random addition sequences and tree bisection-reconnection branch swapping were used.
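The language filter and presence/absence coding described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes each lexical item is available as a set of language names, and the function name, toy cognate sets and placeholder languages L1 to L5 are hypothetical. The `extra_languages` argument stands in for the nine languages that were added by hand.

```python
from collections import Counter


def build_matrix(cognate_sets, min_appearances=150, extra_languages=()):
    """Code presence (1) / absence (0) of each retained language in each cognate set.

    cognate_sets: list of sets of language names, one set per lexical item.
    Returns (languages, matrix) where matrix[lang] is a tuple of 0/1 codes."""
    counts = Counter(lang for cset in cognate_sets for lang in cset)
    # keep languages that occur in at least `min_appearances` cognate sets
    keep = {lang for lang, n in counts.items() if n >= min_appearances}
    keep.update(extra_languages)  # hand-picked additions, e.g. for subgroup balance
    languages = sorted(keep)
    matrix = {
        lang: tuple(1 if lang in cset else 0 for cset in cognate_sets)
        for lang in languages
    }
    return languages, matrix


# Hypothetical toy data with placeholder language names (not from Blust's dictionary).
toy_sets = [{"L1", "L2", "L3"}, {"L1", "L3"}, {"L2", "L4"}, {"L3", "L4"}]
languages, matrix = build_matrix(toy_sets, min_appearances=2, extra_languages=["L5"])
for lang in languages:
    print(lang, matrix[lang])
```

A hand-added language can fall below the threshold (L5 here appears in no toy set), which is why the additions are applied after the filter rather than through it.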
Characters were typed as 'easy loss' (a 5:1 cost ratio of gains to losses), on the assumption that independent losses of lexical items were more likely than independent gains. Similar assumptions about character coding have been used for complex behavioural characters³⁰, and linguistic features (such as phonemes) have been shown to be lost in a west-to-east direction across the Pacific²⁵. Other easy-loss codings and equally weighted parsimony produced similar results (R.D.G. and F.M.J., manuscript in preparation). The search found one shortest tree of 52,129 steps with a consistency index of 0.25. The linguistic data set contained significant phylogenetic signal (tree-length skewness index g1 = −0.505, calculated from 100,000 random trees).
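The g1 statistic is the skewness of the distribution of lengths of randomly generated trees; distributions with a long left tail (negative g1) are usually read as evidence of hierarchical signal. The sketch below only computes the moment-based skewness m₃ / m₂^1.5 from a list of lengths. That estimator is an assumption: PAUP*'s exact formula, which may include a small-sample correction, is not reproduced here, and generating and scoring the random trees is left out. The function name and toy lengths are hypothetical.

```python
def g1_skewness(lengths):
    """Moment-based skewness m3 / m2**1.5 of a distribution of tree lengths."""
    n = len(lengths)
    mean = sum(lengths) / n
    m2 = sum((x - mean) ** 2 for x in lengths) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in lengths) / n  # third central moment
    if m2 == 0:
        raise ValueError("all lengths are equal; skewness is undefined")
    return m3 / m2 ** 1.5


# Hypothetical toy lengths: a few very short trees give a long left tail.
print(g1_skewness([1, 9, 10, 10]))
```

With 100,000 random trees, as in the analysis above, the difference between this estimator and bias-corrected variants is negligible.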
References