
A Short Introduction on Cladistics

Christophe HENDRICKX, PhD Student

I. Theory and terminology

Systematics
Systematics: study of the diversification of living organisms, both past and present, and the evolutionary relationships among groups of organisms through time.

Systematics:
Provides scientific names for organisms.
Provides classifications for the organisms.
Describes organisms.
(The tasks above correspond to taxonomy.)
Preserves collections of them.
Investigates evolutionary histories of organisms.
Considers their environmental adaptations.

Taxonomy
Taxonomy: Classification, identification, and naming of organisms.

Taxon
Taxon (pl. taxa): Group of two or more organisms.

Usually a taxon is given a name (ex. plants, dinosaurs, birds, dogs, etc.) and a rank (kingdom, class, family, genus, species, etc.), but neither is required.

Ex. Plants (Kingdom Plantae), Birds (Class Aves), Lion (Species Panthera leo).

Cladistics
Cladistics: Method of classification that groups taxa hierarchically into nested sets based on shared characters.

Phylogenetics
Phylogenetics: study of evolutionary relationships among groups of organisms.

Cladogram
Cladogram (= phylogenetic tree): A branching diagram specifying hierarchical relationships among taxa based upon homologies.

Homology
Homology: Structural similarities, correspondence of features in different organisms that is due to inheritance from a common ancestor. Character present in an ancestor and its descendants.

Richard Owen: the same organ in different animals under every variety of form and function. Ex. the forelimb of tetrapods.

Analogy
Analogy: similarity of function and superficial resemblance of structures that have different origins. Ex. the wing of insects and flying vertebrates.

Convergence
Convergence: acquisition of the same biological trait in unrelated lineages. Ex. the hydrodynamic and pisciform shape of the body of sharks, ichthyosaurs and dolphins.

Terminology in a cladogram
Branch: Line on a cladogram connecting two nodes (internal branches or internodes), a node and the root (basal branch) or a node and a terminal taxon (terminal branch).

Node: point on a cladogram where three or more branches meet.

Terminal taxon/node: A taxon placed at one end of a terminal branch. Taxon under comparison. Operational taxonomic units (OTUs).

Internal node: ancestral unit.

Root: common ancestor of all OTUs under study.

The path from root to node and from node to node defines an evolutionary path.

Rooting tree
Inferring evolutionary relationships between the taxa requires rooting the tree.

Outgroup: taxon used for comparative purposes. Serves as a reference taxon for determination of the evolutionary relationship among three or more clades.

Ingroup: clade that includes all taxa of interest to the current study. Group of interest under investigation in order to resolve the relationships of its members.

Groups

Monophyletic group (= clade): group including a most recent common ancestor and all its descendants (ex. dinosaurs, birds, mammals, etc.). Monophyletic groups are characterized by shared derived characters.


Paraphyletic group: group including a most recent common ancestor and only some of its descendants. Only recognized by the absence of synapomorphies. Ex. gymnosperms (minus angiosperms), fishes (minus tetrapods), prosauropods (minus sauropods), etc.


Polyphyletic group: group that does not include the most recent common ancestor of all its members. Ex. warm-blooded animals (mammals and birds).

Types of characters

Plesiomorphy: ancestral/primitive character or character state (usually coded 0 in a datamatrix).


Apomorphy: derived character or character state (usually coded 1, 2, etc. in a datamatrix).


Synapomorphy (= homology): Apomorphy (derived character) that unites two or more taxa into a monophyletic group (clade). Derived character(s) defining a clade.


Symplesiomorphy: Plesiomorphy (ancestral character) shared by two or more taxa. Because it is inherited from a more distant ancestor, it does not define a monophyletic group (clade).


Autapomorphy: Apomorphy (derived character) that is restricted to a single terminal taxon in a data set. Derived character(s) defining a taxon (ex. a genus or species).


Homoplasy: Similarity in species of different ancestry that is the result of convergent evolution. Correspondence between parts or organs acquired as the result of parallel evolution or evolutionary convergence. Any shared character that is not a synapomorphy.

II. Cladistic analysis

How to construct a cladogram?


1) Select your OTUs (Operational Taxonomic Units) = ingroup. They have shared primitive characters (plesiomorphies) and shared derived characters (synapomorphies).

2) Select an outgroup. The outgroup has one or several primitive characters that are common to all OTUs.

3) Construct a character table and code each OTU.

4) Construct a cladogram based on the number of shared characters. The more shared characters, the more closely related are the OTUs.


Taxon        Character states
Lancelet     00000
Lamprey      00011
Tuna         00011
Salamander   00111
Turtle       01111
Leopard      11111
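As a rough illustration of step 4, the number of derived states (coded 1) shared by two OTUs can be counted directly from the character table above. This is only a sketch: the function name shared_derived is made up for the example, and counting shared states is a first approximation, not a full parsimony analysis.

```python
# Character table from the slide: one string of 0/1 states per taxon.
matrix = {
    "Lancelet":   "00000",
    "Lamprey":    "00011",
    "Tuna":       "00011",
    "Salamander": "00111",
    "Turtle":     "01111",
    "Leopard":    "11111",
}

def shared_derived(a, b):
    """Count the characters for which both taxa show the derived state 1."""
    return sum(1 for x, y in zip(matrix[a], matrix[b]) if x == y == "1")

# Print the count for every pair of taxa.
taxa = list(matrix)
for i, a in enumerate(taxa):
    for b in taxa[i + 1:]:
        print(a, b, shared_derived(a, b))
```

Pairs with the highest counts (here Turtle and Leopard) are the candidates for the most closely related OTUs.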

Characters
Character: Observable feature of an organism used to distinguish it from another.

Character state: Scored observation of a feature perceived in an organism chosen as an OTU.

Example: Dentary ramus: (0) elongate; (1) shortened, not much longer than tall. Character = elongation of the dentary ramus. Character state = elongated dentary ramus.

Discrete characters: denumerable characters, characters that can be represented by a subset of all possible real numbers.

Binary characters: characters that have just two states. Usually coded as 0 and 1 (e.g. absence/presence).

Multistate characters: characters that have more than two observed states. Usually coded as 0, 1, 2, 3...n. Can be ordered or unordered.

Polarized characters: character or transformation series where the direction of character change or direction of evolution has been specified, thereby determining the relative plesiomorphy (primitive character) or apomorphy (derived character) of the characters or character states.

Ordered characters: A multistate character of which the order has been determined.

Transformation between two adjacent states costs the same number of steps, but transformation between two non-adjacent states costs the sum of the steps between their implied adjacent states.

Ex. 0 → 1 → 2. 0 → 1 and 1 → 2 cost the same, but 0 → 2 costs twice as many steps.


Example: Tooth row: (0) extends posteriorly to approximately half the length of the orbit; (1) ends at the anterior rim of the orbit; (2) completely antorbital, tooth row ends anterior to the vertical strut of the lacrimal.
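The step costs described for ordered characters can be written as a small function. This is a minimal sketch assuming states are coded as integers and each adjacent transformation costs one step. The name transformation_cost is chosen for the example only.

```python
def transformation_cost(state_a, state_b, ordered):
    """Number of steps needed to change state_a into state_b."""
    if state_a == state_b:
        return 0
    if ordered:
        # Ordered character: pass through every intermediate state.
        return abs(state_a - state_b)
    # Unordered character: any change costs a single step.
    return 1

print(transformation_cost(0, 1, ordered=True))   # adjacent states
print(transformation_cost(0, 2, ordered=True))   # non-adjacent states
print(transformation_cost(0, 2, ordered=False))  # same change, unordered
```

For the tooth row example, going from state 0 to state 2 costs two steps if the character is ordered and one step if it is left unordered.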

Continuous characters: Characters for which potential values are so infinitesimally close that there are potentially no disallowable real numbers.

Example: Quadrate, elongation (ratio: lateromedial width of mandibular articulation/ventrodorsal length from entocondyle to cotylus).

Coding methods

Characters 2 and 3 are inapplicable for taxon W (usually coded 9 or -).

Molecular data

Datamatrix

Taxa = OTUs (divided into outgroup and ingroup)

0        (usually) plesiomorphic characters.
1 or 2   (usually) apomorphic characters.
2        multistate characters.
[01]     polymorphic character.
?        unknown data, missing values.
-        inapplicable characters.

Instructions Creating a datamatrix


1) Open Mesquite.

File → New → give a name to the .nex file you are creating (ex. spino.nex).
Name: Taxa (or genera). Number of taxa: 5 (in our case).
Select Make Character Matrix. Number of characters: let's say 10.
Create your datamatrix by naming your OTUs (taxa), defining your characters and character states, and coding your taxa for each character:

0        plesiomorphic characters.
1 or 2   apomorphic characters.
2        multistate characters.
[01]     polymorphic character.
?        unknown data, missing values.
-        inapplicable characters.

Once it's done, save it (Ctrl+S).



2) Open the .nexus file with Notepad.

Remove all the text and only keep the datamatrix newly created. You must have something looking like this:

nstates 2
xread
10 5
Eustreptospondylus 1000000000
Baryonyx 11110011[12]1
Suchomimus 1111001121
Irritator_Angaturama 11?111-12?
Spinosaurus 111111-110
;
proc/;

The polymorphic characters have to be enclosed in square brackets, e.g. [01] or [012].
The inapplicable characters are coded - rather than 9. They are treated the same way as ?.

Add at the beginning:
nstates 2 = the number of different states, here two (up to 32).
xread 10 5 = the number of characters, then the number of taxa.



Add as the last lines:
;
proc/;

Save the file as a .txt file or as a .tnt file. For numerical (discrete) characters, TNT accepts up to 32 states, noted 0 to 9, then A to V for states 10 to 31.



nstates 2
xread
10 5
Eustreptospondylus 1000000000
Baryonyx 11110011[12]1
Suchomimus 1111001121
Irritator_Angaturama 11?111-12?
Spinosaurus 111111-110
;
ccode +8;
;
proc/;

If you want to order some characters, add the following two lines after the datamatrix:
;
ccode + 35 64;
(here characters 35 and 64 are now ordered)

Be aware that, in TNT, the first character is not numbered one but zero! Here, for instance, the 10 characters are numbered from 0 to 9.
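Instead of editing the file by hand in Notepad, the same text can be generated with a short script. The sketch below only reproduces the file layout shown in these slides (nstates line, xread, dimensions, one taxon per line, optional ccode line, proc/;). It has not been checked against the TNT documentation, and the function name tnt_block is hypothetical.

```python
import re

def count_characters(row):
    """Count cells in a coded row; a polymorphic cell such as [12] counts once."""
    return len(re.findall(r"\[[^\]]*\]|.", row))

def tnt_block(matrix, nstates, ordered=()):
    """Return the text of an input file laid out like the slide example.

    ordered holds character numbers counted from 0, as TNT does.
    """
    n_chars = {count_characters(row) for row in matrix.values()}
    if len(n_chars) != 1:
        raise ValueError("all taxa must be coded for the same number of characters")
    lines = [f"nstates {nstates}", "xread", f"{n_chars.pop()} {len(matrix)}"]
    lines += [f"{taxon} {row}" for taxon, row in matrix.items()]
    lines.append(";")
    if ordered:
        lines.append("ccode + " + " ".join(str(i) for i in ordered) + ";")
    lines.append("proc/;")
    return "\n".join(lines)

spino = {
    "Eustreptospondylus": "1000000000",
    "Baryonyx": "11110011[12]1",
    "Suchomimus": "1111001121",
    "Irritator_Angaturama": "11?111-12?",
    "Spinosaurus": "111111-110",
}
text = tnt_block(spino, nstates=2, ordered=[8])
print(text)
```

The printed text can then be saved as a .tnt file. The check on the number of cells per row catches the most common coding mistake, a taxon with one character too many or too few.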

Instructions datamatrix with continuous characters


Continuous characters only:

nstates 32
xread
3 5
& [cont]
A 1.23 3 8.7
B 2.35 ? 5.36
C 3.65 7.89 0.25
D 4.65 23.23 0.87
E 8.25 23.23 8
;
proc/;

Continuous and discrete characters together:

nstates 32
xread
6 5
& [cont]
A 1.23 3 8.7
B 2.35 ? 5.36
C 3.65 7.89 0.25
D 4.65 23.23 0.87
E 8.25 23.23 8
& [num]
A 0 0 1
B 1 1 2
C 1 2 3
D 1 2 1
E 1 3 2
;
ccode + 5 6;
;
proc/;

In TNT, the values of continuous characters can go up to 65 and can have three decimals.

Principle of parsimony
Principle of parsimony: general scientific criterion for choosing among competing hypotheses, which states that we should accept the hypothesis that explains the data most simply and efficiently.

The principle of parsimony (Occam's Razor) states that a theory about nature should be the simplest explanation that is consistent with the facts.

Keep it simple.

A phylogenetic tree is a hypothesis. There may be many possible trees, but the simplest one is probably the most accurate.

Simplest tree = shortest tree. Tree with the fewest character changes
and the minimal number of nodes.

A cladistic analysis tries to find the most parsimonious trees (MPTs), all trees that minimize the number of evolutionary changes (steps).
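To make the idea of counting steps concrete, the sketch below computes the length of a given tree with the Fitch algorithm, a standard way of counting changes for unordered characters. It is an illustration, not the procedure TNT or PAUP* actually runs. Trees are written as nested pairs, the matrix is the vertebrate table shown earlier, and missing or polymorphic cells are not handled.

```python
def fitch(tree, states):
    """Return (possible ancestral states, steps) for one character."""
    if isinstance(tree, str):          # terminal taxon
        return states[tree], 0
    left, right = tree                 # binary internal node
    left_set, left_steps = fitch(left, states)
    right_set, right_steps = fitch(right, states)
    common = left_set & right_set
    if common:                         # no change needed at this node
        return common, left_steps + right_steps
    return left_set | right_set, left_steps + right_steps + 1

def tree_length(tree, matrix):
    """Sum the Fitch steps over all characters of the matrix."""
    n_chars = len(next(iter(matrix.values())))
    total = 0
    for i in range(n_chars):
        states = {taxon: {row[i]} for taxon, row in matrix.items()}
        total += fitch(tree, states)[1]
    return total

matrix = {
    "Lancelet": "00000", "Lamprey": "00011", "Tuna": "00011",
    "Salamander": "00111", "Turtle": "01111", "Leopard": "11111",
}
pectinate = ("Lancelet", ("Lamprey", ("Tuna", ("Salamander", ("Turtle", "Leopard")))))
alternative = (("Lancelet", "Leopard"), ("Lamprey", ("Tuna", ("Salamander", "Turtle"))))
print(tree_length(pectinate, matrix), tree_length(alternative, matrix))
```

The first tree needs fewer steps than the second, so it is the more parsimonious of the two hypotheses. A real analysis compares many more candidate trees.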

Heuristic search
Heuristic search: Algorithm for constructing cladograms. Tries to find the best tree by reducing the set of trees examined and calculating the score for only some likely trees. Does NOT guarantee finding the best tree.

Instructions Heuristic search


3) Open the software TNT.

File → Open input file → open the .tnt file newly created.

4) Analyze → New Technology search. Then select the search options and click Search.

Instructions Visualizing the MPTs


Once the search (steps 3 and 4) has finished, display the consensus tree of the MPTs in TNT.


Consensus tree
Consensus tree: convenient way to summarise the agreement between two or more trees. Branching diagram produced using a consensus method, a method combining the grouping information contained in a set of cladograms for the same taxa into a single topology.

Polytomy (in the resulting consensus tree): node which has more than two immediate descending branches.

Strict consensus tree: contains only those clusters found in all the trees (100%).

Majority rule consensus tree: contains only those clusters found in a majority (> 50%) of the trees in the profile.

Semi-strict consensus tree: contains all the uncontradicted clusters in a profile of trees. Includes the clusters retained by the strict consensus tree, but also contains any clusters that are not contradicted by any other clusters in the profile.
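The strict and majority rule definitions can be illustrated by listing the clusters of each tree and counting how often each one occurs. The three trees below are invented for the example and are not results of an analysis. Trees are written as nested tuples, and the semi-strict method is not covered by this sketch.

```python
from collections import Counter

def clusters(tree):
    """Return (terminal taxa, clusters) for a tree written as nested tuples."""
    if isinstance(tree, str):
        return frozenset([tree]), set()
    taxa, found = frozenset(), set()
    for child in tree:
        child_taxa, child_clusters = clusters(child)
        taxa = taxa | child_taxa
        found |= child_clusters
    found.add(taxa)                    # the group below this node
    return taxa, found

def consensus_clusters(trees, majority=False):
    """Clusters found in all trees, or in more than half of them."""
    counts = Counter()
    for tree in trees:
        counts.update(clusters(tree)[1])
    needed = len(trees) / 2 if majority else len(trees) - 1
    return {cluster for cluster, n in counts.items() if n > needed}

tree_1 = ("Eustreptospondylus",
          (("Baryonyx", "Suchomimus"), ("Irritator_Angaturama", "Spinosaurus")))
tree_2 = ("Eustreptospondylus",
          ("Baryonyx", ("Suchomimus", ("Irritator_Angaturama", "Spinosaurus"))))
trees = [tree_1, tree_2, tree_1]       # hypothetical profile of three trees

strict = consensus_clusters(trees)
majority = consensus_clusters(trees, majority=True)
for cluster in sorted(majority - strict, key=sorted):
    print("only in the majority rule consensus:", sorted(cluster))
```

In this profile the group Baryonyx + Suchomimus occurs in two trees out of three, so it appears in the majority rule consensus but collapses into a polytomy in the strict consensus.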

Measures of character fit


Tree length: minimum number of character changes (steps) required on a cladogram to account for the data.

Consistency index (CI): Measure of the amount of homoplasy in a character relative to a given cladogram. CI = m / s

Retention index (RI): Measure of the amount of similarity in a character that can be interpreted as a synapomorphy. RI = (g - s) / (g - m)

m = minimum number of steps a character can exhibit on any cladogram.
s = minimum number of steps a character can exhibit on the cladogram in question.
g = greatest number of steps a character can exhibit on any cladogram.
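A short worked example of the two indices, using the definitions of m, s and g given above. The values m = 1, s = 2 and g = 3 are invented for illustration, and the function names are chosen for the example.

```python
def consistency_index(m, s):
    """CI = m / s."""
    return m / s

def retention_index(m, s, g):
    """RI = (g - s) / (g - m); undefined when g equals m."""
    if g == m:
        return None  # the character cannot vary in fit, so RI is not defined
    return (g - s) / (g - m)

# A binary character (m = 1) that changes twice on the tree (s = 2)
# and could need at most three steps on any tree (g = 3).
print(consistency_index(1, 2))
print(retention_index(1, 2, 3))
```

A character with no homoplasy has s = m, which gives CI = 1 and RI = 1.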

Support for individual clades


Bremer support (= branch support, decay index): number of extra steps required before a clade is lost from the strict consensus tree.



Bootstrap analysis: method consisting of creating a large number of pseudoreplicate data sets of the same size as the original by randomly sampling characters with replacement.



How does it work? The analysis consists of deleting some characters randomly and reweighting the rest randomly. The MPTs for these pseudoreplicates are then calculated. The percentage of pseudoreplicates that recover a given group corresponds to the measure of confidence in the group.
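The resampling step can be sketched as follows: characters (columns) are drawn at random with replacement until the pseudoreplicate has as many characters as the original matrix, so some characters are left out and others are counted several times. The matrix is the vertebrate table shown earlier and the function name is hypothetical. The tree search on each pseudoreplicate, which the support percentages depend on, is not shown.

```python
import random

def bootstrap_replicate(matrix, rng):
    """Return one pseudoreplicate matrix and the sampled column numbers."""
    n_chars = len(next(iter(matrix.values())))
    # Sampling with replacement: a column may be drawn several times or not at all.
    columns = [rng.randrange(n_chars) for _ in range(n_chars)]
    replicate = {taxon: "".join(row[i] for i in columns)
                 for taxon, row in matrix.items()}
    return replicate, columns

matrix = {
    "Lancelet": "00000", "Lamprey": "00011", "Tuna": "00011",
    "Salamander": "00111", "Turtle": "01111", "Leopard": "11111",
}
rng = random.Random(1)                 # fixed seed so the run can be repeated
replicate, columns = bootstrap_replicate(matrix, rng)
print(columns)
print(replicate)
```

Repeating this many times, searching for the MPTs of each pseudoreplicate, and counting how often a group is recovered gives the bootstrap percentage of that group.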

Instructions Bremer support, CI and RI


5) Copy and paste your .tnt file into the TNT folder, which must include the scripts STATS.run and aquickie.run. Both can be downloaded from these links:
http://tnt.insectmuseum.org/index.php/Scripts/stats
http://tnt.insectmuseum.org/index.php/Scripts/aquickie.run

6) Open the software TNT.

File → Open input file → open the .tnt file newly created.

7) In the command line, enter the command aquickie. The resulting consensus tree will be displayed, as well as the Bremer support.

Then enter the command stats, which will display the Consistency and Retention indexes (CI and RI).

Instructions Bootstrap analysis


8) In order to perform a Bootstrap analysis:

Analyze → Resampling. Then choose the resampling options and click OK.

Instructions Performing a cladistic analysis


9) In order to visualize the list of synapomorphies for each clade:

Optimize → Synapomorphies → List synapomorphies, then select the tree (the last one is the consensus tree).
To add the list to a publication: File → Output → Print display buffer. Then use the software CutePDF Writer, freely downloadable on the Web, to save the list in a .pdf file.

10) To save the trees and arrange them using Dendroscope:

File → Tree Save file → Open, parenthetical → give a name to the file and save it as a .tre file.



11) Open the .tre file with Notepad.

Delete all the text except the last line (consensus tree), written like this:
(Eustreptospondylus ((Baryonyx Suchomimus )(Irritator_Angaturama Spinosaurus )))

Then make the following replacements, in this order:

1. space by :1.0,
2. )( by ),(
3. ,) by )
4. )) by ):1.0)

Add as a first line: #DENDROSCOPE{TREE'Tree'
And as a last line: ;}
Save the new file as a .tre file.



You must then have something like this:
#DENDROSCOPE{TREE'Tree'(Eustreptospondylus:1.0,((Baryonyx:1.0,Suchomimus:1.0),(Irritator_Angaturama:1.0,Spinosaurus:1.0):1.0));}

Open the .tre file with Dendroscope. Choose to display the graph as a rectangular phylogram, a rectangular cladogram, a slanted cladogram, or a circular cladogram.

You can name clades (e.g. Spinosauridae, Baryonychinae, etc.), change the font and the colour of each taxon, and add colours to each clade or stem.
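The find-and-replace operations of step 11 can also be applied with a few lines of code rather than by hand in Notepad. This is a sketch that simply mirrors the four replacements and the first and last lines given in the slides. The function name is made up, and the result has only been checked against the example tree shown above.

```python
def tnt_to_dendroscope(tnt_tree):
    """Apply the four replacements of step 11, in order, and add the wrapper."""
    replacements = [
        (" ", ":1.0,"),    # 1. space
        (")(", "),("),     # 2. separate sister clades with a comma
        (",)", ")"),       # 3. drop the comma left before a closing bracket
        ("))", "):1.0)"),  # 4. branch length after a closed clade
    ]
    newick = tnt_tree
    for old, new in replacements:
        newick = newick.replace(old, new)
    return "#DENDROSCOPE{TREE'Tree'" + newick + ";}"

tnt_line = ("(Eustreptospondylus ((Baryonyx Suchomimus )"
            "(Irritator_Angaturama Spinosaurus )))")
print(tnt_to_dendroscope(tnt_line))
```

The order of the replacements matters, because each one works on the output of the previous one.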


Instructions Winclada
1) Open the .tps file newly created with Winclada.

2) Select all characters with your mouse (they must be green when selected), or Chars → select all chars.
Chars → Make sel chars NONADDITIVE (fitch) → Ok.
If you have ordered characters, select the characters to order, then Chars → Make sel chars ADDITIVE (farris) → Ok.

3) To perform a heuristic search: Analyze → Ratchet (Island Hopper) → Island Hop → Yes.

4) In order to display the results and visualize the synapomorphies, select the relevant display options.

The length of the tree as well as the Consistency index (CI) and Retention index (RI) are displayed at the bottom of the window.

5) To perform a Bootstrap analysis:

Analyze → Bootstrap/Jackknife/CR with NONA → Bootstrap.

6) To save trees:

Trees → Save ALL Trees to file → Name taxa (full names, NOT NONA readable) → do it!
Give a name to the .tre file. Follow the same procedure as with TNT in order to read the file with Dendroscope (step 11).

Instructions PAUP*
Open the .nex file newly created with Mesquite with PAUP*: File → Open.

In the command line, write the following commands and press Enter after each one:

Hsearch
Performs a heuristic search. Reset Maxtrees (automatically increase by 100) if necessary.

Contree all/majrule treefile=name_tree.tre
Gives the strict and majority rule consensus trees, which will both be saved with the name name_tree.tre.

Describetrees
Gives the tree length, the Consistency index (CI), the Homoplasy index (HI), the Retention index (RI) and the Rescaled consistency index (RC).

BootStrap all/treefile= name_tree2.tre
Performs a Bootstrap analysis on the MPTs and saves the results with the name name_tree2.tre.
