Graph Theoretic
Concepts and Algorithms
for Bioinformatics
What is a graph?
Formally: A finite graph G(V, E) is a pair (V, E),
where V is a finite set and E is a binary relation on V.
Recall: A relation R between two sets X and Y is a subset of X × Y.
For each selection of two distinct vertices in V, that pair of vertices is
either in set E or not in set E.
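As a minimal sketch of the definition above, a graph can be stored as a vertex set plus a set of pairs, so that membership in E answers whether two vertices are related. The vertex labels, the edge set, and the helper name `is_edge` are illustrative assumptions, not taken from the slides.

```python
from itertools import combinations

# A finite graph G(V, E): V is a finite set, E is a binary relation on V.
# These vertex labels and edges are illustrative assumptions.
V = {"a", "b", "c"}
# Undirected edges are stored as frozensets so that {a, b} == {b, a}.
E = {frozenset({"a", "b"}), frozenset({"a", "c"})}

def is_edge(u, v, edges=E):
    """Return True if the pair {u, v} is in the relation E."""
    return frozenset({u, v}) in edges

# Every pair of distinct vertices is either in E or not in E.
for u, v in combinations(sorted(V), 2):
    print(u, v, is_edge(u, v))
```

Storing edges as frozensets is one design choice for undirected graphs; a directed graph would use ordered tuples instead.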
Why graphs?
Many problems can be stated in terms of a graph
The properties of graphs are well-studied
Many algorithms exist to solve problems posed as graphs
Many problems are already known to be intractable
Graphs in bioinformatics
Sequences
DNA, proteins, etc.
Chemical compounds
Metabolic pathways
Graphs in bioinformatics
Phylogenetic trees
Basic definitions
[Figure: an undirected graph and a directed graph G=(V,E), with labels for a loop, an isolated vertex, multiple edges, and adjacent vertices]
Travel in graphs
Types of graphs
[Figure: a simple graph; K5; a disconnected graph with two components]
Types of graphs
acyclic graph (forest): a graph with no cycles
tree: a connected, acyclic graph
rooted tree: a tree with a root or distinguished vertex
leaves: the terminal nodes of a rooted tree
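The forest and tree definitions above can be checked mechanically. The following is a hedged sketch for undirected graphs given as a vertex collection and a list of edge pairs; the function names `is_forest` and `is_tree` are assumptions for illustration, and union-find is just one of several ways to detect a cycle.

```python
def is_forest(vertices, edges):
    """Return True if the undirected graph has no cycles (union-find)."""
    parent = {v: v for v in vertices}

    def find(v):
        # Follow parent pointers to the representative of v's component.
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving keeps chains short
            v = parent[v]
        return v

    for u, w in edges:
        ru, rw = find(u), find(w)
        if ru == rw:
            # u and w are already connected, so this edge closes a cycle.
            return False
        parent[ru] = rw
    return True

def is_tree(vertices, edges):
    """A tree is connected and acyclic: acyclic with exactly |V| - 1 edges."""
    return (len(vertices) > 0
            and len(edges) == len(vertices) - 1
            and is_forest(vertices, edges))
```

An acyclic graph with exactly |V| - 1 edges is necessarily connected, which is why `is_tree` does not need a separate connectivity search.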
Digraph definitions
The following definitions apply to directed graphs (digraphs) only.
Every edge has a head (starting point) and a tail (ending point)
Walks, trails, and paths can only use edges in the appropriate direction
In a DAG, every path connects a predecessor/ancestor (the vertex at the head
of the path) to its successors/descendants (nodes at the tail of any path).
parent: direct ancestor (one hop)
child: direct descendant (one hop)
A descendant vertex is reachable from any of its ancestor vertices
Intro. to Graph Theory
Computer representation
undirected graphs: usually represented as digraphs with two
directed edges per actual undirected edge.
adjacency matrix: a |V| x |V| array where each cell i,j contains
the weight of the edge between vi and vj (or 0 for no edge)
adjacency list: a |V| array where each cell i contains a list of all
vertices adjacent to vi
incidence matrix: a |V| x |E| array where each cell i,j contains a
weight (or a defined constant HEAD for unweighted graphs) if
vertex i is the head of edge j, or a constant TAIL if vertex i is
the tail of edge j
[Figure: example weighted directed graph on vertices a, b, c, d]
adjacency matrix:
     a    b    c    d
a    0    0    8    4
b    0    0    0    0
c    0    6    0    0
d    0   10    2    0
adjacency list:
a: c (8), d (4)
b:
c: b (6)
d: c (2), b (10)
[Figure: incidence matrix for the same graph]
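The three representations can be built from one edge list. The sketch below uses the weighted edges read off the adjacency list in the slide's example; the variable names and the use of the string "t" for the TAIL constant are assumptions made for illustration.

```python
# Weighted directed edges (from, to, weight), read off the adjacency list example.
edges = [("a", "c", 8), ("a", "d", 4), ("c", "b", 6), ("d", "c", 2), ("d", "b", 10)]
vertices = ["a", "b", "c", "d"]
index = {v: i for i, v in enumerate(vertices)}

# Adjacency matrix: |V| x |V|; cell [i][j] holds the weight of the edge
# from vertex i to vertex j, or 0 if there is no edge.
adj_matrix = [[0] * len(vertices) for _ in vertices]
for u, v, w in edges:
    adj_matrix[index[u]][index[v]] = w

# Adjacency list: for each vertex, the list of (neighbor, weight) pairs.
adj_list = {v: [] for v in vertices}
for u, v, w in edges:
    adj_list[u].append((v, w))

# Incidence matrix: |V| x |E|; the weight goes at the edge's starting vertex
# (the "head" in the slides' convention) and TAIL at its ending vertex.
TAIL = "t"  # assumed stand-in for the slides' TAIL constant
inc_matrix = [[0] * len(edges) for _ in vertices]
for j, (u, v, w) in enumerate(edges):
    inc_matrix[index[u]][j] = w
    inc_matrix[index[v]][j] = TAIL
```

The adjacency matrix costs |V| x |V| cells regardless of edge count, while the adjacency list grows with the number of edges, which is usually the deciding factor for sparse graphs.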
Computer representation
Linked list of nodes: each Node is a data object with a label and lists of pointers to its children and/or parents
class Node:
    def __init__(self, label=None):
        self.label = label
        self.parents = []           # list of nodes coming into this node
        self.children = []          # list of nodes coming out of this node
        self.childEdgeWeights = []  # edge weights, in the same order as children
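To show how such node objects might be wired together, here is a small hedged sketch. The helper `add_child` is hypothetical (it does not appear in the slides), and the three nodes and weights are illustrative.

```python
class Node:
    def __init__(self, label=None):
        self.label = label
        self.parents = []           # nodes with an edge coming into this node
        self.children = []          # nodes this node has an edge going out to
        self.childEdgeWeights = []  # weights, in the same order as children

def add_child(parent, child, weight):
    """Hypothetical helper: link parent -> child with the given edge weight."""
    parent.children.append(child)
    parent.childEdgeWeights.append(weight)
    child.parents.append(parent)

# Illustrative nodes and weights.
a, c, d = Node("a"), Node("c"), Node("d")
add_child(a, c, 8)
add_child(a, d, 4)
add_child(d, c, 2)
```

Keeping `children` and `childEdgeWeights` as parallel lists follows the slide's layout; a list of (child, weight) pairs would avoid the two lists drifting out of step.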
Subgraphs
G'(V',E') is a subgraph of G(V,E) if V' ⊆ V and E' ⊆ E.
induced subgraph: a subgraph that contains every edge in E
whose endpoints are both in the selected V'
[Figure: a graph G(V,E); the subgraph G'({a,c,d},{{c,d}}); and the induced subgraph of G with V' = {b,c,d,e}]
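The subgraph and induced-subgraph definitions translate directly into set operations. This is a sketch under assumptions: the edge set of G below is illustrative (the slide's figure is not recoverable), and the function names are hypothetical.

```python
def edge(u, v):
    """Undirected edge as a frozenset, so edge(u, v) == edge(v, u)."""
    return frozenset({u, v})

def is_subgraph(v_sub, e_sub, v, e):
    """G'(V', E') is a subgraph of G(V, E) if V' ⊆ V and E' ⊆ E.
    Each edge of E' must also have both endpoints in V'."""
    v_sub, e_sub = set(v_sub), set(e_sub)
    return (v_sub <= set(v) and e_sub <= set(e)
            and all(ed <= v_sub for ed in e_sub))

def induced_subgraph(edges, selected):
    """Return every edge of G whose endpoints are both in `selected`."""
    selected = set(selected)
    return {ed for ed in edges if ed <= selected}

# Illustrative graph; the actual edges in the slide's figure are unknown.
V = {"a", "b", "c", "d", "e"}
E = {edge("a", "b"), edge("b", "c"), edge("c", "d"), edge("d", "e"), edge("a", "d")}
```

With this illustrative E, the vertex set {a, c, d} with the single edge {c, d} is a subgraph but not an induced one, because the edge {a, d} is left out.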
Complement of a graph
The complement of a graph G(V,E) is a graph with the same
vertex set, in which two vertices are adjacent only if they were not
adjacent in G(V,E)
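One way to compute a complement, sketched here for simple undirected graphs, is to enumerate all pairs of distinct vertices and subtract the existing edges. The function name `complement` and the small example graph are illustrative assumptions.

```python
from itertools import combinations

def complement(vertices, edges):
    """Edges of the complement graph: every pair of distinct vertices
    that is NOT adjacent in the original graph."""
    all_pairs = {frozenset(pair) for pair in combinations(vertices, 2)}
    return all_pairs - set(edges)

# Illustrative graph with a single edge a-b.
V = {"a", "b", "c"}
E = {frozenset({"a", "b"})}
E_bar = complement(V, E)
```

Taking the complement twice returns the original edge set, which makes a convenient sanity check.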
What is the path of total minimum weight from the source to any other vertex?
Greedy strategy works for simple problems (no cycles, no negative weights)
Longest path is a similar problem (complement weights)
We will see this again soon for fragment assembly!
Dijkstra's Algorithm
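The slide's own step list did not survive extraction, so the following is a standard textbook formulation of Dijkstra's algorithm rather than a reconstruction of the author's steps. It is a sketch that assumes non-negative edge weights and a graph given as a dictionary of (neighbor, weight) lists; the example graph reuses the weighted edges from the adjacency-list example earlier in the slides.

```python
import heapq

def dijkstra(graph, source):
    """Return the minimum total weight from source to each reachable vertex.
    graph: dict mapping vertex -> list of (neighbor, weight), weights >= 0."""
    dist = {source: 0}
    heap = [(0, source)]   # priority queue of (tentative distance, vertex)
    done = set()           # vertices whose distance is final
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue       # stale queue entry; u was already finalized
        done.add(u)
        for v, w in graph.get(u, []):
            nd = d + w
            if v not in dist or nd < dist[v]:
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

# Edges taken from the adjacency-list example: a->c (8), a->d (4),
# c->b (6), d->c (2), d->b (10).
graph = {
    "a": [("c", 8), ("d", 4)],
    "b": [],
    "c": [("b", 6)],
    "d": [("c", 2), ("b", 10)],
}
dist = dijkstra(graph, "a")
```

Here the direct edge a->c costs 8, but the route a->d->c costs 4 + 2 = 6, so the greedy choice of always finalizing the closest unfinished vertex finds the cheaper route.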
[Figure: worked example of Dijkstra's algorithm, processing vertices B, D, and E in turn]
K4,4
[Figure: table with columns A-F and rows 1-3 marked with x's, shown with a graph and the prompt "Colors?"]
[Figure: diagram with regions labeled area b, area c, and area d]
This one is in P!
[Figure: 26-ary trie; each node has one child slot per letter, a through z; example word: CAB]
Graph traversal
There are many strategies for solving graph problems. For
many problems, the efficiency and accuracy of the solution boil
down to how you search the graph.
As an example, we will consider a travel problem:
Given the graph below, find a path from vertex a to vertex d.
Shorter paths (in terms of edge weight sums) are desirable.
A greedy approach
greedy traversal: Starting with the root node, take the edge
with the smallest weight. Mark the edge so that you never attempt
to use it again. If you reach the end node, great! If you reach a dead
end, back up one decision and try the next-best edge.
Advantages: Fast!
Drawbacks: The answer is usually non-optimal.
For some problems, greedy approaches are optimal; for others,
the answer is usually close to the best; for yet other problems,
the greedy strategy is a poor choice.
Start node: a
End node: d
Traversal order: a, c, f, e, b, d
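The greedy traversal described above can be sketched as a recursive search that always tries the lightest unused edge first and backs up on dead ends. The function name `greedy_path` and the small directed graph are hypothetical; the slide's actual graph is not recoverable, so this example does not reproduce its traversal order.

```python
def greedy_path(graph, start, goal):
    """Greedy traversal with backtracking.
    graph: dict mapping vertex -> list of (neighbor, weight).
    Returns the first path found from start to goal, or None."""
    used = set()  # edges already tried; never attempted again

    def visit(v, path):
        if v == goal:
            return path
        # Try outgoing edges from lightest to heaviest.
        for w, nbr in sorted((w, n) for n, w in graph.get(v, [])):
            e = (v, nbr)
            if e in used:
                continue
            used.add(e)
            result = visit(nbr, path + [nbr])
            if result is not None:
                return result
        return None  # dead end: the caller tries its next-best edge

    return visit(start, [start])

# Hypothetical graph: the lightest edge out of a leads to a dead end.
graph = {
    "a": [("b", 4), ("c", 2)],
    "b": [("d", 3)],
    "c": [("f", 1)],
    "f": [],
}
path = greedy_path(graph, "a", "d")
```

The search first follows a, c, f, hits a dead end, and backs up to take the edge to b. Because every edge is marked after one attempt, the search always terminates, but the path it returns is simply the first one found, not necessarily the shortest.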
Traversal order: a, b, c, d, e, f
DFS (G, v) {
    v.state = visited
    Process vertex v
    Foreach edge (v,w) {
        if w.state = unseen {
            DFS (G, w)
            process edge (v,w)
        }
    }
}
Traversal order: a, b, d, e, f, c
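The DFS pseudocode maps closely onto a recursive Python function. In this sketch the graph is a hypothetical adjacency dictionary, chosen so that it happens to produce the traversal order listed above; the slide's actual graph is not recoverable, and the `state` and `order` parameters are illustrative names.

```python
def dfs(graph, v, state=None, order=None):
    """Depth-first search from v; returns vertices in the order visited.
    graph: dict mapping vertex -> list of neighbors."""
    if state is None:
        state = {}
    if order is None:
        order = []
    state[v] = "visited"
    order.append(v)                      # process vertex v
    for w in graph.get(v, []):
        if state.get(w, "unseen") == "unseen":
            dfs(graph, w, state, order)  # descend before trying siblings
    return order

# Hypothetical graph, not the one from the slide's figure.
graph = {
    "a": ["b", "c"],
    "b": ["d"],
    "c": [],
    "d": ["e"],
    "e": ["f"],
    "f": [],
}
order = dfs(graph, "a")
```

For very deep graphs the recursion can exceed Python's default recursion limit; an explicit stack is a common alternative in that case.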
Traversal order:
Path    Current   Best
A       0
AE      2
AEB     6
AEBD    11        11
AEF     9         11
AEFC    15        11
AC      1         11
Binary trees have at most two children per node (a child may be null).
Binary search trees are organized so that each node has a label.
When searching for or inserting a value, compare the target value to each node;
one outgoing edge corresponds to "less than" and the other outgoing edge
corresponds to "greater than".
On average, you eliminate 50% of the search space per node if the tree
is balanced.
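The search and insert procedure described above can be sketched as follows. The class and function names are assumptions for illustration; the keys are the node labels that appear in the slide's figure, but the insertion order used here is a guess and may not reproduce the figure's tree shape.

```python
class BSTNode:
    def __init__(self, key):
        self.key = key
        self.left = None   # subtree of keys less than self.key
        self.right = None  # subtree of keys greater than self.key

def insert(root, key):
    """Insert key into the tree rooted at root; returns the (new) root."""
    if root is None:
        return BSTNode(key)
    if key < root.key:
        root.left = insert(root.left, key)
    elif key > root.key:
        root.right = insert(root.right, key)
    return root  # duplicate keys are ignored

def contains(root, key):
    """Follow one outgoing edge per node until key is found or a null child is hit."""
    while root is not None:
        if key == root.key:
            return True
        root = root.left if key < root.key else root.right
    return False

# Keys from the slide's figure; insertion order is an assumption.
root = None
for k in [5, 3, 8, 2, 6, 10, 1, 7]:
    root = insert(root, k)
```

Each comparison in `contains` discards one subtree, which is where the "eliminate 50% per node" estimate comes from when the tree is balanced; this sketch does no rebalancing, so sorted input would degrade it to a linked list.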
[Figure: example binary search tree with node labels 1, 2, 3, 5, 6, 7, 8, 10]