
SIGPLAN Notices 1970 July

Control Flow Analysis


Frances E. Allen
IBM CORPORATION

INTRODUCTION
Any static, global analysis of the expression and data relationships
in a program requires a knowledge of the control flow of the program.
Since one of the primary reasons for doing such a global analysis in a
compiler is to produce optimized programs, control flow analysis has
been embedded in many compilers and has been described in several
papers. An early paper by Prosser [5] described the use of Boolean
matrices (or, more particularly, connectivity matrices) in flow
analysis. The use of "dominance" relationships in flow analysis was
first introduced by Prosser and much expanded by Lowry and Medlock [6].
References [6,8,9] describe compilers which use various forms of
control flow analysis for optimization. Some recent developments in
the area are reported in [4] and in [7].
The underlying motivation in all the different types of control flow
analysis is the need to codify the flow relationships in the program.
The codification may be in connectivity matrices, in
predecessor-successor tables, in dominance lists, etc. Whatever the
form, the purpose is to facilitate determining what the flow
relationships are; in other words, to facilitate answering such
questions as: is this an inner loop? if an expression is removed from
the loop, where can it be correctly and profitably placed? which
variable definitions can affect this use?
In this paper the basic control flow relationships are expressed in a
directed graph. Various graph constructs are then found and shown to
codify interesting global relationships.
The first section of the paper, "Basic Concepts," is primarily a
catalog of relevant information about directed graphs; similar
information can be found in any introductory material on the subject.
(Reference [2], for example, covers this material.) The use of
directed graphs to express control flow relationships is also given
in the first section.
In the second section of the paper, "Dominance Relationships" are
defined in terms of the basic concepts introduced in the first
section. Most of the concepts in this section have, as previously
mentioned, appeared in the literature before [1,5,6].
The third section, "Intervals," discusses a graph construct
defined by Dr. John Cocke and described in [3] and [4]. In this
section intervals are defined, a procedure is given for their
construction, and their properties are given in a series of
assertions. Also discussed are procedures for finding other graph
constructs in terms of the interval constructs.
The fourth section, "Partitioning Graphs by Intervals," describes
a hierarchical sequence of graph partitions by means of intervals.
The last section before the summary gives a procedure and an
example of "The Use of the Interval Construct in Global Analysis."

BASIC CONCEPTS
A directed graph, $G$, can be denoted by $G = (B,E)$ where $B$ is the
set of nodes (blocks) $\{b_1, b_2, \ldots, b_n\}$ in the graph and $E$
is the set of directed edges $\{(b_i,b_j), (b_k,b_l), \ldots\}$. Each
directed edge is represented by an ordered pair $(b_i,b_j)$ of nodes
(not necessarily distinct) which indicates that a directed edge goes
from node $b_i$ to node $b_j$. Thus, there exists a successor function
$\Gamma_G$ which maps $G$ into $G$ such that
$\Gamma_G(b_i) = \{b_j \mid (b_i,b_j) \in E\}$. We call this set the
set of immediate successors of a node. It may be empty. The inverse of
the successor function, $\Gamma_G^{-1}$, gives the immediate
predecessors of a node:
$\Gamma_G^{-1}(b_j) = \{b_i \mid (b_i,b_j) \in E\}$. It too may be
empty.
A directed graph is connected if any node in the graph can be
obtained (reached) from any other node by successive applications of
$\Gamma_G$ and/or $\Gamma_G^{-1}$. We will assume throughout this
paper that the graphs being discussed are both directed and connected.
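The successor function and its inverse can be held as two lookup tables built from the edge set. The following is only a minimal sketch: the function names and the four-node graph are illustrative assumptions, not part of the paper.

```python
from collections import defaultdict

def successor_maps(edges):
    """Return (succ, pred) tables for a set of directed edges (b_i, b_j)."""
    succ = defaultdict(set)   # plays the role of the successor function
    pred = defaultdict(set)   # plays the role of its inverse
    for b_i, b_j in edges:
        succ[b_i].add(b_j)
        pred[b_j].add(b_i)
    return succ, pred

def is_connected(nodes, edges):
    """True if every node can be reached from every other node by
    following edges forwards and/or backwards."""
    succ, pred = successor_maps(edges)
    nodes = set(nodes)
    if not nodes:
        return True
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        b = stack.pop()
        for nxt in succ[b] | pred[b]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen == nodes

# Hypothetical graph, not one of the paper's figures.
edges = {("a", "b"), ("b", "c"), ("c", "b"), ("a", "d")}
succ, pred = successor_maps(edges)
```

Here `succ["d"]` is empty because node `d` has no immediate successors, and `pred["a"]` is empty because `a` has no immediate predecessors.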
Before introducing more graph concepts, the relevance of graphs
to program control flow is introduced.
A basic block is a linear sequence of program instructions having
one entry point (the first instruction executed) and one exit point
(the last instruction executed). It may of course have many
predecessors and many successors and may even be its own successor. Program
entry blocks might not have predecessors that are in the program;
program terminating blocks never have successors in the program.
A control flow graph is a directed graph in which the nodes repre-
sent basic blocks and the edges represent control flow paths. Every-
thing that is said about directed graphs in this paper holds for
control flow graphs.
A subgraph of a directed graph, $G = (B,E)$, is a directed graph
$G' = (B',E')$ in which $B' \subseteq B$, $E' \subseteq E$,
$G \cap G' = G'$ and $G \cup G' = G$. Furthermore, the successor
function $\Gamma_{G'}$ defined for $G'$ must "stay within" $G'$; that
is, for $b'_i \in B'$,
$\Gamma_{G'}(b'_i) = \{b'_j \mid (b'_i,b'_j) \in E'\}$.
Consider the following directed graph, $G$, in which the nodes have
been arbitrarily named by numbering.

[Fig. 1: the directed graph G; drawing not reproduced]

One of the many subgraphs in $G$ is $G' = (B',E')$ in which
$B' = \{2,3,4,5\}$ and $E' = \{(2,3),(2,4),(3,5),(4,5),(5,2)\}$.
$G'$ can be depicted by:

[drawing of G' not reproduced]

A path in a directed graph is a directed subgraph, $P$, of ordered
nodes and edges obtained by successive applications of the successor
function. It is expressed as a sequence of nodes
$(b_1,b_2,\ldots,b_n)$ where $b_{i+1} \in \Gamma_G(b_i)$. The edges
are implied: $(b_i,b_{i+1}) \in E$. The nodes and the implied edges
are not necessarily unique. A path in $G$, the graph in Fig. 1, is
(2,3,5,3,5,2,4). It should be observed that the examples show how
some of the developed notation is to be used: the nodes of the graph
are arbitrarily but uniquely named and $b$ stands for any such name.
A node, $q$, is said to be a successor of a node, $p$, if there
exists some path $P = (b_1,\ldots,b_n)$ for which $b_1 = p$ and
$b_n = q$. In the same situation $p$ is said to be a predecessor of
$q$. It should be noted that a node can be both a predecessor and a
successor of another node: $P_1 = (p,\ldots,q)$ and
$P_2 = (q,\ldots,p)$.
A closed path or circuit is a path in which $b_n = b_1$. The circuit
is a simple circuit if, with the exception of $b_n$, the nodes in the
circuit are distinct; otherwise it is a composite circuit. Consider
the graph in Fig. 1: it has the following simple circuits: (3,5,3),
(5,3,5), (2,3,5,2), (3,5,2,3), (5,2,3,5), (2,4,5,2), (4,5,2,4),
(5,2,4,5), (7,7). One of the composite circuits is (2,3,5,3,5,2).
Since it will generally be uninteresting to consider circuits
containing the same nodes and edges but in a different order, we will
generally select a first node and describe the circuit relative to
that node.
The length of a path is the number of edges in the sequence.
More formally, a distance function $\delta$ is defined such that for
any path $P = (b_1,b_2,\ldots,b_n)$, $\delta(P) = n-1$. Since the
shortest path $\delta_{\min}$ between two points $p$ and $q$ is often
of interest it will now be defined:
$\delta_{\min}(p,q) = \min(\delta(P_1), \delta(P_2), \ldots)$ for all
$P_i = (p,\ldots,q)$. The shortest path then is the $P_i$ for which
$\delta(P_i) = \delta_{\min}(p,q)$.
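The minimum distance can be computed without enumerating every path. The sketch below uses a breadth-first search, which is a standard technique chosen here for illustration and is not taken from the paper; the function name `delta_min` is likewise an assumption. The edge set is that of the subgraph G' given earlier.

```python
from collections import deque

def delta_min(succ, p, q):
    """Number of edges on a shortest path (p, ..., q), or None if q
    cannot be reached from p. The one-node path (p) has length 0."""
    dist = {p: 0}
    queue = deque([p])
    while queue:
        b = queue.popleft()
        if b == q:
            return dist[b]
        for nxt in succ.get(b, ()):
            if nxt not in dist:          # first visit gives the shortest distance
                dist[nxt] = dist[b] + 1
                queue.append(nxt)
    return None

# Immediate successors in G', from E' = {(2,3),(2,4),(3,5),(4,5),(5,2)}.
succ_g_prime = {2: {3, 4}, 3: {5}, 4: {5}, 5: {2}}
```

For instance, `delta_min(succ_g_prime, 3, 4)` follows the path (3,5,2,4) and so returns 3.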
A strongly connected region of a directed graph is a directed
subgraph in which there is a path from any node in the subgraph to
any other node. It immediately follows from this definition that
every node lies on at least one closed path, and is, therefore, its
own predecessor and its own successor. Closed paths (circuits) are,
therefore, a special kind of strongly connected region -- one which
has a strict ordering. A strongly connected region $R$ of a directed
graph $G$ is maximal if there does not exist another strongly connected
region, $R'$, in $G$ for which $R \cap R' = R$. A properly nested set of
strongly connected regions is a partially ordered set
$d = \{R_1, R_2, \ldots, R_n\}$ such that for $i < j$ either
$R_i \cap R_j = \emptyset$ or $R_i \cap R_j = R_i$, i.e., either $R_i$
and $R_j$ are disjoint or $R_j$ covers $R_i$.
The use of a nested set of strongly connected regions in control
flow analysis for optimization was first suggested in [1]. In that
approach to control flow analysis, a set, $D$, of disjoint sets of
nested strongly connected regions is found:

$D = \{\{R_1,R_2,\ldots,R_n\}, \{R'_1,R'_2,\ldots,R'_n\}, \ldots\}$

or, for the sake of brevity, $D = \{d,d',\ldots\}$. Each $R_n$ is a
maximal, strongly connected region which thereby assures that sets of
nested strongly connected regions are disjoint. We will now consider
some of the properties of the above construct in a directed graph, $G$:
1. $D$ does not necessarily cover $G$. If there are nodes in $G$
which are not in any strongly connected region then they will not
be in $D$.
2. Each $d \in D$ is partially ordered.
3. $D$ is unordered.
4. If a node, $p$, is an element of a strongly connected region
it is in one and only one $d$. For $p \in d$ where
$d = \{R_1,R_2,\ldots,R_n\}$ then $p \in R_n$ and may be an element of
several nested $R_i$.
As an example consider the graph in Figure 1:
$D = \{\{(3,5),(2,3,4,5)\},\{7\}\}$. Since much of control flow analysis
involves knowing relationships between nodes in the control flow
graph, the construct, $D$, codifies several useful relationships.
However it has several limitations, the more serious of which are
that it does not establish an ordering on the total graph and that,
by the very nature of a general strongly connected region, there is
no ordering relationship on the nodes within the region other than
that given by the immediate successor-predecessor relationships.

DOMINANCE RELATIONSHIPS
Several interesting and useful constructs can be established
from "back dominance" and "forward dominance" relationships. Before
defining these relationships two special kinds of nodes must be
defined. A node in a directed graph, $G$, which has no successors in
$G$ is called a terminal or exit node. Thus, letting $x$ denote an exit
node, $\Gamma_G(x) = \emptyset$. This definition suffices for control
flow graphs but, since a program entry point may also be the first
node in a closed path and thereby have a predecessor, an analogous
definition for entry nodes does not suffice. A node, $e$, in the
program control flow graph, $C$, is an entry node if it contains a
program entry point. Several of the constructs about to be described
depend upon having only one such node in the control flow graph. An
arbitrary initial entry node $e_0$ is introduced into the control flow
graph as an immediate predecessor of all entry nodes:

$\Gamma_C(e_0) = \{e_i \mid e_i \text{ is an entry node}\}$ and $\Gamma_C^{-1}(e_0) = \emptyset$.

Since $e_0$ essentially represents the set of all external program
predecessors of the entry points, the control flow graph has not
been invalidated. Having modified the control flow graph to contain
$e_0$, it is possible to view the control flow graph as a directed graph
with one initial node, where an initial node is a node with no
predecessors.
Any reference to a graph in the remainder of this paper will be
to a connected, directed graph with a single entry node, $e_0$, and a
set of exit nodes $X = \{x_1,x_2,\ldots\}$. Having established entry and
exit nodes, we can now define the dominance relationships which exist
in a directed graph and are of interest in control flow analysis. (For
information on their role in optimization, the reader should look at
reference [6].)
A node, $b_i$, is said to back dominate or predominate a node, $b_k$,
if $b_i$ is on every path from $e_0$ to $b_k$. Let
$\mathcal{P} = \{P \mid P = (e_0,\ldots,b_k)\}$. Then the set of back
dominators $BD(b_k)$ of $b_k$ consists of blocks, other than $b_k$
itself, which are on all paths from $e_0$ to $b_k$. In other words
$BD(b_k) = \{b_i \mid b_i \neq b_k \text{ and } b_i \in \bigcap_{P \in \mathcal{P}} P\}$.

The immediate back dominator of node $b_k$ is the back dominator which
is "closest" to $b_k$; that is, among all $b_j$ in $BD(b_k)$, $b_i$ is
the node for which

$\delta_{\min}(b_i,b_k) = \min_{b_j \in BD(b_k)} \delta_{\min}(b_j,b_k)$.


It can now be shown that there is one and only one immediate
back dominator of a node $b_k \neq e_0$. For suppose that there were two
such nodes: $b_i$ and $b'_i$. Then
$\delta_{\min}(b_i,b_k) = \delta_{\min}(b'_i,b_k)$. But this can only
occur if $b_i$ and $b'_i$ are on separate paths or if $b_i = b'_i$.
Since a back dominator must be on every path, $b_i$ must equal $b'_i$.
Furthermore there must be at least one back dominator, $e_0$, since
$e_0$ is on every $P \in \mathcal{P}$.
Another interesting observation which can be made is that the
set of back dominators $BD(b_k)$ of node $b_k$ is strictly ordered by
the minimum distance function $\delta_{\min}$. This follows from the
previous paragraph since, if $b_i$ is the immediate back dominator of
$b_k$ and if $b_i \neq e_0$, $b_i$ must have one and only one immediate
back dominator. The set of back dominators of node $b_k$ can be
represented by

$BD(b_k) = (b_1,b_2,b_3,\ldots,b_j)$ where $b_1 = e_0$,

$b_i$ is the immediate back dominator of $b_{i+1}$, and $b_i$ is a back
dominator of all $b_l$, $i < l \leq j$.
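The back dominator sets can be computed for a whole graph by repeatedly intersecting over immediate predecessors until nothing changes. This is a sketch of that standard iterative technique, not a procedure given in the paper; the function names and the small graph are hypothetical, and the code assumes every node other than `e0` has at least one immediate predecessor.

```python
def back_dominators(succ, e0):
    """Map each node to its set of back dominators (itself excluded)."""
    nodes = set(succ) | {b for targets in succ.values() for b in targets}
    pred = {b: set() for b in nodes}
    for b, targets in succ.items():
        for t in targets:
            pred[t].add(b)
    # Start from the largest candidate sets and shrink to a fixed point.
    bd = {b: nodes - {b} for b in nodes}
    bd[e0] = set()
    changed = True
    while changed:
        changed = False
        for b in nodes - {e0}:
            new = set.intersection(*({p} | bd[p] for p in pred[b])) - {b}
            if new != bd[b]:
                bd[b] = new
                changed = True
    return bd

def immediate_back_dominator(bd, b):
    """Closest back dominator of b; None for e0, which has none."""
    if not bd[b]:
        return None
    # The back dominators of b are strictly ordered, so the closest one
    # is the one that itself has the most back dominators.
    return max(bd[b], key=lambda d: len(bd[d]))

# Hypothetical graph: a diamond 1 -> {2,3} -> 4 with a back edge 4 -> 1.
succ = {"e0": [1], 1: [2, 3], 2: [4], 3: [4], 4: [1, 5]}
bd = back_dominators(succ, "e0")
```

In this graph node 4 is reached through either 2 or 3, so neither of them back dominates it; its back dominators are `e0` and 1.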
A node $b_i$ is said to forward dominate or post dominate a node
$b_k$ if $b_i$ is on every path from $b_k$ to all exit nodes. By
introducing a node $x_0$ into the graph such that
$\Gamma_G^{-1}(x_0) = X$, the set of exit nodes defined earlier, the
set of forward dominance relationships analogous to the back dominance
relationships can be developed. Because the development so closely
parallels that for back dominance it will not be given. Suffice it to
say that the set of forward dominators, $FD(b_k)$, of node $b_k$ can be
expressed by

$FD(b_k) = (b_1,b_2,\ldots,b_j)$ where $b_j = x_0$, $b_1$ is the
immediate forward dominator of $b_k$, and for all $i$, $1 < i \leq j$,
$b_i$ is the immediate forward dominator of $b_{i-1}$.
An articulation node in a graph is a node which lies on every
entry-exit path. Thus for any graph with a single entry point, $e_0$,
the forward dominators of $e_0$ are, together with $e_0$, the
articulation nodes of the graph.

INTERVALS
Given a node $h$, an interval $I(h)$ is the maximal, single entry
subgraph for which $h$ is the entry node and in which all closed paths
contain $h$. The unique interval node $h$ is called the interval head or
simply the header node. An interval can be expressed in terms of the
nodes in it:

$I(h) = (b_1,b_2,\ldots,b_n)$; any edge $(b_i,b_j)$ for $b_i$ and
$b_j \in I(h)$ is implicitly in $I(h)$.
By selecting the proper set of header nodes, a graph may be
partitioned into a unique set of intervals. (A partition of a graph
$G$ is a set of subgraphs $g_1,g_2,\ldots,g_n$ such that
$g_i \subseteq G$, $\bigcup_i g_i = G$ and for all $i \neq j$,
$g_i \cap g_j = \emptyset$. Thus a graph partition covers the
original graph with a set of disjoint subgraphs.) A procedure for
partitioning a graph, $G$, into a unique set of intervals is now given:
Procedure A.
1. Establish a list $H$ for header nodes and initialize it to $e_0$.
2. For $h \in H$ find $I(h)$ as follows.
2.1 Put $h$ in $I(h)$ as the first element of $I(h)$.
2.2 For any $b \in G$ for which $\Gamma_G^{-1}(b) \subseteq I(h)$ add
$b$ to $I(h)$. Thus a node is added to an interval if and only if all
of its immediate predecessors are already in the interval.
2.3 Repeat 2.2 until no more nodes can be added to $I(h)$.
3.1 Add to $H$ all nodes in $G$ which are not already in $H$ and which
are not in $I(h)$ but which have immediate predecessors in $I(h)$.
Therefore a node is added to $H$ the first time any (but not all) of its
immediate predecessors are members of an interval.
3.2 Add $I(h)$ to the set of intervals being developed.
4. Select the next unprocessed node in $H$ and repeat steps 2,3,4.
If there are no more unprocessed nodes in $H$, the procedure terminates.
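The steps of Procedure A can be sketched as follows. This is one possible reading rather than the author's program: the name `intervals`, the dictionary representation, and the eight-node graph are assumptions made for illustration. Candidates for step 2.2 are taken only from the immediate successors of nodes already in the interval.

```python
def intervals(succ, e0):
    """Partition a graph into intervals; returns header -> ordered node list."""
    nodes = set(succ) | {b for targets in succ.values() for b in targets}
    pred = {b: set() for b in nodes}
    for b, targets in succ.items():
        for t in targets:
            pred[t].add(b)

    headers = [e0]                       # step 1: the list H
    found = {}
    position = 0
    while position < len(headers):       # step 4: next unprocessed header
        h = headers[position]
        position += 1
        interval, members = [h], {h}     # step 2.1
        i = 0
        while i < len(interval):         # steps 2.2 and 2.3
            for b in sorted(succ.get(interval[i], ())):
                # Add b once all of its immediate predecessors are members.
                if b not in members and pred[b] <= members:
                    interval.append(b)
                    members.add(b)
            i += 1
        for m in interval:               # step 3.1: new header nodes
            for b in sorted(succ.get(m, ())):
                if b not in members and b not in headers:
                    headers.append(b)
        found[h] = interval              # step 3.2
    return found

# Hypothetical graph with entry node 1; it is not the graph of any figure.
succ = {1: [2], 2: [3], 3: [4, 5], 4: [6], 5: [6], 6: [3, 7], 7: [8], 8: [7]}
parts = intervals(succ, 1)
```

Node 3 cannot join the first interval because its immediate predecessor 6 is not in it, so 3 becomes a header; node 7 becomes a header for the same reason with respect to node 8.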
Before giving an example and before discussing the properties of
the graph partition constructed by the above procedure, a few comments
on the procedure itself may be of interest. In a program written by
the author to implement this procedure, indicators were left on each
node as to whether or not it was in $H$, and if not in $H$, a count was
kept of the number of times it had been looked at during the
development of the current interval. This latter count was kept
because, once a block is added to the current interval, only its
immediate successors are candidates for addition to the interval. Thus
a quick comparison of the number of actual predecessors against the
number of times the node is visited as a successor of interval nodes
determined whether or not it could become a member of the current
interval. Using such techniques an edge in the graph will never
be traversed more than once. Thus the execution time for the
procedure is directly proportional to the number of edges in the
graph.
The following example illustrates the partitioning of a graph
into intervals:

[Figure 2: a graph with entry node $e_0$ and an exit node, shown beside
its intervals; drawing not reproduced. The naming of the nodes is, as
usual, arbitrary.]

Intervals
    I(2) = 2
    I(3) = 3,4,5,6
    I(7) = 7,8
It will now be shown that the procedure given does indeed
produce a set of intervals each of which satisfies the definition for
an interval. It will later be shown that they collectively provide a
unique partition of the graph. Thus we need to show that any $I(h)$ is
maximal, is single entry, and that all closed paths in $I(h)$ contain $h$.
Assertion 1. $I(h)$ has only one possible entry node, $h$. Suppose
there was another node $b \in I(h)$, $b \neq h$, which was also an entry
node. Then $b$ must have at least one immediate predecessor which is not
in $I(h)$. But this is impossible since $b$ became a member of the
interval only when all of its immediate predecessors were already
interval members. Hence there can be only one possible entry node. It
should be further noted that $h \neq e_0$ will have at least one
predecessor outside the interval since by step 3 in the procedure it
became a header node because it had a predecessor in an interval to
which it did not belong.
Assertion 2. All closed paths in $I(h)$ contain $h$. Suppose there
is a closed path $P = (b_1,b_2,\ldots,b_n,b_1)$ which does not contain
$h$. By the notational definition established for paths, $b_{i-1}$ is an
immediate predecessor of $b_i$. Hence $b_i$ cannot become a member of
$I(h)$ until $b_{i-1}$ is a member. Also $b_1$ cannot become a member
until $b_n$ does, and $b_n$ cannot become a member until $b_{n-1}$ does,
etc. No node of $P$ can therefore be the first to join $I(h)$, so $P$
cannot lie in $I(h)$. Therefore all closed paths in $I(h)$ must contain
$h$.
Assertion 3. $I(h)$ is maximal. This follows from step 2.3: nodes
are added to $I(h)$ until no more can be.

Some properties of intervals which result from the construction
in procedure A are now given.
Assertion 4. The header node of an interval back dominates every
node in the interval. Since by Assertion 1, the only possible entry
to an interval is through the header node, the header node must lie in
every path from $e_0$ to any block in the interval.
A somewhat restricted successor function $L_I$ is now defined for
the interval $I(h)$: a local successor function, $L_I(b_i)$, is defined
for $I(h)$ such that for $b_i \in I(h)$, $L_I(b_i)$ is the set of all
immediate successors of $b_i$ which are in the interval but are not the
header node. In other words

$L_I(b_i) = \{b_j \mid b_j \in \Gamma_I(b_i) \text{ and } b_j \neq h\}$.

The local predecessor function is the inverse of $L_I$, i.e. is
$L_I^{-1}$, in which $L_I^{-1}(h) = \emptyset$.
Using the local successor function a special type of interval
path can be defined: a forward path is a path $F = (b_1,b_2,\ldots,b_n)$
where $b_{i+1} \in L_I(b_i)$. It can be noted that all nodes on all
forward paths from $h$ to any node in $I(h)$ are also in $I(h)$.
Assertion 5. The nodes in an interval are partially ordered by
the local successor function. Given an interval
$I(h) = (b_1(=h),b_2,b_3,\ldots,b_n)$, if $i < j$ then either $b_i$ is a
predecessor of $b_j$ on some forward path or $b_i$ and $b_j$ do not
co-exist on any forward path. This follows from the fact that, with the
exception of $h$, all immediate predecessors of a node must be interval
members before the node can become a member.
Assertion 6. The relative ordering of the nodes in a back
dominator list and the nodes in an interval must be the same. If $b_i$
is a back dominator of $b_j$ and both are in an interval $I(h)$, clearly
$b_i$ must precede $b_j$ in the interval list because it is impossible
to reach $b_j$ without having first reached $b_i$.
Assertion 7. For any interval member $b_k \neq h$ with back dominator
list $BD(b_k) = (b_1(=e_0),b_2,\ldots,b_j)$, then for $b_i = h$,
$b_i \in BD(b_k)$ and all blocks $b_l$ following $b_i$ on the back
dominator list ($i < l \leq j$), $b_l$ is a member of the interval. A
back dominator $b_l$ must be on all paths from $e_0$ to $b_k$. Since it
follows $h$ on the back dominator list it must be on all paths and,
hence all forward paths, from $h$ to $b_k$. Therefore it must be an
interval member.
Assertion 8. Any strongly connected region in an
interval must contain the interval head. This follows immediately
from the fact that all closed paths in $I(h)$ must contain $h$. An
interval cannot, therefore, contain disjoint strongly connected
regions.
Assertion 9. If an interval contains a strongly connected
region then there exists a path from every node in the region to
every node in the interval. Since the header node both back
dominates every node in the interval and is in the strongly connected
region, and since there is a path from any node in a strongly
connected region to any other node, there must be a path from every
node in the region to every node in the interval. Unless the entire
interval is strongly connected, it will not be the case that there
exists a path from every node in the interval to every node in the
region.
For example in interval $I(3)$ in Figure 2, there are paths from
node 4 to every node in the interval.
As a consequence of 9, it should be noted that there can be a
path from $b_j$ to $b_i$ when $b_i$ precedes $b_j$ on the interval list.
If there is such a path then $b_j$ must be in the strongly connected
region. It is still true however that there does not exist a forward
path in which $b_j$ is a predecessor of $b_i$.
Consider the following interval with header node 1 and exit
node 6.

[Figure 3: an interval with header node 1 and exit node 6; assume
$I(1) = (1,2,5,3,4,6)$. Drawing not reproduced.]

Assume the order in which nodes have become interval members results
in $I(1) = (1,2,5,3,4,6)$. Clearly there are paths from 4 to all of its
predecessors in the interval list.
Before making the next assertion, the meaning of "interval exit node"
needs to be more carefully defined: an interval exit node is any node
in an interval, $I(h)$, which either has no successors (i.e. is a
terminal node for the entire graph) or has at least one immediate
successor which is not in $I(h)$.
Assertion 10. The interval header is an articulation node for the
interval. Since the header node is the only entry to the interval it
must be on every entry-exit path. An interval header is not
necessarily an articulation node for the total graph.
Assertion 11. All forward dominators of the interval header
which are also interval members are, along with the header, the
articulation nodes for the interval. This assertion can be shown by
exactly the same reasoning that led us to assert that the forward
dominators of $e_0$ are the articulation nodes, and together with $e_0$,
the only articulation nodes of the total graph.
The articulation nodes of the interval in Figure 3 are 1, 4 and 6
since they are on every entry-exit path.
In certain applications a special graph construct called a
"two-terminal subgraph" may be of interest. Defined in terms of
intervals, a two-terminal subgraph is an interval with one exit node.
Since an interval can have only one entry node the motivation for the
term should be apparent. The interval in Figure 3 is an example of a
two-terminal subgraph.
Procedures will now be given for finding the strongly connected
region in an interval, the articulation nodes of the interval, and,
for each node in the interval, the list of interval nodes which back
dominate it. These procedures can be embedded in procedure A, thereby
generating not only the intervals but their interval relationships in
"one pass" through the edges in the graph.
Up to this point in the paper it has been completely satisfactory
to represent members of a set in terms of a list of the elements in it.
For example $I(h) = (b_1,b_2,\ldots,b_n)$ where $b_i$ represents the
name of the node in position $i$ in the interval. Although we will
continue to use the list form of representation in the procedures to be
described, another form could be introduced which more directly
suggests the relationships involved as well as a possible
implementation approach. A bit vector notation could be used in which,
for a given interval, $I(h) = (b_1,b_2,\ldots,b_n)$, bit position $i$
represents node $b_i$. By remembering the correspondence between bit
positions and node names no information is lost. Boolean operations
rather than the set operations shown could then be used. Also the
relative order of the nodes in the interval is automatically kept by
the bit vector positions. Since it would complicate the exposition, the
bit vector form will not be used in describing the procedures.
The next procedure generates a back dominator list, $BD(b_i)$, for
each node $b_i$ in the interval. Each back dominator list as generated
is unordered. Since, however, the relative ordering of nodes in the
interval can be used (by Assertion 6) to order the nodes in the back
dominator list, the correct ordering can be determined. By using the
bit vector representation, the ordering is kept automatically. If, in
that representation, a bit is one in the back dominator vector if and
only if the block represented by that position is a back dominator,
then the rightmost one bit in the vector represents the immediate
back dominator.
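The bit vector form can be illustrated with ordinary integers. In this sketch bit position $i$ is counted from the left, so the last node of the interval list occupies the rightmost bit; the helper names and the two sample sets are assumptions made for illustration, and the interval list is the one assumed for Figure 3.

```python
def to_bits(interval, members):
    """Bit vector for `members`; the i-th bit from the left stands for interval[i]."""
    n = len(interval)
    bits = 0
    for i, b in enumerate(interval):
        if b in members:
            bits |= 1 << (n - 1 - i)
    return bits

def rightmost_member(interval, bits):
    """Node named by the rightmost one bit, or None for the zero vector."""
    if bits == 0:
        return None
    low = bits & -bits                 # isolate the lowest set bit
    return interval[len(interval) - low.bit_length()]

interval = (1, 2, 5, 3, 4, 6)          # I(1) as assumed for Figure 3
v = to_bits(interval, {1, 4})          # sample set, 0b100010
w = to_bits(interval, {1, 2})          # sample set, 0b110000
```

Set intersection and union become the Boolean `&` and `|` on these integers, and `rightmost_member(interval, v)` picks out node 4, the member of the sample set that comes latest in the interval list.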
Procedure B.
This procedure finds the back dominators of each node in an
interval.
1. Assign the interval head a back dominator list of zero.
2. For the next node, $b_j$, in the interval list (or for the
one just added if this procedure is embedded) form
$BD(b_j) = \bigcap_i (b_i \cup BD(b_i))$ for all nodes, $b_i$, which are
immediate predecessors of $b_j$.
3. Repeat 2 until all nodes in the interval have been
processed.
As an example consider the interval of Figure 3 for which
$I(h) = (1,2,5,3,4,6)$. The procedure generates the following back
dominator list for each node by the operations shown.

Nodes Immediate BD List for


(in order) Predecessors Operation for Each Node
i - (Assignment) 0
2 I I '..~'0 i
5 2 2 u I 1,2
3 i I L~O i
4 2,3 (2 ;' I) r'~ (3 ~.: i) i
6 4 4 ~./ i 1,4
Example i
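Procedure B can be sketched in a few lines of Python. This is a minimal illustration using ordinary sets rather than bit vectors; the function name `back_dominators` and the dictionary layout are assumptions made for the sketch, while the interval order and immediate predecessors are those listed in Example 1.

```python
def back_dominators(order, preds):
    """Sketch of Procedure B.

    order: nodes of one interval in interval order, head first.
    preds: immediate predecessors of every non-head node.
    """
    bd = {order[0]: set()}              # step 1: the head gets an empty list
    for node in order[1:]:              # steps 2 and 3: remaining nodes, in interval order
        terms = [{p} | bd[p] for p in preds[node]]
        bd[node] = set.intersection(*terms)
    return bd


ORDER = [1, 2, 5, 3, 4, 6]
PREDS = {2: [1], 5: [2], 3: [1], 4: [2, 3], 6: [4]}

for node, doms in back_dominators(ORDER, PREDS).items():
    print(node, sorted(doms))
```

Processing the nodes in interval order guarantees that `bd[p]` is already available for every immediate predecessor `p` when a node is reached.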
In the next procedure, C, the interval articulation nodes are
found by using the back dominators of interval exits. The result of
procedure C is a list, A, of articulation nodes for the interval.


Procedure C.
The interval articulation nodes are found by this one step
procedure.
1. $A = \bigcap_i \, (b_i \cup BD(b_i))$ for all $b_i$ which are interval exits.
Consider the interval of Figure 3 and the back dominator lists
given in Example 1. Since node 6 is the only interval exit,
$A = 6 \cup BD(6) = 6 \cup (1,4) = 1,4,6$. If node 5 were also an exit then
$A = [5 \cup (1,2)] \cap [6 \cup (1,4)]$ and the only articulation node would
be 1.
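Procedure C reduces to a single intersection, which the following sketch makes concrete. The function name `articulation_nodes` is illustrative; the `BD` table is copied from the "BD List for Each Node" column of Example 1.

```python
def articulation_nodes(bd, exits):
    """Sketch of Procedure C: intersect (b | BD(b)) over all interval exits b."""
    terms = [{b} | bd[b] for b in exits]
    return set.intersection(*terms)


# Back dominator lists from Example 1.
BD = {1: set(), 2: {1}, 5: {1, 2}, 3: {1}, 4: {1}, 6: {1, 4}}

print(sorted(articulation_nodes(BD, [6])))      # node 6 is the only exit
print(sorted(articulation_nodes(BD, [5, 6])))   # hypothetical case: node 5 is also an exit
```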
The next procedure, procedure D, finds all of the local
predecessors of a node.
Procedure D.
The local predecessors, $LP(b_i)$, for each node, $b_i$, in an interval
are found by:
1. Assign the interval head a local predecessor list of zero:
$LP(b_1) = 0$.
2. For the next node, $b_j$, in the interval list form
$LP(b_j) = \bigcup_i \, (b_i \cup LP(b_i))$ for all nodes $b_i$ which are
immediate predecessors of $b_j$.
3. Repeat step 2 until all nodes in the interval have been
processed.
Considering again the example in Figure 3 the following LP lists
are generated by procedure D.

Nodes   Imm. Pred.   Operation                        LP Lists
1       -            (Assignment)                     0
2       1            $1 \cup 0$                       1
5       2            $2 \cup 1$                       1,2
3       1            $1 \cup 0$                       1
4       2,3          $(2 \cup 1) \cup (3 \cup 1)$     1,2,3
6       4            $4 \cup (1,2,3)$                 1,2,3,4
Example 2
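Procedure D differs from Procedure B only in using union where B uses intersection. A minimal sketch, with the illustrative name `local_predecessors` and the interval order and immediate predecessors of Example 2:

```python
def local_predecessors(order, preds):
    """Sketch of Procedure D.

    order: nodes of one interval in interval order, head first.
    preds: immediate predecessors of every non-head node.
    """
    lp = {order[0]: set()}              # step 1: LP(head) is empty
    for node in order[1:]:              # steps 2 and 3
        lp[node] = set().union(*({p} | lp[p] for p in preds[node]))
    return lp


ORDER = [1, 2, 5, 3, 4, 6]
PREDS = {2: [1], 5: [2], 3: [1], 4: [2, 3], 6: [4]}

for node, local in local_predecessors(ORDER, PREDS).items():
    print(node, sorted(local))
```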
The next procedure, E, uses the results of procedure D for the
interval "latching" nodes to find the strongly connected region in
the interval. A latching node is any node in the interval which has
the header node as an immediate successor. An equivalent definition
for a latching node is that it is any node in the interval which is
an immediate predecessor of the interval head. In Figure 3 nodes 4
and 5 are latching nodes. It should be noted that the interval head
itself can be a latching node. From previous assertions it follows that
if the interval does not contain any latching nodes then the
interval does not contain a strongly connected region. The following
procedure then would be invoked only if the interval had at least
one latching node.


Procedure E.
The strongly connected region, SCR, of an interval can be found
by this one step procedure.
1. $SCR = \bigcup_i \, (b_i \cup LP(b_i))$ for all $b_i$ which are interval
latching nodes.
Using the results of Example 2, we get
$SCR = [4 \cup (1,2,3)] \cup [5 \cup (1,2)]$.
Therefore the strongly connected region of the interval in Figure 3 is
comprised of the nodes (1,2,3,4,5).
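As a small sketch of Procedure E, the union can be computed directly from the LP lists. The function name `strongly_connected_region` is illustrative; the `LP` table is copied from Example 2 and the latching nodes 4 and 5 are those named in the text.

```python
def strongly_connected_region(lp, latching):
    """Sketch of Procedure E: union of (b | LP(b)) over all latching nodes b."""
    return set().union(*({b} | lp[b] for b in latching))


# LP lists from Example 2.
LP = {1: set(), 2: {1}, 5: {1, 2}, 3: {1}, 4: {1, 2, 3}, 6: {1, 2, 3, 4}}

print(sorted(strongly_connected_region(LP, [4, 5])))
```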
Another procedure, E', for finding the strongly connected region
in an interval is to start from the latching nodes and iteratively
mark all immediate predecessors until the header node is reached and
marked. Whenever a marked predecessor is found in this procedure it
is not necessary to continue the marking of its immediate predecessors
since they will already have been marked. This procedure has the
advantage of not requiring that the LP lists be set up and is probably
preferable if the only use of LP lists is to find the strongly
connected region.
A formal description of procedure E' is not given; the above
informal description should adequately suggest such a description.
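One possible reading of the informal description of procedure E' is the backward marking sketch that follows. The paper gives no formal version, so the worklist formulation, the function name `scr_by_marking` and the choice to mark the head before starting are all assumptions of this sketch. The predecessor table is the one used in Examples 1 and 2.

```python
def scr_by_marking(head, preds, latching):
    """Mark backward from the latching nodes until the head is reached.

    Intended to be called only when the interval has at least one latching node.
    """
    marked = {head}                 # marking the head first stops the walk there
    worklist = list(latching)
    while worklist:
        node = worklist.pop()
        if node in marked:
            continue                # its predecessors have already been queued
        marked.add(node)
        worklist.extend(preds.get(node, []))
    return marked


PREDS = {2: [1], 5: [2], 3: [1], 4: [2, 3], 6: [4]}

print(sorted(scr_by_marking(1, PREDS, [4, 5])))
```

No LP lists are built, which matches the advantage claimed for E' in the text.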

PARTITIONING GRAPHS BY INTERVALS
Having considered the properties of any given interval, it will
now be shown that the set of intervals
$\mathcal{I} = \{I(h_1), I(h_2), \ldots, I(h_n)\}$
generated by procedure A forms, as asserted, a unique partition of
the graph G. Recalling the definition of a partition, we therefore
need to show that $\mathcal{I}$ covers G and that for any two intervals $I(h_i)$
and $I(h_j)$ in $\mathcal{I}$, $I(h_i) \cap I(h_j) = \emptyset$. Furthermore we want to show
that $\mathcal{I}$ is unique.
Assertion 12. $\mathcal{I}$ covers G. Suppose there is a $b \in G$ which is not
in any $I(h) \in \mathcal{I}$. Since G is a connected graph, b must either be $e_0$
or have at least one predecessor. But if $b = e_0$ it is an element
of $I(e_0)$, the first interval constructed. If $b \neq e_0$, then, since it
must have at least one predecessor, by step 2.2 it must become a
member of the interval containing the predecessor or must, by steps
3 and 4, become an interval head. (In order to establish that the
predecessors must be members of intervals we can recursively apply
the above reasoning until $b = e_0$.) Hence $\mathcal{I}$ covers G.
Assertion 13. The elements of $\mathcal{I}$ are disjoint, that is for any
$I(h)$ and $I(h')$, elements of $\mathcal{I}$, $I(h) \cap I(h') = \emptyset$.
Assertion 13a. The header nodes must be distinct, that is for
any h and h', $h \neq h'$. By step 3 in Procedure A a given node can
appear at most once in H and by step 4 a node in H can be processed
(used to head an interval) only once.
Assertion 13b. The header node of one interval cannot be an
element of another interval. Suppose interval head, h, is an element
of interval $I(h')$. By 13a, $h \neq h'$. Therefore all immediate
predecessors of h must, by step 2.2, also be in $I(h')$. But for h to have
become an interval head some but not all of its immediate predecessors
must have been members of an interval, say $I(h'')$. We will now show
that h'' must also be an element of $I(h')$. Consider an immediate
predecessor b of h which is in both intervals $I(h')$ and $I(h'')$. Then b
is back dominated by both h' and h'' and since the back dominators of a
block are strictly ordered either h' back dominates h'' or vice versa.
But since h has predecessors which are not in $I(h'')$, h'' cannot back
dominate h. We can therefore conclude that h' back dominates h''.
By assertions 6 and 7 h'' must be an element of $I(h')$. Proceeding
inductively h must eventually equal h' which by Assertion 13a is
impossible. Hence it is not possible for a header node to be an
element of another interval.
Assertion 13c. The intersection of any two intervals is null.
Suppose there is a $b \in I(h) \cap I(h')$. By assertions 13a and 13b we
know that b is not a header node of any interval including $I(h)$ and
$I(h')$. But for b to be in the intersection it must be a member of
each interval. Hence all of the immediate predecessors of b must be
in both intervals and, as a consequence, they must also be in the
intersection of the two intervals. Proceeding inductively we must
eventually find an interval header in the intersection which is
impossible by 13a and 13b. The elements of $\mathcal{I}$ must therefore be
disjoint.
Assertion 14. $\mathcal{I} = \{I(h), I(h'), \ldots\}$ is unique. Having shown
that each $I(h)$ is maximal and that elements of $\mathcal{I}$ are disjoint, it
is sufficient to show that the set of header nodes H is unique.
Clearly the first header node, $e_0$, is always in the set of header
nodes. Since $I(e_0)$ is maximal the set of nodes which become members
of H after the construction of $I(e_0)$ is unique. Pick any $h \in H$ and
construct $I(h)$. Again because intervals are maximal, the set of nodes
in the graph which are immediate successors of nodes in $I(h)$ but are
not themselves in $I(h)$ is unique. (Some of these immediate successors
may however already be in H because they have immediate
predecessors in intervals which were constructed before $I(h)$ and
indeed they may already have been used to construct intervals.) Since
we are able to pick any node in H and, after interval construction,
find a unique set of header nodes, the order of processing H does not
affect the header nodes found. It follows therefore that, after $e_0$,
header nodes can be added to H in any order. By induction we claim
that the header nodes are unique and therefore $\mathcal{I}$ is unique.
Having described the relationships of the total set of intervals
to the total graph and, prior to that, having shown some of the
interrelationships of nodes in a given interval, we now want to
enlarge the scope of an interval so that the interrelationships in
larger sets of nodes can be derived.
The intervals described thus far have been formed from the
elemental nodes of the graph (the basic blocks of the control flow
graph). For reasons which will be apparent shortly, we designate
these intervals as the basic or first order intervals and the graph
from which they were derived as the basic or first order graph. Since
we will be deriving higher order graphs and intervals we will use
superscripts to designate the order, e.g. $I^1(h) \in \mathcal{I}^1$.
A second order graph is derived from the first order graph and
intervals by making each first order interval into a node. The
immediate predecessors of such a node in the second order graph are
all the immediate predecessors of the original header node which were
not members of the interval; the immediate successors of such a
node are all of the immediate, non-interval successors of the
original exit nodes.
Second order intervals are the intervals in the second order
graph. With respect to the second order graph, they have all of
the properties derived for first order intervals. Since the nodes
of the second order intervals are first order intervals we have by
our procedure derived some inter-interval relationships.
Successively higher order graphs can be derived until the n-th
order graph either consists of a single node or is "irreducible".
This latter case will be described after we give an example of a
graph which "reduces" to a single node. In the example only
multi-node intervals are renamed in the derived graph.
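[Graph drawings of $G^1$, $G^2$, $G^3$ and $G^4$ with their interval sets
$\mathcal{I}^1$, $\mathcal{I}^2$, $\mathcal{I}^3$ and $\mathcal{I}^4$.]

$G^1$:  $I^1(1) = 1$;  $I^1(2) = 2$;  $I^1(3) = 3,4,5,6$;  $I^1(7) = 7,8$
$G^2$:  $I^2(1) = 1$;  $I^2(2) = 2,9,10$
$G^3$:  $I^3(1) = 1,11$
$G^4$:  $I^4(12) = 12$
Figure 4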
A reducible graph is a graph whose n-th order derived graph
is a single node. An irreducible graph is a graph for which there
does not exist an n-th order derived graph consisting of a single
node. This will happen whenever every interval in a graph is
composed of only one node and contains no internal flow paths.
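The derivation of higher order graphs and the reducibility test can be sketched as follows. This is an illustration under stated assumptions, not the paper's own formulation: `intervals` follows the usual reading of Procedure A (grow an interval by adding nodes all of whose immediate predecessors are already inside, then make the outside successors new header nodes); each derived node is named after its interval's header rather than renumbered as in Figure 4; every node is assumed reachable from the entry; and the two small test graphs are hypothetical.

```python
def intervals(succs, entry):
    """Partition a graph into intervals: {header: [members in interval order]}."""
    preds = {n: set() for n in succs}
    for n, targets in succs.items():
        for t in targets:
            preds[t].add(n)
    heads, result, i = [entry], {}, 0          # heads plays the role of the list H
    while i < len(heads):
        h = heads[i]
        i += 1
        members, grew = [h], True
        while grew:                            # add nodes whose predecessors are all inside
            grew = False
            for n in succs:
                if n in members or n == entry:
                    continue
                if preds[n] and preds[n] <= set(members):
                    members.append(n)
                    grew = True
        result[h] = members
        for m in members:                      # outside successors become header nodes
            for t in succs[m]:
                if t not in members and t not in heads:
                    heads.append(t)
    return result


def derived_graph(succs, ivals):
    """Collapse each interval to one node, named here by its header."""
    owner = {n: h for h, members in ivals.items() for n in members}
    new = {h: set() for h in ivals}
    for n, targets in succs.items():
        for t in targets:
            if owner[n] != owner[t]:           # flow inside an interval disappears
                new[owner[n]].add(owner[t])
    return new


def is_reducible(succs, entry):
    """Derive successive graphs until one node remains or nothing changes."""
    while len(succs) > 1:
        derived = derived_graph(succs, intervals(succs, entry))
        if derived == succs:                   # single-node intervals, no internal paths
            return False
        succs = derived
    return True


loop = {1: {2}, 2: {3}, 3: {2, 4}, 4: set()}   # hypothetical reducible graph
cross = {1: {2, 3}, 2: {3}, 3: {2}}            # hypothetical irreducible graph
print(intervals(loop, 1), is_reducible(loop, 1), is_reducible(cross, 1))
```

Comparing the derived graph with its source, rather than only counting intervals, covers the "no internal flow paths" condition: a single-node interval with a self loop still changes when it is collapsed.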


Examples of irreducible graphs are

[Drawings of two irreducible graphs]

A method for "splitting" an irreducible graph is given in reference
[10]. By this method an equivalent, reducible graph is produced. It may
be of interest that a program written by the author to analyze the
control flow of FORTRAN programs found that over 90% of the control
flow graphs were reducible. (The data consisted of 75 "real" programs.)
Assuming graph $G^1$ is eventually reduced to a single node several
interesting observations can be made.
1. Every node in $G^1$ is in one and only one interval $I^1(h)$ in
$\mathcal{I}^1$ which is in turn a node in $G^2$ and hence in one and only one
interval $I^2(h)$ in $\mathcal{I}^2$, etc.
2. Therefore for a given basic block, a unique, strictly ordered
set of membership in successively higher order intervals exists:
$b_i \in I^1(h_1) \in I^2(h_2) \in \cdots \in I^n(h_n)$.
3. Because the nodes in an interval are partially ordered by the
local successor function the nodes in the entire graph are ordered
by this function. This may be depicted by:

[Diagram depicting the ordering of basic blocks within successively
higher order intervals]

Recalling that the nodes in a graph represent the basic blocks of a
program and therefore contain instructions, some uses of the interval
construct in global analysis will now be sketched. The primary purpose
of the skeletal procedure given is to show how some of the interval
relationships may be applied to any one of many types of analyses. The
analysis might typically involve looking for redundant instructions,
determining variable definition and use relationships, etc.
Procedure F (Skeletal).
The use of intervals in global analysis is sketched by this
procedure:
1. Process each basic block in the program, collecting information
of global interest at the entry and exit. Set the order number,
k, to 1.
2. For each k-order interval:
2.1 Proceed through the blocks in the interval in their interval
order. For each block the information previously collected (either
by step 1 or by the last iteration of step 2) is first modified to
reflect the effects of interval predecessors then promulgated to
interval successors.
2.2 After the completion of step 2.1, the information collected
at the exits of latching nodes (if there are any) must be promulgated.
This may require redoing step 2.1 after information on entry to the
interval head has been modified by the information on the latching
nodes.
As a result of step 2 information of global interest is left at
the interval head and at the exits.
3. The information collected by processing an interval is
associated with the node which represents it in the next higher
order graph. The order number, k, is increased by 1 and steps 2
and 3 repeated until the n-th order graph is reached. (We are
assuming it consists of a single node and no edges so does not
need processing.)
4. Set k to n-2 to initialize for steps 5 and 6 which will
propagate the information deposited with each node in each of the
graphs back to the basic blocks.
5. Associate with the head of each k-order interval the
information left at the node which corresponds to it in the k+1
order graph.
6. For each k-order interval proceed through the blocks in
interval order promulgating the information from the head of the
interval to each node in the interval.
7. Reduce k by one and repeat steps 5 and 6 until the first
order intervals have been processed.
8. Whatever global information has been carried through steps
1 through 7 is now available at each block.
Consider the graph in Figure 4, redrawn here with some variable
definitions associated with some nodes.

[Figure 5, Figure 6: graph drawings with variable definitions attached
to some nodes]
Suppose the analysis is to determine which basic blocks each
definition can "reach". Assuming that step 1 of the procedure has
codified information about the variables defined in each block,
we will now very sketchily show how this information is propagated
by Procedure F. The first order intervals are processed. The
information associated with blocks 1 and 2 is not changed but by
processing $I(3)$ we get, by step 2.1 followed by step 2.2, the
following information.
a. the definitions of B and C in blocks 3 and 4 can reach any
block in the interval. This information can be left encoded with
each block.
b. the definitions can also reach the interval exit and can
therefore affect uses outside the interval. This is, therefore,
encoded at the exit.
Interval $I(7)$ is processed next and the fact that the definition
of A in 7 can affect uses outside the interval is recorded.
Figure 7 shows the information left with each node. (Subscripts have
been added to the variables to identify which node each was originally
in.) Step 3 of the procedure now yields the graph in Figure 7.
[Figure 7, Figure 8: graph drawings annotated with the definitions
$A_7$, $B_3$, $C_4$]

Repeating step 2 for the second order intervals we find that all
definitions in the interval can reach every node in the interval.
This information is left encoded with each node as depicted in
Figure 8. Step 3 now yields Figure 9.

[Figure 9, Figure 10: graph drawings annotated with the definitions
$B_1$, $C_1$, $B_3$, $C_4$, $A_7$]
Processing the third order interval $I^3(1) = (1,11)$ yields the fact
that $B_1$ and $C_1$ can reach node 11. Figure 10 shows this.
We now apply steps 4, 5, 6 and 7 of the procedure. Starting
with the second order graph we associate the information left with
nodes 1 and 11 with intervals $I^2(1)$ and $I^2(2)$. Thus the fact that
$B_1$ and $C_1$ reach node 11 means that they reach the interval $I^2(2)$.
This is shown in Figure 11.


[Figure 11, Figure 12: graph drawings annotated with the sets $B_1, C_1$
and $A_7, B_3, C_4, B_1, C_1$]
Step 6 propagates, through the graph of Figure 8, the information that
$B_1$ and $C_1$ can reach the interval head to the other nodes in the
interval. (In our attempt to merely sketch the application of Procedure F
to a global analysis we have omitted some information which is
obviously vital at this point: whether or not a definition reaching
a node entry will be able to reach its exit(s). This information can
be trivially collected during the analysis. Had it been done we would
know that $B_1$ could not reach the exit of node 9.) The propagated
information is associated with each node in the interval as shown in
Figure 12. The first order graph is now processed. Step 5 associates
the node information of the second order graph with their corresponding
interval heads. Figure 13 shows this.
[Figure 13, Figure 14: graph drawings annotated with the set
$A_7, B_3, C_4, B_1, C_1$ at most nodes and $A_7, B_3, C_4, C_1$ for both
nodes 4 and 5]
The information is propagated by step 6 through the graph of Fig. 6
and yields the result depicted in Fig. 14. We now know which nodes
can be reached by every definition. Although the details of this
example are beyond the scope of this paper it is worth commenting that
bit vector techniques exist for the example in which a series of
simple boolean operations on vectors codifying multiple definitions
are used to carry the information through the graph.
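The closing remark about bit vectors can be illustrated with a short sketch in which Python integers act as bit vectors, one bit per definition. To keep it brief the sketch propagates information by plain repeated sweeps over the graph, not by the interval order of Procedure F. The three-block graph, the `gen` and `kill` vectors and the function name `reaching_definitions` are hypothetical. The `kill` vector records which definitions cannot pass from a block's entry to its exit, the information the text notes was omitted from the example.

```python
def reaching_definitions(succs, gen, kill):
    """Return (reach_in, reach_out): bit vectors of definitions reaching each block."""
    preds = {n: [] for n in succs}
    for n, targets in succs.items():
        for t in targets:
            preds[t].append(n)
    reach_in = {n: 0 for n in succs}
    reach_out = {n: gen[n] for n in succs}
    changed = True
    while changed:                           # repeat until nothing changes
        changed = False
        for n in succs:
            new_in = 0
            for p in preds[n]:
                new_in |= reach_out[p]       # OR together what leaves each predecessor
            new_out = gen[n] | (new_in & ~kill[n])
            if (new_in, new_out) != (reach_in[n], reach_out[n]):
                reach_in[n], reach_out[n] = new_in, new_out
                changed = True
    return reach_in, reach_out


# Hypothetical example: bit 0 is a definition of B in block 1,
# bit 1 is a second definition of B in block 2; blocks 2 and 3 form a loop.
SUCCS = {1: [2], 2: [3], 3: [2]}
GEN = {1: 0b01, 2: 0b10, 3: 0b00}
KILL = {1: 0b10, 2: 0b01, 3: 0b00}

reach_in, reach_out = reaching_definitions(SUCCS, GEN, KILL)
print({n: bin(v) for n, v in reach_in.items()})
```

Each block costs one OR per predecessor plus one AND and one OR, whatever the number of definitions, which is the attraction of the bit vector form.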

SUMMARY
The interval construct described in this paper has many proper-
ties which facilitate global analysis and which are of particular
interest in optimization. The partial ordering relationships
between nodes in an interval provide a natural processing order; the
ability to partition a graph into a hierarchy of intervals each of
which is partially ordered lets us propagate information rapidly
through the graph; the dominance relationships in a graph are easily
discovered; nests of strongly connected regions can be detected.
Although this paper has not shown how all of these constructs can be
found, most of them should be apparent. The use of intervals in
optimization has only been hinted at; for a good explanation the
reader is referred to reference [3].

ACKNOWLEDGEMENTS
As stated in the introduction, the interval concept is due to
Dr. John Cocke who is also the major contributor of many of the
other ideas in this paper. Dr. J. T. Schwartz first formalized many
aspects of intervals. The author wishes particularly to thank both
of these people not only for their ideas but also for their continuing
encouragement.

REFERENCES
1. Allen, F. E., "Program Optimization," Annual Review in Automatic
Programming, Vol. 5, Pergamon, New York, 1969.
2. Berge, C., The Theory of Graphs, Methuen & Co., Ltd., London,
1964.
3. Cocke, John, "Global Common Sub-expression Elimination,"
in these Proceedings.
4. Cocke, J. and Schwartz, J. T., "Programming Languages and their
Compilers," Preliminary Notes, Courant Institute of Mathematical
Sciences, New York Univ., N. Y., April 1970.
5. Prosser, R. T., "Applications of Boolean Matrices to the Analysis
of Flow Diagrams," Proc. Eastern Joint Computer Conf., Dec. 1959,
Spartan Books, N. Y., pp. 133-138.
6. Lowry, Edward S. and Medlock, C. W., "Object Code Optimization,"
CACM, Jan. 1969, pp. 13-22.
7. Earnest, C. P., Balke, K. G. and Anderson, J., "Analysis of
Graphs by Strict Ordering of Nodes" (unpublished).
8. Busam, Vincent A. and Englund, Donald E., "Optimization of
Expressions in Fortran," CACM, Dec. 1969, pp. 666-674.
9. Mendicino, Sam F. et al., "The LPLTRAN Compiler," CACM,
Nov. 1969, pp. 747-755.
10. Cocke, John and Miller, Raymond, "Some Analysis Techniques for
Optimizing Computer Programs," Proc. Second Intl. Conf. of Systems
Sciences, Hawaii, Jan. 1969.
