INTRODUCTION
Any static, global analysis of the expression and data relationships
in a program requires a knowledge of the control flow of the program.
Since one of the primary reasons for doing such a global analysis in a
compiler is to produce optimized programs, control flow analysis has
been embedded in many compilers and has been described in several
papers. An early paper by Prosser [5] described the use of Boolean
matrices (or, more particularly, connectivity matrices) in flow
analysis. The use of "dominance" relationships in flow analysis was
first introduced by Prosser and much expanded by Lowry and Medlock [6].
References [6,8,9] describe compilers which use various forms of
control flow analysis for optimization. Some recent developments in
the area are reported in [4] and in [7].
The underlying motivation in all the different types of control flow
analysis is the need to codify the flow relationships in the program.
The codification may be in connectivity matrices, in
predecessor-successor tables, in dominance lists, etc. Whatever the
form, the purpose is to facilitate determining what the flow
relationships are; in other words to facilitate answering such
questions as: is this an inner loop?, if an expression is removed from
the loop where can it be correctly and profitably placed?, which
variable definitions can affect this use?
In this paper the basic control flow relationships are expressed in a
directed graph. Various graph constructs are then found and shown to
codify interesting global relationships.
The first section of the paper, "Basic Concepts," is primarily a
catalog of relevant information about directed graphs; similar
information can be found in any introductory material on the subject.
(Reference [2], for example, covers this material.) The use of
directed graphs to express control flow relationships is also given
in the first section.
In the second section of the paper, "Dominance Relationships" are
defined in terms of the basic concepts introduced in the first
section. Most of the concepts in this section have, as previously
BASIC CONCEPTS
A directed graph, G, can be denoted by $G = (B, E)$ where B is the
set of nodes (blocks) $\{b_1, b_2, \ldots, b_n\}$ in the graph and E is
the set of directed edges $\{(b_i, b_j), (b_k, b_l), \ldots\}$. Each
directed edge is represented by an ordered pair $(b_i, b_j)$ of nodes
(not necessarily distinct) which indicates that a directed edge goes
from node $b_i$ to node $b_j$. Thus, there exists a successor function
$\Gamma_G$ which maps G into G such that
$\Gamma_G(b_i) = \{ b_j \mid (b_i, b_j) \in E \}$. We call this set the
set of immediate successors of a node. It may be empty. The inverse of
the successor function, $\Gamma_G^{-1}$, gives the immediate
predecessors of a node:
$\Gamma_G^{-1}(b_j) = \{ b_i \mid (b_i, b_j) \in E \}$. It too may be
empty.
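As an illustration only, the successor function and its inverse can be
sketched in Python from an edge set. The edge set `E` and the function
names below are assumptions made for this example; they are not taken
from the paper.

```python
def successor_function(edges):
    """Return a dict mapping each node to its set of immediate successors."""
    succ = {}
    for src, dst in edges:
        succ.setdefault(src, set()).add(dst)
    return succ


def predecessor_function(edges):
    """Return a dict mapping each node to its set of immediate predecessors."""
    pred = {}
    for src, dst in edges:
        pred.setdefault(dst, set()).add(src)
    return pred


# Hypothetical edge set; (3, 3) shows that an edge's nodes need not be distinct.
E = {(1, 2), (2, 3), (2, 4), (3, 3), (4, 1)}
gamma = successor_function(E)
gamma_inv = predecessor_function(E)

print(sorted(gamma[2]))        # [3, 4]
print(sorted(gamma_inv[3]))    # [2, 3]
print(gamma.get(5, set()))     # set(): a node with no edges has no successors
```

Either set may be empty, which is why the lookups for unknown nodes use
`get` with an empty-set default.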
A directed graph is connected if any node in the graph can be
obtained (reached) from any other node by successive applications of
$\Gamma_G$ and/or $\Gamma_G^{-1}$. We will assume throughout this paper
that the graphs being discussed are both directed and connected.
Before introducing more graph concepts, the relevance of graphs
to program control flow is introduced.
A basic block is a linear sequence of program instructions having
one entry point (the first instruction executed) and one exit point
(the last instruction executed). It may of course have many
predecessors and many successors and may even be its own successor. Program
entry blocks might not have predecessors that are in the program;
program terminating blocks never have successors in the program.
A control flow graph is a directed graph in which the nodes repre-
sent basic blocks and the edges represent control flow paths. Every-
thing that is said about directed graphs in this paper holds for
control flow graphs.
A subgraph of a directed graph, $G = (B, E)$, is a directed graph
$G' = (B', E')$ in which $B' \subset B$, $E' \subset E$,
$G \cap G' = G'$ and $G \cup G' = G$. Furthermore, the successor
function $\Gamma_{G'}$ defined for G' must "stay within" G'; that is,
for $b'_i \in B'$,
$\Gamma_{G'}(b'_i) = \{ b'_j \mid (b'_i, b'_j) \in E' \}$.
Consider the following directed graph, G, in which the nodes have
been arbitrarily named by numbering.
Allen. CONTROL FLOW ANALYSIS
SIGPLAN Notices 1970 July
[Fig. 1: the directed graph G (figure not reproduced)]
A path in a directed graph is a directed subgraph, P, of ordered
nodes and edges obtained by successive applications of the successor
function. It is expressed as a sequence of nodes
$(b_1, b_2, \ldots, b_n)$ where $b_{i+1} \in \Gamma_G(b_i)$. The edges
are implied: $(b_i, b_{i+1}) \in E$. The nodes and the implied edges
are not necessarily unique. A path in G, the graph in Fig. 1, is
(2,3,5,3,5,2,4). It should be observed that the examples show how
some of the developed notation is to be used: the nodes of the graph
are arbitrarily but uniquely named and b stands for any such name.
A node, q, is said to be a successor of a node, p, if there exists
some path $P = (b_1, \ldots, b_n)$ for which $b_1 = p$ and $b_n = q$.
In the same situation p is said to be a predecessor of q. It should be
noted that a node can be both a predecessor and a successor of
another node: $P_1 = (p, \ldots, q)$ and $P_2 = (q, \ldots, p)$.
A closed path or circuit is a path in which $b_n = b_1$. The circuit
is a simple circuit if, with the exception of $b_n$, the nodes in the
circuit are distinct; otherwise it is a composite circuit. Consider
the graph in Fig. 1: it has the following simple circuits: (3,5,3),
(5,3,5), (2,3,5,2), (3,5,2,3), (5,2,3,5), (2,4,5,2), (4,5,2,4),
(5,2,4,5), (7,7). One of the composite circuits is (2,3,5,3,5,2).
Since it will generally be uninteresting to consider circuits
containing the same nodes and edges but in a different order, we will
generally select a first node and describe the circuit relative to
that node.
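The definitions of path, circuit and simple circuit can be checked
mechanically. The sketch below is illustrative: Fig. 1 is not
reproduced here, so the successor map `FIG1_SUCC` is a hypothetical
graph. Its edges among nodes 2, 3, 4, 5 and 7 are the ones implied by
the path and circuits listed in the text; the edges touching nodes 1
and 6 are assumptions.

```python
# Hypothetical stand-in for Fig. 1 (edges at nodes 1 and 6 are assumed).
FIG1_SUCC = {1: {2}, 2: {3, 4}, 3: {5}, 4: {5}, 5: {2, 3, 6}, 6: {7}, 7: {7}}


def is_path(succ, nodes):
    """True if each node is an immediate successor of the one before it."""
    return len(nodes) >= 2 and all(
        b in succ.get(a, ()) for a, b in zip(nodes, nodes[1:])
    )


def is_circuit(succ, nodes):
    """A closed path: the last node equals the first."""
    return is_path(succ, nodes) and nodes[0] == nodes[-1]


def is_simple_circuit(succ, nodes):
    """A circuit whose nodes, except for the repeated last one, are distinct."""
    return is_circuit(succ, nodes) and len(set(nodes[:-1])) == len(nodes) - 1


print(is_path(FIG1_SUCC, (2, 3, 5, 3, 5, 2, 4)))          # True
print(is_simple_circuit(FIG1_SUCC, (3, 5, 3)))             # True
print(is_simple_circuit(FIG1_SUCC, (2, 3, 5, 3, 5, 2)))    # False: composite
```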
The length of a path is the number of edges in the sequence.
More formally, a distance function $\delta$ is defined such that for
any path $P = (b_1, b_2, \ldots, b_n)$, $\delta(P) = n - 1$. Since the
shortest path $\delta_{min}$ between two points p and q is often of
interest it will now be defined:
$\delta_{min}(p, q) = \mathrm{MIN}(\delta(P_1), \delta(P_2), \ldots)$
for all $P_i = (p, \ldots, q)$. The shortest path then is the $P_i$
for which $\delta(P_i) = \delta_{min}(p, q)$.
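One way to evaluate the distance functions is a breadth-first search
over the successor function; this is a sketch, not a method given in
the paper. The graph `succ` is hypothetical, and the sketch assumes
that a path has at least one edge, so that `delta_min(p, p)` is the
length of the shortest circuit through p.

```python
from collections import deque


def path_length(path):
    """delta(P) = n - 1 for a path of n nodes."""
    return len(path) - 1


def delta_min(succ, p, q):
    """Fewest edges on any path from p to q, or None if q is unreachable."""
    frontier = deque((s, 1) for s in succ.get(p, ()))
    seen = set()
    while frontier:
        node, dist = frontier.popleft()
        if node == q:
            return dist
        if node in seen:
            continue
        seen.add(node)
        for s in succ.get(node, ()):
            if s not in seen:
                frontier.append((s, dist + 1))
    return None


# Hypothetical graph, used only to exercise the functions.
succ = {1: {2}, 2: {3, 4}, 3: {5}, 4: {5}, 5: {2, 3, 6}, 6: {7}, 7: {7}}

print(path_length((2, 3, 5, 3, 5, 2, 4)))  # 6
print(delta_min(succ, 3, 4))               # 3, via the path (3, 5, 2, 4)
print(delta_min(succ, 7, 2))               # None
```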
A strongly connected region of a directed graph is a directed
subgraph in which there is a path from any node in the subgraph to
any other node. It immediately follows from this definition that
every node lies on at least one closed path, and is, therefore, its
own predecessor and its own successor. Closed paths (circuits) are,
therefore, a special kind of strongly connected region: one which
has a strict ordering. A strongly connected region R of a directed
$D = \{ \{R_1, R_2, \ldots, R_n\}, \{R_1, R_2, \ldots, R_n\}, \ldots \}$
or, for the sake of brevity, $D = \{d, d', \ldots\}$. Each $R_n$ is a
maximal, strongly connected region which thereby assures that sets of
nested strongly connected regions are disjoint. We will now consider
some of the properties of the above construct in a directed graph, G:
1. D does not necessarily cover G. If there are nodes in G which are
not in any strongly connected region then they will not be in D.
2. Each $d \in D$ is partially ordered.
3. D is unordered.
4. If a node, p, is an element of a strongly connected region it is
in one and only one d. For $p \in d$ where
$d = \{R_1, R_2, \ldots, R_n\}$ then $p \in R_n$ and may be an element
of several nested $R_i$.
As an example consider the graph in Figure 1:
$D = \{\{(3,5), (2,3,4,5)\}, \{7\}\}$. Since much of the control flow
analysis involves knowing relationships between nodes in the control
flow graph, the construct, D, codifies several useful relationships.
However it has several limitations, the more serious of which are
that it does not establish an ordering on the total graph and that,
by the very nature of a general strongly connected region, there is
no ordering relationship on the nodes within the region other than
that given by the immediate successor-predecessor relationships.
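The maximal strongly connected regions (the outermost $R_n$ of each d)
can be found by a mutual-reachability test. The sketch below is a
simple quadratic illustration, not a procedure from the paper, and it
does not recover the nested regions. The successor map is a
hypothetical graph chosen to agree with what the text states about
Figure 1; its edges at nodes 1 and 6 are assumptions.

```python
def reachable(succ, start):
    """Nodes reachable from start along one or more edges."""
    seen, stack = set(), list(succ.get(start, ()))
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack.extend(succ.get(n, ()))
    return seen


def maximal_regions(succ):
    """Set of maximal strongly connected regions, each a frozenset of nodes."""
    nodes = set(succ) | {s for targets in succ.values() for s in targets}
    reach = {n: reachable(succ, n) for n in nodes}
    regions = set()
    for n in nodes:
        if n in reach[n]:  # n lies on at least one closed path
            regions.add(frozenset(m for m in reach[n] if n in reach[m]))
    return regions


# Hypothetical stand-in for Figure 1.
succ = {1: {2}, 2: {3, 4}, 3: {5}, 4: {5}, 5: {2, 3, 6}, 6: {7}, 7: {7}}
print(sorted(sorted(r) for r in maximal_regions(succ)))  # [[2, 3, 4, 5], [7]]
```

Nodes 1 and 6 lie on no closed path and so appear in no region, which
is property 1 above: D need not cover G.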
DOMINANCE RELATIONSHIPS
Several interesting and useful constructs can be established from
"back dominance" and "forward dominance" relationships. Before
defining these relationships two special kinds of nodes must be
defined. A node in a directed graph, G, which has no successors in
G is called a terminal or exit node. Thus, letting x denote an exit
node, $\Gamma_G(x) = \emptyset$. This definition suffices for control
flow graphs but, since a program entry point may also be the first
node in a closed path and thereby have a predecessor, an analogous
definition for entry nodes does not suffice. An entry node, e, is a
node in the program control flow graph, C, if it contains a program
entry point. Several of the constructs about to be described depend
upon having only one such node in the control flow graph. An
arbitrary initial entry node $e_0$ is introduced into the control flow
graph as an immediate predecessor of all entry nodes:
$\Gamma_C(e_0) = \{ e_i \mid e_i \text{ is an entry node} \}$ and
$\Gamma_C^{-1}(e_0) = \emptyset$.
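A minimal sketch of introducing the initial entry node follows. The
function name, the node name `"e0"` and the small graph `cfg` are all
assumptions made for illustration.

```python
def add_initial_entry(succ, entry_nodes, e0="e0"):
    """Return a copy of succ with a fresh node e0 preceding every entry node."""
    nodes = set(succ) | {s for targets in succ.values() for s in targets}
    if e0 in nodes:
        raise ValueError("e0 must be a fresh node name")
    augmented = {n: set(targets) for n, targets in succ.items()}
    augmented[e0] = set(entry_nodes)  # e0 is given no predecessors
    return augmented


# Hypothetical control flow graph with two entry nodes, "a" and "b".
# Entry node "a" sits on a closed path, so it already has a predecessor.
cfg = {"a": {"b"}, "b": {"a", "c"}, "c": set()}
with_e0 = add_initial_entry(cfg, entry_nodes=["a", "b"])
print(sorted(with_e0["e0"]))  # ['a', 'b']
```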
$BD(b_k) = (b_1, b_2, b_3, \ldots, b_j)$ where $b_1 = e_0$,
INTERVALS
Given a node h, an interval I(h) is the maximal, single entry
subgraph for which h is the entry node and in which all closed paths
contain h. The unique interval node h is called the interval head or
simply the header node. An interval can be expressed in terms of the
nodes in it:
[Figure 2: a control flow graph with initial entry node $e_0$ and an
exit node, showing the intervals I(2) = 2, I(3) = 3,4,5,6 and
I(7) = 7,8. The naming of the nodes is, as usual, arbitrary.]
It will now be shown that the procedure given does indeed produce a
set of intervals each of which satisfies the definition for an
interval. It will later be shown that they collectively provide a
unique partition of the graph. Thus we need to show that any I(h) is
maximal, is single entry, and that all closed paths in I(h) contain h.
Assertion 1. I(h) has only one possible entry node, h. Suppose
there was another node $b \in I(h)$, $b \ne h$, which was also an entry
node. Then b must have at least one immediate predecessor which is not
in I(h). But this is impossible since b became a member of the interval
only when all of its immediate predecessors were already interval
members. Hence there can be only one possible entry node. It should
be further noted that $h \ne e_0$ will have at least one predecessor
outside the interval since by step 3 in the procedure it became a
header node because it had a predecessor in an interval to which it
did not belong.
Assertion 2. All closed paths in I(h) contain h. Suppose there
is a closed path $P = (b_1, b_2, \ldots, b_n, b_1)$ which does not
contain h. By the notational definition established for paths
$b_{i-1}$ is an immediate predecessor of $b_i$. Hence $b_i$ cannot
become a member of I(h) until $b_{i-1}$ is a member. Also $b_1$ cannot
become a member until $b_n$ does, and $b_n$ cannot become a member
until $b_{n-1}$ does, etc. No node of P can therefore be the first to
join, so none of them is ever added. Therefore all closed paths in
I(h) must contain h.
Assertion 3. I(h) is maximal. This follows from step 2.3: nodes
are added to I(h) until no more can be.
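The text of Procedure A does not survive in this copy, so the sketch
below is a reconstruction from the step descriptions quoted in the
assertions, not the procedure itself: a node joins I(h) only once all
of its immediate predecessors are members, and a node outside every
interval with a predecessor in an interval becomes a header. The
function name and the example graph are assumptions; the graph was
chosen to yield the intervals listed for Figure 2, with node 1 playing
the role of $e_0$.

```python
def interval_partition(succ, e0):
    """Return {header: [interval members in the order they were added]}."""
    nodes, pred = [], {}

    def note(n):
        if n not in pred:
            pred[n] = set()
            nodes.append(n)

    for n, targets in succ.items():
        note(n)
        for t in targets:
            note(t)
            pred[t].add(n)

    headers = [e0]   # H: nodes that head an interval
    owner = {}       # node -> header of the interval that contains it
    result = {}
    for h in headers:  # the list grows while it is being traversed
        members = [h]
        owner[h] = h
        grew = True
        while grew:
            grew = False
            for b in nodes:
                if b in owner or b in headers:
                    continue
                # b joins only when every immediate predecessor is a member
                if pred[b] and pred[b] <= set(members):
                    members.append(b)
                    owner[b] = h
                    grew = True
        result[h] = members
        for b in nodes:
            # a node outside all intervals with a predecessor in this one
            if b not in owner and b not in headers and pred[b] & set(members):
                headers.append(b)
    return result


# Hypothetical graph; node 1 stands in for e0.
succ = {1: [2], 2: [3], 3: [4, 5], 4: [6], 5: [6], 6: [3, 7], 7: [8], 8: [7, 2]}
print(interval_partition(succ, 1))
# {1: [1], 2: [2], 3: [3, 4, 5, 6], 7: [7, 8]}
```

The repeated scan over all nodes keeps the sketch short; a worklist
driven by successor edges would avoid rescanning.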
The local predecessor function is the inverse of $L_I$, i.e. is
$L_I^{-1}$, in which $L_I^{-1}(h) = \emptyset$.
Using the local successor function a special type of interval path
can be defined: a forward path is a path $F = (b_1, b_2, \ldots, b_n)$
where $b_{i+1} \in L_I(b_i)$. It can be noted that all nodes on all
forward paths from h to any node in I(h) are also in I(h).
Assertion 5. The nodes in an interval are partially ordered by
the local successor function. Given an interval
$I(h) = (b_1 (= h), b_2, b_3, \ldots, b_n)$, if $i < j$ then either
$b_i$ is a predecessor of $b_j$ on some forward path or $b_i$ and $b_j$
do not co-exist on any forward path. This follows from the fact that,
with the exception of h, all immediate predecessors of a node must be
interval members before the node can become a member.
Assertion 6. The relative ordering of the nodes in a back
dominator list and the nodes in an interval must be the same. If $b_i$
is a back dominator of $b_j$ and both are in an interval I(h), clearly
$b_i$ must precede $b_j$ in the interval list because it is impossible
to reach $b_j$ without having first reached $b_i$.
Assertion 7. For any interval member $b_k \ne h$ with back dominator
list $BD(b_k) = (b_1 (= e_0), b_2, \ldots, b_j)$, then for $b_i = h$,
$b_i \in BD(b_k)$ and all blocks $b_l$ following $b_i$ on the back
dominator list ($i < l \le j$), $b_l$ is a member of the interval. A
back dominator $b_l$ must be on all paths from $e_0$ to $b_k$. Since it
follows h on the back dominator list it must be on all paths and,
hence all forward paths, from h to $b_k$. Therefore it must be an
interval member.
Assertion 8. Any strongly connected region in an interval must
contain the interval head. This follows immediately from the fact
that all closed paths in I(h) must contain h. An interval cannot,
therefore, contain disjoint strongly connected regions.
Assertion 9. If an interval contains a strongly connected region
then there exists a path from every node in the region to every node
in the interval. Since the header node both back dominates every node
in the interval and is in the strongly connected region, and since
there is a path from any node in a strongly connected region to any
other node, there must be a path from every node in the region to
every node in the interval. Unless the entire interval is strongly
connected, it will not be the case that there exists a path from
every node in the interval to every node in the region.
in I(1) = (1,2,5,3,4,6). Clearly there are paths from 4 to all of its
predecessors in the interval list.
[Figure 3: an interval; assume I(1) = (1,2,5,3,4,6)]
Before making the next assertion, the meaning of "interval exit node"
needs to be more carefully defined: an interval exit node is any node
in an interval, I(h), which either has no successors (i.e. is a
terminal node for the entire graph) or has at least one immediate
successor which is not in I(h).
Assertion 10. The interval header is an articulation node for the
interval. Since the header node is the only entry to the interval it
must be on every entry-exit path. An interval header is not
necessarily an articulation node for the total graph.
Assertion 11. All forward dominators of the interval header which
are also interval members are, along with the header, the
articulation nodes for the interval. This assertion can be shown by
exactly the same reasoning that led us to assert that the forward
dominators of $e_0$ are the articulation nodes, and together with
$e_0$, the only articulation nodes of the total graph.
The articulation nodes of the interval in Figure 3 are 1, 4 and 6
since they are on every entry-exit path.
In certain applications a special graph construct called a
"two-terminal subgraph" may be of interest. Defined in terms of
intervals, a two-terminal subgraph is an interval with one exit node.
Since an interval can have only one entry node the motivation for the
term should be apparent. The interval in Figure 3 is an example of a
two-terminal subgraph.
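The definitions of interval exit node, articulation node and
two-terminal subgraph can be tried out on a small example. Figure 3 is
not reproduced here, so `FIG3_SUCC` is a hypothetical graph chosen to
agree with what the text states about that figure (the interval order,
latching nodes 4 and 5, and articulation nodes 1, 4 and 6); the real
figure may differ. The removal test used for articulation nodes is a
brute-force illustration, not the paper's procedure.

```python
# Hypothetical stand-in for Figure 3.
FIG3_SUCC = {1: {2, 4}, 2: {3, 5}, 3: {4}, 4: {1, 6}, 5: {1}, 6: set()}
INTERVAL = (1, 2, 5, 3, 4, 6)  # header first, then members in interval order


def interval_exit_nodes(succ, interval):
    """Members with no successors, or with a successor outside the interval."""
    members = set(interval)
    return [b for b in interval if not succ.get(b) or set(succ[b]) - members]


def articulation_nodes(succ, interval):
    """Members that lie on every path from the header to an exit node."""
    header, members = interval[0], set(interval)
    exits = set(interval_exit_nodes(succ, interval))

    def exit_reachable_avoiding(avoid):
        seen, stack = set(), [header]
        while stack:
            n = stack.pop()
            if n == avoid or n in seen or n not in members:
                continue
            seen.add(n)
            if n in exits:
                return True
            stack.extend(succ.get(n, ()))
        return False

    return [b for b in interval if not exit_reachable_avoiding(b)]


def is_two_terminal(succ, interval):
    """An interval with exactly one exit node."""
    return len(interval_exit_nodes(succ, interval)) == 1


print(interval_exit_nodes(FIG3_SUCC, INTERVAL))  # [6]
print(articulation_nodes(FIG3_SUCC, INTERVAL))   # [1, 4, 6]
print(is_two_terminal(FIG3_SUCC, INTERVAL))      # True
```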
Procedures will now be given for finding the strongly connected
region in an interval, the articulation nodes of the interval, and,
for each node in the interval the list of interval nodes which back
dominate it. These procedures can be embedded in procedure A thereby
generating not only the intervals but their interval relationships in
"one pass" through the edges in the graph.
Up to this point in the paper it has been completely satisfactory
to represent members of a set in terms of a list of the elements in it.
For example $I(h) = (b_1, b_2, \ldots, b_n)$ where $b_i$ represents the
name of the
Procedure E.
The strongly connected region, SCR, of an interval can be found
by this one step procedure.
1. $SCR = \bigcup_i (b_i \cup LP(b_i))$ for all $b_i$ which are
interval latching nodes.
Using the results of Example 2, we get
$SCR = [4 \cup (1,2,3)] \cup [5 \cup (1,2)]$.
Therefore the strongly connected region of the interval in Figure 3 is
comprised of the nodes (1,2,3,4,5).
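Procedure E amounts to a single union over the latching nodes. In the
sketch below the LP lists for nodes 4 and 5 are the ones quoted in the
example above; the function name `scr_from_lp` is an assumption.

```python
def scr_from_lp(latching_nodes, lp):
    """Union of each latching node with its LP list."""
    region = set()
    for b in latching_nodes:
        region |= {b} | set(lp[b])
    return region


# LP lists as quoted in the text for the interval of Figure 3.
lp = {4: (1, 2, 3), 5: (1, 2)}
print(sorted(scr_from_lp([4, 5], lp)))  # [1, 2, 3, 4, 5]
```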
Another procedure, E', for finding the strongly connected region
in an interval is to start from the latching nodes and iteratively
mark all immediate predecessors until the header node is reached and
marked. Whenever a marked predecessor is found in this procedure it
is not necessary to continue the marking of its immediate predecessors
since they will already have been marked. This procedure has the
advantage of not requiring that the LP lists be set up and is probably
preferable if the only use of LP lists is to find the strongly
connected region.
A formal description of procedure E' is not given; the above
informal description should adequately suggest such a description.
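One possible reading of the informal description of E' is sketched
below. The predecessor map is a hypothetical graph chosen to agree
with what the text states about Figure 3; the figure itself is not
reproduced, so the edges are assumptions. The sketch relies on the
single-entry property: every immediate predecessor of a non-header
member is itself in the interval, so marking never leaves it.

```python
def scr_by_marking(pred, header, latching_nodes):
    """Mark backwards from the latching nodes, stopping at the header."""
    marked = set()
    stack = list(latching_nodes)
    while stack:
        node = stack.pop()
        if node in marked:
            continue  # its predecessors are already marked or queued
        marked.add(node)
        if node != header:  # do not walk past the interval's only entry
            stack.extend(pred.get(node, ()))
    return marked


# Hypothetical immediate-predecessor map for the interval of Figure 3.
pred = {1: {4, 5}, 2: {1}, 3: {2}, 4: {1, 3}, 5: {2}, 6: {4}}
print(sorted(scr_by_marking(pred, header=1, latching_nodes=[4, 5])))
# [1, 2, 3, 4, 5]
```

The result agrees with the region obtained from the LP lists, and node
6 is never marked because no latching node is reached from it.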
PARTITIONING GRAPHS BY INTERVALS
Having considered the properties of any given interval, it will
now be shown that the set of intervals
$\mathcal{I} = \{I(h_1), I(h_2), \ldots\}$ generated by procedure A
forms, as asserted, a unique partition of the graph G. Recalling the
definition of a partition, we therefore need to show that
$\mathcal{I}$ covers G and that for any two intervals $I(h_i)$ and
$I(h_j)$ in $\mathcal{I}$, $I(h_i) \cap I(h_j) = \emptyset$.
Furthermore we want to show that $\mathcal{I}$ is unique.
Assertion 12. $\mathcal{I}$ covers G. Suppose there is a $b \in G$
which is not in any $I(h) \in \mathcal{I}$. Since G is a connected
graph, b must either be $e_0$ or have at least one predecessor. But if
$b = e_0$ it is an element of $I(e_0)$, the first interval constructed.
If $b \ne e_0$, then, since it must have at least one predecessor, by
step 2.2 it must become a member of the interval containing the
predecessor or must, by steps 3 and 4, become an interval head. (In
order to establish that the predecessors must be members of intervals
we can recursively apply the above reasoning until $b = e_0$.) Hence
$\mathcal{I}$ covers G.
Assertion 13. The elements of $\mathcal{I}$ are disjoint, that is for
any I(h) and I(h'), elements of $\mathcal{I}$,
$I(h) \cap I(h') = \emptyset$.
Assertion 13a. The header nodes must be distinct, that is for
any h and h', $h \ne h'$. By step 3 in Procedure A a given node can
appear at most once in H and by step 4 a node in H can be processed
(used to head an interval) only once.
Assertion 13b. The header node of one interval cannot be an
element of another interval. Suppose interval head, h, is an element
of interval I(h'). By 13a, $h \ne h'$. Therefore all immediate
predecessors of h must, by step 2.2, also be in I(h'). But for h to
have become an interval head some but not all of its immediate
predecessors must have been members of an interval, say I(h"). We will
now show that h" must also be an element of I(h'). Consider an
immediate
Recalling that the nodes in a graph represent the basic blocks of a
program and therefore contain instructions, some uses of the interval
construct in global analysis will now be sketched. The primary
purpose of the skeletal procedure given is to show how some of the
interval relationships may be applied to any one of many types of
analyses. The analysis might typically involve looking for redundant
instructions, determining variable definition and use relationships,
etc.
Procedure F (Skeletal).
The use of intervals in global analysis is sketched by this
procedure:
1. Process each basic block in the program, collecting information
of global interest at the entry and exit. Set the order number,
k, to 1.
2. For each k-order interval:
2.1 Proceed through the blocks in the interval in their interval
order. For each block the information previously collected (either
by step 1 or by the last iteration of step 2) is first modified to
reflect the effects of interval predecessors then promulgated to
interval successors.
2.2 After the completion of step 2.1, the information collected
at the exits of latching nodes (if there are any) must be promulgated.
This may require redoing step 2.1 after information on entry to the
interval head has been modified by the information on the latching
nodes.
[Figure 5 and Figure 6 (figures not reproduced)]
Suppose the analysis is to determine which basic blocks each
definition can "reach". Assuming that step 1 of the procedure has
codified information about the variables defined in each block,
we will now very sketchily show how this information is propagated
by Procedure F. The first order intervals are processed. The
information associated with blocks 1 and 2 is not changed but by
[Figure 7 and Figure 8 (figures not reproduced)]
Repeating step 2 for the second order intervals we find that all
definitions in the interval can reach every node in the interval.
This information is left encoded with each node as depicted in
Figure 8. Step 3 now yields Figure 9.
[Figure 9 and Figure 10 (figures not reproduced)]
Processing the third order interval $I^3(1) = (1, 11)$ yields the fact
that $B_1$ and $C_1$ can reach node 11. Figure 10 shows this.
We now apply steps 4, 5, 6 and 7 of the procedure. Starting
with the second order graph we associate the information left with
nodes 1 and 11 with intervals $I^2(1)$ and $I^2(2)$. Thus the fact that
$B_1$ and $C_1$ reach node 11 means that they reach the interval
$I^2(2)$. This is shown in Figure 11.
[Figure 11 and Figure 12 (figures not reproduced)]
Step 6 propagates, through the graph of Figure 8, the information that
$B_1$ and $C_1$ can reach the interval head to the other nodes in the
interval. (In our attempt to merely sketch the application of
Procedure F to a global analysis we have omitted some information
which is obviously vital at this point: whether or not a definition
reaching a node entry will be able to reach its exit(s). This
information can be trivially collected during the analysis. Had it
been done we would know that $B_1$ could not reach the exit of node 9.)
The propagated
information is associated with each node in the interval as shown in
Figure 12. The first order graph is now processed. Step 5 associates
the node information of the second order graph with their corresponding
interval heads. Figure 13 shows this.
[Figure 13 and Figure 14 (figures not reproduced)]
The information is propagated by step 6 through the graph of Fig. 6
and yields the result depicted in Fig. 14. We now know which nodes
can be reached by every definition. Although the details of this
example are beyond the scope of this paper it is worth commenting that
bit vector techniques exist for the example in which a series of
simple boolean operations on vectors codifying multiple definitions
are used to carry the information through the graph.
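As a rough illustration of the bit vector idea, each definition can be
given one bit and the sets combined with OR, AND and NOT. The sketch
below uses the familiar round-robin iteration to a fixed point rather
than the interval-ordered Procedure F, so it is a swapped-in technique
and not the paper's method. The four-node graph, the bit assignments
and the `gen`/`kill` tables are all hypothetical.

```python
B1, C1, B3 = 0b001, 0b010, 0b100  # one bit per definition (assumed names)


def reaching_definitions(succ, gen, kill):
    """Return (reach_in, reach_out): bit vectors of definitions per node."""
    nodes = list(gen)
    pred = {n: [] for n in nodes}
    for n in nodes:
        for s in succ.get(n, ()):
            pred[s].append(n)
    reach_in = {n: 0 for n in nodes}
    reach_out = {n: gen[n] for n in nodes}
    changed = True
    while changed:
        changed = False
        for n in nodes:
            new_in = 0
            for p in pred[n]:
                new_in |= reach_out[p]               # OR over predecessors
            new_out = gen[n] | (new_in & ~kill[n])   # drop killed, add own
            if new_in != reach_in[n] or new_out != reach_out[n]:
                reach_in[n], reach_out[n] = new_in, new_out
                changed = True
    return reach_in, reach_out


# Hypothetical graph: node 1 defines B and C, node 3 redefines B.
succ = {1: [2], 2: [3], 3: [2, 4], 4: []}
gen = {1: B1 | C1, 2: 0, 3: B3, 4: 0}
kill = {1: B3, 2: 0, 3: B1, 4: 0}
reach_in, reach_out = reaching_definitions(succ, gen, kill)
print(format(reach_in[4], "03b"))  # 110: C1 and B3 reach node 4, B1 does not
```

The `kill` table plays the role of the information the text says was
omitted: whether a definition reaching a node's entry can also reach
its exit.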
SUMMARY
The interval construct described in this paper has many proper-
ties which facilitate global analysis and which are of particular
interest in optimization. The partial ordering relationships
between nodes in an interval provide a natural processing order; the
ability to partition a graph into a hierarchy of intervals each of
which is partially ordered lets us propagate information rapidly
through the graph; the dominance relationships in a graph are easily
discovered; nests of strongly connected regions can be detected.
Although this paper has not shown how all of these constructs can be
found most of them should be apparent. The use of intervals in
optimization has only been hinted at; for a good explanation the
reader is referred to reference [3].
ACKNOWLEDGEMENTS
As stated in the introduction, the interval concept is due to
Dr. John Cocke who is also the major contributor of many of the
other ideas in this paper. Dr. J. T. Schwartz first formalized many
aspects of intervals. The author wishes to particularly thank both
of these people for not only their ideas but also for the continuing
encouragement.
REFERENCES
1. Allen, F. E., "Program Optimization," Annual Review in Automatic
Programming, Vol. 5, Pergamon, New York, 1969.
2. Berge, C., The Theory of Graphs, Methuen & Co., Ltd., London,
1964.
3. Cocke, John, "Global Common Sub-expression Elimination,"
in these Proceedings.
4. Cocke, J. and Schwartz, J. T., "Programming Languages and their
Compilers," Preliminary Notes, Courant Institute of Mathematical
Sciences, New York Univ., N. Y., April 1970.
5. Prosser, R. T., "Applications of Boolean Matrices to the Analysis
of Flow Diagrams," Proc. Eastern Joint Computer Conf. Dec. 1959,
Spartan Books, N. Y., pp. 133-138.
6. Lowry, Edward S. and Medlock, C. W., "Object Code Optimization,"
CACM, Jan. 1969, pp. 13-22.
7. Earnest, C. P., Balke, K. G. and Anderson, J., "Analysis of
Graphs by Strict Ordering of Nodes" (unpublished).
8. Busam, Vincent A. and Englund, Donald E., "Optimization of
Expressions in Fortran," Comm. ACM., Dec. 1969, pp. 666-674.
9. Mendicino, Sam. F. et al., "The LRLTRAN Compiler," CACM,
Nov. 1969, pp. 747-755.
10. Cocke, John and Miller, Raymond, "Some Analysis Techniques for
Optimizing Computer Programs," Proc. Second Intl. Conf. of Systems
Sciences, Hawaii, Jan. 1969.