
A METHOD TO EXPOSE THE HIDDEN STRUCTURE OF FORTRAN PROGRAMS

Loren P. Meissner
Lawrence Laboratory, University of California
Berkeley, Calif. 94720

Keywords: Structured programming; Flow graph; Reducibility

Abstract

Program structure is inherent in program design; therefore special keywords such as
"if ... then ... else" or "do ... while" are useful only to the extent that they reveal
that structure.

A simple listing of Fortran program statements is ineffective for revealing program
structure. Proposals have been made for manually inserting keywords, comments,
indentations, etc., either during a separate preprocessing stage or during the normal
coding process.

We show how the flow graph can provide independent structural information. Although the
flow graph may be said to exist as soon as a program has been designed, it is most
readily generated from the program statements. "Bad" structure can be detected
objectively, and "good" programs can be reconstituted to reveal their block structure
more clearly. Our implementation is based on an algorithm suggested by Peterson et al.
(CACM, August 1973). We have extended this algorithm to automatically detect block exits.

Background

In recent years a great deal of discussion [1] - [14] has centered around the structure
of computer programs, particularly as this structure is reflected in the flow of control
of the program during execution. In much of this discussion, however, one point has
perhaps received inadequate attention. It has not been strongly enough emphasized that
the structure of a program is pretty firmly established during the program design phase,
and therefore that not much can be done during the coding phase to change or to augment
program structure.

There is little doubt that when the designer of a program thinks of it as being composed
of sequences, alternatives, and repetitions, the resulting program will generally be
easier to comprehend (by the designer or by others later on), will be easier to prove
correct, and will be easier to compile (and especially to optimize). But excessive
attention has been devoted to the coding phase, with excessive emphasis on the use of
particular keywords ("if ... then ... else" and "do ... while") and the implication that
program structure resides in these keywords. It has sometimes been forgotten that
structure is inherent in the program design, and that keywords are useful only to the
extent that they reveal structure which already exists.

The Fortran user's dilemma. The difficulty with Fortran programs, from this point of
view, is not that they lack structure but that their structure may be difficult to
discern. The keywords that appear in the control statements of a Fortran program do
little to enhance the recognition of program structure; indeed, in most instances
(perhaps with the exception of "do") they tend to obscure it. Because of the lack of such
keywords as "if ... then ... else" and "do ... while," Fortran users are forced to use
the keyword "go to" in a variety of ways, to implement many different constructs. This
makes the structure of a Fortran program difficult to recognize from a listing of the
program statements.

For a Fortran user who is convinced of the benefits of structured programming, is there
any choice but to switch to some other language? Although he concedes that Fortran is far
from ideal, he may feel that most other existing languages are not much better [15]. He
would like to continue using Fortran, and yet he would like to adhere to the principles
of structured programming.

Software aids should be developed to help the Fortran program designer understand the
structure of his programs. A particular objective should be to help him distinguish in
some manner between the different uses of "go to." Even in Dijkstra's letter [2], the
point is made that the "go to" per se is not so "harmful" as its "unbridled use." It
should be possible to find an objective way of distinguishing between those uses of
"go to" that are unbridled and therefore harmful, and those on the other hand that are
bridled and hence benign.

Extensions to Standard Fortran. One possible way to gain some of the advantages of
structured programming would be to add the keywords "if ... then ... else" and
"do ... while" to Fortran. Several experiments have been made in this direction,
including "structured Fortran" extensions named DEFT [16], IFTRAN [17], LINUS [18],
MORTRAN [19], SFTran [20], and SPIFFY [21]. (No doubt there are others.)

The obvious approach is to implement a Fortran extension as a preprocessor, which accepts
programs written in a "structured" language more or less resembling Fortran, and
translates them into Standard Fortran programs which can then be processed in the
ordinary way. However, the preprocessor approach introduces an additional language level
during debugging, and therefore these implementations may prove to be important
principally as experimental tools or testing grounds for the evaluation of the Fortran
extension ideas they embody. Meanwhile, further experiments of this nature should
certainly be encouraged, and wide dissemination of various proposals (along with reports
of experiences of their users) should be promoted.

Informal techniques. Other proposals involve the use of an informal language to develop a
pseudo-program which the programmer then translates by hand into Standard Fortran. One
such proposal is the "Programming Design Language (PDL)" [22]. The control structures are
expressed using keywords such as "if ... then ... else" along with systematic
indentation, while the statements controlled by these keywords are written out in plain
English. In principle, such a pseudo-program could be translated with equal ease into
Fortran, Cobol, Algol, PL/I, or any other language.

T. E. Hull [23] proposes that comment cards be manually inserted into Standard Fortran
programs in certain prescribed ways, e.g., to insert keywords such as
"if ... then ... else" or to mark the beginning and end of each block of statements in an
alternative clause or a repetitive clause. Furthermore, Hull would restrict the use of
"go to" statements to those ways that are necessary in implementing structured
programming constructs. However, it is not clear whether the use of "correct" structural
principles can be adequately enforced in a manual system of this kind.

Using flow graph information to expose program structure

A simple listing of Fortran program statements is not adequate for revealing program
structure. The extensions and informal techniques described above attempt to correct this
deficiency with auxiliary information that is inserted manually in the listing, either
during a separate preprocessing stage or during the normal coding process.

We propose to tap an independent source of structural information, and to make the
program flow graph available to the user. Although this information exists (in a sense)
as soon as the program has been designed, it is most readily captured by the computer
after the program statements have been written. We have implemented some software which
will scan a Fortran program or subprogram, and will generate and display its flow graph.
The flow graph is also used to produce a restructured listing of the program statements,
as illustrated in Fig. 1 (which is based on an example discussed by Hull [23]). The
techniques

*C start
 1 → (2)              EPS = 1.0E-5
*C L10: begin iterative structure
 2 → (10, 3)             DO 4 I = 1, 11
*C ...
                         X = 1.0 + FLOAT (I - 1) / 10.0
                         TERM = 1.0
                         SUM = 1.0
                         RN = 1.0
*C L5: begin iterative structure
 3 → (4)          1         TERM = - TERM * X / RN
                            SUM = SUM + TERM
 4 → (7, 5)                 IF (ABS (TERM) .GE. EPS) GO TO 3
*C ...
*C sequence break
 7 → (8)          3         RN = RN + 1.0
 8 → (3)↑                   GO TO 1
*C return arc
*C sequence break
*C end L5
 5 → (6)          2      WRITE (6, 100) X, SUM
                100      FORMAT (1X, F4.1, F10.5)
 6 → (9)                 GO TO 4
*C sequence break
 9 → (2)↑         4      CONTINUE
*C return arc
*C end L10
10 → (11)
11 → (12)             STOP
12                    END

Figure 1. Restructured listing based on the linearized flow graph of a program (adapted
          from Hull [23]) to compute a table of values of the exponential function for
          negative arguments. On the left is a representation of the flow graph in
          numeric form, using node numbers that have been assigned consecutively (and do
          not necessarily agree with statement labels). Comments preceded by an asterisk
          were generated by the analysis algorithm.
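
The numeric flow graph on the left of Fig. 1 can be written down as data and checked against the comments in the listing. The Python sketch below is an illustration only, not the software described in this paper: the dictionary is a transcription of the arcs as they are read from the figure, and the names `SUCCESSORS`, `LISTING_ORDER`, `return_arcs`, and `sequence_breaks` are hypothetical. It reports the arcs that point back up the listing (marked as return arcs in the figure) and the places where consecutive lines of the listing are not consecutive in the original source.

```python
# Successor lists transcribed from the numeric flow graph in Fig. 1
# (node -> nodes that can immediately follow it). Transcription is an assumption.
SUCCESSORS = {
    1: [2], 2: [10, 3], 3: [4], 4: [7, 5], 5: [6], 6: [9],
    7: [8], 8: [3], 9: [2], 10: [11], 11: [12], 12: [],
}

# Order in which the nodes appear in the restructured listing, top to bottom.
LISTING_ORDER = [1, 2, 3, 4, 7, 8, 5, 6, 9, 10, 11, 12]


def return_arcs(successors, order):
    """Arcs whose target is listed at or before their source ("upward" arcs)."""
    position = {node: index for index, node in enumerate(order)}
    return [
        (src, dst)
        for src in order
        for dst in successors[src]
        if position[dst] <= position[src]
    ]


def sequence_breaks(order):
    """Adjacent listing entries whose node numbers are not consecutive."""
    return [(a, b) for a, b in zip(order, order[1:]) if b != a + 1]


print(return_arcs(SUCCESSORS, LISTING_ORDER))   # [(8, 3), (9, 2)]
print(sequence_breaks(LISTING_ORDER))           # [(4, 7), (8, 5), (6, 9)]
```

The two return arcs and the three sequence breaks agree with the "return arc" and "sequence break" comment lines shown in the figure.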

described in the remainder of this paper are based on a synthesis and extension of the
studies and proposals of Hecht and Ullman [24] and of Peterson, Kasami, and Tokura [25].
(These two papers are hereinafter referred to as HU and PKT, respectively.)

In the current preliminary version of our program, a node of the flow graph corresponds
to each:
     labelled statement (except "format" statements),
     "go to" statement (including computed "go to"),
     "if" statement (logical or arithmetic),
     "do" statement,
     "stop," "return," or "end" statement.
An initial node is also created at the beginning of the program, and an extra node is
created for each nesting level of a "do" loop. An arc of the flow graph leads from each
node to those nodes which can immediately follow it in the execution sequence.

Well structured program flow graphs. A flow graph corresponding to a program that is
composed entirely of sequences, alternative clauses, and iterative clauses, is called a
"D-chart." Flow graphs of this form have been studied extensively [26], [27]. It has been
proved that the flow graph of an arbitrary program can be reduced to a D-chart; however,
this reduction may increase the length of a program or alter its execution sequence.

The term "D-chart" includes flow graphs corresponding to programs whose alternative
clauses may contain more than two branches (e.g., "case" clauses). However, the test for
completion of an iteration clause must be made at the beginning of the loop¹. In a
D-chart, every subgraph has one entry and one exit. Thus the flow graph of the program in
Fig. 1 is not a D-chart. One way to translate this program to the "do ... while" form is
to include one additional execution of the assignment RN = RN + 1.0, after the variable
TERM has already reached its final value. Knuth and Floyd [5], Ashcroft and Manna [6],
and PKT all give examples of programs whose flow graphs cannot be reduced to D-charts
without some essential (although perhaps minor) modification. Such changes, motivated by
a desire to force all programs into the D-chart mold, may obscure rather than clarify the
inherent program structure.

     ¹ It is immaterial to this discussion whether the test is also made prior to the
       first iteration ("do ... while"), or only prior to iterations after the first
       ("do ... until").

Recent discussions of structured programming [14], [28] tend to the consensus that a
fourth basic structural unit, the multi-level exit, should be permitted in addition to
sequences, alternative clauses, and iterative clauses. Experience shows that
incorporation of this structural form is generally justified from the standpoint of
program comprehension, even though any program can, in principle, be recast to avoid it.
Programs composed from these four structures correspond to the well formed flow graphs
discussed in PKT.

Accordingly, the term "well structured" may be adopted to describe the class of programs
which are composed of sequences, alternative clauses, iterative clauses, and multi-level
exits. In Fig. 2, the region D represents the set of programs whose flow graphs are
D-charts. Programs that require multi-level exits (or some equivalent modification) in
addition to the properties of D-charts are represented by the region E. Thus the set
D ∪ E comprises the well structured programs (according to this terminology).

                       SINGLE ENTRY     MULTIPLE ENTRY

     SINGLE EXIT            D

     MULTIPLE EXIT          E

Figure 2. Classification of programs, according to the characteristics of their flow
          graphs. Flow graphs of programs in set D are "D-charts": that is, they are
          composed entirely of sequences, alternative structures, and iterative
          structures. Programs in set E may also include multi-level exit structures. The
          set D ∪ E contains all "well structured" or "reducible" programs. Programs in
          set F can be reduced to set D by "node splitting." Any program can be reduced
          to set D ∪ E by node splitting.

PKT gives an alternate characterization of the flow graphs of programs in the set D ∪ E.
It is shown that a program can be composed entirely from these four structural units, if
and only if its flow graph does not contain any strongly connected² subgraph with more
than one entry node. It does not seem unreasonable to exclude from the class of well
structured programs one whose flow graph contains some strongly connected subgraph with
more than one entry node, and which therefore cannot be composed from the four
"permissible" structural units³.

     ² A strongly connected subgraph is one which has the property that between any two
       of its nodes i and j there is at least one path (sequence of arcs) leading in each
       direction, i.e., from i to j and from j to i. An entry node of a subgraph is a
       node in the subgraph that is the endpoint of an arc originating outside the
       subgraph.

     ³ PKT shows how to correct a program that is not well structured, by using a
       transformation called "node splitting." This is a way of preserving one entry node
       of each strongly connected subgraph and removing all the others. Each entry node
       to be removed is duplicated, along with that portion of the subgraph connecting it
       to the remaining entry node. Programs whose flow graphs can be reduced by node
       splitting to D-charts correspond to the region F in Fig. 2. Any program flow graph
       can be reduced by node splitting to the set D ∪ E.

HU shows that a program flow graph is reducible (in a certain sense which is important
for program verification and especially for optimization) if and only if it contains no
strongly connected subgraph with more than one entry point. Thus the set of programs
having reducible flow graphs also corresponds exactly to the set D ∪ E of Fig. 2. HU also
shows that any Fortran program whose transfers to previous statements are all caused by
the normal termination of "do" loops is reducible.

A comparison of HU and PKT suggests an objective criterion for distinguishing between
correct and incorrect uses of the "go to" statement in Fortran programs. In a well
structured program, such a statement may be used to implement a "downward" flow of
control, or even to produce an "upward" or backward flow in a manner that is equivalent
to the normal flow of control in a "do" loop. An improper use of the "go to" statement,
on the other hand, would be one which introduces more than one entry into a strongly
connected subgraph of the program flow graph. (Our algorithm does not proceed with the
flow graph analysis after it finds such an "improper" "go to," but instead it returns the
program, along with some flow graph information, to the originator for correction.)

Conversion of well structured flow graphs to nested form. In a well structured program
flow graph, each strongly connected subgraph has a unique entry node. If we delete all
return arcs (arcs leading to the entry node of a strongly connected subgraph, from within
the subgraph), the resulting graph will have no strongly connected components.
Nevertheless, it may contain subgraphs of "hammock" form, corresponding to alternative
clauses in the program. That is, there may be a pair of nodes (such as nodes 2 and 8 in
Fig. 3) that are joined by more than one path in the same direction. PKT shows how to
arrange the nodes of such a graph (having no strongly connected subgraphs) into a single
linear sequence or vector, in such a manner that no arc goes from any node to another
node that precedes it in this sequence⁴. Thus all flow is "downward" except for the
return arcs that have been temporarily removed from consideration.

     ⁴ An alternative node sequencing algorithm, due to R. Tarjan [29], was brought to
       our attention while this paper was in preparation.

Figure 3. Linearized flow graph of an interpolation subprogram, constructed according to
          Peterson's algorithm [25]. In an actual application, many more nodes might
          appear between nodes 10 and 11. Note that these nodes will be incorporated
          within the iteration structure.

The key to this algorithm (and, incidentally, the most complex part from the
computational standpoint) involves the discovery of the lowest cover for each merge node.
[A merge node is any node with more than one arc leading to it; and its lowest cover is
the "lowest" node (in the sense that arcs of the flow graph are directed "downward")
through which every path passes that leads to the merge node.] Whenever it is discovered
that the next node (to be chosen as an element of the vector) is the lowest cover for
some merge node, that merge node is pushed onto a stack. It is shown in PKT that this
guarantees no node will be included twice in the vector. On the other hand, when a node
in the vector has all its successors already on the stack, the next node is obtained by
popping the stack.

This use of an auxiliary pushdown stack induces an implicit nesting relation among the
nodes of the flow graph. The level of nesting of a node may be defined to be the depth of
the stack at the time the node was placed in the vector. Pushing a node on the stack
increases the nesting depth, and therefore begins a subsequence of nodes that are all at
(or deeper than) a certain nesting level. This subsequence is, in effect, a block within
the flow graph, corresponding to a block of program statements. This block ends at the
point where the node is popped from the stack and incorporated in the vector.
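
The criterion shared by HU and PKT, that a well structured flow graph contains no strongly connected subgraph with more than one entry node, can be checked mechanically. The Python sketch below shows one way to do so; it is not the procedure of either paper, nor the software described here. It finds maximal strongly connected subgraphs with Tarjan's component algorithm, counts their entry nodes, and then repeats the search inside each single-entry loop with its entry node set aside. The function names, the dictionary representation, and the second example graph are assumptions made for illustration; `FIG1` is a transcription of the numeric flow graph in Fig. 1. Every node is assumed to appear as a key and to be reachable from the start node.

```python
def strongly_connected_components(nodes, successors):
    """Tarjan's algorithm restricted to the node set `nodes`."""
    index, low, on_stack, stack, result = {}, {}, set(), [], []

    def visit(v):
        index[v] = low[v] = len(index)
        stack.append(v)
        on_stack.add(v)
        for w in successors.get(v, ()):
            if w not in nodes:
                continue                      # ignore arcs leaving the region
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:                # v is the root of a component
            component = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                component.add(w)
                if w == v:
                    break
            result.append(component)

    for v in nodes:
        if v not in index:
            visit(v)
    return result


def entry_nodes(component, successors, start):
    """Nodes of `component` reached by an arc from outside it (or the start node)."""
    entries = {start} & component
    for src, targets in successors.items():
        if src not in component:
            entries.update(t for t in targets if t in component)
    return entries


def multi_entry_subgraphs(successors, start):
    """Return (subgraph, entries) pairs for loops with more than one entry node."""
    offenders = []
    work = [set(successors)]
    while work:
        region = work.pop()
        for component in strongly_connected_components(region, successors):
            is_loop = len(component) > 1 or any(
                v in successors.get(v, ()) for v in component)
            if not is_loop:
                continue
            entries = entry_nodes(component, successors, start)
            if len(entries) > 1:
                offenders.append((component, entries))
            elif len(entries) == 1:
                # Single entry: search for inner loops that avoid the entry node.
                work.append(component - entries)
    return offenders


FIG1 = {1: [2], 2: [10, 3], 3: [4], 4: [7, 5], 5: [6], 6: [9],
        7: [8], 8: [3], 9: [2], 10: [11], 11: [12], 12: []}
# Hypothetical graph in which a jump from node 1 enters the loop {2, 3} at node 3.
JUMP_INTO_LOOP = {1: [2, 3], 2: [3], 3: [2, 4], 4: []}

print(multi_entry_subgraphs(FIG1, 1))            # []
print(multi_entry_subgraphs(JUMP_INTO_LOOP, 1))  # [({2, 3}, {2, 3})]
```

An empty result corresponds to a flow graph in the set D ∪ E; a non-empty result identifies the subgraphs that an "improper" use of "go to" has entered at more than one node.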

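The first step of the conversion described above, deleting the return arcs and arranging the nodes so that every remaining arc runs "downward," can be illustrated with a much simpler device than the one used in PKT. The Python sketch below is a stand-in under stated assumptions: it uses an ordinary depth-first search and reverse postorder, it does not compute lowest covers, and it does not use the auxiliary stack, so it yields neither PKT's particular sequence nor any nesting levels. The name `linearize` and the dictionary form of the graph are hypothetical; `FIG1` is a transcription of the numeric flow graph in Fig. 1.

```python
def linearize(successors, start):
    """Return (vector, return_arcs) for a flow graph given as successor lists.

    An arc is treated as a return arc when it leads to a node that is still on
    the current depth-first path. For a well structured graph this is an arc
    back to the entry node of a loop; for other graphs the result depends on
    the order in which successors are visited.
    """
    state = {}          # node -> "open" while on the DFS path, "done" afterwards
    postorder = []
    return_arcs = []

    def visit(node):
        state[node] = "open"
        for nxt in successors.get(node, ()):
            if nxt not in state:
                visit(nxt)
            elif state[nxt] == "open":
                return_arcs.append((node, nxt))
        state[node] = "done"
        postorder.append(node)

    visit(start)
    # Reverse postorder places every node before all of its non-return successors.
    return postorder[::-1], return_arcs


FIG1 = {1: [2], 2: [10, 3], 3: [4], 4: [7, 5], 5: [6], 6: [9],
        7: [8], 8: [3], 9: [2], 10: [11], 11: [12], 12: []}

vector, arcs = linearize(FIG1, 1)
position = {node: i for i, node in enumerate(vector)}
print(sorted(arcs))                                   # [(8, 3), (9, 2)]
print(all(position[a] < position[b]
          for a in FIG1 for b in FIG1[a]
          if (a, b) not in arcs))                     # True
```

The order produced this way satisfies the "downward" property but need not keep the nodes of a loop together, which is what the lowest-cover and stack mechanism of PKT provides.
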
Automatic detection of exit arcs. The procedure described so far is based closely upon
the algorithm described in PKT. However, some experience with this procedure drew our
attention to an anomalous result. When the exit from a loop forms the only path to a
certain part of the program, the information contained in the program flow graph makes it
appear that that part of the program belongs entirely inside the loop. For example,
consider the interpolation program illustrated in Fig. 3. A loop is used to search for a
certain value of an index, and when an appropriate index value is found, control exits
from the loop and the entire remainder of the program is then executed. The PKT algorithm
incorporates virtually the entire program within the loop.

We have extended the algorithm to detect arcs that exit from a loop. The strongly
connected subgraphs (labelled by their entry nodes) form a tree. An exit arc may be
defined as an arc leading from one strongly connected subgraph to another one that is not
contained within it. We find the outermost strongly connected subgraph containing the
source node of the exit arc but not containing its destination node. Before the entry
node of this subgraph is placed in the vector, the destination node of the exit arc is
pushed onto the auxiliary stack, thus creating an additional block level. The resulting
nested flow graph is shown in Fig. 4.

Figure 4. A modified version of the algorithm has been applied to the same flow graph.
          Note that nodes 10 and 11 have been moved outside the iteration structure.

Reconstituting the program listing

We use the results of this flow graph analysis to produce a restructured version of the
original program (see Fig. 1). Nesting levels are displayed by means of successive
indentations. Comments are inserted to mark the beginning and end of each block, the
statements from which flow returns to a loop entry point, and the statements causing
control to exit from a block.

In this listing, the labelled statements and other statements for which a node was
generated are printed in sequence according to the node numbers in the vector generated
by our modified PKT algorithm. (Following each such statement in the listing are those
non-control statements that followed it in the original source program.)

Thus the statements in the restructured listing are not, in general, in the original
source program sequence; this original sequence is indicated by the node numbers
appearing at the extreme left side of the listing, and breaks in the original sequence
are signalled by "sequence break" comment lines. Such sequence breaks obviously garble
the flow among the statements of the restructured listing, and these statements would
have to be modified to some extent to correct the flow and thus convert the restructured
listing into a correct program. In Fig. 1, for example, the statement GO TO 4 (node 6)
should be deleted, and the test at node 4 should be reversed to read
IF (ABS (TERM) .LT. EPS) GO TO 2. We have not undertaken, in the present version of our
algorithm, to produce the needed corrections automatically.

Remarks

This same technique can, in principle, be applied to programs written in languages other
than Fortran; the main requirement would be a simple adaptation of the process of
generating the flow graph from the program statements. However, it is the Fortran
language that seems most clearly to be in need of automatic aids to program structure
recognition.

We have implemented this technique initially as a post-processor (which operates upon
programs that have already been compiled, and thus may be assumed to be free from syntax
errors). However, certain advantages would accrue from its incorporation as an integral
(presumably optional) part of a compiler. Most compilers already have some reasonable
equivalent of the program flow graph available; and conversely, the results of our
analysis should be of value in code optimization.

Experience of users. We have applied this algorithm to a few programs written by
experienced programmers whose habits are probably rather conservative. We often found
that the nodes of the linearized flow graph corresponded to the statements of the program
in their original sequence, and that most of the exceptions resulted from the arbitrary
selection between a pair of arcs emanating from a single node.

The most common case of "ill structure" that was found in this limited sample consisted
of "exception" processing applied during execution of a loop. For example, during a loop
to process the characters of an input string, a special sequence of statements is
executed when the end of a card is reached. If this same exceptional condition can occur
before the loop is entered (e.g., while searching for the beginning of an input string),
even a fairly conservative programmer may succumb to the temptation to code a jump to the
processing segment inside the loop.

Further information. An expanded version of this paper (LBL Report 3004) is available
from the author upon request.

Bibliography

 1. C. Böhm and G. Jacopini, "Flow diagrams, Turing machines, and languages with only two
    formation rules"
    Comm. ACM 9:3, 1966

 2. E. W. Dijkstra, "Go to statement considered harmful"
    Comm. ACM 11:3, 1968

 3. R. M. Burstall, "Proving properties of programs by structural induction"
    Computer Journal 12:1, 1969

 4. D. E. Cooper, "Programs for mechanical program verification"
    Machine Intelligence 6, 1971

 5. D. E. Knuth and R. W. Floyd, "Notes on avoiding go to statements"
    Information Processing Letters 1, North-Holland, Amsterdam, 1971

 6. E. Ashcroft and Z. Manna, "The translation of go to programs into while programs"
    Proceedings IFIP-71, North-Holland, Amsterdam, 1972

 7. F. T. Baker, "System quality through structured programming"
    Proceedings AFIPS 1972 Fall Joint Computer Conference, Vol. I, 1972

 8. E. W. Dijkstra, "Notes on structured programming"
    in Structured Programming by Dahl, Dijkstra, and Hoare, Academic Press, 1972

 9. E. W. Dijkstra, "The humble programmer"
    Comm. ACM 15:10, 1972

10. C. A. R. Hoare, "A note on the for statement"
    Bit 12:3, 1972

11. M. C. Hopkins, "A case for the go to"
    SIGPLAN Notices 7:11, 1972

12. B. M. Leavenworth, "Programming with(out) the go to"
    Proceedings ACM 1972; SIGPLAN Notices 7:11, 1972

13. H. D. Mills, "How to write correct programs and know it"
    IBM Report, Gaithersburg, Md., 1972

14. W. A. Wulf, "A case against the go to"
    Proceedings ACM 1972; SIGPLAN Notices 7:11, 1972

15. A. Ralston, "The future of higher level languages (in teaching)"
    Proceedings International Computing Symposium 1973, North-Holland, Amsterdam, 1974

16. C. A. Steele and A. E. Sedgwick, "DEFT: a disciplined extension of Fortran"
    Technical Report, Department of Computer Science, University of Toronto, 1973

17. E. F. Miller, "Extensions to Fortran to support structured programming"
    SIGPLAN Notices 8:6, 1973

18. L. Miller, "LINUS: a structured language for instructional use"
    SIGCSE Bulletin 6:1, 1974

19. A. J. Cook, "A user's guide to CDC 7600/6600 MORTRAN"
    TM 150.1, Computation Group, Stanford Linear Accelerator, Stanford, 1973

20. J. Flynn, "SFTran user's guide"
    Report 914-ICM-337, Jet Propulsion Laboratory, Pasadena, 1973

21. L. Carpenter, (private communication)
    Boeing Computer Services, Inc., Seattle, 1974

22. "Improved technologies for applications development"
    (Report), Productivity Techniques Department, IBM Corporation, Bethesda, Md., 1973

23. T. E. Hull, "Would you believe structured Fortran?"
    SIGNUM Newsletter 8:4, 1973

24. M. S. Hecht and J. D. Ullman, "Flow graph reducibility"
    Soc. Ind. Appl. Math., Journal on Computing 1:2, 1972

25. W. W. Peterson, T. Kasami, and N. Tokura, "On the capabilities of while, repeat, and
    exit statements"
    Comm. ACM 16:8, 1973

26. J. Bruno and K. Steiglitz, "The expression of algorithms by charts"
    TR 88, Computer Science Laboratory, Princeton University, 1971

27. D. C. Cooper, "Some transformations and standard forms of graphs, with applications
    to computer programs"
    Machine Intelligence 2, American Elsevier, N. Y., 1968

28. G. V. Bochmann, "Multiple exits from a loop without the go to"
    Comm. ACM 16:7, 1973

29. R. Tarjan, "Testing flow graph reducibility"
    TR 73-159, Department of Computer Science, Cornell University, 1973
