
Teaching an Old Dog New Tricks

The Problem is Complexity


My first experience with software quality was in 1976,
when I sat in front of an ASR-33 and laboriously typed
2 pages of BASIC code from David Ahl's "Creative
Computing" into my high school's Hewlett-Packard
21MX. Supposedly it would let me simulate a lunar
landing, but I discovered when I told it to "RUN" that
the colons scattered all over the listing were actually
important to the correct functioning of the program.
Oops!

I wish I could say that in that moment I achieved some kind of enlightenment
about code quality - but instead I think I learned what most programmers learn:
fiddle with it until it stops complaining. Then, not much later, I learned that a
program that's not complaining still may not be working right. If I recall correctly,
my lunar landing simulation went off into an infinite loop, and I've been hooked
on computing ever since.

(This book got me hooked)

A lot has changed since my first year as a programmer, but two things have
changed hardly at all:

- The more complicated the program is, the harder it is to get it right.
- It's really hard to tell the difference between a program that works and one
  that just appears to work.

Like a lot of programmers, I jumped into coding without even knowing what a
"debugger" was. When I first encountered a program designed to help you write
programs, it was like a big light-bulb going on in my head. By then, like most
programmers, I had considerable practice at debugging using "PRINT"
statements - although, by then, I had graduated to "printf( )". Even the early
version of "adb(1)" was a huge step in the right direction.

Old Tricks: Saber-C


Fast-forward a few years and I encountered my next big paradigm-shifting
program to help me write programs: Saber-C.(1) At that time I was a presales
support consultant for DEC and was constantly writing small bits of code for
customers and internal users. DEC offered Saber-C on ULTRIX, and I decided to
play with it because it was described to me as a "kind of super duper debugger."
It turned out that Saber-C was a C language interpreter, which was fantastic
because the run-time environment was fully simulated, rather than the program
simply being allowed to run and then monitored, as you'd get with a debugger.
So, if you allocated an int and tried to use it as a char *, you got an error. Since it
tracked the allocated size of a memory object as well as its type, you'd get a
warning if you accessed off the end of an array, or touched a member of a
structure you had just freed. I don't know how many times I've seen code where
someone frees a linked list like this:

struct listelem {
    int stuff;
    struct listelem *next;
};

/* simplified to make it obvious what I am doing wrong */

void
freelist(struct listelem *lp)
{
    while(lp != (struct listelem *)0) {
        free(lp);
        lp = lp->next;
    }
}

That piece of code is especially pernicious since it'll work almost all of the time -
until you run it on a weird architecture or with a memory allocator that compacts
the freed space in a way that changes the contents of lp->next after the call to
free(). I could show you the countless scars - literally, the death of a thousand
cuts - that I've suffered from this kind of minor sloppiness.
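
For what it's worth, the fix is the obvious one: grab the next pointer before
freeing the node. A minimal corrected sketch of the routine above:

#include <stdlib.h>

struct listelem {
    int stuff;
    struct listelem *next;
};

/* Corrected version: read lp->next while the node is still valid, so we
 * never touch memory that has already been handed back to the allocator. */
void
freelist(struct listelem *lp)
{
    struct listelem *next;

    while (lp != (struct listelem *)0) {
        next = lp->next;
        free(lp);
        lp = next;
    }
}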

For me, using Saber-C was an eye-opener. It gave me a whole new approach to
development, since I could use the interpreter to directly call functions from a
command line, without having to write a test-harness with a main( ) routine
and controlled inputs and outputs. My test-harnesses were just a file of direct
calls to the function I was writing, which I could feed directly into the interpreter
with a mouse-click. Because I could do that without having to go through a
compile/link/debug cycle, my code-creation sped up dramatically and I was
catching bugs in "real-time" as I wrote each block of code. After a little while, I
think I can safely say, the quality of my code skyrocketed.
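
To give a flavor of what one of those harness files looked like - Saber-C's exact
input syntax isn't reproduced here, so treat this as an illustrative sketch, and
build_test_list() is an imagined helper that allocates a list of a given length:

/* hypothetical harness file: nothing but direct calls to the routine under
 * development, fed straight to the interpreter - no main() required */
struct listelem *lp;

lp = build_test_list(0);      /* empty list   */
freelist(lp);

lp = build_test_list(1);      /* single node  */
freelist(lp);

lp = build_test_list(1000);   /* longer chain */
freelist(lp);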

When I left Digital and went to Trusted Information Systems, I got involved in
developing an internet firewall for The White House, as a research project under
DARPA. As technical lead of that project, I had a small budget and used it to buy
a copy of Saber-C, to serve as my main development environment. I wrote,
debugged, and tuned the TIS Firewall Toolkit (FWTK)(2) entirely under Saber-C.
The resulting code was remarkably robust and stable, though it eventually
succumbed to feature-creep and the resulting code-rot as an increasing number
of programmers had their hands in the code-base.

What can I say I learned from my Saber-C experience? First off, that there is no
excuse for writing unreliable software. Humility - and an acceptance that you can
and do make mistakes - is the key to learning how to program defensively.
Programming defensively means bringing your testing process as close as
possible to your coding, so that you don't have time to make one mistake and
move on to another one. I also learned that code I thought was rock-solid was
actually chock-full of unnoticed runtime errors - code that worked right 99% of
the time, or was reliable on one architecture but not another. I used to use
Saber-C as my secret weapon to convince my friends I had sold my soul to The
Devil: whenever they were dealing with a weird memory leak or a wild pointer
that was making their programs crash with a corrupted stack or a mangled free
list, Saber-C could usually pinpoint the problem in a single pass. I don't write as
much software as I used to (not by a long shot) but I keep an old Sparc Ultra-5
with Saber-C in my office rack for when I need it.

New Tricks: Fortify


As we've all discovered in the last decade, it's no longer enough for code to
simply work correctly. Today's software has to work correctly in the face of a
high level of deliberate attack from "security researchers" (3) or hackers eagerly
attempting to count coup by finding a way to penetrate the software. Reliability
tools like Saber-C help produce code that is relatively free of run-time errors, but
the run-time testing performed by the typical programmer does not take into
account the kind of tricks hackers are likely to attempt against the software once
it is in the field. I know that I, personally, am guilty of this: when I wrote input
routines for processing (for example) a user login, I worried about what would
happen if the line was too long, or if a field was missing, or quotes didn't match -
and that was about it. The recent history of internet security shows that most
programmers take the same approach: worry about getting things as right as you
can based on the threats you know about, then cross your fingers and ship the
software. Most programmers who are aware that security is a consideration will
learn maxims like "don't use strcat( )" and might use snprintf( ) instead
of sprintf( ), but the environment is constantly changing, and so are the
rules - it is impossible to keep up.
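
To make those maxims concrete, here is the sort of substitution they call for - a
minimal sketch of my own, not code from the FWTK:

#include <stdio.h>
#include <syslog.h>

/* Formatting an untrusted login name into a fixed-size buffer: sprintf()
 * would happily run past the end of msg if "user" were long enough, while
 * snprintf() truncates, and the %.100s precision caps the field as an
 * extra belt-and-suspenders measure. */
void
log_denied_login(const char *user)
{
    char msg[256];

    /* risky: sprintf(msg, "login denied for %s", user); */
    snprintf(msg, sizeof(msg), "login denied for %.100s", user);
    syslog(LOG_WARNING, "%s", msg);
}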

(How not to ship code that blows up elegantly in your customer's face)
As you can see above, people are working on detecting some of these flaws at
run-time. Obviously, I'm a big fan of run-time error detection, but I think it should
be done while the code is being developed, not while it's being run by the user.
This error happened, as I was writing this, when QuickTime Player didn't like
something in the music I was listening to. This lucky accident is a case in point of
"better than nothing, but still half-assed."

Three years ago I joined a technical advisory board (TAB) for a company called
Fortify, which produces a suite of software security tools. One of the big
advantages of being a TAB member for a company is that you can usually mooch
a license for their software if you want a chance to play with it. As part of another
project I'm involved in, I am leading development of a website that is being coded
in JSP - since security is always a concern, I wanted to be able to convince our
engineers to use a code security tool. So I asked Fortify for an evaluation copy of
their latest version, planning to run some code through it to see how well it
worked.

Fortify's tools are built around a source code analyzer that renders a variety of
programming languages (Java, C, ...) into an intermediate form which is then
processed with a set of algorithms that attempt to identify and flag dangerous
coding constructs, possible input problems, and so forth. For example, one of the
approaches Fortify uses is similar to the "tainting" system that can be found in
the Perl programming language: as an input enters the system it is tracked
through the code-flow. Places where that input is used to compose other pieces
of data are examined, and a flag is raised if the "tainted" data might find its way
into a composed command such as an SQL call or shell escape ("injection
attack"). In C programs, tainted data is tracked by size to verify that it is not
copied into a memory area that is too small for it ("buffer overflow attack"). Later
on, I'll show you what that looks like, when Fortify correctly discovered a potential
buffer overflow in some of my code. This is incredibly useful because of the
prevalence of buffer overflows and injection attacks. But the real problem is that
there are too many attack paradigms in play today for any programmer to keep
track of. Even security specialists have a hard time keeping track of all the ways
in which code can be abused - it's just too much to expect a "typical
programmer on a deadline" to manage effectively.
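
As an illustration of the kind of construct a taint-style analysis flags - this is an
invented fragment of mine, not Fortify's actual rules or output:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* "name" arrives from the network, so a taint-style analyzer marks it
 * tainted at the read() and follows it through the calls below. */
void
handle_client(int sock)
{
    char name[512];
    char cmd[256];
    ssize_t n;

    n = read(sock, name, sizeof(name) - 1);   /* taint enters here */
    if (n <= 0)
        return;
    name[n] = '\0';

    /* flagged: tainted data, up to 511 bytes, copied into a 256-byte
     * buffer - a potential buffer overflow */
    sprintf(cmd, "lookup %s", name);

    /* flagged: tainted data composed into a shell command - a potential
     * injection attack */
    system(cmd);
}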

After a bit of thinking, I cooked up the idea of taking Fortify's toolset and running
it against my old Firewall Toolkit (FWTK) code from 1994, to see if my code was
as good as I thought it was. As it happens, that's a pretty good test, because
there are a couple of problems in the first version of the code that I already knew
about - it would be interesting to see if Fortify was able to find them.

Building Code With Fortify Source Code Analyzer (SCA)


The source code analyzer acts as a wrapper around the system's compiler. In
this case, since I was working with C code, I was using it as a front-end ahead of
gcc.

(Running code through sourceanalyzer)

Before I could get my old code to build on the version of Linux I was using, I had
to fix a bunch of old-school UNIX function calls that had been obsoleted. Mostly
that meant changing old dbm calls to use gdbm instead, and replacing
crypt( ) with a stub function. I also removed the X-Windows gateway proxy
from the build process because I don't install X on my systems and didn't have all
the necessary header files and libraries. It turns out that, if you were reasonably
careful about how you passed the $(CC) compiler variable through your
makefiles, you can run the code through sourceanalyzer without having to alter
your build process at all.
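
What "reasonably careful" means in practice is just never hard-coding the
compiler name in individual rules. A hypothetical makefile fragment - not taken
from the FWTK - shows the pattern that makes the substitution painless (recipe
lines must be indented with a tab):

# every rule goes through $(CC), so overriding CC on the make command
# line swaps in sourceanalyzer everywhere without edits
CC = gcc
CFLAGS = -O2 -Wall

ftp-gw: ftp-gw.o enargv.o
	$(CC) $(CFLAGS) -o ftp-gw ftp-gw.o enargv.o

.c.o:
	$(CC) $(CFLAGS) -c $<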

This is a crucial point, for me. I think that one major reason developers initially
resist the idea of using a code checker is that they're horrified by the possibility
that it will make their build process more difficult. That's a serious consideration,
when you consider how subtly incompatible the endless flavors of UNIX have
become, and how smart/complex that has made the typical build process. I admit
that my original decision to use the FWTK had something to do with my comfort
in knowing it has a very minimalist build process and relatively few moving parts.
As it turned out, adding sourceanalyzer to the FWTK build (see above) required
absolutely no changes at all; I simply passed a new version of CC on the
command line, e.g.:

make CC='sourceanalyzer -b fwtk gcc'

When you run the code through the source analyzer, you provide a build identifier
(in this case -b fwtk) that Fortify uses later when it's time to assemble the
analyzed source code into a complete model for security checking. Running the
source analyzer ahead of the compilation process slowed things down a bit, but
not enough to bother measuring. My guess is that for a large piece of software it
might add a significant delay - but, remember, you're not going to do a security
analysis in every one of your compile/link/debug cycles.

(Performing an analysis run with sourceanalyzer)

Once you've run the code through sourceanalyzer, it's time to perform the full
analysis. Invoking sourceanalyzer again with the "-scan" option pulls the
collected scan results together and does a vulnerability analysis pass against
them. In the example above, I ran it with flags to create an output file in "FVDL" -
Fortify's Vulnerability Description Language, an XML dialect. Fortify supports
several different data formats for the analysis results; I used the XML in this
example because I wanted to be able to look at it. Like most XML, it's extremely
verbose; the preferred analysis format is something called "FPR", but in this
example performance was not my objective.
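
For reference, an analysis invocation along these lines looks roughly like the
following; the -b and -scan options are as described above, and the exact
output-file and format flags may vary between SCA versions:

sourceanalyzer -b fwtk -scan -format fvdl -f fwtk.fvdl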

Running the scan process is a lot more intensive than the first pass. I was not
running on beefy hardware (a 1.2 GHz Celeron with 1 GB of RAM running Linux)
and the analysis took several minutes to complete. Earlier, before I switched to
the machine with 1 GB of RAM, I was running on an older server with a 500 MHz
processor and 256 MB of RAM - Fortify warned me that "performance would
suffer" and it wasn't kidding! If you're going to run Fortify as a part of your
production process, you should make sure that your analyst's workstation has
plenty of memory.

One thing that impressed me about how Fortify's system works is that it's very
environment-agnostic. The fact that I didn't have X-Windows on my Linux box
didn't matter; I was able to generate the analysis using command line
incantations that took me all of five minutes to figure out. I have to confess at this
point that I barely read the documentation; I was able to rely almost completely
on a "quick walkthrough" example. After the FPR/FVDL analysis file was
produced, I fired up the Audit Workbench (Fortify's graphical user interface) on
my Windows workstation, and accessed the results across my LAN from a
samba-shared directory. I was pleasantly surprised that not only did it work
flawlessly, but Audit Workbench also asked me if I wanted to specify the root
directory for my source tree because the files appeared to be remote-mounted. I
ran all my analysis across the LAN and everything worked smoothly.

(Summary of the FWTK analysis)

When you open the FPR/FVDL with Audit Workbench you are presented with a
summary of findings, broken down in terms of severity and by categories of
potential problems such as Buffer Overflow, Resource Injection, Log Forging, etc.
You are then invited to suppress categories of problems.
(Turning off different analysis issues)

I thought that being able to suppress categories was very cool. For example, on
the FWTK code I was assuming that the software was running on a secured
platform. One of the options for category suppression is "file system inputs" -
which disables the warnings for places where the software's input comes from a
configuration file instead of over a network. Turning this off greatly reduced the
number of "Hot" warnings in the FWTK - it turns out there were a few (ahem!)
small problems with quote-termination in my configuration file parsing routines. If
someone was on the firewall and able to modify the configuration file, the entire
system's security is already compromised - so I turned off the file system inputs
and environment variable inputs.
(Audit Workbench in action - click for a detail view)

The screenshot above is the main Audit Workbench interface. Along the upper
left side is a severity selector, which allows you to focus only on Hot, Warning,
etc., issues. Within the upper left pane is a tree diagram of the currently chosen
issue and where it occurs in the code, with a call flow below. I found this view to
be extremely useful because it did a great deal to jog my memory of inter-
function calling dependencies in code I'd written over fifteen years ago. If you
look closely at the screen-shot above you'll see that one of the candidate buffer
overflows in the FTP proxy had to do with where the user's login name is parsed
out using a routine called enargv(). I distinctly recall that routine as being fairly
complicated but, more importantly, I know I didn't write it with the idea that it
might be subjected to nasty quote-balancing games from an attacker. When you
click on the lines in the small display, it pops you to the source code in question,
so you can see exactly what is going on. In this example, I'm taking a string (the
user's login name provided to the proxy), tokenizing it from their input using
enargv( ), sprintf()ing it into a buffer, then handing it to syslog(). Ouch! As it
turned out on closer inspection, there were some controls on the length that the
value of authuser could reach at that point, but it's definitely dodgy code that
deserved review.

Let's follow the code-path and you'll see what I mean. It's a good example of how
unexpected interactions between your code and your own library routines can get
you into trouble. The problem starts in ftp-gw.c:

ftp-gw.c: usercmd() (code removed for this example)


char buf[BSIZ];       /* BSIZ is 2048 */
char tokbuf[BSIZ];
char mbuf[512];
char *tokav[56];

/* getline called to read remote user's command input */

if((x = getline(0,(unsigned char *)buf,sizeof(buf) - 1)) < 0)
        return(1);
if(buf[0] == '\0')
        return(sayn(0,badcmd,sizeof(badcmd)-1));

tokac = enargv(buf,tokav,56,tokbuf,sizeof(tokbuf));
if(tokac <= 0)
        return(sayn(0,badcmd,sizeof(badcmd)-1));

So far, so good. Fortify has identified that some input is coming in on a socket,
through its analysis of getline(), and is tracking the use of that input as it flows
through the code. It's also performing analysis to track the sizes of the data
objects as they are used. In this example, that's my problem. I won't walk you
through all the code of enargv() but it's a string tokenizer that "understands"
quotes and whitespace and builds an argc/argv-style array of tokens. It's pretty
good about checking for overflows, but if invoked with the right input, in this
case, enargv() could be coerced into returning a string just one character short
of BSIZ - nearly 2048 bytes. And that's where the code analyzer flags that this
potentially large blob of data is getting used in some risky ways:

ftp-gw.c: line 679 (where the Audit Workbench identified "multiple issues")
cmd_user()

char buf[1024];
char mbuf[512];

/* some processing done in which "dest" is plucked out of the tokav/tokac
   set that was parsed earlier with enargv()
   ...
   then: */
{
        sprintf(mbuf, "Permission denied for user %.100s to connect to %.512s",
                authuser, dest);
        syslog(LLEV, "deny host=%.512s/%.20s connect to %.512s user=%.100s",
                rladdr, riaddr, dest, authuser);
        say(0, mbuf);
        return(1);
}

Ouch!! I am trying to stuff a string that is potentially up to 2048 characters into
mbuf, which is 512. What was my mistake? Simple: I relied on a function that I
had written years before, and had forgotten that it would potentially return strings
as large as the original input. This kind of mistake happens all the time. I'm not
making excuses for my own bad code - the point is that it's absolutely vital to go
through your code looking for this kind of mistake. I know that, if you had asked
me at the time I wrote it, I would have sworn that "It was written with insane care;
there are no unchecked inputs!" And I'd have been wrong. This was a fairly
subtle mistake that I overlooked for 3 years while I was actively
working on the code, and Fortify found it immediately.
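
The repair is straightforward, and worth showing because it is exactly the
discipline the tool is trying to enforce: bound the copy at the point where the data
lands in the small buffer. A minimal sketch of a bounded version of the fragment
above (not the actual FWTK fix):

{
        /* snprintf() cannot write past the end of mbuf no matter what
         * enargv() handed back, and the precisions still keep the log
         * fields to a sane length */
        snprintf(mbuf, sizeof(mbuf),
                "Permission denied for user %.100s to connect to %.512s",
                authuser, dest);
        syslog(LLEV, "deny host=%.512s/%.20s connect to %.512s user=%.100s",
                rladdr, riaddr, dest, authuser);
        say(0, mbuf);
        return(1);
}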

My over-reliance on syslog() also turned out to be a problem a number of years
after the FWTK was released. One version of UNIX (that shall remain
nameless...) had a syslog() function with an undersized fixed-length buffer that it
used to construct the log message's date/time stamp and the message itself -
since the FWTK code tended to push lengthy messages containing (among other
things) host DNS names, an attacker could craft a buffer overrun and push it
through the FWTK's overeager logging. This is another place where an
automated tool is useful: if the tool's knowledge-base contained the information
that sending syslog() messages longer than 1024 bytes was risky on some
operating systems, the code analyzer would have checked all my syslog() calls
for inputs longer than 1024 bytes. In fact, when the code analyzer's
knowledge-base gets updated with a new rule, it can retroactively trigger an audit
of code you knew was OK but that may no longer be OK in the light of
your new knowledge.
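
Hedging against that kind of platform bug is easy once you know to do it: cap the
message yourself before it ever reaches syslog(). A defensive sketch of my own,
not FWTK code:

#include <stdarg.h>
#include <stdio.h>
#include <syslog.h>

/* Format into a bounded local buffer first, so no platform's syslog()
 * implementation ever sees a message longer than we intend, no matter
 * how long the DNS names or user-supplied strings are. */
void
safe_syslog(int level, const char *fmt, ...)
{
    char msg[1000];    /* comfortably under the 1024-byte danger line */
    va_list ap;

    va_start(ap, fmt);
    vsnprintf(msg, sizeof(msg), fmt, ap);
    va_end(ap);

    syslog(level, "%s", msg);
}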

One feature of Audit Workbench that I didn't use for this experiment was the
analyst's workflow. At the bottom of the screen is a panel with a "Suppress Issue"
button and some text entry fields and drop-menus. These implement a problem
tracking and reviewing system that looks extremely well thought-out. If I were
resuming maintenance of this software and was establishing a code review/audit
cycle, I could now review each of the hot points Fortify identified and either fix
bugs, redesign routines, or mark the hot point as reviewed and accepted. For a
larger body of code this would allow multiple analysts to coordinate working on a
regular review process. If I were a product manager producing a piece of
security-critical software, I would definitely use Fortify to establish a regular audit
workflow as part of my release cycle.

The number of issues Fortify identified in the FWTK code is pretty daunting. After
a day spent digging into them, I found that a lot of the items that were flagged
were false positives. Many of those, however, were places where my initial
reaction to the code was "uh-oh!" until I was able to determine that somewhere
earlier in the input path there was a hard barrier on the input size, or some other
control against inappropriate input. It made me reassess my coding style, too,
because I realized that it's important to keep the controls close to the danger-
points, rather than letting them be scattered all over the code. When I collect
network input, I typically scrub it by making sure it's not too long, etc., then pass it
down to other routines which work upon it in a state of blind trust. In retrospect I
realize that doing it that way could lead to a situation in which the lower-level
routines get inputs that they trust, which came via a path that I forgot to
adequately scrub. The example of my use of enargv() in the FTP proxy is a
case of exactly that kind of mistake. When I first started using Saber-C, I felt like
it was a tool that taught me things that made me a better programmer. I feel the
same way about Fortify.
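
To show what keeping the controls close to the danger-points looks like in code,
here is a contrived sketch (not from the FWTK): the low-level routine enforces its
own limit right next to the risky operation, instead of trusting that every caller
scrubbed its input somewhere upstream.

#include <stdio.h>
#include <string.h>
#include <syslog.h>

#define MAXUSER 100    /* contrived limit for this sketch */

/* The routine that actually composes the message enforces its own bound,
 * right beside the danger point, rather than assuming the network-facing
 * code already did. */
void
log_user_denied(const char *authuser)
{
    char mbuf[512];

    if (strlen(authuser) > MAXUSER) {
        syslog(LOG_WARNING, "denied: user name too long (%lu bytes)",
               (unsigned long)strlen(authuser));
        return;
    }
    snprintf(mbuf, sizeof(mbuf), "Permission denied for user %s", authuser);
    syslog(LOG_INFO, "%s", mbuf);
}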

Experience with Other Open Source Packages


Once I had gotten some hands-on time with Fortify, I decided to assess the level
of difficulty in applying it to a few other popular, security-critical Open Source
programs. My main concern was whether the build process would admit
substituting sourceanalyzer for gcc without having to spend hours editing
makefiles. The results were good - postfix, courier IMAPd, syslog-ng, BIND, and
dhcpd all turned out to work if I specified sourceanalyzer as $(CC), i.e. all I had
to do was tell it:
make CC='sourceanalyzer -b postfix gcc'
and it ran through from the command line with no changes in the build process at
all. Sendmail proved to be a little bit more difficult, but only because I was too
stubborn to read the build instructions (two minutes) and figured it out by reading
through the build scripts and makefiles instead (two hours).

I did not attempt to do an in-depth, or even cursory, analysis of the Open Source
packages. With each one, however, I spent some time reviewing the hot listed
items. Some of the packages had many items flagged; one of the larger code-
bases had several thousand. Interestingly, I found I could immediately tell which
sections were older and better thought-out, and which sections were developed
by less experienced programmers. I imagine you could use the hot list, sorted by
code module, to make a pretty good guess as to which of your programmers was
more experienced and more careful. As I was reviewing one hot-listed item that
looked particularly promising, I discovered a comment above the questionable
line of code that appeared to have originated from some kind of manual code-
audit workflow tool that I haven't identified. The comment claimed that the
operation was safe and that it had been reviewed by so-and-so in version
such-and-such. It's nice to see that these important pieces of our software
"critical infrastructure" are, in fact, being audited for security holes!

One of the packages I examined had a number of very questionable-looking
areas in a cryptographic interface that is normally not enabled by default. My
suspicion is that the implementation of that interface was thrown into the software
as an option and isn't widely used, so there has been relatively little motivation to
clean it up. The software that you'd expect to come under concerted attack and
review by the hacker and "security researcher" community proved to be fairly
clean. I didn't see anything that immediately jumped out as a clear hole, although
I found several dozen places in a module of one program where I felt it was worth
sending an email message to the maintainer suggesting a closer review.

Notification

One topic in security about which I have been exceptionally vocal is the question
of how to handle vulnerabilities when they are discovered. I personally believe
that the hordes of "security researchers" that are constantly searching for new
bugs are largely a wasteful drain on the security community. The economy of
"vulnerability disclosure," in which credit is claimed in return for discovering and
announcing bugs, has had a tremendous negative impact on many vendors'
development and product-release cycles. Many of these larger vendors have
begun using automated code-checking tools like Fortify in-house to improve their
software's resistance to attack. Indeed, if the "security researchers" actually
wanted to be useful, they'd be working as part of the code audit teams at Oracle
or Microsoft. But then they couldn't claim their fifteen minutes of fame on CNN or
onstage at DEFCON.

My decision to use the FWTK code-base was partly influenced by the fact that it's
very old code and has largely fallen out of use. I felt that, if I had discovered a lot
of vulnerabilities in a widely-used piece of software, I'd feel morally obligated to
invest a lot of my time in making sure they were fixed. As a matter of philosophy,
I don't approve of releasing information about bugs that will place people at risk -
and, in the case of the FWTK, I knew I wasn't going to annoy the author of the
code. Not too much, anyhow! I think that this approach worked pretty well,
especially considering the number of potential problems I found in my code.

While I was running Fortify against the other open source packages that I
mentioned earlier, I identified six exploitable vulnerabilities in two of the
packages. To say that that was "scary" would be an understatement, since I
invested under an hour in poking randomly about in the results from each
package. I followed procedures that have worked for me since the mid-1980s: I
researched the owner of the module(s) in question, contacted them personally,
and told them what I'd found. Contrary to the ideology of the "full disclosure"
crowd, everyone I contacted was extremely responsive and assured me that the
bugs would be fixed in the next release. No hoopla, no press briefing, no rushing
out a patch. I won't get my fifteen minutes of fame on CNN but that's all right. I'd
rather be part of the solution than part of the problem.

Lessons Learned

This experience was very interesting and valuable for me. First off, it gave me a
much-needed booster-shot of humility about my code. Having a piece of software
instantly point out a dozen glaring holes in your code is never fun - but it's an
important sensation to savour.

More importantly, it showed me that tools like Fortify really do work, and that they
find vulnerabilities faster and better than a human can. That's a significant result
if you're involved in software development for products that are going to find
themselves exposed to the Internet. Since the FWTK code was developed using
extensive run-time checking with Saber-C, it proved to be extremely solid and
reliable, but Saber-C never checked specifically for security flaws. As it turns out,
there is a major difference between the kind of analysis your tools should do for
run-time reliability as opposed to security. Clearly, both are necessary. As a
developer (or "former developer," anyway) I am deeply concerned about how
difficult it is to do a reliable code-build on the various popular flavors of
UNIX/Linux - adding source code checking to the build cycle was initially scary,
but it turned out that the fear wasn't merited. I admit I was pleasantly surprised.

The "many eyes" theory of software quality doesn't appear to hold true, either.
FWTK was widely used for almost ten years, and only one of the problems I
found with Fortify was a problem I already knew about. So FWTK was a piece of
software (in theory) examined by "many eyes" that did not see those bugs, either.
When you consider that code-bases like Microsoft's and Oracle's can number in
the tens of millions of lines, it's simply not realistic to expect manual code-audits
to be effective. Engineers I've talked to at Microsoft are using some form of
automated code checking, but I don't know what it is. Oracle is using a suite of
tools including Fortify. I suppose the inevitable end-game with automatic
vulnerability checkers is that they will become available to both sides. That'll still
be a good thing, because it will go a long way toward reducing the current trade
in the obvious-bug-of-the-month. If we can push code quality for Internet
applications to the point where a "security researcher" or a hacker has to invest
weeks or months to find an exploit, instead of hours, the world will be a much
better place.

Acknowledgements

I would like to thank Brian Chess at Fortify for supporting this research through
the loan of a sourceanalyzer license. Jacob West at Fortify was kind enough to
answer some of my questions, and to gently RTFM me when I needed it.
