
Chapter 2: Assembly Language – Quick Start
Low-level programming means "speaking" with the computer in its native language. It is the
joy of low-level hardware access, the aerobatics of a free flight of fancy, and practically
unlimited possibilities for self-expression. Assembly language is much easier than most
high-level programming languages. It is considerably easier than C++, and it is possible
to master Assembly within several months. The key is to make the right start and then
to proceed confidently in the right direction instead of wandering blindly in the dark.
A hacker who hasn't mastered the Assembly language is not a hacker. Such a person is like an
oarsman without an oar. It is impossible to achieve serious goals in the field of hacking
having mastered only high-level programming languages. To investigate an application
whose source code is not available (and most frequently this is exactly the case), it is
necessary to discover and analyze its algorithm, which is spread over the jungle of machine
code. There are lots of translators from machine code to assembler (such programs are
known as disassemblers); however, it is impossible to recover the original source code from
machine code.
Investigation of undocumented features of the operating system is also carried out using
Assembly. Other tasks that can be done using this language include searching for
backdoors, neutralizing viruses, customizing applications for the hacker's own goals,
reverse engineering, cracking secret algorithms… In fact, this list is endless. The area of
application of the Assembly language is so wide that it is much easier to list the areas to
which it has no relation at all.
Assembly language is a powerful tool that gives unlimited power over the system. This is
not an arcane theory. On the contrary, this is hardcore. Having mastered it, you'll master
such techniques as self-modifying code, polymorphism, antidebugging and
antidisassembling techniques, exploits, genetically modified worms, espionage over the
system events, password eavesdropping, etc.
In other words, Assembly is like a sixth, or even seventh, sense combined with a kind of
second sight. For instance, consider the situation when the infamous General Protection
Fault window pops up, containing a message that informs the user about a critical error.
Application programmers, cursing and swearing, obediently close the application and are
at a loss (they can only guess that this is the program's karma). All these messages and
dumps are unintelligible to them. However, the situation is absolutely different for those
who have mastered Assembly! These guys go to the specified address, correct the bug in cold
blood, and often manage to recover unsaved data!
Assembly Language Philosophy
Assembly is a low-level language that operates with machine codes and concepts. Do not
even try to find a command for displaying the "hello, world!" string, because there is no
such command here. Here is a brief list of the actions that the processor is capable of
carrying out: add, subtract, divide, multiply or compare two numbers and, depending on
the result of this operation, pass control to the appropriate program branch; send a number
from one location to another; write a number to a port or read a number from a port.
Peripheral devices are controlled through ports or through special memory
regions (video memory, for example). In order to output a character to the terminal, it is
necessary to first consult the technical documentation for the video adapter, and in order to
read a sector from the hard disk, the documentation supplied along with that drive.
Fortunately, this part of the job is delegated to hardware drivers, and the programmer need
not carry it out manually. Furthermore, in normal operating systems, such as Windows NT,
ports are not available from the application level.
Another machine concept that needs to be mastered is the register. In fact, it is very
difficult to explain what a register is without sinning against the truth. The register is
something that looks like a register but, in fact, isn't one. In ancient computers, a
register was a part of the data-processing device. The processor cannot add two numbers
located in the main memory. Before carrying out this operation, it must load them into
registers. This is the situation as it appears at the micro level. Above this level, there is the
machine command interpreter, which no contemporary processor can do without. Yes,
machine codes are interpreted. Even the PDP-11 no longer required the programmer to load
the data into registers beforehand, and pretended that it was taking them directly from
memory. In reality, however, the data were secretly loaded into internal registers. After
the arithmetical operation was carried out, the result was written either to memory or into a
"logical" register, which was actually a cell of very fast memory.
In x86, registers are as virtual as they were in the PDP-11. However, in contrast to the
PDP-11, they have partially retained their specialization. Some commands (mul, for example)
work with a strictly defined set of registers that cannot be changed. This is the price of
backward compatibility with earlier processors. Another disappointing limitation is
that x86 doesn't support "memory to memory" addressing, and one of the numbers being
processed must be loaded into a register or be an immediate value. In fact, about 50% of an
Assembly program is made up of data-exchange commands.
All these actions take place in the arena called the address space. The address space is
simply the set of virtual memory cells available to the processor. Operating systems like
Windows 9x and most UNIX clones create an individual 4-GB region of memory for each
application, where it is possible to distinguish at least three areas: the code segment, the
data segment and the stack.
The stack is simply a method of storing data. In fact, it is something like a combination of
a list and an array (see the famous book by Donald Knuth). The push command places a new
portion of data on top of the stack, and the pop command retrieves the contents of the stack
top. This allows data to be stored in memory without taking care of their absolute
addresses. This is very convenient! Function calls are carried out in exactly this
manner. The call func command pushes the address of the next command onto the
stack, and ret pops it from the stack. The pointer to the current position of the stack top is
stored in the ESP register. As for the stack bottom, formally only the length of the address
space limits the stack. In practice, it is limited by the amount of memory allocated
to it. The stack grows from higher addresses to lower ones. In other words, if memory is
drawn with the lowest address at the top, the stack grows from bottom to top.
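The push/pop behavior described above can be modeled in a few lines of C. This is only a toy sketch: the names stack_mem, esp, push32 and pop32 are invented for illustration, the stack is counted in 32-bit words rather than bytes, and no overflow checks are made.

```c
#include <stdint.h>

/* Toy model of the x86 stack: a small block of memory and an ESP-like index.
   All names here are illustrative; positions are counted in 32-bit words. */
#define STACK_WORDS 16

static uint32_t stack_mem[STACK_WORDS];
static uint32_t esp = STACK_WORDS;   /* empty stack: ESP points just past the end */

/* PUSH: move ESP toward lower addresses first, then store the value. */
static void push32(uint32_t value)
{
    esp -= 1;
    stack_mem[esp] = value;
}

/* POP: read the value at the stack top, then move ESP back up. */
static uint32_t pop32(void)
{
    uint32_t value = stack_mem[esp];
    esp += 1;
    return value;
}
```

In this model, call func would amount to push32(address of the next command) followed by a jump, and ret to a jump to the value returned by pop32().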
The EIP register contains the pointer to the next command to be executed. It cannot be
modified directly (with mov, for example); control-transfer commands such as jmp, call and
ret change it implicitly. The EAX, EBX, ECX, EDX, ESI, EDI and EBP registers are called
general-purpose registers and can freely participate in any arithmetical or memory-access
operations. There are seven such 32-bit registers in total. The least significant 16-bit
halves of the first four registers (EAX, EBX, ECX and EDX) can be accessed as AX, BX, CX
and DX, respectively. Each of these words, in turn, is divided into
most significant and least significant bytes: AH/AL, BH/BL, CH/CL and DH/DL,
respectively. It is important to understand that AL, AX and EAX are not three different
registers but three different parts of the same register!
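The relationship between EAX, AX, AH and AL can be sketched in C with masks and shifts. The helper names below (reg_ax, reg_al, reg_ah, write_al) are hypothetical and serve only to illustrate which bits each name refers to.

```c
#include <stdint.h>

/* Illustrative helpers (not a real API): extract the named parts
   of a 32-bit value that stands for the EAX register. */
static uint16_t reg_ax(uint32_t eax) { return (uint16_t)(eax & 0xFFFFu); }       /* bits 0..15 */
static uint8_t  reg_al(uint32_t eax) { return (uint8_t)(eax & 0xFFu); }          /* bits 0..7  */
static uint8_t  reg_ah(uint32_t eax) { return (uint8_t)((eax >> 8) & 0xFFu); }   /* bits 8..15 */

/* Writing AL changes only the lowest byte; the rest of EAX is untouched. */
static uint32_t write_al(uint32_t eax, uint8_t al)
{
    return (eax & 0xFFFFFF00u) | al;
}
```

For example, with EAX equal to 12345678h, AX is 5678h, AH is 56h and AL is 78h; writing FFh into AL turns EAX into 123456FFh.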
Furthermore, there are other registers as well: segment registers, multimedia registers,
mathematical coprocessor registers, debug registers, etc. Without a comprehensive manual,
beginners can easily become confused and get lost in this jungle. For the time being,
however, I won't dwell on them.

Explaining Assembly Concepts on C Examples


The main Assembly command is the mov data-exchange command, which can be
considered a kind of equivalent of the assignment operator. For example, c = 0x333
can be written as follows in the Assembly language: mov eax, 333h (note the difference
in the format used for hexadecimal numbers). It is also possible to write
mov eax, ebx (write the value of the EBX register into the EAX register).
Pointers are enclosed in square brackets. The a = *b construct of the C language will
appear as follows in Assembly: mov eax, [ebx]. If desired, it is possible to add an
offset to the pointer. Thus, a = b[0x66] in C is roughly equivalent to mov eax, [ebx + 66h]
in Assembly (note that the offset in Assembly is always counted in bytes, whereas C scales
the index by the element size).
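To make the byte-offset point concrete, here is a minimal C sketch of what mov eax, [ebx + disp] does. The helper name load_dword is hypothetical; memcpy is used so that the load is valid C for any alignment.

```c
#include <stdint.h>
#include <string.h>

/* Models "mov eax, [ebx + disp]": load the 4 bytes located
   disp BYTES after the address held in ebx.
   load_dword is an illustrative name, not a real API. */
static uint32_t load_dword(const void *ebx, size_t disp)
{
    uint32_t eax;
    memcpy(&eax, (const unsigned char *)ebx + disp, sizeof eax);
    return eax;
}
```

For an array of 32-bit values arr, load_dword(arr, 8) returns arr[2], not arr[8], because each element occupies four bytes.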
Variables are declared using the following directives: db (one-byte variable), dw (two-byte
variable), dd (double-word variable), etc. The sign property is not specified when variables
are declared. The same variable can be interpreted differently in different program sections:
either as a signed or as an unsigned number. To load the address of a variable into a pointer,
either the lea command or the mov command with the offset directive is used. Consider the
following example (Listing 2.1):

Listing 2.1. Main methods of data exchange

LEA EDX, b            ;// The EDX register contains the pointer to the b variable
MOV EBX, a            ;// The EBX register contains the value of the a variable
MOV ECX, offset a     ;// The ECX register contains the pointer to the a variable
MOV [EDX], EBX        ;// Copy the value of a (held in EBX) to b via the pointer in EDX
MOV b, EBX            ;// Copy the value of a (held in EBX) to the b variable directly
MOV b, a              ;// !!!Error!!! This is an invalid operation!!!
                      ;// Both arguments of the MOV command cannot be
                      ;// located in memory!

a DD 66h              ;// Declare the a variable of the double-word type
                      ;// and initialize it with the number 66h
b DD ?                ;// Declare the uninitialized b variable of the dword type
Now consider conditional jumps. Assembly language has no if operator, and in practice
this operation must be carried out in two stages. The cmp command compares two
numbers and saves the result of the comparison in flags. Flags are special-purpose bits of
a specialized register, which will not be considered here because its detailed description
would take too much precious book space. For the moment, it is enough to remember that
there are three main states: less than (below or less), greater than (above or greater) and
equal to (equal). The commands of the conditional jump family, jx, check the condition
specified as x and, if this condition is true, jump to the specified address. For
example, je jumps if two numbers are equal (Jump if Equal), and jne jumps if two numbers
are not equal (Jump if Not Equal). Similarly, the jb/ja commands work with unsigned
numbers, and jl/jg with signed ones. Any two conditions that are not mutually
exclusive can be combined, for example: jbe jumps if one unsigned number is below or
equal to another one. The jmp command corresponds to an unconditional jump.
The cmp/jx construct resembles IF xxx GOTO in BASIC much more than the
corresponding C construct. Here are several examples illustrating its use (Listing 2.2):

Listing 2.2. Main types of conditional jumps

CMP EAX, EBX          ;// Compare EAX and EBX
JZ xxx                ;// Jump to xxx if they are equal (JZ is a synonym of JE)

CMP [ECX], EDX        ;// Compare *ECX and EDX
JAE yyy               ;// If unsigned *ECX >= EDX then jump to yyy
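The difference between the unsigned (jb/ja) and signed (jl/jg) jumps is easiest to see on the same bit pattern. The following C sketch mimics the cmp/jx pair with if and goto; the function names are invented for illustration, and the cast to int32_t assumes the usual two's-complement behavior of mainstream compilers.

```c
#include <stdint.h>

/* Models "cmp a, b / jb taken": the operands are compared as unsigned numbers.
   Returns 1 if the jump would be taken, 0 otherwise. */
static int jumps_if_below(uint32_t a, uint32_t b)
{
    if (a < b) goto taken;                    /* IF a < b GOTO taken */
    return 0;                                 /* fall through: jump not taken */
taken:
    return 1;
}

/* Models "cmp a, b / jl taken": the same bits compared as signed numbers. */
static int jumps_if_less(uint32_t a, uint32_t b)
{
    if ((int32_t)a < (int32_t)b) goto taken;  /* assumes two's complement */
    return 0;
taken:
    return 1;
}
```

With a = FFFFFFFFh and b = 1, jb is not taken (4294967295 is not below 1), whereas jl is taken (as a signed number, FFFFFFFFh is -1).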
Implementation of function calls in Assembly is much more complicated than in C. First,
there are at least two types of calling conventions: C and Pascal. According to the C
calling convention, function arguments are pushed onto the stack from right to left, and the
code that calls the function clears them from the stack. According to the Pascal calling
convention, the situation is exactly the opposite! Arguments are passed from left to right,
and the function itself must clear them from the stack. Most API functions of the
Windows operating system observe the combined calling convention known as stdcall,
according to which arguments are passed as in the C calling convention and cleared
from the stack as in the Pascal calling convention. The value returned by the
function is loaded into the EAX register. For passing 64-bit values, the EDX:EAX pair of
registers is used. Naturally, these conventions must be observed only when calling external
functions, such as API functions, library functions, etc. Internal functions are not required
to conform to these conventions and can pass arguments in any imaginable manner, for
example, via registers.
Listing 2.3 demonstrates the simplest example of a function call:

Listing 2.3. Calling API functions of the operating system

PUSH offset LibName ;// Push the string offset onto the stack
CALL LoadLibrary ;// Function call
MOV h, EAX ;// EAX contains the returned value
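The only difference between the C convention and stdcall is who removes the arguments from the stack. The following toy model in C shows this on a simulated stack; every name in it (stack_mem, esp, push32, sub_body, call_cdecl_sub, call_stdcall_sub) is invented for illustration, positions are counted in 32-bit words, and the return address is represented by a placeholder value.

```c
#include <stdint.h>

#define STACK_WORDS 32
static uint32_t stack_mem[STACK_WORDS];
static uint32_t esp = STACK_WORDS;            /* empty stack */

static void push32(uint32_t v) { stack_mem[--esp] = v; }

/* Callee body shared by both models: [esp] holds the return address,
   [esp+1] the first argument and [esp+2] the second one. */
static uint32_t sub_body(void) { return stack_mem[esp + 1] - stack_mem[esp + 2]; }

/* C convention: arguments pushed right to left, caller cleans the stack. */
static uint32_t call_cdecl_sub(uint32_t a, uint32_t b)
{
    push32(b);                  /* last argument first */
    push32(a);
    push32(0);                  /* placeholder for the address pushed by CALL */
    uint32_t eax = sub_body();  /* the result comes back in EAX */
    esp += 1;                   /* callee: RET pops only the return address */
    esp += 2;                   /* caller: ADD ESP, 8 removes the arguments */
    return eax;
}

/* stdcall: same push order, but the callee removes the arguments. */
static uint32_t call_stdcall_sub(uint32_t a, uint32_t b)
{
    push32(b);
    push32(a);
    push32(0);                  /* placeholder for the address pushed by CALL */
    uint32_t eax = sub_body();
    esp += 1 + 2;               /* callee: RET 8 pops the address and the arguments */
    return eax;
}
```

In both cases the stack pointer returns to its initial value after the call; the conventions differ only in which side of the call contains the cleanup command.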

Assembly Inserts as a Workbench


It is extremely difficult to program in pure Assembly! A program with minimal
functionality contains a horrible lot of various constructs that interact with one another
in sophisticated ways and start to behave unpredictably without any notice. At one
stroke, you isolate yourself from the environment to which you are accustomed. It is
very easy to add two numbers in Assembly; displaying the result on the screen, however, is
a different matter…
Assembly inserts are the way out. Classical manuals on the Assembly language drown the
reader, from the very first pages, in the depths of system programming, which frightens by
the awful complexity of the processor architecture and the operating system.
Assembly inserts, on the other hand, allow programmers to remain in the development
environment (C/C++ and/or Pascal) to which they are accustomed, and to get acquainted
with the internal world of the processor gradually, without abrupt changes. The
same approach allows programmers to begin the study of Assembly language from the
32-bit protected processor mode. The point is that, in its pure form, the protected mode is so
complicated that it is practically impossible to master it from scratch. Because of
this, all classical manuals start from a description of the obsolete 16-bit
real mode. This discussion turns out to be not only unnecessary dead weight but also an
excellent means of confusing beginners. Perhaps you remember that famous proverb:
"forget what you have learnt before…". Based on my own personal experience, and on the
experience of my friends and colleagues, I dare say that studying the
Assembly language through Assembly inserts surpasses all other approaches in at least
the following two respects:
• Efficiency: within practically three or four days of intense learning, a programmer
who has not been involved in Assembly programming before will write quite decent
programs.
• Ease of mastering: study of the Assembly language goes seamlessly, without
any difficulties, and doesn't require serious effort. At no stage of the learning process
does the student risk getting drowned in tons of difficult and irrelevant information.
Each further step is intuitively clear, and all potential obstacles have been carefully
removed from the road.
Well, there is no need to wait! Assembly inserts in Microsoft Visual C++ are declared using
the __asm keyword. The simplest Assembly insert appears as shown in Listing 2.4.

Listing 2.4. The simplest Assembly insert that adds two numbers

#include <stdio.h>

int main(void)
{
    int a = 1; // Declare the a variable and assign it the value of 1
    int b = 2; // Declare the b variable and assign it the value of 2
    int c;     // Declare the c variable, without initializing it

    // Start of the Assembly insert
    __asm {
        mov eax, a   ;// Load the value of the a variable into the EAX register
        mov ebx, b   ;// Load the value of the b variable into the EBX register
        add eax, ebx ;// Add EBX to EAX, and write the result into EAX
        mov c, eax   ;// Store the EAX value in the c variable
    }
    // End of the Assembly insert

    // Output the contents of the c variable
    // using the customary printf function
    printf("a + b = %x + %x = %x\n", a, b, c);
    return 0;
}
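The __asm block is specific to Microsoft Visual C++ on 32-bit x86. For readers using GCC or Clang, a roughly comparable insert could look like the sketch below. It is an assumption-laden illustration rather than a translation of Listing 2.4: it uses the GCC extended asm syntax with AT&T operand order, lets the compiler choose the registers, and falls back to plain C on other compilers or processors. The function name add_with_insert is hypothetical.

```c
/* Adds two numbers with a one-command Assembly insert where possible.
   add_with_insert is an illustrative name, not part of any library. */
static int add_with_insert(int a, int b)
{
    int c = a;   /* c plays the role of EAX in Listing 2.4 */
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    /* AT&T syntax: add the register holding b to the register holding c.
       "+r" marks c as read and written, "cc" says the flags are changed. */
    __asm__ ("addl %1, %0" : "+r"(c) : "r"(b) : "cc");
#else
    c += b;      /* portable fallback for other compilers or processors */
#endif
    return c;
}
```

Calling add_with_insert(1, 2) yields 3, the same sum that Listing 2.4 computes.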
