Start
Low-level programming means "speaking" with the computer in a language that is native to
it, the joy of low-level hardware access, aerobatics of the free flight of fancy, and
practically unlimited possibilities of self-expression. Assembly language is much easier
than most high-level programming languages. It is considerably easier than C++, and it is
possible to master Assembly within several months. The key is to make the right start and
then, looking ahead with confidence, to proceed in the right direction instead of
blindly wandering in the dark.
A hacker who hasn't mastered Assembly language is not a hacker. Such a person is like an
oarsman without an oar. It is impossible to achieve serious goals in the field of hacking
having mastered only high-level programming languages. To investigate an application
whose source code is not available (and most frequently this is exactly the case), it is
necessary to discover and analyze its algorithm, which is scattered through the jungle of
machine code. There are lots of translators from machine code to assembler (such programs
are known as disassemblers); however, it is impossible to recover the original source code
from machine code.
Investigation of undocumented features of the operating system is also carried out using
Assembly. Other tasks that can be done using this language include searching for
backdoors, neutralizing viruses, customizing applications for the hacker's own goals,
reverse engineering, cracking secret algorithms… In fact, this list is endless. The area of
application of the Assembly language is so wide that it is much easier to list the areas to
which it has no relation at all.
Assembly language is a powerful tool that gives unlimited power over the system. This is
not an arcane theory. On the contrary, this is hardcore. Having mastered it, you'll master
such techniques as self-modifying code, polymorphism, antidebugging and
antidisassembling techniques, exploits, genetically modified worms, espionage over the
system events, password eavesdropping, etc.
In other words, Assembly is like a sixth, or even seventh, sense combined with something like
second sight. For instance, consider the situation when the infamous General Protection
Fault window pops up, containing an error message informing the user about a critical error.
Application programmers, cursing and swearing, obediently close the application and are
at a loss (they can only guess that this is the program's karma). All these messages and dumps
are unintelligible to them. However, the situation is absolutely different for those who
have mastered Assembly! These guys go to the specified address, correct the bug in cold
blood, and often manage to recover unsaved data!
Assembly Language Philosophy
Assembly is a low-level language that operates with machine codes and concepts. Do not
even try to find a command for displaying the "hello, world!" string, because there is no
such command here. Here is a brief list of the actions that the processor is capable of
carrying out: add, subtract, divide, multiply, or compare two numbers and, depending on
the result of this operation, pass control to the appropriate program branch; send a number
from one location to another; write a number to a port or read a number from a port.
Peripheral devices are controlled precisely through ports or through a special memory
region (video memory, for example). In order to output a character to the terminal, it is
necessary to first consult the technical documentation for the video adapter, and in order to
read a sector from the hard disk, the documentation supplied along with that drive.
Fortunately, this part of the job is delegated to hardware drivers, and the programmer need
not carry it out manually. Furthermore, in normal operating systems, such as Windows NT,
ports are not available from the application level.
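
To get a feel for how little the processor actually offers, here is a rough C sketch of my own
(not a listing from any real program) that measures the length of a string using only the
kinds of steps listed above: compare, branch, add, and jump. The function name count_chars
is hypothetical, and goto is used on purpose to mimic jump commands.

```c
#include <assert.h>
#include <stddef.h>

/* Count the characters of a zero-terminated string using only
   "processor-like" steps: compare, conditional branch, add, jump. */
size_t count_chars(const char *s)
{
    assert(s != NULL);
    size_t n = 0;           /* plays the role of a counter register   */
loop:
    if (s[n] == '\0')       /* compare the current byte with zero ... */
        goto done;          /* ... and branch depending on the result */
    n = n + 1;              /* add one to the counter                 */
    goto loop;              /* unconditional jump back to the compare */
done:
    return n;
}
```

Calling count_chars("hello, world!") walks through the thirteen characters one compare at a
time. Even this trivial task turns into a small loop of elementary operations.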
Another machine concept that needs to be mastered is the register. In fact, it is very
difficult to explain what a register is without sinning against the truth. The register is
something that looks like a register but, in fact, isn't one. In ancient computers, a
register was a part of the data-processing device. The processor cannot add two numbers
loaded into the main memory. Before carrying out this operation, it must load them into
registers. This is the situation as it appears at the micro level. Above this level, there is the
machine command interpreter, without which no contemporary processor can do. Yes,
machine codes are interpreted. Even the PDP-11 didn't require the programmer to load
the data into registers beforehand, and pretended that it was taking them directly from
memory. In reality, however, the data were secretly loaded into internal registers. After
carrying out arithmetical operations, the result was either written to memory or into a
"logical" register, which actually was a cell of very fast memory.
In x86, registers are as virtual as they were in the PDP-11. However, in contrast to the
PDP-11, they have partially retained their specialization. Some commands (mul, for example)
work with a strictly defined set of registers that cannot be changed. This is the price of
backward compatibility with previous versions. Another disappointing limitation is
that x86 doesn't support "memory to memory" addressing, and one of the numbers being
processed must be loaded into a register or be an immediate value. Actually, 50% of an
Assembly program is made up of data exchange commands.
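
The following C sketch is my own illustration of that limitation, not real x86 code. It
adds two memory cells and stores the sum in a third, with a local variable named eax standing
in for a register. The array mem, the constant MEM_CELLS, and the function add_mem are
hypothetical names; the Assembly commands in the comments show roughly what each line would
correspond to.

```c
#include <assert.h>
#include <stdint.h>

enum { MEM_CELLS = 8 };     /* size of the toy "memory" (an assumption) */

/* Compute mem[c] = mem[a] + mem[b] the way x86 has to do it:
   at most one operand of the addition may stay in memory. */
void add_mem(uint32_t mem[MEM_CELLS], unsigned a, unsigned b, unsigned c)
{
    assert(a < MEM_CELLS && b < MEM_CELLS && c < MEM_CELLS);

    uint32_t eax;           /* stands in for the EAX register          */
    eax = mem[a];           /* MOV EAX, [a]   data exchange            */
    eax += mem[b];          /* ADD EAX, [b]   the actual arithmetic    */
    mem[c] = eax;           /* MOV [c], EAX   data exchange            */
}
```

Note that two of the three steps do nothing but move data between memory and the register.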
All these actions take place in the arena called the address space. The address space is
simply the set of virtual memory cells available to the processor. Operating systems like
Windows 9x and most UNIX clones create an individual 4-GB region of memory for each
application, where it is possible to distinguish at least three areas: the code segment, the
data segment, and the stack.
The stack is simply a method of storing data. In fact, it is something like a combination of
a list and an array (see the famous book by Donald Knuth). The push command places a new
portion of data on top of the stack, and the pop command retrieves the contents of the stack
top. This allows for storing data in memory without having to take care of their absolute
addresses. Actually, this is very convenient! Function calls are carried out in exactly this
manner. The call func command pushes the address of the next command onto the
stack, and ret pops it from the stack. The pointer to the current position of the stack top is
stored in the ESP register. As for the stack bottom, formally only the length of the address
space limits the stack. In practice, it is limited by the amount of memory allocated
to it. The stack grows from higher addresses toward lower ones. In other words, if memory
is drawn with the lowest addresses at the top, the stack grows from bottom to top.
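
The mechanics of push, pop, call, and ret can be modeled in a few lines of C. This is only a
sketch under simplifying assumptions: the stack is a tiny array of 32-bit words, the field sp
is an index into it rather than a byte address as in ESP, and all names (Stack, stack_push,
do_call, and so on) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_WORDS 16u     /* size of the toy stack (an assumption) */

typedef struct {
    uint32_t mem[STACK_WORDS];
    uint32_t sp;            /* index of the stack top; STACK_WORDS = empty */
} Stack;

void stack_init(Stack *s) { s->sp = STACK_WORDS; }

/* push: move the stack pointer DOWN first, then store the value there. */
void stack_push(Stack *s, uint32_t value)
{
    assert(s->sp > 0);              /* no room left: stack overflow */
    s->sp -= 1;
    s->mem[s->sp] = value;
}

/* pop: read the value at the stack top, then move the pointer UP. */
uint32_t stack_pop(Stack *s)
{
    assert(s->sp < STACK_WORDS);    /* nothing to pop: stack is empty */
    uint32_t value = s->mem[s->sp];
    s->sp += 1;
    return value;
}

/* call: push the address of the next command, then continue at target. */
uint32_t do_call(Stack *s, uint32_t next_ip, uint32_t target)
{
    stack_push(s, next_ip);
    return target;                  /* the new instruction pointer */
}

/* ret: pop the saved address and continue from there. */
uint32_t do_ret(Stack *s) { return stack_pop(s); }
```

After stack_init, two pushes leave sp at 14, that is, two words lower than where it started.
A do_call followed by do_ret hands back exactly the return address that was saved.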
The EIP register contains the pointer to the next command to be executed. It cannot be
modified directly. The EAX, EBX, ECX, EDX, ESI, EDI, and EBP registers are called
general-purpose registers and can freely participate in any arithmetical or
memory-access operations. There are seven such 32-bit registers in total. The first four registers
(EAX, EBX, ECX and EDX) can be accessed by their 16-bit halves storing the least significant
words — AX, BX, CX, and DX, respectively. Each of these words, in turn, is divided into
most significant and least significant bytes — AH/AL, BH/BL, CH/CL, and DH/DL,
respectively. It is important to understand that AL, AX and EAX are not three different
registers, but, on the contrary, three different parts of the same register!
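
This overlap is easy to demonstrate with masks and shifts. The C sketch below is my own model
of a single 32-bit register, not processor documentation. The type Reg and the helper
functions are hypothetical names, and the model assumes 32-bit x86 behavior, where writing to
AL or AX leaves the remaining bits of EAX untouched.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of one 32-bit general-purpose register. */
typedef struct {
    uint32_t eax;
} Reg;

/* AX is the least significant 16 bits of EAX. */
uint16_t get_ax(const Reg *r) { return (uint16_t)(r->eax & 0xFFFFu); }

/* AL is the low byte of AX, and AH is the high byte of AX. */
uint8_t get_al(const Reg *r) { return (uint8_t)(r->eax & 0xFFu); }
uint8_t get_ah(const Reg *r) { return (uint8_t)((r->eax >> 8) & 0xFFu); }

/* Writing AL replaces bits 0..7 of EAX; the other 24 bits are kept. */
void set_al(Reg *r, uint8_t value)
{
    assert(r != 0);
    r->eax = (r->eax & 0xFFFFFF00u) | value;
}

/* Writing AX replaces bits 0..15 of EAX; the upper 16 bits are kept. */
void set_ax(Reg *r, uint16_t value)
{
    assert(r != 0);
    r->eax = (r->eax & 0xFFFF0000u) | value;
}
```

Starting from EAX = 0x12345678, get_ax returns 0x5678, get_ah returns 0x56, and get_al
returns 0x78. After set_al with 0xFF, the whole register reads 0x123456FF: one write, three
"registers" changed.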
Furthermore, there are also other registers: segment registers, multimedia registers,
mathematical coprocessor registers, debug registers, etc. Without a comprehensive manual,
beginners can easily become confused and get lost in this jungle. For the moment, however, I
won't consider them in detail.
LEA EDX,b ;// The EDX register contains the pointer to the b variable
MOV EBX,a ;// The EBX register contains the value of the a variable
MOV ECX, offset a ;// The ECX register contains the pointer to the a variable
MOV [EDX],EBX ;// Copy the value of the a variable to the b variable
PUSH offset LibName ;// Push the string offset onto the stack
CALL LoadLibrary ;// Function call
MOV h, EAX ;// EAX contains the returned value
main()
{
int a = 1; // Declare the a variable and assign it the value of 1
int b = 2; // Declare the b variable and assign it the value of 2
int c; // Declare the c variable, without initializing it