
Language Translators

Language translators convert programming source code into language that the computer
processor understands. Programming source code has various structures and commands, but
computer processors only understand machine language. Different types of translations must
occur to turn programming source code into machine language, which is made up of bits of
binary data. The three major types of language translators are compilers, assemblers, and
interpreters.

1. Compilers
Most 3GL and higher-level programming languages use a compiler for language translation. A
compiler is a special program that takes written source code and turns it into machine language.
When a compiler executes, it analyzes all of the language statements in the source code and
builds the machine language object code. After a program is compiled, it is in a form that the
processor can execute one instruction at a time.

In some operating systems, an additional step called linking is required after compilation.
Linking resolves the relative location of instructions and data when more than one object module
needs to be run at the same time and both modules cross-reference each other's instruction
sequences or data.

Most high-level programming languages come with a compiler. However, object code is unique
for each type of computer. Many different compilers exist for each language in order to translate
for each type of computer. In addition, the compiler industry is quite competitive, so there are
actually many compilers for each language on each type of computer. Although they require an
extra step before execution, compiled programs often run faster than programs executed using an
interpreter.

A compiler is a computer program (or set of programs) that transforms source code written in a
computer language (the source language) into another computer language (the target language,
often having a binary form known as object code). The most common reason for wanting to
transform source code is to create an executable program.

The name "compiler" is primarily used for programs that translate source code from a high-level
programming language to a lower level language (e.g., assembly language or machine code). A
program that translates from a low level language to a higher level one is a decompiler. A
program that translates between high-level languages is usually called a language translator,
source to source translator, or language converter. A language rewriter is usually a program
that translates the form of expressions without a change of language.

A compiler is likely to perform many or all of the following operations: lexical analysis,
preprocessing, parsing, semantic analysis, code generation, and code optimization.
a) NATIVE AND CROSS COMPILERS

A native or hosted compiler is one whose output is intended to directly run on the same type of
computer and operating system that the compiler itself runs on. The output of a cross compiler is
designed to run on a different platform. Cross compilers are often used when developing
software for embedded systems that are not intended to support a software development
environment.

The output of a compiler that produces code for a virtual machine (VM) may or may not be
executed on the same platform as the compiler that produced it. For this reason such compilers
are not usually classified as native or cross compilers.

b) ONE PASS AND MULTI PASS COMPILERS

Classifying compilers by number of passes has its background in the hardware resource
limitations of computers. Compiling involves performing lots of work and early computers did
not have enough memory to contain one program that did all of this work. So compilers were
split up into smaller programs which each made a pass over the source (or some representation of
it) performing some of the required analysis and translations.

The ability to compile in a single pass is often seen as a benefit because it simplifies the job of
writing a compiler and one pass compilers generally compile faster than multi-pass compilers.
Many languages were designed so that they could be compiled in a single pass (e.g., Pascal).

The front end of a compiler analyzes the source code to build an internal representation of the
program, called the intermediate representation or IR. It also manages the symbol table, a data
structure mapping each symbol in the source code to associated information such as location, type
and scope. This is done over several phases, which include some of the following:

1. Line reconstruction. Languages which strop their keywords or allow arbitrary spaces
within identifiers require a phase before parsing, which converts the input character
sequence to a canonical form ready for the parser. The top-down, recursive-descent,
table-driven parsers used in the 1960s typically read the source one character at a time
and did not require a separate tokenizing phase. Atlas Autocode, and Imp (and some
implementations of Algol and Coral66) are examples of stropped languages whose
compilers would have a Line Reconstruction phase.
2. Lexical analysis breaks the source code text into small pieces called tokens. Each token
is a single atomic unit of the language, for instance a keyword, identifier or symbol name.
The token syntax is typically a regular language, so a finite state automaton constructed
from a regular expression can be used to recognize it. This phase is also called lexing or
scanning, and the software doing lexical analysis is called a lexical analyzer or scanner.
3. Preprocessing. Some languages, e.g., C, require a preprocessing phase which supports
macro substitution and conditional compilation. Typically the preprocessing phase occurs
before syntactic or semantic analysis; e.g. in the case of C, the preprocessor manipulates
lexical tokens rather than syntactic forms. However, some languages such as Scheme
support macro substitutions based on syntactic forms.
4. Syntax analysis involves parsing the token sequence to identify the syntactic structure of
the program. This phase typically builds a parse tree, which replaces the linear sequence
of tokens with a tree structure built according to the rules of a formal grammar which
define the language's syntax. The parse tree is often analyzed, augmented, and
transformed by later phases in the compiler.
5. Semantic analysis is the phase in which the compiler adds semantic information to the
parse tree and builds the symbol table. This phase performs semantic checks such as type
checking (checking for type errors), or object binding (associating variable and function
references with their definitions), or definite assignment (requiring all local variables to
be initialized before use), rejecting incorrect programs or issuing warnings. Semantic
analysis usually requires a complete parse tree, meaning that this phase logically follows
the parsing phase, and logically precedes the code generation phase, though it is often
possible to fold multiple phases into one pass over the code in a compiler
implementation.
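The lexical and syntax analysis phases above can be sketched in a few lines of Python. This is a
minimal illustration for a toy expression language (integers, +, *, parentheses); the token names,
grammar, and tuple-based parse tree are invented for the example, not taken from any real compiler.

```python
import re

# Token syntax is a regular language, so a regular expression per token kind suffices.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("PLUS",   r"\+"),
    ("STAR",   r"\*"),
    ("LPAREN", r"\("),
    ("RPAREN", r"\)"),
    ("SKIP",   r"\s+"),
]
MASTER_RE = re.compile("|".join(f"(?P<{n}>{p})" for n, p in TOKEN_SPEC))

def lex(source):
    """Lexical analysis: break the source text into (kind, text) tokens."""
    tokens = []
    for m in MASTER_RE.finditer(source):
        if m.lastgroup != "SKIP":
            tokens.append((m.lastgroup, m.group()))
    tokens.append(("EOF", ""))
    return tokens

def parse(tokens):
    """Syntax analysis by recursive descent.
    Grammar: expr -> term ('+' term)* ; term -> factor ('*' factor)* ;
             factor -> NUMBER | '(' expr ')'."""
    pos = 0

    def peek():
        return tokens[pos][0]

    def eat(kind):
        nonlocal pos
        tok = tokens[pos]
        assert tok[0] == kind, f"expected {kind}, got {tok[0]}"
        pos += 1
        return tok

    def factor():
        if peek() == "LPAREN":
            eat("LPAREN")
            node = expr()
            eat("RPAREN")
            return node
        return ("num", int(eat("NUMBER")[1]))

    def term():
        node = factor()
        while peek() == "STAR":
            eat("STAR")
            node = ("mul", node, factor())
        return node

    def expr():
        node = term()
        while peek() == "PLUS":
            eat("PLUS")
            node = ("add", node, term())
        return node

    tree = expr()
    eat("EOF")
    return tree

# '*' binds tighter than '+', so 3 * (4 + 1) becomes a subtree of the addition.
print(parse(lex("2 + 3 * (4 + 1)")))
```

A real compiler front end would attach source locations to tokens and feed the tree to semantic
analysis; the structure, however, is the same.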

2. Assembler
An assembler translates assembly language into machine language. Assembly language is one
step removed from machine language. It uses computer-specific commands and structure similar
to machine language, but assembly language uses names instead of numbers.

An assembler is similar to a compiler, but it is specific to translating programs written in
assembly language into machine language. To do this, the assembler takes basic computer
instructions from assembly language and converts them into a pattern of bits for the computer
processor to use to perform its operations.

Typically a modern assembler creates object code by translating assembly instruction mnemonics
into opcodes, and by resolving symbolic names for memory locations and other entities. The use of
symbolic references is a key feature of assemblers, saving tedious calculations and manual address
updates after program modifications. Most assemblers also include macro facilities for performing
textual substitution, e.g., to generate common short sequences of instructions to run inline,
instead of in a subroutine.

Assemblers are generally simpler to write than compilers for high-level languages, and have
been available since the 1950s. Modern assemblers, especially for RISC based architectures,
such as MIPS, Sun SPARC, and HP PA-RISC, as well as x86(-64), optimize instruction
scheduling to exploit the CPU pipeline efficiently.

There are two types of assemblers, based on how many passes through the source are needed to
produce the executable program. One-pass assemblers go through the source code once and assume
that all symbols will be defined before any instruction that references them. Two-pass assemblers
(and multi-pass assemblers) create a table of all unresolved symbols in the first pass, then use
the second pass to resolve these addresses. The advantage of one-pass assemblers is speed, which
is not as important as it once was given advances in computer speed and capabilities. The
advantage of the two-pass assembler is that symbols can be defined anywhere in the program source.
As a result, the program can be organized in a more logical and meaningful way, which makes
two-pass assembler programs easier to read and maintain.
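The two-pass scheme can be sketched as follows. This is a toy illustration, assuming a made-up
instruction set (LOAD/ADD/JMP/HALT, one operand each, two bytes per instruction) and a "name:"
label syntax; the opcode numbers are invented for the example.

```python
OPCODES = {"LOAD": 0x01, "ADD": 0x02, "JMP": 0x03, "HALT": 0xFF}

def assemble(lines):
    symbols = {}

    # Pass 1: record the address of every label, so symbols may be
    # defined anywhere in the source (before or after their uses).
    address = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.endswith(":"):
            symbols[line[:-1]] = address
        else:
            address += 2  # one opcode byte plus one operand byte

    # Pass 2: emit opcodes, resolving symbolic operands via the table.
    code = []
    for line in lines:
        line = line.strip()
        if not line or line.endswith(":"):
            continue
        parts = line.split()
        mnemonic, operand = parts[0], parts[1] if len(parts) > 1 else "0"
        value = symbols.get(operand)
        if value is None:
            value = int(operand, 0)  # not a label: a literal operand
        code += [OPCODES[mnemonic], value]
    return code

program = [
    "start:",
    "LOAD 5",
    "loop:",
    "ADD 1",
    "JMP loop",   # symbolic reference, resolved from the pass-1 table
    "HALT 0",
]
print(assemble(program))
```

A one-pass assembler would have to emit the `JMP` before knowing where `loop` lives (or patch it
up afterwards), which is exactly the restriction the second pass removes.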

More sophisticated high-level assemblers provide language abstractions such as:

• Advanced control structures
• High-level procedure/function declarations and invocations
• High-level abstract data types, including structures/records, unions, classes, and sets
• Sophisticated macro processing
• Object-oriented features such as encapsulation, polymorphism, inheritance, and interfaces

3. Interpreters
Many high-level programming languages have the option of using an interpreter instead of a
compiler. Some of these languages exclusively use an interpreter. An interpreter behaves very
differently from compilers and assemblers. It converts programs into machine-executable form
each time they are executed. It analyzes and executes each line of source code, in order, without
looking at the entire program. Instead of requiring a step before program execution, an
interpreter processes the program as it is being executed.

In computer science, an interpreter is a computer program which reads source code written in a
high-level programming language, transforms it into an executable form, and carries it out. Using
an interpreter, a single source file can produce the same results even on vastly different systems
(e.g., a PC and a PlayStation 3). Using a compiler, a single source file can produce the same
results only if it is compiled into distinct, system-specific executables.

Interpreting code is slower than running the compiled code because the interpreter must analyze
each statement in the program each time it is executed and then perform the desired action,
whereas the compiled code just performs the action within a fixed context determined by the
compilation. This run-time analysis is known as "interpretive overhead". Access to variables is
also slower in an interpreter because the mapping of identifiers to storage locations must be done
repeatedly at run-time rather than at compile time. There are various compromises between the
development speed when using an interpreter and the execution speed when using a compiler.
Some systems (e.g., some LISPs) allow interpreted and compiled code to call each other and to
share variables. This means that once a routine has been tested and debugged under the
interpreter it can be compiled and thus benefit from faster execution while other routines are
being developed.

Many interpreters do not execute the source code as it stands but convert it into some more
compact internal form. For example, some BASIC interpreters replace keywords with single-byte
tokens which can be used to find the instruction in a jump table. An interpreter might well use
the same lexical analyzer and parser as the compiler and then interpret the resulting abstract
syntax tree.
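The keyword-to-jump-table trick can be sketched like this. The three statements (LET, ADD, PRINT)
and their syntax are a made-up BASIC-like language for illustration only; the dispatch dictionary
plays the role of the jump table.

```python
def run(source):
    variables = {}
    output = []

    def do_let(args):      # LET x = 10
        name, _, value = args
        variables[name] = int(value)

    def do_add(args):      # ADD x 5
        name, value = args
        variables[name] += int(value)

    def do_print(args):    # PRINT x
        output.append(variables[args[0]])

    # Dispatch table: keyword -> handler, analogous to a jump table.
    dispatch = {"LET": do_let, "ADD": do_add, "PRINT": do_print}

    # The interpreter analyzes and executes one line at a time, in order,
    # without ever looking at the whole program first.
    for line in source.splitlines():
        line = line.strip()
        if not line:
            continue
        keyword, *args = line.split()
        dispatch[keyword](args)
    return output

program = """
LET x = 10
ADD x 5
PRINT x
"""
print(run(program))
```

Note that every line is re-analyzed on every execution; this per-statement work is the
"interpretive overhead" described above, which a compiler pays only once.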
A compiler takes a text file written in a programming language and converts it into binary code
that a processor can understand: it produces an ".exe" file. You compile only once, then always
run the ".exe" file. Borland Turbo C is a compiler: you write C in a text file, then you compile
it to get an .exe file.
An interpreter does the same, but in real time: each time you run the code, it is "compiled" line
by line; classic BASIC is an interpreter.
An assembler is similar, except that instead of taking a plain text file written in a language
such as C, it takes code written in assembler mnemonics and converts it into binary.
All "executable" files are binary (just 1s and 0s) and may be viewed in hex (0x12de...).
In a nutshell: A compiler takes your source programming code and converts it into an executable
form that the computer can understand. This is a very broad explanation though, because some
compilers only go so far as to convert it into a binary file that must then be "linked" with several
other libraries of code before it can actually execute. Other compilers can compile straight to
executable code. Still other compilers convert it to a sort of tokenized code that still needs to be
semi-interpreted by a virtual machine, such as Java.
An interpreter does not compile code. Instead, it typically reads a source code file statement by
statement and then executes it. Most early forms of BASIC were interpreted languages.
An assembler is similar to a compiler, except that it takes source code written in "Assembly
Language", which is just shorthand for the actual machine/processor-specific instructions, values,
and memory locations, and it converts those instructions to the equivalent machine language. The
result is very fast and small executable code, but it is very tedious to write.
Incidentally, many compilers, especially older C compilers, actually convert the C source code to
assembly language and then pass it through an assembler. The benefit is that someone adept at
assembly can tweak the compiler-generated assembly code for speed or size.
