CHAPTER 1
OBJECTIVE
To be introduced to the language processing system and basic structure of the compiler.
POINTS TO EMPHASIZE
A computer program is a set of instructions that a computer follows to perform a specific task. Programs generally fall into three categories: applications, utilities and services. Programs are written in a programming language and are then either translated into machine code by a compiler and linker, so that the computer can execute them directly, or run line by line by an interpreter program. Popular scripting languages such as Visual Basic for Applications (VBA) in Microsoft Office are interpreted.
The terms executable, program and application are often used interchangeably. Strictly speaking, though, a program or executable is a single file that can be run, whereas an application can be anything from a single executable program to something the size of Microsoft Word, which is made up of many executables.
A utility is a small program that performs a single task. Developers often write small utilities to do some useful job. Because they are essentially throwaway programs, they are rarely tested or written to be maintainable.
A service in Win32 is a type of application that performs tasks, usually without interacting with the user. Services start when the operating system starts. They run in the background, unobtrusively doing many of the tasks needed to keep the computer running smoothly. Typical services handle input and output (I/O) for printer queues or networked devices, automated updates, and Bluetooth and wireless network devices. Developers can write their own services, though this is harder than creating non-service applications, and debugging services presents unique challenges: a service runs all of the time, even before a user has logged in and after they have logged out.
System programming (or systems programming) is the activity of programming system software. Its primary distinguishing characteristic, compared with application programming, is that application programming aims to produce software that provides services to the user (e.g. a word processor), whereas systems programming aims to produce software that provides services to the computer hardware (e.g. a disk defragmenter). It therefore requires a greater degree of hardware awareness. System software is computer software designed to operate the computer hardware and to provide a platform for running application software.
Preprocessors: A preprocessor augments the source language by providing the ability to include files and expand macros. When a preprocessor adds extensions to the language, it inserts procedure calls that implement those extensions at run time.
Compiler: A compiler is the language processor that translates the complete source program, as a whole, into machine code before execution.
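The file inclusion and macro expansion performed by a preprocessor can be illustrated with a minimal sketch. It is not how any real preprocessor is implemented: the directive syntax (`#include "name"`, `#define NAME value`) is borrowed from C, only simple object-like macros are handled, and the `files` dictionary is a hypothetical stand-in for the file system.

```python
import re

def preprocess(source, files, macros=None):
    """Expand #include and #define directives in `source`.

    `files` is a hypothetical in-memory mapping of file name -> contents,
    standing in for the real file system.
    """
    macros = {} if macros is None else macros
    output = []
    for line in source.splitlines():
        stripped = line.strip()
        if stripped.startswith("#include"):
            # File inclusion: splice in the (recursively preprocessed) file.
            name = stripped.split('"')[1]
            output.extend(preprocess(files[name], files, macros).splitlines())
        elif stripped.startswith("#define"):
            # Macro definition: remember the name and its replacement text.
            _, name, value = stripped.split(None, 2)
            macros[name] = value
        else:
            # Ordinary line: replace every whole-word macro name.
            def expand(match):
                word = match.group(0)
                return macros.get(word, word)
            output.append(re.sub(r"[A-Za-z_]\w*", expand, line))
    return "\n".join(output)

files = {"limits.h": "#define MAX 100"}
program = '#include "limits.h"\n#define SIZE 10\nint a[SIZE]; int b = MAX;'
print(preprocess(program, files))  # prints: int a[10]; int b = 100;
```

The output of the preprocessor is ordinary source text, which is then handed to the compiler.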
Source program (a program written in a high-level programming language) -> Compiler -> Target program (the equivalent program in machine code); the compiler also reports error messages.
Input -> Target program -> Output
Fig 1.3 A compiler
An important role of the compiler is to report any errors in the source program that it detects during the translation process. If the target program is an executable machine-language program, it can be called by the user to process inputs and produce outputs.
Interpreter: An interpreter is another common language processor. Instead of producing a target program as a translation, an interpreter appears to directly execute the operations specified in the source program on inputs supplied by the user. Examples: BASIC, JavaScript, LISP.
Source program, Input -> Interpreter -> Output
Fig 1.5 An interpreter
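The idea that an interpreter executes the source program directly on the user's inputs, rather than producing a target program, can be seen in a small sketch. The toy language here is an assumption made for illustration: it has only `name = expression` and `print expression` statements, with non-negative integers, variables, `+` and `-`, and tokens separated by spaces.

```python
def evaluate(tokens, env):
    """Evaluate a flat expression such as ['x', '+', '2', '-', 'y']."""
    if len(tokens) % 2 == 0:
        raise SyntaxError("malformed expression")

    def value(tok):
        if tok.isdigit():
            return int(tok)
        if tok in env:
            return env[tok]
        raise NameError(f"undefined variable '{tok}'")

    result = value(tokens[0])
    for op, tok in zip(tokens[1::2], tokens[2::2]):
        if op == "+":
            result += value(tok)
        elif op == "-":
            result -= value(tok)
        else:
            raise SyntaxError(f"unknown operator '{op}'")
    return result

def interpret(source, inputs):
    """Execute `source` statement by statement on the given inputs."""
    env = dict(inputs)          # variables supplied by the user
    outputs = []
    for number, line in enumerate(source.splitlines(), start=1):
        tokens = line.split()
        if not tokens:
            continue
        try:
            if tokens[0] == "print":
                outputs.append(evaluate(tokens[1:], env))
            elif len(tokens) >= 3 and tokens[1] == "=":
                env[tokens[0]] = evaluate(tokens[2:], env)
            else:
                raise SyntaxError("unrecognised statement")
        except (NameError, SyntaxError) as err:
            # The failing statement is known exactly, so report it and stop.
            outputs.append(f"line {number}: {err}")
            break
    return outputs

program = "y = x + 1\nprint y + y\nprint z"
print(interpret(program, {"x": 4}))
# prints: [10, "line 3: undefined variable 'z'"]
```

Because each statement is executed as it is read, the error message can name the exact statement that failed, and the output produced before the error is still available.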
The machine-language target program produced by a compiler is usually much faster than an interpreter at mapping inputs to outputs. An interpreter, however, can usually give better error diagnostics than a compiler, because it executes the source program statement by statement.
Assembler: An assembler translates a program written in assembly language into machine code. Assembly code is a mnemonic version of machine code in which names are used for machine instructions and memory addresses. An assembler needs to assign memory locations to symbols (identifiers) and use the numeric addresses of those locations in the target machine language it produces. The assembler must ensure that the same address is used for all occurrences of a given identifier and that two different identifiers are assigned two different locations. The simplest way to achieve this is to make two passes over the input. During the first pass, each time a new identifier is encountered, an address is assigned and the pair (identifier, address) is stored in a symbol table. During the second pass, whenever an identifier is encountered, its address is looked up in the symbol table and this value is used in the generated machine instruction.
Linker: A linker is a tool that merges the object files produced by separate compilation or assembly and creates an executable file. The linker also takes libraries as input; to the linker, libraries look like other programs that have been compiled and assembled. It performs three tasks: (1) it searches the program to find the library routines it uses, e.g. printf() and the math routines; (2) it determines the memory locations that the code from each module will occupy and relocates its instructions by adjusting absolute references; (3) it resolves references among files.
Loaders: A loader loads the executable file produced by the linker into memory. It takes relocatable code, alters the relocatable addresses, and places the altered instructions and data in memory at the proper locations.
Steps performed by the loader:
1. Reads the executable file's header to determine the size of the text and data segments.
2. Creates a new address space for the program.
3. Copies instructions and data into the address space.
4. Copies the arguments passed to the program onto the stack.
5. Initializes the machine registers, including the stack pointer.
6. Jumps to a startup routine that copies the program's arguments from the stack to registers and calls the program's main routine.
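Returning to the two-pass scheme described for the assembler, the following is a minimal sketch of how the symbol table is built in the first pass and consulted in the second. The instruction set (`LOAD`, `ADD`, `STORE`), the opcode numbers and the starting data address of 100 are all hypothetical and chosen only for illustration; every instruction is assumed to have exactly one identifier operand.

```python
OPCODES = {"LOAD": 1, "ADD": 2, "STORE": 3}  # hypothetical opcode numbers

def first_pass(lines, data_start=100):
    """Assign an address to each identifier the first time it is seen."""
    symbol_table = {}
    next_address = data_start
    for line in lines:
        _, operand = line.split()
        if operand not in symbol_table:
            symbol_table[operand] = next_address
            next_address += 1
    return symbol_table

def second_pass(lines, symbol_table):
    """Emit (opcode, address) pairs, looking each identifier up."""
    code = []
    for line in lines:
        mnemonic, operand = line.split()
        code.append((OPCODES[mnemonic], symbol_table[operand]))
    return code

program = ["LOAD a", "ADD b", "STORE a"]
table = first_pass(program)
print(table)                        # prints: {'a': 100, 'b': 101}
print(second_pass(program, table))  # prints: [(1, 100), (2, 101), (3, 100)]
```

Both occurrences of `a` receive address 100, and `b` receives a different address, which are the two properties the assembler has to guarantee.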
1.3 Phases of a Compiler
Each phase transforms the source program from one representation into another. The phases communicate with the error handlers and the symbol table. A symbol table is a data structure containing a record for each identifier, with fields for the attributes of the identifier. The data structure allows us to find the record for each identifier quickly and to store or retrieve data from that record quickly. Each phase can encounter errors. After detecting an error, a phase must deal with it so that compilation can proceed, allowing further errors in the source program to be detected. The syntax and semantic analysis phases usually handle a large fraction of the errors detectable by the compiler. The lexical phase can detect errors where the characters remaining in the input do not form any token of the language. For details regarding the different phases, refer to the PowerPoint presentation on the first chapter.
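As a rough illustration of how a phase can use the symbol table and recover from errors, here is a sketch of a lexical analyser. The token classes, the `first_seen` attribute and the recovery strategy (skip the offending character and carry on) are assumptions made for this example, not a description of any particular compiler.

```python
import re

# Hypothetical token classes: identifiers, integer literals, operators.
TOKEN = re.compile(r"(?P<id>[A-Za-z_]\w*)|(?P<num>\d+)|(?P<op>[-+*/=();])")

def scan(source):
    """Return (tokens, errors, symbol_table) for the given source text."""
    tokens, errors, symbol_table = [], [], {}
    pos = 0
    while pos < len(source):
        if source[pos].isspace():
            pos += 1
            continue
        match = TOKEN.match(source, pos)
        if match is None:
            # Lexical error: the remaining input does not start any token.
            # Record it, skip one character, and keep scanning so that
            # later errors can still be detected.
            errors.append(f"position {pos}: unexpected character {source[pos]!r}")
            pos += 1
            continue
        kind, text = match.lastgroup, match.group()
        if kind == "id":
            # One record per identifier; attributes are added as they are learned.
            symbol_table.setdefault(text, {"first_seen": match.start()})
        tokens.append((kind, text))
        pos = match.end()
    return tokens, errors, symbol_table

tokens, errors, symbols = scan("x = y @ 42;")
print(tokens)   # [('id', 'x'), ('op', '='), ('id', 'y'), ('num', '42'), ('op', ';')]
print(errors)   # ["position 6: unexpected character '@'"]
print(symbols)  # {'x': {'first_seen': 0}, 'y': {'first_seen': 4}}
```

A dictionary is used for the symbol table because it gives fast lookup by identifier name, which matches the requirement that records be found and updated quickly.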