Kenneth C. Louden
San Jose State University
1. INTRODUCTION
2. SCANNING
3. CONTEXT-FREE GRAMMARS AND PARSING
4. TOP-DOWN PARSING
5. BOTTOM-UP PARSING
6. SEMANTIC ANALYSIS
7. RUNTIME ENVIRONMENT
8. CODE GENERATION
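Main References:
(1) Compiler Construction: Principles and Practice (Chinese edition, 《编译原理及实践》), Kenneth C. Louden, translated by Feng Boqin, Feng Lan, et al., China Machine Press.
(2) Principles and Techniques of Compilers (《编译程序原理与技术》), Li Gansheng and Wang Huamin, Tsinghua University Press.
(3) Programming Languages: Compiler Principles (《程序设计语言:编译原理》), Chen Huowang et al., National Defense Industry Press.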
Chapter 1 Introduction
Source program → compiler → Target program
1.1 A brief history of compilers
1. In the late 1940s, the stored-program computer was pioneered by John von Neumann.
Programs were written in machine language, for example (Intel 8x86, as used in IBM PCs):
c7 06 0000 0002
which means: move the number 2 to the location 0000 (hexadecimal).
3. FORTRAN and its compiler: developed between 1954 and 1957 by a team at IBM led by John Backus.
This was the first compiler to be developed.
1.2 Programs related to compilers
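1. Interpreters: another kind of language translator.
An interpreter executes the source program immediately, rather than generating object code that is executed after translation is complete.
Whether to use an interpreter or a compiler depends on the language in use and the situation.
Languages that are often interpreted: BASIC, LISP, and so on.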
2. Assemblers
A translator from assembly language into object code.
3. Linkers
Collects code separately compiled or assembled in different object files into a single file that is directly executable.
Connects the code for standard library functions.
Connects resources supplied by the operating system of the computer.
4. Loaders
Relocatable code: code whose addresses are not completely fixed in advance.
Loaders resolve all relocatable addresses relative to a given starting address.
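The loader's relocation step can be sketched in C as follows. This is a minimal sketch assuming a made-up object format: `ObjectImage`, its relocation list of word indices, and `relocate` are hypothetical names, and real object-file formats record relocation information in more detail.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical relocatable object image: a sequence of words, some of
   which hold addresses computed as if the program started at address 0. */
typedef struct {
    unsigned     *words;   /* code and data words                      */
    size_t        nwords;
    const size_t *reloc;   /* indices of the words that hold addresses */
    size_t        nreloc;
} ObjectImage;

/* Resolve every relocatable address relative to the starting address
   at which the loader places the program in memory. */
void relocate(ObjectImage *img, unsigned start_address)
{
    for (size_t i = 0; i < img->nreloc; i++) {
        size_t k = img->reloc[i];
        assert(k < img->nwords);          /* relocation entry must be in range */
        img->words[k] += start_address;   /* offset from 0 -> absolute address */
    }
}
```

For example, loading a program whose relocation list names words 1 and 3 at starting address 0x1000 turns the stored offsets 2 and 3 into the absolute addresses 0x1002 and 0x1003, while the other words are left unchanged.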
5. Preprocessors
Delete comments, include other files, and perform macro substitutions.
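Comment deletion, the simplest of these tasks, might look like the following. `strip_comments` is a hypothetical helper: it handles only C-style block comments (replacing each by one space, as a C preprocessor does) and, unlike a real preprocessor, does not ignore comment markers inside string literals. File inclusion and macro substitution are not shown.

```c
#include <assert.h>
#include <string.h>

/* Copy src into dst (capacity dstsize), replacing each C-style block
   comment by a single space. Sketch only: comment markers inside
   string literals are not treated specially. */
void strip_comments(const char *src, char *dst, size_t dstsize)
{
    size_t n = 0;
    assert(dstsize > 0);
    while (*src != '\0' && n + 1 < dstsize) {
        if (src[0] == '/' && src[1] == '*') {
            const char *end = strstr(src + 2, "*/");
            src = end != NULL ? end + 2 : src + strlen(src); /* skip comment */
            dst[n++] = ' ';
        } else {
            dst[n++] = *src++;
        }
    }
    dst[n] = '\0';
}
```

An unterminated comment simply swallows the rest of the input in this sketch; a real preprocessor would report an error instead.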
6. Editors
Produce a standard file (e.g., an ASCII text file) for the compiler to read; structure-based editors also know the syntax of the language.
7. Debuggers
Determine execution errors in a compiled program.
8. Profilers
Collect statistics on the behavior of an object program during execution.
Statistics: the number of times each procedure is called, the percentage of
execution time spent in each procedure.
9. Project managers
Coordinate the files being worked on by different people.
SCCS (Source Code Control System) and RCS (Revision Control System) are project manager programs on Unix systems.
The translation process
The phases of a compiler:
Source code → scanner → tokens → parser → syntax tree → semantic analyzer → annotated tree → code generator → target code
The literal table and the symbol table are data structures shared by these phases.
1. The scanner
Lexical analysis: the input is a stream of characters; the output is a sequence of tokens.
Example: a[index] = 4 + 2
Tokens: a, [, index, ], =, 4, +, 2
Tasks of the scanner: recognize tokens, enter identifiers into the symbol table, and enter literals into the literal table.
2. The parser
Determines the syntactic structure of the program.
Input: the sequence of tokens
Output: a parse tree or a syntax tree
A syntax tree is a condensation of the information contained in the parse tree.
Parse tree for a[index] = 4 + 2:
expression
  assign-expression
    expression
      subscript-expression
        expression
          identifier a
        [
        expression
          identifier index
        ]
    =
    expression
      additive-expression
        expression
          number 4
        +
        expression
          number 2
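The syntax tree for the same statement drops the intermediate expression nodes and keeps only the assign, subscript, and additive nodes over the identifier and number leaves. A minimal C sketch, in which the node layout (`Node`, `NodeKind`) and all helper names are hypothetical:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical node kinds; the first three match op_names below. */
typedef enum { ASSIGN_K, SUBSCRIPT_K, ADDITIVE_K, ID_K, NUM_K } NodeKind;

typedef struct Node {
    NodeKind kind;
    const char *name;        /* identifier name (ID_K only) */
    int value;               /* numeric value (NUM_K only)  */
    struct Node *child[2];   /* operands; NULL for leaves   */
} Node;

/* Allocate a node; nodes are never freed in this short sketch. */
Node *new_node(NodeKind kind, Node *left, Node *right)
{
    Node *n = calloc(1, sizeof *n);
    assert(n != NULL);
    n->kind = kind;
    n->child[0] = left;
    n->child[1] = right;
    return n;
}

Node *id_node(const char *name) { Node *n = new_node(ID_K, NULL, NULL); n->name = name; return n; }
Node *num_node(int value)       { Node *n = new_node(NUM_K, NULL, NULL); n->value = value; return n; }

/* Append text to buf (capacity size), keeping the current length in *len. */
static void append(char *buf, size_t size, size_t *len, const char *text)
{
    size_t k = strlen(text);
    assert(*len + k < size);   /* the caller must supply a large enough buffer */
    memcpy(buf + *len, text, k + 1);
    *len += k;
}

/* Write the tree into buf in prefix form: operator(left,right). */
void print_tree(const Node *t, char *buf, size_t size, size_t *len)
{
    static const char *op_names[] = { "assign", "subscript", "add" };
    char num[16];
    switch (t->kind) {
    case ID_K:
        append(buf, size, len, t->name);
        break;
    case NUM_K:
        snprintf(num, sizeof num, "%d", t->value);
        append(buf, size, len, num);
        break;
    default:
        append(buf, size, len, op_names[t->kind]);
        append(buf, size, len, "(");
        print_tree(t->child[0], buf, size, len);
        append(buf, size, len, ",");
        print_tree(t->child[1], buf, size, len);
        append(buf, size, len, ")");
        break;
    }
}
```

Building the tree as `new_node(ASSIGN_K, new_node(SUBSCRIPT_K, id_node("a"), id_node("index")), new_node(ADDITIVE_K, num_node(4), num_node(2)))` and printing it yields `assign(subscript(a,index),add(4,2))`: seven nodes, compared with the many expression nodes of the parse tree.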
3. The semantic analyzer
Static semantics: properties that cannot be conveniently expressed as syntax and analyzed by the parser, but can be determined prior to execution.
Examples: declarations and type checking (data types).
Annotated tree for the example (the semantic analyzer attaches data types):
assign-expression
  subscript-expression: integer
  additive-expression: integer
Three-address code for the example:
t = 4 + 2
a[index] = t
Intermediate code: any internal representation of the source code used by the compiler (syntax tree, three-address code, four-address code, and so on).
5. The code generator
Input: intermediate code (intermediate representation, IR)
Output: machine code, i.e., code for the target machine
Major data structures in a compiler
1. tokens:
each token is represented as a value of an enumerated data type that lists the set of tokens
2. the syntax tree:
each node is a record whose fields represent the information
collected by the parser and semantic analyzer
3. the symbol table:
information associated with identifiers:
functions, variables, constants, and data types.
The symbol table interacts with almost every phase (the scanner, the parser, the semantic analyzer, the optimization phases, and code generation) through insertion, deletion, and access operations.
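One common way to make insertion, lookup, and deletion fast is a hash table with separate chaining. The sketch below assumes that design; `st_insert`, `st_lookup`, `st_delete`, the bucket count 211, and the single `type` attribute are illustrative choices, not taken from a particular compiler.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TABLE_SIZE 211   /* an arbitrary prime number of buckets */

/* One entry: an identifier plus one hypothetical attribute (its type). */
typedef struct Entry {
    char *name;
    const char *type;
    struct Entry *next;   /* separate chaining for collisions */
} Entry;

static Entry *table[TABLE_SIZE];

static unsigned hash(const char *s)
{
    unsigned h = 0;
    while (*s) h = h * 31 + (unsigned char)*s++;
    return h % TABLE_SIZE;
}

/* Insertion: put the entry at the front of its bucket, so a newer
   declaration of the same name hides an older one. */
void st_insert(const char *name, const char *type)
{
    Entry *e = malloc(sizeof *e);
    assert(e != NULL);
    e->name = malloc(strlen(name) + 1);
    assert(e->name != NULL);
    strcpy(e->name, name);
    e->type = type;
    unsigned h = hash(name);
    e->next = table[h];
    table[h] = e;
}

/* Access: return the most recent entry for name, or NULL if none. */
Entry *st_lookup(const char *name)
{
    for (Entry *e = table[hash(name)]; e != NULL; e = e->next)
        if (strcmp(e->name, name) == 0) return e;
    return NULL;
}

/* Deletion: remove the most recent entry for name (e.g., at scope exit). */
void st_delete(const char *name)
{
    for (Entry **p = &table[hash(name)]; *p != NULL; p = &(*p)->next)
        if (strcmp((*p)->name, name) == 0) {
            Entry *dead = *p;
            *p = dead->next;
            free(dead->name);
            free(dead);
            return;
        }
}
```

Because entries go to the front of a bucket, a second declaration of `index` hides the first, and deleting it (for example, when a scope closes) makes the older declaration visible again.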
4. the literal table:
stores constants and strings
needs quick insertion and lookup, but need not allow deletions
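A literal table needs only insertion and lookup. In the sketch below (hypothetical names, fixed capacity), `lit_insert` returns the index of an existing entry or adds a new one, so each constant or string is stored once; there is deliberately no delete operation. It uses a linear search for brevity; a hash table would give the quick lookup described above.

```c
#include <assert.h>
#include <string.h>

#define MAX_LITERALS 100   /* fixed capacity: an assumption of this sketch */
#define MAX_LEN 64

/* Constants and strings, each stored once; entries are never deleted. */
static char literals[MAX_LITERALS][MAX_LEN];
static int nliterals = 0;

/* Return the index of lit, inserting it first if it is not yet present. */
int lit_insert(const char *lit)
{
    for (int i = 0; i < nliterals; i++)          /* lookup */
        if (strcmp(literals[i], lit) == 0) return i;
    assert(nliterals < MAX_LITERALS && strlen(lit) < MAX_LEN);
    strcpy(literals[nliterals], lit);            /* insertion */
    return nliterals++;
}
```

Later phases can then refer to a literal by its index instead of copying its text.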
5. intermediate code:
this code may be kept as an array of text strings, a temporary text file, or a linked list of structures.
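Kept as an array of structures, the three-address code for a[index] = 4 + 2 (t = 4+2, a[index] = t) might look like this. Each record has four fields (operator, two arguments, result), a form usually called quadruples; the `Quad` type and the "[]=" operator name are assumptions of this sketch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical quadruple: result = arg1 op arg2.
   The op "[]=" is used here to mean result[arg1] = arg2. */
typedef struct {
    const char *op;
    const char *arg1;
    const char *arg2;
    const char *result;
} Quad;

/* Three-address code for a[index] = 4 + 2, kept as an array of records. */
static const Quad code[] = {
    { "+",   "4",     "2", "t" },   /* t = 4 + 2    */
    { "[]=", "index", "t", "a" },   /* a[index] = t */
};
static const int code_len = sizeof code / sizeof code[0];

/* Render one quadruple as a line of text. */
void quad_to_text(const Quad *q, char *buf, size_t size)
{
    assert(q != NULL && size > 0);
    if (strcmp(q->op, "[]=") == 0)
        snprintf(buf, size, "%s[%s] = %s", q->result, q->arg1, q->arg2);
    else
        snprintf(buf, size, "%s = %s %s %s", q->result, q->arg1, q->op, q->arg2);
}
```

Storing records rather than text strings lets later phases (an optimizer or the code generator) inspect operators and operands directly without re-parsing.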
6. temporary files
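temporary files hold the products of intermediate steps
for example: backpatching addresses during code generation. For
if x = 0 then ... else ...
the code is:
CMP x, 0
JNE NEXT
...
NEXT:
...
The address of NEXT is not yet known when JNE NEXT is generated, so it must be filled in (backpatched) later.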
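Backpatching can be sketched with an in-memory code buffer instead of a temporary file (an assumption made to keep the example short). Each jump is emitted with a placeholder target "?" and patched once the target location is known. `emit`, `backpatch`, and `gen_if_else` are hypothetical names, and the sketch adds a JMP over the else part, which the example above omits but which is needed so that the then-part does not fall through into the else-part.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_CODE 64

/* Hypothetical code buffer kept in memory: instruction i is code[i].
   An unresolved jump target is written as "?". */
static char code[MAX_CODE][32];
static int ncode = 0;

/* Append an instruction and return its location. */
int emit(const char *instr)
{
    assert(ncode < MAX_CODE && strlen(instr) < sizeof code[0]);
    strcpy(code[ncode], instr);
    return ncode++;
}

/* Replace the "?" in the jump at loc with the now-known target location. */
void backpatch(int loc, int target)
{
    char *hole = strchr(code[loc], '?');
    assert(hole != NULL);                 /* loc must hold an unresolved jump */
    size_t room = sizeof code[loc] - (size_t)(hole - code[loc]);
    snprintf(hole, room, "%d", target);
}

/* Generate code for: if x = 0 then ... else ... */
void gen_if_else(void)
{
    emit("CMP x, 0");
    int to_next = emit("JNE ?");          /* NEXT is not known yet           */
    emit("; then-part code");
    int to_end = emit("JMP ?");           /* jump over the else part         */
    int next = emit("; else-part code");  /* NEXT: the else part starts here */
    backpatch(to_next, next);
    backpatch(to_end, ncode);             /* first location after the if     */
}
```

After `gen_if_else()`, the buffer holds CMP x, 0, JNE 4, the then-part, JMP 5, and the else-part; location 4 plays the role of NEXT.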
1.5 Other issues in compiler structure
Viewing the compiler’s structure from different angles:
1. Analysis and synthesis
analysis:
lexical analysis, syntax analysis, semantic analysis (optimization)
synthesis:
code generation (optimization)
Advantage: portability
3. passes
A pass processes the entire source program once; a compiler may process the source program several times (several passes).
The initial pass constructs a syntax tree or intermediate code from the source.
The structure and behavior of the runtime environment of the language affect compiler construction.
Bootstrapping and porting
Host language: the language in which the compiler itself is written.
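T-diagram: represents a compiler, written in host language H, that translates source language S into target language T:
S → T (written in H)
(Typically H is expected to be the same as T, but this need not be the case.)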
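Combining T-diagrams:
(1) A compiler A → B written in H, followed by a compiler B → C written in H, gives a compiler A → C written in H (the output of the first is the input of the second).
(2) A compiler A → B written in H, translated by a compiler H → K that runs on M, gives a compiler A → B written in K.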
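The solution to the first situation mentioned above:
Compiling a compiler A → H written in B with an existing compiler B → H written in H gives a compiler A → H written in H.
Compiling a compiler A → H written in B with an existing compiler B → K written in K gives a compiler A → H written in K.
Bootstrapping (a compiler written in its own language A):
Step 1: compile the compiler A → H written in A with a compiler A → H written in H (for example, a quick and dirty compiler written directly in machine language); the result is a compiler A → H written in H that runs, but may be inefficient.
Step 2: recompile the compiler A → H written in A with the compiler produced in Step 1; the result is the final version of the compiler A → H written in H.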
Solution to porting:
To port a compiler from the old host H to a new host K, use the old compiler to produce a cross compiler, then use the cross compiler to recompile the compiler, generating the new one.
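Step 1: compile the compiler source code retargeted to K (a compiler A → K written in A) with the existing compiler A → H written in H; the result is a cross compiler A → K written in H.
Step 2: compile the same compiler source code (A → K written in A) with the cross compiler from Step 1; the result is a compiler A → K written in K, which runs on the new host.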
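The rule used in bootstrapping and porting (running one compiler through another translates its implementation language) can be modeled by treating T-diagrams as plain records. This is an illustrative sketch only: `TDiagram` and `compile_with` are hypothetical, and the languages are just labels.

```c
#include <assert.h>
#include <string.h>

/* A T-diagram: a compiler that translates `source` into `target`
   and is itself written in (runs on) `host`. */
typedef struct {
    const char *source;
    const char *target;
    const char *host;
} TDiagram;

/* Run compiler `prog` (a program written in prog->host) through
   compiler `tool`. This only works if tool reads prog's host language.
   On success the result translates the same languages as prog but is
   written in tool's target language; returns 1, else 0. */
int compile_with(const TDiagram *prog, const TDiagram *tool, TDiagram *result)
{
    assert(prog != NULL && tool != NULL && result != NULL);
    if (strcmp(prog->host, tool->source) != 0)
        return 0;                  /* tool cannot translate prog's code */
    result->source = prog->source;
    result->target = prog->target;
    result->host   = tool->target;
    return 1;
}
```

With `src` = A → K written in A and `old` = A → H written in H, `compile_with(&src, &old, &cross)` models Step 1 (giving A → K written in H) and `compile_with(&src, &cross, &native)` models Step 2 (giving A → K written in K).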
The TINY sample language and compiler
Language TINY: used as a running example (the source language).
Target language: assembly language for the TM machine.
The TINY compiler
The TM machine
A simulator for the TM machine can directly execute the assembly code files.