History
Basic was introduced in 1964 at Dartmouth College in the US, together with one of the first computer timesharing systems, which allowed students to log on to a computer and use it interactively. It ran on mainframe computers with teletype terminals attached. It was an interpretive language: as you typed commands in, it stored them, and then executed them when you typed RUN.
IBM 5100
In 1975 IBM introduced the 5100, the first personal computer with a built-in screen and storage. It had the option of being supplied with either Basic or APL, another interpretive language. It was expensive and not widely used.
PET
Launched in 1977, the PET was the first successful mass-market personal computer. It again came with Basic as an interpreter. It was much cheaper because it used an 8-bit microprocessor.
Tiny Basic
Tiny Basic was developed by amateurs who wanted a small programming language that would fit into 2 kilobytes of ROM, the size of a standard cheap ROM chip in 1977. It ran on hobby machines like the Altair (top left) and can still be obtained for contemporary hobby machines like the TinyBrick computer (bottom left).
A version, TI Basic, also runs on some calculators such as the TI-83 (on the right), which use the Z80 chip used in early PCs.
Here is a very simple Tiny Basic programme:

    10 FOR I := 1 TO 5
    20 PRINT I
    30 NEXT I
    40 END

The language has numbered lines, which should go up in ascending order. On an interpreter the line numbers normally substitute for an editor, allowing you to replace individual lines.
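The way line numbers stand in for an editor can be sketched in a few lines of Python. This is only an illustration, not the code of any particular Basic: the names `enter_line` and `listing` are hypothetical, and the rule that a bare line number deletes a line is an assumption borrowed from common interactive Basics.

```python
def enter_line(program, text):
    """Store, replace, or delete a numbered line typed at the prompt."""
    number, _, body = text.strip().partition(" ")
    line_no = int(number)
    if body.strip():
        program[line_no] = body.strip()   # new line, or replacement for an old one
    else:
        program.pop(line_no, None)        # assumed rule: bare number deletes the line


def listing(program):
    """Return the stored lines in ascending line-number order."""
    return [f"{n} {program[n]}" for n in sorted(program)]


program = {}
typed = ["10 FOR I := 1 TO 5", "30 NEXT I", "20 PRINT I", "40 END", "20 PRINT I*2"]
for text in typed:
    enter_line(program, text)
print("\n".join(listing(program)))
```

Retyping line 20 replaces the earlier version, and the lines come out in numeric order regardless of the order in which they were typed.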
Control structure
The version you will be working with is very simple. It has only three control structures:
- For loops
- Goto statements
- If statements
For loops
A FOR loop has the structure

    10 FOR I := 1 TO 100
    20 LET A := I+A
    30 NEXT I

The lines between the FOR and the NEXT lines are executed 100 times in this case. For loops can be nested, provided that each loop uses a different iteration variable.
Jumps
An unconditional jump to another line can be done using the GOTO statement. A conditional jump can be done using an IF statement, which transfers to another line when its condition is true.

    10 IF A>B THEN 30
    20 GOTO 40
    30 PRINT A
    35 GOTO 50
    40 PRINT B
    50 END
Input output
There are 3 input output commands supported in the version of Basic you will be working with, shown below. They allow reading and writing of integers.

    10 READ I
    20 PRINT 2*I
    30 PRINTLN
    40 END
LET statements
The LET keyword allows you to perform assignments to variables:

    320 LET J:= I*2+1

There is no need to declare variables. In the original Basic, variables were either single letters or a letter followed by a digit, thus P, S, N1, Q9 and T would all be valid. In many Tiny Basic systems only a single letter is used.
- REM allows comments
- GOSUB and RETURN allow for subroutines
- DIM allows for array variables

    10 DIM A(10)
    20 GOSUB 100
    30 PRINT S
    40 END
    100 REM calculate sum in A
    105 FOR I:= 1 TO 10
    110 LET S:= S+A(I)
    120 NEXT I
    130 RETURN
Your tasks
You will be working with a Basic compiler that I have written, and will have to modify it to extend the language slightly:
1. Allow variables to be strings of letters and digits starting with a letter
2. Add the REM statement to the language to allow comments
3. Add the DIM statement and support for array indexing to the language
Interpret or compile
The early versions of Basic were all interpreters, that is to say the statements were translated into equivalent machine operations every time they were executed.
Advantages of interpreters
- Allow interactive use
- Can be implemented in very little code
Advantage of compilers
Allow much faster execution once programme is compiled
Phases of translation
Interpreters and compilers differ in the way they cause execution to take place. In an interpreter, a computed jump is performed to a routine that will execute a particular type of statement. In a compiler, a sequence of machine instructions is output.
[Figure: the source line 10 LET A:=12 and its tokenized form (codes 91 and 92 and the 16-bit value 000C), feeding the dispatcher]
Note that the token codes are 80H (decimal 128) or above, and thus outside the ASCII range.
Why tokenize
- It performs data compression, so the tokenized programme takes up less space in memory. This used to be very important.
- It allows faster interpretation, since what is being interpreted is now a byte code which can be interpreted by a simple mechanism.
Note that in Basic the semantics are always defined by the first token.
Token codes: 80H, 81H, 82H, 83H, 84H
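Both benefits of tokenizing can be seen in a small Python sketch. The keyword-to-code table here is hypothetical (the real assignments depend on the interpreter), and dropping the spaces between words is a simplifying assumption.

```python
# Hypothetical token codes, all at 0x80 or above so they cannot clash with ASCII.
TOKENS = {"LET": 0x80, "PRINT": 0x81, "FOR": 0x82, "NEXT": 0x83, "GOTO": 0x84}


def tokenize(line):
    """Replace each keyword with a single byte; copy everything else as ASCII."""
    out = bytearray()
    for word in line.split():
        if word in TOKENS:
            out.append(TOKENS[word])          # one byte instead of the whole keyword
        else:
            out.extend(word.encode("ascii"))  # operands stay as plain text
    return bytes(out)


source = "PRINT A+1"
packed = tokenize(source)
print(len(source), "characters become", len(packed), "bytes:", packed.hex())
```

The statement type can now be read from the first byte alone, which is what makes the simple dispatch mechanism possible.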
Tokenizing Keyboard
On small computers and calculators the tokenizer was sometimes integrated into the keyboard scanning software, so that it directly returned a token for a single keystroke. For example, SHIFT P generated the PRINT token.
Assembler works on machine registers
On Intel assemblers the mov instruction moves data.

    mov ax,[varstart]

means load the ax register with the word at label varstart.

    mov ax,[si]

means load the ax register with the word pointed to by the si register.

    mov ax,[si*2+mylab]

means move the word at address 2*si+mylab into ax.
Case is not significant in opcodes or register names.
arithmetic
    add ax,varstart

means add the address of label varstart to ax.

    sub ax,[si]

means ax = ax - memory[si].

    add ax,si

means add the si register to ax.
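[Figure: tokenized form of 10 LET A:=12, in which the letter A is stored as 41 and the number 12 as 000C, alongside the token codes 91 and 92]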
This shows the typical feature of a fast interpreter: a short sequence of assembly code that performs rapid dispatch to interpretive routines using byte codes. Only 3 instructions are used to do the dispatch.
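The idea of dispatching on a byte code can be sketched in Python with a table of handler routines indexed by token. This is an illustration of the technique only: the token codes, the handler names and the pre-split operands are all assumptions, not the layout used by the assembly interpreter.

```python
LET, PRINT = 0x80, 0x81   # hypothetical token codes


def do_let(operands, state):
    """Handler for LET: store a value in a named variable."""
    name, value = operands
    state["vars"][name] = value


def do_print(operands, state):
    """Handler for PRINT: append the variable's value to the output."""
    (name,) = operands
    state["out"].append(state["vars"][name])


# Jump table: entry i handles token 0x80 + i.
HANDLERS = [do_let, do_print]


def run(program, state):
    for token, operands in program:
        HANDLERS[token - 0x80](operands, state)   # the whole dispatch step


state = {"vars": {}, "out": []}
run([(LET, ("A", 12)), (PRINT, ("A",))], state)
print(state["out"])   # [12]
```

The single indexed call in `run` plays the role of the computed jump: the first token of each statement alone selects the routine that executes it.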
                  ; checks it is a letter
                  ; address of var in ax
                  ; look for a :=
                  ; evaluate expression
                  ; result in ax
    pop di        ; recover the address
    mov [di],ax   ; do the assignment
    jmp advance   ; this moves to the next line

Note that the interpreter is made up of a sequence of calls to routines that do subsidiary matching tasks, in order to recognise <letter> := <expression>.
checkletter:
    movxb ax,[si]     ; get next char into ax register
    sub al,'A'        ; subtract letter A
    jl notletter      ; if negative was not a letter
    cmp al,26         ; compare with 26
    jge notletter     ; if al>=26 not a letter
                      ; ax now in range 0..25
    inc si            ; move past the letter
    add ax,ax         ; map to range 0..50
    add ax,varstart   ; add the start address of the variables in memory
    return
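The same letter check and address calculation can be written out in Python to make the arithmetic explicit. This is a sketch under stated assumptions: `VARSTART` is a made-up address, and returning `None` stands in for the jump to notletter.

```python
VARSTART = 0x1000   # hypothetical start address of the variable area


def check_letter(text, pos):
    """Return (address, new_pos) if text[pos] is a letter A..Z, else None."""
    if pos >= len(text):
        return None
    offset = ord(text[pos]) - ord("A")   # subtract letter A
    if offset < 0 or offset >= 26:       # outside 0..25, so not a letter
        return None
    # Each variable is one 16-bit word, so the byte offset is doubled.
    return VARSTART + 2 * offset, pos + 1


print(check_letter("A", 0))   # (4096, 1)
print(check_letter("Z", 0))   # (4146, 1)
print(check_letter("1", 0))   # None
```

Note that A itself gives an offset of 0, so the first test must reject only negative values.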
Expressions
Suppose we define an expression to be either
1. An identifier: A, B etc
2. A number: 1, 14 etc
3. An expression followed by an operator followed by another expression: A+1, B-C etc
4. An expression in brackets: (A+9)
The interpreter routine for expressions must recognise these cases.
Expression code
expression:
    cmp byte [si],'('        ; check for (
    jne nobracket
    inc si                   ; found it so move past
    call expression          ; must be an expression
    cmp byte [si],')'        ; check we have )
    jne error                ; otherwise it is an error
    inc si                   ; move past
    jmp checkop              ; go look for an operator
nobracket:
    cmp byte [si],numprefix  ; check for number prefix
    jne mustbeletter         ; look for a letter
    mov ax,[si+1]            ; assume the number follows
    add si,3                 ; move pointer past it
    jmp checkop              ; go look for an operator
At this point we have the expression value so far in the ax register. We will only look for + and - here; you can imagine the other operations.
checkop:
    cmp byte [si],'+'
    jne tryminus
    inc si            ; move past
    push ax           ; save value so far
    call expression   ; look for another expression
    pop di            ; get back first value
    add ax,di         ; add to the second
    return            ; with result in ax
tryminus:
    cmp byte [si],'-'
    ; etc.
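The overall shape of the expression routine, including its recursive call for the right-hand operand, can be sketched in Python. This is not a translation of the assembly: it works on plain text rather than tokenized bytes with a number prefix, undefined variables are assumed to be 0, and the minus case is assumed to follow the same pattern as the plus case shown.

```python
def expression(text, pos, variables):
    """Evaluate the expression starting at text[pos]; return (value, new_pos)."""
    if pos >= len(text):
        raise SyntaxError("expression expected")
    ch = text[pos]
    if ch == "(":                                   # bracketed expression
        value, pos = expression(text, pos + 1, variables)
        if pos >= len(text) or text[pos] != ")":
            raise SyntaxError("expected )")
        pos += 1
    elif ch.isdigit():                              # number
        end = pos
        while end < len(text) and text[end].isdigit():
            end += 1
        value, pos = int(text[pos:end]), end
    elif "A" <= ch <= "Z":                          # single-letter identifier
        value, pos = variables.get(ch, 0), pos + 1
    else:
        raise SyntaxError(f"unexpected {ch!r}")
    return check_op(text, pos, value, variables)


def check_op(text, pos, left, variables):
    """Look for + or - after a value; the right operand is a whole expression."""
    if pos < len(text) and text[pos] in ("+", "-"):
        op = text[pos]
        right, pos = expression(text, pos + 1, variables)
        return (left + right if op == "+" else left - right), pos
    return left, pos


print(expression("(A+9)-2", 0, {"A": 4})[0])   # 11
print(expression("9-3-2", 0, {})[0])           # 8
```

Because the right operand is obtained by a recursive call to `expression`, operators group to the right: 9-3-2 is evaluated as 9-(3-2), giving 8 rather than 4. This makes no difference for addition, but a routine built this way would need extra care for subtraction.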
Efficiency
I have obviously only given you a part of an interpreter here, but it is enough to show several things:
1. The style of tight hand-coded assembler that they typically used allowed a very small interpreter.
2. The way the code is structured by the syntax of the Basic.
3. That you are lucky if one instruction in 10 or 20 does real computational work, rather than parsing and checking.
The major motivation for compiling is to get greater speed. Against this, the complexity of a compiler is much greater, both in the size of the compiler and in the number of tools needed to build it. You also have a slower debug cycle for programmes: edit, compile, run instead of just edit, run.