Computer: an electronic device that performs high-speed arithmetic, logical, or transfer operations, or that assembles, stores, correlates, or otherwise processes information. Computer = Hardware + Software.
EXAMPLE
A 4-bit adder circuit is a specialized computer: the addition function is programmed by the dedicated interconnection of wires. The computer elements are logic gates.
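To make "addition programmed by wiring" concrete, here is a minimal Python sketch of a 4-bit ripple-carry adder built only from AND, OR and XOR gates: one full adder per bit, with each carry wired into the next stage. The function names are illustrative and do not come from the slides.

```python
def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """One full adder (FA) built from logic gates; returns (sum, carry_out)."""
    s = a ^ b ^ cin                      # XOR gates produce the sum bit
    cout = (a & b) | (cin & (a ^ b))     # AND/OR gates produce the carry
    return s, cout


def ripple_add4(a: int, b: int, cin: int = 0) -> tuple[int, int]:
    """Add two 4-bit numbers by chaining four full adders (fixed wiring)."""
    result, carry = 0, cin
    for i in range(4):                   # bit 0 (LSB) up to bit 3 (MSB)
        s, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        result |= s << i
    return result, carry                 # 4-bit sum and final carry out


if __name__ == "__main__":
    print(ripple_add4(0b1011, 0b0110))   # 11 + 6 = 17 -> (1, 1)
```

The "program" is fixed by how the full adders are connected. To do anything other than addition, the circuit itself would have to be rewired.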
HARDWARE PROGRAMMABLE
Finite state machine: as powerful (in theory) as any possible computer. Still used in computability theory to find out what such machines can't do.
Completed in 1952. Storage: 1000 words (1 word = 10 decimal digits). Programmed using paper tape.
Sequential Storage
SOFTWARE PROGRAMMABLE
2. (FETCH) Memory issues: data valid signal on the control bus; actual memory data on the data bus.
3. (FETCH) Contains registers.
5. May require more memory accesses; may require interaction with peripherals.
*Some consider step 4 to be part of fetch; others, part of execute.
Quantifying Memory
Measured in binary digits (BIT = BInary digiT).
1 nybble = 4 bits
1 byte = 8 bits
1 word = 16 bits
1 doubleword = 32 bits
1 quadword = 64 bits
1 paragraph = 16 bytes
1 page = 256 bytes
1 segment (max) = 65,536 bytes
Standard
Capacity Measures
1 kilobyte (kB) = 2^10 bytes = 1,024 bytes
1 megabyte (MB) = 2^20 bytes
1 gigabyte (GB) = 2^30 bytes
1 terabyte (TB) = 2^40 bytes
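[Figure: bit-slice ALU. For each bit i, a 4-to-1 multiplexer (inputs 0-3, select lines S1 S0) chooses the full-adder (FA) inputs X, Y from A_i, B_i; each FA outputs S and Cout, with the carries C1, C2, C3 chained between slices. A second panel shows the ALU with select inputs S1, S0 and clock CLK, at steps n and n+1.]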
Synchronization
Controls when to fetch/execute; generates timing signals; handles external events (interrupts).
Computer Organization
Principle Components
CPU - (Central Processing Unit)
Fetch/Execute Machine
Main Memory
An array of storage locations for bits; data and instructions are stored here.
Secondary Storage
Memory that is cheap, but slow.
80x86 microprocessors
1972: Intel Corp. 8008; 1978: 8086
8086: 20-bit address instead of 16, giving 1 MB of memory access instead of 64 KB. Bus Interface Unit (instruction fetch) / Execution Unit (execution). Internal registers: data = 16 bits. Hardware multiplier/divider. External arithmetic processor.
8088
8-bit external bus: can use a cheap and simple 8-bit memory interface. 16-bit registers / 20-bit address. 1982 XT: 16 K memory, 4.77 MHz.
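[Figure: 8086 internal organization. Bus Interface Unit (BIU): segment registers CS, ES, SS, DS, instruction pointer IP, address adder (ADD), 6-byte instruction queue (bytes 1-6), connected to the address bus and the data bus. Execution Unit (EU): general registers AH/AL, BH/BL, CH/CL, DH/DL, pointer and index registers BP, DI, SI, SP, ALU/execution control, FLAGS.]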
80186/80188
Single computer on a chip: 8086 (8088) + clock generator + timer + interrupt controller + DMA (Direct Memory Access) controller + I/O interface.
80286
16-bit data / 24-bit address (16 MB). Operation modes: real mode / protected mode.
Real mode: same as 8086. Protected mode: multitasking; many segments in memory. Pitfall: once in protected mode, the processor cannot return to real mode.
80386
1985: 32-bit data/address, 4 GB physical memory access. Real mode: same as 8086. Protected mode: descriptor registers control tasks and allocate segments (segment boundary, size). Virtual memory support.
Runs Windows, OS/2. 2 clock cycles per memory access. Cache. 16 added instructions. 386SX: 16-bit data / 24-bit address.
80486
RISC (Reduced Instruction Set Computer) concepts applied. Improved 386 performance. 5-stage pipeline. 80387 floating-point processor built in. DX2/DX4: fast internal clock / slow external bus clock.
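A rough way to see why a pipeline helps: with k stages of one clock each and no stalls, n instructions need k·n clocks unpipelined, but only k + (n − 1) clocks pipelined. The sketch below assumes an ideal pipeline (no hazards, branches or cache misses), which real 80486 code does not always achieve.

```python
def cycles_unpipelined(n_instructions: int, stages: int) -> int:
    """Each instruction passes through every stage before the next one starts."""
    return n_instructions * stages


def cycles_pipelined(n_instructions: int, stages: int) -> int:
    """Ideal pipeline: fill it once, then one instruction completes per clock."""
    if n_instructions == 0:
        return 0
    return stages + (n_instructions - 1)


if __name__ == "__main__":
    n, k = 100, 5                      # 100 instructions, 5-stage pipeline
    print(cycles_unpipelined(n, k))    # 500
    print(cycles_pipelined(n, k))      # 104
```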
Pentium
Superscalar processor: 2 separate pipelines. Separate code cache / data cache. 5-stage (integer) and 8-stage (floating-point) pipelines. 64-bit external data bus.
Instruction optimization
The Pentium processor has been optimized to run critical instructions in fewer clock cycles than the Intel486 processor
Pentium extension
The Pentium processor adds a few instruction set extensions to those of the Intel486 processor. It also has a set of extensions for multiprocessor (MP) operation, which makes computers with multiple Pentium processors possible.
Pentium Pro
Two separate silicon dies: processor + second-level cache (256 KB or 512 KB). Internal bus: 32 bit. External data bus: 64 bit. Address bus: 36 bit, for 64 GB. 100% compatible with 80x86 programs. 3 processor instructions + 2 floating-point instructions.
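The memory sizes quoted in these slides follow directly from the address bus width: n address lines select 2^n distinct byte addresses. Here is a small Python check using the bus widths stated above for each processor. The helper names are illustrative.

```python
ADDRESS_BITS = {          # address bus widths as stated in the slides
    "8086": 20,
    "80286": 24,
    "80386": 32,
    "Pentium Pro": 36,
}


def addressable_bytes(bits: int) -> int:
    """Number of distinct byte addresses on an address bus `bits` lines wide."""
    return 2 ** bits


def human(n: int) -> str:
    """Format a power-of-two byte count in binary units (1 KB = 2**10 bytes)."""
    for unit in ("bytes", "KB", "MB", "GB", "TB"):
        if n < 1024:
            return f"{n} {unit}"
        n //= 1024
    return f"{n} PB"


if __name__ == "__main__":
    for cpu, bits in ADDRESS_BITS.items():
        print(f"{cpu}: {bits}-bit address -> {human(addressable_bytes(bits))}")
```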
P6 Microarchitecture
1st-level cache = 8 KB instruction cache + 8 KB data cache. 2nd-level cache = 1 MB static RAM, 64-bit bus. CENTERPIECE = out-of-order execution (called Dynamic Execution), with 3 functions:
Multiple branch prediction; Dynamic Data Flow Analysis (DDFA); Speculative Execution (SE): executing instructions beyond a branch.
Pentium 4
NetBurst Architecture
1. Hyper-pipelined technology: a deeper pipeline (20 stages; 31 in later models).
2. Rapid Execution Engine: the ALUs in the core of the CPU operate at twice the core clock frequency.
3. Execution Trace Cache: stores decoded micro-operations. When an instruction is executed again, the CPU takes the decoded micro-ops directly from the trace cache instead of fetching and decoding the instruction again, which saves considerable time.
High clock speeds (up to 4 GHz). SSE2 and SSE3 instruction sets to accelerate media processing. Integration of Hyper-Threading, which makes one physical CPU work as two logical (virtual) CPUs.
HyperThreading Technology
The figure shows a comparison between a processor that supports HT Technology (implemented with two logical processors) and a traditional dual-processor system.
The technology enables a single physical processor to execute two or more separate code streams (threads) concurrently, as logical processors. The logical processors in an IA-32 processor supporting HT Technology share the core resources of the physical processor, including the execution engine and the system bus interface. After power-up and initialization, each logical processor can be independently directed to execute a specified thread, interrupted, or halted.
Homework
1) Explain the concept of pipelining.
2) Explain the difference between addressing the memory and addressing the peripheral devices.
3) Explain the role of the retirement unit in the P6 microarchitecture.
4) Explain the difference between the 1st-level cache and the 2nd-level cache.
5) Explain the concept of speculative execution.
6) What do SIMD and SSE2 stand for? Explain.
7) Explain the concept of HyperThreading.
8) What is the number of bits allocated for the data bus and for the address bus, respectively, in the following microprocessors: 8080, Pentium IV, 80486, 80286, 8086, 80186?
(C2)
FETCH
1) Read instruction from memory
2) Decode/interpret instruction
3) Increment instruction address register
EXECUTE
1) Control unit: input is the decoded instruction
2) Control signals are set
3) Data is processed
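The two phases can be sketched as a loop. The tiny instruction set (LOAD, ADD, HALT) and the tuple encoding are hypothetical. They are chosen only to show the order of the steps: read, decode, increment the instruction address, then execute.

```python
def run(program: list[tuple], max_steps: int = 1000) -> int:
    """Toy fetch/execute machine with one accumulator; returns its final value."""
    acc = 0            # accumulator register
    ip = 0             # instruction address register
    for _ in range(max_steps):
        # FETCH: 1) read instruction from memory, 2) decode it, 3) increment IP
        instruction = program[ip]
        opcode, *operands = instruction
        ip += 1
        # EXECUTE: the decoded opcode selects what the "control unit" does
        if opcode == "LOAD":
            acc = operands[0]
        elif opcode == "ADD":
            acc += operands[0]
        elif opcode == "HALT":
            return acc
        else:
            raise ValueError(f"unknown opcode {opcode!r}")
    raise RuntimeError("step limit reached without HALT")


if __name__ == "__main__":
    print(run([("LOAD", 5), ("ADD", 7), ("HALT",)]))  # 12
```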
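[Figure: BIU and EU working in parallel over time. BIU row: FETCH, FETCH, FETCH, FETCH. EU row: WAIT, EXECUTE, EXECUTE, EXECUTE. The EU waits only for the first instruction; afterwards the BIU fetches ahead while the EU executes.]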
BIU - functions: bring the instructions into the internal queue; control the content of the queue; compute the addresses; generate the control signals.
BIU - contents
Block for controlling the signals; FIFO memory implementing the 6-byte queue; instruction pointer (next instruction to be executed); ALU to calculate the address; internal communication registers; registers for memory segmentation.
EU Execution Unit
Decoding of instructions; ALU; general registers (accessible by the user); internal registers (internal operations); register to store the status and control of the program.
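General registers (16 bits, each split into a high and a low byte):
AX = AH:AL, BX = BH:BL, CX = CH:CL, DX = DH:DL
Segment registers: CS, DS, SS, ES
Pointer and index registers: IP (Instruction Pointer), SP (Stack Pointer), BP (Base Pointer), SI (Source Index), DI (Destination Index)
IP is automatically incremented; the programmer can control it with jump and branch instructions.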
AX, BX, CX, DX can be used separately as 1-byte registers (AX = AH:AL). They provide temporary storage that avoids memory accesses, giving faster execution.
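A quick model of how the 8-bit halves overlay the 16-bit register (AX = AH:AL): writing AL changes only the low byte of AX, and writing AH changes only the high byte. The small class is illustrative only.

```python
class Reg16:
    """A 16-bit general register whose two halves share its storage."""

    def __init__(self, value: int = 0) -> None:
        self.x = value & 0xFFFF          # full register, e.g. AX

    @property
    def hi(self) -> int:                 # high byte, e.g. AH
        return self.x >> 8

    @hi.setter
    def hi(self, v: int) -> None:
        self.x = ((v & 0xFF) << 8) | (self.x & 0x00FF)

    @property
    def lo(self) -> int:                 # low byte, e.g. AL
        return self.x & 0x00FF

    @lo.setter
    def lo(self, v: int) -> None:
        self.x = (self.x & 0xFF00) | (v & 0xFF)


if __name__ == "__main__":
    ax = Reg16(0x1234)
    ax.lo = 0xFF                         # like MOV AL, 0FFh
    print(f"AX={ax.x:04X}h AH={ax.hi:02X}h")   # AX=12FFh AH=12h
```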
AX, Accumulator: main register for performing arithmetic; MUL/DIV must use AH, AL. (Accumulator means a register with a simple ALU.)
BX, Base: points to a translation table in memory; holds memory offsets; function calls.
CX, Counter: index counter for loop control.
DX, Data: after integer division, holds the remainder.
Used to point to data; determine the memory address (together with other registers). DS and ES are commonly used.
Used to point to data in the stack structure (LIFO). Used with SP or BP: SS:SP and SS:BP are valid segmented addresses.
FLAGS register, bits 15 to 0 (x = unused): x x x x OF DF IF TF SF ZF x AF x PF x CF
Status Flags
Indicate the current processor status:
CF - Carry Flag: arithmetic carry
OF - Overflow Flag: arithmetic overflow
ZF - Zero Flag: zero result; equal compare
SF - Sign Flag: negative result
PF - Parity Flag: even number of 1 bits
AF - Auxiliary Carry: used with BCD arithmetic
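To show how the status flags are derived from a result, here is a sketch that computes CF, ZF, SF, OF, PF and AF for an 8-bit ADD according to the definitions above. PF looks only at the low 8 bits of the result. The function name is illustrative.

```python
def add8_flags(a: int, b: int) -> tuple[int, dict[str, int]]:
    """8-bit ADD: return (result, status flags) as the 8086 would set them."""
    raw = a + b
    r = raw & 0xFF
    flags = {
        "CF": int(raw > 0xFF),                          # carry out of bit 7
        "ZF": int(r == 0),                              # zero result
        "SF": r >> 7,                                   # sign bit of the result
        # overflow: both operands have the same sign and the result's sign differs
        "OF": int(((a ^ r) & (b ^ r) & 0x80) != 0),
        "PF": int(bin(r).count("1") % 2 == 0),          # even number of 1 bits
        "AF": int(((a & 0xF) + (b & 0xF)) > 0xF),       # carry out of bit 3 (BCD)
    }
    return r, flags


if __name__ == "__main__":
    print(add8_flags(0x7F, 0x01))   # result 0x80 (128); OF=1, SF=1, CF=0
```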
Control Flags
Influence the 8086 during the execution phase:
DF - Direction Flag: increment/decrement (direction of string operations)
IF - Interrupt Flag: enables interrupts
TF - Trap Flag: allows single-step
MOV AH,[SI]
Segmented Addressing
[Figure: system memory from 00000h upward, divided into paragraphs (paragraph 1 at 00010h, paragraph 2 at 00020h, paragraph 3, ...), with a code segment and a data segment selected by the segment registers CS, DS, ES, SS, and offsets supplied by IP, SP, BP, DI, SI.]
Each paragraph has a physical address that is a multiple of 10h, so the BIU appends 0000b (four zero bits) to the segment value. Therefore only 16-bit segment registers are needed to form a 20-bit physical address.
Segment registers point to the base address; index registers contain the offset value.
Notation (segmented address): CS:IP, DS:SI, ES:DI, SS:BP, SS:SP
Maximum memory size: 2^20 = 1,048,576 bytes = 1 MB. Newer processors (Pentium) can use more memory through wider (32-bit) address registers:
2^32 = 4,294,967,296 bytes = 4 GB
Logical (segmented) address: 0FE6:012Bh
Offset (index) address: 012Bh
Physical address: 0FE60h + 012Bh = 0FF8Bh (in decimal: 65120 + 299 = 65419)
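The rule behind this example is: physical address = segment × 10h + offset. In other words, the segment is shifted left by one hex digit and the offset is added, and the result is kept to 20 bits on the 8086. The Python check below uses the worked examples from this section (0FE6:012Bh and ACED:0724h). The function name is illustrative.

```python
def physical_address(segment: int, offset: int) -> int:
    """8086 real-mode address: segment * 10h + offset, wrapped to 20 bits."""
    return ((segment << 4) + offset) & 0xFFFFF


if __name__ == "__main__":
    print(f"{physical_address(0x0FE6, 0x012B):05X}h")  # 0FF8Bh
    print(f"{physical_address(0xACED, 0x0724):05X}h")  # AD5F4h
```

Many different segment:offset pairs map to the same physical address. For example, 0FE6:012Bh and 0FF8:000Bh both give 0FF8Bh.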
Offset   Contents   Physical address
0729H    69H        AD5F9H
0728H    AAH        AD5F8H
0727H    2EH        AD5F7H
0726H    00H        AD5F6H
0725H    55H        AD5F5H
0724H    02H        AD5F4H
0723H    72H        AD5F3H
0722H    11H        AD5F2H
Base address = ACEDH; logical address = 0724H; physical address = ACED0H + 0724H = AD5F4H.
M[ACED:0724] = M[AD5F4] = 5502H (little-endian: the low byte 02H is at AD5F4H, the high byte 55H is at AD5F5H).
Byte at 0725H: 55H = 0101 0101 (hex / binary).
Offset   Contents   Physical address
072CH    18H        AD5FCH
072BH    A3H        AD5FBH
072AH    7EH        AD5FAH
0729H    69H        AD5F9H
0728H    AAH        AD5F8H
0727H    2EH        AD5F7H
0726H    00H        AD5F6H
0725H    02H        AD5F5H
0724H    55H        AD5F4H
0723H    11H        AD5F3H
0722H    20H        AD5F2H
0721H    72H        AD5F1H
0720H    DEH        AD5F0H
071FH    ADH        AD5EFH
071EH    FAH        AD5EEH
071DH    CEH        AD5EDH
Assume that M[DS:DI] contains a pointer value, DS = AD5Fh, DI = 0005h (all segments start on a paragraph boundary), and SI ← M[DS:DI].
Then the pointer is M[DS:DI] = M[AD5F:0005] = M[AD5F5] = 0002h,
and M[DS:SI] = M[DS:(DS:DI)] = M[DS:0002h] = M[AD5F:0002] = M[AD5F2] = 1120h.
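The pointer example can be replayed in Python. The byte values below are copied from the memory map above (physical addresses AD5EDh to AD5FCh). Words are read little-endian, with the low byte at the lower address, which is why the bytes 02h, 00h read as 0002h. The dictionary standing in for memory is an illustrative assumption.

```python
BASE = 0xAD5ED
DUMP = [0xCE, 0xFA, 0xAD, 0xDE, 0x72, 0x20, 0x11, 0x55,
        0x02, 0x00, 0x2E, 0xAA, 0x69, 0x7E, 0xA3, 0x18]
MEMORY = {BASE + i: b for i, b in enumerate(DUMP)}   # physical address -> byte


def phys(segment: int, offset: int) -> int:
    """Real-mode physical address: segment * 10h + offset (20 bits)."""
    return ((segment << 4) + offset) & 0xFFFFF


def read_word(segment: int, offset: int) -> int:
    """Little-endian 16-bit read: low byte at the address, high byte at address + 1."""
    a = phys(segment, offset)
    return MEMORY[a] | (MEMORY[a + 1] << 8)


if __name__ == "__main__":
    DS, DI = 0xAD5F, 0x0005
    SI = read_word(DS, DI)               # the pointer stored at DS:DI
    print(f"{SI:04X}h")                  # 0002h
    print(f"{read_word(DS, SI):04X}h")   # 1120h
```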
071CH
CAH
FEH
AD5ECH
Type of memory reference | Default segment base | Alternate segment base | Offset
Instruction fetch | CS | none | IP
Stack operation | SS | none | SP
Variable (except the following) | DS | CS, ES, SS | effective address
String source | DS | CS, ES, SS | SI
String destination | ES | none | DI
BP used as base register | SS | CS, DS, ES | effective address
BX used as base register | DS | CS, ES, SS | effective address
Keypoints
(C3)
Exceptions (not allowed):
- segment register ← segment register
- segment register ← immediate value
Memory addressing
2) Arithmetic instructions
3) Logic instructions
Addition: ADD, ADC, INC, AAA, DAA
Subtraction: SUB, SBB, DEC, AAS, DAS
(C4)
INSTRUCTION SET
2) Arithmetic instructions
3) Logic instructions
Important (MOV):
- destination and source must have the same size in bits;
- the register cannot be IP;
- memory-to-memory transfer is not possible;
- flags are not changed.
2)
Given ES=1000, DS=5000, DI=100, SI=200, exchange the values of the memory locations (bytes):
1) using only MOV
2) using XCHG
3) Use based-with-index addressing: transfer between AX and ES:[3000h]
4) Interchange DS with ES
5) Circular transfer AX, BX, CX, DX, AX (2 solutions; give other solutions)
6) Interchange AX with BX without using MOV or XCHG (**)
7) For the laboratory: propose 2-3 exercises similar to the ones above
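For item 6, one classic approach is the XOR swap. This is not necessarily the solution intended in the course. Three XORs exchange two registers without MOV, XCHG or a temporary register: XOR AX,BX; XOR BX,AX; XOR AX,BX. The Python below only checks the idea on 16-bit values.

```python
def xor_swap(ax: int, bx: int) -> tuple[int, int]:
    """Exchange two 16-bit values using only XOR (no temporary register)."""
    ax ^= bx        # XOR AX, BX : AX = A ^ B
    bx ^= ax        # XOR BX, AX : BX = B ^ (A ^ B) = A
    ax ^= bx        # XOR AX, BX : AX = (A ^ B) ^ A = B
    return ax & 0xFFFF, bx & 0xFFFF


if __name__ == "__main__":
    print([hex(v) for v in xor_swap(0x1234, 0xABCD)])  # ['0xabcd', '0x1234']
```

PUSH AX / PUSH BX / POP AX / POP BX is another option that also avoids MOV and XCHG.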
Exercise XLAT
1) Draw the principle schematic
2) Where it is applied: encryption, conversion
3) Example with ASCII codes
4) Write an encryption algorithm and give the solution:
   1) input from port 100h
   2) encrypt
   3) send to port 200h
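XLAT replaces AL with the byte at DS:[BX + AL], which makes it a table lookup. Below is a minimal Python model of the encryption exercise. The port I/O is left out, and the substitution table (the letters A to Z shifted by 3) is a hypothetical example, not one given in the course.

```python
import string

# Hypothetical 256-byte translation table: uppercase letters shifted by 3,
# every other byte maps to itself.
TABLE = list(range(256))
for i, ch in enumerate(string.ascii_uppercase):
    TABLE[ord(ch)] = ord(string.ascii_uppercase[(i + 3) % 26])


def xlat(table: list[int], al: int) -> int:
    """Model of XLAT: AL <- table[AL] (the table sits at DS:BX in the real CPU)."""
    return table[al & 0xFF]


def encrypt(data: bytes) -> bytes:
    """Translate every byte, as a loop of IN / XLAT / OUT would."""
    return bytes(xlat(TABLE, b) for b in data)


if __name__ == "__main__":
    print(encrypt(b"HELLO 8086"))   # b'KHOOR 8086'
```

Decryption uses the same mechanism with the inverse table.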
(C5)
INSTRUCTION SET - 2
3) Logic instructions
4) Shifts/rotate instructions + LOOPS
Exercise XLAT
Conversion of the digit in AL (0, 1, ..., 9, A, ..., F) into the corresponding ASCII code (30h, ..., 39h, 41h, ..., 46h)
Give 2 solutions
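Two possible solutions are sketched in Python. The first is a lookup table (the XLAT approach). The second is arithmetic: add 30h, then add 7 more for A to F, because 'A' (41h) lies 7 codes after the value 3Ah that 0Ah + 30h would produce. Both assume that AL holds a value from 0 to 0Fh.

```python
HEX_TABLE = b"0123456789ABCDEF"    # what an XLAT table in memory would hold


def to_ascii_xlat(al: int) -> int:
    """Solution 1: table lookup, AL <- HEX_TABLE[AL]."""
    return HEX_TABLE[al & 0x0F]


def to_ascii_arith(al: int) -> int:
    """Solution 2: ADD AL,30h; if the digit was above 9, also ADD AL,7."""
    al &= 0x0F
    result = al + 0x30
    if al > 9:
        result += 7                 # skip the gap between '9' (39h) and 'A' (41h)
    return result


if __name__ == "__main__":
    print([hex(to_ascii_arith(d)) for d in (0, 9, 0xA, 0xF)])
    # ['0x30', '0x39', '0x41', '0x46']
```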
Example AAA
;Example AAA
    MOV AH, 09h
    MOV AL, 05h
    ADD AL, AH      ; AL = 0Eh
    MOV AH, 0
    AAA             ; AX = 0104h (unpacked BCD 14), CF = AF = 1
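Here is what AAA does in this example. AL = 09h + 05h = 0Eh is not a valid unpacked BCD digit, so AAA adds 6 to AL, keeps only the low nibble, increments AH and sets CF and AF. The result is AX = 0104h (unpacked BCD for 14). The Python model below follows the commonly documented 8086 behavior of AAA and is meant as a sketch, not a cycle-exact emulation.

```python
def aaa(ax: int, af: int = 0) -> tuple[int, int]:
    """ASCII Adjust after Addition; returns (AX, CF). AF is set equal to CF."""
    al, ah = ax & 0xFF, (ax >> 8) & 0xFF
    if (al & 0x0F) > 9 or af:
        al = (al + 6) & 0xFF
        ah = (ah + 1) & 0xFF
        carry = 1                  # AF and CF are both set
    else:
        carry = 0                  # AF and CF are both cleared
    al &= 0x0F                     # keep only the low nibble
    return (ah << 8) | al, carry


if __name__ == "__main__":
    al = 0x05 + 0x09               # ADD AL, AH  -> AL = 0Eh (AF = 0)
    ax, cf = aaa(al)               # MOV AH, 0 ; AAA
    print(f"AX={ax:04X}h CF={cf}") # AX=0104h CF=1
```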
; example AAS
Examples
1) AX = BX - CX
2) Subtraction on bytes (SBB)
3) (AX,BX) = (AX,BX) - (CX,DX)
4) AL = AL - 2
5) mov bx, 0 ; dec bx   (BX = 0FFFFh)
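Item 3 subtracts 32-bit values held in register pairs. The sketch assumes that AX and CX hold the high words and BX and DX the low words, because the slides do not say which register of each pair is the high one. The low words are subtracted first with SUB, then the high words with SBB, so the borrow propagates.

```python
def sub32(ax: int, bx: int, cx: int, dx: int) -> tuple[int, int]:
    """(AX,BX) = (AX,BX) - (CX,DX), assuming AX/CX are high words, BX/DX low words."""
    # SUB BX, DX : low words; CF = 1 if a borrow was needed
    low = bx - dx
    cf = int(low < 0)
    bx = low & 0xFFFF
    # SBB AX, CX : high words minus the borrow from the low part
    ax = (ax - cx - cf) & 0xFFFF
    return ax, bx


if __name__ == "__main__":
    ax, bx = sub32(0x0001, 0x0000, 0x0000, 0x0001)   # 10000h - 1
    print(f"{ax:04X}:{bx:04X}")                       # 0000:FFFF
```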
MUL on BYTE: AX ← AL × source. On WORD: DX:AX ← AX × source.
Examples MUL
Ex: val1 DB 3
    val2 DW 257
    mov al, 0ah
    mul val1        ; AX = 001Eh
    mov ax, 100h
    mul val2        ; DX:AX = 0001:0100h
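MUL is unsigned. With a byte operand, the product AL × source goes to AX; with a word operand, AX × source goes to DX:AX. Applied to the example above (val1 = 3, val2 = 257), a quick Python check gives AX = 001Eh for the byte case and DX:AX = 0001:0100h for the word case. The function names are illustrative.

```python
def mul8(al: int, src: int) -> int:
    """MUL r/m8: AX <- AL * src (unsigned)."""
    return (al & 0xFF) * (src & 0xFF)


def mul16(ax: int, src: int) -> tuple[int, int]:
    """MUL r/m16: DX:AX <- AX * src (unsigned); returns (DX, AX)."""
    product = (ax & 0xFFFF) * (src & 0xFFFF)
    return product >> 16, product & 0xFFFF


if __name__ == "__main__":
    print(f"AX={mul8(0x0A, 3):04X}h")            # AX=001Eh
    dx, ax = mul16(0x100, 257)
    print(f"DX:AX={dx:04X}:{ax:04X}h")           # DX:AX=0001:0100h
```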
1) AX = 3*AL (with MUL, and with ADD)
2) AX = 5*AL - 7*BL (2 solutions)
3) AL = BCD representation of a 2-digit number; BL = its binary representation.
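For the last item: a packed-BCD byte holds two decimal digits, one per nibble, so the binary value is tens × 10 + units (for example, 45h → 45 = 2Dh). The Python sketch below computes what a shift/AND/MUL sequence would compute. The function name is illustrative.

```python
def bcd_to_binary(al: int) -> int:
    """Packed BCD byte (two decimal digits) -> binary value 0..99."""
    tens = (al >> 4) & 0x0F        # high nibble (e.g. SHR by 4)
    units = al & 0x0F              # low nibble  (e.g. AND AL, 0Fh)
    if tens > 9 or units > 9:
        raise ValueError(f"{al:02X}h is not a valid BCD byte")
    return tens * 10 + units       # e.g. MUL by 10, then ADD the units


if __name__ == "__main__":
    print(hex(bcd_to_binary(0x45)))  # 0x2d
```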
Examples DIV
1) AL = AL/3
2) BL = BL/2 (give 2 solutions)
3) AL = AL/5 BL/7
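DIV is also unsigned. A byte divisor divides AX, leaving the quotient in AL and the remainder in AH. A word divisor divides DX:AX, leaving the quotient in AX and the remainder in DX. So for AL = AL/3 the program must first clear AH (MOV AH,0), so that AX holds just AL. The Python model below raises an error wherever the CPU would raise a divide error: division by zero, or a quotient that does not fit in AL.

```python
def div8(ax: int, src: int) -> tuple[int, int]:
    """DIV r/m8: AL <- AX // src, AH <- AX % src (unsigned); returns (AH, AL)."""
    if src == 0 or ax // src > 0xFF:
        # the CPU raises a divide error (type 0 interrupt) in both cases
        raise ArithmeticError("divide error")
    return ax % src, ax // src


if __name__ == "__main__":
    al = 100
    ax = al                         # MOV AH,0 : AX = AL
    ah, al = div8(ax, 3)            # DIV by 3
    print(f"AL={al} AH={ah}")       # AL=33 AH=1
```

For BL = BL/2, a shift right (SHR BL,1) is the usual alternative to DIV for unsigned values.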
Remarks (AL AX)
Examples:
Conditional JUMP
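Jump if ... | Mnemonic
less / below | JL / JB
less or equal / below or equal | JLE / JBE
equal / zero | JE / JZ
not equal / not zero | JNE / JNZ
greater or equal / above or equal | JGE / JAE
greater / above | JG / JA
carry / not carry | JC / JNC
sign / not sign | JS / JNS
(less/greater: signed comparison; below/above: unsigned comparison)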
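The pairs differ in how the operands are interpreted. JL, JLE, JG and JGE treat them as signed (two's complement) values and test SF, OF and ZF. JB, JBE, JA and JAE treat them as unsigned values and test CF and ZF. The Python sketch below shows how CMP followed by JL or JB decides for the same two bytes. The function names are illustrative.

```python
def cmp8_flags(a: int, b: int) -> dict[str, int]:
    """Flags produced by CMP a, b (computes a - b and discards the result)."""
    r = (a - b) & 0xFF
    return {
        "CF": int(a < b),                               # unsigned borrow
        "ZF": int(r == 0),
        "SF": r >> 7,
        "OF": int(((a ^ b) & (a ^ r) & 0x80) != 0),    # signed overflow
    }


def jb(f: dict[str, int]) -> bool:   # below (unsigned): CF = 1
    return f["CF"] == 1


def jl(f: dict[str, int]) -> bool:   # less (signed): SF != OF
    return f["SF"] != f["OF"]


if __name__ == "__main__":
    f = cmp8_flags(0xFF, 0x01)       # 0FFh is 255 unsigned but -1 signed
    print(jb(f), jl(f))              # False True
```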
Exercises
1) AL = max(BL, CL)
2) AL = BL+CL if DL > 0; BL-CL if DL < 0; 0 if DL = 0
3) DL = 0 if AL is odd, 1 if AL is even
4) AL = ASCII code of the digit in AL
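TEST dest, source: performs a bitwise AND of the operands and sets the flags; the result is discarded, so dest is unchanged.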
Exercises
    mov  ax, 0abcdh
    and  ax, 0ffh
    or   ah, 0fch
    xor  al, ah
    test al, 1
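Tracing the five instructions step by step gives the following. MOV sets AX = ABCDh. AND AX,0FFh keeps the low byte: 00CDh. OR AH,0FCh gives FCCDh. XOR AL,AH gives AL = CDh XOR FCh = 31h, so AX = FC31h. TEST AL,1 computes 31h AND 1 = 1 without storing it, so ZF = 0. The short Python replay below confirms this.

```python
def trace() -> tuple[int, int]:
    """Replay the exercise; returns (AX, ZF) after the last instruction."""
    ax = 0xABCD                                   # mov ax, 0abcdh
    ax &= 0x00FF                                  # and ax, 0ffh
    ax |= 0xFC << 8                               # or  ah, 0fch
    ah, al = ax >> 8, ax & 0xFF
    al ^= ah                                      # xor al, ah
    ax = (ah << 8) | al
    zf = int((al & 1) == 0)                       # test al, 1 (result discarded)
    return ax, zf


if __name__ == "__main__":
    ax, zf = trace()
    print(f"AX={ax:04X}h ZF={zf}")                # AX=FC31h ZF=0
```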
(C6)
INSTRUCTION SET - 3
3) Logic instructions
4) Shifts/rotate instructions + LOOPS
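ROL
[Figure: shift and rotate illustrations, left and right. Example: 1011 0001 shifted right gives 1101 1000; carry before: CY = x, after: CY = 1.]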
Examples
a)  mov al, 0ffh
    shl al, 1
    mov cl, 3
    shl al, cl
b)  sal al, 1       ; (mul 2)
    sar al, 1       ; (div 2)
c)  mov al, 0fch
    mov cl, 4
    rol al, cl
    shr al, cl
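The results of examples a) and c) can be checked with a small Python model of 8-bit shifts and rotates. CF is the last bit shifted or rotated out, and shift counts from 1 to 7 are assumed. Example a) ends with AL = F0h and CF = 1. Example c) gives AL = CFh after ROL and AL = 0Ch after SHR. SAR copies the sign bit in, which is why SAL/SAR in example b) act as signed ×2 and ÷2 (SAR rounds toward minus infinity).

```python
def shl8(al: int, n: int) -> tuple[int, int]:
    """SHL/SAL AL, n: zeros enter from the right; returns (AL, CF)."""
    return (al << n) & 0xFF, (al >> (8 - n)) & 1


def shr8(al: int, n: int) -> tuple[int, int]:
    """SHR AL, n: zeros enter from the left (unsigned divide by 2**n)."""
    return al >> n, (al >> (n - 1)) & 1


def sar8(al: int, n: int) -> tuple[int, int]:
    """SAR AL, n: the sign bit is copied in (signed divide, rounding down)."""
    signed = al - 0x100 if al & 0x80 else al
    return (signed >> n) & 0xFF, (signed >> (n - 1)) & 1


def rol8(al: int, n: int) -> tuple[int, int]:
    """ROL AL, n: bits leaving bit 7 re-enter at bit 0; CF = new bit 0."""
    r = ((al << n) | (al >> (8 - n))) & 0xFF
    return r, r & 1


if __name__ == "__main__":
    al, cf = shl8(0xFF, 1)       # a) shl al,1  : AL=FEh, CF=1
    al, cf = shl8(al, 3)         #    shl al,cl : AL=F0h, CF=1
    print(f"a) AL={al:02X}h CF={cf}")
    al, cf = rol8(0xFC, 4)       # c) rol al,cl : AL=CFh
    al, cf = shr8(al, 4)         #    shr al,cl : AL=0Ch
    print(f"c) AL={al:02X}h CF={cf}")
```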
Exercises (1)
1) Store in BL the value of bit a4 from AL.
a)  mov cl, 3
    shl al, cl
    mov cl, 7
    shr al, cl
    mov bl, al
b)  and al, 00010000b
    mov cl, 4
    shr al, cl
    mov bl, al
Find 2 other solutions!!
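Both solutions can be checked exhaustively over all 256 values of AL. A third variant (SHR AL,4, then AND AL,1) is included for comparison. It is offered only as one possible answer to "find other solutions".

```python
def bit4_a(al: int) -> int:
    """Solution a): SHL AL,3 moves bit 4 to bit 7, then SHR AL,7 moves it to bit 0."""
    al = (al << 3) & 0xFF
    return al >> 7


def bit4_b(al: int) -> int:
    """Solution b): AND AL,00010000b isolates bit 4, then SHR AL,4."""
    return (al & 0b0001_0000) >> 4


def bit4_c(al: int) -> int:
    """A possible extra solution: SHR AL,4 first, then AND AL,1."""
    return (al >> 4) & 1


if __name__ == "__main__":
    ok = all(bit4_a(v) == bit4_b(v) == bit4_c(v) for v in range(256))
    print("all three agree for every AL:", ok)
```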
Exercises (2)
2) AX = ??xy (x, y = hex digits). Obtain AX = 0x0y.
    push cx
    mov  cl, 4
    rol  ax, cl      ; AX = ?xy?
    and  ah, 0Fh     ; AX = 0xy?
    shr  al, cl      ; AX = 0x0y
    pop  cx
Propose another solution!!
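The listing can be checked by replaying it in Python. The second function is one possible alternative (copy AL into AH, shift AH right by 4, mask AL with 0Fh). It is not taken from the slides.

```python
def rol16(ax: int, n: int) -> int:
    """ROL on a 16-bit register (1 <= n <= 15)."""
    return ((ax << n) | (ax >> (16 - n))) & 0xFFFF


def unpack_slide(ax: int) -> int:
    """AX = ??xy -> AX = 0x0y, following the listing step by step."""
    ax = rol16(ax, 4)                    # rol ax,cl  : AX = ?xy?
    ax &= 0x0FFF                         # and ah,0Fh : AX = 0xy?
    ah, al = ax >> 8, ax & 0xFF
    al >>= 4                             # shr al,cl  : AX = 0x0y
    return (ah << 8) | al


def unpack_alt(ax: int) -> int:
    """Alternative: mov ah,al / shr ah,cl (CL = 4) / and al,0Fh."""
    al = ax & 0xFF
    ah = al >> 4                         # high nibble of AL moves to AH
    al &= 0x0F                           # low nibble stays in AL
    return (ah << 8) | al


if __name__ == "__main__":
    print(f"{unpack_slide(0x12AB):04X}h")   # 0A0Bh
```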
Exercises (3)
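3) AX = 8*AL - 7*BL
    mov  cl, 3
    cbw              ; AX = AL (signed)
    sal  ax, cl      ; AX = 8*AL
    xchg ax, bx      ; BX = 8*AL, AL = BL
    cbw              ; AX = BL (signed)
    mov  dx, ax
    sal  ax, cl      ; AX = 8*BL
    sub  ax, dx      ; AX = 7*BL
    sub  bx, ax      ; BX = 8*AL - 7*BL
    xchg ax, bx      ; AX = 8*AL - 7*BL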
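Because the sequence starts with CBW, AL and BL are treated as signed bytes. The Python replay below follows the listing instruction by instruction and can be compared against the direct formula for all 65,536 input pairs. Note that the real sequence also overwrites BX, CX and DX.

```python
def cbw(al: int) -> int:
    """CBW: sign-extend AL into AX."""
    al &= 0xFF
    return al | 0xFF00 if al & 0x80 else al


def exercise3(al: int, bl: int) -> int:
    """Replay of the listing; returns AX = 8*AL - 7*BL as a 16-bit value."""
    ax = cbw(al)                    # cbw         (CL = 3 from mov cl,3)
    ax = (ax << 3) & 0xFFFF         # sal ax,cl   : AX = 8*AL
    ax, bx = bl, ax                 # xchg ax,bx  : BX = 8*AL, AL = BL (AH is irrelevant)
    ax = cbw(ax)                    # cbw         : AX = BL, sign-extended
    dx = ax                         # mov dx,ax
    ax = (ax << 3) & 0xFFFF         # sal ax,cl   : AX = 8*BL
    ax = (ax - dx) & 0xFFFF         # sub ax,dx   : AX = 7*BL
    bx = (bx - ax) & 0xFFFF         # sub bx,ax   : BX = 8*AL - 7*BL
    ax, bx = bx, ax                 # xchg ax,bx
    return ax


def as_signed16(v: int) -> int:
    """Interpret a 16-bit value as two's complement."""
    return v - 0x10000 if v & 0x8000 else v


if __name__ == "__main__":
    print(as_signed16(exercise3(0x02, 0x03)))   # 8*2 - 7*3 = -5
```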
Exercises (4)
4) Count into DL the number of 1 bits in AX.
    xor dl, dl       ; DL = 0 (also clears CF)
    mov cx, 16
nextbit:
    rcl ax, 1        ; next bit goes into CF
    jnc zerobit
    inc dl
zerobit:
    dec cx
    jnz nextbit
Propose another 2 solutions!!
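The RCL loop works because XOR DL,DL also clears CF. Each of the 16 rotations moves exactly one original bit of AX into CF, and the zero that entered at bit 0 never comes back out within 16 steps. The Python replay below checks this against a plain bit count. It also includes one possible alternative (SHR AX,1 followed by ADC DL,0), which is not taken from the slides.

```python
def count_ones_rcl(ax: int) -> int:
    """Slide's method: rotate AX left through CF 16 times, counting 1 carries."""
    dl, cf = 0, 0                          # xor dl,dl (this also clears CF)
    for _ in range(16):                    # mov cx,16 ... dec cx / jnz nextbit
        new_cf = (ax >> 15) & 1            # rcl ax,1: bit 15 goes to CF ...
        ax = ((ax << 1) | cf) & 0xFFFF     # ... and the old CF enters bit 0
        cf = new_cf
        if cf:                             # jnc zerobit not taken
            dl += 1                        # inc dl
    return dl


def count_ones_shr_adc(ax: int) -> int:
    """Alternative: SHR AX,1 then ADC DL,0, repeated 16 times."""
    dl = 0
    for _ in range(16):
        cf = ax & 1                        # shr ax,1: bit 0 goes to CF
        ax >>= 1
        dl += cf                           # adc dl,0
    return dl


if __name__ == "__main__":
    print(count_ones_rcl(0xF00F), count_ones_shr_adc(0xF00F))  # 8 8
```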
Exercises (5)
5) Fill the first 256 bytes of DS with the values 00, 01, 02, ..., FFh.
    mov si, 0
    mov cx, 256
    xor al, al
NextByte:
    mov byte ptr [si], al
    inc al
    inc si
    dec cx
    jnz NextByte
LOOP instruction
Syntax: LOOP label
Equivalent to: DEC CX / JNZ label (except that LOOP does not change the flags)
Examples
    mov cx, 100
    xor ax, ax
et1:
    add ax, 2
    loop et1         ; AX = 200 (00C8h)
Example: read from the keyboard (port 60h) and write to the printer (port 378h):
    in  al, 60h
    mov dx, 378h     ; port numbers above 0FFh must be given in DX
    out dx, al
Examples IN/OUT
1) Generate a rectangular signal on D0 of port 300h
2) Control the frequency
3) Give the solution to control the duration of the signal
4) Generate a signal with another shape
5) Connect a DAC and generate a triangular signal (maximum frequency)
6) A dynamic light: control the speed
LODSB(W)
Example LODSB
DX = sum of the 20 bytes starting at DS:SI = 100h
    xor dx, dx
    mov cx, 20
    mov si, 100h
    cld
nextbyte:
    lodsb
    cbw
    add dx, ax
    loop nextbyte
Propose the solution without using LODS.
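In this example, LODSB reads DS:[SI] into AL and advances SI forward (CLD cleared DF). CBW then sign-extends AL before it is added to DX, so the bytes are summed as signed values. Below is a Python model, in which a hypothetical bytes object stands in for the data segment.

```python
def sum_bytes_signed(memory: bytes, si: int, count: int) -> int:
    """DX = sum of `count` bytes from DS:SI, each sign-extended by CBW."""
    dx = 0                                     # xor dx,dx
    for _ in range(count):                     # mov cx,count ... loop nextbyte
        al = memory[si]                        # lodsb: AL <- DS:[SI] ...
        si += 1                                # ... SI <- SI + 1 (DF = 0)
        ax = al - 0x100 if al & 0x80 else al   # cbw: sign-extend AL into AX
        dx = (dx + ax) & 0xFFFF                # add dx,ax (16-bit wrap-around)
    return dx


if __name__ == "__main__":
    segment = bytes(0x100) + bytes([1] * 19 + [0xFF])   # 19 ones and one -1
    print(sum_bytes_signed(segment, 0x100, 20))         # 18
```

Without LODS, the same loop body can use MOV AL,[SI] followed by INC SI.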
STOSB(W)
Example STOSB
Generate a string of 256 bytes (00, 01, ..., FFh) at the address ES:DI = 300h
    mov di, 300h
    xor al, al
    mov cx, 256
    cld
sto:
    stosb
    inc al
    loop sto
    nop
Propose the solution without using STOSB.
MOVSB
Examples MOVSB (transfer 20 bytes from DS:SI = 100h to ES:DI = 300h)
Without MOVSB:
    mov si, 100h
    mov di, 300h
    mov cx, 20
e:  mov al, [si]
    mov es:[di], al
    inc si
    inc di
    dec cx
    jnz e
With MOVSB:
    mov si, 100h
    mov di, 300h
    mov cx, 20
    cld
e:  MOVSB
    loop e
Obs: the loop can be replaced by REP MOVSB.
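Example: return in BX the offset of the first 00 byte in the 100-byte string at ES:DI = 200h, or 0FFFFh if none is found.
Solution 1 (compare loop):
    mov di, 200h
    mov cx, 100
    mov al, 0
s:  cmp es:[di], al
    je  f0
    inc di
    loop s
    mov di, 0FFFFh
f0: mov bx, di
    nop
Solution 2 (REPNE SCASB):
    mov di, 200h
    mov cx, 100
    mov al, 0
    cld
    repne scasb
    je  f            ; ZF = 1: a 00 byte was found
    mov di, 0        ; not found: DEC below gives 0FFFFh
f:  dec di           ; SCASB advanced DI past the match
    mov bx, di
    nop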
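REPNE SCASB repeats while CX ≠ 0 and the byte at ES:DI differs from AL. Each SCASB advances DI even when it finds a match, which is why DI is decremented afterwards. The model below follows the REPNE SCASB version. It tests the zero flag rather than CX, so a 00h in the very last byte is still reported. A bytes object stands in for the extra segment.

```python
def find_zero(es_mem: bytes, di: int, cx: int) -> int:
    """BX = offset of the first 00h byte in ES:[DI] .. ES:[DI+CX-1], else 0FFFFh."""
    al = 0x00                             # mov al,0
    zf = 0
    while cx != 0:                        # repne scasb: repeat while CX != 0 ...
        zf = int(es_mem[di] == al)        #   scasb: compare AL with ES:[DI] (sets ZF)
        di = (di + 1) & 0xFFFF            #   DI always advances (DF = 0 after CLD)
        cx -= 1
        if zf:                            # ... and while ZF = 0
            break
    if not zf:                            # je f not taken: nothing found
        di = 0                            # mov di,0
    return (di - 1) & 0xFFFF              # f: dec di / mov bx,di


if __name__ == "__main__":
    seg = bytearray(b"\xff" * 0x300)
    seg[0x225] = 0
    print(hex(find_zero(bytes(seg), 0x200, 100)))   # 0x225
```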
Exercise
Data acquisition system:
- control port (RD/WR) 200h: START on D0, active high (1); EOC on D7, active low (0)
- data port (RD) 300h
- read 10,000 samples into ES:DI = 100h
MACRO - instructions
= a group of instructions identified by a unique name (the name of the MACRO) and interpreted as a new instruction.
A MACRO needs to be defined:
    name MACRO param_1, ..., param_n
        .... instructions ....
    ENDM
To use MACROs:
- they need to be defined first;
- they can be used afterwards.
Exercises (homework)
MACRO for:
- obtaining in AX the sum of a string of bytes (the offset address in DS and the number of bytes are the parameters);
- obtaining in AL the maximum of a string of bytes (the address and the number of bytes are the parameters);
- returning in CX the length of a string of bytes that ends with 00h (the address is passed as a parameter).
CALL - instructions
Needed to replace program sequences that are repeated, OR to group functionality into a single software entity.
[Example diagram]
Types of CALLs:
- intrasegment (NEAR calls): IP is saved
- intersegment (FAR calls): CS and IP are saved
Execution Steps
1) CALL is classified as NEAR or FAR.
2) NEAR: save IP on the stack. FAR: save CS on the stack, then save IP on the stack.
3) JUMP to the address CS:IP.
4) Execute the sequence until RET is encountered.
5) RET execution: NEAR: get IP from the stack. FAR: get IP from the stack, then get CS from the stack.
6) JUMP to the address CS:IP.
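Steps 2 and 5 can be modeled with a small word stack that grows downward (each push subtracts 2 from SP). The class and function names and the addresses in the usage lines are hypothetical. Because a FAR call pushes CS before IP, a FAR RET must pop IP first and CS second. Pairing a NEAR CALL with a FAR RET, or the reverse, leaves the stack unbalanced, which is one reason the PROC type matters.

```python
class Stack8086:
    """Word stack in SS that grows downward, as PUSH and POP use it."""

    def __init__(self, sp: int = 0x0100) -> None:
        self.sp = sp
        self.mem: dict[int, int] = {}      # offset in SS -> stored word

    def push(self, word: int) -> None:
        self.sp = (self.sp - 2) & 0xFFFF    # SP <- SP - 2, then store the word
        self.mem[self.sp] = word & 0xFFFF

    def pop(self) -> int:
        word = self.mem[self.sp]            # read the word, then SP <- SP + 2
        self.sp = (self.sp + 2) & 0xFFFF
        return word


def call_near(st: Stack8086, cs: int, ip_next: int, target_ip: int) -> tuple[int, int]:
    st.push(ip_next)                        # step 2, NEAR: save IP
    return cs, target_ip                    # step 3: jump (CS unchanged)


def call_far(st: Stack8086, cs: int, ip_next: int,
             target_cs: int, target_ip: int) -> tuple[int, int]:
    st.push(cs)                             # step 2, FAR: save CS ...
    st.push(ip_next)                        # ... then IP
    return target_cs, target_ip             # step 3: jump


def ret_near(st: Stack8086, cs: int) -> tuple[int, int]:
    return cs, st.pop()                     # step 5, NEAR: get IP


def ret_far(st: Stack8086) -> tuple[int, int]:
    ip = st.pop()                           # step 5, FAR: get IP ...
    cs = st.pop()                           # ... then CS
    return cs, ip


if __name__ == "__main__":
    st = Stack8086()
    cs, ip = call_far(st, 0x1000, 0x0005, 0x2000, 0x0010)
    print(f"in procedure at {cs:04X}:{ip:04X}, SP={st.sp:04X}h")  # SP=00FCh
    cs, ip = ret_far(st)
    print(f"back at {cs:04X}:{ip:04X}, SP={st.sp:04X}h")          # SP=0100h
```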
Procedures
Definition:
    namep PROC {NEAR or FAR}
        ... instructions ...
        RET
    namep ENDP
Usage:
    ...
    call namep
    ...
PAY ATTENTION !!
Example 1 (find_max in a string: receives in SI the beginning of the string and in CX its length; returns the maximum in AL and the index (offset) of the maximum in BX)
find_max PROC NEAR
    mov al, byte ptr [si]
    mov bx, si
    dec cx
    jcxz done               ; one-byte string: nothing to compare
c:  cmp al, byte ptr [si+1]
    jge Ok                  ; signed comparison
    mov al, byte ptr [si+1]
    mov bx, si
    inc bx
Ok: inc si
    loop c
done:
    RET
find_max ENDP
....
    mov si, ...
    mov cx, ...
    call find_max
    ...
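Because JGE is a signed jump, find_max treats the bytes as signed values (80h to FFh count as negative), and on ties it keeps the first occurrence. Below is a Python model of the procedure's logic. It assumes a string of at least one byte, and a bytes object stands in for the data segment.

```python
def s8(v: int) -> int:
    """Interpret a byte as a signed (two's complement) value, as JGE does."""
    return v - 0x100 if v & 0x80 else v


def find_max(mem: bytes, si: int, cx: int) -> tuple[int, int]:
    """Model of find_max: returns (AL, BX) = (largest signed byte, its offset)."""
    al, bx = mem[si], si                 # mov al,[si] / mov bx,si
    for _ in range(cx - 1):              # dec cx ... loop c
        nxt = mem[si + 1]                # cmp al,[si+1]
        if s8(al) < s8(nxt):             # jge Ok not taken
            al, bx = nxt, si + 1         # mov al,[si+1] / mov bx,si / inc bx
        si += 1                          # Ok: inc si
    return al, bx


if __name__ == "__main__":
    data = bytes([3, 9, 0x80, 9, 2])     # 80h counts as -128
    al, bx = find_max(data, 0, len(data))
    print(f"AL={al:02X}h BX={bx}")       # AL=09h BX=1
```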
Example 2 (strlen returns in CX the length of a string of bytes terminated by 00h; receives in DI the beginning of the string)
strlen PROC
    xor cx, cx
    xor al, al
    cld
comp:
    scasb            ; compares AL with ES:[DI], then increments DI
    jz exit
    inc cx
    jmp comp
exit:
    ret
strlen ENDP
Usage:
    ....
    mov di, offset str   ; (or: lea di, str)
    call strlen
    ...
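SCASB compares AL with the byte at ES:DI (not DS:DI), so ES must address the string's segment before strlen is called. Below is a Python model of the loop, in which a bytes object stands in for that segment.

```python
def strlen(es_memory: bytes, di: int) -> int:
    """Model of strlen: count the bytes before the first 00h, starting at ES:DI."""
    cx, al = 0, 0                      # xor cx,cx / xor al,al
    while True:
        zf = es_memory[di] == al       # comp: scasb (compare AL with ES:[DI])
        di += 1                        #       DI advances (DF = 0 after CLD)
        if zf:                         # jz exit
            return cx
        cx += 1                        # inc cx / jmp comp


if __name__ == "__main__":
    print(strlen(b"HELLO\x00xyz", 0))  # 5
```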
Usage:
    mov al, [a]
    mov si, offset a
    mov al, [si]
    mov si, offset str
PARA 0
PUBLIC
'DATE'
;prepare for return ;prepare DS
EXE/COM