D. Venkata Vara Prasad
Assistant Professor, Dept. of Computer Science & Engg., SSMCE

Objective
After completing this session you will be able to understand:
- ALU
- Control unit
- Memory
- I/O

Basic Architecture
Figure: basic architecture, showing the memory unit (MU), auxiliary memory, control unit (CU), ALU, input (I/P) and output (O/P)

Memory Organization
Figure: memory organization, showing the memory cells, memory control logic, memory buffer/data register (MBR/MDR), memory address register (MAR), read and write control lines, and the address and data lines

Memory hierarchy
Figure: memory hierarchy

Figure: ALU datapath, showing registers R1 and R2, the ALU, the flag register, working registers A, B, C and D, an incrementer, a complementer, a shifter and the system bus

A Simplified Control Unit
Figure: the control unit divided into a fetch unit, decode unit, execution unit and write-back unit, corresponding to the fetch, decode, execute and write-back stages

Internal organization of CU
Figure: program counter (PC/IAC), MAR, memory, MDR, instruction fields (opcode, operand 1, operand 2), opcode decoder, clock generator, pulse sequencer and pulse distributor

Generations of Computers

First Generation (1945-54)
- Single CPU, accumulator based
- Fixed-point execution of all operations
- Program counter (PC)
- Assembly language programs (ALP) and machine language programs (MLP)
- Ex: ENIAC, Princeton IAS, IBM 701

Second Generation (1955-64)
- Index registers
- Fixed-point and floating-point arithmetic
- Multiplexed memory
- Batch processing
- Subroutines and libraries
- I/O processors (IOP)
- RTL
- HLLs such as FORTRAN, COBOL, ALGOL
- Ex: IBM 7090, UNIVAC LARC

Third Generation (1965-74)
- Pipelining
- Cache memory
- Virtual memory
- Multiprogramming
- Time sharing
- Ex: IBM 360/370, CDC 6600/7600, TI ASC, DEC PDP-8

Fourth Generation (1975-90)
- Parallel computers
- Shared memory and distributed memory
- Multiprocessing OS (Mach)
- Ex: IBM 3090, BBN TC2000, VAX 9000, Cray X-MP

Fifth Generation
- Massively parallel processing (MPP)
- Scalable, latency-tolerant architectures
- Teraflops performance
- Heterogeneous processing
- Ex: KABRU, Fujitsu, Paragon

Addressing Modes
Figure: addressing modes

Instruction formats at a glance
- Four-address format
- Three-address format
- Two-address format
- Single-address format
- Zero-address format

Instruction Set Architectures: Examples
Figure: instruction set architecture examples

Instruction Types
A computer must have instructions capable of performing three types of operations: data transfer, data manipulation and program control.
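To make the single-address format and the three instruction types concrete, here is a small sketch of a fetch-decode-execute loop for a toy accumulator machine. The opcodes LD, ST, ADD and SUB borrow mnemonics from the tables that follow; BNZ (branch if accumulator not zero), HLT, the tuple encoding of instructions and the memory layout are all hypothetical choices for illustration, not a real ISA. The fetch step mirrors the PC → MAR → memory → MDR path of the control unit.

```python
# Toy single-address (accumulator) machine. The ISA, the BNZ/HLT opcodes and
# the (opcode, address) tuple encoding are hypothetical, for illustration only.

def run(memory, max_steps=1000):
    """Fetch-decode-execute loop; memory holds instruction tuples and data ints."""
    pc, acc = 0, 0
    for _ in range(max_steps):
        # Fetch: MAR <- PC, MDR <- M[MAR], IR <- MDR, PC <- PC + 1
        mar = pc
        mdr = memory[mar]
        ir = mdr
        pc += 1
        # Decode: split the instruction into opcode and address field
        opcode, address = ir
        # Execute, writing results back to ACC or memory
        if opcode == "LD":        # data transfer: ACC <- M[address]
            acc = memory[address]
        elif opcode == "ST":      # data transfer: M[address] <- ACC
            memory[address] = acc
        elif opcode == "ADD":     # data manipulation: ACC <- ACC + M[address]
            acc += memory[address]
        elif opcode == "SUB":     # data manipulation: ACC <- ACC - M[address]
            acc -= memory[address]
        elif opcode == "BNZ":     # program control: if ACC != 0 then PC <- address
            if acc != 0:
                pc = address
        elif opcode == "HLT":     # stop and return the final memory image
            return memory
        else:
            raise ValueError(f"unknown opcode {opcode!r}")
    raise RuntimeError("step limit reached")


# Compute SUM = STEP * COUNT by repeated addition (5 * 3)
COUNT, ONE, SUM, STEP = 10, 11, 12, 13
program = [
    ("LD", COUNT), ("SUB", ONE), ("ST", COUNT),   # COUNT <- COUNT - 1
    ("LD", SUM), ("ADD", STEP), ("ST", SUM),      # SUM <- SUM + STEP
    ("LD", COUNT), ("BNZ", 0),                    # loop while COUNT != 0
    ("HLT", 0),
]
memory = program + [0, 3, 1, 0, 5]  # cell 9 unused; then COUNT, ONE, SUM, STEP
run(memory)
print(memory[SUM])  # 15
```

Every arithmetic instruction names only one memory operand; the accumulator is the implied second operand and destination, which is what makes this a single-address format.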
Data Transfer Instructions
These transfer data from one location to another without changing its binary content.

Name       Mnemonic  Meaning
Load       LD        Transfers data from memory to a processor register
Store      ST        Transfers data from a processor register to memory
Move       MOV       Transfers data from one register to another; also used to transfer data between a CPU register and memory, or between two memory locations
Exchange   XCH       Swaps data between two registers, or between a register and memory
Input      IN        Transfers data from an input terminal to a processor register
Output     OUT       Transfers data from a processor register to an output terminal
Push       PUSH      Transfers data from a processor register to the memory stack
Pop        POP       Transfers data from the memory stack to a processor register

Data Manipulation Instructions
These are the instructions that perform arithmetic, logical and shift operations.

Arithmetic operations
Name                  Mnemonic  Meaning
Increment             INC       Increment the contents of a register
Decrement             DEC       Decrement the contents of a register
Add                   ADD       Add the contents of two registers, or of a register and memory
Subtract              SUB       Subtract the contents of two registers, or of a register and memory
Multiply              MUL       Multiply the contents of two registers, or of a register and memory
Divide                DIV       Divide the contents of two registers, or of a register and memory
Add with carry        ADDC      Add the contents of two registers, or of a register and memory, with carry
Subtract with borrow  SUBB      Subtract the contents of two registers, or of a register and memory, with borrow
Negate                NEG       Replace the number with its 2's complement

Logical operations
Name               Mnemonic  Meaning
Clear              CLR       Clear the contents of a register or memory location
Complement         COM       Complement the contents of a register
AND                AND       Perform a logical AND operation
OR                 OR        Perform a logical OR operation
Exclusive OR       XOR       Perform a logical XOR operation
Clear carry        CLRC      Clear the carry flag
Set carry          SETC      Set the carry flag
Complement carry   COMC      Complement the carry flag
Enable interrupt   EI        Set the interrupt flag (enable interrupts)
Disable interrupt  DI        Clear the interrupt flag (disable interrupts)

Shift operations
Name                        Mnemonic
Logical shift right         SHR
Logical shift left          SHL
Arithmetic shift left       SHLA
Arithmetic shift right      SHRA
Rotate right                ROR
Rotate left                 ROL
Rotate right through carry  RORC
Rotate left through carry   ROLC

Program Control Instructions
Name                      Mnemonic
Branch                    BR
Jump                      JMP
Skip                      SKP
Call                      CALL
Return                    RET
Compare (by subtraction)  CMP
Test (by ANDing)          TST

Memory Operand Addressing: Big Endian and Little Endian
Little endian stores the least significant byte of a word at the lowest address; big endian stores the most significant byte there. For a 32-bit word at byte addresses 0 to 3:

Address  Little endian  Big endian
0        b7-b0          b31-b24
1        b15-b8         b23-b16
2        b23-b16        b15-b8
3        b31-b24        b7-b0
Example  Intel series   Motorola series

Buses
All the functional units must be connected, and different types of unit (memory, input/output, CPU) need different types of connection.

Memory Connection
- Receives and sends data
- Receives addresses (of locations)
- Receives control signals: read, write, timing

Input/Output Connection (1)
- Similar to memory from the computer's viewpoint
- Output: receives data from the computer and sends data to the peripheral
- Input: receives data from the peripheral and sends data to the computer

Input/Output Connection (2)
- Receives control signals from the computer
- Sends control signals to peripherals, e.g. spin disk
- Receives addresses from the computer, e.g. a port number to identify the peripheral
- Sends interrupt signals (control)

CPU Connection
- Reads instructions and data
- Writes out data (after processing)
- Sends control signals to other units
- Receives (and acts on) interrupts

Buses
There are a number of possible interconnection systems. Single and multiple bus structures are most common, e.g. the control/address/data bus (PC) and the Unibus (DEC PDP).

What is a Bus?
- A communication pathway connecting two or more devices
- Usually broadcast
- Often grouped: a number of channels in one bus, e.g.
a 32-bit data bus is 32 separate single-bit channels
- Power lines may not be shown

Data Bus
- Carries data (remember that there is no difference between data and instructions at this level)
- Width is a key determinant of performance: 8, 16, 32 or 64 bits

Address Bus
- Identifies the source or destination of data, e.g. when the CPU needs to read an instruction (data) from a given location in memory
- Bus width determines the maximum memory capacity of the system, e.g. the 8080 has a 16-bit address bus, giving a 64K address space (2^16 = 65,536 locations)

Control Bus
- Carries control and timing information: memory read/write signals, I/O read/write signals, interrupt requests, clock signals

Bus Interconnection Scheme
Figure: input, memory, CPU and output units connected by the control bus (C.B), data bus (D.B) and address bus (A.B)

Single Bus Issues
Lots of devices on one bus leads to:
- Propagation delays: long data paths mean that coordination of bus use can adversely affect performance
- Deadlocks may occur
- Bus performance depends on bus bandwidth
Most systems use multiple buses to overcome these problems.

STACK
Figure: register stack holding items A, B and C, with the stack pointer (SP), FULL and EMPTY flags, and word addresses 000 to 111

PUSH operation (initially SP = 0, EMPTY = 1, FULL = 0):
SP ← SP + 1
M[SP] ← DR
if (SP = 0) then FULL ← 1
EMPTY ← 0

POP operation:
DR ← M[SP]
SP ← SP - 1
if (SP = 0) then EMPTY ← 1
FULL ← 0

Machine Language
Operations are represented in binary-coded format.
- Difficult to read and write
- Difficult to interpret the meaning
- Difficult to debug the code

Assembly Language
The set of symbols and the rules for using these symbols constitute a language called assembly language.
- Operations are represented in symbolic notation.
- An assembler is needed to translate an assembly language program (ALP) into a machine language program (MLP).

Assembler
Figure: assembly language → ASSEMBLER → machine language

Assembler Directives
ORG, START, EQU, DB, DW, END

Performance
- Measure, report, and summarize
- Make intelligent choices
- See through the marketing hype
- Key to understanding underlying organizational motivation

Performance (contd.)
- Why is some hardware better than others for different programs?
- What factors of system performance are hardware related? (e.g., Do we need a new machine, or a new operating system?)
- How does the machine's instruction set affect performance?

Computer Performance: TIME, TIME, TIME
Response time (latency)
- How long does it take for my job to run?
- How long does it take to execute a job?
- How long must I wait for the database query?
Throughput
- How many jobs can the machine run at once?
- What is the average execution rate?
- How much work is getting done?

Response and Throughput
- If we upgrade a machine with a new processor, what do we increase?
- If we add a new machine to the lab, what do we increase?

Execution Time
- Elapsed time counts everything (disk and memory accesses, I/O, etc.); it is a useful number, but often not good for comparison purposes.
- CPU time doesn't count I/O or time spent running other programs; it can be broken up into system time and user time.
- Our focus: user CPU time, the time spent executing the lines of code that are "in" our program.

Book's Definition of Performance
For some program running on machine X:
Performance(X) = 1 / Execution time(X)
"X is n times faster than Y" means Performance(X) / Performance(Y) = n
Problem: machine A runs a program in 20 seconds; machine B runs the same program in 25 seconds.

Clock Cycles
Instead of reporting execution time in seconds, we often use cycles:
$$\frac{\text{seconds}}{\text{program}} = \frac{\text{cycles}}{\text{program}} \times \frac{\text{seconds}}{\text{cycle}}$$
Clock "ticks" indicate when to start activities (one abstraction):
- Cycle time = time between ticks = seconds per cycle
- Clock rate (frequency) = cycles per second (1 Hz = 1 cycle/sec)
A 200 MHz clock has a cycle time of
$$\frac{1}{200 \times 10^{6}} \times 10^{9} = 5 \text{ nanoseconds}$$

How to Improve Performance
$$\frac{\text{seconds}}{\text{program}} = \frac{\text{cycles}}{\text{program}} \times \frac{\text{seconds}}{\text{cycle}}$$
So, to improve performance (everything else being equal) you can either ________ the number of cycles required for a program, or ________ the clock cycle time or, said another way, ________ the clock rate.

Performance
Performance is determined by execution time. Do any of the other variables equal performance?
- # of cycles to execute the program?
- # of instructions in the program?
- # of cycles per second?
- Average # of cycles per instruction?
- Average # of instructions per second?
Common pitfall: thinking one of these variables is indicative of performance when it really isn't.

Performance
Execution Time = I.C * CPI * Cycle Time
I.C = instruction count (dynamic vs. static)
- Dynamic I.C is useful for performance (timing)
- Static I.C is useful for code size

Performance: CPI
Average cycles per instruction (CPI) is computed from the instruction mix:

Instruction type  ALU  Load/Store  Cond. jump  Uncond. jump
Frequency (%)     40   25          20          15
Cycles            4    5           3           2

CPI = 0.40 × 4 + 0.25 × 5 + 0.20 × 3 + 0.15 × 2 = 1.60 + 1.25 + 0.60 + 0.30 = 3.75

Performance
Execution Time = I.C * CPI * Cycle Time
- I.C depends on the ISA
- CPI depends on the ISA and the organization
- Cycle time (CT) depends on the organization and the technology

Execution Time After Improvement = Execution Time Unaffected + (Execution Time Affected / Amount of Improvement)

Example: "Suppose a program runs in 100 seconds on a machine, with multiply responsible for 80 seconds of this time. How much do we have to improve the speed of multiplication if we want the program to run 4 times faster?" How about making it 5 times faster?
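The two calculations above can be checked with a short sketch: it computes the average CPI from the instruction-mix table and solves the multiply example with the execution-time-after-improvement formula. The function names are hypothetical, chosen only for illustration.

```python
# Sketch: average CPI from an instruction mix, plus the
# "execution time after improvement" formula. Names are hypothetical.

def average_cpi(mix):
    """mix: list of (frequency, cycles) pairs whose frequencies sum to 1."""
    if abs(sum(freq for freq, _ in mix) - 1.0) > 1e-9:
        raise ValueError("instruction frequencies must sum to 1")
    return sum(freq * cycles for freq, cycles in mix)


def time_after_improvement(time_unaffected, time_affected, improvement):
    """Execution Time Unaffected + Execution Time Affected / Amount of Improvement."""
    return time_unaffected + time_affected / improvement


def required_improvement(total_time, time_affected, target_speedup):
    """Solve total/target = (total - affected) + affected / n for n.

    Returns None when the unaffected time alone already reaches the
    target time, so no finite improvement of the affected part suffices.
    """
    target_time = total_time / target_speedup
    slack = target_time - (total_time - time_affected)
    if slack <= 0:
        return None
    return time_affected / slack


# Instruction mix from the table: ALU, load/store, cond. jump, uncond. jump
mix = [(0.40, 4), (0.25, 5), (0.20, 3), (0.15, 2)]
print(f"CPI = {average_cpi(mix):.2f}")  # CPI = 3.75

# Multiply example: 100 s in total, 80 s of it in multiply
for speedup in (4, 5):
    n = required_improvement(100, 80, speedup)
    if n is None:
        print(f"{speedup}x faster: not possible by speeding up multiply alone")
    else:
        print(f"{speedup}x faster: multiply must run {n:g}x faster")

# Check: 20 s unaffected + 80 s / 16 = 25 s, i.e. 100 s / 4
print(time_after_improvement(20, 80, 16))  # 25.0
```

Under these assumptions the 4x target needs a 16x faster multiply, while 5x is unreachable: the 20 seconds spent outside multiply already equal the 20-second target, which motivates the principle of making the common case fast.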
Principle: Make the common case fast

Amdahl's Law
Speedup due to enhancement E:
$$\text{Speedup}(E) = \frac{\text{ExTime w/o } E}{\text{ExTime w/ } E} = \frac{\text{Performance w/ } E}{\text{Performance w/o } E}$$
Suppose that enhancement E accelerates a fraction F of the task by a factor S, and the remainder of the task is unaffected.

Amdahl's Law
$$\text{ExTime}_{\text{new}} = \text{ExTime}_{\text{old}} \times \left[(1 - \text{Fraction}_{\text{enhanced}}) + \frac{\text{Fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}}\right]$$
$$\text{Speedup}_{\text{overall}} = \frac{\text{ExTime}_{\text{old}}}{\text{ExTime}_{\text{new}}} = \frac{1}{(1 - \text{Fraction}_{\text{enhanced}}) + \dfrac{\text{Fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}}}$$

Amdahl's Law: Example
Floating-point instructions are improved to run 2X faster, but only 10% of actual instructions are FP (treated here as 10% of the execution time).
$$\text{ExTime}_{\text{new}} = \text{ExTime}_{\text{old}} \times (0.9 + 0.1/2) = 0.95 \times \text{ExTime}_{\text{old}}$$
$$\text{Speedup}_{\text{overall}} = \frac{1}{0.95} = 1.053$$
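A minimal sketch of Amdahl's Law as written above, assuming the enhanced fraction is a fraction of execution time rather than of instruction count. The function and parameter names are illustrative, not from any library.

```python
# Amdahl's Law sketch; names are illustrative, not from any library.

def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    """Overall speedup = 1 / ((1 - F) + F / S), with F a fraction of execution time."""
    if not 0.0 <= fraction_enhanced <= 1.0:
        raise ValueError("fraction_enhanced must lie in [0, 1]")
    if speedup_enhanced <= 0:
        raise ValueError("speedup_enhanced must be positive")
    # ExTime_new / ExTime_old
    new_time_ratio = (1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced
    return 1.0 / new_time_ratio


# FP example: 10% of the execution time runs 2x faster
print(f"FP 2x on 10%: {amdahl_speedup(0.10, 2):.3f}")  # FP 2x on 10%: 1.053

# Diminishing returns as the FP speedup grows
for s in (2, 10, 100, 1_000_000):
    print(f"S = {s:>9}: overall speedup {amdahl_speedup(0.10, s):.4f}")
```

The loop shows the ceiling the law implies: with only 10% of the time enhanced, the overall speedup stays below 1 / 0.9 ≈ 1.11 however large the enhancement becomes.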