
B.L.D.E.A's V.P. Dr. P.G. Halakatti College of Engineering & Technology, Bijapur

Department of Computer Science & Engineering


Title of the Course: Advanced Computer Architecture
Course Code: 10CS81
Type of the Course: Lecture
Designation: Core
Total Hrs.: 52
Hrs/Week: 04
Exam Hours: 03
Exam Marks: 100
Semester: 08
Course Assessment Methods: Continuous (Three IA Tests & One Main VTU Examination)
Prerequisites:
1. Familiarity with computer organization
2. Basic concepts of cache memory and microprocessors

Syllabus:

PART - A

UNIT - 1
FUNDAMENTALS OF COMPUTER DESIGN: Introduction; Classes of computers; Defining computer architecture; Trends in Technology, power in Integrated Circuits and cost; Dependability; Measuring, reporting and summarizing Performance; Quantitative Principles of computer design. 6 Hours

UNIT - 2
PIPELINING: Introduction; Pipeline hazards; Implementation of pipeline; What makes pipelining hard to implement? 6 Hours

UNIT - 3
INSTRUCTION LEVEL PARALLELISM 1: ILP: Concepts and challenges; Basic Compiler Techniques for exposing ILP; Reducing Branch costs with prediction; Overcoming Data hazards with Dynamic scheduling; Hardware-based speculation. 7 Hours

UNIT - 4
INSTRUCTION LEVEL PARALLELISM 2: Exploiting ILP using multiple issue and static scheduling; Exploiting ILP using dynamic scheduling, multiple issue and speculation; Advanced Techniques for instruction delivery and Speculation; The Intel Pentium 4 as example. 7 Hours


PART - B

UNIT - 5
MULTIPROCESSORS AND THREAD LEVEL PARALLELISM: Introduction; Symmetric shared-memory architectures; Performance of symmetric shared-memory multiprocessors; Distributed shared memory and directory-based coherence; Basics of synchronization; Models of Memory Consistency. 7 Hours

UNIT - 6
REVIEW OF MEMORY HIERARCHY: Introduction; Cache performance; Cache Optimizations; Virtual memory. 6 Hours

UNIT - 7
MEMORY HIERARCHY DESIGN: Introduction; Advanced optimizations of Cache performance; Memory technology and optimizations; Protection: Virtual memory and virtual machines. 6 Hours

UNIT - 8
HARDWARE AND SOFTWARE FOR VLIW AND EPIC: Introduction: Exploiting Instruction-Level Parallelism Statically; Detecting and Enhancing Loop-Level Parallelism; Scheduling and Structuring Code for Parallelism; Hardware Support for Exposing Parallelism: Predicated Instructions; Hardware Support for Compiler Speculation; The Intel IA-64 Architecture and Itanium Processor; Conclusions. 7 Hours

TEXT BOOK:
1. John L. Hennessy and David A. Patterson: Computer Architecture: A Quantitative Approach, 4th Edition, Elsevier, 2007.

REFERENCE BOOKS:
1. Kai Hwang: Advanced Computer Architecture: Parallelism, Scalability, Programmability, Tata McGraw-Hill, 2003.



2. David E. Culler, Jaswinder Pal Singh, Anoop Gupta: Parallel Computer Architecture: A Hardware/Software Approach, Morgan Kaufmann, 1999.

Course Overview and its relevance to the program: The term "architecture" in the computer literature can be traced to the work of Lyle R. Johnson, Muhammad Usman Khan and Frederick P. Brooks, Jr., members in 1959 of the Machine Organization department in IBM's main research center. Johnson had the opportunity to write a proprietary research communication about Stretch, an IBM-developed supercomputer for Los Alamos Scientific Laboratory. In computer science and computer engineering, computer architecture or digital computer organization is the conceptual design and fundamental operational structure of a computer system. It is a blueprint and functional description of the requirements and design implementations for the various parts of a computer, focusing largely on the way the central processing unit (CPU) operates internally and accesses addresses in memory. It may also be defined as the science and art of selecting and interconnecting hardware components to create computers that meet functional, performance and cost goals.

Computer technology has made incredible progress over roughly the last 55 years. This rapid rate of improvement has come both from advances in the technology used to build computers and from innovation in computer design. Advanced computer architecture aims to develop a thorough understanding of high-performance and energy-efficient computers as a basis for informed software performance engineering and as a foundation for advanced work in computer architecture, compiler design, operating systems and parallel processing.

This course covers pipelined CPU architecture (instruction set design and pipeline structure), dynamic scheduling using scoreboarding and Tomasulo's algorithm, register renaming, software instruction scheduling and software pipelining, superscalar and long-instruction-word architectures (VLIW, EPIC and Itanium), and branch prediction and speculative execution. Cache memory associativity, allocation and replacement policies, multilevel caches, cache performance issues and uniprocessor cache coherency issues are discussed with examples. Implementations of shared memory, the cache coherency problem, the bus-based snooping protocol, and scalable shared memory using directory-based cache coherency are explained with practical examples.

Applications:
1. To understand the various computer architectures currently used in the market
2. To understand parallel programming
3. To design new computer architectures



PART - A

UNIT I
UNIT WISE PLAN
Chapter Number: 1
No. of Hours: 06
Unit Title: FUNDAMENTALS OF COMPUTER DESIGN

Learning Objectives: At the end of this unit students will understand:
1. Classes of computers
2. Practical knowledge of computer architecture
3. Trends in Technology, power in ICs and cost
4. Quantitative Principles and Performance
5. Real processor examples

Lesson Plan:
L1. Introduction; Classes of computers
L2. Defining computer architecture
L3. Trends in Technology, power in Integrated Circuits and cost
L4. Dependability
L5. Measuring, reporting and summarizing Performance
L6. Quantitative Principles of computer design



Assignment Questions:
Q1) Explain the growth in processor and computer performance using a graph.
Q2) Explain the different classes of computers.
Q3) Define computer architecture. Discuss the seven dimensions of an ISA.
Q4) Explain the meaning of the following MIPS instructions and explain the instruction formats.
Q5) List the most important functional requirements an architect faces.
Q6) Explain the different trends in technology.
Q7) Write the formulas for the following: (i) dynamic power (ii) dynamic energy (iii) static power. A 20% reduction in voltage may result in a 10% reduction in frequency. What would be the impact on dynamic power? (A worked sketch follows this list.)
Q8) Write the formulas for the following: (i) cost of IC (ii) cost of die (iii) dies per wafer (iv) die yield. Find the die yield for a die that is 2.0 cm on a side, assuming a defect density of 0.3 per cm² and α of 4.
Q9) Explain MTTF and MTTR. Calculate the reliability of a redundant power supply if the MTTF of a power supply is 5 × 10⁵ hours and it takes on average 48 hours for a human operator to repair the system. Assume two power supplies are available.
Q10) Explain the different desktop and server benchmarks.
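The numerical parts of Q7, Q8 and Q9 can be checked with a short calculation. The sketch below is illustrative only; it assumes the formulas used in the prescribed text (dynamic power proportional to C·V²·f, die yield = wafer yield × (1 + defects per unit area × die area / α)^(−α) with α = 4, and the redundant-pair MTTF approximation), and the variable names are mine, not the textbook's.

```python
# Worked sketches for Q7-Q9 (illustrative; formulas follow the prescribed text).

# Q7: dynamic power scales as C * V^2 * f.
voltage_ratio = 0.8      # 20% reduction in voltage
frequency_ratio = 0.9    # 10% reduction in frequency
power_ratio = voltage_ratio ** 2 * frequency_ratio
print(f"Q7: new dynamic power = {power_ratio:.3f} of the original "
      f"(about a {100 * (1 - power_ratio):.0f}% reduction)")

# Q8: die yield = wafer_yield * (1 + defect_density * die_area / alpha) ** (-alpha)
wafer_yield = 1.0        # assume a perfect wafer yield
defect_density = 0.3     # defects per cm^2
die_area = 2.0 * 2.0     # die is 2.0 cm on a side -> 4 cm^2
alpha = 4.0
die_yield = wafer_yield * (1 + defect_density * die_area / alpha) ** (-alpha)
print(f"Q8: die yield = {die_yield:.2f}")

# Q9: redundant pair, using the approximation MTTF_pair ~= MTTF^2 / (2 * MTTR),
# which assumes independent failures and that the pair fails only if the second
# supply fails while the first is being repaired.
mttf = 5e5               # hours, per power supply
mttr = 48.0              # hours for a human operator to repair
mttf_pair = mttf ** 2 / (2 * mttr)
print(f"Q9: redundant-pair MTTF ~= {mttf_pair:.3g} hours")
```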



UNIT II
UNIT WISE PLAN
Chapter Number: Appendix A
No. of Hours: 06
Unit Title: PIPELINING

Learning Objectives: At the end of this unit students will understand:
1. Pipeline basics and hazards
2. Implementation of pipeline
3. Use of pipelining to design parallel processors
4. Performance evaluation of pipelined processors
5. Applications of pipelining

Lesson Plan:
L1. Introduction
L2. Pipeline hazards
L3. Pipeline hazards (continued)
L4. Implementation of pipeline
L5. Implementation of pipeline (continued)
L6. What makes pipelining hard to implement?



Assignment Questions:
Q1) What is pipelining? Explain the basics of the RISC instruction set.
Q2) Explain the simple implementation of a RISC instruction set.
Q3) Explain the classic five-stage pipeline for a RISC processor and explain the use of pipeline registers.
Q4) Assume that an unpipelined processor has a 1 ns clock cycle and that it uses 4 cycles for ALU operations and branches and 5 cycles for memory operations. Assume that the relative frequencies of these operations are 30%, 20% and 50% respectively. Suppose that due to clock skew and setup, pipelining the processor adds 0.3 ns of overhead to the clock. Ignoring any latency impact, how much speedup in the instruction execution rate will we gain from a pipeline? (A worked sketch follows this list.)
Q5) Explain the major hurdles of pipelining (pipeline hazards) in brief.
Q6) Explain in detail the data hazard with an example.
Q7) Discuss branch hazards along with reducing pipeline branch penalties and scheduling the branch delay slot.
Q8) Explain the simple implementation of MIPS with a neat diagram.
Q9) Explain the basic pipeline for MIPS and discuss the implementation of control for MIPS and branches.
Q10) Explain the five categories of exceptions.
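A minimal numeric check for Q4, assuming the usual speedup model from the text: the unpipelined average instruction time is the frequency-weighted cycle count times the clock cycle, while the pipelined machine ideally completes one instruction per lengthened clock. The numbers below simply restate the question; the variable names are mine.

```python
# Q4: pipeline speedup sketch (illustrative).
clock = 1.0                      # ns, unpipelined clock cycle
overhead = 0.3                   # ns added by clock skew and setup

# (fraction of instructions, cycles per instruction) for ALU, branch, memory ops
mix = [(0.30, 4), (0.20, 4), (0.50, 5)]

# Unpipelined: average instruction time = sum(freq * cycles) * clock
unpipelined_time = sum(freq * cycles for freq, cycles in mix) * clock

# Pipelined (ideal, ignoring stalls): one instruction per (clock + overhead)
pipelined_time = clock + overhead

speedup = unpipelined_time / pipelined_time
print(f"average unpipelined instruction time = {unpipelined_time:.2f} ns")
print(f"pipelined instruction time           = {pipelined_time:.2f} ns")
print(f"speedup ~= {speedup:.2f}x")
```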


UNIT III
UNIT WISE PLAN
Chapter Number: 2
No. of Hours: 07
Unit Title: INSTRUCTION LEVEL PARALLELISM 1

Learning Objectives: At the end of this unit students will understand:
1. Parallel processing using ILP
2. Static and dynamic scheduling
3. Speculation
4. Implementation of scheduling algorithms
5. Techniques for reducing branch costs

Lesson Plan:
L1. ILP: Concepts and challenges
L2. Basic Compiler Techniques for exposing ILP
L3. Reducing Branch costs with prediction
L4. Reducing Branch costs with prediction (examples)
L5. Overcoming Data hazards with Dynamic scheduling
L6. Overcoming Data hazards with Dynamic scheduling (examples)
L7. Hardware-based speculation



Assignment Questions:
Q1) What is ILP? What are the ILP concepts and challenges?
Q2) Discuss data dependences and hazards.
Q3) Discuss control dependences with examples.
Q4) Explain the basic compiler techniques for exposing ILP, with examples.
Q5) Explain the methods for reducing branch costs with prediction. (A small predictor sketch follows this list.)
Q6) Explain the method for overcoming data hazards with dynamic scheduling.
Q7) Explain the various fields in a reservation station with an example.
Q8) Explain Tomasulo's algorithm using a loop-based example.
Q9) Explain hardware-based speculation and explain the basic structure of an FP unit using Tomasulo's algorithm extended to handle speculation.
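For Q5, branch-cost reduction with prediction is usually introduced through the 2-bit saturating-counter predictor covered in the text. The sketch below is a deliberately simplified, illustrative model (one counter per branch PC, no branch-target buffer); the class and method names are my own.

```python
# Illustrative 2-bit saturating-counter branch predictor (one counter per branch PC).
class TwoBitPredictor:
    def __init__(self):
        # Counter states: 0,1 -> predict not taken; 2,3 -> predict taken.
        self.counters = {}          # branch PC -> 2-bit counter

    def predict(self, pc):
        return self.counters.get(pc, 1) >= 2   # default: weakly not taken

    def update(self, pc, taken):
        c = self.counters.get(pc, 1)
        # Saturate at 0 and 3 so one atypical outcome does not flip the prediction.
        c = min(c + 1, 3) if taken else max(c - 1, 0)
        self.counters[pc] = c

# Example: a loop branch taken 9 times, then not taken once, repeated 3 times.
predictor = TwoBitPredictor()
outcomes = ([True] * 9 + [False]) * 3
correct = 0
for taken in outcomes:
    if predictor.predict(0x400) == taken:
        correct += 1
    predictor.update(0x400, taken)
print(f"accuracy = {correct}/{len(outcomes)}")
```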


UNIT IV
UNIT WISE PLAN
Chapter Number: 2
No. of Hours: 07
Unit Title: INSTRUCTION LEVEL PARALLELISM 2

Learning Objectives: At the end of this unit students will understand:
1. ILP: multiple issue and static scheduling
2. Dynamic scheduling
3. Instruction delivery
4. Exploiting ILP
5. The Intel Pentium 4 for understanding ILP

Lesson Plan:
L1. Exploiting ILP using multiple issue and static scheduling
L2. Exploiting ILP using dynamic scheduling, multiple issue and speculation
L3. Exploiting ILP using dynamic scheduling, multiple issue and speculation (examples)
L4. Advanced Techniques for instruction delivery and Speculation
L5. Advanced Techniques for instruction delivery and Speculation (examples)
L6. The Intel Pentium 4 as example
L7. The Intel Pentium 4 as example (analysis)



Assignment Questions:
Q1) List the five primary approaches in use for multiple-issue processors and their primary characteristics.
Q2) Explain the basic VLIW approach for exploiting ILP using an example.
Q3) Explain exploiting ILP using dynamic scheduling, multiple issue and speculation, with examples.
Q4) Explain increasing instruction fetch bandwidth for instruction delivery and speculation. (A branch-target-buffer sketch follows this list.)
Q5) Explain the Pentium 4 microarchitecture with a neat diagram.
Q6) List the important characteristics of the recent Pentium 4 640.
Q7) Explain the analysis of the performance of the Pentium 4.
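Q4 concerns increasing instruction fetch bandwidth; one of the techniques in the text is the branch-target buffer (BTB), which lets the fetch stage redirect to a predicted target in the same cycle the branch is fetched. Below is a deliberately simplified, illustrative model (a plain dictionary keyed on the branch PC, with no tags, ways or hysteresis bits); the names are my own.

```python
# Illustrative branch-target buffer: maps a branch PC to its predicted target.
class BranchTargetBuffer:
    def __init__(self):
        self.entries = {}                 # branch PC -> predicted target PC

    def lookup(self, pc):
        """Return the predicted target if the PC hits in the BTB, else None."""
        return self.entries.get(pc)

    def update(self, pc, taken, target):
        if taken:
            self.entries[pc] = target     # allocate/refresh on a taken branch
        else:
            self.entries.pop(pc, None)    # drop the entry once the branch falls through

btb = BranchTargetBuffer()

def next_fetch_pc(pc):
    """Fetch-stage decision: predicted target on a BTB hit, else sequential PC + 4."""
    predicted = btb.lookup(pc)
    return predicted if predicted is not None else pc + 4

# A loop branch at 0x1010 jumping back to 0x1000:
print(hex(next_fetch_pc(0x1010)))         # miss -> sequential 0x1014
btb.update(0x1010, taken=True, target=0x1000)
print(hex(next_fetch_pc(0x1010)))         # hit  -> predicted 0x1000
```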



PART - B

UNIT V
UNIT WISE PLAN
Chapter Number: 4
No. of Hours: 07
Unit Title: MULTIPROCESSORS AND THREAD LEVEL PARALLELISM

Learning Objectives: At the end of this unit students will understand:
1. Multiprocessors
2. Shared-memory architectures
3. Distributed shared memory
4. Performance of symmetric shared-memory multiprocessors
5. Synchronization and memory consistency

Lesson Plan:
L1. Introduction to multiprocessors
L2. Symmetric shared-memory architectures
L3. Performance of symmetric shared-memory multiprocessors
L4. Distributed shared memory
L5. Directory-based coherence
L6. Basics of synchronization
L7. Models of Memory Consistency


Assignment Questions:
Q1) Explain the taxonomy of parallel architectures and draw the basic structure of shared-memory and distributed-memory multiprocessors.
Q2) Suppose you want to achieve a speedup of 80 with 100 processors. What fraction of the original computation can be sequential? (A worked sketch follows this list.)
Q3) What is multiprocessor cache coherence? Explain with an example.
Q4) What are the basic schemes for enforcing coherence? Explain in brief.
Q5) Explain snooping protocols and basic implementation techniques with an example protocol.
Q6) Explain the performance of symmetric shared-memory multiprocessors for a commercial workload.
Q7) Explain distributed shared memory and directory-based coherence with an example protocol.
Q8) Explain the basics of synchronization.
Q9) Explain models of memory consistency.
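Q2 is the Amdahl's Law exercise from the text: with speedup = 1 / ((1 - f) + f / n), where f is the parallel fraction and n the processor count, setting speedup = 80 and n = 100 and solving for f gives the allowable sequential fraction. The sketch below only rearranges that formula; the variable names are mine.

```python
# Q2: Amdahl's Law, solved for the parallel fraction.
# speedup = 1 / ((1 - f) + f / n)  =>  f = (1 - 1/speedup) / (1 - 1/n)
target_speedup = 80
processors = 100

f_parallel = (1 - 1 / target_speedup) / (1 - 1 / processors)
f_sequential = 1 - f_parallel

print(f"parallel fraction   = {f_parallel:.4%}")
print(f"sequential fraction = {f_sequential:.4%}")   # about 0.25%
```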



UNIT VI
UNIT WISE PLAN
Chapter Number: Appendix C
No. of Hours: 06
Unit Title: REVIEW OF MEMORY HIERARCHY

Learning Objectives: At the end of this unit students will understand:
1. Cache memory
2. Virtual memory
3. Mathematical and theoretical aspects of caches
4. Problems based on caches
5. Cache optimization methods

Lesson Plan:
L1. Introduction
L2. Cache performance
L3. Cache Optimizations
L4. Virtual memory
L5. Numerical Problems 1
L6. Numerical Problems 2

Assignment Questions:


Q1) Assume we have a computer where the CPI is 2.0 when all memory accesses hit in the cache. The only data accesses are loads and stores, and these total 40% of the instructions. If the miss penalty is 35 clock cycles and the miss rate is 3%, how much faster would the computer be if all instructions were cache hits? (A worked sketch follows this list.)
Q2) What do you mean by memory stall cycles? List the different formulae for memory stall cycles.
Q3) Explain the different block placement methods with neat diagrams.
Q4) Explain the following terms: (i) write through (ii) write back (iii) write stall and write buffer (iv) write allocate (v) no-write allocate.
Q5) Explain the organization of the Opteron data cache with a neat diagram.
Q6) Explain multilevel caches to reduce miss penalty. Discuss average memory access time, local miss rate and global miss rate with respect to multilevel caches.
Q7) Suppose that in 1000 memory references there are 50 misses in the first-level cache and 30 misses in the second-level cache. What are the various miss rates? Assume the miss penalty from the L2 cache to memory is 250 clock cycles, the hit time of the L2 cache is 15 clock cycles, the hit time of L1 is 1 clock cycle, and there are 1.4 memory references per instruction. What is the average memory access time and the average stall cycles per instruction?
Q8) Compare paging and segmentation with neat diagrams.
Q9) List the typical levels in the memory hierarchy with their important features.
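Q1 and Q7 are standard numerical exercises on memory stalls and multilevel-cache AMAT. The sketch below applies the usual formulas from the text (CPU time with memory stalls, local and global miss rates, AMAT = hit time + miss rate × miss penalty); it assumes each instruction makes one instruction fetch plus the stated data accesses, and the intermediate variable names are mine.

```python
# Q1: slowdown from cache misses.
cpi_ideal = 2.0
mem_accesses_per_instr = 1 + 0.40      # 1 instruction fetch + 40% loads/stores
miss_rate = 0.03
miss_penalty = 35                      # clock cycles

stall_cycles_per_instr = mem_accesses_per_instr * miss_rate * miss_penalty
cpi_real = cpi_ideal + stall_cycles_per_instr
print(f"Q1: CPI with misses = {cpi_real:.2f}, "
      f"all-hit machine is {cpi_real / cpi_ideal:.2f}x faster")

# Q7: two-level cache miss rates, AMAT and stalls per instruction.
refs = 1000
l1_misses, l2_misses = 50, 30
refs_per_instr = 1.4
l1_hit, l2_hit, mem_penalty = 1, 15, 250

l1_miss_rate = l1_misses / refs                 # L1 local = global miss rate
l2_local_miss_rate = l2_misses / l1_misses      # L2 misses per L1 miss
l2_global_miss_rate = l2_misses / refs

amat = l1_hit + l1_miss_rate * (l2_hit + l2_local_miss_rate * mem_penalty)
stalls_per_instr = refs_per_instr * (amat - l1_hit)
print(f"Q7: L1 miss rate = {l1_miss_rate:.0%}, L2 local = {l2_local_miss_rate:.0%}, "
      f"L2 global = {l2_global_miss_rate:.0%}")
print(f"Q7: AMAT = {amat:.2f} cycles, stalls per instruction = {stalls_per_instr:.2f}")
```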



UNIT VII
UNIT WISE PLAN
Chapter Number:
No. of Hours: 06
Unit Title: MEMORY HIERARCHY DESIGN

Learning Objectives: At the end of this unit students will understand:
1. Memory hierarchy and cache optimization
2. Memory technology
3. Virtual machines
4. Cache performance
5. Protection using virtual memory and virtual machines

Lesson Plan:
L1. Introduction to memory hierarchy design
L2. Advanced optimizations of Cache performance
L3. Memory technology and optimizations
L4. Protection: Virtual memory
L5. Virtual machines
L6. Numerical problems

Assignment Questions:


Q1) Explain the following optimization techniques, which reduce hit time: (i) small and simple caches (ii) way prediction (iii) trace caches.
Q2) Explain the compiler optimization techniques used to reduce miss rate. (A loop-interchange sketch follows this list.)
Q3) Differentiate between SRAM and DRAM. Draw the internal organization of a 64-Mbit DRAM.
Q4) List the eleven advanced optimizations of cache performance and explain any one.
Q5) Explain the optimization techniques for increasing cache bandwidth.
Q6) Explain memory technology and optimizations.
Q7) Explain the optimization techniques for reducing miss penalty.
Q8) Explain protection via virtual memory.
Q9) Explain protection via virtual machines.
Q10) Explain the Xen virtual machine.
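For Q2, one of the compiler optimizations described in the text is loop interchange: reordering nested loops so that an array stored in row-major order is walked sequentially, improving spatial locality and lowering the miss rate. The sketch below only illustrates the shape of the transformation (the array size and names are arbitrary examples of mine); it does not measure cache behaviour.

```python
# Loop interchange: the same computation with the nesting order swapped.
# (In a language with contiguous row-major arrays, such as C, the second
# version touches memory sequentially and so has far better spatial locality;
# plain Python lists only illustrate the structure of the transformation.)
N, M = 100, 5000
x = [[1.0] * M for _ in range(N)]

# Before interchange: the inner loop jumps from row to row for a fixed column.
for j in range(M):
    for i in range(N):
        x[i][j] = 2 * x[i][j]

# After interchange: the inner loop walks along one row at a time.
for i in range(N):
    for j in range(M):
        x[i][j] = 2 * x[i][j]
```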



UNIT VIII
UNIT WISE PLAN
Chapter Number: Appendix G
No. of Hours: 07
Unit Title: HARDWARE AND SOFTWARE FOR VLIW AND EPIC

Learning Objectives: At the end of this unit students will understand:
1. VLIW
2. EPIC
3. The Intel IA-64 Architecture and Itanium Processor
4. Loop-Level Parallelism and Code for Parallelism
5. Hardware Support for Parallelism

Lesson Plan:
L1. Introduction: Exploiting Instruction-Level Parallelism Statically
L2. Detecting and Enhancing Loop-Level Parallelism
L3. Scheduling and Structuring Code for Parallelism
L4. Hardware Support for Exposing Parallelism: Predicated Instructions
L5. Hardware Support for Compiler Speculation
L6. The Intel IA-64 Architecture
L7. Itanium Processor; Conclusions



Assignment Questions:
Q1) Explain the methods, advantages and disadvantages of exploiting instruction-level parallelism statically.
Q2) Explain the methods for detecting and enhancing loop-level parallelism. (A GCD-test sketch follows this list.)
Q3) Explain software pipelining using symbolic loop unrolling.
Q4) Explain global code scheduling.
Q5) Explain hardware support for exposing parallelism using predicated instructions in detail.
Q6) Explain hardware support for compiler speculation.
Q7) Explain superblocks using a flowchart.
Q8) Explain the IA-64 instruction set architecture.
Q9) Explain the Itanium 2 processor in detail.
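Q2's material on detecting loop-level parallelism includes the GCD test: for a loop that stores to X[a*i + b] and loads from X[c*i + d], a loop-carried dependence can exist only if gcd(c, a) divides (d - b). The sketch below states that test; the second pair of coefficients in the example is an arbitrary illustration of mine.

```python
from math import gcd

def gcd_test(a, b, c, d):
    """GCD test for a possible loop-carried dependence between
    X[a*i + b] (store) and X[c*i + d] (load) in the same loop.
    Returns False only when a dependence is provably impossible;
    True means a dependence *may* exist (the test is conservative)."""
    return (d - b) % gcd(c, a) == 0

# Example from the text: x[2*i + 3] = x[2*i] * 5.0  ->  a=2, b=3, c=2, d=0
print(gcd_test(2, 3, 2, 0))   # False: gcd(2,2)=2 does not divide -3, no dependence
# Illustrative case: X[4*i] = ...; ... = X[6*i + 2] ->  a=4, b=0, c=6, d=2
print(gcd_test(4, 0, 6, 2))   # True: gcd(6,4)=2 divides 2, a dependence may exist
```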
