PART - B

UNIT - 5 MULTIPROCESSORS AND THREAD LEVEL PARALLELISM: Introduction; Symmetric shared-memory architectures; Performance of symmetric shared-memory multiprocessors; Distributed shared memory and directory-based coherence; Basics of synchronization; Models of Memory Consistency. 7 Hours

UNIT - 6 REVIEW OF MEMORY HIERARCHY: Introduction; Cache performance; Cache Optimizations; Virtual memory. 6 Hours

UNIT - 7 MEMORY HIERARCHY DESIGN: Introduction; Advanced optimizations of Cache performance; Memory technology and optimizations; Protection: Virtual memory and virtual machines. 6 Hours

UNIT - 8 HARDWARE AND SOFTWARE FOR VLIW AND EPIC: Introduction: Exploiting Instruction-Level Parallelism Statically; Detecting and Enhancing Loop-Level Parallelism; Scheduling and Structuring Code for Parallelism; Hardware Support for Exposing Parallelism: Predicated Instructions; Hardware Support for Compiler Speculation; The Intel IA-64 Architecture and Itanium Processor; Conclusions. 7 Hours
TEXT BOOK:
1. Computer Architecture: A Quantitative Approach, John L. Hennessy and David A. Patterson, 4th Edition, Elsevier, 2007.

REFERENCE BOOKS:
1. Advanced Computer Architecture: Parallelism, Scalability, Programmability, Kai Hwang, Tata McGraw-Hill, 2003.
Course Overview and its relevance to the program:

The term "architecture" in the computer literature can be traced to the work of Lyle R. Johnson, Muhammad Usman Khan and Frederick P. Brooks, Jr., members in 1959 of the Machine Organization department in IBM's main research center. Johnson had the opportunity to write a proprietary research communication about Stretch, an IBM-developed supercomputer for Los Alamos Scientific Laboratory. In computer science and computer engineering, computer architecture (or digital computer organization) is the conceptual design and fundamental operational structure of a computer system. It is a blueprint and functional description of requirements and design implementations for the various parts of a computer, focusing largely on the way the central processing unit (CPU) performs internally and accesses addresses in memory. It may also be defined as the science and art of selecting and interconnecting hardware components to create computers that meet functional, performance and cost goals.

Computer technology has made incredible progress over roughly the last 55 years. This rapid rate of improvement has come both from advances in the technology used to build computers and from innovation in computer design. Advanced computer architecture aims to develop a thorough understanding of high-performance and energy-efficient computers as a basis for informed software performance engineering and as a foundation for advanced work in computer architecture, compiler design, operating systems and parallel processing. This course covers pipelined CPU architecture (instruction set design and pipeline structure), dynamic scheduling using scoreboarding and Tomasulo's algorithm, register renaming, software instruction scheduling and software pipelining, superscalar and long-instruction-word architectures (VLIW, EPIC and Itanium), and branch prediction and speculative execution.
Cache memory associativity, allocation and replacement policies, multilevel caches, cache performance issues, and uniprocessor cache coherency issues are discussed with examples. Implementations of shared memory, the cache coherency problem, the bus-based "snooping" protocol, and scalable shared memory using directory-based cache coherency are explained with practical examples.

Applications:
1. To understand various computer architectures currently used in the market
2. To understand parallel programming
3. To design new computer architectures
Lesson Plan:
L1. Introduction; Classes of computers
L2. Defining computer architecture
L3. Trends in Technology, power in Integrated Circuits and cost
L4. Dependability
L5. Measuring, reporting and summarizing Performance
L6. Quantitative Principles of computer design
Learning Objectives: At the end of this unit students will understand:
1. Pipeline basics, hazards
2. Implementation of pipeline
3. Pipeline to design parallel processors
4. Performance evaluation of pipeline processors
5. Applications of pipeline
Lesson Plan:
L1. Introduction
L2. Pipeline hazards
L3. Pipeline hazards continued
L4. Implementation of pipeline
L5. Implementation of pipeline continued
L6. What makes pipelining hard to implement?
UNIT III
UNIT WISE PLAN
Chapter Number: 2    No of Hours: 07
Unit Title: INSTRUCTION LEVEL PARALLELISM 1
Learning Objectives: At the end of this unit students will understand:
1. Parallel processing using ILP
2. Static and Dynamic scheduling
3. Speculation
4. Implementation of scheduling algorithms
5. Implementation of reducing branch costs
Lesson Plan:
L1. ILP: Concepts and challenges
L2. Basic Compiler Techniques for exposing ILP
L3. Reducing Branch costs with prediction
L4. Reducing Branch costs with prediction - Examples
L5. Overcoming Data hazards with Dynamic scheduling
L6. Overcoming Data hazards with Dynamic scheduling - Examples
L7. Hardware-based speculation
UNIT IV
UNIT WISE PLAN
Chapter Number: 2    No of Hours: 07
Unit Title: INSTRUCTION LEVEL PARALLELISM 2
Learning Objectives: At the end of this unit students will understand:
1. ILP - multiple issue and static scheduling
2. Dynamic scheduling
3. Instruction delivery
4. Exploiting ILP
5. Intel Pentium 4 for understanding ILP
Lesson Plan:
L1. Exploiting ILP using multiple issue and static scheduling
L2. Exploiting ILP using dynamic scheduling, multiple issue and speculation
L3. Exploiting ILP using dynamic scheduling, multiple issue and speculation - examples
L4. Advanced Techniques for instruction delivery and Speculation
L5. Advanced Techniques for instruction delivery and Speculation - examples
L6. The Intel Pentium 4 as example
L7. The Intel Pentium 4 as example - analysis
Learning Objectives: At the end of this unit students will understand:
1. Multiprocessors
2. Shared-memory architectures
3. Distributed shared memory
4. Performance of symmetric shared-memory multiprocessors
5. Synchronization and Memory Consistency
Lesson Plan:
L1. Introduction to multiprocessors
L2. Symmetric shared-memory architectures
L3. Performance of symmetric shared-memory multiprocessors
L4. Distributed shared memory
L5. Directory-based coherence
L6. Basics of synchronization
L7. Models of Memory Consistency
Assignment Questions:
Q1) Explain the taxonomy of parallel architectures and draw the basic structure of shared-memory and distributed-memory multiprocessors.
Q2) Suppose you want to achieve a speedup of 80 with 100 processors. What fraction of the original computation can be sequential?
Q3) What is multiprocessor cache coherence? Explain with an example.
Q4) What are the basic schemes for enforcing coherence? Explain in brief.
Q5) Explain snooping protocols and basic implementation techniques with an example protocol.
Q6) Explain the performance of symmetric shared-memory multiprocessors for a commercial workload.
Q7) Explain distributed shared memory and directory-based coherence with an example protocol.
Q8) Explain the basics of synchronization.
Q9) Explain models of memory consistency.
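Q2 above is a direct application of Amdahl's law. A minimal worked sketch of the arithmetic (variable names are illustrative, not from the text):

```python
# Amdahl's law: speedup = 1 / (f_seq + (1 - f_seq) / processors),
# where f_seq is the fraction of the computation that stays sequential.
# Q2 asks for the largest f_seq that still permits a speedup of 80
# on 100 processors, so we solve the equation for f_seq.

processors = 100
target_speedup = 80

f_seq = (1 / target_speedup - 1 / processors) / (1 - 1 / processors)
print(f"Maximum sequential fraction: {f_seq:.4%}")  # about 0.25%
```

Only about 0.25% of the original computation may be sequential, which illustrates why near-linear speedup on large processor counts is so hard to achieve.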
Learning Objectives: At the end of this unit students will understand:
1. Cache memory
2. Virtual memory
3. Mathematical and theoretical aspects of cache
4. Problems based on cache
5. Cache Optimization methods
Lesson Plan:
L1. Introduction
L2. Cache performance
L3. Cache Optimizations
L4. Virtual memory
L5. Numerical Problems-1
L6. Numerical Problems-2
Assignment Questions:
Q1) Assume we have a computer where the CPI is 2.0 when all memory accesses hit in the cache. The only data accesses are loads and stores, and these total 40% of the instructions. If the miss penalty is 35 clock cycles and the miss rate is 3%, how much faster would the computer be if all instructions were cache hits?
Q2) What do you mean by memory stall cycles? List the different formulae for memory stall cycles.
Q3) Explain different block placement methods with neat diagrams.
Q4) Explain the following terms: (i) Write through (ii) Write back (iii) Write stall and write buffer (iv) Write allocate (v) No-write allocate
Q5) Explain the organization of the Opteron data cache with a neat diagram.
Q6) Explain multilevel caches to reduce miss penalty. Discuss average memory access time, local miss rate and global miss rate w.r.t. multilevel caches.
Q7) Suppose that in 1000 memory references there are 50 misses in the first-level cache and 30 misses in the second-level cache. What are the various miss rates? Assume the miss penalty from the L2 cache to memory is 250 clock cycles, the hit time of the L2 cache is 15 clock cycles, the hit time of L1 is 1 clock cycle, and there are 1.4 memory references per instruction. What is the average memory access time and the average stall cycles per instruction?
Q8) Compare paging and segmentation with neat diagrams.
Q9) List the typical levels in a memory hierarchy with their important features.
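Q1 and Q7 are standard cache-performance calculations. A hedged sketch of the arithmetic, assuming (as Hennessy & Patterson usually do) that every instruction fetch is also a memory access:

```python
# Q1: ideal CPI 2.0; loads/stores are 40% of instructions, so each
# instruction makes 1 fetch + 0.4 data accesses = 1.4 memory accesses.
accesses_per_instr = 1.0 + 0.40
stall_cpi = accesses_per_instr * 0.03 * 35   # miss rate 3%, penalty 35 cycles
cpi_real = 2.0 + stall_cpi                   # 2.0 + 1.47 = 3.47
speedup = cpi_real / 2.0                     # all-hit machine is ~1.74x faster

# Q7: per 1000 references, 50 L1 misses and 30 L2 misses.
l1_miss_rate = 50 / 1000     # 5% (local = global for L1)
l2_local_rate = 30 / 50      # 60% of L1 misses also miss in L2
l2_global_rate = 30 / 1000   # 3% of all references go to memory

# AMAT = L1 hit time + L1 miss rate * (L2 hit time + L2 local rate * penalty)
amat = 1 + l1_miss_rate * (15 + l2_local_rate * 250)                 # 9.25 cycles
# Average stall cycles per instruction, with 1.4 references/instruction
stalls_per_instr = 1.4 * (l1_miss_rate * 15 + l2_global_rate * 250)  # 11.55
```

Note the distinction the sketch makes explicit: the L2 local miss rate (60%) is measured against L2 accesses, while the global rate (3%) is measured against all CPU references; the AMAT formula uses the local rate, the per-instruction stall formula the global one.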
Learning Objectives: At the end of this unit students will understand:
1. Memory hierarchy and cache optimization
2. Memory technology
3. Virtual machines
4. Cache performance
5. Protection using virtual memory and virtual machines
Lesson Plan:
L1. Introduction to memory hierarchy design
L2. Advanced optimizations of Cache performance
L3. Memory technology and optimizations
L4. Protection: Virtual memory
L5. Virtual machines
L6. Numerical problems
Assignment Questions:
Q1) Explain the following optimization techniques which reduce hit time: (i) Small and simple caches (ii) Way prediction (iii) Trace caches
Q2) Explain the compiler optimization techniques to reduce miss rate.
Q3) Differentiate between SRAM and DRAM. Draw the internal organization of a 64M-bit DRAM.
Q4) List the eleven advanced optimizations of cache performance and explain any one.
Q5) Explain optimization techniques for increasing cache bandwidth.
Q6) Explain memory technology and optimizations.
Q7) Explain optimization techniques for reducing miss penalty.
Q8) Explain protection via virtual memory.
Q9) Explain protection via virtual machines.
Q10) Explain the Xen virtual machine.
Learning Objectives: At the end of this unit students will understand:
1. VLIW, EPIC
2. Intel IA-64 Architecture, Itanium Processor
3. Loop-Level Parallelism
4. Code for Parallelism
5. Hardware Support for Parallelism
Lesson Plan:
L1. Introduction: Exploiting Instruction-Level Parallelism Statically
L2. Detecting and Enhancing Loop-Level Parallelism
L3. Scheduling and Structuring Code for Parallelism
L4. Hardware Support for Exposing Parallelism: Predicated Instructions
L5. Hardware Support for Compiler Speculation
L6. The Intel IA-64 Architecture
L7. Itanium Processor; Conclusions