The contexts of two or more threads are often stored in separate on-chip register sets. Formally speaking, CMT (Chip Multi-Threading) is a processor technology that allows multiple hardware threads of execution (also known as strands) on the same chip, through multiple cores per chip, multiple threads per core, or a combination of both.

The following techniques enable hardware multithreading:

1. Multiple Cores per Chip

CMP (Chip Multi-Processing, a.k.a. multicore) is a processor technology that combines multiple processors (a.k.a. cores) on the same chip (see Figure 2 (b)). The idea is very similar to SMP, but implemented within a single chip. [10] is the most famous paper about this technology.

2. Multiple Threads per Core

2.1 Vertical Multithreading - Instructions can be issued only from a single thread in any given CPU cycle.

- Interleaved Multithreading (a.k.a. Fine-Grained Multithreading): an instruction from a different thread is fetched and fed into the execution pipeline(s) at each processor cycle, so a context switch occurs at every CPU cycle (see Figure 1 (b)).
- Blocked Multithreading (a.k.a. Coarse-Grained Multithreading): the instructions of one thread are executed successively until an event occurs in that thread that may cause latency. This delay event induces a context switch to another thread (see Figure 1 (c)).

2.2 Horizontal Multithreading - Instructions can be issued from multiple threads in any given cycle.

This is so-called Simultaneous Multithreading (SMT): instructions are simultaneously issued from multiple threads to the execution units of a superscalar processor. Thus, the wide superscalar instruction issue is combined with the multiple-context approach (see Figure 2 (a)).
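The difference between interleaved and blocked multithreading can be illustrated with a small toy simulation. This is only a sketch under simplifying assumptions that are not taken from the source: one instruction is issued per cycle, each instruction is described by a single stall latency, and a context switch costs nothing. The function name `simulate` and the latency encoding are hypothetical, chosen for illustration.

```python
from typing import List, Optional


def simulate(threads: List[List[int]], policy: str) -> List[Optional[int]]:
    """Return, for every cycle, the id of the thread that issued (None = idle slot).

    threads[t][i] is the number of stall cycles thread t suffers after issuing
    its i-th instruction (0 = it can issue again in the very next cycle).
    Assumption: single-issue pipeline, zero-cost context switches.
    """
    n = len(threads)
    pc = [0] * n          # index of the next instruction of each thread
    ready_at = [0] * n    # first cycle in which each thread may issue again
    # 'current' is the last issuing thread; start so that thread 0 goes first.
    current = n - 1 if policy == "interleaved" else 0
    trace: List[Optional[int]] = []
    cycle = 0
    while any(pc[t] < len(threads[t]) for t in range(n)):
        if policy == "interleaved":
            # Fine-grained: rotate to the next thread every cycle.
            order = [(current + 1 + k) % n for k in range(n)]
        elif policy == "blocked":
            # Coarse-grained: stay on the current thread until it cannot issue.
            order = [(current + k) % n for k in range(n)]
        else:
            raise ValueError(f"unknown policy: {policy}")
        chosen = next(
            (t for t in order if pc[t] < len(threads[t]) and ready_at[t] <= cycle),
            None,
        )
        if chosen is not None:
            latency = threads[chosen][pc[chosen]]
            pc[chosen] += 1
            ready_at[chosen] = cycle + 1 + latency
            current = chosen
        trace.append(chosen)
        cycle += 1
    return trace


# Thread 0 hits a 2-cycle latency on its third instruction; thread 1 never stalls.
work = [[0, 0, 2, 0], [0, 0, 0, 0]]
print(simulate(work, "interleaved"))   # [0, 1, 0, 1, 0, 1, 1, 0]
print(simulate(work, "blocked"))       # [0, 0, 0, 1, 1, 1, 1, 0]
print(simulate([work[0]], "blocked"))  # [0, 0, 0, None, None, 0]
```

In the single-threaded run the two stall cycles show up as idle slots (`None`). With a second thread loaded, both policies hide the latency and differ only in when they switch: every cycle for interleaved, and only at the latency event for blocked.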
Figure 2 - Multiple Thread Multiple Issue (from [3])

In summary [3]:

- Unused instruction slots, which arise from latencies during the pipelined execution of single-threaded programs by a contemporary microprocessor, are filled by instructions of other threads within a multithreaded processor. The execution units are multiplexed among those thread contexts that are loaded in the register sets.
- Underutilization of a superscalar processor due to missing instruction-level parallelism can be overcome by simultaneous multithreading, where a processor can issue multiple instructions from multiple threads in each cycle. Simultaneous multithreaded processors combine the multithreading technique with a wide-issue superscalar processor to utilize a larger part of the issue bandwidth by issuing instructions from different threads simultaneously.
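How simultaneous multithreading fills otherwise unused issue slots can be shown with a rough model. The sketch below makes assumptions that are not in the source: each thread is a sequence of groups of mutually independent instructions, a group may start issuing only in a cycle after the previous group of the same thread has fully issued, and free slots are handed to threads in a fixed priority order. The name `issue_cycles` and this encoding are hypothetical.

```python
from typing import List, Tuple


def issue_cycles(threads: List[List[int]], width: int) -> Tuple[int, float]:
    """Return (cycles needed, fraction of issue slots used).

    threads[t] lists the sizes of successive groups of independent
    instructions of thread t; 'width' is the number of issue slots per cycle.
    Assumption: threads are served in list order (fixed priority).
    """
    remaining = [list(groups) for groups in threads]  # copy, inputs stay intact
    total = sum(sum(groups) for groups in threads)
    cycles = 0
    while any(remaining):
        free = width
        for groups in remaining:
            if not groups or free == 0:
                continue
            issued = min(free, groups[0])  # limited by ILP and by free slots
            groups[0] -= issued
            free -= issued
        # A fully issued group is retired at the end of the cycle, so the
        # dependent next group of that thread starts in the following cycle.
        for groups in remaining:
            if groups and groups[0] == 0:
                groups.pop(0)
        cycles += 1
    utilization = total / (width * cycles) if cycles else 0.0
    return cycles, utilization


a = [1, 2, 1, 3, 1]  # little instruction-level parallelism per cycle
b = [2, 1, 2, 1, 2]
print(issue_cycles([a], 4))     # (5, 0.4)  single thread on a 4-wide core
print(issue_cycles([a, b], 4))  # (5, 0.8)  two threads share the same slots
```

Under these assumptions a single thread leaves 60% of the slots of a 4-wide core empty, while two threads issued simultaneously finish in the same number of cycles and double the utilization.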
Notes:

Superpipeline - a deeply pipelined processor technology, where the instruction pipeline is divided into a large number (usually 8 or more) of shorter pipeline stages.

Superscalar (a.k.a. multiple issue) - a processor technology where multiple instructions can be issued to the execution units in the same cycle.