
SBST for On-Line Detection of Hard Faults in Multiprocessor Applications Under Energy Constraints


A. Merentitis, D. Margaris, N. Kranitis,
A. Paschalis, and D. Gizopoulos

Department of Informatics & Telecommunications, University of Athens, Greece


Department of Informatics, University of Piraeus, Greece
{amer, nkran, paschali}@di.uoa.gr, dgizop@unipi.gr

Abstract: Software-Based Self-Test (SBST) has emerged as an effective method for on-line testing of processors integrated in non-safety-critical systems. However, especially for multi-core processors, the notion of dependability encompasses not only high-quality on-line tests with minimum performance overhead but also methods for preventing the generation of excessive power and heat, which exacerbate silicon aging mechanisms and can cause long-term reliability problems. In this paper, we first extend the capabilities of a multiprocessor simulator in order to evaluate the overhead on the execution of the useful application load in terms of both performance and energy consumption. We use the derived power evaluation framework to assess the overhead of SBST implemented as a test thread in a multiprocessor environment. A range of typical processor configurations is considered. The application load consists of representative SPEC benchmarks, and various scenarios for the execution of the test thread are studied (sporadic or continuous execution). Finally, we apply in a multiprocessor context an energy optimization methodology that was originally proposed to increase battery life for battery-powered devices. The methodology significantly reduces the energy and performance overhead without affecting the test coverage of the SBST routines.

Keywords: software-based self-testing; hard faults; multiprocessors; on-line test; low energy optimization

978-1-4244-7723-4/$26.00 ©2010 IEEE
2010 IEEE 16th International On-Line Testing Symposium

I. INTRODUCTION

New types of defects appearing in deep submicron technologies require at-speed testing in order to achieve high test quality. Moreover, many types of faults are increasingly difficult to detect during manufacturing testing, because voltage stress and power limitations during burn-in can make the test ineffective. If such faults escape, they are likely to cause hard failures during the useful lifetime of the system. In order to tolerate faults encountered during operation, a reliable system requires mechanisms for detection, recovery and repair. Given the existence of low-cost mechanisms for system recovery and repair in contemporary chip multiprocessors (CMPs), the remaining major challenge is the development of low-cost defect detection techniques [1].

In the multimillion-gate SoC era, design and test engineers, apart from the usual challenges, also face signal integrity problems, as well as serious power consumption and overheating issues, especially when the circuit has to be placed in special test modes [2]. For on-line testing, which aims at detecting and/or correcting operational faults, hardware-based test methods usually require special scheduling in order to avoid overheating. Overheating can cause circuit failures and long-term reliability problems, for example through accelerated silicon aging and wear-out phenomena such as electromigration, Negative Bias Temperature Instability and Time Dependent Dielectric Breakdown. These problems are exacerbated in multi-core processors, where heat dissipation is a real concern and temperature-related failures are more likely.

Reliability in multiprocessor environments is a very active research topic, and various works have addressed different aspects of the problem. However, the majority of these works address only the problem of intermittent faults and soft errors (e.g. [3]-[6]). These techniques are based on duplicating instructions of the actual application, which may or may not be executed on the same hardware. They are therefore well suited for detecting short-lived transient faults, but not long-lived intermittent faults (e.g. related to voltage drops or temperature issues) [7]. Moreover, they are not efficient for the detection of hard faults. Because they are based on actual application instructions, they tend to test the same logic repeatedly, while large portions of the processor functionality remain untested.

On-line test methodologies that are able to address the problem of hard faults in functional units are relatively few (e.g. [1], [8], [9]), and most of them consider only the performance overhead. An interesting non-intrusive approach was proposed in [7], targeting the concurrent on-line test of hard faults in simultaneous multithreaded (SMT) processors using a test thread. However, that work did not address fault coverage, the generation of high-quality vectors, or the use of hardware assistance through test points and signature registers.

Software-Based Self-Test has been proposed [10]-[20] as a low-cost solution for testing processors integrated in non-safety-critical applications. It can be used either as an alternative or as a supplement to other test methods. SBST uses existing processor resources for test pattern generation and application, with no hardware or frequency overhead for the design. Moreover, it can be used for flexible and efficient on-line testing because, unlike most hardware solutions, it allows a dynamic trade-off between reliability and performance overhead [1]. Finally, SBST is performed in normal mode using the processor Instruction Set Architecture (ISA). This alleviates the problem of excessive toggle activity that is beyond the specification of the circuit and can cause immediate circuit failures. However, long-term reliability problems also need to be addressed by energy optimization of the SBST routines. The main contributions of this paper are summarized as follows:

• We assess the performance and energy overhead of an SBST test thread strategy using a novel power evaluation framework that synthesizes the capabilities of different tools.
• The SBST test thread we use is not pseudorandom as in [7] but deterministic, consisting of routines with proven test capabilities.
• We evaluate the performance and energy overheads in a multi-core processor that executes SPEC benchmarks.
• We apply the low energy optimization methodology of [21] in a multi-core context in order to reduce the energy overhead.

The rest of the paper is organized as follows. Some preliminaries are briefly discussed in Section II. Section III highlights the simulation environment and the flow we used for generating performance overhead and energy metrics. Experimental results are provided in Section IV. Finally, Section V concludes the paper.

II. ENERGY OPTIMIZATION IN THE TEST THREAD STRATEGY

The idea of a test thread for the detection of hard faults was introduced in [7]. The main assumptions of this approach are the following. The primary thread is executed normally in a simultaneous multithreading environment. When enough resources are available and the overall system load allows it, a secondary thread (the test thread) is executed to detect hard faults in the underlying hardware. In fact, previous work shows that even in a system that executes several threads there are often plenty of free resources [22]. The hardware that executes the primary thread is not always the same (e.g. a different core in a multi-core system or a different ALU in a superscalar system). The test thread is therefore also executed on different hardware, allowing the detection of faults in a broad scope.

The test thread can be implemented through a variety of hardware, software or hybrid techniques. Its execution can be scheduled by the operating system, or by the hardware itself when the overall resource utilization is low. It can be executed either sporadically (e.g. before checkpoints) or continuously. If a pure software approach based on SBST is used, as in this paper, the test thread strategy is entirely non-intrusive, since it does not require any hardware modifications. However, its impact on performance and energy overhead needs to be assessed.

A brief theoretical analysis of the parameters that contribute to power consumption is required in order to set the scope of the problem. Power consumption in CMOS circuits can be either static or dynamic. Leakage current drawn continuously from the power supply causes static power dissipation. Dynamic dissipation occurs during output switching, due to the short-circuit current and the charging/discharging of load capacitance. The importance of static power consumption increases as dimensions scale down; however, current CMOS technologies are dominated by dynamic power consumption. For a single node, the latter can be approximated by the following formula:

$$P = C_L \, S \, V_{dd}^{2} \, f_{CLK}$$

where $C_L$ is the equivalent load capacitance, $V_{dd}$ is the power supply voltage, $S$ is the number of node switches and $f_{CLK}$ is the operating frequency. Additionally, the energy consumed over a period $T$ is

$$E = P_{av} \, T, \qquad T = N \, \tau$$

where $P_{av}$ is the average power consumption over the period $T$, $N$ is the total number of execution cycles and $\tau$ is the clock period. These formulas show that, for a given circuit and technology, energy consumption can be reduced if the test program has a small cycle count and low average power consumption.

However, these two factors cannot always be optimized simultaneously. A systematic methodology for low energy optimization of SBST routines was proposed in [21] to reduce the energy consumption of periodic non-concurrent testing in battery-powered processors. Another important advantage of that methodology is that it can be applied independently, before other optimization methods. Examples of such methods are scheduling algorithms that exploit thread-level parallelism (TLP) to speed up the execution of self-test routines (e.g. [23]), and algorithms for the scheduling of on-line self-test tasks in hard real-time systems (e.g. [24]). In this paper, we first study the impact of the test thread strategy in terms of performance and energy. We then assess the effectiveness of the methodology proposed in [21] in a multi-core, concurrent test scenario, implemented as a test thread. The test thread consists of three kinds of code:

• deterministic high-RTL code [15] for functional units such as the MAC, register file and pipeline logic [17];
• deterministic code generated by constrained ATPG according to the methodology of [14], for the ALU and MAC adder;
• verification-based functional code for control-oriented components, as introduced in [15] and extended in [20].

III. SIMULATION ENVIRONMENT

The power evaluation framework used in this paper combines tools from the test technology and computer architecture areas. More specifically, the SBST routines are selected from a test
routine library that contains routines already validated in terms of test effectiveness using commercial fault simulation tools. However, the most important part of the multi-core power evaluation framework is the extended CMP-SIM simulator. CMP-SIM [25] is an architectural-level simulator that extends SimpleScalar with the capability for multi-core/multithreaded simulation. It offers a multi-core microarchitectural environment with a detailed cycle-accurate model for the key pipeline structures, and it implements a cache coherence protocol similar to MESI. In order to generate energy metrics in a multi-core context, we extended the original publicly available version of CMP-SIM by incorporating the energy libraries and functions of a dedicated power simulator, Wattch.

The Wattch framework [26] is an architectural-level simulator for estimating power consumption. It can accurately model four main categories of processor elements (array structures, memories, combinational logic and wires, clock networks). Its generated reports have been validated experimentally for numerous cases, including commercial processors such as the Alpha 21264 and MIPS R10000. In these validations, the estimated power consumption was within 10-13% of the values reported by tools operating on the post-layout netlist. The power models incorporated by Wattch are general and accurate [27], meaning that the results are valid for a broad range of processors. These properties make Wattch an ideal tool for evaluating SBST test thread strategies from an energy point of view.

Extending CMP-SIM with the libraries and functions of Wattch involved modifying about 20% of the approximately 10,000 lines of C code in the original version of the simulator. We added complex structures for maintaining the energy metrics of every component and pipeline stage in every core. We also made numerous modifications to preserve the integrity of the simulation and the validity of the results in multithreaded simulation. The resulting simulator is a configurable architectural-level multi-core simulator. Its flexibility allows us to cover a broad design space and to generate metrics for a wide range of processor models. Thus, it is possible to evaluate the performance and energy overhead of the test thread across different configurations and under various scenarios of processor load.

In order to evaluate the impact of the SBST thread in terms of both performance and energy, we used the flow depicted in Figure 1. Initially, the considered processor specification is used to set the basic parameters of the simulator configuration (number of cores, number of functional units per core, cache sizes, cache and memory latency, etc.). The same specification is used to select the appropriate SBST routines from an existing library of self-test routines (e.g. [15], [16], [17]). Specifically, the considered routines include:

• deterministic high-RTL routines for functional modules that are characterized by regularity;
• constrained ATPG routines for non-regular functional modules;
• verification-based routines for control-oriented modules.

The selected routines are then combined in a test thread. The test thread is compiled by the appropriate cross-compiler and executed together with the normal application, to generate the performance and energy overheads introduced by SBST. Next, we evaluate the impact of the energy optimization methodology of [21] on multi-core processors. To do so, the methodology of [21] is applied and the same steps are repeated for the optimized routines.

Figure 1: The multiprocessor power evaluation framework

In this paper, two different configurations, covering a wide range of processor benchmarks, are considered. Detailed characteristics of the simulated processor configurations are presented in Table I.

TABLE I. SIMULATED PROCESSOR CONFIGURATIONS

Parameter               Simple      Advanced
Branch Miss Penalty     5 cycles    8 cycles
Decode Width            4           8
Issue Width             4           8
Commit Width            4           8
L1 Data Cache           16 KB       32 KB
L1 Instruction Cache    16 KB       32 KB
L2 Unified Cache        1024 KB     4096 KB
L1 D-Cache Latency      2 cycles    4 cycles
L1 I-Cache Latency      2 cycles    4 cycles
L2 Cache Latency        12 cycles   18 cycles
Integer ALUs            2           4
Floating Point ALUs     1           2
L/S Queue Size          16          32
Register Unit Size      32          64

IV. EXPERIMENTAL RESULTS

In this section, the overhead of the test thread in terms of performance and energy consumption is evaluated for the considered processor configurations under various scenarios. Specifically, simulations are performed for a dual-core and a quad-core processor that executes only the useful application. They are also performed for the case in which the same processor executes a test thread concurrently with the useful application. Finally, a scenario of sporadic
execution of the test thread is considered (i.e. before every checkpoint), as well as a scenario of a test thread that runs continuously on the available cores (e.g. [1], [7]). For all these cases, the two processor configurations of Table I are used.

The first step for evaluating the overhead of the test thread strategy is to consider the case in which only the useful application load is executed (either in a dual-core or a quad-core execution environment). Table II presents Instructions Per Cycle (IPC), number of cycles and energy consumption for some representative SPEC2000 benchmarks, assuming that checkpoints are taken every 10 million instructions. The selected benchmarks cover a wide range of computer operations, including scientific simulations, CAD tools, desktop applications and databases. For every benchmark, we fast-forward the first 10 million instructions (initialization part) and then execute normally up to the next checkpoint. Table III presents the same results as Table II for the advanced processor configuration.

TABLE II. CYCLES AND ENERGY OF THE APPLICATION LOAD FOR THE SIMPLE PROCESSOR CONFIGURATION

Cores   SPEC 2000 Benchmark   IPC    Cycles      Energy (mJ)
2-Core  Apsi                  2.13   4,687,577   230.6
        Fma3d                 1.41   7,071,555   235.7
        Mcf                   2.07   4,821,497   285.5
        Parser                1.26   7,902,626   323.8
        Vortex                1.50   6,680,683   253.6
4-Core  Apsi                  2.67   3,750,039   235.1
        Fma3d                 2.16   4,632,756   233.9
        Mcf                   2.20   4,553,620   287.4
        Parser                1.59   6,272,347   323.9
        Vortex                2.13   4,692,960   265.5

TABLE III. CYCLES AND ENERGY OF THE APPLICATION LOAD FOR THE ADVANCED PROCESSOR CONFIGURATION

Cores   SPEC 2000 Benchmark   IPC    Cycles      Energy (mJ)
2-Core  Apsi                  2.67   3,750,088   405.8
        Fma3d                 1.92   5,207,359   306.6
        Mcf                   2.20   4,553,645   416.3
        Parser                1.20   8,302,716   488.9
        Vortex                1.88   5,316,419   295.3
4-Core  Apsi                  3.00   3,333,406   408.7
        Fma3d                 2.48   4,039,791   305.8
        Mcf                   2.33   4,285,763   417.5
        Parser                1.53   6,519,926   491.2
        Vortex                2.91   3,430,864   311.9

The results presented in Tables II and III lead to some interesting outcomes. The first concerns the performance and energy consumption of the processor when only the useful application load is executed. The simulations show that for most of the benchmarks, moving to a quad-core processor provides a speed-up on average, while energy consumption remains approximately the same. A notable increase in energy is exhibited only for the Vortex benchmark. Going from the simple to the advanced processor model, on the other hand, offers performance gains comparable to those of the previous case. Depending on the nature of the benchmark, these gains can be higher or lower. For the Parser benchmark in particular, the longer cache miss penalties (reflected in the IPC) even result in slower execution in the advanced model. However, the energy requirements increase significantly for all benchmarks.

After evaluating the requirements of the useful application load, we proceed to assess the impact of the test thread. Initially, we consider the case of sporadic execution before every checkpoint. In this scenario, the overhead of the test thread is mainly determined by its size in cycles. Deterministic SBST is used for most of the components [20], so the cycle count for every execution of the test thread is small (less than 100,000 cycles for all the different configurations). We therefore expect the overhead to be relatively small as well. The overhead for the simple and advanced configurations is presented in Tables IV and V, respectively.

TABLE IV. DELAY AND ENERGY OVERHEAD FOR THE SIMPLE PROCESSOR CONFIGURATION – SPORADIC EXECUTION

Cores   SPEC 2000 Benchmark   IPC    % Delay Overhead   % Energy Overhead
2-Core  Apsi                  2.13   1.53%              2.37%
        Fma3d                 1.42   1.01%              1.85%
        Mcf                   2.08   1.36%              1.92%
        Parser                1.27   0.91%              1.18%
        Vortex                1.50   1.24%              1.50%
4-Core  Apsi                  2.68   1.07%              2.39%
        Fma3d                 2.17   0.84%              1.85%
        Mcf                   2.21   0.92%              1.90%
        Parser                1.61   0.65%              1.16%
        Vortex                2.15   0.73%              1.37%

TABLE V. DELAY AND ENERGY OVERHEAD FOR THE ADVANCED PROCESSOR CONFIGURATION – SPORADIC EXECUTION

Cores   SPEC 2000 Benchmark   IPC    % Delay Overhead   % Energy Overhead
2-Core  Apsi                  2.67   1.39%              2.15%
        Fma3d                 1.93   0.92%              1.16%
        Mcf                   2.20   1.21%              1.48%
        Parser                1.21   0.83%              1.02%
        Vortex                1.89   1.08%              1.21%
4-Core  Apsi                  3.02   0.77%              2.12%
        Fma3d                 2.50   0.59%              1.19%
        Mcf                   2.35   0.83%              1.44%
        Parser                1.55   0.48%              1.02%
        Vortex                2.94   0.51%              1.09%

When we introduce the test thread in the form of sporadic testing before every checkpoint, we observe that the overall
delay overhead is always below 2% (it could be higher if the test thread consisted of more routines), and it is reduced further for the quad-core processor model. Compare the dual-core model of the advanced configuration with the quad-core model of the simple configuration: the latter has the smaller delay overhead, since the test thread is executed independently on the available core. Finally, energy consumption is affected more significantly than delay. To compensate for the extra load (and increase IPC), all processor configurations more often trigger hardware that was previously in sleep mode (e.g. due to clock gating). Therefore, the extra load is only partially translated into delay, while energy consumption is impacted directly.

A different scenario, also very interesting from a reliability point of view, is to execute the test thread continuously as long as resources are available. In this scenario, the impact of the test thread on delay and energy consumption is determined not by its number of cycles (since it executes continuously). Instead, it is determined by the properties of the routines that comprise it. An assessment of the delay and energy overhead for this scenario is presented in Tables VI and VII.

TABLE VI. DELAY AND ENERGY OVERHEAD FOR THE SIMPLE PROCESSOR CONFIGURATION – CONTINUOUS EXECUTION

Cores   SPEC 2000 Benchmark   IPC    % Delay Overhead   % Energy Overhead
2-Core  Apsi                  2.72   26.72%             63.6%
        Fma3d                 1.92   19.01%             61.2%
        Mcf                   2.72   23.36%             65.3%
        Parser                1.75   16.91%             62.0%
        Vortex                2.01   20.24%             69.9%
4-Core  Apsi                  3.59   20.09%             61.3%
        Fma3d                 2.95   18.34%             61.7%
        Mcf                   3.01   17.85%             65.7%
        Parser                2.27   13.26%             61.9%
        Vortex                2.97   15.82%             58.1%

TABLE VII. DELAY AND ENERGY OVERHEAD FOR THE ADVANCED PROCESSOR CONFIGURATION – CONTINUOUS EXECUTION

Cores   SPEC 2000 Benchmark   IPC    % Delay Overhead   % Energy Overhead
2-Core  Apsi                  3.50   23.08%             44.0%
        Fma3d                 2.62   18.44%             77.0%
        Mcf                   2.93   21.21%             77.3%
        Parser                1.67   16.59%             76.5%
        Vortex                2.53   20.06%             62.7%
4-Core  Apsi                  4.12   17.62%             43.3%
        Fma3d                 3.42   16.91%             77.4%
        Mcf                   3.24   16.35%             77.0%
        Parser                2.19   12.87%             75.9%
        Vortex                4.09   15.13%             66.8%

In the case of continuous execution, the overhead clearly increases, especially in terms of energy consumption. For the simple configuration, the energy overhead is approximately uniform across benchmarks. For the advanced configuration, by contrast, it varies significantly, due to the increased delays associated with misses in the speculative performance-aiding mechanisms.

However, high energy consumption for a significant amount of time can cause overheating. Overheating is a major concern for long-term silicon reliability because it accelerates silicon aging and wear-out mechanisms. Thus, apart from the main application, the test thread should also be optimized to avoid such problems, especially in the case of continuous execution. To this end, we apply the energy optimization methodology proposed in [21], in order to study its effectiveness in a multi-core environment. The methodology of [21] consists of the following steps:

• Energy-aware loop synthesis, deployed to minimize the byte count of the routine and to better exploit caches;
• Loop transformations for loop-based routines;
• Instruction substitution, replacing instructions with equivalent but more energy-efficient ones;
• Modified Register Name Adjustment (RNA), to minimize bit toggles on the address decoders and buses.

We applied the energy optimization methodology to the routines of [20], which cover the complete range of SBST techniques (deterministic high-RTL, constrained ATPG and verification-based). Test coverage, measured on the gate-level netlists of two processor benchmarks [21], is reduced by less than 0.5%. Results on a per-module basis are also provided in [21]. They indicate that deterministic routines (high-RTL or constrained ATPG) are not affected. Verification-based routines exhibit some degradation in test coverage because strict equivalence cannot be achieved. The effect of the methodology on test coverage for each SBST technique follows from the nature of the optimization steps, and is thus similar for all routines [21].

The gains in terms of energy consumption are presented in Table VIII and Table IX.

TABLE VIII. DELAY AND ENERGY OVERHEAD FOR THE SIMPLE PROCESSOR CONFIGURATION – OPTIMIZED TEST THREAD

Cores   SPEC 2000 Benchmark   IPC    % Delay Overhead   % Energy Overhead
2-Core  Apsi                  2.86   20.28%             44.1%
        Fma3d                 2.00   14.39%             41.7%
        Mcf                   2.82   18.61%             48.9%
        Parser                1.81   12.75%             45.2%
        Vortex                2.10   15.18%             49.6%
4-Core  Apsi                  3.72   15.67%             41.2%
        Fma3d                 3.05   14.19%             42.3%
        Mcf                   3.13   13.20%             49.1%
        Parser                2.33   10.34%             41.8%
        Vortex                3.07   12.18%             39.7%
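A minimal Python sketch can quantify the energy gain of Table VIII over Table VI for the simple configuration. For each core count and benchmark, it computes the fraction of the unoptimized continuous-execution energy overhead that the optimized test thread removes, i.e. (before - after) / before. The values are transcribed from the two tables. The helper names `relative_reduction` and `reductions` are hypothetical. This relative-reduction metric is an illustrative choice, not a formula stated by the authors.

```python
from statistics import mean

# Energy overhead (%) for continuous execution, simple configuration.
# TABLE_VI: original test thread; TABLE_VIII: test thread optimized with [21].
TABLE_VI = {
    "2-Core": {"Apsi": 63.6, "Fma3d": 61.2, "Mcf": 65.3, "Parser": 62.0, "Vortex": 69.9},
    "4-Core": {"Apsi": 61.3, "Fma3d": 61.7, "Mcf": 65.7, "Parser": 61.9, "Vortex": 58.1},
}
TABLE_VIII = {
    "2-Core": {"Apsi": 44.1, "Fma3d": 41.7, "Mcf": 48.9, "Parser": 45.2, "Vortex": 49.6},
    "4-Core": {"Apsi": 41.2, "Fma3d": 42.3, "Mcf": 49.1, "Parser": 41.8, "Vortex": 39.7},
}


def relative_reduction(before: float, after: float) -> float:
    """Fraction of the original overhead that the optimization removes."""
    return (before - after) / before


def reductions(before_table, after_table):
    """Relative reduction of the energy overhead per (cores, benchmark) pair."""
    return {
        (cores, bench): relative_reduction(before_table[cores][bench],
                                           after_table[cores][bench])
        for cores in before_table
        for bench in before_table[cores]
    }


if __name__ == "__main__":
    per_case = reductions(TABLE_VI, TABLE_VIII)
    for (cores, bench), r in sorted(per_case.items()):
        print(f"{cores:6s} {bench:7s} {100 * r:5.1f}% of the energy overhead removed")
    print(f"mean: {100 * mean(per_case.values()):.1f}%")
```

With these values, the per-case reduction ranges from roughly 25% (Mcf, dual-core) to roughly 33% (Apsi, quad-core), with a mean just under 30%. The same two functions can be reused unchanged on the advanced-configuration values of Tables VII and IX.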


TABLE IX. DELAY AND ENERGY OVERHEAD FOR THE ADVANCED PROCESSOR CONFIGURATION – OPTIMIZED TEST THREAD

Cores   SPEC 2000 Benchmark   IPC    % Delay Overhead   % Energy Overhead
2-Core  Apsi                  3.66   17.54%             31.2%
        Fma3d                 2.72   14.16%             51.9%
        Mcf                   3.04   16.71%             55.7%
        Parser                1.73   12.48%             56.4%
        Vortex                2.64   15.03%             39.6%
4-Core  Apsi                  4.28   13.21%             35.8%
        Fma3d                 3.54   12.85%             52.3%
        Mcf                   3.34   12.77%             55.1%
        Parser                2.26    9.69%             54.4%
        Vortex                4.21   11.80%             42.5%

Comparing Table VIII with Table VI, and Table IX with Table VII, shows that both the performance and the energy overhead are reduced significantly. The reduction is especially large for energy, where on average approximately one third of the overhead is removed.

V. CONCLUSIONS

In this paper, we extended the capabilities of a multiprocessor simulator in order to evaluate the impact of an SBST test thread strategy on the execution of the useful application load, in terms of both performance and energy. The application load consisted of representative SPEC benchmarks, and various scenarios of sporadic or continuous execution of the test thread were evaluated. A broad range of processor configurations was studied. Finally, we applied, for the first time in a multiprocessor context, an energy optimization methodology that was originally proposed to increase battery life for on-line test of battery-powered processors. Simulation results indicate that the methodology significantly reduces the energy and performance overhead in the multiprocessor application.

REFERENCES

[1] K. Constantinides, O. Mutlu, T. Austin, V. Bertacco, "Software-Based Online Detection of Hardware Defects: Mechanisms, Architectural Support, and Evaluation," International Symposium on Microarchitecture (MICRO), 2007, pp. 97-108.
[2] M. Nicolaidis, Y. Zorian, "On-line Testing for VLSI: A Compendium of Approaches," Journal of Electronic Testing: Theory and Applications (JETTA), vol. 12, no. 1-2, 1998, pp. 7-20.
[3] S. S. Mukherjee, M. Kontz, S. K. Reinhardt, "Detailed design and evaluation of redundant multithreading alternatives," Annual International Symposium on Computer Architecture (ISCA), 30(2):99-110, 2002.
[4] F. Rashid, K. K. Saluja, P. Ramanathan, "Fault tolerance through re-execution in multiscalar architecture," International Conference on Dependable Systems and Networks, 2000, pp. 482-491.
[5] E. Rotenberg, "AR-SMT: A microarchitectural approach to fault tolerance in microprocessors," International Symposium on Fault-Tolerant Computing (FTCS), 1999, pp. 84-91.
[6] T. N. Vijaykumar, I. Pomeranz, K. Cheng, "Transient fault recovery using simultaneous multithreading," International Symposium on Computer Architecture (ISCA), 30(2):87-98, 2002.
[7] E. F. Weglarz, K. K. Saluja, T. M. Mak, "Testing of Hard Faults in Simultaneous Multithreaded Processors," International On-Line Testing Symposium, July 12-14, 2004, p. 95.
[8] Y. Li, S. Makar, S. Mitra, "CASP: Concurrent Autonomous Chip Self-Test Using Stored Test Patterns," Design, Automation and Test in Europe (DATE), March 2008, pp. 885-890.
[9] O. Khan, S. Kundu, "A Self-Adaptive System Architecture to Address Transistor Aging," Design, Automation and Test in Europe (DATE), 2009.
[10] J. Shen, J. A. Abraham, "Synthesis of Native Mode Self-Test Programs," Journal of Electronic Testing: Theory and Applications (JETTA), vol. 13, no. 2, October 1998, pp. 137-148.
[11] I. Bayraktaroglu, J. Hunt, D. Watkins, "Cache Resident Functional Microprocessor Testing: Avoiding High Speed IO Issues," International Test Conference (ITC), 2006, paper 27.2.
[12] F. Corno, G. Cumani, M. Sonza Reorda, G. Squillero, "Fully Automatic Test Program Generation for Microprocessor Cores," Design, Automation and Test in Europe (DATE), 2003, pp. 1006-1011.
[13] E. Sanchez, M. Sonza Reorda, G. Squillero, "On the Transformation of Manufacturing Test Sets into On-Line Test Sets for Microprocessors," International Symposium on Defect and Fault Tolerance in VLSI Systems, 2005, pp. 494-502.
[14] L. Chen, S. Ravi, A. Raghunathan, S. Dey, "A Scalable Software-Based Self-Testing Methodology for Programmable Processors," Design Automation Conference (DAC), 2003, pp. 548-553.
[15] N. Kranitis, A. Paschalis, D. Gizopoulos, G. Xenoulis, "Software-Based Self-Testing of Embedded Processors," IEEE Transactions on Computers, vol. 54, no. 4, April 2005, pp. 461-475.
[16] A. Paschalis, D. Gizopoulos, "Effective software-based self-test strategies for on-line periodic testing of embedded processors," IEEE Transactions on CAD, vol. 24, no. 1, Jan. 2005, pp. 88-99.
[17] N. Kranitis, A. Merentitis, N. Laoutaris, G. Theodorou, A. Paschalis, D. Gizopoulos, C. Halatsis, "Optimal periodic testing of intermittent faults in embedded pipeline processor applications," Design, Automation and Test in Europe (DATE), 2006, pp. 65-71.
[18] C. H. P. Wen, L. C. Wang, K. T. Cheng, W. T. Liu, J. J. Chen, "Simulation-based target test generation techniques for improving the robustness of a software-based-self-test methodology," International Test Conference (ITC), 2006, pp. 936-945.
[19] A. Apostolakis, D. Gizopoulos, M. Psarakis, A. Paschalis, "Software-Based Self-Testing of Symmetric Shared-Memory Multiprocessors," IEEE Transactions on Computers, vol. 58, no. 12, July 2009, pp. 1682-1694.
[20] N. Kranitis, A. Merentitis, G. Theodorou, A. Paschalis, D. Gizopoulos, "Hybrid-SBST Methodology for Efficient Testing of Processor Cores," IEEE Design & Test of Computers, vol. 25, no. 1, Jan.-Feb. 2008, pp. 64-75.
[21] A. Merentitis, N. Kranitis, A. Paschalis, D. Gizopoulos, "Low Energy On-Line SBST of Embedded Processors," International Test Conference (ITC), 2008, paper 12.1.
[22] D. M. Tullsen, S. J. Eggers, H. M. Levy, "Simultaneous multithreading: maximizing on-chip parallelism," International Symposium on Computer Architecture (ISCA), 1995, pp. 392-403.
[23] A. Apostolakis, M. Psarakis, D. Gizopoulos, A. Paschalis, I. Parulkar, "Exploiting Thread-Level Parallelism in Functional Self-Testing of CMT Processors," European Test Symposium (ETS), May 2009.
[24] D. Gizopoulos, "Online Periodic Self-Test Scheduling for Real-Time Processor-Based Systems Dependability Enhancement," IEEE Transactions on Dependable and Secure Computing, vol. 6, no. 2, April 2009, pp. 152-158.
[25] S. Baldawa, R. Sangireddy, "CMP-SIM: An Environment for Simulating Chip Multiprocessor (CMP) Architectures," University of Texas at Dallas, October 2006.
[26] D. Brooks, V. Tiwari, M. Martonosi, "Wattch: a framework for architectural-level power analysis and optimizations," International Symposium on High-Performance Computer Architecture (HPCA), 2000, pp. 83-94.
[27] D. Brooks, P. Bose, M. Martonosi, "Power-performance simulation: design and validation strategies," ACM SIGMETRICS, vol. 31, March 2004, pp. 13-18.

