Elysium VLSI 2010

Elysium Technologies Private Limited
ISO 9001:2008 A leading Research and Development Division

Madurai | Chennai | Kollam | Ramnad | Tuticorin | Singapore
Abstract Very Large Scale Integration 2010 - 2011
01 Novel Vth Hopping Techniques for Aggressive Runtime Leakage Control
The continuous increase of leakage power consumption in deep sub-micron technologies necessitates more aggressive
leakage control. Runtime leakage control (RTLC) is effective, since circuits generally have a significant amount
of idleness at runtime. However, current RTLC techniques are only used when circuits have long idle periods, which
makes the techniques less profitable. The reason is the large energy and delay overhead of RTLC mode
transitions. We propose two novel techniques, workload-adaptive Vth hopping (WAVTH) and hierarchical Vth hopping
(HIVTH), to tackle the overhead problems and enable aggressive runtime leakage control. Experimental results show
a 19.2% average improvement in leakage saving with WAVTH and HIVTH over basic Vth hopping. The optimum design
points of these two techniques are determined through accurate modeling.
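The overhead argument above can be illustrated with a generic break-even calculation. This is a minimal sketch, not the WAVTH or HIVTH model from the paper: it assumes a single low-leakage mode, a fixed transition energy, and constant leakage power in each mode. All function names, parameter names and numbers are hypothetical.

```python
def break_even_idle_time(transition_energy_j, active_leakage_w, low_leakage_w):
    """Shortest idle period (seconds) for which a mode transition pays off.

    Assumption: the transition costs a fixed energy and leakage power is
    constant in each mode. These names are illustrative, not from the paper.
    """
    saved_power = active_leakage_w - low_leakage_w
    if saved_power <= 0:
        raise ValueError("low-leakage mode must leak less than the active mode")
    return transition_energy_j / saved_power


def net_energy_saving(idle_time_s, transition_energy_j, active_leakage_w, low_leakage_w):
    """Energy saved (joules) by switching modes for one idle period; may be negative."""
    saved_power = active_leakage_w - low_leakage_w
    return saved_power * idle_time_s - transition_energy_j


if __name__ == "__main__":
    # Hypothetical numbers: 2 nJ transition cost, 5 mW vs 1 mW leakage.
    print(break_even_idle_time(2e-9, 5e-3, 1e-3))
    print(net_energy_saving(1e-6, 2e-9, 5e-3, 1e-3))
```

Under these assumptions, idle periods shorter than the break-even time lose energy, which is why a lower transition overhead allows leakage control to be applied to shorter idle periods.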
02 Leakage-Aware Energy Minimization using Dynamic Voltage Scaling and Cache Reconfiguration in Real-Time Systems
System optimization techniques are widely used to improve energy efficiency as well as overall performance. Dynamic
voltage scaling (DVS) is acknowledged to be successful in reducing processor energy consumption. Due to the
increasing significance of the memory subsystem’s energy consumption, dynamic cache reconfiguration (DCR)
techniques have recently been proposed with the aim of reducing the cache subsystem’s energy consumption. As
manufacturing technology scales into the order of nanometers, leakage current, both in the processor and cache
subsystem, becomes a significant contributor to the overall power dissipation. In this paper, we efficiently integrate
processor voltage scaling and cache reconfiguration in a leakage-aware manner to minimize overall system energy
consumption. Experimental results demonstrate that our approach outperforms existing techniques by 12-23% on
average.
03 Modeling of RF-MEMS BAW Resonator
Due to the demand for smaller and more portable devices, the applications of MEMS resonators are rapidly increasing.
Solidly Mounted Resonators (SMR) based on Bulk Acoustic Wave (BAW) technology follow MEMS principles to build
high performance microwave filters for RF communication. In this paper we provide the architecture of SMRs by
discussing the design aspects of their core structures, which are within foundry CMOS processes, using the RF design
software Advanced Design System (ADS). Conventional VLSI processes are followed for the fabrication of the SMRs.
The results from the fabricated data are compared and discussed.
#230, Church Road, Anna Nagar, Madurai 625 020, Tamil Nadu, India
Phone: +91 452-4390702, 4392702, 4390651
Website: www.elysiumtechnologies.com, www.elysiumtechnologies.info
Email: info@elysiumtechnologies.com
04 Modeling and Design Considerations of Coupled Inductor Converters
In this part of the sequel on the modeling and analysis of coupled inductors and coupled inductor based multiphase
switching converters, the recently developed symmetrical coupled inductor model is first extended to include the
inductor winding dc resistance (DCR). The extended model is then used to analyze the influence of the coupling on the
DCR based current sensing schemes popularly used in multi-phase switching regulators. It is found that the time-
constant matching condition in coupled inductor converters needs to be modified to include the coupling coefficient.
The proposed model is also used to derive the small-signal control-to-output transfer function of the converters
incorporating coupled inductors, with which the effect of coupling on the dynamic behaviors of the converter power
stage, such as resonant frequency and damping factor, can be easily evaluated.
05 Instruction Selection in ASIP Synthesis using Functional Matching
In embedded systems, Application Specific Instruction Set Processors (ASIPs) are used commonly with the aim to get
high performance without losing flexibility. A crucial operation required during ASIP synthesis (in particular, selection
of custom instructions) as well as code generation for ASIPs is identifying portions of an application program that can
be executed by custom functional units (CFUs). Most existing solutions achieve this by matching structure of patterns
corresponding to CFUs with sub-graphs of application data flow graphs. Often it happens that the computations
performed by the two are equivalent, but due to structural dissimilarities the match is missed. What is needed is a
method that can match two graphs functionally rather than structurally. In this paper, we present a novel method to do
this and give implementation results to show its effectiveness.
06 Inexact Decision Circuits: An Application to Hamming Weight Threshold Voting
In this work, the authors present the idea of inexact decision making, and its application to threshold voting of Hamming
weights, used in bus coding schemes, N-Modular Redundancy (NMR), Median filtering, and other pattern matching
applications. Decision circuits can be tweaked to perform in an inexact manner, in order to optimize delay and
power while still maintaining high system accuracy. One such circuit identified by the authors is the threshold voting of
Hamming weights. A majority voter is a special case of this family of circuits. The proposed inexact voter consumes up to 8
times less power than the exact voter, with negligible reduction in system accuracy and performance. The leakage power is
also reduced by a factor of 3. The inexact voter allows for a higher frequency of operation, by reducing the critical path delay
by a factor of up to 3.4. The results obtained validate the application of inexact decision making, with respect to threshold
voters.
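For reference, the exact function that such a voter computes can be written in a few lines. This is a minimal behavioural sketch of exact threshold voting only. The abstract does not describe the inexact circuit, so it is not modelled here, and the function names are illustrative.

```python
def hamming_weight(bits):
    """Number of 1s in a sequence of bits."""
    return sum(1 for b in bits if b)


def threshold_vote(bits, threshold):
    """Exact threshold voter: output is 1 when at least `threshold` inputs are 1."""
    return hamming_weight(bits) >= threshold


def majority_vote(bits):
    """Majority voting is the special case threshold = floor(n/2) + 1."""
    return threshold_vote(bits, len(bits) // 2 + 1)


if __name__ == "__main__":
    print(threshold_vote([1, 0, 1, 1], 3))
    print(majority_vote([1, 0, 1]))
```

An inexact voter would approximate `threshold_vote` and be judged by how rarely its output differs from this exact reference.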
07 Implementation of a Novel Phoneme Recognition System using TMS320C6713 DSP
A number of techniques have been proposed in the literature for phoneme based speech recognition systems. In this
paper, a technique for automatic phoneme recognition using zero-crossings (ZC) and magnitude sum function (MSF) is
proposed. The number of zero-crossings and Magnitude sum function per frame are extracted and a Minimum Distance
Classifier is proposed to recognize the phonemes in each frame with these features. In order to increase the
recognition accuracy of phonemes, a finite state machine is also proposed. The performance of the proposed
phoneme recognition system is evaluated using TTS database and compared with the system using Linear Predictive
Coefficients (LPC) feature inputs. Phoneme recognition accuracies of 70.93% and 55.25% are obtained for the system
using LPC and the one using ZC along with MSF respectively. However, using the finite state machine proposed in this
paper, 100% recognition accuracy is obtained for both the techniques. The computational costs required for
recognizing various sentences using both of the feature extraction techniques are evaluated. It is observed that the
proposed technique requires about 9.3 times lower computational cost than the one using LPC. The proposed
technique is adopted for the implementation of the phoneme recognition system on Texas Instruments TMS320C6713
floating point processor. Different ways to reduce the recognition time on the target device are explored and
reported in this paper. The technique proposed here is also applicable to speech inputs from other databases.
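The two per-frame features and the minimum distance classifier can be sketched as follows. This is an illustrative sketch under stated assumptions: the magnitude sum is taken as the sum of absolute sample values in a frame, the distance is plain Euclidean distance on unnormalized features, and the template values are made up. The finite state machine stage is not shown.

```python
import math


def zero_crossings(frame):
    """Count sign changes between consecutive samples of one frame."""
    count = 0
    for prev, cur in zip(frame, frame[1:]):
        if (prev >= 0) != (cur >= 0):
            count += 1
    return count


def magnitude_sum(frame):
    """Sum of absolute sample values in one frame (assumed MSF definition)."""
    return sum(abs(sample) for sample in frame)


def classify(frame, templates):
    """Return the label whose (ZC, MSF) template is nearest to the frame's features."""
    features = (zero_crossings(frame), magnitude_sum(frame))
    return min(templates, key=lambda label: math.dist(features, templates[label]))


if __name__ == "__main__":
    # Hypothetical templates: label -> (zero crossings, magnitude sum).
    templates = {"a": (2, 10.0), "s": (8, 3.0)}
    noisy_frame = [0.5, -0.5, 0.5, -0.5, 0.5, -0.5, 0.5, -0.5, 0.5]
    print(classify(noisy_frame, templates))
```

Both features need only comparisons, additions and absolute values per sample, which is consistent with the lower computational cost reported relative to LPC.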
08 Impact of Temperature on Test Quality
The usage of more advanced, less mature processes during manufacturing of semiconductor devices has increased
the need for performing unconventional types of testing, like temperature-testing, in order to maintain the same high
quality levels. However, performing temperature-testing is costly. This paper proposes a viable low-cost alternative to
temperature testing that quantifies the impact of temperature variations on the test quality and also determines optimal
test conditions. The test flow proposed is empirically validated on an industrial-standard die. The results obtained
show that the majority of the defects that were originally detected by temperature-testing are also detected by the
proposed test flow, thereby reducing the dependence on temperature testing to achieve zero-defect quality. Details of
an interesting defect behavior at cold test conditions are also presented.
09 Identifying the bottlenecks to the RF performance of FinFETs
In this work, the high frequency (RF) performance of FinFETs is investigated in detail using a two-level parasitic
model comprising outer and inner parasitic capacitances in addition to parasitic series resistances. Use of scaling
relations of these parasitic capacitances with numbers of fins and fingers allows extraction of these elements. Next, by
defining a series of reference surfaces, each associated with a certain set of parasitic elements, we proceed to
calculate the RF Figures of Merit, namely fT and fmax at these surfaces. These are called ‘available fT (fmax)’ in this
work. Analysis of the available fT (fmax) gives insight into the extent to which different parasitics affect the FinFET’s
RF performance. The main bottleneck to the FinFET’s RF performance is identified, solutions are proposed and
relevant trade-offs are discussed.
10 Identifying Tests for Logic Fault Models Involving Subsets of Lines without Fault Enumeration
Bridging and interconnect open faults are defined using subsets of lines. We study the possibility of identifying input
vectors that are effective as test vectors for such faults without enumerating the faults. This process does not require
accurate layout information, it can handle very large numbers of faults, and it deals with undetectable faults implicitly.
We describe a static test compaction process that uses the ability to identify effective test vectors without enumerating
faults. This process selects a subset T of a given test set U such that T is guaranteed to detect the same faults as U.
We also describe a test generation process based on the same concept. Finally, we show how this concept can be
used to compare test sets.
11 Hamming Distance Based Reordering and Column wise Bit Stuffing with Difference Vector: A Better Scheme for Test Data Compression with Run Length Based Codes
Because of increased design complexity and advanced fabrication technologies, the number of tests and the
corresponding data volume are increasing rapidly. As the large test data volume is becoming one of the major
problems in testing a System-on-a-Chip (SoC), several compression coding schemes have been proposed in the past. Run
Length Coding is one of the most familiar coding methodologies for compression. In this paper, we present a new
scheme named Hamming Distance Based Reordering and Column wise Bit Stuffing with Difference Vector (HDR-CBSDV),
which can be used with any run length based code technique for a better compression ratio. Four techniques
have been applied in this scheme: Selection of first vector, Hamming Distance Based Reordering, Column wise Bit
Stuffing and Difference Vector. Instead of directly applying any known run length code like Golomb, Frequency
Directed Run Length (FDR), Extended FDR (EFDR), Modified FDR (MFDR) or Shifted Alternate FDR (SAFDR) to a given
test set, if we apply the proposed scheme to the test set prior to applying the run length based code, the compression
obtained is improved drastically. The experimental results on ISCAS89 benchmark circuits show that the test data
compression ratio improves significantly in each case. It is also noteworthy that in most cases this scheme
does not involve any extra silicon area overhead compared to the base code with which it is used. In a few cases, it
requires only an extra XOR gate and feedback path. The proposed scheme can be easily integrated into the existing
industrial flow.
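Two of the four steps, Hamming distance based reordering and the difference vector, can be illustrated with a small sketch. The abstract does not give the exact rules, so the sketch makes assumptions: a greedy nearest-neighbour ordering stands in for the reordering, the first vector is simply chosen by index, test vectors are fully specified bit strings, and column wise bit stuffing of don't-care bits is omitted.

```python
def hamming_distance(a, b):
    """Number of bit positions in which two equal-length bit strings differ."""
    return sum(x != y for x, y in zip(a, b))


def reorder_by_hamming(vectors, first_index=0):
    """Greedy ordering: each next vector is the remaining one closest to the last."""
    remaining = list(vectors)
    ordered = [remaining.pop(first_index)]
    while remaining:
        last = ordered[-1]
        nearest = min(range(len(remaining)),
                      key=lambda i: hamming_distance(last, remaining[i]))
        ordered.append(remaining.pop(nearest))
    return ordered


def difference_vectors(ordered):
    """Keep the first vector, then store the XOR of each vector with its predecessor."""
    diffs = [ordered[0]]
    for prev, cur in zip(ordered, ordered[1:]):
        diffs.append("".join("1" if x != y else "0" for x, y in zip(prev, cur)))
    return diffs


if __name__ == "__main__":
    test_set = ["1100", "0011", "1101", "0111"]  # hypothetical test vectors
    print(difference_vectors(reorder_by_hamming(test_set)))
```

Placing similar vectors next to each other makes their differences mostly zeros, and long runs of zeros are what run length based codes compress well. Decoding a difference vector back to the test vector needs one XOR with the previous vector.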
12 Functional Refinement: A Generic Methodology for Managing ESL Abstractions
Ever increasing complexity of SoCs has resulted in starting the system design at a higher level of abstraction. System-
level design methodology envisages step-wise refinement of high-level models towards final RTL. However, current
practices are limited to only interface-refinement and the true functionality refinement is performed by developing a
different model for each abstraction-level. This results in minimal re-use of existing models, wasted effort and high
maintenance cost of multiple models. This paper presents a novel methodology that enables seamless refinement of IP
model functionality from one level to another. The presented methodology is generic enough to be applied to various SoC
design tasks. This paper demonstrates the application of the methodology to software energy-estimation for a DSP
and functional-cum-timing refinement of a DDR-memory model. The proposed methodology resulted in complete re-use
of the existing models, easy availability of various model abstractions and 20% savings in development-effort of a new
model.
13 Exploring use of NoC for Reconfigurable Video Coding
MPEG RVC is a standard under development which addresses the issues of standardization of video coding tools and
multi-format codec design. SoC based solutions are likely to be developed for MPEG RVC in the future. In this paper we
evaluate Network on Chip (NoC) as an on-chip interconnection mechanism for an MPEG RVC SoC. We use the MPEG RVC
reference C code for MPEG 2 and AVC intra-only decoding, and the open source NoC named NoCem for simulation.
We build a new simulation platform over NoCem by using VHDL FLI. This platform allows us to make cycle accurate
measurements. We experiment with different input resolutions and different configurations of NoCem and measure the
network performance and overhead for reconfiguration. The results indicate that NoC is a potential candidate for the
on-chip interconnection mechanism of an MPEG RVC SoC.
14 Experimental Results and Study of a Modified Adaptive Bus Voltage Controller
Two-stage power conversion is used in many applications. The determination of the bus voltage in a two-stage converter
is important to achieve high overall power conversion efficiency. Conventional two-stage converters utilize either a
fixed DC bus voltage or a variable but predetermined DC bus voltage, neither of which necessarily results in
operation at the optimum bus voltage under variable conditions. The paper presents a modified adaptive bus voltage
controller for two-stage power converters. The controller adaptively converges to the optimum bus voltage that yields
the maximum power conversion efficiency under variable operating conditions. The modified adaptive two-stage bus
voltage controller is evaluated using results obtained from a proof of concept experimental prototype.
15 Electrical Modeling of Lithographic Imperfections
A lithographic wavelength of 193nm has been used for the past few generations of patterning and is likely to remain in use
for the next few technology generations (at least till the 28nm technology half-node) as well. This deep sub-wavelength
patterning has resulted in wafer shapes not resembling drawn rectilinear shapes. The resulting non-rectangular
devices and wires are not handled by current generation modeling and analysis methods. In this paper, we present a
survey of electrical modeling methods for such lithographic imperfections, especially on transistor layers. We also
discuss use contexts of such models and briefly present electrical implications of the likely future patterning
candidate, namely double patterning.
16 Design Procedure for High Frequency Operation of the Modified Series Resonant APWM Converter with Improved Efficiency and Reduced Size
In this paper, a generalized analysis for the auxiliary network in a modified series resonant asymmetrical pulse width-
modulated (APWM) converter is performed to produce a design procedure that ensures ZVS is achieved for any
converter design. New equations that correctly predict the magnitude of auxiliary current are obtained by accounting
for the trapezoidal nature of the waveforms associated with high frequency operation, and the dead time between the
switches in the half bridge. A design example of a 48V/1.2V, 25A converter operating at 1MHz is chosen to highlight the
validity of the proposed design and that superior results can be achieved if the resonant tank is designed in tandem
with the auxiliary network. Experimental results verify that ZVS is achieved, and that the proposed design reduces the
auxiliary inductor by close to a factor of 3.
17 Design of Reversible Latches Optimized for Quantum Cost, Delay and Garbage Outputs
Reversible logic has extensive applications in emerging nanotechnologies, such as quantum computing, optical
computing, ultra low power VLSI and quantum dot cellular automata. In the existing literature, designs of reversible
sequential circuits are presented that are optimized for the number of reversible gates and the garbage outputs. The
optimization of the number of reversible gates is not sufficient since each reversible gate is of different computational
complexity, and thus will have a different quantum cost and delay. While the computational complexity of a reversible
gate can be measured by its quantum cost, the delay of a reversible gate is another parameter that can be optimized
during the design of a reversible sequential circuit. In this work, we present novel designs of reversible latches that are
optimized in terms of quantum cost, delay and the garbage outputs. The optimized designs of reversible latches
presented in this work are the D latch, JK latch, T latch and SR latch.
18 Design of NoC for SoC with Multiple Use Cases Requiring Guaranteed
Many SoC architectures aimed at the multimedia domain support multiple use cases where only a subset of the
applications is active at any time. Further, each multimedia application itself poses strict constraints on core-to-core
communication latency. This paper presents an approach for automated synthesis of NoC architectures for such an
SoC. We evaluated our design approach through comparisons with two existing techniques aimed at generating best
effort and guaranteed throughput designs. Designs generated by our approach showed a marked improvement in both
power consumption (12.3% decrease) and resource requirements (12.9% decrease) in comparison to the best effort
NoC design approach. In comparison to the existing guaranteed throughput design approach our designs can
guarantee core-to-core latency while consuming less power (8.1% decrease) and resources (7.9% decrease).
19 Design of Low-Cost High-performance Floating-point Fused Multiply-Add with Reduced Power
This paper presents a floating-point fused multiply-add (FMA) unit with low-cost and low power techniques. To
improve the performance, two single-precision operations can be performed concurrently with one double-precision
data path, which is very useful in multimedia and even scientific applications. Moreover, to reduce the additional area
costs for supporting two single-precision operations in parallel, multiple double-precision units, i.e., the multiplier,
shifter and adder, are reused as much as possible. A modified dual-path algorithm is proposed by classifying the
exponent difference into three cases and implementing them with CLOSE and FAR paths, which can reduce latency
and facilitate lowering power consumption by enabling only one of the two paths. In addition, in case of FADD
instructions, the multiplier in the first stage is bypassed and kept in stable mode, which can significantly improve
FADD instruction performance and lower power consumption. The overall FMA unit has a latency of 4 cycles while the
FADD operation has 3 cycles. Each cycle has a time delay of about 0.66ns in the ST 65nm CMOS technology.
Compared with the conventional double-precision FMA, delay is reduced by about 13% and area is increased by about 22%,
which is acceptable since two single-precision results can be generated simultaneously.
20 Design Considerations for BEOL MIM Capacitor Modeling in RF CMOS Processes
Modeling of MIM capacitors in high frequency RF applications depends heavily on the design of test structures. An
external substrate ring is shown to be essential in capturing and modeling the inherent inductance of the MIM
capacitor. Additionally, deembedding of series parasitics plays a very important role in modeling of MIM capacitors
since these devices have very low series resistance. Various short structures were studied and their impacts on the
MIM characteristics are reported. It is shown that a short structure with the shortest path to ground is best suited to
deembed the series parasitics.
21 Coverage Management with Inline Assertions and Formal Test Points
This paper studies the problem of coverage management with two emerging formalisms in simulation based validation,
namely formal specification of test points and the use of inline temporal assertions. We present methods for checking
whether a test-bench with inline assertions covers a set of formal test points. This is particularly useful in developing
verification IPs for standard on-chip protocols where the development team must make sure that the test bench
provided in the verification IP checks all the important aspects of the protocol. We demonstrate the efficacy of our
approach over the ARM AMBA verification IP.
22 Clocking-based Coplanar Wire Crossing Scheme for QCA
Quantum-dot Cellular Automata (QCA) is one of the promising next-generation fabrics for circuits. Coplanar wire crossing
is one of the more elegant features of this new low power computing paradigm. However, such crossings need two types of
cells and are known to be neither easy to fabricate nor very robust. In this work, we propose coplanar wire crossing using
a single type of QCA cell, by applying the concept of Time Division Multiplexing to design the crossing. This has massive
implications for the fabrication and fault tolerance of QCA circuits.
23 Channel Optimization for the Design of High Speed I/O links
The continuous increase in microprocessor performance demands an equal order of increase in the bandwidth
requirements on the memory and I/O interfaces. Providing the required bandwidth at an acceptable cost is a challenge
to the system packaging engineer. This paper discusses how a passive channel can be optimized in a cost effective
way to provide the maximum bandwidth. The paper focuses on the design methodology including modeling the
channel, identifying the channel bottle-necks, optimizing around the bottle-necks and verifying the conclusions
through simulation. Finally the simulation results are verified through hardware measurements.
24 Bridgeless Buck PFC Rectifier
A new bridgeless buck PFC rectifier that substantially improves efficiency at low line of the universal line range is
introduced. By eliminating input bridge diodes, the proposed rectifier’s efficiency is further improved. Moreover, the
rectifier doubles its output voltage, which extends useable energy of the bulk capacitor after a drop-out of the line
voltage. The operation and performance of the proposed circuit were verified on a 700-W, universal-line experimental
prototype operating at 65 kHz. The measured efficiencies at 50% load from 115-V and 230-V line are both close to
96.4%. The efficiency difference between low line and high line is less than 0.5% at full load. A second-stage
half-bridge converter was also included to show that the combined power stages easily meet the Climate Savers Computing
Initiative Gold Standard.
25 Bottleneck Identification Techniques leading to Simplified Performance Models for Efficient Design Space Exploration in VLSI Memory Systems
High performance VLSI systems are being built as multiprocessor systems-on-chip. The number of processors and
their performance are rising rapidly, while memories are changing more slowly. The memory system is often a
performance bottleneck in terms of either its bandwidth or latency. We propose sensitivity analysis as a means to
pinpoint the bottleneck. We introduce a novel randomized technique to measure the sensitivities within cycle accurate
simulators. The sensitivity measures identify the bottleneck regions of the design space, within which simplified
performance models can be used for optimization. We demonstrate this methodology on the Augmint-MemSim
simulator, which is a cycle accurate model for multi-processor systems with a distributed memory sub-system. We
empirically show that: (i) performance predictions from simplified models are strongly correlated with the simulator in
the high sensitivity regions; (ii) the simplified models speed up design space exploration by 2 to 3 orders of magnitude
over the simulator, resulting in better design solutions.
26 Architectural Comparison of Analog and Digital Duty Cycle Corrector for High Speed I/O Link
To achieve high speed data signaling rates with the internal fast clock operating at half the data rate, the XDR (extreme
data rate) I/O link employs dual-edge signaling, wherein data bits are transmitted on both edges (rise/fall) of the
transmit clock. A duty cycle correction technique is used to provide high frequency, low jitter clocks that have a 50% duty
cycle. This paper compares two different techniques to implement a duty cycle corrector (DCC). These techniques are
implemented in high speed I/O operating at data rates of 4Gbps and 6.4Gbps in TSMC 65nm and TSMC 40nm technology,
achieving an output duty cycle error below ±2% for ±10% input duty cycle error.
27 Analyzing Energy-Delay Behavior in Room Temperature Single Electron Transistors
This paper presents Single Electron Transistor (SET) devices operating at room temperature as an attractive option to
implement low energy consumption circuits with low-to-moderate performance requirements. Currently, such circuits
are implemented using CMOS technologies operating at low supply voltages. CMOS is usually leakage dominated at
such a low voltage regime and various optimizations are necessary to design low energy circuits. By discussing the
energy-delay trade-offs for SET devices and comparing them to those of contemporary CMOS technology, we present
an argument that SET devices may be more favorable compared to CMOS from the energy and delay standpoints at
low supply voltages.
28 Analysis, Design and Simulation of Capacitive Load Balanced Rotary Oscillatory Array
The high frequency of the rotary clocking technology is often susceptible to implementation parameters such as the
variation in the total capacitive load distribution between the rings. SPICE simulations performed on the rotary rings
with unbalanced capacitive load distribution show a 30.31% variation in the simulated frequencies across the rings.
To address this problem, two novel methodologies, OCLB and SOCLB, are formulated for optimal capacitive
load balancing and suboptimal capacitive load balancing with minimized wire length, respectively. SPICE simulations
performed with OCLB show a 0.30% variation in the simulated frequencies across the rings. Further, SOCLB results in
an average wire length improvement of 69.24% over OCLB with a relatively balanced capacitive load distribution.
SPICE simulations performed with SOCLB show a 2.40% variation in the simulated frequencies across the rings,
a significant improvement over the 30.31% variation of the unbalanced case.
29 An L-band Fractional-N Synthesizer with noise-less Active Capacitor scaling
In a charge-pump based type-II analog Phase Locked Loop (PLL), the loop filter often uses a small resistor along with a
big integrating capacitor for good phase noise performance. This comes at the cost of large silicon area or an external
component. The noise from the resistor contributes to the output phase noise through both feedback and feed-forward
paths and hence has a presence in the output over a very wide frequency band. In this PLL, the loop filter avoids the
feed-forward path and limits the contribution of the resistor noise to a narrow frequency band. This technique allows a
large resistor to be used with a small capacitor without phase noise penalty. The achieved independent control of
bandwidth and stabilizing zero gives better stability and reduces noise peaking. The integrated phase error achieved at
1.3GHz is -38dBc.
30 An improvised MOS transistor model suitable for Geometric Program based analog circuit sizing in Sub-micron technology
This paper presents ways to improve the accuracy of performance prediction for geometric program based analog design
in the submicron regime. A geometric program requires a special monomial form of the device model it uses. The major
sources of inaccuracy in this basic model have been identified and it has been shown that slightly relaxing the strict
monomial form in order to include second order effects can greatly improve the accuracy. In order to make use of this
model we deploy it in collaboration with an iterative solution betterment scheme, by solving the sizing problem as a
sequence of geometric programs instead of a single one. We illustrate the efficacy of our scheme through a folded-
cascode op-amp sizing example.
31 An Improved High Resolution CMOS Timing Generator Using Array of Digital Delay Lock Loops
In this paper, an improved high resolution CMOS timing generator using an array of digital delay lock loops is presented.
The timing generator is implemented as an array of delay locked loops. This architecture enables a timing generator
with sub gate delay resolution to be implemented. The proposed delay lock loops use a novel start-controlled dual
phase and frequency detector along with a charge pump, where the injected charge approaches zero as the loop
approaches lock on the leading edge and the trailing edge of an input clock reference. The delay lock loop locks to
both the leading and trailing clock edges, as the start-controlled dual phase and frequency detector along with the charge
pump converts the phase difference into a voltage, which greatly reduces the timing jitter. In the start-controlled dual
phase and frequency detector, the start-controlled circuit is used to provide a precise output without the locking
problem. The results show that the total delay time between the input and the output of the DLL (Delay Lock Loop) is
one clock cycle and all of the delay cells provide precise output without false locking or harmonic locking. Test results
show a timing jitter of less than 5 ps for the DLL circuit and very low phase sensitivity errors. The timing generator
implemented as an array of delay locked loops has exponentially reduced the locking time and also avoids false locking
or harmonic locking. An experimental prototype was simulated using 0.35 µm technology with a supply voltage of 3.3V.
32 An Efficient Method for Bottom-Up Extraction of Analog Behavioral Model Parameters
This paper presents a fast, accurate and robust method for bottom-up extraction of analog behavioral model
parameters from the corresponding transistor level netlists. The proposed Verilog-A in-loop simulation based
modeling approach is generic and can estimate the parameters of the corresponding model of any given circuit using
relevant test-benches, thus removing the need to implement structure based estimation tools for each circuit. The
models are usually non-linear with respect to the parameters and often the optimization problem becomes nonconvex.
A hybrid method based on co-operation and switching between search and gradient methods is proposed for
achieving significantly faster convergence to the global minimum even in the presence of local minima in such non-convex
cases. This method is applied by the authors to a wide variety of analog circuits, and is demonstrated in the paper
using two distinctly different analog circuits. Simulation results comparing the model and transistor level netlist show
that a high level of accuracy can be achieved. A comparison of the search, gradient and proposed hybrid methods
is presented.
33 An Efficient Design of a Reversible Barrel Shifter
The key objective of today’s circuit design is to increase the performance without the proportional increase in power
consumption. In this regard, reversible logic has become an immensely promising technology in the field of low power
computing and designing. On the other hand, data shifting and rotating are required in many operations such as
arithmetic and logical operations, address decoding and indexing. Consequently, barrel shifters, which can
shift and rotate multiple bits in a single cycle, have become a common design choice for high speed applications.
This paper therefore presents an efficient design of a reversible barrel shifter. It is also shown that the new
circuit outperforms the previously proposed one in terms of number of gates, number of garbage outputs, delay and
quantum cost.
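As a purely illustrative aside, the logarithmic structure that lets a barrel shifter rotate by any amount in a single pass can be modeled in software. The sketch below is not the reversible-gate circuit proposed in the paper; the function name and the 8-bit default width are assumptions made for the example.

```python
def barrel_rotate_left(word, amount, width=8):
    """Rotate `word` left by `amount` bits using log2(width) stages.

    Hypothetical software model of a logarithmic barrel shifter; it is not
    the reversible-gate design described in the abstract.
    """
    mask = (1 << width) - 1
    word &= mask
    amount %= width
    stage = 0
    while (1 << stage) < width:
        step = 1 << stage
        # Each stage acts like a row of 2:1 multiplexers controlled by one
        # bit of the shift amount: rotate by `step` or pass through.
        if amount & step:
            word = ((word << step) | (word >> (width - step))) & mask
        stage += 1
    return word


rotated = barrel_rotate_left(0b10010110, 3)
```

For a fixed rotation amount the mapping only permutes bit positions, so every input word maps to a distinct output word.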
34 Accelerating Synchronous Sequential Circuits using an Adaptive Clock
In this paper we propose a scheme for enhancing the timing performance of a pre-designed synchronous sequential
circuit. In the proposed scheme, a circuit is driven by two clocks. One of them is the conventional clock while the other
one, having a shorter period, is applied when the circuit stabilizes well before the critical delay. We use a symbolic
algorithm to analyze the timing behavior of the synchronous sequential circuit and provide add-on circuitry to select
the appropriate clock based on the current state of the circuit. We demonstrate an appreciable gain (67% on average) in
timing performance on several benchmark circuits.
35 A Unified Solution to Scan Test Volume, Time, and Power Minimization
The double-tree scan-path architecture, originally proposed for low test power, is adapted to simultaneously reduce
the test application time and test data volume under external testing. Experimental results show significant
performance improvements over other existing scan architectures.
36 A Unified Approach for IP Protection across Design Phases in a Packaged Chip
IP values contributed by the distinct design tools in specific design phases are recognized by observing the signature
of the owner of each tool as a functional or scan mode output of the fabricated chip, for a certain input vector kept secret
by the owner. An existing approach inserts a watermark by reordering a single scan chain, and identifies only the
owner of the logic design tool. Here we propose a novel scheme to watermark the recent reconfigurable scan
architectures, operating in both scan tree and single scan mode. The signature of the owner of the physical design tool
along with that of the logic design tool can be embedded separately while designing the scan tree and also verified from
the packaged chip without conflict using two distinct modes. A bi-objective minimization of overhead in routing and
power is supported through our scheme. Experimental results on design overhead and robustness for ISCAS’89
benchmarks are encouraging.
37 A Reconfigurable Architecture for Secure Multimedia Delivery
This paper introduces a reconfigurable architecture for ensuring secure and real-time video delivery through a novel
parameterized construction of the Discrete Wavelet Transform (DWT). This parameterized construction promises
multimedia encryption and is also well-suited to a hardware implementation due to our derivation of rational filter
coefficients. We achieve an efficient and high-throughput reconfigurable hardware implementation through the use of
LUT-based constant multipliers enabling run-time reconfiguration of the encryption key. We also compare our prototype
(using a Xilinx Virtex 4 FPGA) to several existing implementations in the research literature and show that we achieve
superior performance as compared to both traditional CPU-based and custom VLSI approaches while adding features
for secure multimedia delivery.
38 A P4VT (Power-Performance-Process-Parasitic-Voltage-Temperature) Aware Dual-VTh Nano-CMOS VCO
We present the design flow for a P4VT (Power-Performance-Process-Parasitic-Voltage-Temperature) aware voltage
controlled oscillator (VCO). Through simulations, we have shown that parasitics, process, voltage and temperature have
a drastic effect on the performance (center frequency) of the VCO. A design optimization of the VCO, along with dual-
threshold power minimization has been performed in the presence of worst-case variations. The end product of the
proposed methodology is a P4VT-optimal dual-threshold 90nm VCO layout. We have achieved 16.4% power (including
leakage) minimization with 10% degradation in center frequency compared to the target frequency, in the presence of
worst-case variations.
39 A Novel Circuit to Optimize Access Time and Decoding Schemes in Memories
As microprocessor speed increases from 500MHz to 1GHz and beyond, SoC designers are forced to devise new
schemes in their use of cache memory for high speed access. In this paper, the clock to word line path delay is optimized
using a novel circuit design technique. Using this circuit, the clock to word line path delay is improved by a factor of 2.5
at the worst case corner. For a typical memory instance (frequently used in cache memories) whose access time is of the
order of 800ps and where read and write operations occur in the same clock cycle, overall access time is improved by
18% at the worst case corner. For this case, the write margin is improved by a factor of 2.26 at the worst case corner for
the write operation. A decoding scheme is also discussed, which describes how to choose the best pre-decoding
and post-decoding schemes based on minimum pre-decoded lines, minimum stack size in the post decoder and maximum
granularity of xdecoders.
40 A Non Quasi-Static Small Signal Model for Long Channel Symmetric DG MOSFET
We propose a compact model for small signal non quasi-static analysis of the long channel symmetric double gate
MOSFET. The model is based on the EKV formalism and is valid in all regions of operation, and is thus suitable for RF
circuit design. The proposed model is verified against a professional numerical device simulator, and excellent agreement
is found well beyond the cut-off frequency.
41 A new Hetero-material Stepped Gate (HSG) SOI LDMOS for RF Power Amplifier Applications
In this paper, we propose a new hetero-material stepped gate (HSG) SOI LDMOS in which the gate is divided into three
sections - an n+ gate sandwiched between two p+ gates and the gate oxide thickness increases from source to drain.
This new device structure improves the inversion layer charge density in the channel, results in a uniform electric field
distribution in the drift region and reduces the gate to drain capacitance. Using two-dimensional simulation, the HSG
LDMOS is designed and compared with the conventional LDMOS. We demonstrate that the proposed device exhibits a
28% improvement in breakdown voltage, 32% reduction in on-resistance, 13% improvement in transconductance, 9%
reduction in gate to drain charge and 38% reduction in switching delay. The HSG LDMOS may be effectively deployed in RF
power amplifier applications.
42 A Methodology for Power Aware High-Level Synthesis of Co-Processors from Software Algorithms
Hardware co-processors are used for accelerating specific compute-intensive tasks dedicated to video/audio codec,
encryption/ decryption, etc. Since many of these data-processing tasks already have efficient software algorithms, one
could reuse those to synthesize the co-processor IPs. However, such software algorithms are usually sequential and
written in C/C++. High-level Synthesis (HLS) helps in converting software implementation to register transfer level
(RTL) hardware design. Such co-processor based systems show enhanced performance but often have greater
power/energy consumption. Therefore, the automated synthesis of such accelerator IPs must be power-aware.
Downstream power-saving features such as clock-gating are unknown during HLS. The designer is forced to take such
power-aware decisions only after the logic synthesis stage, causing an increase in design time and effort. In this paper, we
present a design automation solution to facilitate various granularities of clock-gating at the high-level C description of
the design.
43 A Hierarchical Methodology for Word-Length Optimization of Signal Processing Systems
The problem of converting floating point algorithms to implementation friendly fixed point formats is often solved as
an optimization problem in which precision is traded against implementation cost. The complexity of the
problem is known to grow exponentially with the number of optimizable variables. This paper proposes a divide and conquer
technique to cope with the growing size of the problem. The approach in this technique is original in the sense that it is
formulated from a designer’s perspective rather than merely attempting to divide and conquer at the algorithmic level.
This paper introduces the single noise source model on which the proposed technique is built. A mixed
approach for error propagation is also explained, keeping in view the elements in the circuit that cannot be handled
analytically.
44 A Hardware Scheduler for Real Time Multiprocessor System on Chip
This paper presents the design and implementation of a low power hardware scheduler for multiprocessor
systems-on-chip. The Pfair scheduling algorithm is considered with three different implementation schemes: a replicated
software scheduler running on each processor, a single software scheduler running on a dedicated processor and the
proposed hardware scheduler. Experimental evaluation with benchmarks shows that the hardware scheduler outperforms the
other two schemes in terms of energy consumption by a factor on the order of 10^5 and scheduling delay by a factor on
the order of 10^3.
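For readers unfamiliar with Pfair scheduling, the following minimal sketch computes the per-subtask windows used in the standard Pfair formulation: a task with weight w = e/p is split into unit-length subtasks, and subtask i (counting from 1) must run in the slot interval from floor((i-1)/w) up to ceil(i/w). The function name is hypothetical, and the sketch says nothing about how the paper's hardware scheduler is built.

```python
from fractions import Fraction
from math import ceil, floor


def pfair_windows(execution, period):
    """Return (pseudo-release, pseudo-deadline) for each subtask of one job.

    Illustrative helper only; exact rational arithmetic avoids rounding
    errors when the weight is not a power-of-two fraction.
    """
    weight = Fraction(execution, period)
    windows = []
    for i in range(1, execution + 1):
        release = floor((i - 1) / weight)   # earliest slot for subtask i
        deadline = ceil(i / weight)         # subtask i must finish by here
        windows.append((release, deadline))
    return windows


windows = pfair_windows(3, 5)
```

A Pfair scheduler then picks, in every time slot, among the subtasks whose windows are open, typically favoring the earliest pseudo-deadline.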
45 A Graph-based I/O Pad Pre-placement Technique for use with Analytic FPGA Placement Methods
Typical analytic placement methods seek to minimize total squared wire length by solving a linear equation system.
However, to avoid trivial solutions, certain blocks must be assigned locations on the Field Programmable Gate Array
(FPGA) fabric prior to optimization. A simple way to achieve this is to assign blocks randomly. However, this does not
always result in the best solution. In this paper, we present a novel algorithm, called Shrub Place, for pre-assigning I/O
blocks to I/O pads around the perimeter of the FPGA. To verify the efficacy of our pre-placement algorithm, we
integrated the algorithm into the analytic placer in [1, 2]. When tested with the 20 MCNC benchmarks [11], our results
show that a reduction in wire length is possible, with very little additional execution time required to perform the
pre-placement.
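The need for pre-placed blocks can be seen in a toy one-dimensional example. The sketch below is not the Shrub Place algorithm or the analytic placer cited above; it is a minimal illustration, with hypothetical names, of quadratic placement on a chain of blocks between two fixed pads. Setting the derivative of the squared wire length to zero gives one linear equation per movable block, solved here by Gauss-Seidel sweeps.

```python
def place_1d(num_blocks, left_pad, right_pad, sweeps=2000):
    """Minimize the sum of squared net lengths on the chain
    left_pad - b0 - b1 - ... - right_pad, with both pads fixed.

    Toy model only: unit net weights, one dimension, two-pin nets.
    """
    x = [0.0] * num_blocks
    for _ in range(sweeps):
        for i in range(num_blocks):
            left = left_pad if i == 0 else x[i - 1]
            right = right_pad if i == num_blocks - 1 else x[i + 1]
            # d/dx_i of (x_i - left)^2 + (right - x_i)^2 = 0
            # gives one row of the linear system: 2 * x_i = left + right.
            x[i] = (left + right) / 2.0
    return x


positions = place_1d(3, 0.0, 8.0)
```

With the pads fixed at 0 and 8 the three blocks spread evenly between them. If no block were fixed, placing every block at the same coordinate would give zero wire length, which is the trivial solution that pad pre-assignment avoids.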
46 A Combined DOE-ILP Based Power and Read Stability Optimization in Nano-CMOS SRAM
A novel design approach for simultaneous power and stability (static noise margin, SNM) optimization of nano-CMOS
static random access memory (SRAM) is presented. A 45nm single-ended seven transistor SRAM is used as a case
study. The SRAM is subjected to a dual-VTh assignment using a novel combined Design of Experiments and Integer
Linear Programming (DOE-ILP) algorithm, resulting in 50.6% power reduction (including leakage) and 43.9% increase
in the read SNM. The process variation analysis of the optimal SRAM carried out considering twelve device parameters
shows the robustness of the design.
47 23.97GHz CMOS Distributed Voltage Controlled Oscillators with Inverter Gain Cells and Frequency Tuning
by Body Bias and MOS Varactors Concurrently
Tunable VCOs operating around 24GHz in 0.18µm CMOS are reported. Simple CMOS inverters are used as gain stages,
and tuning is achieved with a novel method that uses body bias and MOS varactors concurrently; the two are
compared for performance. The novel tuning method allows for a wider tuning range than using a single method.
Here forward body bias (FBB) tuning of p-FETs has 9-10 times higher tuning bandwidth than MOS
varactor tuning when the latter is connected in series (before the output collection point), but equal or nearly equal
tuning when the varactor pair is connected in parallel (to the drain transmission line). Six monolithically integrated novel
distributed voltage controlled oscillators (D-VCOs) with a novel gain cell comprising a CMOS inverter are designed.
Top-layer metal is used for the coplanar waveguide (CPW) on-chip inductors. The first D-VCO, OSC-1, has 3 stages of the
gain cell and oscillates at 23.97GHz; the second D-VCO, OSC-2, has 4 stages of the gain cell and oscillates at 18.64GHz.
Both K-band oscillators use body bias variation of p-FETs for wide frequency tuning. For further tuning beyond body bias
tuning, MOS varactors are added in series to OSC-1 and OSC-2, resulting in designs OSC-3 and
OSC-4 respectively, and in parallel, resulting in designs OSC-3a and OSC-4a respectively. OSC-3 oscillates at 23.53GHz
and OSC-4 oscillates at 18.09GHz. OSC-3a oscillates at 22.79GHz with 340MHz tuning by each of the two tuning
techniques (a doubling of tuning bandwidth, as total tuning is 680MHz). OSC-4a oscillates at 17.77GHz (yielding a
Ku-band VCO from a K-band design with substantial design reuse) with 240MHz tuning by FBB and 200MHz tuning by the
varactor pair (total tuning of 440MHz). The phase noise is reported at 1MHz offset from the carrier; for example, it is
-102.4dBc/Hz for the 18.64GHz D-VCO. These oscillators emit very low power in the 2nd and 3rd harmonics.
48 A 90mW/GFlop 3.4GHz Reconfigurable Fused/Continuous Multiply-Accumulator for Floating-point and Integer
Operands in 65nm
This paper describes an energy efficient and reconfigurable fused/continuous multiply-accumulator (MAC) architecture
for single-precision floating-point and 16-bit signed integer operands. This eight-stage pipelined, single-cycle
throughput MAC design contains a bit level pipelined multiplier, followed by a fast sparse-tree adder and a single cycle
accumulator loop with delayed normalization logic. Operation driven energy control is achieved using dynamic clock
and fine grained power gating techniques. Power gating is employed in 98% of the design to save 79% of leakage power in
idle mode, at 1.2V supply and 110°C. The use of fully shared logic in the multiplier, accumulator and normalization
blocks for different operations enables a compact design of 0.54mm² containing 117K transistors in eight-metal 65nm
CMOS technology. The 15-FO4 design provides 6.8GFLOPS of performance with total energy efficiency of
90mW/GFLOP at 1.2V and 3.4GHz operation.
49 A 6 bit 800MHz TIADC based on Successive Approximation in 65nm Standard CMOS Process
Applications such as ultra-wideband radio and optical communication require sampling rates of at least 500MS/s with low
resolution. The potential energy savings of a successive approximation based time-interleaved A-D conversion
architecture give it an advantage over the traditional flash architecture. This paper presents a 6-bit 800 MS/s ADC in a
65 nm STMicroelectronics standard CMOS process. The ADC uses an 8-channel time-interleaved SAR topology and achieves 36
dB SNR and 43 dB SFDR with 13.5 mW power consumption from a 1.2 V supply. The resulting FOM is 0.3251 pJ/step. The
timing mismatch among the channels is reduced by a clock-edge reassignment technique. The high speed specification
of the system requires the design of a low-offset comparator. Power consumption and jitter are reduced by using a shift
register based phase generator.
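The quoted figure of merit can be roughly cross-checked from the other numbers in the abstract. The sketch below assumes the common Walden definition, FOM = P / (2^ENOB × fs), and uses the reported 36 dB SNR as a stand-in for SNDR when estimating ENOB; both are assumptions, since the abstract does not state how its FOM was computed, and the function names are hypothetical.

```python
def enob(snr_db):
    """Effective number of bits from a signal-to-noise figure in dB."""
    return (snr_db - 1.76) / 6.02


def walden_fom_pj(power_w, sample_rate_hz, snr_db):
    """Energy per conversion step in picojoules (Walden-style FOM)."""
    steps_per_second = (2 ** enob(snr_db)) * sample_rate_hz
    return power_w / steps_per_second * 1e12


# Values taken from the abstract: 13.5 mW, 800 MS/s, 36 dB SNR.
fom = walden_fom_pj(13.5e-3, 800e6, 36.0)
```

Under these assumptions the result comes out at about 0.33 pJ/step, within roughly 1% of the 0.3251 pJ/step stated above; the small gap presumably reflects a slightly different ENOB value in the original calculation.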
50 4 GHz 130nm Low Voltage PLL Based on Self Biased Technique
This paper explores a PLL core design that can satisfy a wide range of high frequency serial data communication
applications. Several high frequency serial data communication protocols co-exist today. Meeting the PLL
design requirements for each of these clock frequencies separately calls for enormous design effort in terms of time and
cost. It is therefore desirable to design a PLL core that addresses a wide segment of clock frequency
requirements. The PLL achieves this using a single 1.2V supply; it does not use any special mask layers and does not
need a bandgap reference for its operation. This PLL is based on the self-biased technique and achieves high process
technology independence, fixed damping factor, fixed bandwidth to operating frequency range and input phase offset
cancellation. Here the self-biased PLL in 130nm CMOS technology achieves a frequency range of 400 MHz to 4GHz.
The PLL core is designed to accept a wide range of input reference frequencies.
51 Voltage-Frequency Planning for Thermal-Aware, Low-Power Design of Regular 3-D NoCs
Network-on-Chip combined with the Globally Asynchronous Locally Synchronous paradigm is a promising architecture for
easy IP integration and utilization with multiple voltage levels. For power reduction, multiple voltage-frequency levels
have been successfully applied to 2-D NoCs, but never with a generic approach to their 3-D counterparts, in which the low
heat conductivity of insulator layers leads to high temperature concentration in layers far from the heat sink. In this paper,
a thermal-aware methodology for regular 3-D NoCs based on multiple voltage levels is proposed. Given an application
task graph, this methodology determines an efficient mapping of tasks onto network tiles, considering inherent
computation and communication requirements of the tasks and thermal resistance from any silicon layer to the
ambient. Then, a heuristic approach is utilized to determine voltage and frequency specifications of all IP cores, such
that total power is reduced, dissipated heat is properly conducted to the layers close to the heat sink, and application
requirements (in terms of deadline) are satisfied. The experiments confirm a significant saving in total power while
performance of the running application is guaranteed.
52 Transition Inversion based Low Power data coding scheme for Buffered Data Transfer
In this work the authors propose a data coding protocol that leads to power reduction for block data transfer in off-chip
buses. I/O pads driving off-chip buses contribute to a major portion of power dissipation in chips. Also, block data
transfer is preferred in most systems such as caches and DMA. In this work, prior knowledge of the block of
data to be transmitted, when it is stored in the buffer, is exploited in a serial fashion to reduce transitions on every bus
line. Statistical analysis shows up to 31.9% reduction in transitions. Benchmark results show that it leads to 29%
reduction in power consumption. The technique provides added error detection on the lines of parity bit technique,
with similar average error detection capability.
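The abstract does not spell out the transition inversion scheme itself, so the sketch below substitutes classic bus-invert coding, a different and well-known technique, purely to illustrate how an extra signal line can be traded for fewer transitions on a bus. All names are hypothetical, and the bus is assumed to start at all zeros.

```python
def toggles(values):
    """Count bit transitions over a sequence of bus values, starting from 0."""
    prev, total = 0, 0
    for v in values:
        total += bin(prev ^ v).count("1")
        prev = v
    return total


def bus_invert_encode(words, width):
    """Classic bus-invert coding (not the scheme proposed in the abstract).

    A word is sent inverted, with an extra invert line set, whenever more
    than half of the data lines would otherwise toggle.
    """
    mask = (1 << width) - 1
    prev, encoded = 0, []
    for w in words:
        if bin((prev ^ w) & mask).count("1") > width // 2:
            sent, inv = (~w) & mask, 1
        else:
            sent, inv = w & mask, 0
        encoded.append((sent, inv))
        prev = sent
    return encoded


def bus_invert_decode(encoded, width):
    """Recover the original words from (data, invert) pairs."""
    mask = (1 << width) - 1
    return [(sent ^ mask) if inv else sent for sent, inv in encoded]


block = [0x00, 0xFF, 0x00, 0xFF]
coded = bus_invert_encode(block, 8)
raw_toggles = toggles(block)
# Model the invert line as a ninth bus line when counting transitions.
coded_toggles = toggles([sent | (inv << 8) for sent, inv in coded])
```

For this worst-case alternating block, the uncoded bus toggles 24 times while the coded bus, including its invert line, toggles 3 times.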
53 Towards Active-Passive Co-Synthesis of Multi-Gigahertz Radio Frequency Circuits
This paper proposes a methodology and framework for rapid active-passive co-synthesis of radio frequency circuits.
The presented approach leverages advances in accelerated three-dimensional electromagnetic simulation technology
to construct Maxwell-accurate parametric macromodels of on-chip passives, in particular spiral inductors. These
macromodels can be used in the context of SPICE level synthesis, thereby enabling concurrent sizing of passive and
active components of radio frequency circuits. Moreover, macromodels obviate the need for topology exploration and
parametric RLC model generation via optimization for on-chip passives. The co-synthesis framework is enabled by
nonlinear, hyper-dimensional regression for macromodel generation and a simulated annealing based optimization
scheme. As examples, and to demonstrate the efficacy of the proposed approach, two standard low-noise amplifier
topologies are synthesized with tight performance constraints by co-optimization of circuit parameters and inductor
geometries.
54 The dawn of 22nm era: Design and CAD challenges
Technology scaling has clearly been the driver of the semiconductor industry and thereby the EDA industry. In the
semiconductor industry today, 45nm CMOS designs are in full production and 32nm design rules and infrastructure are
already in place for designs starting later this year. It will not be long before the beat of 22nm is upon us. Due to the
ever increasing cost of doing design, design productivity, and more specifically the cost of design, has become a major
bottleneck in large scale design projects. Due to this cost crunch, automated synthesis techniques have been
becoming increasingly important, and this is bound to become a major trend going into 22nm for high performance
SoCs. In addition, at 22nm and beyond, 3D IC technology has the potential to ease the system performance
challenge. In order to exploit the full potential of 3D technology, new challenges in the areas of physical
design, thermal analysis, system level design and analysis need to be addressed. 3D interconnects have the potential
to significantly reduce critical path delays, which typically lie between memory and the interfacing logic. In
addition, now that physical limits are beginning to impact scaling, the question is: how can we cost effectively
design with the complicated technology requirements presented by the 22nm node, and how can the design automation
community help to achieve this goal? What are the challenges at 22nm, and what will design look like going into
22nm and beyond? In this paper, we focus on the major design and CAD challenges associated with 22nm and
beyond.
55 Test Pattern Generation and Compaction for Crosstalk Induced Glitches and Delay Faults
VLSI circuits have become more susceptible to signal integrity related failures with the ever decreasing process
geometries. Detection of crosstalk induced faults is thus important as capacitive crosstalk is one of the major sources
of signal integrity related failures. A crosstalk glitch can result in an erroneous output if the glitch effect propagates to a
primary output or to an intermediate flip-flop. Similarly the crosstalk induced delay effects can also result in latching of
an incorrect value if the delay exceeds the allowed margins. In this work a test generation and compaction method is
proposed for crosstalk faults. Test patterns are generated by simultaneously considering the coupling capacitance,
timing and functional incompatibilities between the victim and aggressor nets, to produce the practical maximum
crosstalk noise. A unique method is proposed for finding the functional incompatibilities between interconnects. The
generated test set is then compacted initially through pattern merging and then further through the fault-chaining
algorithm. Three different implementations of this algorithm are compared on crosstalk test sets generated for
ISCAS’85 benchmark circuits. Results show considerable reduction in crosstalk pessimism for the given layout and
timing, as well as up to 75% reduction in overall test set size.
56 Synthesizability of 3 party Formal Specifications-Does my controller see enough?
This paper presents the problem of bounded synthesizability of formal specifications in the context of three party
systems, consisting of a machine, its environment and a controller. The overall objective is to determine whether it is
possible to synthesize both the machine and its controller for a given Linear Temporal Logic (LTL) specification over
the signals in the machine and the controller interfaces.
57 Synchronization of Concurrently-Implemented Fluidic Operations in Pin-Constrained Digital Microfluidic
Biochips
The implementation of bioassays in pin-constrained biochips may involve pin-actuation conflicts if the concurrently
implemented fluidic operations are not carefully synchronized. We propose a two-phase optimization method to
identify and synchronize the fluidic operations that can be executed in parallel. The goal is to implement these fluidic
operations without pin-actuation conflicts, and to minimize the duration of implementing the outcome sequence after
synchronization. The effectiveness of the proposed two-phase optimization method is demonstrated for a
representative 3-plex assay performed on a fabricated pin-constrained biochip.
58 Robust System Design
Robust system design ensures that future systems continue to meet user expectations despite rising levels of
underlying disturbances. This paper discusses two essential aspects of robust system design: 1. Effective post-silicon
validation, despite staggering complexity of future systems, using a new technique called Instruction Footprint
Recording and Analysis (IFRA). 2. Cost-effective design of systems that overcome CMOS reliability challenges through
built-in tolerance to errors in hardware during system operation. A combination of Built-In Soft Error Resilience
(BISER) and circuit failure prediction, together with on-line self-test/diagnostics and software-orchestrated
optimization across multiple abstraction layers, enables the design of cost-effective resilient systems.
59 RF SOI Switch FET Design and Modeling Tradeoffs for GSM Applications
A single-pole double-throw novel switch device in 0.18µm SOI complementary metal-oxide semiconductor (CMOS)
process is developed for 0.9 GHz wireless GSM systems. The layout of the device is optimized keeping in mind the
parameters of interest for the RF switch. A subcircuit model, with the standard surface potential (PSP) model as the
intrinsic FET model along with the parasitic elements, is built to predict the Ron and Coff of the switch. The measured
data agree well with the model. The eight-FET stacked switch achieved an Ron of 2.5 ohms and a Coff of 180 fF.
60 Rethinking Threshold Voltage Assignment in 3D Multicore Designs
Due to the inherent nature of heat flow in 3D integrated circuits, stacked dies exhibit a wide range of thermal
characteristics. The strong dependence of leakage on temperature and process variation plays havoc with achieving
system level energy efficiency in such systems, complicating the task of power provisioning in 3D multicores. In this
paper, we address this power provisioning challenge in 3D ICs by advocating a novel microprocessor design paradigm, where
the circuit designers are aware of the intended placement of a die in a 3D stack. We present a concrete application
of this paradigm through a threshold voltage (Vt) assignment algorithm for a 3D multicore system, where we
specifically account for: (a) the change in the role of leakage power, (b) expected operating frequency, and (c)
dependency of PV induced leakage variation and Vt levels. Detailed simulation based experiments with our proposed
algorithm show 2–15% improvement in energy efficiency for a typical multicore system organized as 3D stacked dies.
61 Processor Architecture Design Using 3D Integration Technology
The emerging three-dimensional (3D) chip architectures, with their intrinsic capability of reducing the wire length, are
one of the promising solutions to mitigate the interconnect problem in modern microprocessor designs. 3D memory
stacking also enables much higher memory bandwidth for future chip-multiprocessor design, mitigating the “memory
wall” problem. In addition, heterogeneous integration enabled by 3D technology can also result in innovative designs
for future microprocessors. This paper serves as a survey of various approaches to design future 3D microprocessors,
leveraging the benefits of low latency, higher bandwidth, and heterogeneous integration capability that are offered by
3D technology.
62 Post assembly timing closure for multi million gate chips
A hierarchical timing closure methodology is presented. It combines the timing closure effectiveness of flat methods with
the capacity and run time efficiency of subchip based methods. Its unique proposition is that it performs flat logic
physical optimization of cross subchip timing paths while, at the same time, abiding by hierarchy rules. The principle
and details of the methodology are provided. Experimental results on multi million gate designs show its timing
closure effectiveness, with run time gains of 50% on optimization steps and a reduction in peak memory as well.
63 Pinpointing Cache Timing Attacks on AES
The paper analyzes cache based timing attacks on optimized codes for Advanced Encryption Standard (AES). The
work justifies that timing based cache attacks create hits in the first and second rounds of AES, in a manner that the
timing variations leak information of the key. To the best of our knowledge, the paper justifies for the first time that
these attacks are unable to force hits in the third round and concludes that a similar third round cache timing attack
does not work. The paper experimentally verifies that protecting only the first two AES rounds thwarts cache based
timing attacks.
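To make the first-round leak concrete, the sketch below models the well-known observation that a first-round AES table lookup is indexed by plaintext byte XOR key byte. It assumes 64-byte cache lines, 4-byte table entries and a line-aligned table, so 16 entries share a line; these parameters and all names are assumptions for illustration, not details taken from the paper.

```python
ENTRIES_PER_LINE = 16  # assumption: 64-byte cache lines, 4-byte table entries


def first_round_line(plain_byte, key_byte):
    """Cache line touched by a first-round lookup indexed by p XOR k."""
    return (plain_byte ^ key_byte) // ENTRIES_PER_LINE


def leaked_key_relation(p_i, p_j):
    """High nibble of k_i XOR k_j implied when two lookups share a line."""
    return (p_i ^ p_j) >> 4


# Hypothetical secret key bytes, used only to drive the model.
k_i, k_j = 0x3A, 0xC5
collisions = [
    (p_i, p_j)
    for p_i in range(256)
    for p_j in range(256)
    if first_round_line(p_i, k_i) == first_round_line(p_j, k_j)
]
relations = {leaked_key_relation(p_i, p_j) for p_i, p_j in collisions}
```

Every colliding plaintext pair yields the same value, the high nibble of k_i XOR k_j, which is the kind of key information a timing difference caused by a cache hit can expose.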
64 Parametric Fault Diagnosis of Nonlinear Analog Circuits using Polynomial Coefficients
We propose a method for diagnosis of parametric faults in analog circuits using polynomial coefficients of the circuit
model [15]. As a sequel to our recent work [14], where circuit response is modeled as a polynomial for uncovering
parametric faults in nonlinear circuits, we propose diagnosis of such faults using the sensitivity of coefficients of the
estimated polynomial to circuit parameters. The proposed method requires no design-for-test hardware of the kind that
added to the circuit by some other methods. The proposed method is illustrated for a benchmark elliptic filter. It is
shown to uncover several parametric faults causing deviations as small as 5% from the nominal values.
65 Optimized Stage Ratio of Tapered CMOS Inverters for Minimum Power and Mismatch Jitter Product
In this paper, an optimum stage ratio (tapering factor) for a tapered CMOS inverter chain is derived to minimize the
product of power dissipation and jitter variance due to device mismatch. Analysis shows that this optimum stage ratio
(2.4) is lower than that for minimum delay (3.6) and for minimum power-delay product (6.35). This analysis is verified by
simulation results using standard 180nm as well as 90nm CMOS technology. Knowledge of the optimum stage ratio
helps to design low power, low mismatch jitter buffers for multi phase clock generation circuits that can drive large
load capacitances.
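The practical effect of the stage ratio can be seen with a first-order sizing sketch. It assumes an ideal geometric taper in which each inverter is a fixed ratio larger than the one before it, so the number of stages is about ln(C_load/C_in)/ln(ratio); the function name and the load ratio of 1000 are assumptions, and the model ignores self-loading and mismatch, which the paper's analysis accounts for.

```python
import math


def tapered_chain(load_ratio, stage_ratio):
    """Return (stage count, relative inverter sizes) for a tapered chain.

    First-order illustration: the stage count is ln(load_ratio) divided by
    ln(stage_ratio), rounded to the nearest whole stage (at least one).
    """
    stages = max(1, round(math.log(load_ratio) / math.log(stage_ratio)))
    sizes = [stage_ratio ** k for k in range(stages)]
    return stages, sizes


# Stage ratios quoted in the abstract, driving a load 1000x the input cap.
stage_counts = {r: tapered_chain(1000.0, r)[0] for r in (2.4, 3.6, 6.35)}
```

For this load, a ratio of 2.4 needs 8 stages, 3.6 needs 5 and 6.35 needs 4, so the ratio that minimizes the power-jitter product uses a longer, more gradual taper than the minimum-delay design.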
66 Optical Lithography Simulation with Focus Variation Using Wavelet Transform
The printed image on a silicon wafer differs from the layout due to optical diffraction. Optical proximity correction (OPC)
is a layout distortion technique to improve the printed image. During manufacturing, parameters such as focus, dose and
resist thickness may vary within tolerance margins. These factors contribute to additional distortion of the expected
printed shape that is not addressed directly by OPC. To ensure a robust IC, process window consideration is extremely
important while running lithography simulations as we scale the technology even further, where the sensitivity of
patterns printed on silicon to process variations is very high. Optical lithography simulation has always been an
important link in the chain for design for manufacturability (DFM), and a lot of research has been put into making it
faster and more accurate. However, because it is a compute intensive process, speeding up litho simulation without
significant compromise in accuracy has always been tricky. In this paper we propose a new method to approximate
litho simulation based on the wavelet transform as opposed to the traditional method, and we validate the speed
and accuracy of our simulator by comparing our results with those of a popular commercial lithography simulator
considering focus variations. While our simulator suffers from an RMS error of < 6%, the major gains are (1) an
increase in simulation speed of > 20X, (2) the ability to simulate very large circuit masks where the commercial
software fails, and (3) direct incorporation of manufacturing process variation. This allows litho simulation against
multiple manufacturing process corners, which in turn helps in producing robust designs.
67 On-Chip Inductor-less DC-DC Boost Converter with Non-Overlapped Rotational-Interleaving Scheme
An inductor-less DC-DC boost converter architecture for high efficiency and low output ripple is proposed. Output
ripple is reduced by splitting flying capacitors into a number of smaller elements and using a new switching scheme
called Non-Overlapped Rotational-Interleaving (NORI). The proposed switching scheme also helps to eliminate
reversion and shoot-through current, and hence improves the power efficiency. The proposed converter is designed in a
0.18µm CMOS thick-gate process with 440pF total flying capacitance. The target specification is a load current of 1mA
- 23mA at 5V - 6.5V output voltage from an input supply of 3.3V. The achieved peak power efficiency is 89% at 10mA
load current, compared with 83% peak power efficiency obtained from the best existing architecture designed in the same
technology. The output ripple at 10mA load current is 2.2mV in the presence of only 50pF load capacitance.
68 On Minimization of Test Application Time for RAS
Conventional random access scan (RAS) for testing has lower test application time, lower power dissipation, and lower
test data volume compared to standard serial scan chain based design. In this paper, we present two cluster-based
techniques, namely Serial Input Random Access Scan and Variable Word Length Random Access Scan, to reduce test
application time even further by exploiting the parallelism among the clusters and performing write operations on
multiple bits. Experimental results on benchmark circuits show, on average, a 2-3 times speedup in test write time
and an average 60% reduction in write test data volume.
